FAQ
As of version 2.9.9, .
There is no limit on public data, as shown in our .
In practice, users have been successful with packages at 2TB of total size with up to 40,000 individual files.
We are constantly upping the amount of data users can put into Quilt. Contact us if you have questions about large data.
Local builds (without quilt push) - on your machine
quiltdata.com user - in S3; if your package is private, only you can read the data
Teams user - in a dedicated S3 bucket
Running your own registry - up to you :-)
You can use .
For developers only:
Installation: pip install quilt --user
, the User icon (upper right) > About > Restart server
Prefer Python 3 kernels
quilt login doesn't present a textbox for my login code in Jupyter
Try a Python 3 kernel.
ImportError on import of data package
Ensure that the package has been installed via quilt install.
ImportError when accessing package contents
The pyarrow module, used by quilt, may fail to import because of missing DLLs:
!pip install quilt in Jupyter doesn't work
Installing packages inside of Python kernels can be flaky. The reason? Jupyter's Python kernels are disconnected from virtual environments.
To work around this issue, install Quilt from Jupyter's Terminal or from your operating system's terminal.
quilt not found
When working with virtual environments, such as those made with conda create, jupyter can be installed in the root environment. If you then install and run quilt in another environment, foo, Jupyter will not be able to find quilt.
Install quilt in the root environment, or install Jupyter in foo (run which jupyter in Jupyter's Terminal to ensure that you're using the environment-local Jupyter).
Alternatively, pip install quilt from Jupyter's Terminal.
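To see which environment a notebook kernel is actually using, a short check like the following can help. It is a minimal sketch that uses only the Python standard library; the helper names (kernel_python, is_importable, pip_install_command) are hypothetical and not part of Quilt.

```python
import importlib.util
import sys


def kernel_python():
    """Return the path of the interpreter running this kernel or script."""
    return sys.executable


def is_importable(module_name):
    """Return True if the current interpreter can locate module_name."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_command(package):
    """Build a pip command tied to the current interpreter rather than to PATH."""
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    print("Kernel interpreter:", kernel_python())
    print("quilt importable:", is_importable("quilt"))
    print("Suggested command:", " ".join(pip_install_command("quilt")))
```

If the interpreter path points at a different environment than the one where quilt was installed, that mismatch is the likely cause of the error. Running the suggested command from a terminal installs into the kernel's own environment.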
index_col
This keyword argument should be temporarily avoided in build.yml, as it causes pyarrow to hiccup on serialization.
Quilt 2.8 changes where data packages are stored on your local machine.
As a result, Quilt will no longer look for packages in quilt_packages directories. You will need to reinstall any previously installed packages. Locally built packages can be rebuilt. Or, to migrate existing packages to the new store without rebuilding, first revert to an earlier version of Quilt, then push your packages to the Quilt registry.
Once your packages are stored at the registry, you can upgrade to quilt 2.8.0 (or later) and re-install them.
ArrowNotImplementedError when saving a large dataframe
There does not appear to be a way to save a dataframe with a string column whose size is over 2GB. It is possible, however, to split it up into multiple dataframes (which will then get merged into one when accessed).
Suppose the problematic dataframe is called big_data, it comes from big_data.csv, and the root of your package is in my_dir.
First, delete the dataframe from the build file, my_dir/build.yml. (If you were building directly from a directory, then run quilt generate my_dir first.)
Build a temporary package that contains the rest of the data:
Open a Python shell or write a script, and manually build the final package:
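The splitting step itself can be sketched independently of Quilt. The helper below is hypothetical (it is not a Quilt or pandas function): it groups rows so that the total UTF-8 size of one string column stays within a byte budget per chunk. The rows are assumed to be dicts, for example as produced by csv.DictReader.

```python
def split_rows_by_column_size(rows, column, limit_bytes):
    """Yield lists of rows whose `column` values total at most limit_bytes.

    A single value larger than limit_bytes still gets a chunk of its own.
    """
    chunk, chunk_bytes = [], 0
    for row in rows:
        size = len(str(row[column]).encode("utf-8"))
        # Close the current chunk before it would exceed the budget.
        if chunk and chunk_bytes + size > limit_bytes:
            yield chunk
            chunk, chunk_bytes = [], 0
        chunk.append(row)
        chunk_bytes += size
    if chunk:
        yield chunk


# Tiny demonstration with a 6-byte budget instead of 2GB.
rows = [{"text": "aaaa"}, {"text": "bb"}, {"text": "cccc"}, {"text": "d"}]
chunks = list(split_rows_by_column_size(rows, "text", limit_bytes=6))
print([len(c) for c in chunks])  # prints [2, 2]
```

Each resulting chunk can then be turned into its own dataframe (for example with pandas.DataFrame(chunk)) and added to the package as a separate part.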
Symbolic links on Windows have a few quirks to be aware of.
Ensure Windows is fully updated (known related bugs exist)
Escalate administrator privileges ("run as admin"), or validate user privileges
If UAC is on
If the user is not an administrator, they must have the Create Symbolic Links privilege
If the user is an administrator, they must escalate privileges, even if they have the Create Symbolic Links privilege
This means that if you want a user to create symlinks without requiring escalation, they must not be an administrator.
If UAC is off
Any user with the Create Symbolic Links privilege may create symlinks
Folder-level privileges may interfere with symlinking
Verify there are no folder-specific restrictions on privileges
The symlink type may be disabled, as remote-to-remote symlinks are by default
Use fsutil (from an elevated command prompt) to evaluate and/or enable acceptable symlink types
fsutil behavior query SymlinkEvaluation will display the current state of symlink evaluation
Use fsutil behavior set SymlinkEvaluation R2R:1 to enable (for example) remote-to-remote symlinks
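Before working through the checklist above, it can be useful to confirm whether the current user can create a symlink at all. The following is a minimal sketch using only the Python standard library; can_create_symlink is a hypothetical helper name, and the probe only covers a local-to-local link inside a temporary directory, so it says nothing about remote symlink types or folder-specific restrictions elsewhere.

```python
import os
import tempfile


def can_create_symlink():
    """Try to create a symlink in a temporary directory; return True on success."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target.txt")
        link = os.path.join(tmp, "link.txt")
        with open(target, "w") as f:
            f.write("probe")
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError) as exc:
            # On Windows this typically signals a missing privilege or elevation.
            print("Symlink creation failed:", exc)
            return False
        return os.path.islink(link)


print("Symlinks available:", can_create_symlink())
```

Running the probe once from a normal prompt and once from an elevated prompt shows whether escalation is the missing piece.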
Segmentation fault (core dumped)
Seen on Ubuntu 18.04, Google Cloud Platform.
sudo pip install quilt # ¯\_(ツ)_/¯
TypeError: data type "mixed-integer" not understood when reading a DataFrame from a package
This error occurs when trying to round-trip Pandas DataFrames that have a numeric column name to Parquet using Arrow 0.9. It can occur during quilt build for package nodes built using the Pandas "skiprows" parameter in read_csv or read_excel to skip a source file's header row (usually row 0). For example, this build.yml file skips the header row in source.xlsx:
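When a package is built from Python rather than from a build file, one possible workaround is to make every column label a string before saving. This is an assumption about how to sidestep the error, not a fix prescribed by Quilt, and both helper names below are hypothetical.

```python
def has_mixed_labels(labels):
    """Return True if the labels mix strings with non-string values."""
    kinds = {isinstance(label, str) for label in labels}
    return len(kinds) > 1


def stringify_labels(labels):
    """Return the labels as strings so that none of them is a bare number."""
    return [str(label) for label in labels]


# Labels as they might look after a header row was skipped.
labels = ["id", 2017, 2018]
print(has_mixed_labels(labels))   # prints True
print(stringify_labels(labels))   # prints ['id', '2017', '2018']
```

With a Pandas DataFrame named df, the same idea would be applied as df.columns = stringify_labels(df.columns).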
quilt.login
You can copy your login session to a remote machine. Your session is stored in a file called auth.json in your local settings directory. If you copy it to the proper location on your remote machine, it will be as if you had logged in from that machine.
You can find your local settings directory on any machine that has quilt installed by running this Python snippet:
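A minimal sketch of such a lookup follows. It assumes that Quilt 2 exposes its settings directory as BASE_DIR in quilt.tools.util; that location is an assumption to verify against your installed version, and quilt_auth_path is a hypothetical helper name. The function returns None when quilt is not installed or the attribute is missing.

```python
import importlib
import os


def quilt_auth_path():
    """Return the expected path of auth.json, or None if it cannot be determined.

    Assumption: the settings directory is available as quilt.tools.util.BASE_DIR.
    """
    try:
        util = importlib.import_module("quilt.tools.util")
    except ImportError:
        # quilt is not installed in this environment.
        return None
    base_dir = getattr(util, "BASE_DIR", None)
    if base_dir is None:
        return None
    return os.path.join(base_dir, "auth.json")


print(quilt_auth_path())
```

Run it on both the local and the remote machine to see where auth.json should be copied from and to.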
Please note that this secret token confers full access to your Quilt account, so proceed with caution and keep your secrets safe.
Tokens do expire, so you may have to load a new token on your remote machines every 90 days or so. If you're having authentication issues on your remote machines, please try performing a fresh login on your local machine, and uploading your new token to your remote machines.
See also for further install options.
Make sure you have installed .
For details, including how to make !pip install quilt work, see .
Unfortunately, this is caused by a .
See for relevant instructions
fsutil: For advanced users only. See the Microsoft documentation on fsutil.