Frequently Asked Questions
Use a .quiltignore file for more control over which files set_dir() includes.
Quilt packages are one level of abstraction above S3 object versions. Object versions track mutations to a single file, whereas a Quilt package references a collection of files and assigns this collection a unique version.
It is strongly recommended that you enable object versioning on the S3 buckets that you push Quilt packages to. Object versioning ensures that mutations to every object are tracked, and provides some protection against deletion.
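As a minimal sketch of the recommendation above, the helper below turns on versioning through the boto3 S3 client calls put_bucket_versioning and get_bucket_versioning. The function name and the bucket name in the usage comment are hypothetical, and the client is passed in so the helper can be tried against a stub before it touches a real bucket.

```python
def enable_bucket_versioning(s3_client, bucket):
    """Enable object versioning on `bucket` and report whether it is active."""
    s3_client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    # Read the setting back so the caller can confirm it took effect.
    response = s3_client.get_bucket_versioning(Bucket=bucket)
    return response.get("Status") == "Enabled"


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   enable_bucket_versioning(boto3.client("s3"), "my-quilt-bucket")
```

Versioning only protects objects written after it is enabled, so it is best switched on before the first push.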
Visit and use on PyPI.
Does quilt3 collect anonymous usage statistics?
Yes, to find bugs and prioritize features.
You can disable anonymous usage collection with an environment variable:
Or call quilt3.disable_telemetry() to persistently disable anonymous usage statistics.
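Both options can be combined in a short Python sketch. The variable name QUILT_DISABLE_USAGE_METRICS is an assumption here (the page's original snippet is not shown above), so confirm it against the documentation for your installed quilt3 version. The helper name is hypothetical.

```python
import os

# Assumption: this is the variable name quilt3 checks; verify it for your version.
DISABLE_ENV_VAR = "QUILT_DISABLE_USAGE_METRICS"


def disable_usage_statistics(persist=False):
    """Disable anonymous usage statistics for the current process.

    With persist=True, also call quilt3.disable_telemetry(), the persistent
    option described above. Returns True only if that call was made.
    """
    os.environ[DISABLE_ENV_VAR] = "true"
    if not persist:
        return False
    try:
        import quilt3
    except ImportError:
        return False  # quilt3 is not installed in this environment
    quilt3.disable_telemetry()
    return True


disable_usage_statistics()
```

Setting the variable in the process environment affects only that process and its children, whereas the persistent call is meant to survive across sessions.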
Yes:
Go to CloudFormation > Stacks > YourQuiltStack > Outputs
Copy the row labeled TemplateBuildMetadata
"git_revision" is your template version
This information is also available in the footer of the main page of the Catalog.
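The same lookup can be scripted. The sketch below takes the dictionary returned by the boto3 CloudFormation call describe_stacks and pulls git_revision out of the TemplateBuildMetadata output. It assumes the output value is a JSON object, which the quoted "git_revision" key suggests but the steps above do not state outright. The function name is hypothetical.

```python
import json


def template_git_revision(describe_stacks_response):
    """Return git_revision from a stack's TemplateBuildMetadata output, or None."""
    for stack in describe_stacks_response.get("Stacks", []):
        for output in stack.get("Outputs", []):
            if output.get("OutputKey") == "TemplateBuildMetadata":
                # Assumption: the output value is a JSON-encoded object.
                metadata = json.loads(output["OutputValue"])
                return metadata.get("git_revision")
    return None


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   cfn = boto3.client("cloudformation")
#   print(template_git_revision(cfn.describe_stacks(StackName="YourQuiltStack")))
```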
push takes a long time. Can I speed it up?
Yes. Follow these steps:
Run your compute in the same region as your S3 bucket (as opposed to a local machine or foreign region)—I/O is much faster.
Use a larger instance with more vCPUs.
Increase
If you are using Quilt Catalog 1.51 (released Feb 2024), you can enable the ChunkedChecksums CloudFormation parameter so that it calculates checksums in parallel, or reuses them if they already exist in S3. Parallel checksums are also available by default in quilt3 v6 or later (pre-released Feb 2024).
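The concurrency step can be sketched in Python as follows. Two things here are assumptions rather than facts taken from this page: that the setting in question is the environment variable QUILT_TRANSFER_MAX_CONCURRENCY, and that its default is 10. Check both against your quilt3 version. The helper name is hypothetical.

```python
import os

# Assumption: this is the concurrency setting the step above refers to.
CONCURRENCY_ENV_VAR = "QUILT_TRANSFER_MAX_CONCURRENCY"


def match_concurrency_to_vcpus(minimum=10):
    """Set transfer concurrency to the vCPU count, never below `minimum`.

    `minimum` stands in for the library default (assumed to be 10).
    """
    vcpus = os.cpu_count() or 1  # cpu_count() can return None
    value = max(vcpus, minimum)
    os.environ[CONCURRENCY_ENV_VAR] = str(value)
    return value


print(match_concurrency_to_vcpus())
```

Call it before the push starts so the value is in place when transfers begin.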
The Command Line Interface (CLI) API
You can script the Quilt CLI directly from your shell environment and chain it with your R scripts to create a unified workflow:
You may have a test data package that you wish to delete at some point to ensure your data repository is clean and organized. Please do this very carefully!
In favor of immutability, Quilt makes deletion a bit tricky. First, note that quilt3.delete_package only deletes the package manifest, not the underlying objects. If you wish to delete the entire package and its objects, delete the objects first.
Warning: the objects you delete will be lost forever. Ditto for the package revision.
To delete, first browse the package then walk it, deleting its entry objects as follows:
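A minimal sketch of that walk is shown here. It relies on Package.walk() yielding (logical key, entry) pairs and on each entry's physical_key exposing bucket, path and version_id, and it uses the boto3 S3 client's delete_object call. The helper name is hypothetical, and pname and reg in the usage comment are placeholders for your own package name and registry.

```python
def delete_package_objects(pkg, s3_client):
    """Delete every S3 object a package references; return the logical keys removed.

    `pkg` is expected to behave like a quilt3.Package and `s3_client`
    like a boto3 S3 client.
    """
    deleted = []
    for logical_key, entry in pkg.walk():
        pk = entry.physical_key
        if pk.bucket is None:
            continue  # entry points at a local file, not an S3 object
        params = {"Bucket": pk.bucket, "Key": pk.path}
        if pk.version_id is not None:
            # Target the exact object version the package references.
            params["VersionId"] = pk.version_id
        s3_client.delete_object(**params)
        deleted.append(logical_key)
    return deleted


# Hypothetical usage; this is irreversible, so double-check pname and reg:
#   import boto3
#   import quilt3 as q3
#   p = q3.Package.browse(pname, registry=reg)
#   delete_package_objects(p, boto3.client("s3"))
```

Passing the package and client in as arguments makes it easy to rehearse the walk against stand-in objects before pointing it at real data.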
You can then follow the above with q3.delete_package(pname, registry=reg, top_hash=p.top_hash).
Be sure to run quilt3 logout if you've previously logged in.
Select among multiple profiles in your shell as follows:
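From Python, a comparable effect can be had by setting AWS_PROFILE, the standard variable the AWS SDKs read, before any S3 calls are made. This is a sketch that assumes quilt3 picks up credentials through boto3 in the usual way. The helper name and the profile name are hypothetical.

```python
import os


def select_aws_profile(profile_name):
    """Point boto3-based tools in this process at a named AWS profile.

    Returns the previously selected profile (or None) so it can be restored.
    """
    previous = os.environ.get("AWS_PROFILE")
    os.environ["AWS_PROFILE"] = profile_name
    return previous


previous = select_aws_profile("my-quilt-profile")  # hypothetical profile name
```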
The S3 permissions needed by quilt3 are similar to
This allows for extremely granular querying of your data package name, metadata, and contents and includes logical operators, comparison functions, conditional expressions, mathematical functions, bitwise functions, date and time functions and operators, regular expression functions, and aggregate functions. Please review the references linked below to learn more.
regexp_extract_all(string, pattern)
Return the substring(s) matched by the regular expression pattern in string
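To get a feel for what this function returns, here is a rough local analogue in Python. It is only an approximation: Python's re module and Athena's query engine use different regular expression dialects, so patterns may not carry over unchanged. The sample string is made up for illustration.

```python
import re


def regexp_extract_all(string, pattern):
    """Return every non-overlapping substring of `string` that matches `pattern`."""
    return [match.group(0) for match in re.finditer(pattern, string)]


print(regexp_extract_all("1a 2b 14m", r"\d+"))  # ['1', '2', '14']
```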
There are many considerations and
Yes. Quilt is built on top of Amazon S3 and has the same character limitations. Although any UTF-8 character is supported in an object key name (filename), certain characters can cause problems with some applications and protocols. The following guidelines will help you maximize compliance. For a comprehensive list of safe characters, characters that might require special handling, and characters to avoid, please review the official Amazon S3 documentation linked below.
Alphanumeric characters:
0-9
a-z
A-Z
Special characters:
Exclamation point (!)
Hyphen (-)
Underscore (_)
Period (.)
Asterisk (*)
Single quote (')
Open parenthesis (()
Close parenthesis ())
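The list above translates directly into a small checker. This sketch flags any character in a key that is not on the list. Treating the forward slash as acceptable is an assumption added here so that prefixed keys pass. The function name and the sample key are hypothetical.

```python
import string

# Characters taken from the list above: alphanumerics plus ! - _ . * ' ( )
SAFE_CHARS = set(string.ascii_letters + string.digits + "!-_.*'()")


def unsafe_characters(key, allow_slash=True):
    """Return the sorted, de-duplicated characters in `key` outside the safe list."""
    # Assumption: "/" is allowed by default because it separates key prefixes.
    allowed = SAFE_CHARS | ({"/"} if allow_slash else set())
    return sorted(set(key) - allowed)


print(unsafe_characters("data/2024/report (final).csv"))  # [' ']
```

An empty result means the key uses only characters from the list. Anything returned is worth checking against the S3 documentation before you push.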
Optional additional features (such as automated data packaging) require additional IPs.
Amazon S3 is a key-value store with prefixes but no true "folders". In the Quilt Catalog Bucket view, as in AWS Console, only objects have a "Last modified" value, whereas package entries and prefixes do not.
above its default to match your available vCPUs.
In the scientific computing community, the R programming language is commonly used as an alternative, or companion, to Python. It is a language and environment for statistical computing and graphics, and is available as Free Software under the GNU General Public License.
Currently there are no plans to release a Quilt package for distribution through the Comprehensive R Archive Network (CRAN). However, you can still use Quilt with R, using either:
The reticulate package provides a set of tools for interoperability between Python and R by embedding a Python session within your R session.
Configure
and quilt3
will use the same for its API calls.
but quilt3 does not need either s3:GetBucketNotification or s3:PutBucketNotification.
Amazon Athena supports a subset of Data Definition Language (DDL) and Data Manipulation Language (DML) statements, functions, operators, and data types, based on and .
limitations when writing Amazon Athena queries.
For more details, see in the Amazon S3 documentation.
Currently, a full size, multi-Availability Zone deployment (without)
requires at least 256 IPs. This means a minimum CIDR block of /24.
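The arithmetic behind the /24 figure: an IPv4 block with prefix length n contains 2^(32 - n) addresses, so /24 gives 2^8 = 256. The sketch below checks this with Python's standard ipaddress module. The 10.0.0.0 range is a made-up example and the helper names are hypothetical.

```python
import ipaddress


def addresses_in_prefix(prefix_length):
    """Number of IPv4 addresses in a CIDR block with the given prefix length."""
    return 2 ** (32 - prefix_length)


def longest_prefix_for(required_ips):
    """Longest (smallest-block) IPv4 prefix that still holds `required_ips` addresses."""
    # (n - 1).bit_length() is the number of host bits needed for n addresses.
    host_bits = max(required_ips - 1, 0).bit_length()
    return 32 - host_bits


network = ipaddress.ip_network("10.0.0.0/24")  # hypothetical example range
print(addresses_in_prefix(24), network.num_addresses)  # 256 256
print(longest_prefix_for(256))  # 24
```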