Example: Git-like Operations

Last updated 2 years ago

Git-like operations for datasets and Jupyter notebooks

quilt3 provides a simple command-line interface for versioning large datasets and storing them in Amazon S3. There are only two commands you need to know:

  • push creates a new package revision in an S3 bucket that you designate

  • install downloads data from a remote package to disk

Why not use Git?

In short, neither Git nor Git LFS has the capacity or performance to function as a repository for data. S3, on the other hand, is widely used, fast, supports versioning, and currently stores trillions of data objects.

Similar concerns apply when baking datasets into Docker containers: images bloat and slow container operations down.

Pre-requisites

You will need either an AWS account, credentials, and an S3 bucket, or a Quilt enterprise stack with at least one bucket. To read from and write to S3 with quilt3, you must first do one of the following:

  • Configure your AWS credentials

  • OR, if and only if your company runs a Quilt enterprise stack, run the following:

    pip install quilt3
    quilt3 config https://yourquilt.yourcompany.com
    quilt3 login
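
If you take the AWS credentials route, it can help to confirm that credentials are visible in your shell before running quilt3. The following is a minimal sketch, not part of quilt3: `has_aws_credentials` is a hypothetical helper that only looks for the standard AWS environment variables or a shared credentials file. It does not validate the credentials and does not cover other sources such as SSO or instance roles.

```shell
# Hypothetical helper (not part of quilt3): succeeds if AWS credentials
# appear to be configured through environment variables or the shared
# credentials file. It does not check that the credentials are valid.
has_aws_credentials() {
    if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        return 0
    fi
    # AWS_SHARED_CREDENTIALS_FILE, if set, overrides the default location.
    [ -f "${AWS_SHARED_CREDENTIALS_FILE:-$HOME/.aws/credentials}" ]
}
```

For example, `has_aws_credentials || echo "Configure AWS credentials first" >&2` prints a reminder when nothing is found.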

Install a package

A Quilt package contains any collection of data (usually as files), metadata, and documentation that you specify.

Let's get a data package from S3 and write it to a new local directory, reef-check.

mkdir reef-check
cd reef-check
quilt3 install \
    "akarve/reef-check" \
    --registry s3://quilt-example \
    --dest .

Now you've got data in the current working directory.

ls
CA-06-california-counties.json	quilt_summarize.json  urchins-interactive.json
README.md			reef-check.ipynb      urchins2006-2019.parquet
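
If you install packages often, the steps above can be wrapped in a small shell function. This is a sketch under stated assumptions: `install_into` is a hypothetical name, and it uses only the `install` options shown above (`--registry` and `--dest`). Unlike the steps above, it does not change into the destination directory.

```shell
# Hypothetical wrapper around the quilt3 CLI steps shown above.
# Usage: install_into PACKAGE REGISTRY DEST_DIR
install_into() {
    if [ "$#" -ne 3 ]; then
        echo "usage: install_into PACKAGE REGISTRY DEST_DIR" >&2
        return 2
    fi
    # Create the destination directory if it does not exist yet.
    mkdir -p "$3" || return 1
    quilt3 install "$1" --registry "$2" --dest "$3"
}
```

For example, `install_into akarve/reef-check s3://quilt-example reef-check` fetches the same package into a reef-check directory.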

Creating your first package

Now let's imagine that we've modified this data locally. We save our Jupyter notebook and push the results back to Quilt:

# Be sure to substitute YOUR_NAME and YOUR_BUCKET with the desired strings
quilt3 push \
    YOUR_NAME/reef-check \
    --dir . \
    --registry s3://YOUR_BUCKET \
    --message "Initial commit of reef data"

Quilt will then print out something like the following:

Package YOUR_NAME/reef-check@ea334b7 pushed to s3://YOUR_BUCKET
Successfully pushed the new package to https://yourquilt.yourcompany.com/b/YOUR_BUCKET/packages/YOUR_NAME/reef-check
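
The hash printed after the `@` identifies this revision, much like a Git commit. As a sketch of how you might later fetch that exact revision, the hypothetical helper below passes the hash to the `--top-hash` option of `quilt3 install`. Whether an abbreviated hash such as ea334b7 is accepted, or the full hash is required, may depend on your quilt3 version, so treat that as an assumption to verify.

```shell
# Hypothetical helper: install one specific revision of a package.
# Usage: install_revision PACKAGE REGISTRY TOP_HASH DEST_DIR
install_revision() {
    if [ "$#" -ne 4 ]; then
        echo "usage: install_revision PACKAGE REGISTRY TOP_HASH DEST_DIR" >&2
        return 2
    fi
    mkdir -p "$4" || return 1
    # --top-hash selects a revision instead of the latest one.
    quilt3 install "$1" --registry "$2" --top-hash "$3" --dest "$4"
}
```

For example, `install_revision YOUR_NAME/reef-check s3://YOUR_BUCKET ea334b7 ./reef-check-ea334b7` would place that revision in its own directory.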

List the packages in a bucket

quilt3 list-packages s3://YOUR_BUCKET
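
In a bucket with many packages, you may want only the ones under a single namespace (the part of the package name before the slash). The sketch below assumes that `quilt3 list-packages` prints one package name per line; that output format is an assumption, so check it against your quilt3 version. `list_namespace` is a hypothetical helper name.

```shell
# Hypothetical helper: list only the packages under one namespace.
# Assumes `quilt3 list-packages` prints one package name per line.
# Usage: list_namespace REGISTRY NAMESPACE
list_namespace() {
    if [ "$#" -ne 2 ]; then
        echo "usage: list_namespace REGISTRY NAMESPACE" >&2
        return 2
    fi
    # Keep only lines that start with "NAMESPACE/".
    quilt3 list-packages "$1" | awk -v ns="$2/" 'index($0, ns) == 1'
}
```

For example, `list_namespace s3://YOUR_BUCKET YOUR_NAME` would show only the packages you pushed under YOUR_NAME.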

In the Quilt catalog, you will now see a new package revision, complete with a README, datagrid preview, and an interactive visualization in Altair.

Learn more

You can see an example of this package live here.

Those are the basics of reading and writing Quilt packages with the CLI. See the CLI reference for more.