Editing a Package


Last updated 2 years ago

Data in Quilt is organized into data packages. A data package is a logical group of files, directories, and metadata.

Initializing a package

To create a new, empty package, use the package constructor:

import quilt3
p = quilt3.Package()

To edit an existing package, first install it:

import quilt3
quilt3.Package.install(
    "examples/hurdat",
    "s3://quilt-example",
)
Loading manifest: 100%|██████████| 5/5 [00:00<00:00, 5902.48entries/s]

Successfully installed package 'examples/hurdat', tophash=f8d1478 from s3://quilt-example

Then use browse to open the installed package for editing:

p = quilt3.Package.browse('examples/hurdat')
Loading manifest: 100%|██████████| 5/5 [00:00<00:00, 9920.30entries/s]

For more information on accessing existing packages, see the section "Installing a Package".

Adding data to a package

Use the set and set_dir methods to add individual files and whole directories, respectively, to a Package:

# imports and names used throughout this example
import quilt3
from pathlib import Path
from os import chdir
TEST_DIR = "test_workflow"
SUB_DIR = "subdir"

# create the test directories and work inside TEST_DIR
Path(TEST_DIR).mkdir(exist_ok=True)
Path(TEST_DIR, SUB_DIR).mkdir(exist_ok=True)
chdir(TEST_DIR) # %cd TEST_DIR/ if in Jupyter

# add entries individually using `set`
# e.g. p.set("foo.csv", "/local/path/foo.csv"),
# p.set("bar.csv", "s3://bucket/path/bar.csv")

# create test data
with open("data.csv", "w") as f:
    f.write("id, value\na, 42")

p = quilt3.Package()
p.set("data.csv", "data.csv")
p.set("banner.png", "s3://quilt-example/imgs/banner.png")

# or grab everything in a directory at once using `set_dir`
# e.g. p.set_dir("stuff/", "/path/to/stuff/"),
# p.set_dir("things/", "s3://path/to/things/")

# create logical directory in package
p.set_dir("stuff/", SUB_DIR)
p.set_dir("imgs/", "s3://quilt-example/imgs/")
(remote Package)
 └─banner.png
 └─data.csv
 └─imgs/
   └─banner.png
 └─stuff/

The first parameter to these methods is the logical key, which determines where the file lives within the package. After running the commands above, the package looks like this:

p
(remote Package)
 └─banner.png
 └─data.csv
 └─imgs/
   └─banner.png
 └─stuff/

The second parameter is the physical key, which states the file's actual location. The physical key may point to either a local file or a remote object (with an s3:// path).

If the physical key and the logical key are the same, you may omit the second argument:

import quilt3
p = quilt3.Package()
p.set("data.csv")
(local Package)
 └─data.csv

Another useful trick: use "." to set the contents of the package to those of the current directory:

# create a test file in test directory
with open("new_data.csv", "w") as f:
    f.write("id, value\na, 42")

# set the contents of the package to that of the current directory
p.set_dir(".", ".")
(local Package)
 └─data.csv
 └─new_data.csv

Deleting data in a package

Use delete to remove entries from a package:

p.delete("data.csv")
(local Package)
 └─new_data.csv

Note that delete only removes the entry from the package. It does not delete the underlying file or object.

Adding metadata to a package

Metadata can be attached at any level of a package. To set metadata on individual entries or directories, use the meta argument:

import quilt3
p = quilt3.Package()
p.set("data.csv", "new_data.csv", meta={"type": "csv"})
p.set_dir("subdir/", "subdir/", meta={"origin": "unknown"})
(local Package)
 └─data.csv
 └─subdir/

You can also set metadata on the package as a whole using set_meta:

# set metadata on a package
p.set_meta({"package-type": "demo"})
(local Package)
 └─data.csv
 └─subdir/