Getting Data from a Package


Last updated 2 years ago


The examples in this section use the aleksey/hurdat demo package:

import quilt3
p = quilt3.Package.browse('aleksey/hurdat', 's3://quilt-example')
p
Loading manifest: 100%|██████████| 7/7 [00:00<00:00, 8393.40entries/s]





(remote Package)
 └─.gitignore
 └─.quiltignore
 └─notebooks/
   └─QuickStart.ipynb
 └─quilt_summarize.json
 └─requirements.txt
 └─scripts/
   └─build.py

Slicing through a package

Use dict key selection to slice into a package tree:

# returns PackageEntry("requirements.txt")
p["requirements.txt"]
PackageEntry('s3://quilt-example/aleksey/hurdat/requirements.txt?versionId=bQtxuZlaylNVHi0GmxkSMofT5qXJvP95')
# returns (remote Package)
p["notebooks"]
(remote Package)
 └─QuickStart.ipynb

Slicing into a Package directory returns another Package rooted at that subdirectory. Slicing into a package entry returns an individual PackageEntry.

Downloading package data to disk

To download a whole package, a subdirectory, or a single file to a destination directory (dest), use fetch:

# download a subfolder
p["notebooks"].fetch()

# download a single file
p["notebooks"]["QuickStart.ipynb"].fetch()

# download everything
p.fetch()
Copying objects: 100%|██████████| 36.7k/36.7k [00:01<00:00, 22.7kB/s]
100%|██████████| 36.7k/36.7k [00:01<00:00, 24.1kB/s]
Copying objects: 100%|██████████| 39.9k/39.9k [00:02<00:00, 16.5kB/s]





(local Package)
 └─.gitignore
 └─.quiltignore
 └─notebooks/
   └─QuickStart.ipynb
 └─quilt_summarize.json
 └─requirements.txt
 └─scripts/
   └─build.py

fetch will default to downloading the files to the current directory, but you can also specify an alternative path:

p["notebooks"]["QuickStart.ipynb"].fetch("./references/")
100%|██████████| 36.7k/36.7k [00:01<00:00, 22.5kB/s]





PackageEntry('file:///Users/gregezema/Documents/programs/quilt/docs/Walkthrough/references/')

Downloading package data into memory

Alternatively, you can download data directly into memory:

p["quilt_summarize.json"].deserialize()
['notebooks/QuickStart.ipynb']

To apply a custom deserializer to your data, pass it as an argument to deserialize. For example, to parse a file with yaml.safe_load (JSON is also valid YAML, so this works on quilt_summarize.json):

import yaml
# returns the parsed contents (here, a list)
p["quilt_summarize.json"].deserialize(yaml.safe_load)
['notebooks/QuickStart.ipynb']

The deserializer should accept a byte stream as input.
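
As a minimal sketch of a custom deserializer, assuming the function receives the entry's raw contents as a bytes object: the hypothetical helper load_json_lines below decodes the bytes and parses one JSON document per line. The helper name, the JSON Lines format, and the sample data are illustrative assumptions, not part of quilt3.

```python
import json


def load_json_lines(data: bytes) -> list:
    """Parse JSON Lines content: one JSON document per non-empty line."""
    text = data.decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Made-up sample bytes, so the sketch runs without any package access.
sample = b'{"name": "a", "value": 1}\n\n{"name": "b", "value": 2}\n'
records = load_json_lines(sample)
print(len(records), records[0]["name"])
```

With a package entry that holds such content, the call would look like p["records.jsonl"].deserialize(load_json_lines), where records.jsonl is a hypothetical logical key.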

Getting entry locations

You can get the path to a package entry or directory using get:

# returns the entry's physical location (here, a versioned S3 URI)
p["notebooks"]["QuickStart.ipynb"].get()
's3://quilt-example/aleksey/hurdat/notebooks/QuickStart.ipynb?versionId=PH.9gsCH6LM9RQIqsy1U4X6H6s.VoQ_B'
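
When the returned location is an S3 URI like the one above, it can be useful to split it into bucket, key, and version ID before handing it to other tools. The sketch below does this with the standard library only; split_s3_location is a hypothetical helper, not a quilt3 function, and it assumes the location has the s3://bucket/key?versionId=... shape shown in the output.

```python
from urllib.parse import parse_qs, unquote, urlparse


def split_s3_location(location: str) -> dict:
    """Split an s3:// URI into bucket, key, and optional version ID."""
    parsed = urlparse(location)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {location!r}")
    # parse_qs maps each query parameter to a list of values.
    version_ids = parse_qs(parsed.query).get("versionId", [None])
    return {
        "bucket": parsed.netloc,
        "key": unquote(parsed.path.lstrip("/")),
        "version_id": version_ids[0],
    }


# The URI from the example output in this section.
location = (
    "s3://quilt-example/aleksey/hurdat/notebooks/QuickStart.ipynb"
    "?versionId=PH.9gsCH6LM9RQIqsy1U4X6H6s.VoQ_B"
)
parts = split_s3_location(location)
print(parts["bucket"], parts["key"], parts["version_id"])
```

Locations that are not S3 URIs (for example file:// paths after a fetch) raise ValueError here; a fuller helper would handle those separately.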

Getting metadata

Metadata is available using the meta property.

# get entry metadata
p["notebooks"]["QuickStart.ipynb"].meta

# get directory metadata
p["notebooks"].meta

# get package metadata
p.meta
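
To inspect metadata at every level in one pass, the three lookups above can be folded into a small loop. This is a sketch that relies only on the dict-style slicing and the meta property described in this section; meta_along_path is a hypothetical helper, and StubNode with its metadata values is a made-up stand-in so the example runs without quilt3 or network access.

```python
def meta_along_path(pkg, logical_key):
    """Return (path, metadata) pairs for the root and each level down to logical_key.

    Assumes every node supports node[name] slicing and exposes a .meta property.
    """
    levels = [("", pkg.meta)]
    node, prefix = pkg, []
    for part in logical_key.split("/"):
        node = node[part]
        prefix.append(part)
        levels.append(("/".join(prefix), node.meta))
    return levels


class StubNode:
    """Stand-in for a package node; not a quilt3 class."""

    def __init__(self, meta, children=None):
        self.meta = meta
        self._children = children or {}

    def __getitem__(self, name):
        return self._children[name]


# Made-up metadata values, purely for illustration.
entry = StubNode({"kind": "notebook"})
root = StubNode({"owner": "example"}, {"notebooks": StubNode({}, {"QuickStart.ipynb": entry})})

for path, meta in meta_along_path(root, "notebooks/QuickStart.ipynb"):
    print(repr(path), meta)
```

With a real package, the call would be meta_along_path(p, "notebooks/QuickStart.ipynb").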