LogoLogo
HomeGitHub RepoBook Demo
quilt-2-master
quilt-2-master
  • Introduction
  • Get started
    • Introduction
    • Install
    • Quick start
    • Work with packages
    • Articles
    • Examples
    • Run a Registry
  • API
    • Overview
    • Command line
    • Python
    • build.yml
    • quilt.yml requirements
    • Teams
    • R
  • Advanced Usage
    • Environment variables
    • Shared drives
    • Checks
    • Package composition
    • PySpark
  • Support
    • FAQ
    • Contact
  • Developer
    • Contributing
    • Fragment store
Powered by GitBook
On this page
  • Core: build, push, and install packages
  • quilt.build("USER/PACKAGE", "PATH")
  • quilt.push("USER/PACKAGE", is_public=False, is_team=False)
  • quilt.install(PKG [, hash="HASH", tag="TAG", version="VERSION"] [, force=False] [, meta_only=False])
  • quilt.delete("USER/PACKAGE")
  • Versioning
  • quilt.log(USER/PACKAGE)
  • quilt.version_list(USER/PACKAGE)
  • quilt.version_add(USER/PACKAGE, VERSION, HASH)
  • quilt.tag_list(USER/PACKAGE)
  • quilt.tag_add(USER/PACKAGE, TAG, HASH)
  • quilt.tag_remove(USER/PACKAGE, TAG)
  • Access
  • quilt.login([TEAM])
  • quilt.access_list("USER/PACKAGE")
  • quilt.access_add("USER/PACKAGE", "USER_OR_GROUP")
  • quilt.access_remove("USER/PACKAGE", "USER_OR_GROUP")
  • Local storage
  • quilt.ls()
  • quilt.rm("USER/PACKAGE")
  • Registry search
  • quilt.search("SEARCH STRING")
  • Filtering
  • pkg._filter(DICT_OR_LAMBDA)
  • Export a package or subpackage
  • quilt.export("USER/PACKAGE")
  • quilt.export("USER/PACKAGE", "DEST")
  • quilt.export("USER/PACKAGE", "DEST", force=True)
  • quilt.export("USER/PACKAGE", "DEST", symlinks=False)
  • quilt export USER/PACKAGE [DEST] [--symlinks] (command-line)
  • Import and use data
  • quilt.load("USR/PKG", hash=None)
  • Using packages
  • Work with package contents
  • Enumerate package contents
  • Edit a package

Was this helpful?

  1. API

Python

Core: build, push, and install packages

quilt.build("USER/PACKAGE", "PATH")

PATH may be a build.yml file or a directory. If a directory is given, Quilt will internally generate a build file (useful, e.g. for directories of images). build.yml is for users who want fine-grained control over parsing. |

quilt.push("USER/PACKAGE", is_public=False, is_team=False)

Stores the package in the registry

quilt.install(PKG [, hash="HASH", tag="TAG", version="VERSION"] [, force=False] [, meta_only=False])

Installs a package or sub-package.

  • force=True - skips yes/no prompt in the case of overwrite

  • meta_only=True - install only the package metadata (useful for filtering large packages)

    PKG may be any one of the following:

  • A string of the form "USR/PKG[/SUBPATH...]"

  • An anonymous package returned by filter: pkg._filter(...)

quilt.delete("USER/PACKAGE")

Removes the package from the registry. Does not delete local data.

Versioning

quilt.log(USER/PACKAGE)

Display push history

quilt.version_list(USER/PACKAGE)

Display versions of a package

quilt.version_add(USER/PACKAGE, VERSION, HASH)

Associate a version with a hash

quilt.tag_list(USER/PACKAGE)

List available tags

quilt.tag_add(USER/PACKAGE, TAG, HASH)

Associate a tag with a hash

quilt.tag_remove(USER/PACKAGE, TAG)

Remove a tag

Access

quilt.login([TEAM])

Authenticate to a registry

quilt.access_list("USER/PACKAGE")

List user who have access to a package

quilt.access_add("USER/PACKAGE", "USER_OR_GROUP")

Grant read access to a user or group (one of public or team)

quilt.access_remove("USER/PACKAGE", "USER_OR_GROUP")

Remove read access

Local storage

quilt.ls()

List installed packages

quilt.rm("USER/PACKAGE")

Remove a package from local storage (but not from the registry)

Registry search

quilt.search("SEARCH STRING")

Search registry for packages by user or package name |

Filtering

Note that you can quilt.install("USR/PKG", meta_only=True) if you wish to filter a large package based solely on its metadata. This avoids downloading the primary data to disk.

pkg._filter(DICT_OR_LAMBDA)

Package root nodes have a _filter method that accepts either a dictionary or a lambda.

_filter always returns nodes in the same position as they are found in the parent package. (The parent package in pkg._filter(...) is pkg.) Therefore, in addition to nodes that match the filter, _filter will include the following:

  • All descendants of a matching node (so that matching groups include all descendants)

  • All ancestors of a matching node (so that the position in the tree remains unchanged)

The return value of _filter can be passed to install as shown below.

Filter with a dict

Dictionary filters support two properties, name and meta:

import quilt

pkg = wine._filter({'name': 'README'})  # Just the readme
pkg = wine._filter({'meta': {'foo': 'bar'}})  # The group we created earlier
pkg = wine._filter({'meta': {'_system': {'transform': 'csv'}}})  # Dataframes created from CSVs
# install the filtered subste of the data
quilt.install(pkg);

Filter with a lambda function

Lambda filters accept the node object and its name. It provides more flexibility, but requires more care when accessing values:

pkg = wine._filter(lambda node, name: node._meta.get('_system', {}).get('filepath', '').endswith('.data'))

Export a package or subpackage

quilt.export("USER/PACKAGE")

Export data to current directory.

quilt.export("USER/PACKAGE", "DEST")

Export data to specified destination.

quilt.export("USER/PACKAGE", "DEST", force=True)

Overwrite files at destination.

quilt.export("USER/PACKAGE", "DEST", symlinks=False)

Use symlinks to the Quilt package directory instead of copies of files. This saves disk storage space and reduces disk I/O. See note below.

quilt export USER/PACKAGE [DEST] [--symlinks] (command-line)

Export data, using symbolic inks where possible.

If a node references raw (file) data, symbolic links may be used instead of copying data when exporting.

Caution when using symbolic links

  • When using any OS

    • If a file is edited, it may corrupt the local quilt repository. Preventing this is up to the user.

  • When using Windows

    • Symbolic links may not be supported

    • Symbolic links may require special permissions

    • Symbolic links may require administrative access (even if an administrator has the appropriate permissions)

Import and use data

For a package in the public cloud:

from quilt.data.USER import PACKAGE

For a package in a team registry:

from quilt.team.TEAM_NAME.USER import PACKAGE

Note, since Python module loads are cached by name, importing different versions of the same package using import syntax will fail. Use quilt.load instead.

quilt.load("USR/PKG", hash=None)

Returns the specified package. You can use quilt.load to simultaneously load different versions of the same package.

PACKAGE = quilt.load("USER/PACKAGE")

Specify a specific version to load by passing a package hash value:

PACKAGE = quilt.load("USER/PACKAGE", hash=HASH)

Using packages

Packages contain three types of nodes:

  • PackageNode - the root of the package tree

  • GroupNode - like a folder; may contain one or more GroupNode or DataNode objects

  • DataNode - a leaf node in the package; contains actual data

Work with package contents

  • List node contents with dot notation: PACKAGE.NODE.ANOTHER_NODE

  • Retrieve the contents of a DataNode with _data(), or simply (): PACKAGE.NODE.ANOTHER_NODE()

    • Columnar data (XLS, CSV, TSV, etc.) returns as a pandas.DataFrame

    • All other data types return a string to the path of the object in the package store

    • Provide a custom deserialzer by passing a function to data(asa=FUNCTION) with the signature function(NODE, LIST_OF_FILE_PATHS). A single node can contain data in multiple files (e.g., a DataFrame stored as a set of Parquet files). Calling data(asa=FUNCTION) on a GroupNode calls FUNCTION with the GroupNode object and a list of the paths to all of the objects in all of the child nodes.

Display package images in Jupyter notebooks

from quilt.data.akarve import BSDS300 as bsd
from quilt.asa.img import plot

bsd.images.test(asa=plot(figsize=(20, 20)))

Convert package nodes into Pytorch Datasets

from quilt.data.akarve import BSDS300 as bsd
from quilt.asa.pytorch import dataset

my_dataset = pkg.mixed.img(asa=dataset(
    include=is_image,
    node_parser=node_parser,
    input_transform=input_transform(crop_size, upscale_factor),
    target_transform=target_transform(crop_size)
))

Enumerate package contents

  • quilt.inspect("USER/PACKAGE") shows package columns, types, and shape

  • NODE._keys() returns a list of all children

  • NODE._data_keys() returns a list of all data children (leaf nodes containing actual data)

  • NODE._group_keys() returns a list of all group children (groups are like folders)

  • NODE._items() returns a generator of the node's children as (name, node) pairs.

  • NODE is iterable: for child in NODE:...

Example

from quilt.data.uciml import wine
In [7]: wine._keys()
Out[7]: ['README', 'raw', 'tables']
In [8]: wine._data_keys()
Out[8]: ['README']
In [9]: wine._group_keys()
Out[9]: ['raw', 'tables']

Edit a package

PACKAGENODE._set(PATH, VALUE)

Sets a child node. PATH is an array of strings, one for each level of the tree. VALUE is the new value. If it's a Pandas dataframe, it will be serialized. A string will be interpreted as a path to a file that contains the data to be packaged. Common columnar formats will be serialized into data frames. All other file formats, e.g. images, will be copied as-is.

GROUPNODE._add_group(NAME) adds an empty GroupNode with the given name to the children of GROUPNODE.

NODE._meta allows attaching metadata to a node

Attach JSON metadata to any group or data node by modifying the _meta attribute.

The '_system' key is reserved; anything assigned to it may get overwritten. Currently, data nodes contain two keys under '_system':

  • 'filepath': the original path of the file this node was built from

  • 'tranform': transform applied to the file

Example

import pandas as pd
import quilt
quilt.build('USER/PKG') # create new, empty package
from quilt.data.USER import PKG as pkg
pkg._set(['data'], pd.DataFrame(data=[1, 2, 3]))
pkg._set(['foo'], "example.txt")
pkg._meta['author'] = "me"
quilt.build('USER/PKG', pkg)

This adds a child node named data to the new empty package, with the new DataFrame as its value. Then it adds the contents of example.txt to a node called foo. Finally, it commits this change to disk by building the package with the modified object.

PreviousCommand lineNextbuild.yml

Last updated 5 years ago

Was this helpful?

.

.

See for a full code sample.

quiltdata/pytorch-examples
Install [img] extras
Install [pytorch,torchvision] extras