# quilt3.Package

## Package() <a href="#package" id="package"></a>

In-memory representation of a package

### manifest

Provides a generator of the dicts that make up the serialized package.

### top\_hash

Returns the top hash of the package.

Note that physical keys are not hashed because the package has the same semantics regardless of where the bytes come from.

**Returns**

A string that represents the top hash of the package
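The note above can be illustrated with a toy sketch. This is not the algorithm `quilt3` uses; the `toy_top_hash` helper and the dictionary layout are assumptions made purely for illustration. It hashes only logical keys, sizes, and content hashes, so two packages whose bytes live in different places hash identically.

```python
import hashlib
import json


def toy_top_hash(entries):
    """Hash logical keys, sizes and content hashes; physical keys are ignored."""
    digest = hashlib.sha256()
    for logical_key in sorted(entries):
        entry = entries[logical_key]
        record = {
            "logical_key": logical_key,
            "size": entry["size"],
            "hash": entry["hash"],
        }
        digest.update(json.dumps(record, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


# Same content, different storage locations (hypothetical paths).
local_copy = {"data.csv": {"physical_key": "file:///tmp/data.csv", "size": 3, "hash": "abc"}}
s3_copy = {"data.csv": {"physical_key": "s3://bucket/data.csv", "size": 3, "hash": "abc"}}

same = toy_top_hash(local_copy) == toy_top_hash(s3_copy)
```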

### Package.\_\_repr\_\_(self, max\_lines=20) <a href="#package.__repr" id="package.__repr"></a>

String representation of the Package.

### Package.install(name, registry=None, top\_hash=None, dest=None, dest\_registry=None, \*, path=None) <a href="#package.install" id="package.install"></a>

Installs a named package to the local registry and downloads its files.

**Arguments**

* **name(str)**: Name of package to install.
* **registry(str)**: Registry where package is located. Defaults to the default remote registry.
* **top\_hash(str)**: Hash of package to install. Defaults to latest.
* **dest(str)**: Local path to download files to.
* **dest\_registry(str)**: Registry to install package to. Defaults to local registry.
* **path(str)**: If specified, downloads only `path` or its children.

### Package.resolve\_hash(name, registry, hash\_prefix) <a href="#package.resolve_hash" id="package.resolve_hash"></a>

Find a hash that starts with a given prefix.

**Arguments**

* **name (str)**: name of package
* **registry (str)**: location of registry
* **hash\_prefix (str)**: hash prefix with length between 6 and 64 characters

### Package.browse(name, registry=None, top\_hash=None) <a href="#package.browse" id="package.browse"></a>

Load a package into memory from a registry without making a local copy of the manifest.

**Arguments**

* **name(string)**: name of package to load
* **registry(string)**: location of registry to load package from
* **top\_hash(string)**: top hash of package version to load

### Package.\_\_contains\_\_(self, logical\_key) <a href="#package.__contains" id="package.__contains"></a>

Checks whether the package contains a specified logical\_key.

**Returns**

True or False

### Package.\_\_getitem\_\_(self, logical\_key) <a href="#package.__getitem" id="package.__getitem"></a>

Filters the package based on prefix, and returns either a new Package or a PackageEntry.

**Arguments**

* **prefix(str)**: prefix to filter on

**Returns**

PackageEntry if prefix matches a logical\_key exactly; otherwise, a Package.
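The lookup rule can be sketched with a plain dictionary. The `toy_getitem` helper and the flat `entries` mapping are hypothetical stand-ins, not `quilt3` internals, and the assumption that the filtered result uses keys relative to the prefix is an illustration choice.

```python
def toy_getitem(entries, key):
    """Return the entry on an exact match, otherwise the subset under the prefix."""
    if key in entries:
        return entries[key]
    prefix = key.rstrip("/") + "/"
    subset = {k[len(prefix):]: v for k, v in entries.items() if k.startswith(prefix)}
    if not subset:
        raise KeyError(key)
    return subset


entries = {"README.md": "entry-0", "data/a.csv": "entry-1", "data/b.csv": "entry-2"}

readme = toy_getitem(entries, "README.md")  # exact match: a single entry
subset = toy_getitem(entries, "data")       # prefix match: a smaller mapping
```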

### Package.fetch(self, dest='./') <a href="#package.fetch" id="package.fetch"></a>

Copy all descendants to `dest`. Descendants are written under their logical names *relative* to self.

**Arguments**

* **dest**: where to put the files (locally)

**Returns**

A new Package object with entries from self, but with physical keys pointing to files in `dest`.

### Package.keys(self) <a href="#package.keys" id="package.keys"></a>

Returns logical keys in the package.

### Package.walk(self) <a href="#package.walk" id="package.walk"></a>

Generator that traverses all entries in the package tree and returns tuples of (key, entry), with keys in alphabetical order.
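As a small sketch of consuming the (key, entry) tuples, the helper below sums entry sizes. `total_size` is a hypothetical name, and it assumes each entry exposes a `size` attribute mirroring the `size` constructor argument of `PackageEntry`; a list of stand-in objects replaces a real package here.

```python
from types import SimpleNamespace


def total_size(walked):
    """Sum the sizes of all entries yielded as (logical_key, entry) tuples."""
    return sum(entry.size for _logical_key, entry in walked)


# Stand-in for the tuples that pkg.walk() would yield.
fake_walk = [
    ("a.csv", SimpleNamespace(size=10)),
    ("b/c.csv", SimpleNamespace(size=5)),
]
total = total_size(fake_walk)

# With a real package this would be: total_size(pkg.walk())
```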

### Package.load(readable\_file) <a href="#package.load" id="package.load"></a>

Loads a package from a readable file-like object.

**Arguments**

* **readable\_file**: readable file-like object to deserialize package from

**Returns**

A new Package object

**Raises**

* file not found
* json decode error
* invalid package exception

### Package.set\_dir(self, lkey, path=None, meta=None, update\_policy='incoming', unversioned: bool = False) <a href="#package.set_dir" id="package.set_dir"></a>

Adds all files from `path` to the package.

Recursively enumerates every file in `path`, and adds them to the package according to their relative location to `path`.

**Arguments**

* **lkey(string)**: prefix to add to every logical key, use '/' for the root of the package.
* **path(string)**: path to scan for files to add to package. If None, lkey will be substituted in as the path.
* **meta(dict)**: user level metadata dict to attach to lkey directory entry.
* **update\_policy(str)**: can be either 'incoming' (default) or 'existing'. If 'incoming', whenever logical keys match, always take the new entry from set\_dir. If 'existing', whenever logical keys match, retain existing entries and ignore new entries from set\_dir.
* **unversioned(bool)**: when True, do not retrieve VersionId for S3 physical keys.

**Returns**

self

**Raises**

* `PackageException`: When `path` doesn't exist.
* `ValueError`: When `update_policy` is invalid.
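The two `update_policy` values can be sketched as a dictionary merge. `merge_entries` is a hypothetical helper that only mirrors the conflict rule described above; it is not how `set_dir` is implemented.

```python
def merge_entries(existing, incoming, update_policy="incoming"):
    """Merge two {logical_key: entry} mappings according to update_policy."""
    if update_policy not in ("incoming", "existing"):
        raise ValueError(f"invalid update_policy: {update_policy!r}")
    merged = dict(existing)
    for logical_key, entry in incoming.items():
        if logical_key in merged and update_policy == "existing":
            continue  # keep the entry that is already in the package
        merged[logical_key] = entry
    return merged


existing = {"a.csv": "old", "b.csv": "old"}
incoming = {"b.csv": "new", "c.csv": "new"}

take_new = merge_entries(existing, incoming, update_policy="incoming")
keep_old = merge_entries(existing, incoming, update_policy="existing")
```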

### Package.get(self, logical\_key) <a href="#package.get" id="package.get"></a>

Gets object from logical\_key and returns its physical path. Equivalent to self\[logical\_key].get().

**Arguments**

* **logical\_key(string)**: logical key of the object to get

**Returns**

Physical path as a string.

**Raises**

* `KeyError`: when logical\_key is not present in the package
* `ValueError`: if the logical\_key points to a Package rather than PackageEntry.

### Package.readme(self) <a href="#package.readme" id="package.readme"></a>

Returns the README PackageEntry

The README is the entry with the logical key 'README.md' (case-sensitive). Will raise a QuiltException if no such entry exists.

### Package.set\_meta(self, meta) <a href="#package.set_meta" id="package.set_meta"></a>

Sets user metadata on this Package.

### Package.build(self, name, registry=None, message=None, \*, workflow=Ellipsis) <a href="#package.build" id="package.build"></a>

Serializes this package to a registry.

**Arguments**

* **name**: optional name for package
* **registry**: registry to build to; defaults to the local registry
* **message**: the commit message of the package
* **workflow**: workflow ID or `None` to skip workflow validation. If not specified, the default workflow will be used. For details see: <https://docs.quilt.bio/advanced-usage/workflows>

**Returns**

The top hash as a string.

### Package.dump(self, writable\_file) <a href="#package.dump" id="package.dump"></a>

Serializes this package to a writable file-like object.

**Arguments**

* **writable\_file**: file-like object to write serialized package.

**Returns**

None

**Raises**

* fail to create file
* fail to finish write

### Package.set(self, logical\_key, entry=None, meta=None, serialization\_location=None, serialization\_format\_opts=None, unversioned: bool = False) <a href="#package.set" id="package.set"></a>

Returns self with the object at logical\_key set to entry.

**Arguments**

* **logical\_key(string)**: logical key to update
* **entry(PackageEntry OR string OR object)**: new entry to place at logical\_key in the package. If entry is a string, it is treated as a URL, and an entry is created based on it. If entry is None, the logical key string will be substituted as the entry value. If entry is an object and quilt knows how to serialize it, it will immediately be serialized and written to disk, either to serialization\_location or to a location managed by quilt. List of types that Quilt can serialize is available by calling `quilt3.formats.FormatRegistry.all_supported_formats()`
* **meta(dict)**: user level metadata dict to attach to entry
* **serialization\_format\_opts(dict)**: Optional. If passed in, only used if entry is an object. Options to help Quilt understand how the object should be serialized. Useful for underspecified file formats like csv when content contains confusing characters. Will be passed as kwargs to the FormatHandler.serialize() function. See docstrings for individual FormatHandlers for the full list of options: <https://github.com/quiltdata/quilt/blob/master/api/python/quilt3/formats.py>
* **serialization\_location(string)**: Optional. If passed in, only used if entry is an object. Where the serialized object should be written, e.g. "./mydataframe.parquet"
* **unversioned(bool)**: when True, do not retrieve VersionId for S3 physical keys.

**Returns**

self

### Package.delete(self, logical\_key) <a href="#package.delete" id="package.delete"></a>

Returns self with logical\_key removed.

**Returns**

self

**Raises**

* `KeyError`: when logical\_key is not present to be deleted

### Package.push(self, name, registry=None, dest=None, message=None, selector\_fn=None, \*, workflow=Ellipsis, force: bool = False, dedupe: bool = False) <a href="#package.push" id="package.push"></a>

Creates a new package, or a new revision of an existing package in a package registry in Amazon S3.

By default, any files not currently in the destination bucket are copied to the destination S3 bucket at a path matching the logical key structure. Files already in the destination bucket are not copied, even if they are not located in the location matching the logical key. After objects are copied, a new package manifest is created that points to the objects in their new locations.

The optional parameter `selector_fn` allows callers to choose which files are copied to the destination bucket, and which retain their existing physical key. When using selector functions, it is important to always copy local files to S3, otherwise the resulting package will be inaccessible to users accessing it from Amazon S3.

The Package class includes two additional built-in selector functions:

* `Package.selector_fn_copy_all` copies all files to the destination path regardless of their current location.
* `Package.selector_fn_copy_local` copies only local files to the destination path. Any PackageEntry objects with physical keys pointing to objects in other buckets will retain their existing physical keys in the resulting package.

If we have a package with entries:

* `pkg["entry_1"].physical_key = s3://bucket1/folder1/entry_1`
* `pkg["entry_2"].physical_key = s3://bucket2/folder2/entry_2`

And, we call `pkg.push("user/pkg_name", registry="s3://bucket2")`, the file referenced by `entry_1` will be copied, while the file referenced by `entry_2` will not. The resulting package will have the following entries:

* `pkg["entry_1"].physical_key = s3://bucket2/user/pkg_name/entry_1`
* `pkg["entry_2"].physical_key = s3://bucket2/folder2/entry_2`

`quilt3` versions 6.3.1 and earlier copied all files to the destination path by default. To match this behavior in later versions, callers should use `selector_fn=Package.selector_fn_copy_all`.

Using the same initial package and push, but adding `selector_fn=Package.selector_fn_copy_all` will result in both files being copied to the destination path, producing the following package:

* `pkg["entry_1"].physical_key = s3://bucket2/user/pkg_name/entry_1`
* `pkg["entry_2"].physical_key = s3://bucket2/user/pkg_name/entry_2`

Note that push is careful to not push data unnecessarily. To illustrate, imagine you have a PackageEntry: `pkg["entry_1"].physical_key = "/tmp/package_entry_1.json"`

If that entry would be pushed to `s3://bucket/prefix/entry_1.json`, but `s3://bucket/prefix/entry_1.json` already contains the exact same bytes as '/tmp/package\_entry\_1.json', `quilt3` will not push the bytes to S3, no matter what `selector_fn('entry_1', pkg["entry_1"])` returns.

By default, push will not overwrite an existing package if its top hash does not match the parent hash of the package being pushed. Use `force=True` to skip the check.
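A custom selector function of the kind described above might look like the following sketch. `copy_only_local` is a hypothetical name, and the code assumes that `str(entry.physical_key)` yields a `file://` or `s3://` URI, as in the listings above. For this particular rule the built-in `Package.selector_fn_copy_local` already exists; the sketch only shows the expected shape of the callable.

```python
from types import SimpleNamespace


def copy_only_local(logical_key, entry):
    """Return True (copy to S3) only for entries whose physical key is a local file."""
    return str(entry.physical_key).startswith("file://")


# Stand-in entries; real ones would be PackageEntry objects.
local_entry = SimpleNamespace(physical_key="file:///tmp/package_entry_1.json")
remote_entry = SimpleNamespace(physical_key="s3://bucket1/folder1/entry_1")

copy_local = copy_only_local("entry_1", local_entry)
copy_remote = copy_only_local("entry_2", remote_entry)

# Intended use with a real package:
# pkg.push("user/pkg_name", registry="s3://bucket2", selector_fn=copy_only_local)
```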

**Arguments**

* **name**: name for package in registry
* **dest**: where to copy the objects in the package. Must be either an S3 URI prefix (e.g., s3://$bucket/$key) in the registry bucket, or a callable that takes logical\_key and package\_entry, and returns an S3 URI. (Changed in 6.0.0a1: previously, top\_hash was passed to the callable dest as a third argument.)
* **registry**: registry where to create the new package
* **message**: the commit message for the new package
* **selector\_fn**: An optional function that determines which package entries should be copied to S3. The function takes in two arguments, logical\_key and package\_entry, and should return False if that PackageEntry should not be copied to the destination registry during push. If for example you have a package where the files are spread over multiple buckets and you add a single local file, you can use selector\_fn to only push the local file to S3 (instead of pushing all data to the destination bucket).
* **workflow**: workflow ID or `None` to skip workflow validation. If not specified, the default workflow will be used. For details see: <https://docs.quilt.bio/advanced-usage/workflows>
* **force**: skip the top hash check and overwrite any existing package
* **dedupe**: don't push if the top hash matches the existing package top hash; return the current package
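When `dest` is a callable, it receives the logical key and the package entry and returns an S3 URI. The sketch below routes entries into prefixes by file extension; `dest_by_extension` and the bucket name `example-bucket` are hypothetical, and in practice the bucket would need to be the registry bucket.

```python
import posixpath


def dest_by_extension(logical_key, package_entry):
    """Build an S3 URI whose prefix is derived from the entry's file extension."""
    ext = posixpath.splitext(logical_key)[1].lstrip(".").lower() or "other"
    return f"s3://example-bucket/by-type/{ext}/{logical_key}"


csv_uri = dest_by_extension("data/a.CSV", None)
plain_uri = dest_by_extension("README", None)

# Intended use with a real package:
# pkg.push("user/pkg_name", registry="s3://example-bucket", dest=dest_by_extension)
```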

**Returns**

A new package that points to the copied objects.

### Package.rollback(name, registry, top\_hash) <a href="#package.rollback" id="package.rollback"></a>

Set the "latest" version to the given hash.

**Arguments**

* **name(str)**: Name of package to roll back.
* **registry(str)**: Registry where package is located.
* **top\_hash(str)**: Hash to rollback to.

### Package.diff(self, other\_pkg) <a href="#package.diff" id="package.diff"></a>

Returns three lists -- added, modified, deleted.

Added: present in other\_pkg but not in self. Modified: present in both, but different. Deleted: present in self, but not other\_pkg.

**Arguments**

* **other\_pkg**: Package to diff

**Returns**

added, modified, deleted (all lists of logical keys)
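The three categories can be sketched over plain mappings from logical key to content hash. `toy_diff` is a hypothetical helper that mirrors the definitions above, not the library's implementation.

```python
def toy_diff(self_entries, other_entries):
    """Return (added, modified, deleted) lists of logical keys."""
    added = sorted(k for k in other_entries if k not in self_entries)
    modified = sorted(
        k for k in self_entries
        if k in other_entries and self_entries[k] != other_entries[k]
    )
    deleted = sorted(k for k in self_entries if k not in other_entries)
    return added, modified, deleted


mine = {"a.csv": "hash-1", "b.csv": "hash-2", "c.csv": "hash-3"}
theirs = {"a.csv": "hash-1", "b.csv": "hash-9", "d.csv": "hash-4"}

added, modified, deleted = toy_diff(mine, theirs)
```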

### Package.map(self, f, include\_directories=False) <a href="#package.map" id="package.map"></a>

Performs a user-specified operation on each entry in the package.

**Arguments**

* **f(x, y)**: function to be applied to each package entry. It should take two inputs, a logical key and a PackageEntry.
* **include\_directories(bool)**: whether or not to include directory entries in the map.

**Returns**

A list of the results generated by the map.
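A function suitable for `map` takes a logical key and an entry. The sketch below uses a hypothetical `key_and_size` helper and assumes entries expose a `size` attribute; a stand-in object is used in place of a real PackageEntry.

```python
from types import SimpleNamespace


def key_and_size(logical_key, entry):
    """Pair each logical key with the size of its entry."""
    return (logical_key, entry.size)


fake_entry = SimpleNamespace(size=42)
result = key_and_size("data/a.csv", fake_entry)

# Intended use with a real package:
# sizes = pkg.map(key_and_size)
```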

### Package.filter(self, f, include\_directories=False) <a href="#package.filter" id="package.filter"></a>

Applies a user-specified function to each entry in the package and keeps only the entries for which it returns a truthy value.

**Arguments**

* **f(x, y)**: function to be applied to each package entry. It should take two inputs, a logical key and a PackageEntry, and return a boolean.
* **include\_directories(bool)**: whether or not to include directory entries in the filter.

**Returns**

A new package with entries that evaluated to False removed
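A predicate for `filter` has the same two-argument shape. The `is_csv` function below is a hypothetical example that looks only at the logical key.

```python
def is_csv(logical_key, entry):
    """Keep entries whose logical key ends in .csv (case-insensitive)."""
    return logical_key.lower().endswith(".csv")


keep = is_csv("data/A.CSV", None)
drop = is_csv("README.md", None)

# Intended use with a real package:
# csv_only = pkg.filter(is_csv)
```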

### Package.verify(self, src, extra\_files\_ok=False) <a href="#package.verify" id="package.verify"></a>

Check if the contents of the given directory matches the package manifest.

**Arguments**

* **src(str)**: URL of the directory
* **extra\_files\_ok(bool)**: Whether extra files in the directory are tolerated. If False (the default), files in the directory that are not in the manifest cause verification to fail.

**Returns**

True if the package matches the directory; False otherwise.
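The effect of `extra_files_ok` can be sketched by comparing two mappings from relative path to content hash. `toy_verify` is a hypothetical helper under that simplifying assumption; it does not read files or reproduce the library's checks.

```python
def toy_verify(manifest, directory, extra_files_ok=False):
    """Return True if every manifest path is present with a matching hash."""
    for path, expected_hash in manifest.items():
        if directory.get(path) != expected_hash:
            return False
    extras = set(directory) - set(manifest)
    # Extra files only fail the check when they are not explicitly allowed.
    return extra_files_ok or not extras


manifest = {"a.csv": "hash-1"}
directory = {"a.csv": "hash-1", "notes.txt": "hash-2"}

strict = toy_verify(manifest, directory)
lenient = toy_verify(manifest, directory, extra_files_ok=True)
```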

## PackageEntry(physical\_key, size, hash\_obj, meta) <a href="#packageentry" id="packageentry"></a>

Represents an entry at a logical key inside a package.

**\_\_init\_\_**

Creates an entry.

**Arguments**

* **physical\_key**: a URI (either `s3://` or `file://`)
* **size(number)**: size of object in bytes
* **hash\_obj({'type': string, 'value': string})**: hash object, for example {'type': 'SHA256', 'value': 'bb08a...'}
* **meta(dict)**: metadata dictionary

**Returns**

a PackageEntry
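A hash object of the shape shown above can be built with the standard library. `make_hash_obj` is a hypothetical helper; it assumes the `value` field is the hex digest of the object's bytes.

```python
import hashlib


def make_hash_obj(data: bytes) -> dict:
    """Build a {'type': ..., 'value': ...} hash object from raw bytes."""
    return {"type": "SHA256", "value": hashlib.sha256(data).hexdigest()}


hash_obj = make_hash_obj(b"hello")
size = len(b"hello")  # size of the object in bytes
```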

### PackageEntry.as\_dict(self) <a href="#packageentry.as_dict" id="packageentry.as_dict"></a>

Returns dict representation of entry.

### PackageEntry.set\_meta(self, meta) <a href="#packageentry.set_meta" id="packageentry.set_meta"></a>

Sets the user\_meta for this PackageEntry.

### PackageEntry.set(self, path=None, meta=None) <a href="#packageentry.set" id="packageentry.set"></a>

Returns self with the physical key set to path.

**Arguments**

* **path(string)**: new path to place at logical\_key in the package. Currently only supports a path on local disk.
* **meta(dict)**: metadata dict to attach to entry. If meta is provided, set just updates the meta attached to logical\_key without changing anything else in the entry

**Returns**

self

### PackageEntry.get(self) <a href="#packageentry.get" id="packageentry.get"></a>

Returns the physical key of this PackageEntry.

### PackageEntry.get\_cached\_path(self) <a href="#packageentry.get_cached_path" id="packageentry.get_cached_path"></a>

Returns a locally cached physical key, if available.

### PackageEntry.get\_bytes(self, use\_cache\_if\_available=True) <a href="#packageentry.get_bytes" id="packageentry.get_bytes"></a>

Returns the bytes of the object this entry corresponds to. If `use_cache_if_available=True`, will first try to retrieve the bytes from cache.

### PackageEntry.get\_as\_json(self, use\_cache\_if\_available=True) <a href="#packageentry.get_as_json" id="packageentry.get_as_json"></a>

Returns a JSON file as a `dict`. Assumes that the file is encoded using utf-8.

If `use_cache_if_available=True`, will first try to retrieve the object from cache.
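Conceptually, reading JSON from an entry amounts to decoding the bytes as UTF-8 and parsing the result. The sketch below operates on raw bytes, such as those returned by `get_bytes()`; `json_from_bytes` is a hypothetical helper, not the library's code.

```python
import json


def json_from_bytes(data: bytes):
    """Decode UTF-8 bytes and parse them as JSON."""
    return json.loads(data.decode("utf-8"))


raw = '{"name": "café", "rows": 3}'.encode("utf-8")
parsed = json_from_bytes(raw)
```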

### PackageEntry.get\_as\_string(self, use\_cache\_if\_available=True) <a href="#packageentry.get_as_string" id="packageentry.get_as_string"></a>

Return the object as a string. Assumes that the file is encoded using utf-8.

If `use_cache_if_available=True`, will first try to retrieve the object from cache.

### PackageEntry.deserialize(self, func=None, \*\*format\_opts) <a href="#packageentry.deserialize" id="packageentry.deserialize"></a>

Returns the object this entry corresponds to.

**Arguments**

* **func**: Skip normal deserialization process, and call func(bytes), returning the result directly.
* **\*\*format\_opts**: Some data formats may take options. Though normally handled by metadata, these can be overridden here.

**Returns**

The deserialized object from the logical\_key

**Raises**

* physical key failure
* hash verification fail
* when deserialization metadata is not present
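A custom `func` receives the entry's bytes and returns whatever object the caller wants. The sketch below parses CSV bytes into a list of dictionaries; `rows_from_csv_bytes` is a hypothetical name, and the UTF-8 encoding is an assumption about the data.

```python
import csv
import io


def rows_from_csv_bytes(data: bytes):
    """Parse UTF-8 CSV bytes into a list of row dictionaries."""
    text = data.decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


rows = rows_from_csv_bytes(b"a,b\n1,2\n3,4\n")

# Intended use with a real entry:
# rows = entry.deserialize(func=rows_from_csv_bytes)
```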

### PackageEntry.fetch(self, dest=None) <a href="#packageentry.fetch" id="packageentry.fetch"></a>

Gets objects from entry and saves them to dest.

**Arguments**

* **dest**: where to put the files. Defaults to the entry name.

**Returns**

None

### PackageEntry.\_\_call\_\_(self, func=None, \*\*kwargs) <a href="#packageentry.__call" id="packageentry.__call"></a>

Shorthand for self.deserialize()
