# Work with packages

## Read a package

Packages may contain data of any size or type. A given package *instance*--specified by a hash, tag, or version--is *immutable* for reproducibility.

### Install a data package from the user `uciml`

```bash
$ quilt install uciml/iris
```

> Note: most Quilt commands are available *both on the command line and in Python*.

For example, to install the same package from Python:

```python
import quilt
quilt.install("uciml/iris")
```

### Import the package

```
$ python
>>> from quilt.data.uciml import iris
>>> iris
<PackageNode 'Users/YOU/quilt_packages/uciml/iris'>
raw/
tables/
README
>>> iris.tables.bezdek_iris() # this is a pandas DataFrame
    sepal_length  sepal_width  petal_length  petal_width  label
0  5.1           3.5          1.4           0.2          Iris-setosa
1  4.9           3.0          1.4           0.2          Iris-setosa
2  4.7           3.2          1.3           0.2          Iris-setosa
...
```

Read more about the `uciml/iris` package on its [landing page](https://quiltdata.com/package/uciml/iris), or [browse packages on Quilt](https://quiltdata.com/search/?q=).

## Edit a package

Start by installing and importing the package you wish to modify:

```python
import quilt
quilt.install("uciml/wine")
from quilt.data.uciml import wine
```

Alternatively, you can build an empty package and import it for editing:

```python
import quilt
quilt.build("USER/FOO")
from quilt.data.USER import FOO
```

> **Update**: As of version 2.9.9, the easiest way to edit a package is to use [subpackage build and push](https://github.com/quiltdata/examples/blob/master/Numpy%2C%20easy%20package%20edit.ipynb).

### Edit dataframe nodes

Use the Pandas API to edit existing dataframes:

```python
df = wine.tables.wine()
hue = df['Hue']
df['HueNormalized'] = (hue - hue.min())/(hue.max() - hue.min())
```

### Add package nodes

Use the `_set` helper method on the top-level package node to create new groups and data nodes:

```python
import pandas as pd
df = pd.DataFrame(dict(x=[1, 2, 3]))
# insert a dataframe at wine.mygroup.data()
wine._set(["mygroup", "data"], df)
# insert a file at wine.mygroup.anothergroup.blob()
wine._set(["mygroup", "anothergroup", "blob"], "localpath/file.txt")
```

### Delete package nodes

Use `del` to delete attributes:

```python
del wine.raw.wine
```

### Edit metadata

Use the `_meta` attribute to attach any JSON-serializable dictionary of metadata to a group or a data node:

```python
import time

wine.mygroup._meta['foo'] = 'bar'
wine.mygroup._meta['created'] = time.time()
```

Data nodes contain a built-in key `_meta['_system']` with information such as the original file path. You may access it, but any modifications to it may be lost.
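
Because metadata must be JSON-serializable, it can help to check a dictionary before attaching it to a node. The following is a minimal sketch that uses only the Python standard library; `check_json_serializable` is a hypothetical helper for illustration, not part of the Quilt API.

```python
import json
import time


def check_json_serializable(meta):
    """Return True if `meta` can be encoded by the standard json module."""
    try:
        json.dumps(meta)
    except (TypeError, ValueError):
        return False
    return True


# Strings and floats (such as time.time()) encode cleanly.
good_meta = {"foo": "bar", "created": time.time()}

# A set has no JSON representation, so this dictionary fails the check.
bad_meta = {"tags": {"red", "white"}}

print(check_json_serializable(good_meta))  # True
print(check_json_serializable(bad_meta))   # False
```

Converting unsupported values first (for example, a set to a sorted list) is usually enough to make such a dictionary pass.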

### Persist changes

At this point, your changes only exist in memory. To persist your changes, read on to learn about `build` and `push`.

## Build a package

Building a package creates a local bundle of serialized data. `quilt ls` lists your local packages and their locations on disk.

There are three ways to build data packages with Quilt:

1. Implicitly with `quilt build USR/PKG DIRECTORY`. Implicit builds are good for taking quick snapshots of unstructured data like images or text files. Quilt serializes columnar formats (xls, csv, tsv, etc.) to data frames; all other files are copied "as is".
2. Explicitly with `quilt build USR/PKG FILE.YML`. Explicit builds allow fine-grained control over package names, types, and contents.
3. On the fly, in Python.

Each of the above methods for building packages is supported in [Python](https://github.com/quiltdata/quilt/tree/38ebf1261a117ba68a2c9e643216cee9923658db/docs/api.md) and on the [command line](https://github.com/quiltdata/quilt/tree/38ebf1261a117ba68a2c9e643216cee9923658db/docs/api.md).

### Implicit builds

To implicitly build a package of unserialized data:

```bash
quilt build USR/PKG DIRECTORY
```

Everything in `DIRECTORY` and its subdirectories will be packaged into `USR/PKG`.

To publish your package:

```bash
quilt push USR/PKG --public
```

Users on Individual and Business plans can omit the `--public` flag to create private packages.

### Explicit builds

Explicit builds are driven by a YAML file, conventionally called `build.yml`.

```bash
quilt build USR/PKG BUILD.YML
```

`build.yml` specifies the structure and contents of a package.

#### `quilt generate` creates a `build.yml` file

An easy way to create a `build.yml` file is as follows:

```bash
quilt generate DIR
```

The above command creates `build.yml` and `README.md` files that you can modify to your liking. A `README.md` file is highly recommended as it populates your package landing page with documentation. See the API section for more on how README markdown is converted to HTML.

See [`build.yml` syntax](https://docs.quiltdata.com/api/build.yml) for more.

**Directory and file naming in quilt generate**

* Directories and files that start with a numeric character or underscore will be prefixed with the letter `n`. If a name collision results, the build will fail with an error.
* If two files have the same path and root name, but different file extensions (`foo.txt`, `foo.csv`), the extensions will be appended as follows: `foo_txt`, `foo_csv`. If, after appending, there remains a name collision, the build will fail with an error.
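
The two renaming rules above can be sketched in plain Python. This is an approximation for illustration only, not Quilt's actual implementation: `node_names` is a hypothetical helper, it handles a flat list of file names rather than a directory tree, and the order in which the two rules are applied is an assumption.

```python
import os
from collections import Counter


def node_names(filenames):
    """Map file names to node names, loosely following the two rules above."""
    # Count how often each root name (file name minus extension) occurs.
    roots = Counter(os.path.splitext(name)[0] for name in filenames)
    result = {}
    for name in filenames:
        root, ext = os.path.splitext(name)
        node = root
        # Roots shared by several files get the extension appended.
        if roots[root] > 1 and ext:
            node = root + "_" + ext.lstrip(".")
        # A leading digit or underscore is prefixed with the letter "n".
        if node[0].isdigit() or node[0] == "_":
            node = "n" + node
        result[name] = node
    # Any remaining collision is treated as an error.
    if len(set(result.values())) != len(result):
        raise ValueError("name collision after renaming")
    return result


names = node_names(["foo.txt", "foo.csv", "2017_sales.csv", "_notes.md"])
for source, node in names.items():
    print(source, "->", node)
# foo.txt -> foo_txt
# foo.csv -> foo_csv
# 2017_sales.csv -> n2017_sales
# _notes.md -> n_notes
```

In this sketch, a pair such as `1a.csv` and `n1a.txt` raises `ValueError`, because both files would map to the node name `n1a`.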

### Build on the fly

Create an empty package, then add nodes to it from Python:

```python
import quilt
# start with an empty package
quilt.build("akarve/foo")
# put some data in it
import pandas as pd
from quilt.data.akarve import foo
df = pd.DataFrame(data=[1,2,3])
foo._set(['bar'], df)
foo.bar()
# Output:
# 0
# 0    1
# 1    2
# 2    3
```

### Valid package names

Package handles take the form `USER_NAME/PACKAGE_NAME`. The package name and all of its children must be valid Python identifiers:

* Start with a letter
* Contain only alphanumerics and underscore

The above criteria ensure that packages can be accessed with Python's dot operator.
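
As a rough illustration, the two criteria can be checked with a regular expression before you build. The helpers below are hypothetical and not part of Quilt; the sketch assumes ASCII letters only and applies the rules to the package name, since this page does not state the rules for user names.

```python
import re

# Assumed pattern: a letter first, then letters, digits, or underscores.
NODE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_name(name):
    """Check a package or node name against the two criteria above."""
    return NODE_NAME.fullmatch(name) is not None


def is_valid_handle(handle):
    """Check a USER_NAME/PACKAGE_NAME handle; only the package part is validated."""
    parts = handle.split("/")
    if len(parts) != 2 or not parts[0]:
        return False
    return is_valid_name(parts[1])


print(is_valid_name("bezdek_iris"))   # True
print(is_valid_name("2017_sales"))    # False: starts with a digit
print(is_valid_name("my-data"))       # False: contains a hyphen
print(is_valid_handle("uciml/iris"))  # True
```

The sketch does not reject Python reserved words such as `class`, which match the pattern but cannot follow the dot operator.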

## Push a package

Pushing a package stores a built package in a server-side registry. Push a package to back up changes or share your package with others.

```bash
$ quilt login # requires free account
$ quilt push USR/PKG --public
```

Or, in Python:

```python
# log in to the registry (requires a free account)
quilt.login()
# push it to the registry
quilt.push("USR/PKG", is_public=True)
```

Users on Individual and Business plans can omit `is_public=True` to create private packages.

