Working with Manifests
Every data package is backed by a manifest. A manifest is a self-contained reference sheet for a package, containing all of the file data and metadata necessary to work with the package.
Every time you save a data package to a registry you also save its manifest. You can inspect the manifest yourself using the manifest property:
import quilt3
import os

# create test file
with open("data.csv", "w") as f:
    f.write("id, value\na, 42")

p = quilt3.Package().set("data.csv", "data.csv")
list(p.manifest)
[{'version': 'v0'},
 {'logical_key': 'data.csv',
  'physical_keys': ['file:///.../data.csv'],
  'size': 15,
  'hash': None,
  'meta': {}}]
Manifests saved to disk are in the jsonl/ndjson format, i.e. JSON strings separated by newlines (\n). In memory, they are represented as a list of dict fragments.
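As a rough illustration of that on-disk format, the sketch below round-trips a manifest-like list of dicts through JSONL using only the standard library. The helpers to_jsonl and from_jsonl are hypothetical names, not part of quilt3, and the sample data is modeled on the output shown earlier.

```python
import json

# Sample manifest fragments, modeled on the example output above.
manifest = [
    {"version": "v0"},
    {
        "logical_key": "data.csv",
        "physical_keys": ["file:///.../data.csv"],
        "size": 15,
        "hash": None,
        "meta": {},
    },
]


def to_jsonl(fragments):
    """Serialize each dict as one JSON document per line (hypothetical helper)."""
    return "\n".join(json.dumps(fragment) for fragment in fragments) + "\n"


def from_jsonl(text):
    """Parse JSONL text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


serialized = to_jsonl(manifest)
restored = from_jsonl(serialized)
```

Because each line is an independent JSON document, a manifest can be read or written one entry at a time without loading the whole file into a single JSON structure.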
Manifest specification
The first item in the manifest contains the manifest version number (version), the package metadata (user_meta), and a package commit message (message).
version is used to ensure backwards compatibility should the serialization format change. There is currently only one valid version: v0.
user_meta is used to store any user-defined package metadata. In this case the package has no package metadata (yet), so user_meta is omitted.
message stores the package commit message. It will be None if the package is pushed without a commit message. The field is omitted if the package has never been pushed.
Every item after that is a manifest entry. Entries may be files/objects or directories. All files in the package will have a corresponding entry, as will all directories with metadata. Directories without metadata are omitted.
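The layout described so far (one header item followed by entries) can be sketched as follows. This is a minimal illustration on hand-written sample data, not quilt3 code: split_manifest is a hypothetical helper, and treating a missing user_meta as an empty dict is an assumption made here for convenience.

```python
def split_manifest(fragments):
    """Return (header, entries) from a list of manifest dicts (hypothetical helper)."""
    if not fragments:
        raise ValueError("a manifest needs at least a header item")
    return fragments[0], list(fragments[1:])


# Hand-written sample: a package pushed without a commit message.
sample = [
    {"version": "v0", "message": None},
    {
        "logical_key": "data.csv",
        "physical_keys": ["file:///.../data.csv"],
        "size": 15,
        "hash": None,
        "meta": {},
    },
]

header, entries = split_manifest(sample)

version = header["version"]
# user_meta is omitted when the package has no metadata; defaulting to {}
# is an assumption of this sketch.
user_meta = header.get("user_meta", {})
# message is omitted entirely if the package was never pushed, so key
# presence and a None value mean different things.
was_pushed = "message" in header
message = header.get("message")
```

Checking for the key separately from its value keeps the two cases apart: a package that was pushed without a message, and a package that was never pushed.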
The manifest fields are as follows:
logical_key - The path to the entry within the package.
physical_keys - A list of files. Currently this field will always have a single entry. This field is omitted if the entry is a directory.
size - The size of the entry in raw bytes. This field is omitted if the entry is a directory.
hash - Materialized packages record a content hash for every entry in the package. This field is used to ensure package immutability (the tophash is partly a hash of these hashes). If the hash is present it will be a dict fragment of the form {'type': 'SHA256', 'value': '...'}. Un-materialized package entries have a hash of None, as in our example. Directory entries omit this field.
meta - Package entry metadata. Package entries lacking metadata will have a meta of {} (empty dict).
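To show how these fields might be used together, here is a small sketch that summarizes a list of manifest entries. It relies only on the rules listed above (directory entries omit physical_keys, size, and hash; un-materialized files have a hash of None). The helper names and the sample entries, including the "raw/" directory key and the a.bin file, are made up for illustration.

```python
def is_directory(entry):
    # Per the field list above, directory entries omit physical_keys.
    return "physical_keys" not in entry


def summarize(entries):
    """Count files and directories, total the sizes, and list unhashed files."""
    files = [e for e in entries if not is_directory(e)]
    return {
        "files": len(files),
        "directories": len(entries) - len(files),
        "total_bytes": sum(e["size"] for e in files),
        "unhashed": [e["logical_key"] for e in files if e.get("hash") is None],
    }


# Hypothetical sample entries (the manifest header item is not included).
entries = [
    {
        "logical_key": "data.csv",
        "physical_keys": ["file:///.../data.csv"],
        "size": 15,
        "hash": None,
        "meta": {},
    },
    # A directory appears only because it carries metadata.
    {"logical_key": "raw/", "meta": {"source": "sensor"}},
    {
        "logical_key": "raw/a.bin",
        "physical_keys": ["file:///.../a.bin"],
        "size": 4,
        "hash": {"type": "SHA256", "value": "..."},
        "meta": {},
    },
]

summary = summarize(entries)
```

A non-empty "unhashed" list is a quick sign that the package has entries that are not yet materialized.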
Saving and loading manifests
In almost all cases you should be using registries, build, and push to handle sending manifests to and fro. However, there may be advanced use cases where you want to save or load a manifest directly. For that, you can use the low-level manifest API:
p = quilt3.Package().set("data.csv", "data.csv")
p.set_meta({"key": "value"})

# Saving manifest to jsonl file
with open("example.jsonl", "w") as f:
    p.dump(f)

# Loading manifest (the with block closes the file afterwards)
with open("example.jsonl", "r") as f:
    p = quilt3.Package.load(f)
p
Loading manifest: 100%|██████████| 2/2 [00:00<00:00, 2141.59entries/s]
(local Package)
└─data.csv