Mental Model

This guide explains the fundamental concepts behind Quilt's data management system. Think of it as your roadmap to understanding how Quilt organizes, versions, and manages data.

🎯 The Big Picture

Quilt treats data like code, with versioning, immutability, and collaboration built in. Instead of managing individual files scattered across storage systems, you work with packages that bundle related data together with metadata and provenance.

Traditional Approach          →    Quilt Approach
├── file1.csv                      📦 myteam/customer-data
├── file2.json                     ├── 📄 customers.csv
├── file3.parquet                  ├── 📄 transactions.json  
└── README.txt                     ├── 📄 analytics.parquet
                                   ├── 📄 README.md
                                   └── 🏷️  metadata + version hash

📦 Core Concept: Packages

What is a Package?

A package is Quilt's fundamental unit of data organization. Think of it as a versioned, immutable collection of related files with a clear identity and history.

Key Properties:

  • Immutable: Once created, a package version's contents never change

  • Versioned: Each change creates a new version with a unique hash

  • Named: Human-readable names like myteam/customer-analytics

  • Tracked: Complete history and provenance of all changes

Package Anatomy

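Every package consists of a human-readable name, a manifest that lists its files, metadata, and a version hash that identifies its exact contents.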

Real-World Example
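
As an illustration, the myteam/customer-data package from the diagram above could be modeled as follows. This is a minimal sketch in plain Python, not quilt3 code: ToyPackage, its field names, the example-bucket locations, and the placeholder hash are all assumptions made for the example.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class ToyPackage:
    """Hypothetical stand-in for a package; not a quilt3 class."""
    name: str                    # human-readable name
    entries: Mapping[str, str]   # logical key -> physical location
    metadata: Mapping[str, str]  # package-level metadata
    top_hash: str                # identifies this exact version


pkg = ToyPackage(
    name="myteam/customer-data",
    entries=MappingProxyType({
        "customers.csv": "s3://example-bucket/customers.csv",
        "transactions.json": "s3://example-bucket/transactions.json",
        "analytics.parquet": "s3://example-bucket/analytics.parquet",
        "README.md": "s3://example-bucket/README.md",
    }),
    metadata=MappingProxyType({"owner": "myteam"}),
    top_hash="0" * 64,  # placeholder, not a real hash
)

for logical_key in sorted(pkg.entries):
    print(logical_key, "->", pkg.entries[logical_key])
```

The frozen dataclass and the read-only mappings mirror the immutability property: changing anything means building a new ToyPackage rather than editing this one.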

🗂️ The Manifest System

Understanding Manifests

The manifest is Quilt's "table of contents" - it maps user-friendly names to actual file locations and includes integrity information.

Manifest Entry Structure:
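
One way to picture an entry is as a small record that ties a logical key to a physical location, plus the size and hash used for integrity checks. The sketch below is an assumption-laden model, not Quilt's actual manifest format: ManifestEntry, make_entry, and every field name are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class ManifestEntry:
    """Hypothetical entry layout; field names are assumptions."""
    logical_key: str   # user-friendly name used in code
    physical_key: str  # where the bytes actually live
    size: int          # size in bytes
    sha256: str        # hex digest of the file contents
    meta: dict = field(default_factory=dict)  # per-file metadata


def make_entry(logical_key: str, physical_key: str, data: bytes, meta=None) -> ManifestEntry:
    """Derive size and hash from the bytes so they cannot drift from the content."""
    return ManifestEntry(
        logical_key=logical_key,
        physical_key=physical_key,
        size=len(data),
        sha256=hashlib.sha256(data).hexdigest(),
        meta=meta or {},
    )


entry = make_entry(
    "data/customers.csv",
    "s3://bucket/a1b2c3/customers.csv?versionId=xyz",
    b"id,name\n1,Ada\n",
    meta={"description": "customer master list"},
)
print(json.dumps(asdict(entry), indent=2))
```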

Logical vs Physical Keys

| Aspect | Logical Key | Physical Key |
| --- | --- | --- |
| Purpose | User-friendly name | Actual storage location |
| Example | "data/customers.csv" | "s3://bucket/a1b2c3/customers.csv?versionId=xyz" |
| Stability | Stable across versions | Changes with storage |
| Usage | Code references | Internal system use |

Example Manifest Entry
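
A serialized entry might look roughly like the JSON produced below. The field names, the nested hash object, and the metadata are illustrative assumptions, and the hash value is left as a placeholder rather than an invented digest.

```python
import json

# Hypothetical entry: field names and metadata are illustrative only.
entry = {
    "logical_key": "data/customers.csv",
    "physical_key": "s3://bucket/a1b2c3/customers.csv?versionId=xyz",
    "size": 14,
    "hash": {"type": "SHA256", "value": "<hex digest of the file contents>"},
    "meta": {"description": "customer master list"},
}

# Serialize the entry as a single JSON line, then read it back.
line = json.dumps(entry, sort_keys=True)
restored = json.loads(line)

print(line)
print(restored["logical_key"], "->", restored["physical_key"])
```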

Why This Matters:

  • Portability: Move data between storage systems without breaking code

  • Integrity: Cryptographic hashes ensure data hasn't been corrupted

  • Metadata: Rich context about each file's purpose and properties

  • Versioning: Track exactly what changed between package versions
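
The portability and integrity points can be sketched with a toy lookup: callers ask for a logical key, the manifest resolves it to a physical key, and the recorded hash is checked before the bytes are returned. Everything here is hypothetical (the storage dict, the bucket names, and the fetch helper); it models the idea rather than Quilt's implementation.

```python
import hashlib

CONTENT = b"id,name\n1,Ada\n"

# Toy "storage systems": physical key -> bytes. Locations are hypothetical.
storage = {
    "s3://old-bucket/customers.csv": CONTENT,
    "s3://new-bucket/archive/customers.csv": CONTENT,
}


def make_manifest(physical_key: str) -> dict:
    """Build a one-entry manifest pointing the logical key at physical_key."""
    return {
        "data/customers.csv": {
            "physical_key": physical_key,
            "sha256": hashlib.sha256(CONTENT).hexdigest(),
        }
    }


def fetch(manifest: dict, logical_key: str) -> bytes:
    """Resolve a logical key, then verify the bytes against the recorded hash."""
    entry = manifest[logical_key]
    data = storage[entry["physical_key"]]
    if hashlib.sha256(data).hexdigest() != entry["sha256"]:
        raise ValueError(f"integrity check failed for {logical_key}")
    return data


old_manifest = make_manifest("s3://old-bucket/customers.csv")
new_manifest = make_manifest("s3://new-bucket/archive/customers.csv")

# The calling code is identical before and after the data moves.
before = fetch(old_manifest, "data/customers.csv")
after = fetch(new_manifest, "data/customers.csv")
print(before == after)
```

Only the manifest knows that the file moved; code that refers to "data/customers.csv" is unaffected, and corrupted bytes would raise an error instead of being returned silently.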

🏢 Registries: Where Packages Live

Registry Concept

A registry is where Quilt stores package manifests and optionally the data itself. Think of it as a "database" of packages.

Supported Registry Types:

  • 🌐 S3 Buckets: Cloud-native, scalable, with built-in versioning

  • 💻 Local Disk: For development and testing

  • 🔮 Future: GCP, Azure, NAS (on roadmap)

Registry Examples
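
To make the "database of packages" idea concrete, here is a minimal in-memory sketch of what a registry keeps track of: manifests indexed by hash, and an ordered list of hashes per package name. ToyRegistry, its methods, the hash-v1/hash-v2 labels, and the local path are assumptions for illustration and are not quilt3's API.

```python
from typing import Dict, List, Optional


class ToyRegistry:
    """In-memory stand-in for a registry; not quilt3's implementation."""

    def __init__(self, url: str) -> None:
        self.url = url
        self._manifests: Dict[str, dict] = {}      # top hash -> manifest
        self._versions: Dict[str, List[str]] = {}  # package name -> hashes, oldest first

    def push(self, name: str, top_hash: str, manifest: dict) -> None:
        """Record a package version; existing versions are never overwritten."""
        self._manifests.setdefault(top_hash, manifest)
        hashes = self._versions.setdefault(name, [])
        if top_hash not in hashes:
            hashes.append(top_hash)

    def browse(self, name: str, top_hash: Optional[str] = None) -> dict:
        """Return the manifest for a given hash, or for the latest version."""
        return self._manifests[top_hash or self._versions[name][-1]]

    def list_packages(self) -> List[str]:
        return sorted(self._versions)


cloud = ToyRegistry("s3://company-raw")
local = ToyRegistry("/tmp/toy-registry")  # hypothetical local path

cloud.push("myteam/customer-data", "hash-v1",
           {"customers.csv": "s3://company-raw/v1/customers.csv"})
cloud.push("myteam/customer-data", "hash-v2",
           {"customers.csv": "s3://company-raw/v2/customers.csv"})

print(cloud.list_packages())
print(cloud.browse("myteam/customer-data"))
```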

🌊 Buckets as Branches

The Git Analogy

In Quilt, S3 buckets function like Git branches: each bucket represents a different stage or environment in your data lifecycle.

Three-Bucket Minimum:

  1. 🔴 Raw Bucket (s3://company-raw)

    • Ingested data, minimal processing

    • Experimental datasets

    • Temporary analysis results

  2. 🟡 Staging Bucket (s3://company-staging)

    • Validated and cleaned data

    • Ready for testing and QA

    • Pre-production datasets

  3. 🟢 Production Bucket (s3://company-prod)

    • Fully validated, production-ready data

    • Used by live applications and dashboards

    • Strict access controls and governance

Package Promotion Workflow
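
Promotion can be thought of as copying a specific package version from one bucket to the next once it passes that stage's checks. The sketch below models the three buckets as nested dictionaries; promote, has_readme, and the hash-v1 label are hypothetical names, and the README check is only an example of a quality gate.

```python
from typing import Callable, Dict

# Each bucket is modeled as: package name -> {top hash -> manifest}.
buckets: Dict[str, Dict[str, Dict[str, dict]]] = {
    "s3://company-raw": {},
    "s3://company-staging": {},
    "s3://company-prod": {},
}


def promote(name: str, top_hash: str, src: str, dest: str,
            check: Callable[[dict], bool]) -> None:
    """Copy one package version from src to dest if it passes check."""
    manifest = buckets[src][name][top_hash]
    if not check(manifest):
        raise ValueError(f"{name}@{top_hash} rejected by {dest}")
    buckets[dest].setdefault(name, {})[top_hash] = manifest


def has_readme(manifest: dict) -> bool:
    """Hypothetical quality gate: require documentation before promotion."""
    return "README.md" in manifest


name, version = "myteam/customer-data", "hash-v1"
buckets["s3://company-raw"][name] = {
    version: {"customers.csv": "sha256:aaa", "README.md": "sha256:bbb"}
}

promote(name, version, "s3://company-raw", "s3://company-staging", has_readme)
promote(name, version, "s3://company-staging", "s3://company-prod", has_readme)

print(version in buckets["s3://company-prod"][name])
```

The source bucket keeps its copy, so promotion adds a version downstream without changing anything upstream.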

🔄 Immutability and Versioning

Why Immutability Matters

Immutability means that a package version, once created, never changes. This provides:

  • Reproducibility: Analyses can be exactly repeated

  • Audit Trail: Complete history of all changes

  • Rollback Safety: Easy to revert to previous versions

  • Parallel Work: Teams can work simultaneously without conflicts

Version Management
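
The link between content and version can be sketched by deriving the hash from the package contents: identical contents give the same hash, and any change gives a new one. The hashing scheme below (SHA-256 over canonical JSON) is a toy assumption and not Quilt's algorithm; top_hash, commit, and history are hypothetical names.

```python
import hashlib
import json


def top_hash(entries: dict, metadata: dict) -> str:
    """Derive a version hash from contents (toy scheme, not Quilt's algorithm)."""
    canonical = json.dumps({"entries": entries, "meta": metadata}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


history = []  # append-only list of (top hash, entries); earlier items are never edited


def commit(entries: dict, metadata: dict) -> str:
    """Record a new version and return its hash."""
    digest = top_hash(entries, metadata)
    history.append((digest, dict(entries)))
    return digest


v1 = commit({"customers.csv": "sha256:aaa"}, {"note": "initial load"})
v2 = commit({"customers.csv": "sha256:bbb"}, {"note": "fixed typos"})

# "Rolling back" means selecting an earlier hash, not rewriting history.
by_hash = dict(history)
rolled_back = by_hash[v1]

print(v1 != v2)
print(rolled_back)
```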

🎯 Practical Mental Model

Think of Quilt Like...

| If you're familiar with... | Think of Quilt as... |
| --- | --- |
| Git | Git for data - versioning, branching (buckets), immutable commits (packages) |
| Docker | Container images for data - immutable, portable, with manifests |
| Package Managers | npm/pip for datasets - named packages, versions, dependencies |
| Databases | Schema-aware data warehouse with built-in versioning and lineage |

Key Principles to Remember

  1. 📦 Package-Centric: Always think in terms of related collections, not individual files

  2. 🔒 Immutable: Versions never change - create new versions instead of modifying

  3. 🏷️ Named & Hashed: Every package has a human name and cryptographic identity

  4. 🌊 Bucket Workflows: Use different buckets for different data lifecycle stages

  5. 📋 Manifest-Driven: Logical names abstract away physical storage details

🚀 Next Steps

Now that you understand Quilt's mental model:

  1. Try It: Follow the Quick Start to create your first package

  2. Learn Workflows: Explore package workflows

  3. Set Up Team Access: Configure collaboration features

  4. Advanced Topics: Learn about schemas and validation


Remember: Quilt turns chaotic data management into organized, versioned, collaborative workflows. The mental model is simple: treat your data like code, and Quilt handles the complexity.
