Mental Model
This guide explains the fundamental concepts behind Quilt's data management system. Think of it as your roadmap to understanding how Quilt organizes, versions, and manages data.
🎯 The Big Picture
Quilt treats data like code - with versioning, immutability, and collaboration built-in. Instead of managing individual files scattered across storage systems, you work with packages that bundle related data together with metadata and provenance.
Traditional Approach        →    Quilt Approach
├── file1.csv                    📦 myteam/customer-data
├── file2.json                   ├── 📄 customers.csv
├── file3.parquet                ├── 📄 transactions.json
└── README.txt                   ├── 📄 analytics.parquet
                                 ├── 📄 README.md
                                 └── 🏷️ metadata + version hash
📦 Core Concept: Packages
What is a Package?
A package is Quilt's fundamental unit of data organization. Think of it as a versioned, immutable collection of related files with a clear identity and history.
Key Properties:
Immutable: Once created, the contents of a package version never change
Versioned: Each change creates a new version with a unique hash
Named: Human-readable names like myteam/customer-analytics
Tracked: Complete history and provenance of all changes
Package Anatomy
Every package consists of data files, metadata, and a manifest that ties them together under a version hash.
Real-World Example
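As a rough illustration of these properties, the sketch below models one package version with the Python standard library only. It is not Quilt's implementation or the quilt3 API: `PackageVersion`, `build_version`, and the way the version hash is derived are assumptions made for the example, and the file contents are made up.

```python
import hashlib
import json
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class PackageVersion:
    """Hypothetical stand-in for one immutable package version."""
    name: str        # human-readable name, e.g. "myteam/customer-data"
    files: tuple     # sorted (file name, sha256 hex digest) pairs
    metadata: tuple  # sorted (key, value) pairs

    @property
    def version_hash(self) -> str:
        # Assumption: the version identity is a SHA-256 over name, files, metadata.
        payload = json.dumps([self.name, self.files, self.metadata])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_version(name, files, metadata):
    """Hash each file's bytes and freeze the result into a PackageVersion."""
    file_hashes = tuple(
        sorted((key, hashlib.sha256(data).hexdigest()) for key, data in files.items())
    )
    return PackageVersion(name, file_hashes, tuple(sorted(metadata.items())))


v1 = build_version(
    "myteam/customer-data",
    {"customers.csv": b"id,name\n1,Ada\n", "README.md": b"# Customer data\n"},
    {"owner": "myteam"},
)
# Changing any file yields a new version with a different hash; v1 is untouched.
v2 = build_version(
    "myteam/customer-data",
    {"customers.csv": b"id,name\n1,Ada\n2,Grace\n", "README.md": b"# Customer data\n"},
    {"owner": "myteam"},
)
print(v1.version_hash[:12], v2.version_hash[:12])

try:
    v1.name = "myteam/renamed"  # frozen dataclass: assignment is rejected
except FrozenInstanceError:
    print("package versions are read-only")
```

The point of the sketch is that identical inputs always produce the same hash, while any change produces a new version instead of altering an existing one.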
🗂️ The Manifest System
Understanding Manifests
The manifest is Quilt's "table of contents" - it maps user-friendly names to actual file locations and includes integrity information.
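Manifest Entry Structure: each entry pairs a user-friendly name (the logical key) with an actual file location (the physical key), plus integrity information and metadata.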
Logical vs Physical Keys
Aspect    | Logical Key            | Physical Key
Purpose   | User-friendly name     | Actual storage location
Example   | "data/customers.csv"   | "s3://bucket/a1b2c3/customers.csv?versionId=xyz"
Stability | Stable across versions | Changes with storage
Usage     | Code references        | Internal system use
Example Manifest Entry
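A minimal sketch of what one entry might carry, using only the standard library. The field names (`logical_key`, `physical_key`, `size`, `sha256`, `meta`) and the helper `make_entry` are assumptions chosen for illustration, not necessarily Quilt's on-disk manifest format; the file content and metadata are invented.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class ManifestEntry:
    """Hypothetical manifest entry: one logical name mapped to one stored object."""
    logical_key: str   # user-friendly name that code refers to
    physical_key: str  # actual storage location
    size: int          # size in bytes
    sha256: str        # integrity information
    meta: dict = field(default_factory=dict)  # context about the file


def make_entry(logical_key, physical_key, content, meta=None):
    """Build an entry from the file's bytes so size and hash always match."""
    return ManifestEntry(
        logical_key=logical_key,
        physical_key=physical_key,
        size=len(content),
        sha256=hashlib.sha256(content).hexdigest(),
        meta=dict(meta or {}),
    )


entry = make_entry(
    "data/customers.csv",
    "s3://bucket/a1b2c3/customers.csv?versionId=xyz",
    b"id,name\n1,Ada\n",
    {"description": "Customer master list"},
)
print(json.dumps(asdict(entry), indent=2))
```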
Why This Matters:
✅ Portability: Move data between storage systems without breaking code
✅ Integrity: Cryptographic hashes ensure data hasn't been corrupted
✅ Metadata: Rich context about each file's purpose and properties
✅ Versioning: Track exactly what changed between package versions
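To make the portability and integrity points concrete, here is a small sketch in which a manifest is modeled as a plain dictionary. The helpers `relocate` and `verify`, the dictionary layout, and the bucket name `s3://new-bucket/` are all hypothetical; Quilt's own mechanics may differ.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify(entry: dict, content: bytes) -> bool:
    """Integrity check: the bytes must hash to the value recorded in the manifest."""
    return entry["sha256"] == sha256_hex(content)


def relocate(manifest: dict, old_prefix: str, new_prefix: str) -> dict:
    """Return a new manifest whose physical keys point at different storage.

    Logical keys and hashes are left unchanged, so code that refers to
    "data/customers.csv" keeps working after the move.
    """
    moved = {}
    for logical_key, entry in manifest.items():
        physical = entry["physical_key"]
        if physical.startswith(old_prefix):
            physical = new_prefix + physical[len(old_prefix):]
        moved[logical_key] = {**entry, "physical_key": physical}
    return moved


content = b"id,name\n1,Ada\n"
manifest = {
    "data/customers.csv": {
        "physical_key": "s3://bucket/a1b2c3/customers.csv",
        "sha256": sha256_hex(content),
    }
}

moved = relocate(manifest, "s3://bucket/", "s3://new-bucket/")
print(moved["data/customers.csv"]["physical_key"])
print(verify(moved["data/customers.csv"], content))
```

Because `relocate` builds a new dictionary instead of editing the old one, the original manifest stays valid, which mirrors the immutability idea described in this guide.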
🏢 Registries: Where Packages Live
Registry Concept
A registry is where Quilt stores package manifests and optionally the data itself. Think of it as a "database" of packages.
Supported Registry Types:
🌐 S3 Buckets: Cloud-native, scalable, with built-in versioning
💻 Local Disk: For development and testing
🔮 Future: GCP, Azure, NAS (on roadmap)
Registry Examples
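One way to picture the two supported registry types is by the shape of their addresses. The sketch below classifies a registry location with `urllib.parse`; the function `describe_registry`, the returned dictionary, and the local path `/tmp/quilt-registry` are assumptions for illustration and not part of Quilt.

```python
from urllib.parse import urlparse


def describe_registry(uri: str) -> dict:
    """Classify a registry location as an S3 bucket or a local directory."""
    parsed = urlparse(uri)
    if parsed.scheme == "s3":
        # For s3://company-prod the bucket name is the network location.
        return {"kind": "s3", "location": parsed.netloc}
    if parsed.scheme in ("", "file"):
        # Plain paths and file:// URIs both refer to local disk.
        return {"kind": "local", "location": parsed.path}
    raise ValueError(f"unsupported registry type: {uri}")


for uri in ("s3://company-prod", "file:///tmp/quilt-registry", "/tmp/quilt-registry"):
    print(uri, "->", describe_registry(uri))
```

Anything else, such as a GCP or Azure address, raises `ValueError` in this sketch, matching the roadmap status noted in the list of supported types.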
🌊 Buckets as Branches
The Git Analogy
In Quilt, S3 buckets function like Git branches - each represents a different stage or environment in your data lifecycle.
Recommended Bucket Strategy
Three-Bucket Minimum:
🔴 Raw Bucket (s3://company-raw)
Ingested data, minimal processing
Experimental datasets
Temporary analysis results
🟡 Staging Bucket (s3://company-staging)
Validated and cleaned data
Ready for testing and QA
Pre-production datasets
🟢 Production Bucket (s3://company-prod)
Fully validated, production-ready data
Used by live applications and dashboards
Strict access controls and governance
Package Promotion Workflow
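Promotion through the three buckets can be sketched as copying a validated package version from one stage to the next. The model below keeps each registry as an in-memory dictionary; `STAGES`, `next_stage`, `promote`, the `validate` callback, and the version identifier `"abc123"` are hypothetical names used only for this example, with no real S3 access involved.

```python
# Ordered lifecycle stages, using the bucket names from the strategy above.
STAGES = ["s3://company-raw", "s3://company-staging", "s3://company-prod"]


def next_stage(bucket: str) -> str:
    """Return the bucket that follows `bucket` in the lifecycle."""
    position = STAGES.index(bucket)  # raises ValueError for unknown buckets
    if position == len(STAGES) - 1:
        raise ValueError(f"{bucket} is the final stage")
    return STAGES[position + 1]


def promote(registries: dict, name: str, source: str, validate) -> str:
    """Copy the newest version of `name` from `source` to the next stage.

    The source bucket is not modified; the same version hash simply
    becomes available one stage further along.
    """
    version = registries[source][name][-1]
    if not validate(version):
        raise RuntimeError(f"{name}@{version} failed validation")
    target = next_stage(source)
    registries[target].setdefault(name, []).append(version)
    return target


# Each registry maps a package name to its list of version hashes.
registries = {bucket: {} for bucket in STAGES}
registries["s3://company-raw"]["myteam/customer-data"] = ["abc123"]

target = promote(registries, "myteam/customer-data", "s3://company-raw", lambda v: True)
print(target, registries[target])
```

Keeping validation as a callback makes the gate between stages explicit: a package only moves forward when the check for that stage passes.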
🔄 Immutability and Versioning
Why Immutability Matters
Immutable packages mean that once created, a package version never changes. This provides:
✅ Reproducibility: Analyses can be exactly repeated
✅ Audit Trail: Complete history of all changes
✅ Rollback Safety: Easy to revert to previous versions
✅ Parallel Work: Teams can work simultaneously without conflicts
Version Management
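A minimal sketch of how version history, a "latest" pointer, and rollback can fit together when versions are immutable. `PackageHistory` and its methods are hypothetical and use only the standard library; the contents are placeholder strings standing in for file hashes.

```python
import hashlib
import json


class PackageHistory:
    """Hypothetical append-only history for one named package."""

    def __init__(self, name: str):
        self.name = name
        self.log = []        # append-only list of (version hash, message)
        self._contents = {}  # version hash -> snapshot of that version
        self.latest = None   # pointer to the current version

    def commit(self, contents: dict, message: str) -> str:
        """Record a new version; existing versions are never modified."""
        payload = json.dumps(contents, sort_keys=True).encode("utf-8")
        version_hash = hashlib.sha256(payload).hexdigest()
        self._contents.setdefault(version_hash, dict(contents))
        self.log.append((version_hash, message))
        self.latest = version_hash
        return version_hash

    def get(self, version_hash=None) -> dict:
        """Return a copy of a specific version, or of the latest one."""
        return dict(self._contents[version_hash or self.latest])

    def rollback(self, version_hash: str) -> None:
        """Move the latest pointer back; nothing is deleted."""
        if version_hash not in self._contents:
            raise KeyError(f"unknown version: {version_hash}")
        self.latest = version_hash


history = PackageHistory("myteam/customer-data")
first = history.commit({"data/customers.csv": "hash-a"}, "Initial import")
second = history.commit({"data/customers.csv": "hash-b"}, "Add new customers")

history.rollback(first)                 # revert to the earlier version
print(history.latest == first)          # the pointer moved
print(len(history.log))                 # both versions remain in the history
```

Pinning an analysis to a specific hash via `get(first)` is what makes it repeatable, and rollback is cheap because it only moves a pointer.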
🎯 Practical Mental Model
Think of Quilt Like...
Git: Git for data - versioning, branching (buckets), immutable commits (packages)
Docker: Container images for data - immutable, portable, with manifests
Package Managers: npm/pip for datasets - named packages, versions, dependencies
Databases: Schema-aware data warehouse with built-in versioning and lineage
Key Principles to Remember
📦 Package-Centric: Always think in terms of related collections, not individual files
🔒 Immutable: Versions never change - create new versions instead of modifying
🏷️ Named & Hashed: Every package has a human name and cryptographic identity
🌊 Bucket Workflows: Use different buckets for different data lifecycle stages
📋 Manifest-Driven: Logical names abstract away physical storage details
🚀 Next Steps
Now that you understand Quilt's mental model:
Try It: Follow the Quick Start to create your first package
Learn Workflows: Explore package workflows
Set Up Team Access: Configure collaboration features
Advanced Topics: Learn about schemas and validation
Remember: Quilt transforms chaotic data management into organized, versioned, collaborative workflows. The mental model is simple - treat your data like code, and Quilt handles the complexity!

