Introduction
Datamon helps build ML pipelines by adding versioning, auditing, and lineage tracking to cloud storage tools (e.g. Google Cloud Storage, AWS S3). Datamon does not replace these tools; it manages their inputs and outputs.
Datamon provides a git-like interface for managing data efficiently: data buckets are organized into repositories of versioned, tagged bundles of files.
Installation and Setup
Version Information
# Check Datamon version
d2 version
# Output:
# Version: 2.3.0
# BuildDate: 2020-11-10T11:29:17Z
# Commit: 17ce7d6
# Working tree: clean
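In a scripted pipeline it can help to read individual fields from the version output before running anything else. The following is a minimal sketch, assuming `d2 version` prints plain `Key: value` lines as in the output above. The helper name `version_field` is hypothetical, and the sample text stands in for real CLI output.

```shell
# Hypothetical helper: print the value of one "Key: value" line read from stdin.
# Assumes the output layout shown above; it is not part of Datamon itself.
version_field() {
  sed -n "s/^$1: //p"
}

# Sample text copied from the output above; in practice, pipe `d2 version` instead.
sample_output='Version: 2.3.0
BuildDate: 2020-11-10T11:29:17Z
Commit: 17ce7d6
Working tree: clean'

printf '%s\n' "$sample_output" | version_field Version
printf '%s\n' "$sample_output" | version_field "Working tree"
```

With a real installation the last two lines would become `d2 version | version_field Version`, provided the output format matches.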
Configuration
# Set config for different environments
d2 config set --config global-onec-co-datamon-config --context dev
d2 config set --config global-onec-co-datamon-config --context staging
d2 config set --config workshop-config
Context Management
# List all contexts
d2 context list
# Output:
# [dev prod staging]
# Get current context
d2 context
# or
d2 get context
# Get specific context
d2 context get --context dev
# Output:
# Model Version: 0
# Name: dev
# WAL: dev-onec-co-datamon-metadata-wal
# ReadLog: dev-onec-co-datamon-readlog
# Blob: global-onec-co-datamon-blob
# Metadata: dev-onec-co-datamon-metadata
# Version Metadata: dev-onec-co-datamon-vmetadata
# Create new context
d2 context create test
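The bracketed list printed by `d2 context list` can be split into one name per line, which makes it easy to repeat a command for every context. This is a sketch under the assumption that the list is always printed as `[name name ...]`, as in the output above. `parse_contexts` is a hypothetical helper, and the loop only echoes commands rather than running them.

```shell
# Hypothetical helper: turn "[dev prod staging]" into one context name per line.
parse_contexts() {
  tr -d '[]' | tr -s ' ' '\n'
}

# Sample text copied from the output above; in practice, pipe `d2 context list` instead.
sample_list='[dev prod staging]'

# Dry run: print the repo listing command for each context without executing it.
printf '%s\n' "$sample_list" | parse_contexts | while read -r ctx; do
  echo "d2 repo list --context $ctx"
done
```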
Repository Management
# Get repository details
d2 repo get zenrin-estat-residential
# or
d2 repo get --repo zenrin-estat-residential
# List repositories
d2 repo list | grep zenrin-estat-residential
# Create new repositories
d2 repo create --repo ntd-road-source-dev --description "raw download of ntd road data"
d2 repo list | grep ntd
# Create repository in specific context
d2 repo create --context staging --repo ntd-road-source-staging --description "the original ntd road data"
d2 repo list --context staging | grep ntd
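The examples above name each repository after its context (`ntd-road-source-dev`, `ntd-road-source-staging`). A small generator can keep that naming consistent. The sketch below only prints the commands so they can be reviewed first. `repo_create_cmds` is a hypothetical helper, and the `<base>-<context>` scheme is inferred from the examples rather than required by Datamon.

```shell
# Hypothetical helper: print (not run) one `d2 repo create` command per context,
# naming each repository <base>-<context> as in the examples above.
repo_create_cmds() {
  base=$1
  description=$2
  shift 2
  for ctx in "$@"; do
    printf 'd2 repo create --context %s --repo %s-%s --description "%s"\n' \
      "$ctx" "$base" "$ctx" "$description"
  done
}

repo_create_cmds ntd-road-source "raw download of ntd road data" dev staging
```

Printing instead of executing keeps the step auditable; once the commands look right, they can be pasted into a terminal or piped to `sh`.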
Bundle Management
Uploading Bundles
# Upload a bundle
d2 bundle upload --path folder-to-upload --repo mkang-test-repo --message "my first upload"
# Upload to a specific repository, first without and then with a message
d2 bundle upload --path ~/occ/prod/01-built-object-service/tmp/ntd/RI --repo ntd-road-source-dev
d2 bundle upload --path ~/occ/prod/01-built-object-service/tmp/ntd/RI --repo ntd-road-source-dev --message "upload RI"
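Because every upload creates a new bundle version, it can be worth validating the inputs before calling the CLI. The wrapper below is a sketch, not part of Datamon: `upload_bundle` and the `D2` override variable are hypothetical, and requiring a message is a policy choice made here for auditability (the CLI examples above show that `--message` can be omitted).

```shell
# Hypothetical wrapper: validate inputs, then call `d2 bundle upload`.
# Set D2 to another command (for example D2=echo) to preview the arguments
# without running Datamon.
upload_bundle() {
  path=$1
  repo=$2
  message=$3
  if [ ! -d "$path" ]; then
    echo "error: not a directory: $path" >&2
    return 1
  fi
  if [ -z "$message" ]; then
    echo "error: an upload message is required" >&2
    return 1
  fi
  "${D2:-d2}" bundle upload --path "$path" --repo "$repo" --message "$message"
}

# Dry run: echo the arguments that would be passed to d2.
D2=echo
upload_bundle /tmp ntd-road-source-dev "upload RI"
```

Unsetting `D2` makes the wrapper call the real `d2` binary with the same flags used in the examples above.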
Listing Bundles
# List all bundles
d2 bundle list
# List bundles in a repository
d2 bundle list --repo mkang-test-repo
d2 bundle list --repo ntd-road-source-dev
d2 bundle list --repo zenrin-estat-residential
d2 bundle list --repo resilience-japan-hazard-maps
d2 bundle list --repo Seattle-Sample-Corelogic-Run-Data
# List bundles in different contexts
d2 bundle list --repo ntd-road-source-staging --context staging
d2 bundle list --repo ntd-road-source-dev --context dev
Bundle Operations
# List files in a bundle
d2 bundle list files --repo mkang-test-repo --bundle 1fySBuavEhqWAXnYnZEiDCNm8TC
# Same listing with stderr suppressed, keeping only lines that contain "file"
d2 bundle list files --repo mkang-test-repo --bundle 1fySBuavEhq 2>/dev/null | grep file
# Mount bundle
d2 bundle mount --repo mkang-test-repo --mount ~/mnt --daemonize
# Mount a specific bundle by ID
d2 bundle mount --repo mkang-test-repo --bundle 1fySBuavEhqWAXnYnZEiDCNm8TC --mount ~/mnt --daemonize
# Download a bundle into the current directory
d2 bundle download --repo mkang-test-repo --destination .