Introduction
Datamon helps build ML pipelines by adding versioning, auditing, and lineage tracking to cloud storage services (e.g. Google Cloud Storage, AWS S3). It is not a replacement for these services, but a way to manage their inputs and outputs.
Datamon provides a git-like interface for managing data: your storage buckets are organized into repositories, and each repository holds versioned, tagged bundles of files.
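To make the repository and bundle model concrete, here is a minimal sketch of one round trip: create a repository, upload a folder as a bundle, list the bundles, and download again. It uses only commands that appear later in this cheat sheet. The helper names (`run`, `datamon_roundtrip`), the `D2` and `DRY_RUN` variables, and the example repository and paths are hypothetical. By default the sketch only prints the commands, so it is safe to run without touching any bucket.

```shell
# Sketch only: the helper names, variables, and example arguments are assumptions.
D2="${D2:-d2}"            # CLI binary name used throughout this cheat sheet
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands, 0 = actually run them

# Print the command in dry-run mode (argument quoting is not preserved
# in the printed form), otherwise execute it as given.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# datamon_roundtrip REPO SRC_DIR DEST_DIR MESSAGE
datamon_roundtrip() {
  local repo="$1" src="$2" dest="$3" msg="$4"
  run "$D2" repo create --repo "$repo" --description "$msg"
  run "$D2" bundle upload --path "$src" --repo "$repo" --message "$msg"
  run "$D2" bundle list --repo "$repo"
  run "$D2" bundle download --repo "$repo" --destination "$dest"
}

datamon_roundtrip example-repo ./data ./restored "first upload"
```

Setting `DRY_RUN=0` runs the same four commands for real, against whatever context is currently active.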
Installation and Setup
Version Information
## Check Datamon version
d2 version
## Output:
## Version: 2.3.0
## BuildDate: 2020-11-10T11:29:17Z
## Commit: 17ce7d6
## Working tree: clean
Configuration
## Set config for different environments
d2 config set --config global-onec-co-datamon-config --context dev
d2 config set --config global-onec-co-datamon-config --context staging
d2 config set --config workshop-config
Context Management
## List all contexts
d2 context list
## Output:
## [dev prod staging]
## Get current context
d2 context
## or
d2 get context
## Get specific context
d2 context get --context dev
## Output:
## Model Version: 0
## Name: dev
## WAL: dev-onec-co-datamon-metadata-wal
## ReadLog: dev-onec-co-datamon-readlog
## Blob: global-onec-co-datamon-blob
## Metadata: dev-onec-co-datamon-metadata
## Version Metadata: dev-onec-co-datamon-vmetadata
## Create new context
d2 context create test
Repository Management
## Get repository details
d2 repo get zenrin-estat-residential
## or
d2 repo get --repo zenrin-estat-residential
## List repositories
d2 repo list | grep zenrin-estat-residential
## Create new repositories
d2 repo create --repo ntd-road-source-dev --description "raw download of ntd road data"
d2 repo list | grep ntd
## Create repository in specific context
d2 repo create --context staging --repo ntd-road-source-staging --description "the original ntd road data"
d2 repo list --context staging | grep ntd
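Since each context has its own set of repositories, it can be handy to search all of them at once. The sketch below assumes `d2 context list` prints a single bracketed, space-separated line such as `[dev prod staging]`, as in the output shown earlier. The function names `parse_contexts` and `repos_in_all_contexts` are hypothetical.

```shell
# Sketch only: function names are assumptions, not part of the d2 CLI.

# Turn "[dev prod staging]" into one context name per line.
parse_contexts() {
  sed 's/[][]//g' | tr -s ' ' '\n' | sed '/^$/d'
}

# repos_in_all_contexts PATTERN
# For every context, print the repositories whose listing matches PATTERN.
repos_in_all_contexts() {
  local pattern="$1"
  d2 context list | parse_contexts | while read -r ctx; do
    echo "== $ctx =="
    # </dev/null keeps d2 from consuming the loop's input;
    # "|| true" tolerates contexts with no matching repository.
    d2 repo list --context "$ctx" </dev/null | grep -- "$pattern" || true
  done
}

# Parsing demo on the sample output from this cheat sheet (no d2 call needed):
printf '%s\n' "[dev prod staging]" | parse_contexts

# Usage (requires a configured d2):
#   repos_in_all_contexts ntd
```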
Bundle Management
Uploading Bundles
## Upload a bundle
d2 bundle upload --path folder-to-upload --repo mkang-test-repo --message "my first upload"
## Upload to specific repository
d2 bundle upload --path ~/occ/prod/01-built-object-service/tmp/ntd/RI --repo ntd-road-source-dev --message "upload RI"
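When a parent folder holds several datasets, each subfolder can be uploaded as its own bundle, following the "upload RI" naming pattern used above. This is a minimal sketch: the function name `upload_subdirs` is hypothetical, and it assumes one bundle per immediate subdirectory is what you want.

```shell
# Sketch only: upload_subdirs is a hypothetical helper, not part of the d2 CLI.

# upload_subdirs PARENT_DIR REPO
# Upload each immediate subdirectory of PARENT_DIR as a separate bundle,
# with the message "upload <subdirectory name>". Plain files are skipped.
upload_subdirs() {
  local parent="$1" repo="$2" dir name
  for dir in "$parent"/*/; do
    [ -d "$dir" ] || continue          # no subdirectories: glob stays literal
    name="$(basename "$dir")"
    d2 bundle upload --path "${dir%/}" --repo "$repo" --message "upload $name" || return 1
  done
}

# Usage (requires a configured d2):
#   upload_subdirs ~/occ/prod/01-built-object-service/tmp/ntd ntd-road-source-dev
```

The loop stops at the first failed upload, so a partial run is easy to spot and resume.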
Listing Bundles
## List all bundles
d2 bundle list
## List bundles in a repository
d2 bundle list --repo mkang-test-repo
d2 bundle list --repo ntd-road-source-dev
d2 bundle list --repo zenrin-estat-residential
d2 bundle list --repo resilience-japan-hazard-maps
d2 bundle list --repo Seattle-Sample-Corelogic-Run-Data
## List bundles in different contexts
d2 bundle list --repo ntd-road-source-staging --context staging
d2 bundle list --repo ntd-road-source-dev --context dev
Bundle Operations
## List files in a bundle
d2 bundle list files --repo mkang-test-repo --bundle 1fySBuavEhqWAXnYnZEiDCNm8TC
d2 bundle list files --repo mkang-test-repo --bundle 1fySBuavEhq 2>/dev/null | grep file
## Mount bundle
d2 bundle mount --repo mkang-test-repo --mount ~/mnt --daemonize
## Mount a specific bundle by ID
d2 bundle mount --repo mkang-test-repo --bundle 1fySBuavEhqWAXnYnZEiDCNm8TC --mount ~/mnt --daemonize
## Download bundle
d2 bundle download --repo mkang-test-repo --destination .
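Downloading into the current directory mixes bundle files with whatever is already there. A possible guard is to download into a dedicated, empty directory instead. The function name `download_fresh` and the default destination `./<repo>` are assumptions made for this sketch.

```shell
# Sketch only: download_fresh is a hypothetical helper, not part of the d2 CLI.

# download_fresh REPO [DEST]
# Download a repository's bundle into DEST (default: ./REPO), creating the
# directory if needed and refusing to write into one that already has files.
download_fresh() {
  local repo="$1" dest="${2:-./$1}"
  mkdir -p "$dest" || return 1
  if [ -n "$(ls -A "$dest")" ]; then
    echo "refusing to download into non-empty directory: $dest" >&2
    return 1
  fi
  d2 bundle download --repo "$repo" --destination "$dest"
}

# Usage (requires a configured d2):
#   download_fresh mkang-test-repo ./mkang-test-repo
```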