Data Transfer

Move records, and optionally files, between environments. Two commands serve different use cases:

| Command | Scope | Entities | Files |
|---|---|---|---|
| grace data copy | Flat entity copy with dependency resolution | devices, locations, vendors, videos | No |
| grace data transfer | Video-centric deep copy with child entities | videos + annotation_runs, segmentations, video_steps, devices, collectors, locations, vendors | Optional GCS blob copy |

grace data copy

Copy records from one environment to another — typically from production to development for testing and debugging.

Core semantics

grace data copy is additive and safe by design:

| Behavior | Description |
|---|---|
| Create only | New records are created in the target environment |
| Skip existing | Records that already exist (by UUID) are silently skipped |
| No updates | Existing records in the target are never modified |
| No deletes | Nothing is ever removed from the target |
| Dependency resolution | Required parent records are discovered and copied automatically |

UUIDs are preserved

Records keep their original UUIDs when copied. This means you can trace a record back to its source environment, and re-running the same copy is idempotent: records created by the first run are skipped by the second.

Basic usage

grace data copy \
  --source prod \
  --target dev \
  --entity videos \
  --limit 50

This copies up to 50 videos from prod to dev, along with any devices, locations, and vendors they depend on.

Workflow

Every copy follows the same four-step flow:

1. Connect

The CLI authenticates against both the source and target environments. Both must have stored credentials (see Authentication).

2. Plan

The planner:

  1. Fetches matching records from the source
  2. Resolves dependency chains (videos depend on devices, locations, vendors)
  3. Checks which records already exist in the target
  4. Builds a summary of what needs to be created
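
Steps 3 and 4 of the planner amount to a set difference on UUIDs. The following is a minimal sketch of that create-only diff, not the CLI's implementation; the function name to_create and the one-UUID-per-line file format are assumptions made for illustration.

```shell
# Hypothetical helper, not part of grace-cli.
# Prints every UUID listed in SOURCE_FILE that is absent from TARGET_FILE.
# Both files are assumed to hold one UUID per line.
# Usage: to_create SOURCE_FILE TARGET_FILE
to_create() {
  awk '
    FILENAME == ARGV[1] { in_target[$0] = 1; next }  # first file: remember target UUIDs
    !($0 in in_target)                               # second file: print UUIDs not yet in target
  ' "$2" "$1"
}
```

Running the same diff again after the missing records have been created yields an empty list, which is why a repeated copy writes nothing.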

3. Review

A summary table is printed:

        Copy plan: prod → dev
┏━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Entity   ┃ Selected ┃ Dependencies  ┃ Already in target ┃ To create ┃
┡━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ devices  │        0 │             5 │                 3 │         2 │
│ locations│        0 │             3 │                 3 │         0 │
│ vendors  │        0 │             2 │                 1 │         1 │
│ videos   │       50 │             0 │                12 │        38 │
└──────────┴──────────┴───────────────┴───────────────────┴───────────┘
  • Selected: Records matching your query
  • Dependencies: Parent records pulled in automatically
  • Already in target: Skipped (UUIDs already present)
  • To create: Records that will be written

4. Execute

After confirmation, records are created in dependency order with a progress bar:

Creating records... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 41/41

Dry run

Preview without writing anything:

grace data copy \
  --source prod --target dev \
  --entity videos --limit 20 \
  --dry-run

The plan table is shown, but no records are created. Use this to understand the scope of a copy before committing to it.

Filtering

Narrow down which source records to copy:

grace data copy \
  --source prod --target dev \
  --entity videos \
  --filter "collection_meta.data_source:eq:internal" \
  --filter "createdAt:gte:2025-06-01" \
  --limit 100

Filter syntax is key:operator:value. Dot notation works for JSONB fields. Use the in operator with bracket syntax to match multiple values: id:in:[uuid-1,uuid-2,uuid-3].
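
When the ID list comes from a script, it can help to build the in expression programmatically. The helper below is a small sketch; in_filter is a hypothetical name, not something the CLI provides.

```shell
# Hypothetical helper, not part of grace-cli.
# Builds a "FIELD:in:[a,b,c]" filter expression from its arguments.
# Usage: in_filter FIELD ID [ID...]
in_filter() {
  field=$1
  shift
  ids=$(printf '%s,' "$@")   # "a,b,c," (printf repeats the format per argument)
  ids=${ids%,}               # drop the trailing comma
  printf '%s:in:[%s]\n' "$field" "$ids"
}

# Example (not executed here):
#   grace data copy --source prod --target dev --entity videos \
#     --filter "$(in_filter id uuid-1 uuid-2 uuid-3)"
```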

Transforms

Transforms modify records during copy — useful when certain fields don't make sense in the target environment.

--clear-storage-meta

Nulls out the storage_meta field on video records. Use this when copying videos whose cloud storage paths don't exist in the target environment:

grace data copy \
  --source prod --target dev \
  --entity videos --limit 50 \
  --clear-storage-meta

Without this flag, videos are copied with their original storage_meta intact.

When to use

Always use --clear-storage-meta when copying from production to a dev environment that doesn't share the same GCS bucket. This prevents downstream tools from attempting to access non-existent files.

Dependency resolution

When copying videos, the planner automatically resolves dependencies:

videos → devices
       → locations
       → vendors

Each video references a device, location, and vendor by UUID. The planner fetches these parent records from the source and creates any that are missing in the target — before creating the videos.

Devices, locations, and vendors have no dependencies themselves, so copying those entity types directly is always a flat operation.
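
The "parents before children" rule is a topological ordering of the dependency graph above. As an illustration only (the planner is not claimed to work this way), the standard tsort utility produces such an ordering from the three edges listed; copy_order is a hypothetical name.

```shell
# Illustration only, not part of grace-cli.
# Each input line "PARENT CHILD" means PARENT must exist before CHILD.
# The three edges are the video dependencies listed above.
copy_order() {
  tsort <<'EOF'
devices videos
locations videos
vendors videos
EOF
}

# copy_order prints the four entity types one per line, with videos last.
# The relative order of the three parent types is not significant.
```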

Supported entities

| Entity | Dependencies | Notes |
|---|---|---|
| devices | None | Flat copy |
| locations | None | Flat copy |
| vendors | None | Flat copy |
| videos | devices, locations, vendors | Auto-resolves parents |

grace data transfer

Deep, video-centric transfer that copies a video plus all of its child entities (annotation runs, segmentations, video steps) and optionally the referenced GCS files.

When to use transfer vs copy

| Scenario | Command |
|---|---|
| Populate a dev environment with raw video records | grace data copy |
| Replicate a complete video pipeline result (annotations, steps, files) | grace data transfer |
| Debug a specific video's processing in a lower environment | grace data transfer |

Core semantics

grace data transfer shares the additive/idempotent guarantee of copy, with additional behaviors:

| Behavior | Description |
|---|---|
| Device resolution | Devices are matched across environments by device_no (serial number), not UUID |
| Device ID remapping | collection_meta.device_id on videos is rewritten to point at the target device UUID |
| Collector resolution | Collectors are matched by name (e.g. collector-01#0000); missing ones are created as minimal Worker records |
| Child entity fetch | annotation_runs, segmentations, and video_steps linked to selected videos are fetched and diffed |
| GCS file copy | With --copy-files, referenced GCS blobs are copied to the target bucket |
| URI rewriting | When copying files, gs:// URIs in storage_meta and result_ref/debug_ref are rewritten to point at the target bucket |

Why device_no instead of UUID?

Device UUIDs differ between environments since devices are registered independently in each environment. The device_no serial number is the stable natural key that identifies the same physical device across environments.
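
Matching on device_no is effectively a join between the source and target device lists. The sketch below shows the idea under stated assumptions: the "<uuid> <device_no>" input format, the device_remap name, and the CREATE marker are all hypothetical and chosen for illustration.

```shell
# Hypothetical helper, not part of grace-cli.
# Inputs: two files with one "<uuid> <device_no>" pair per line.
# Output: "<source_uuid> <target_uuid>" for devices found in the target,
#         "<source_uuid> CREATE" for devices the target does not have yet.
# Usage: device_remap SOURCE_DEVICES TARGET_DEVICES
device_remap() {
  awk '
    FILENAME == ARGV[1] { target_uuid[$2] = $1; next }  # target file: device_no -> UUID
    ($2 in target_uuid) { print $1, target_uuid[$2]; next }
                        { print $1, "CREATE" }
  ' "$2" "$1"
}
```

The first kind of output line corresponds to an entry in the source UUID to target UUID remap table; the second corresponds to a device that would be created in the target.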

Basic usage

grace data transfer \
  --source prod \
  --target dev \
  --limit 20

This transfers up to 20 videos from prod to dev with all their child entities. Devices are resolved by device_no, collectors by name.

Transfer with file copy

grace data transfer \
  --source prod \
  --target dev \
  --limit 10 \
  --copy-files

This also copies all GCS blobs referenced in storage_meta (videos) and result_ref/debug_ref (video steps), and rewrites the URIs to point at the derived target bucket.

GCS prerequisites

--copy-files requires:

  1. The gcs optional dependency: pip install 'grace-cli[gcs]'
  2. Application Default Credentials: gcloud auth application-default login

The CLI validates both before proceeding.

Bucket derivation

By default, the target GCS bucket is derived by replacing the source environment name in the source bucket name with the target environment name:

| Source bucket | Source env | Target env | Derived target bucket |
|---|---|---|---|
| co-prod-data | prod | dev | co-dev-data |
| co-prod-annotations | prod | dev | co-dev-annotations |

Override this with --target-gcs-bucket:

grace data transfer \
  --source prod --target dev \
  --limit 5 \
  --copy-files \
  --target-gcs-bucket my-custom-bucket
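
To predict the default bucket before running a transfer, the derivation can be approximated as below. This is a sketch that reproduces the two table rows above; it assumes the environment name appears as a whole hyphen-separated segment of the bucket name, which may not match the CLI's exact rule. derive_bucket is a hypothetical name.

```shell
# Hypothetical helper, not part of grace-cli.
# Assumption: the environment name is a whole hyphen-separated segment.
# Usage: derive_bucket SOURCE_BUCKET SOURCE_ENV TARGET_ENV
derive_bucket() {
  printf '%s\n' "$1" | awk -v src="$2" -v tgt="$3" '
    BEGIN { FS = OFS = "-" }
    {
      for (i = 1; i <= NF; i++)
        if ($i == src) $i = tgt   # swap only exact segment matches
      print
    }'
}
```

Matching whole segments rather than substrings avoids rewriting a name such as co-production-data when the source environment is prod. Use --dry-run to confirm the bucket the CLI actually derives.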

Workflow

1. Connect

Authenticate against both source and target environments.

2. Plan

The planner:

  1. Fetches matching videos from the source (respecting --filter and --limit)
  2. Extracts device_id, collector_id, location_id, vendor_id from each video's collection_meta
  3. Resolves devices by device_no — builds a remap table of source UUID → target UUID
  4. Resolves collectors by name — flags missing ones for creation
  5. Fetches and diffs locations and vendors (direct UUID match)
  6. Fetches child entities (annotation_runs, segmentations, video_steps) for all selected video IDs
  7. Diffs each entity type against the target

3. Review

      Transfer plan: prod → dev
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Entity                  ┃ Selected ┃ Already in target ┃ To create ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ devices (by device_no)  │        5 │                 3 │         2 │
│ collectors (by name)    │        - │                 - │         1 │
│ locations               │        3 │                 3 │         0 │
│ vendors                 │        2 │                 1 │         1 │
│ videos                  │       20 │                 4 │        16 │
│ annotation_runs         │       30 │                 0 │        30 │
│ segmentations           │       45 │                 0 │        45 │
│ video_steps             │      160 │                 0 │       160 │
└─────────────────────────┴──────────┴───────────────────┴───────────┘
  GCS blobs to copy: 87

4. Execute

After confirmation, the executor:

  1. Creates missing devices in the target
  2. Creates missing collectors as Worker records
  3. Copies GCS blobs (if --copy-files) with multi-threaded parallelism
  4. Creates locations, vendors, videos (with device ID remapping + URI rewriting)
  5. Creates annotation_runs, segmentations, video_steps in dependency order

Dry run

grace data transfer \
  --source prod --target dev \
  --limit 20 \
  --copy-files \
  --dry-run

Shows the full plan including GCS blob count, but writes nothing.

Verbose output

Add --verbose (or -v) to see every record and file that will be transferred:

grace data transfer \
  --source prod --target dev \
  --limit 5 \
  --copy-files \
  --dry-run --verbose

This prints per-entity detail tables showing each record's action (create or skip) along with key identifiers (video ID, step key, GCS URI, etc.).

Combine --verbose with --dry-run to inspect the full plan before executing, or use --verbose without --dry-run to see the details right before the confirmation prompt.

Filtering

grace data transfer \
  --source prod --target dev \
  --filter "collection_meta.data_source:eq:internal" \
  --filter "createdAt:gte:2025-06-01" \
  --limit 50

Use the in operator to select specific video IDs:

grace data transfer \
  --source prod --target dev \
  --filter "id:in:[uuid-1,uuid-2,uuid-3]"

Filters apply to the video query. Child entities are fetched for all matched videos — they cannot be filtered independently.

Existing records (idempotency)

By default, when a video already exists in the target environment, the video record itself is skipped. Its child entities (annotation_runs, segmentations, video_steps) are still diffed independently — if the video exists but has new child records in the source, those child records will be created.

Without --update, re-running the same transfer is always safe: everything that already exists is skipped.

Update mode (--update)

Add --update to compare existing video records field-by-field against the source and patch those that differ:

grace data transfer \
  --source prod --target dev \
  --limit 20 \
  --update

When enabled:

  • Video records that exist in the target are fetched and compared (ignoring server-managed timestamps like created_at, updated_at)
  • If fields differ, the video is queued for update via PATCH /videos/batch
  • The summary table gains a To update column
  • All other entity types (annotation_runs, segmentations, video_steps) remain create-or-skip regardless of this flag

Why only videos?

annotation_runs are immutable by design (re-running QC creates a new run). segmentations are always created fresh alongside annotation runs. video_steps use upsert semantics that maintain an audit trail via video_step_events — a raw PATCH would bypass that.

Transforms applied automatically

grace data transfer applies two transforms transparently:

  1. RemapDeviceIdTransform — rewrites collection_meta.device_id on video records using the device_no-based remap table.
  2. RewriteGcsUriTransform (when --copy-files) — rewrites all gs:// URIs in storage_meta.gcs (videos) and result_ref/debug_ref (video steps) to point at the target bucket.
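
The effect of the URI rewrite on a single value can be sketched as follows. This assumes the object path is kept unchanged and only the bucket is swapped, which the description above implies but does not state; rewrite_gcs_uri is a hypothetical name.

```shell
# Hypothetical helper, not part of grace-cli.
# Assumption: only the bucket changes; the object path is preserved.
# Usage: rewrite_gcs_uri URI TARGET_BUCKET
rewrite_gcs_uri() {
  case $1 in
    gs://*/*)
      rest=${1#gs://}                               # "bucket/path/to/object"
      printf 'gs://%s/%s\n' "$2" "${rest#*/}"       # keep everything after the bucket
      ;;
    *)
      printf '%s\n' "$1"                            # not a gs:// object URI: leave as-is
      ;;
  esac
}
```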

Entity creation order

Records are created in strict dependency order to satisfy foreign key constraints:

devices → collectors → locations → vendors → videos → annotation_runs → segmentations → video_steps

Partial failure

If a batch fails mid-transfer, the CLI stops creating further records of that entity type and reports how many were created before the failure. Already-created records are not rolled back — re-running the same transfer will skip them.

GCS blob copy failures are non-fatal: the CLI warns and continues with record creation.
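
Because a re-run skips whatever was already created, a failed transfer can simply be attempted again. The wrapper below is a generic sketch; it assumes the CLI exits with a non-zero status when a batch fails, which this page does not confirm, and retry is a hypothetical name.

```shell
# Hypothetical helper, not part of grace-cli.
# Assumption: the wrapped command exits non-zero on failure.
# Usage: retry MAX_ATTEMPTS COMMAND [ARGS...]
retry() {
  max=$1
  shift
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep 1   # brief pause before the next attempt
  done
}

# Example (not executed here); "y" answers the confirmation prompt:
#   retry 3 sh -c 'echo y | grace data transfer --source prod --target dev --limit 20'
```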


Option reference

grace data copy

| Option | Short | Required | Description |
|---|---|---|---|
| --source | -s | Yes | Source environment name |
| --target | -t | Yes | Target environment name |
| --entity | | Yes | Entity type (devices, locations, vendors, videos) |
| --filter | -f | No | Filter expression (key:op:value), repeatable |
| --limit | -l | No | Max records to copy |
| --dry-run | | No | Show plan without executing |
| --clear-storage-meta | | No | Null out storage_meta on videos |

grace data transfer

| Option | Short | Required | Description |
|---|---|---|---|
| --source | -s | Yes | Source environment name |
| --target | -t | Yes | Target environment name |
| --filter | -f | No | Filter expression (key:op:value), repeatable |
| --limit | -l | No | Max videos to transfer |
| --dry-run | | No | Show plan without executing |
| --verbose | -v | No | Show detailed per-record and per-file lists |
| --copy-files | | No | Also copy GCS blobs to target bucket |
| --target-gcs-bucket | | No | Override derived target bucket name |
| --update | | No | Compare existing videos field-by-field and update those that differ |

Scripting patterns

JSON output for automation

grace data copy \
  --source prod --target dev \
  --entity videos --limit 10 \
  --dry-run --output json

Non-interactive execution

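Preview the plan with --dry-run, then run the same command with the confirmation prompt answered automatically:

# Preview first
grace data copy --source prod --target dev --entity videos --limit 50 --dry-run

# Execute. Piping "y" answers the confirmation prompt immediately, so there is
# no chance to cancel; rely on the dry run above to check the plan.
echo "y" | grace data copy --source prod --target dev --entity videos --limit 50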

Transfer a specific video by ID

grace data transfer \
  --source prod --target dev \
  --filter "id:eq:a1b2c3d4-5678-90ab-cdef-1234567890ab" \
  --copy-files

Transfer videos from a CSV file

Given a CSV with a video_id column, extract the IDs and pass them via the in filter:

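# Extract the first 50 IDs from the CSV (skip the header) and join them with commas.
# Assumes video_id is the first column and values contain no quoted commas.
IDS=$(tail -n +2 videos.csv | head -n 50 | cut -d',' -f1 | paste -sd',' -)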

grace data transfer \
  --source prod --target dev \
  --filter "id:in:[$IDS]" \
  --copy-files
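
If video_id is not the first column, the column can be located by its header name instead. The helper below is a sketch for simple CSV files without quoted fields or embedded commas; csv_column is a hypothetical name, not part of the CLI.

```shell
# Hypothetical helper, not part of grace-cli.
# Prints the values of the column whose header equals COLUMN.
# Limitation: plain comma-separated data only (no quoted commas).
# Usage: csv_column FILE COLUMN
csv_column() {
  awk -F, -v col="$2" '
    { sub(/\r$/, "") }                       # tolerate Windows line endings
    NR == 1 {                                # header row: find the column index
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      next
    }
    idx { print $idx }                       # data rows: print that column
  ' "$1"
}

# Example (not executed here):
#   IDS=$(csv_column videos.csv video_id | head -n 50 | paste -sd',' -)
#   grace data transfer --source prod --target dev --filter "id:in:[$IDS]"
```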