Server engine · Python · Linux

DataBridge Core

The autonomous ingest engine that runs on every datacenter server. It detects labeled devices, copies them to an NVMe buffer, generates SHA-256 manifests, uploads to the cloud, verifies integrity, and reports status, all without human intervention.

Five lanes running at once

Each server runs five parallel ingest lanes. While Lane 1 uploads a 2 TB HDD over Direct Line, Lanes 2 and 3 copy SD cards to buffer, Lane 4 runs post-upload verification, and Lane 5 waits for the next device. Every USB port stays productive. Throughput is bounded by your network, not your software.

$ databridge-core --status
Lane 1  UPLOADING  HDD-0417-B · 1.84 TB / 2.01 TB · 91.5% · 847 MB/s
Lane 2  BUFFERING  SD-1092-A · 44.2 GB / 128 GB · 34.5% · 290 MB/s
Lane 3  BUFFERING  SD-1093-A · 112 GB / 256 GB · 43.8% · 305 MB/s
Lane 4  VERIFYING  HDD-0416-C · 4,218 / 4,218 files · 100% · all ETags match
Lane 5  IDLE       Waiting for next device...
Server NMBL2-07 · uptime 14d 6h · buffer 2.1 TB free / 3.8 TB · next heartbeat in 8s

Buffer-first architecture

Data is never uploaded directly from the source device. It is first copied from the plugged-in HDD or SD card to a local NVMe buffer drive. This serves two purposes: it protects against device removal during upload, and it frees the USB port so the operator can unplug the device and plug in the next one immediately.

The buffer drive is a high-endurance NVMe SSD sized to hold multiple concurrent copies. On a typical server with a 3.84 TB buffer, Core can hold one full 2 TB HDD alongside roughly 1.8 TB of SD card data, or dozens of SD cards on their own, while uploads proceed in the background.

Copy flow
USB Device → NVMe Buffer → Cloud
1. Device detected, label matched against backend
2. Full copy to NVMe buffer with SHA-256 hashing
3. USB device released; operator can unplug
4. Upload from buffer to cloud storage
5. Post-upload verification, buffer freed
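Steps 2 and 3 depend on reading each source file exactly once, so that the bytes written to the buffer are the same bytes that get hashed. A minimal standard-library sketch of that single-pass copy follows; copy_and_hash, copy_device, the 8 MiB chunk size, and the mount paths in the usage comment are illustrative assumptions, not Core's actual code.

```python
import hashlib
import os
from pathlib import Path

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB reads (assumed value; tune per device class)


def copy_and_hash(src: Path, dst: Path, chunk_size: int = CHUNK_SIZE) -> str:
    """Copy src to dst, hashing the same bytes in one pass; return the SHA-256 hex digest."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while chunk := fin.read(chunk_size):
            digest.update(chunk)
            fout.write(chunk)
        fout.flush()
        os.fsync(fout.fileno())  # buffer copy is durable before the device is released
    return digest.hexdigest()


def copy_device(src_root: Path, dst_root: Path) -> dict[str, str]:
    """Copy every regular file under src_root; map relative POSIX path -> SHA-256."""
    hashes = {}
    for path in sorted(p for p in src_root.rglob("*") if p.is_file()):
        rel = path.relative_to(src_root).as_posix()
        hashes[rel] = copy_and_hash(path, dst_root / rel)
    return hashes


# Hypothetical paths: a mounted SD card and a lane directory on the buffer drive.
# hashes = copy_device(Path("/mnt/SD-1092-A"), Path("/buffer/lane-2"))
```

The fsync call is what makes step 3 safe in this sketch: the device is only released once the buffered copy has actually reached the NVMe drive.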

SHA-256 manifests for every byte

During the buffer copy, Core computes a SHA-256 hash for every file and writes a manifest alongside the data. This manifest is the single source of truth for integrity verification — it travels with the data through upload, post-upload check, and long-term audit. If a hash does not match at any stage, you know exactly which file and exactly when it diverged.

manifest.sha256 — generated during buffer copy
a7f3c2d1e8b4...9f0a  footage/cam_01/20260411_143022.mp4
b2e8f1a9c3d7...4e2b  footage/cam_01/20260411_143512.mp4
c9d4a6f2e1b8...7c3d  footage/cam_02/20260411_150201.mp4
d1f7b3c8a2e9...5d4e  metadata/sensor_log.json
e4a2d9f1c6b3...8f5a  metadata/gps_track.gpx
5 files · 847.3 GB total · SHA-256 · generated 2026-04-11T14:52:18Z
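The manifest lines above follow the familiar sha256sum layout: hex digest, two spaces, relative path. Assuming Core's manifest uses exactly that layout (the sample suggests it, but the text does not say so), writing and re-checking it could look like the sketch below. All function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 without loading it into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(root: Path, manifest: Path) -> int:
    """Write '<digest>  <relative path>' lines in sha256sum style; return the file count."""
    files = sorted(p for p in root.rglob("*") if p.is_file() and p != manifest)
    with open(manifest, "w", encoding="utf-8") as out:
        for p in files:
            out.write(f"{sha256_file(p)}  {p.relative_to(root).as_posix()}\n")
    return len(files)


def verify_manifest(root: Path, manifest: Path) -> list[str]:
    """Re-hash every listed file; return the relative paths that are missing or differ."""
    mismatches = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        expected, rel = line.split("  ", 1)
        target = root / rel
        if not target.is_file() or sha256_file(target) != expected:
            mismatches.append(rel)
    return mismatches
```

Because the layout matches GNU sha256sum, running sha256sum -c manifest.sha256 from the data root is an independent way to audit the same manifest.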

Upload to any cloud

The StorageBackend abstraction handles the differences between cloud providers. It manages provider-specific authentication, multipart upload chunking, retry logic on transient failures, and ETag extraction for post-upload verification. Switch providers by changing a config line — Core does not care where data lands.

AWS S3 · Default for most deployments
Google Cloud Storage · GCS JSON API with resumable uploads
Cloudflare R2 · S3-compatible, zero egress fees
Azure Blob Storage · Block blob with SAS token auth
Backblaze B2 · S3-compatible, cost-optimized archival
Wasabi · S3-compatible, no egress or API fees
MinIO · Self-hosted S3-compatible object storage
Any S3-compatible endpoint
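Only the StorageBackend name comes from the text. The sketch below assumes a minimal interface (upload_file, get_etag), a registry keyed by a provider config value, and an in-memory test backend; none of these are Core's documented API. It illustrates the design idea: lane code talks only to the interface, and a single config value picks the class.

```python
import hashlib
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Provider-neutral interface; these method names are illustrative assumptions."""

    @abstractmethod
    def upload_file(self, local_path: str, key: str) -> None:
        """Upload one file (auth, chunking, and retries live in the subclass)."""

    @abstractmethod
    def get_etag(self, key: str) -> str:
        """Return the provider's ETag for an uploaded object."""


BACKENDS: dict[str, type] = {}


def register(name: str):
    """Class decorator that makes a backend selectable by a config value."""
    def wrap(cls: type) -> type:
        BACKENDS[name] = cls
        return cls
    return wrap


@register("memory")
class InMemoryBackend(StorageBackend):
    """Test double; real backends would wrap boto3, google-cloud-storage, and so on."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def upload_file(self, local_path: str, key: str) -> None:
        with open(local_path, "rb") as f:
            self.objects[key] = f.read()

    def get_etag(self, key: str) -> str:
        return hashlib.md5(self.objects[key]).hexdigest()  # S3-style single-part ETag


def make_backend(config: dict) -> StorageBackend:
    """Switching providers means changing config["provider"] and nothing else."""
    name = config.get("provider")
    if name not in BACKENDS:
        raise ValueError(f"unknown storage provider: {name!r}")
    return BACKENDS[name]()
```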

Post-upload verification

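After every upload completes, Core retrieves the S3 multipart ETag for each object listed in the local manifest and compares it against the ETag expected for the buffered copy of that file. A multipart ETag is derived from per-part MD5 digests, not SHA-256, so the expected value must be computed from the local bytes with the same part size the upload used; it cannot be read directly off the SHA-256 manifest. This is not a spot check: it covers every file, every byte, every time. A single mismatch triggers automatic re-upload of the affected file with fresh hashing.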

The verification step is what makes DataBridge suitable for mission-critical datasets. When your data trains a foundation model or serves as evidence in a legal proceeding, you need proof that what arrived in the cloud is bit-for-bit identical to what left the source device. The manifest and verification log provide that proof.

Verification output — Lane 4
PASS footage/cam_01/20260411_143022.mp4
PASS footage/cam_01/20260411_143512.mp4
PASS footage/cam_02/20260411_150201.mp4
PASS metadata/sensor_log.json
PASS metadata/gps_track.gpx
5/5 verified · 0 mismatches · batch HDD-0416-C complete
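The ETag comparison only works if the local side recomputes the ETag the same way S3 did. For plaintext or SSE-S3 objects (not SSE-KMS or SSE-C), S3's widely observed convention, which AWS does not formally guarantee, is: a single-part upload's ETag is the MD5 of the object, and a multipart upload's ETag is the MD5 of the concatenated binary MD5 digests of each part, suffixed with -<part count>. The sketch below assumes that convention. part_size must equal the part size the uploader used (8 MiB is boto3's default multipart_chunksize), and buffered_path in the usage comment is hypothetical.

```python
import hashlib
from typing import BinaryIO

MiB = 1024 * 1024
DEFAULT_PART_SIZE = 8 * MiB  # must match the part size used for the upload


def s3_etag(stream: BinaryIO, part_size: int = DEFAULT_PART_SIZE, multipart: bool = True) -> str:
    """Recompute the ETag S3 would report for these bytes (assumes no SSE-KMS/SSE-C)."""
    if not multipart:
        # Single PUT: plain MD5 of the whole object.
        whole = hashlib.md5()
        for chunk in iter(lambda: stream.read(MiB), b""):
            whole.update(chunk)
        return whole.hexdigest()
    # Multipart: MD5 over the concatenated per-part MD5 digests, plus "-<parts>".
    part_digests = [hashlib.md5(part).digest()
                    for part in iter(lambda: stream.read(part_size), b"")]
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"


def etag_matches(remote_etag: str, local_etag: str) -> bool:
    """S3 returns ETags wrapped in double quotes; compare without them."""
    return remote_etag.strip('"') == local_etag


# Hypothetical usage against one buffered file:
# with open(buffered_path, "rb") as f:
#     ok = etag_matches(remote_etag, s3_etag(f, part_size=8 * MiB))
```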

USB flap detection

Operators unplug devices at inconvenient times. Cables come loose. USB hubs reset under load. Core handles all of these cases. A flap filter with a configurable grace window distinguishes a momentary electrical glitch from a genuine removal, and a kernel log scraper watches dmesg for USB disconnect events in real time. When a real removal is confirmed, Core pauses the affected lane, marks the partial copy, pushes an alert to the backend, and waits for the device to reappear or for the operator to acknowledge the alert.

Flap filter

500 ms grace window. If the device reappears within the window, Core resumes without interruption, so momentary USB hub resets do not raise false alarms.
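A flap filter is a debounce: a disconnect only counts as a removal if the device stays gone longer than the grace window. A minimal sketch, assuming the lane loop can report disconnect and reconnect events and poll periodically with a monotonic clock; FlapFilter, its return strings, and pause_lane_and_alert are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

GRACE_S = 0.5  # 500 ms grace window from the text (configurable)


@dataclass
class FlapFilter:
    """Debounces USB disconnects; timestamps are seconds from a monotonic clock."""
    grace_s: float = GRACE_S
    disconnected_at: Optional[float] = None

    def on_disconnect(self, t: float) -> None:
        if self.disconnected_at is None:  # keep the first disconnect of a burst
            self.disconnected_at = t

    def poll(self, now: float) -> str:
        """'connected', 'pending' (inside the window), or 'removed' (window expired)."""
        if self.disconnected_at is None:
            return "connected"
        return "removed" if now - self.disconnected_at > self.grace_s else "pending"

    def on_reconnect(self, t: float) -> str:
        """'flap' if the device returned inside the window, else 'reappeared'."""
        if self.disconnected_at is None:
            return "connected"
        gap = t - self.disconnected_at
        self.disconnected_at = None
        return "flap" if gap <= self.grace_s else "reappeared"


# Hypothetical wiring in a lane loop:
# f = FlapFilter(); f.on_disconnect(time.monotonic())
# if f.poll(time.monotonic()) == "removed": pause_lane_and_alert()
```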

Kernel log scraper

Watches dmesg output continuously. Catches disconnect events that udev misses. Correlates kernel timestamps with lane state.
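The Linux kernel reports a USB removal with a line of the form "usb <port>: USB disconnect, device number <n>". A hedged sketch of the scraper side follows: the regex assumes dmesg's default [seconds.microseconds] timestamp prefix, reading the kernel log may require elevated permissions, and the function names are hypothetical. Note that dmesg --follow first prints the existing ring buffer, so a caller would skip events older than its own start time before correlating timestamps with lane state.

```python
import re
import subprocess
from typing import Iterator, NamedTuple, Optional

# Kernel line, e.g. "[ 5432.123456] usb 2-1.4: USB disconnect, device number 7"
DISCONNECT_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+usb\s+(?P<port>[0-9.-]+):\s+"
    r"USB disconnect, device number (?P<devnum>\d+)"
)


class UsbDisconnect(NamedTuple):
    ts: float    # seconds since boot, as printed by the kernel
    port: str    # bus-port path, e.g. "2-1.4"
    devnum: int


def parse_disconnect(line: str) -> Optional[UsbDisconnect]:
    """Return a UsbDisconnect for a matching dmesg line, else None."""
    m = DISCONNECT_RE.match(line)
    if m is None:
        return None
    return UsbDisconnect(float(m["ts"]), m["port"], int(m["devnum"]))


def follow_disconnects() -> Iterator[UsbDisconnect]:
    """Stream disconnect events from `dmesg --follow`."""
    with subprocess.Popen(["dmesg", "--follow"], stdout=subprocess.PIPE, text=True) as proc:
        if proc.stdout is None:
            return
        for line in proc.stdout:
            event = parse_disconnect(line)
            if event is not None:
                yield event
```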

Alert push

On confirmed removal, pushes a structured alert to the backend API. Watch picks it up. Ops shows it on the local console. No data corruption, no silent failures.

Structured JSON logging

Every action Core takes is logged as a structured JSON line: severity levels, ISO 8601 timestamps, lane IDs, device labels, file paths, byte counts, durations. Machine-parseable by design. DataBridge Watch aggregates these logs from every server in the fleet and makes them searchable, filterable, and alertable.

No grepping through syslog. No parsing human-readable sentences. Every log entry is a structured record that your monitoring stack can ingest directly.

stdout — JSON lines
{"ts":"2026-04-11T14:52:18Z", "level":"INFO", "lane":2, "event":"buffer_start", "device":"SD-1092-A", "size_bytes":137438953472}
{"ts":"2026-04-11T14:52:19Z", "level":"INFO", "lane":1, "event":"upload_progress", "device":"HDD-0417-B", "pct":91.5, "rate_mbps":847}
{"ts":"2026-04-11T14:52:20Z", "level":"WARN", "lane":3, "event":"usb_flap", "device":"SD-1093-A", "grace_ms":500}
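Lines like these can be produced with the standard logging module and a custom formatter. A sketch, assuming the event name is passed as the log message and the other keys via extra=; JsonLineFormatter and get_logger are hypothetical names. The separators argument reproduces the "key":value, spacing of the sample, and WARNING is mapped to WARN to match it.

```python
import json
import logging
import sys
from datetime import datetime, timezone

# Optional context fields passed through `extra=`; names mirror the sample lines.
FIELDS = ("device", "size_bytes", "pct", "rate_mbps", "grace_ms")
LEVEL_NAMES = {"WARNING": "WARN"}  # the sample lines use WARN


class JsonLineFormatter(logging.Formatter):
    """One JSON object per line: ts, level, lane, event, then context fields."""

    def format(self, record: logging.LogRecord) -> str:
        ts = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "ts": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": LEVEL_NAMES.get(record.levelname, record.levelname),
        }
        if hasattr(record, "lane"):
            entry["lane"] = record.lane
        entry["event"] = record.getMessage()
        for name in FIELDS:
            if hasattr(record, name):
                entry[name] = getattr(record, name)
        return json.dumps(entry, separators=(", ", ":"))


def get_logger(name: str = "databridge.core") -> logging.Logger:
    """Logger that writes JSON lines to stdout only."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonLineFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


# log = get_logger()
# log.warning("usb_flap", extra={"lane": 3, "device": "SD-1093-A", "grace_ms": 500})
```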

Auto-recovery

Network interruptions, cloud API rate limits, transient 503s, buffer drive I/O errors — Core retries with exponential backoff and jitter. Each lane maintains its own state machine. A failure in Lane 3 does not affect Lanes 1, 2, 4, or 5. After a server restart, Core reads persisted lane state from disk and picks up where it left off. No manual intervention required.

Exponential backoff with jitter

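First retry after about 1 s, then 2 s, 4 s, 8 s, doubling up to a 5-minute ceiling. Random jitter spreads retries out so that waiting lanes do not all hit the cloud endpoint at the same moment when it recovers (the thundering-herd problem).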
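The text fixes the base delay (1 s), the doubling, and the 5-minute cap, but not the jitter formula. The sketch below assumes "equal jitter", where each wait is drawn uniformly from the upper half of the nominal delay, which keeps the 1 s, 2 s, 4 s, 8 s progression roughly intact; "full jitter" (drawing from zero up to the nominal delay) spreads retries more widely. retry and is_transient are hypothetical names.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
BASE_S = 1.0   # first nominal delay
CAP_S = 300.0  # 5-minute ceiling


def backoff_delay(attempt: int, base: float = BASE_S, cap: float = CAP_S,
                  rng: Optional[random.Random] = None) -> float:
    """Equal jitter: nominal delay base * 2**attempt (capped), drawn from [d/2, d]."""
    nominal = min(cap, base * 2 ** min(attempt, 30))  # clamp exponent to avoid huge ints
    return (rng or random).uniform(nominal / 2, nominal)


def retry(op: Callable[[], T], is_transient: Callable[[Exception], bool],
          max_attempts: int = 10, sleep: Callable[[float], None] = time.sleep,
          rng: Optional[random.Random] = None) -> T:
    """Call op(); on transient errors wait backoff_delay(attempt) and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception as exc:
            # Permanent errors, or the last allowed attempt, propagate to the lane.
            if not is_transient(exc) or attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt, rng=rng))
```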

Persisted lane state

Lane progress, manifest hashes, upload cursor, and partial multipart upload IDs are written to disk after every significant state change. Survives power loss.
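"Survives power loss" depends on how the state file is written, since overwriting it in place can leave a half-written file. A common Linux pattern, sketched here as an assumption about how such persistence could work, is: write to a temp file, fsync it, atomically os.replace it over the old file, then fsync the directory. save_state, load_state, the path, and the state keys in the usage comment are hypothetical stand-ins for the fields the text lists.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_state(path: Path, state: dict) -> None:
    """Atomically replace `path` so a crash leaves either the old or the new state."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # file contents reach the disk first
        os.replace(tmp, path)     # atomic rename on POSIX, same filesystem
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise
    dir_fd = os.open(path.parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)          # make the rename itself durable
    finally:
        os.close(dir_fd)


def load_state(path: Path) -> Optional[dict]:
    """Return the last saved state, or None if the lane has never saved one."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return None


# save_state(Path("/var/lib/databridge/lane-2.json"),
#            {"lane": 2, "state": "uploading", "upload_id": upload_id, "completed_parts": [1, 2, 3]})
```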

Lane isolation

Each lane is an independent state machine. A crash or stall in one lane is contained. The other four continue without interruption.
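One way to get both properties is an explicit transition table per lane plus one thread per lane that catches its own exceptions. The states below are read off the Lane 2 lifecycle log on this page, plus a paused state for USB removal and a verify-to-upload edge for re-uploads; the table, Lane, and run_isolated are hypothetical.

```python
import threading
from enum import Enum
from typing import Callable, Optional


class LaneState(Enum):
    IDLE = "idle"
    DETECT = "detect"
    BUFFERING = "buffering"
    PAUSED = "paused"        # device removed mid-copy, waiting for it to return
    RELEASE = "release"
    UPLOADING = "uploading"
    VERIFYING = "verifying"


# Allowed next states for each state (hypothetical, derived from the lane lifecycle).
ALLOWED = {
    LaneState.IDLE: {LaneState.DETECT},
    LaneState.DETECT: {LaneState.BUFFERING},
    LaneState.BUFFERING: {LaneState.RELEASE, LaneState.PAUSED},
    LaneState.PAUSED: {LaneState.BUFFERING},
    LaneState.RELEASE: {LaneState.UPLOADING},
    LaneState.UPLOADING: {LaneState.VERIFYING},
    LaneState.VERIFYING: {LaneState.IDLE, LaneState.UPLOADING},  # mismatch -> re-upload
}


class Lane:
    def __init__(self, lane_id: int) -> None:
        self.lane_id = lane_id
        self.state = LaneState.IDLE
        self.error: Optional[Exception] = None

    def transition(self, new: LaneState) -> None:
        """Move to `new`, rejecting any edge not in the table."""
        if new not in ALLOWED[self.state]:
            raise ValueError(f"lane {self.lane_id}: {self.state.name} -> {new.name} not allowed")
        self.state = new


def run_isolated(lane: Lane, work: Callable[[Lane], None]) -> threading.Thread:
    """Run work(lane) on its own thread; an exception is recorded on that lane only."""
    def target() -> None:
        try:
            work(lane)
        except Exception as exc:
            lane.error = exc  # would feed an alert; the other lanes keep running
    thread = threading.Thread(target=target, name=f"lane-{lane.lane_id}", daemon=True)
    thread.start()
    return thread
```

Threads contain Python exceptions and stalls, but a segfault or out-of-memory kill would still take down the whole process; running each lane in its own process is the stronger form of isolation if that matters.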

Multipart resume

Large file uploads use S3 multipart. If a connection drops mid-upload, Core resumes from the last completed part — not from byte zero.
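Resuming depends on knowing which parts S3 already holds; with boto3, that list comes from the client's list_parts call for the saved upload ID. Given that set, working out what is left is plain arithmetic, sketched below under S3's published limits (at most 10,000 parts, and every part except the last at least 5 MiB). remaining_parts and PartRange are hypothetical names, and a production resume would also compare each listed part's size and ETag before trusting it.

```python
from typing import NamedTuple

MiB = 1024 * 1024
MIN_PART = 5 * MiB   # S3 minimum for every part except the last
MAX_PARTS = 10_000   # S3 limit per multipart upload


class PartRange(NamedTuple):
    part_number: int  # S3 part numbers start at 1
    offset: int       # byte offset into the source file
    length: int


def remaining_parts(file_size: int, part_size: int, completed: set[int]) -> list[PartRange]:
    """Byte ranges still to upload, skipping part numbers S3 already has."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    total = max(1, -(-file_size // part_size))  # ceiling division, at least one part
    if total > MAX_PARTS:
        raise ValueError(f"{total} parts exceeds the {MAX_PARTS}-part limit; raise part_size")
    if total > 1 and part_size < MIN_PART:
        raise ValueError("parts other than the last must be at least 5 MiB")
    plan = []
    for n in range(1, total + 1):
        if n in completed:
            continue
        offset = (n - 1) * part_size
        plan.append(PartRange(n, offset, min(part_size, file_size - offset)))
    return plan
```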

Port 9125 status daemon

Core exposes a lightweight HTTP endpoint on port 9125 that serves real-time lane status, buffer utilization, upload throughput, and device inventory. DataBridge Ops reads this endpoint every few seconds to give the on-site operator a live view of what is happening on their server.

The daemon is read-only and local-only by default. No authentication overhead for the LAN, no attack surface from the internet. Just a clean JSON response that any HTTP client can consume.

GET http://localhost:9125/status
{
  "server": "NMBL2-07",
  "uptime_s": 1231560,
  "buffer_free_gb": 2148,
  "buffer_total_gb": 3840,
  "lanes": [
    {"id": 1, "state": "uploading",
     "device": "HDD-0417-B",
     "progress": 0.915},
    {"id": 2, "state": "buffering",
     "device": "SD-1092-A",
     "progress": 0.345},
    ...
  ]
}
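A read-only JSON endpoint like this needs only the Python standard library. The sketch binds to 127.0.0.1 by default, matching the local-only default described above, and serves whatever dictionary a hypothetical get_status callable returns. The /status path and port 9125 come from the text; everything else is an assumption. Passing host="0.0.0.0" would expose it to the LAN, a deployment choice the text leaves open.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Any, Callable


def make_status_server(get_status: Callable[[], dict[str, Any]],
                       host: str = "127.0.0.1", port: int = 9125) -> ThreadingHTTPServer:
    """Build a read-only server that answers GET /status with get_status() as JSON."""

    class StatusHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/status":
                self.send_error(404, "not found")
                return
            body = json.dumps(get_status()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        # No do_POST/do_PUT/...: BaseHTTPRequestHandler answers those with 501.

        def log_message(self, format: str, *args: Any) -> None:
            pass  # keep stdout free for the JSON log lines

    return ThreadingHTTPServer((host, port), StatusHandler)


# Hypothetical usage:
# server = make_status_server(lambda: {"server": "NMBL2-07", "lanes": []})
# server.serve_forever()
```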

Where Core fits in the platform

Core is the engine. It does not label devices — Tag does that upstream. It does not show operators what is happening — Ops does that via port 9125. It does not aggregate fleet-wide metrics — Watch does that by collecting Core's structured logs. It does not manage the network — Direct Line provides the dedicated private path. Each module has one job and does it well.

A device from plug to verified

This is what a single lane does from the moment a labeled device is plugged in to the moment the data is verified in cloud. No operator interaction required after the initial plug.

Lane 2 lifecycle — SD-1092-A
14:52:17 DETECT USB device connected · /dev/sdc · label SD-1092-A matched
14:52:18 BUFFER Copy started · 128 GB · 4,218 files · target /buffer/lane-2/
14:59:43 BUFFER Copy complete · 128 GB in 7m 25s · 290 MB/s avg · manifest written
14:59:44 RELEASE USB device released · operator may unplug
14:59:45 UPLOAD Upload started · s3://vseek-ingest/SD-1092-A/ · 4,218 files
15:02:11 UPLOAD Upload complete · 128 GB in 2m 26s · 878 MB/s avg
15:02:12 VERIFY ETag verification started · 4,218 objects
15:02:38 PASS 4,218 / 4,218 verified · 0 mismatches · buffer freed · lane idle
Total wall time: 10m 21s · device held for 7m 26s · upload + verify: 2m 53s after unplug
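The summary line can be checked directly from the log timestamps. The arithmetic below reproduces its three figures and shows which events they span: total wall time runs from DETECT to PASS, the device hold time from DETECT to copy completion, and the post-unplug figure from upload start to PASS. elapsed and fmt_duration are illustrative helpers.

```python
from datetime import datetime, timedelta


def elapsed(start: str, end: str) -> timedelta:
    """Time between two HH:MM:SS log stamps, allowing for one midnight rollover."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta if delta >= timedelta(0) else delta + timedelta(days=1)


def fmt_duration(td: timedelta) -> str:
    """Render a duration the way the log does, e.g. '10m 21s'."""
    minutes, seconds = divmod(int(td.total_seconds()), 60)
    return f"{minutes}m {seconds}s"


# Stamps from the Lane 2 log above:
print(fmt_duration(elapsed("14:52:17", "15:02:38")))  # total wall time: 10m 21s
print(fmt_duration(elapsed("14:52:17", "14:59:43")))  # DETECT to copy complete: 7m 26s
print(fmt_duration(elapsed("14:59:45", "15:02:38")))  # UPLOAD start to PASS: 2m 53s
```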

See it running on your data

Tell us about your ingest volumes and cloud targets. We will show you how Core handles it.