DataBridge Core
The autonomous ingest engine running on every datacenter server. Detects labeled devices, copies to NVMe buffer, generates SHA-256 manifests, uploads to cloud, verifies integrity, and reports status. All without human intervention.
Five lanes running at once
Each server runs five parallel ingest lanes. While Lane 1 uploads a 2 TB HDD over Direct Line, Lanes 2 and 3 copy SD cards to buffer, Lane 4 runs post-upload verification, and Lane 5 waits for the next device. Every USB port stays productive. Throughput is bounded by your network, not your software.
Buffer-first architecture
Data is never uploaded directly from the source device. It is first copied from the plugged-in HDD or SD card to a local NVMe buffer drive. This serves two purposes: it protects against device removal during upload, and it frees the USB port so the operator can unplug the device and plug in the next one immediately.
The buffer drive is a high-endurance NVMe SSD sized to hold multiple concurrent copies. On a typical server with a 3.84 TB buffer, Core can hold one full 2 TB HDD plus dozens of SD cards at the same time while uploads proceed in the background.
SHA-256 manifests for every byte
During the buffer copy, Core computes a SHA-256 hash for every file and writes a manifest alongside the data. This manifest is the single source of truth for integrity verification — it travels with the data through upload, post-upload check, and long-term audit. If a hash does not match at any stage, you know exactly which file and exactly when it diverged.
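A minimal sketch of hashing during the buffer copy, so each file is read from the source device only once. The function names, the `manifest.json` filename, and the manifest layout are assumptions for illustration, not Core's actual format.

```python
import hashlib
import json
from pathlib import Path

CHUNK = 1024 * 1024  # read size in bytes; an arbitrary choice for this sketch


def copy_with_sha256(src: Path, dst: Path) -> str:
    """Copy src to dst and return the SHA-256 hex digest of the bytes copied."""
    digest = hashlib.sha256()
    dst.parent.mkdir(parents=True, exist_ok=True)
    with src.open("rb") as fin, dst.open("wb") as fout:
        while chunk := fin.read(CHUNK):
            digest.update(chunk)  # hash the same bytes that go to the buffer
            fout.write(chunk)
    return digest.hexdigest()


def buffer_device(src_root: Path, buf_root: Path) -> Path:
    """Copy every file under src_root into buf_root and write a manifest."""
    buf_root.mkdir(parents=True, exist_ok=True)
    entries = []
    for src in sorted(p for p in src_root.rglob("*") if p.is_file()):
        rel = src.relative_to(src_root)
        sha = copy_with_sha256(src, buf_root / rel)
        entries.append(
            {"path": rel.as_posix(), "bytes": src.stat().st_size, "sha256": sha}
        )
    manifest = buf_root / "manifest.json"  # hypothetical filename
    manifest.write_text(json.dumps({"files": entries}, indent=2))
    return manifest
```

Hashing in the copy loop means the manifest describes exactly the bytes that landed in the buffer, without a second pass over the source device.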
Upload to any cloud
The StorageBackend abstraction handles the differences between cloud providers. It manages provider-specific authentication, multipart upload chunking, retry logic on transient failures, and ETag extraction for post-upload verification. Switch providers by changing a config line — Core does not care where data lands.
AWS S3
Default for most deployments
Google Cloud Storage
GCS JSON API with resumable uploads
Cloudflare R2
S3-compatible, zero egress fees
Azure Blob Storage
Block blob with SAS token auth
Backblaze B2
S3-compatible, cost-optimized archival
Wasabi
S3-compatible, no egress or API fees
MinIO
Self-hosted S3-compatible object storage
Any S3-compatible endpoint
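To illustrate the idea of a provider abstraction, here is a sketch of what a StorageBackend-style interface could look like. Only the name StorageBackend comes from this page. The method names, the `backend` config key, and the in-memory stand-in are assumptions so the example runs without cloud credentials.

```python
import hashlib
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical interface; method names are assumptions, not Core's API."""

    @abstractmethod
    def upload(self, key: str, data: bytes) -> str:
        """Store data under key and return the provider's ETag."""

    @abstractmethod
    def etag(self, key: str) -> str:
        """Return the ETag of an object that was already stored."""


class InMemoryBackend(StorageBackend):
    """Stand-in for a real provider, used only to make the sketch runnable."""

    def __init__(self) -> None:
        self._objects: dict = {}

    def upload(self, key: str, data: bytes) -> str:
        self._objects[key] = data
        return self.etag(key)

    def etag(self, key: str) -> str:
        # Single-part uploads on S3-style stores commonly use a plain MD5 ETag.
        return hashlib.md5(self._objects[key]).hexdigest()


# Registry keyed by the (hypothetical) config value that selects a provider.
BACKENDS = {"memory": InMemoryBackend}


def make_backend(config: dict) -> StorageBackend:
    """Build the backend named in config["backend"]."""
    return BACKENDS[config["backend"]]()
```

With this shape, switching providers is a matter of registering another subclass and changing the one config value. The ingest code only ever sees the `StorageBackend` interface.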
Post-upload verification
After every upload completes, Core retrieves the S3 multipart ETag for each object and compares it against the local manifest. This is not a spot check — it is every file, every byte, every time. A single mismatch triggers automatic re-upload of the affected file with fresh hashing.
The verification step is what makes DataBridge suitable for mission-critical datasets. When your data trains a foundation model or serves as evidence in a legal proceeding, you need proof that what arrived in the cloud is bit-for-bit identical to what left the source device. The manifest and verification log provide that proof.
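One detail worth spelling out: for S3 multipart uploads the ETag is generally not a hash of the whole file. For uploads without KMS encryption it is commonly the MD5 of the concatenated per-part MD5 digests, followed by a dash and the part count. Comparing it against a local record therefore needs an expected value computed with the same part size. The sketch below shows that calculation. Whether Core stores such a value in its manifest is an assumption here.

```python
import hashlib
from pathlib import Path


def multipart_etag(path: Path, part_size: int) -> str:
    """Compute the S3-style multipart ETag for a local file.

    part_size must equal the part size used for the upload, otherwise the
    result will not match what the provider reports.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    part_digests = []
    with path.open("rb") as f:
        while part := f.read(part_size):
            part_digests.append(hashlib.md5(part).digest())
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"


def verify(path: Path, part_size: int, remote_etag: str) -> bool:
    """Return True when the remote ETag matches the locally computed one."""
    # Providers often return the ETag wrapped in double quotes.
    return multipart_etag(path, part_size) == remote_etag.strip('"')
```

A mismatch here points at one specific object, which is what makes a per-file re-upload possible.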
USB flap detection
Operators unplug devices at inconvenient times. Cables come loose. USB hubs reset under load. Core handles all of it gracefully. A flap filter with a configurable grace window distinguishes between a momentary electrical glitch and a genuine removal. A kernel log scraper watches dmesg for USB disconnect events in real time. When a real removal is confirmed, Core pauses the affected lane, marks the partial copy, pushes an alert to the backend, and waits for the device to reappear or for the operator to acknowledge the alert.
Flap filter
500 ms grace window by default. If the device reappears within the window, Core resumes without interruption. No false alarms from USB hub resets.
Kernel log scraper
Watches dmesg output continuously. Catches disconnect events that udev misses. Correlates kernel timestamps with lane state.
Alert push
On confirmed removal, pushes a structured alert to the backend API. Watch picks it up. Ops shows it on the local console. No data corruption, no silent failures.
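The flap filter described above is essentially a debounce on disconnect events. A minimal sketch, assuming events arrive with a monotonic timestamp in seconds. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

GRACE_S = 0.5  # the 500 ms grace window mentioned above


@dataclass
class FlapFilter:
    """Tells a brief USB glitch apart from a genuine removal."""

    grace_s: float = GRACE_S
    _gone_since: Optional[float] = None

    def on_disconnect(self, now: float) -> None:
        # Remember only the first disconnect of a burst.
        if self._gone_since is None:
            self._gone_since = now

    def on_reconnect(self, now: float) -> bool:
        """Return True if the device came back within the grace window."""
        was_flap = (
            self._gone_since is not None
            and now - self._gone_since <= self.grace_s
        )
        self._gone_since = None
        return was_flap

    def removal_confirmed(self, now: float) -> bool:
        """Return True once the device has been gone longer than the window."""
        return (
            self._gone_since is not None
            and now - self._gone_since > self.grace_s
        )
```

A lane loop would call `removal_confirmed(time.monotonic())` on each tick and only pause the lane and push an alert once it returns True.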
Structured JSON logging
Every action Core takes is logged as a structured JSON line. Severity levels, ISO timestamps, lane IDs, device labels, file paths, byte counts, durations. Machine-parseable by design. DataBridge Watch aggregates these logs from every server in the fleet and makes them searchable, filterable, and alertable.
No grepping through syslog. No parsing human-readable sentences. Every log entry is a structured record that your monitoring stack can ingest directly.
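As an illustration of one-JSON-object-per-line logging, here is a small formatter built on Python's standard `logging` module. The field names (`ts`, `level`, `msg`, `lane`, `device`, `bytes`) and the logger name are assumptions, not Core's actual schema.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonLineFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "msg": record.getMessage(),
        }
        # Structured fields are passed via logging's `extra` mechanism.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


def make_logger(name: str, stream=sys.stdout) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonLineFormatter())
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


log = make_logger("core.example")
log.info(
    "buffer copy complete",
    extra={"fields": {"lane": 2, "device": "SD-1092-A", "bytes": 31914983424}},
)
```

Because every line parses as JSON, a collector can filter on `lane` or `device` directly instead of matching text patterns.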
Auto-recovery
Network interruptions, cloud API rate limits, transient 503s, buffer drive I/O errors — Core retries with exponential backoff and jitter. Each lane maintains its own state machine. A failure in Lane 3 does not affect Lanes 1, 2, 4, or 5. After a server restart, Core reads persisted lane state from disk and picks up where it left off. No manual intervention required.
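A sketch of retry with exponential backoff and jitter, assuming a 1 s base delay that doubles per attempt up to a 300 s ceiling. The jitter fraction (up to 10% on top of the nominal delay), the attempt limit, and the function names are assumptions chosen for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

BASE_S = 1.0      # first retry delay
CAP_S = 300.0     # 5-minute ceiling
JITTER = 0.10     # assumed: up to 10% extra, to spread out retries


def backoff_delay(attempt: int) -> float:
    """Delay before retry number `attempt` (0-based), with jitter added."""
    # Clamp the exponent so very large attempt counts cannot overflow.
    nominal = min(CAP_S, BASE_S * 2 ** min(attempt, 32))
    return nominal + random.uniform(0, nominal * JITTER)


def retry(
    op: Callable[[], T],
    is_transient: Callable[[Exception], bool],
    max_attempts: int = 8,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op until it succeeds, a permanent error occurs, or attempts run out."""
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception as exc:
            if not is_transient(exc) or attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt))
    raise RuntimeError("max_attempts must be at least 1")
```

The jitter matters most after an outage: without it, every lane on every server would retry on the same schedule and hit the recovering endpoint at the same instants.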
Exponential backoff with jitter
First retry after 1s, then 2s, 4s, 8s — up to a 5-minute ceiling. Random jitter prevents thundering herd when the cloud endpoint recovers.
Persisted lane state
Lane progress, manifest hashes, upload cursor, and partial multipart upload IDs are written to disk after every significant state change. Survives power loss.
Lane isolation
Each lane is an independent state machine. A crash or stall in one lane is contained. The other four continue without interruption.
Multipart resume
Large file uploads use S3 multipart. If a connection drops mid-upload, Core resumes from the last completed part — not from byte zero.
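Persisted state and multipart resume work together: if the list of completed parts survives a restart, only the missing parts need to be sent. The sketch below assumes a JSON state file written atomically (temp file, fsync, rename). The file layout and the key names `upload_id` and `completed_parts` are hypothetical.

```python
import json
import math
import os
from pathlib import Path
from typing import Optional


def save_lane_state(path: Path, state: dict) -> None:
    """Write state so a crash leaves either the old file or the new one."""
    tmp = path.parent / (path.name + ".tmp")
    with tmp.open("w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # force the bytes to disk before the rename
    os.replace(tmp, path)     # atomic rename on the same filesystem


def load_lane_state(path: Path) -> Optional[dict]:
    """Return the saved state, or None if the lane has nothing persisted."""
    try:
        return json.loads(path.read_text())
    except FileNotFoundError:
        return None


def remaining_parts(file_size: int, part_size: int, done: set) -> list:
    """Part numbers (1-based, as in S3) that still need to be uploaded."""
    total = math.ceil(file_size / part_size)
    return [n for n in range(1, total + 1) if n not in done]
```

After a restart, a lane would load its state, reuse the stored multipart upload ID, and upload only the part numbers returned by `remaining_parts`.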
Port 9125 status daemon
Core exposes a lightweight HTTP endpoint on port 9125 that serves real-time lane status, buffer utilization, upload throughput, and device inventory. DataBridge Ops reads this endpoint every few seconds to give the on-site operator a live view of what is happening on their server.
The daemon is read-only and local-only by default. No authentication overhead for the LAN, no attack surface from the internet. Just a clean JSON response that any HTTP client can consume.
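Any HTTP client can poll the endpoint. Here is a small sketch using only the Python standard library. The URL path (`/`) and loopback address are assumptions, and the field names follow the sample response in this section.

```python
import json
from urllib.request import urlopen

STATUS_URL = "http://127.0.0.1:9125/"  # assumed address and path


def fetch_status(url: str = STATUS_URL, timeout: float = 2.0) -> dict:
    """Fetch and decode the status JSON from the daemon."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize(status: dict) -> str:
    """Build a one-line summary of buffer use and lane activity."""
    used_gb = status["buffer_total_gb"] - status["buffer_free_gb"]
    pct = 100 * used_gb / status["buffer_total_gb"]
    lanes = ", ".join(
        f"lane {lane['id']} {lane['state']} ({lane['device']})"
        for lane in status["lanes"]
    )
    return f"{status['server']}: buffer {pct:.0f}% used; {lanes}"
```

Calling `summarize(fetch_status())` every few seconds on the server is enough to build a basic console view.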
{
  "server": "NMBL2-07",
  "uptime_s": 1231560,
  "buffer_free_gb": 2148,
  "buffer_total_gb": 3840,
  "lanes": [
    {"id": 1, "state": "uploading", "device": "HDD-0417-B", "progress": 0.915},
    {"id": 2, "state": "buffering", "device": "SD-1092-A", "progress": 0.345},
    ...
  ]
}
Where Core fits in the platform
Core is the engine. It does not label devices — Tag does that upstream. It does not show operators what is happening — Ops does that via port 9125. It does not aggregate fleet-wide metrics — Watch does that by collecting Core's structured logs. It does not manage the network — Direct Line provides the dedicated private path. Each module has one job and does it well.
A device from plug to verified
This is what a single lane does from the moment a labeled device is plugged in to the moment the data is verified in the cloud. No operator interaction is required after the initial plug.
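The stages named on this page (detect, buffer copy with manifest, upload, verify, report) can be read as a small per-lane state machine. The sketch below is one possible shape. The states `buffering` and `uploading` appear in the status output. The other state names and all event names are assumptions.

```python
from enum import Enum


class LaneState(Enum):
    IDLE = "idle"
    BUFFERING = "buffering"
    UPLOADING = "uploading"
    VERIFYING = "verifying"
    VERIFIED = "verified"


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (LaneState.IDLE, "device_detected"): LaneState.BUFFERING,
    (LaneState.BUFFERING, "manifest_written"): LaneState.UPLOADING,
    (LaneState.UPLOADING, "upload_complete"): LaneState.VERIFYING,
    (LaneState.VERIFYING, "etag_match"): LaneState.VERIFIED,
    (LaneState.VERIFYING, "etag_mismatch"): LaneState.UPLOADING,  # re-upload
    (LaneState.VERIFIED, "status_reported"): LaneState.IDLE,
}


def advance(state: LaneState, event: str) -> LaneState:
    """Return the next state, rejecting events that are invalid right now."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state.value}")
```

Keeping the transitions in one table makes it easy to persist the current state and to reason about what a lane may do next after a restart.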
See it running on your data
Tell us about your ingest volumes and cloud targets. We will show you how Core handles it.