Quality Assurance
Phases 07–09: Every file verified, every server monitored, every status change logged. Integrity and visibility from upload to archive.
Integrity Verification
During the buffer copy phase, DataBridge Core generates a SHA-256 hash for every file on the source device and writes a manifest file alongside the buffered data. This manifest is the ground truth: it captures the exact byte-level state of every file at the moment it was read from the source device. After the cloud upload completes via S3 multipart upload, Core compares the S3 object ETags against the local manifest to verify that every byte arrived correctly. Because the ETag S3 assigns to a multipart upload is not a SHA-256 digest of the object contents, this comparison cannot be a plain string match against the manifest hash; a comparable value has to be derived for each file.
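As a rough illustration of the manifest step, the sketch below hashes every file under a source directory with SHA-256 and writes the result as JSON. The function names, the chunk size, and the manifest layout (`algorithm`, `files`, `sha256`, `bytes`) are assumptions made for this example and are not the actual DataBridge Core format.

```python
import hashlib
import json
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # assumed: read in 1 MiB chunks so large files never load fully into memory


def sha256_of_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read chunk by chunk."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(source_root: Path) -> dict:
    """Hash every regular file under source_root, keyed by relative POSIX path."""
    files = {}
    for path in sorted(source_root.rglob("*")):
        if path.is_file():
            files[path.relative_to(source_root).as_posix()] = {
                "sha256": sha256_of_file(path),
                "bytes": path.stat().st_size,
            }
    # Hypothetical layout; the real manifest schema is not described in the text.
    return {"algorithm": "sha256", "files": files}


def write_manifest(source_root: Path, manifest_path: Path) -> dict:
    """Build the manifest and write it as JSON; returns the manifest dict."""
    manifest = build_manifest(source_root)
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest
```

Keying entries by relative path keeps the manifest valid after the data moves from the source device to the buffer drive and on to cloud storage.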
This verification is not optional, not configurable, and not skippable. It runs on every upload, every file, every time. If a mismatch is detected, whether from a network corruption event, a storage media read error, or a fault during the upload itself, the affected files are automatically re-uploaded and re-verified. The verification result (pass or fail, file count, total bytes, per-file hash comparison, timestamp) is posted to the backend API and permanently logged in the audit trail. For teams training foundation models on egocentric or multimodal data, a single corrupted file can poison an entire training run. DataBridge treats data integrity as a first-class requirement, not an afterthought.
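The comparison and the shape of the verification result can be sketched as follows. This is a minimal illustration only: `verify_upload`, the field names, and the assumption that a per-file SHA-256 value is already available for each uploaded object are all hypothetical, and how those remote digests are obtained is outside the sketch.

```python
from datetime import datetime, timezone


def verify_upload(manifest_files: dict, remote_sha256: dict) -> dict:
    """Compare expected digests with those reported for the uploaded objects.

    manifest_files: {relative_path: {"sha256": str, "bytes": int}}  (assumed layout)
    remote_sha256:  {relative_path: str}  (assumed to be gathered elsewhere)
    """
    per_file = []
    for path, entry in sorted(manifest_files.items()):
        actual = remote_sha256.get(path)  # None when the object is missing remotely
        per_file.append({
            "path": path,
            "expected": entry["sha256"],
            "actual": actual,
            "match": actual == entry["sha256"],
        })
    failed = [item["path"] for item in per_file if not item["match"]]
    return {
        "passed": not failed,
        "file_count": len(per_file),
        "total_bytes": sum(entry["bytes"] for entry in manifest_files.values()),
        "retry_paths": failed,  # files to re-upload and re-verify
        "per_file": per_file,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

A caller would re-upload everything in `retry_paths`, run the check again, and post the final result dictionary to the backend.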
The manifest itself is also uploaded to S3 alongside the data, stored at a predictable path (e.g., s3://bucket/HA-B047-SD-0193/.ha_manifest.json). This means the integrity proof travels with the data — even if the backend database were lost, the manifest in S3 provides a complete record of what was uploaded and its expected hash values. Chain of custody from source device to cloud archive is unbroken and independently verifiable.
Monitoring & Alerting
DataBridge Watch provides 24/7 fleet-wide visibility from a single web dashboard. Every ingest server reports its status — active lanes, throughput per lane, buffer drive utilization, queue depth, network health — via structured JSON logs aggregated in real time over WebSocket connections. The dashboard shows a server grid where green means healthy, yellow means warning (e.g., disk usage above 85%), and red means critical (server unreachable, upload failures, integrity check failures).
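The green, yellow, and red rules described above could be expressed roughly like this. The `ServerReport` fields and the JSON key names are assumptions for illustration; only the 85% warning threshold and the three critical conditions come from the text.

```python
import json
from dataclasses import dataclass

DISK_WARNING_PCT = 85.0  # from the text: yellow when disk usage is above 85%


@dataclass
class ServerReport:
    """One status report from an ingest server (hypothetical field names)."""
    server_id: str
    reachable: bool
    disk_used_pct: float
    upload_failures: int
    integrity_failures: int


def parse_report(line: str) -> ServerReport:
    """Parse one structured JSON log line into a ServerReport."""
    return ServerReport(**json.loads(line))


def grid_color(report: ServerReport) -> str:
    """Map a report to the dashboard color, checking critical conditions first."""
    if (not report.reachable
            or report.upload_failures > 0
            or report.integrity_failures > 0):
        return "red"
    if report.disk_used_pct > DISK_WARNING_PCT:
        return "yellow"
    return "green"
```

Checking the critical conditions first means a server that is both nearly full and failing integrity checks shows red rather than yellow.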
Automated alerts fire when predefined conditions are met: an upload stalls for more than 4 hours, a server goes offline (no heartbeat for 5 minutes), disk utilization exceeds 90%, an integrity check fails, or the error rate spikes above its configured threshold. Alerts are classified by severity (critical, warning, info) and routed to the appropriate channel: critical alerts page the on-call engineer via PagerDuty or Slack, warnings create tickets, and info events are logged for trend analysis. For managed and turnkey customers, our operations team receives these alerts and handles incident response directly.
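A minimal sketch of how these alert conditions and severity routes might be evaluated is shown below. The thresholds (4 hours, 5 minutes, 90%) come from the text; the function name, the alert names, the route labels, and in particular which severity each condition receives are assumptions made for the example.

```python
from datetime import timedelta

STALL_LIMIT = timedelta(hours=4)        # upload stalled for more than 4 hours
HEARTBEAT_LIMIT = timedelta(minutes=5)  # no heartbeat for 5 minutes
DISK_LIMIT_PCT = 90.0                   # disk utilization above 90%

# Routing by severity as described in the text; the labels are hypothetical.
ROUTES = {"critical": "page_on_call", "warning": "create_ticket", "info": "log_only"}


def evaluate_alerts(since_upload_progress: timedelta,
                    since_heartbeat: timedelta,
                    disk_used_pct: float,
                    integrity_failed: bool,
                    error_rate: float,
                    error_rate_threshold: float) -> list:
    """Return the alerts that fire for one server, each with severity and route."""
    fired = []
    # Severity assignments below are assumptions, not documented behavior.
    if since_heartbeat > HEARTBEAT_LIMIT:
        fired.append(("server_offline", "critical"))
    if integrity_failed:
        fired.append(("integrity_failure", "critical"))
    if since_upload_progress > STALL_LIMIT:
        fired.append(("upload_stalled", "warning"))
    if disk_used_pct > DISK_LIMIT_PCT:
        fired.append(("disk_utilization", "warning"))
    if error_rate > error_rate_threshold:
        fired.append(("error_rate", "warning"))
    return [{"name": name, "severity": severity, "route": ROUTES[severity]}
            for name, severity in fired]
```

Keeping the thresholds as named constants makes each rule easy to audit against the published alert policy.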
The audit trail logs every status change for every device that passes through the pipeline: labeled, queued, copying, uploading, verifying, completed, failed, returned. Each log entry records the actor (which operator or system process), the timestamp, the old and new status, and any associated metadata. This creates a complete, immutable history of every device's journey — from the moment it was labeled in the field to its final verified state in cloud storage.
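One way to picture an audit entry is as an immutable record appended to a log that is never edited in place. The sketch below assumes hypothetical class and field names; the status values, and the fields recorded per entry (actor, timestamp, old and new status, metadata), follow the text.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Status(str, Enum):
    LABELED = "labeled"
    QUEUED = "queued"
    COPYING = "copying"
    UPLOADING = "uploading"
    VERIFYING = "verifying"
    COMPLETED = "completed"
    FAILED = "failed"
    RETURNED = "returned"


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class AuditEntry:
    device_id: str
    actor: str                    # operator or system process
    old_status: Optional[Status]  # None for a device's first entry
    new_status: Status
    timestamp: str
    metadata: dict = field(default_factory=dict)


class AuditTrail:
    """Append-only, in-memory log; a real system would persist entries durably."""

    def __init__(self) -> None:
        self._entries = []

    def current_status(self, device_id: str) -> Optional[Status]:
        for entry in reversed(self._entries):
            if entry.device_id == device_id:
                return entry.new_status
        return None

    def record(self, device_id: str, actor: str, new_status: Status,
               metadata: Optional[dict] = None) -> AuditEntry:
        entry = AuditEntry(
            device_id=device_id,
            actor=actor,
            old_status=self.current_status(device_id),
            new_status=new_status,
            timestamp=datetime.now(timezone.utc).isoformat(),
            metadata=dict(metadata or {}),
        )
        self._entries.append(entry)
        return entry

    def history(self, device_id: str) -> tuple:
        return tuple(e for e in self._entries if e.device_id == device_id)
```

The log exposes only `record` and read methods, so a device's history can grow but earlier entries are never rewritten.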
Ongoing Operations
A DataBridge deployment is not a one-time installation; it is an ongoing operation. For managed and turnkey customers, our engineering team provides continuous operational support: software updates deployed across the fleet with zero downtime (rolling updates, one server at a time), capacity planning as your data volumes grow (adding servers, upgrading network circuits, expanding cloud storage), and monthly operational reviews covering throughput trends, error rates, and optimization opportunities.
We monitor cloud costs and recommend optimizations — S3 Intelligent-Tiering for data accessed infrequently after ingest, lifecycle policies to transition verified archives to Glacier after 90 days, right-sizing Direct Connect circuits based on actual utilization patterns. For teams that started with a Managed tier and later want to bring operations in-house, we provide a structured handover including documentation, runbooks, and training sessions for your operations staff. The software continues to work identically whether operated by our team or yours.
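As an illustration of the 90-day Glacier transition, the helper below builds a lifecycle configuration in the dictionary shape accepted by the `LifecycleConfiguration` argument of boto3's `put_bucket_lifecycle_configuration`. The helper name, the rule ID, and the idea of scoping by key prefix are assumptions; restricting the rule to verified archives only would need a prefix or tag convention that the text does not describe.

```python
def glacier_lifecycle_configuration(prefix: str = "", days: int = 90) -> dict:
    """Build an S3 lifecycle configuration that moves objects to Glacier.

    prefix: key prefix the rule applies to ("" means the whole bucket).
    days:   object age, in days, at which the transition happens.
    """
    return {
        "Rules": [
            {
                "ID": f"archive-after-{days}-days",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }
```

The returned dictionary is plain data, so it can be reviewed or version-controlled before anything is applied to a bucket.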
Data integrity, by default
Every file SHA-256 verified. Every transfer audited. Join the waitlist for early access.