The Pattern
The failure mode is consistent: an organization connects PLCs, sensors, and SCADA systems to a cloud platform or data lake. Data starts flowing. Within weeks, problems surface: missing timestamps, inconsistent units, duplicate records, unexplained gaps. The data is there, but it is not usable.
The instinct is to fix this downstream: add ETL pipelines, build reconciliation logic, normalize in the lake. This rarely works. The data was broken before it left the plant.
Where Failure Starts
Industrial data pipelines fail at the source for predictable reasons:
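Protocol fragmentation. OPC UA, Modbus, S7, BACnet, and MQTT each have different data representations, timing models, and addressing schemes. OPC UA, for example, attaches a status code and a source timestamp to each value, while Modbus exposes bare 16-bit registers with no type, unit, or timestamp. Connecting them through a generic gateway that flattens everything into plain numbers strips context that cannot be reconstructed later.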
Absent structure. Raw telemetry carries no operational context. A value of 72.3 means nothing without knowing the asset, location, unit of measure, and quality indicator. Adding this context after the fact requires manual mapping that does not scale.
Inconsistent timing. Polling-based collection introduces jitter. Network-dependent delivery creates gaps. Without deterministic acquisition at the source, temporal relationships between data points are unreliable.
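The timing problem can be made measurable. The following is a minimal sketch, not a definitive implementation: it assumes acquisition times are available as seconds in a list, and the function name `timing_report` and the `gap_factor` threshold are hypothetical choices made for illustration.

```python
from statistics import mean


def timing_report(timestamps, nominal_interval, gap_factor=2.0):
    """Summarize jitter and gaps in a sequence of acquisition times (seconds).

    An interval longer than gap_factor * nominal_interval is treated as a gap
    (at least one sample was missed) and is excluded from the jitter figures.
    """
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    threshold = gap_factor * nominal_interval

    gaps = []
    deviations = []
    for k, interval in enumerate(intervals):
        if interval > threshold:
            # Record the two acquisition times that bracket the missing data.
            gaps.append((timestamps[k], timestamps[k + 1]))
        else:
            # Jitter: how far the actual interval strays from the nominal one.
            deviations.append(abs(interval - nominal_interval))

    return {
        "mean_jitter": mean(deviations) if deviations else 0.0,
        "max_jitter": max(deviations, default=0.0),
        "gaps": gaps,
    }


# Hypothetical 1-second poll with two late or early samples and one outage.
samples = [0.0, 1.0, 2.5, 3.0, 7.0, 8.0]
report = timing_report(samples, nominal_interval=1.0)
print(report)
```

A report like this only describes the damage. It cannot recover the samples that were never taken, which is why the timing has to be controlled at acquisition.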
The First-Mile Gap
Between OT systems and every downstream consumer, there is a gap where data must be acquired, structured, and made reliable. Most architectures ignore this gap or delegate it to middleware that lacks the protocol depth and deterministic behavior required.
This is the first-mile problem. It cannot be solved from the cloud. It must be solved at the edge, at the point of origin.
What Control Looks Like
A properly structured first mile means:
- Every data point carries a timestamp, a quality indicator, and source provenance from the moment of acquisition
- Protocol-specific metadata is preserved, not discarded
- ISA-95 context (where the asset sits in the enterprise, site, and area hierarchy) is applied at ingestion, not reconstructed downstream
- Delivery to every destination is buffered at the edge and guaranteed, so a network outage delays data rather than losing it
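One way to picture the first three points is a record that cannot be created without its context. This is a sketch under stated assumptions: the `DataPoint` fields, the `Quality` levels, the `acquire` helper, the `modbus://` source string, and the asset names are all hypothetical and do not come from any particular product or standard schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Quality(Enum):
    GOOD = "good"
    UNCERTAIN = "uncertain"
    BAD = "bad"


@dataclass(frozen=True)
class DataPoint:
    value: float
    unit: str                     # unit of measure, e.g. "degC"
    timestamp: datetime           # taken at acquisition, timezone-aware
    quality: Quality
    source: str                   # provenance: which device and address
    isa95_path: tuple             # hierarchy from enterprise down to the asset
    protocol_meta: dict = field(default_factory=dict)  # preserved, not dropped


def acquire(value, *, unit, source, isa95_path, quality=Quality.GOOD,
            protocol_meta=None, timestamp=None):
    """Wrap a raw reading with its context at the moment it is read."""
    ts = timestamp or datetime.now(timezone.utc)
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    if not isa95_path:
        raise ValueError("isa95_path must not be empty")
    return DataPoint(
        value=float(value),
        unit=unit,
        timestamp=ts,
        quality=quality,
        source=source,
        isa95_path=tuple(isa95_path),
        protocol_meta=dict(protocol_meta or {}),
    )


# The bare 72.3 from earlier, now with hypothetical context attached.
point = acquire(
    72.3,
    unit="degC",
    source="modbus://plc-07/holding/40012",
    isa95_path=("ExampleCorp", "Plant1", "Packaging", "Line3", "Filler"),
    protocol_meta={"register": 40012, "word_order": "big"},
    timestamp=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print("/".join(point.isa95_path), point.value, point.unit, point.quality.value)
```

Making the context mandatory at construction time is the design choice that matters here: a reading without a unit or a source fails at the edge, where someone can still fix the mapping.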
When the first mile is structured, downstream systems receive reliable, well-prepared data. When it is not, every downstream system inherits the same problems.
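The buffered delivery listed above is commonly built as store-and-forward. The sketch below is an in-memory illustration only, assuming a hypothetical `send` callable that returns True once the destination has acknowledged an item. A production buffer would normally persist to disk, and one buffer would be kept per destination.

```python
from collections import deque


class StoreAndForward:
    """Hold items until the destination acknowledges them, in order."""

    def __init__(self, send, max_items=10_000):
        self._send = send          # callable(item) -> bool, True means acknowledged
        self._queue = deque()
        self._max_items = max_items
        self.dropped = 0           # items evicted because the buffer was full

    def publish(self, item):
        if len(self._queue) >= self._max_items:
            # Buffer full: evict the oldest item and count the loss explicitly.
            self._queue.popleft()
            self.dropped += 1
        self._queue.append(item)
        self.flush()

    def flush(self):
        """Send queued items oldest first; stop at the first failure."""
        while self._queue:
            try:
                acked = self._send(self._queue[0])
            except ConnectionError:
                acked = False
            if not acked:
                return False       # keep the item and retry on the next flush
            self._queue.popleft()  # remove only after acknowledgement
        return True

    def pending(self):
        return len(self._queue)


# Hypothetical destination whose link is down at first.
link = {"up": False}
delivered = []


def send(item):
    if not link["up"]:
        return False
    delivered.append(item)
    return True


buf = StoreAndForward(send)
buf.publish("a")
buf.publish("b")       # link down: both items stay queued
link["up"] = True
buf.publish("c")       # link restored: a, b, c are delivered in order
print(delivered, buf.pending())
```

Because an item is removed only after acknowledgement, this gives at-least-once delivery: if an acknowledgement is lost, the item is sent again, so the destination should de-duplicate, for example on source and timestamp.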