The Anti-Pattern
The common approach: connect industrial systems to a data lake. Dump everything. Structure it later with ETL pipelines and data engineering.
This fails for industrial data because:
- Raw telemetry lacks the context needed to interpret it correctly
- Protocol-specific metadata is lost in generic serialization
- Temporal relationships between data points are corrupted by inconsistent collection timing
- Reconstructing meaning from raw values requires knowledge that exists only at the source
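To make the first and last points concrete, here is a minimal sketch of how one raw payload supports several plausible readings. The two bytes, the byte order, and the "tenths of a degree Celsius" convention are all assumptions made for illustration; nothing in the bytes themselves says which reading is right.

```python
import struct

# Two bytes as they might arrive from a 16-bit register (hypothetical payload).
raw = bytes([0xFF, 0x38])

# Without source context, each of these decodings is equally defensible.
as_unsigned = struct.unpack(">H", raw)[0]       # big-endian, unsigned
as_signed = struct.unpack(">h", raw)[0]         # big-endian, signed
as_little_endian = struct.unpack("<H", raw)[0]  # little-endian, unsigned

# Assumed device convention (known only at the source):
# signed, big-endian, in tenths of a degree Celsius.
temperature_c = as_signed / 10

print(as_unsigned, as_signed, as_little_endian, temperature_c)
```

Only one of the three integers corresponds to the physical temperature, and a data lake that stored just the bytes cannot tell which.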
What Structure Means
Structured industrial data carries:
- Identity: a tag_id mapped to its position in the ISA-95 equipment hierarchy
- Timestamp: acquired at the source with consistent precision
- Value: normalized to the correct type and unit
- Quality: the reliability indicator reported by the source protocol, such as a good, uncertain, or bad status
- Provenance: which device, protocol, and pipeline produced this data point
This is not metadata added later. This is context that must be present at the moment of acquisition.
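One way to picture the five elements above is as a single record type. The sketch below is illustrative only: the class names, field names, hierarchy path layout, and sample values are hypothetical and are not taken from any particular product schema.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from enum import Enum


class Quality(Enum):
    GOOD = "good"
    UNCERTAIN = "uncertain"
    BAD = "bad"


@dataclass(frozen=True)
class Provenance:
    device: str
    protocol: str
    pipeline: str


@dataclass(frozen=True)
class DataPoint:
    tag_id: str
    hierarchy_path: str   # assumed layout: site/area/line/unit
    timestamp: datetime   # acquired at the source
    value: float          # already normalized to `unit`
    unit: str
    quality: Quality
    provenance: Provenance

    def __post_init__(self):
        # Reject naive timestamps so every point is unambiguous in time.
        if self.timestamp.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware")


point = DataPoint(
    tag_id="press_01.temp",
    hierarchy_path="site-a/stamping/line-2/press-01",
    timestamp=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    value=85.3,
    unit="degC",
    quality=Quality.GOOD,
    provenance=Provenance("plc-7", "modbus-tcp", "edge-collector"),
)
print(asdict(point))
```

Making the record immutable and validating the timestamp at construction reflects the idea that context is fixed at acquisition rather than patched in afterwards.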
Structure at the Source
When data is structured at the source - by the first-mile data plane - every downstream system receives consistent, interpretable data. Data engineers do not need to reverse-engineer meaning. ML pipelines receive clean feature inputs. Dashboards display correct values without per-source transformation logic.
The cost of structuring at the source is paid once. The cost of not structuring is paid by every consumer, indefinitely.
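A rough sketch of what "paid once" can look like: the edge holds the per-tag knowledge and applies it at acquisition, so a consumer needs no per-source logic. The tag table, scaling rules, device names, and function names below are assumptions made for the example.

```python
from datetime import datetime, timezone

# Hypothetical per-tag configuration that exists only at the edge.
# One source reports tenths of a degree Celsius, the other degrees Fahrenheit.
TAG_CONFIG = {
    "press_01.temp": {"path": "site-a/stamping/line-2/press-01",
                      "offset": 0.0, "scale": 0.1,
                      "device": "plc-7", "protocol": "modbus-tcp"},
    "oven_03.temp": {"path": "site-a/finishing/line-1/oven-03",
                     "offset": -32.0, "scale": 5.0 / 9.0,
                     "device": "gw-2", "protocol": "opc-ua"},
}


def structure(tag_id, raw, acquired_at, source_ok, pipeline="edge-collector"):
    """Turn a raw reading into a self-describing record, normalized to degC."""
    cfg = TAG_CONFIG[tag_id]
    return {
        "tag_id": tag_id,
        "path": cfg["path"],
        "timestamp": acquired_at.astimezone(timezone.utc).isoformat(),
        "value": (raw + cfg["offset"]) * cfg["scale"],
        "unit": "degC",
        "quality": "good" if source_ok else "bad",
        "provenance": {"device": cfg["device"],
                       "protocol": cfg["protocol"],
                       "pipeline": pipeline},
    }


def hottest(points):
    """A consumer: no per-source branches, only the shared record shape."""
    good = [p for p in points if p["quality"] == "good"]
    return max(good, key=lambda p: p["value"])["tag_id"]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
points = [
    structure("press_01.temp", 853, now, True),    # tenths of degC
    structure("oven_03.temp", 212, now, True),     # degF
    structure("oven_03.temp", 9999, now, False),   # flagged bad at the source
]
print(hottest(points))
```

The consumer compares values directly because units and quality were settled before the data left the edge; without that, every consumer would need its own copy of the tag table.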
Where Structured Data Lands
When the first-mile data plane structures data before it leaves the edge, the destination format becomes a delivery choice - not a reconstruction project. KŌJŌ Stack delivers structured data to:
- Amazon S3 and S3 Tables - JSONL, CSV, or Apache Parquet, with the Iceberg table format providing time travel and schema evolution
- Google Cloud Storage - the same formats with BigQuery external-table compatibility, enabling serverless analytics without ETL
- Apache Parquet on either cloud - a shared encoder produces identical columnar files regardless of destination, decoupling data structure from cloud vendor
The key insight: if data is structured at the source, the lakehouse destination is a routing decision. If data is not structured at the source, the lakehouse becomes an expensive normalization pipeline.
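The "routing decision" can be sketched as a lookup from format name to encoder, applied to records that are already structured. This is a standard-library illustration, not the KŌJŌ Stack encoder: the field list, function names, and sample records are hypothetical, and Parquet is left out only because it would need a third-party library such as pyarrow.

```python
import csv
import io
import json

# Hypothetical flat record shape shared by every encoder.
FIELDS = ["tag_id", "timestamp", "value", "unit", "quality"]


def to_jsonl(records):
    """One JSON object per line."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)


def to_csv(records):
    """Header row followed by one row per record."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Choosing a destination format is a dictionary lookup, not a transformation.
ENCODERS = {"jsonl": to_jsonl, "csv": to_csv}


def deliver(records, fmt):
    return ENCODERS[fmt](records)


records = [
    {"tag_id": "press_01.temp", "timestamp": "2024-01-01T12:00:00+00:00",
     "value": 85.3, "unit": "degC", "quality": "good"},
    {"tag_id": "oven_03.temp", "timestamp": "2024-01-01T12:00:00+00:00",
     "value": 100.0, "unit": "degC", "quality": "good"},
]

print(deliver(records, "jsonl"))
print(deliver(records, "csv"))
```

Neither encoder inspects or repairs the data; each only serializes fields that were settled upstream, which is what keeps the format choice cheap.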