Name: KŌJŌ Stack
Author: KŌJŌ Stack

The Anti-Pattern

The common approach: connect industrial systems to a data lake. Dump everything. Structure it later with ETL pipelines and data engineering.

This fails for industrial data because:

Raw telemetry lacks the context needed to interpret it correctly
Protocol-specific metadata is lost in generic serialization
Temporal relationships between data points are corrupted by inconsistent collection timing
Reconstructing meaning from raw values requires knowledge that exists only at the source
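
As a minimal illustration of the first two points, the sketch below decodes the same pair of 16-bit registers three ways. The register values and the function names are hypothetical, chosen only for illustration. Nothing in the raw words says which reading is correct; that knowledge lives with the device configuration at the source.

```python
import struct

# Two hypothetical 16-bit register words, as a collector might read them.
registers = (0x41C8, 0x0000)

def as_float32_big_endian(regs):
    """Interpret the two words as an IEEE 754 float32, high word first."""
    return struct.unpack(">f", struct.pack(">HH", regs[0], regs[1]))[0]

def as_float32_word_swapped(regs):
    """Interpret the same words as a float32, low word first."""
    return struct.unpack(">f", struct.pack(">HH", regs[1], regs[0]))[0]

def as_uint32(regs):
    """Interpret the same words as an unsigned 32-bit integer."""
    return (regs[0] << 16) | regs[1]

print(as_float32_big_endian(registers))    # 25.0
print(as_float32_word_swapped(registers))  # a tiny subnormal value near zero
print(as_uint32(registers))                # 1103626240
```

All three results are valid decodings of the same bytes. Once the word order and data type are dropped in a generic dump, a downstream consumer can only guess.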

What Structure Means

Structured industrial data carries:

Identity: tag_id mapped to an ISA-95 hierarchy position
Timestamp: acquired at the source with consistent precision
Value: normalized to the correct type and unit
Quality: the reliability indicator reported by the source protocol (for example, an OPC UA status code)
Provenance: which device, protocol, and pipeline produced this data point
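
The five fields above could be sketched as a single immutable record. This is an assumed shape, not KŌJŌ Stack's actual schema: the class names, field names, quality levels, and sample values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum

class Quality(Enum):
    """Hypothetical three-level quality flag derived from the source protocol."""
    GOOD = "good"
    UNCERTAIN = "uncertain"
    BAD = "bad"

@dataclass(frozen=True)
class Provenance:
    device: str      # which device produced the value
    protocol: str    # which protocol it was read over
    pipeline: str    # which pipeline carried it

@dataclass(frozen=True)
class DataPoint:
    tag_id: str            # identity
    hierarchy: tuple       # ISA-95 style path: enterprise, site, area, line, unit
    timestamp: datetime    # acquired at the source, timezone-aware
    value: float           # normalized value
    unit: str              # engineering unit of the value
    quality: Quality
    provenance: Provenance

point = DataPoint(
    tag_id="TT-101",
    hierarchy=("acme", "plant-1", "utilities", "boiler-line", "boiler-1"),
    timestamp=datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    value=25.0,
    unit="degC",
    quality=Quality.GOOD,
    provenance=Provenance(device="plc-07", protocol="modbus-tcp", pipeline="edge-a"),
)
print(point.tag_id, point.value, point.unit, point.quality.value)
```

Freezing the dataclass reflects the idea that context is fixed at acquisition: downstream code can read the record but cannot silently rewrite its identity or provenance.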

This is not metadata added later. This is context that must be present at the moment of acquisition.

Structure at the Source

When data is structured at the source - by the first-mile data plane - every downstream system receives consistent, interpretable data. Data engineers do not need to reverse-engineer meaning. ML pipelines receive clean feature inputs. Dashboards display correct values without per-source transformation logic.
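
A rough sketch of what structuring at the source could look like, assuming a hypothetical per-tag configuration held by the edge collector. The tag name, scaling ranges, and the `structure_reading` function are illustrative assumptions, not a KŌJŌ Stack API.

```python
from datetime import datetime, timezone

# Hypothetical per-tag configuration known only at the edge.
TAG_CONFIG = {
    "TT-101": {
        "hierarchy": "acme/plant-1/utilities/boiler-line/boiler-1",
        "unit": "degC",
        "raw_min": 0, "raw_max": 4095,      # assumed 12-bit raw counts
        "eng_min": 0.0, "eng_max": 200.0,   # assumed engineering range
        "device": "plc-07",
        "protocol": "modbus-tcp",
    }
}

def structure_reading(tag_id, raw, acquired_at, source_ok=True):
    """Turn a raw count into a self-describing record at acquisition time."""
    cfg = TAG_CONFIG[tag_id]
    raw_span = cfg["raw_max"] - cfg["raw_min"]
    eng_span = cfg["eng_max"] - cfg["eng_min"]
    # Linear scaling from raw counts to engineering units.
    value = cfg["eng_min"] + (raw - cfg["raw_min"]) * eng_span / raw_span
    in_range = cfg["raw_min"] <= raw <= cfg["raw_max"]
    return {
        "tag_id": tag_id,
        "hierarchy": cfg["hierarchy"],
        "timestamp": acquired_at.isoformat(),
        "value": value,
        "unit": cfg["unit"],
        "quality": "good" if (source_ok and in_range) else "bad",
        "device": cfg["device"],
        "protocol": cfg["protocol"],
    }

ts = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
record = structure_reading("TT-101", 4095, ts)
print(record["value"], record["unit"], record["quality"])  # 200.0 degC good
```

The scaling and range check run once, next to the configuration that defines them, so no consumer has to carry its own copy of that logic.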

The cost of structuring at the source is paid once. The cost of not structuring is paid by every consumer, indefinitely.

Where Structured Data Lands

When the first-mile data plane structures data before it leaves the edge, the destination format becomes a delivery choice - not a reconstruction project. KŌJŌ Stack delivers structured data to:

Amazon S3 and S3 Tables - JSONL, CSV, or Apache Parquet, with the Apache Iceberg table format providing time travel and schema evolution
Google Cloud Storage - the same formats with BigQuery external-table compatibility, enabling serverless analytics without ETL
Apache Parquet - a shared encoder produces identical columnar files regardless of cloud destination, decoupling data structure from cloud vendor

The key insight: if data is structured at the source, the lakehouse destination is a routing decision. If data is not structured at the source, the lakehouse becomes an expensive normalization pipeline.
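
To make the routing idea concrete, here is a minimal sketch in which the same structured records are encoded as JSONL or CSV by choosing an encoder by name. The record fields and the `deliver` function are hypothetical. Parquet is left out only because it needs a third-party library, and the sketch uses the Python standard library alone.

```python
import csv
import io
import json

# Hypothetical records that were already structured at the edge.
records = [
    {"tag_id": "TT-101", "timestamp": "2024-01-01T12:00:00+00:00",
     "value": 25.0, "unit": "degC", "quality": "good"},
    {"tag_id": "TT-101", "timestamp": "2024-01-01T12:00:01+00:00",
     "value": 25.1, "unit": "degC", "quality": "good"},
]

def to_jsonl(rows):
    """One JSON object per line."""
    return "".join(json.dumps(row, sort_keys=True) + "\n" for row in rows)

def to_csv(rows):
    """Header row plus one line per record, columns in sorted order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=sorted(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

ENCODERS = {"jsonl": to_jsonl, "csv": to_csv}

def deliver(rows, fmt):
    """Pick the output format by name; the records themselves do not change."""
    return ENCODERS[fmt](rows)

print(deliver(records, "jsonl"))
print(deliver(records, "csv"))
```

Because every record already carries its own context, adding a destination format means adding one encoder, not another normalization stage.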

Structure Data Before It Reaches the Lake
