The Multi-Cloud Reality
Industrial organizations rarely operate within a single cloud provider. Acquisitions bring AWS accounts. Regional operations may run on GCP. On-premises historians coexist with cloud data lakes. The question is not whether data will need to reach multiple clouds - it is how.
Many industrial data platforms are tightly coupled to a single cloud. To move between providers, data must be exported, transformed, and re-ingested. This creates vendor lock-in at the data layer - the most expensive place to be locked in.
Open Formats as the Common Denominator
The solution is to structure data in open, vendor-neutral formats at the point of origin - before it reaches any cloud:
- Apache Parquet - columnar format optimized for analytical queries, widely supported across major query engines and data platforms
- Apache Iceberg - table format that adds schema evolution, time travel, and partition evolution on top of data files such as Parquet
- JSONL and CSV - for systems that need human-readable or streaming-compatible formats
When data leaves the edge in these formats, the destination is a delivery choice - not an architectural dependency.
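To make "the destination is a delivery choice" concrete, here is a minimal sketch that serializes the same records as either JSONL or CSV using only the Python standard library. The record fields (`timestamp`, `tag`, `value`) and the function names are illustrative assumptions, not KŌJŌ Stack's actual schema or API. Parquet output would follow the same pattern but needs a third-party library, so it is left out here.

```python
import csv
import io
import json

# Hypothetical edge records; the field names are assumptions for illustration.
RECORDS = [
    {"timestamp": "2024-01-01T00:00:00Z", "tag": "line1/pump3/pressure", "value": 4.2},
    {"timestamp": "2024-01-01T00:00:01Z", "tag": "line1/pump3/pressure", "value": 4.3},
]


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)


def to_csv(records):
    """Serialize records as CSV with a header row and columns in sorted order."""
    fields = sorted(records[0])
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# The output format is selected by name; the records themselves do not change.
ENCODERS = {"jsonl": to_jsonl, "csv": to_csv}


def encode(records, fmt):
    """Encode the same records in the requested open format."""
    return ENCODERS[fmt](records)
```

Calling `encode(RECORDS, "jsonl")` or `encode(RECORDS, "csv")` yields two renderings of identical data, which is the property that keeps the choice of destination out of the architecture.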
Parquet Everywhere
KŌJŌ Stack uses a shared Apache Parquet encoder across all cloud storage destinations. The same structured, columnar files are produced whether data is delivered to Amazon S3, Amazon S3 Tables (Iceberg), or Google Cloud Storage.
This means:
- A query in Amazon Athena and a query in BigQuery operate on identically structured data
- Schema evolution at the edge propagates consistently to all destinations
- Switching or adding a cloud destination does not require re-engineering the data pipeline
The encoder handles ISA-95 namespace mapping, data type preservation, and metadata embedding - once, at the edge.
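The three encoder responsibilities can be sketched roughly as follows. This is an illustration under stated assumptions, not the KŌJŌ Stack encoder: it assumes slash-separated tag paths whose leading segments follow the ISA-95 equipment hierarchy and whose last segment names the measurement. `map_tag`, `Row`, and the `unit` parameter are hypothetical names.

```python
from dataclasses import dataclass, field

# ISA-95 equipment hierarchy levels, used here as column names.
# Assumption: tag paths list these levels in order, followed by the measurement.
ISA95_LEVELS = ("enterprise", "site", "area", "work_center", "work_unit")


@dataclass
class Row:
    hierarchy: dict      # one entry per ISA-95 level; None where the path is shorter
    measurement: str     # last path segment
    value: object        # kept in its native Python type (no string coercion)
    metadata: dict = field(default_factory=dict)


def map_tag(path, value, unit=None):
    """Split a tag path into ISA-95 hierarchy columns and attach metadata."""
    parts = path.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError(f"expected at least one level and a measurement: {path!r}")
    *levels, measurement = parts
    if len(levels) > len(ISA95_LEVELS):
        raise ValueError(f"path is deeper than the ISA-95 hierarchy: {path!r}")

    # Namespace mapping: fill levels in order, leaving the rest as None.
    hierarchy = {level: None for level in ISA95_LEVELS}
    hierarchy.update(zip(ISA95_LEVELS, levels))

    # Metadata embedding: keep the original path and, if known, the unit.
    metadata = {"source_path": path}
    if unit is not None:
        metadata["unit"] = unit

    return Row(hierarchy, measurement, value, metadata)
```

For example, `map_tag("acme/plant1/packaging/line2/temperature", 71.5, unit="degC")` produces a row whose `work_center` is `line2`, whose `work_unit` is `None`, and whose value remains a float. Doing this once, before fan-out, is what lets every destination receive the same columns.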
Cloud Destinations
KŌJŌ Stack delivers structured industrial data to multiple lakehouse destinations:
- Amazon S3 - batch delivery in JSONL, CSV, or Parquet with configurable partitioning
- Amazon S3 Tables - Apache Iceberg table format with Parquet files, enabling time travel and schema evolution via the S3 Tables API
- Google Cloud Storage - the same formats with BigQuery external-table compatibility, enabling serverless analytics without data movement
- Apache Kafka - for event streaming architectures that feed real-time lakehouse ingestion
Each destination has independent buffering and delivery guarantees. A network outage affecting one destination does not impact delivery to others.
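One way to picture independent buffering is a separate queue per destination, so that a failed send leaves only that destination's backlog growing. The sketch below is a simplified, single-threaded illustration under that assumption. The `Destination` class, `fan_out`, and the `unreachable` stand-in sender are hypothetical and do not describe KŌJŌ Stack's actual delivery mechanism.

```python
from collections import deque


class Destination:
    """A delivery target with its own buffer (unbounded here for brevity)."""

    def __init__(self, name, send):
        self.name = name
        self.send = send          # callable that delivers one batch or raises
        self._buffer = deque()

    def enqueue(self, batch):
        self._buffer.append(batch)

    def flush(self):
        """Deliver buffered batches in order; stop at the first failure."""
        delivered = 0
        while self._buffer:
            try:
                self.send(self._buffer[0])
            except ConnectionError:
                break             # keep the batch buffered and retry later
            self._buffer.popleft()
            delivered += 1
        return delivered

    @property
    def pending(self):
        return len(self._buffer)


def fan_out(destinations, batch):
    """Queue a batch for every destination and try to deliver each independently."""
    results = {}
    for dest in destinations:
        dest.enqueue(batch)
        results[dest.name] = dest.flush()
    return results


def unreachable(batch):
    """Stand-in sender that simulates a network outage."""
    raise ConnectionError("destination unreachable")
```

With one destination wired to `unreachable` and another to a working sender, each `fan_out` call still delivers to the healthy destination while the failing one accumulates a backlog that drains on a later `flush`. A production design would also need bounded or disk-backed buffers and a retry policy, which this sketch omits.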
What Changes
When the data plane structures data at the edge and delivers it in open formats:
- Industrial data is decoupled from any single cloud provider
- Multi-cloud analytics operate on identical, consistent datasets
- Adding a new cloud destination is a configuration change - not a data engineering project
- Compliance and data sovereignty requirements can be addressed by controlling where structured data is delivered
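To illustrate what "a configuration change" could look like, the sketch below treats destinations as entries in a declarative list, so that adding one touches no encoding logic. The configuration keys (`name`, `type`, `bucket`, `format`), the bucket names, and `add_destination` are all hypothetical and do not reflect KŌJŌ Stack's real configuration schema.

```python
import json

# Formats the article lists for object-storage delivery.
SUPPORTED_FORMATS = {"jsonl", "csv", "parquet"}

# Hypothetical configuration; keys and values are assumptions for illustration.
CONFIG = json.loads("""
{
  "destinations": [
    {"name": "primary", "type": "s3", "bucket": "example-plant-data", "format": "parquet"}
  ]
}
""")


def add_destination(config, entry):
    """Return a new config with one more destination, after validating the entry."""
    missing = {"name", "type", "format"} - entry.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if entry["format"] not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {entry['format']!r}")
    if any(d["name"] == entry["name"] for d in config["destinations"]):
        raise ValueError(f"duplicate destination name: {entry['name']!r}")
    # The original config is left untouched; only the destination list grows.
    return {**config, "destinations": [*config["destinations"], entry]}
```

Adding a second cloud is then one call, such as `add_destination(CONFIG, {"name": "eu", "type": "gcs", "bucket": "example-eu-data", "format": "parquet"})`. The same list is also where a sovereignty rule would be expressed, since it is the only place that decides where data lands.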
The lakehouse is not a product - it is an architectural pattern. The data plane determines what enters the lakehouse and in what form. Control the first mile, and the lakehouse serves every downstream consumer - regardless of which cloud it runs on.