Name: KŌJŌ Stack
Author: KŌJŌ Stack

The Multi-Cloud Reality

Industrial organizations rarely operate within a single cloud provider. Acquisitions bring AWS accounts. Regional operations may run on GCP. On-premises historians coexist with cloud data lakes. The question is not whether data will need to reach multiple clouds - it is how.

Most industrial data platforms are tightly coupled to a single cloud. Data must be exported, transformed, and re-ingested to move between providers. This creates vendor lock-in at the data layer - the most expensive place to be locked in.

Open Formats as the Common Denominator

The solution is to structure data in open, vendor-neutral formats at the point of origin - before it reaches any cloud:

Apache Parquet - columnar format optimized for analytical queries, widely supported across major analytics engines and data platforms
Apache Iceberg - table format that adds schema evolution, time travel, and partition evolution on top of data files such as Parquet
JSONL and CSV - for systems that need human-readable or streaming-compatible formats
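As a minimal illustration of the two human-readable formats, the sketch below serializes the same readings to JSONL and CSV using only the Python standard library. The field names `ts`, `tag`, `value`, and `quality` and the sample values are hypothetical, not a KŌJŌ Stack schema. Parquet output needs a library such as pyarrow and is left out here.

```python
import csv
import io
import json

# Hypothetical sample readings; field names are illustrative only.
READINGS = [
    {"ts": "2024-05-01T12:00:00Z", "tag": "line1/press/temperature", "value": 71.5, "quality": "good"},
    {"ts": "2024-05-01T12:00:01Z", "tag": "line1/press/temperature", "value": 71.7, "quality": "good"},
]


def to_jsonl(rows):
    """One compact JSON object per line: easy to append to and to stream."""
    return "".join(json.dumps(row, separators=(",", ":")) + "\n" for row in rows)


def to_csv(rows):
    """A header row plus one line per record: easy to open in a spreadsheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_jsonl(READINGS))
    print(to_csv(READINGS))
```

Note that CSV carries no type information: `71.5` is read back as a string unless the reader is told the column type, which is one reason analytical destinations favor Parquet.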

When data leaves the edge in these formats, the destination is a delivery choice - not an architectural dependency.

Parquet Everywhere

KŌJŌ Stack uses a shared Apache Parquet encoder across all cloud storage destinations. The same structured, columnar files are produced whether data is delivered to Amazon S3, Amazon S3 Tables (Iceberg), or Google Cloud Storage.

This means:

A query in Amazon Athena and a query in BigQuery operate on identically structured data
Schema evolution at the edge propagates consistently to all destinations
Switching or adding a cloud destination does not require re-engineering the data pipeline
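The "encode once, deliver everywhere" idea can be sketched as follows. `encode_batch`, `MemorySink`, and `deliver` are hypothetical names, and the JSON serialization is only a stand-in for a real Parquet writer (for example `pyarrow.parquet.write_table`). What the sketch shows is that encoding runs once per batch and every destination receives byte-identical output.

```python
import hashlib
import json
from typing import Dict, List

Row = Dict[str, object]


def encode_batch(rows: List[Row], columns: List[str]) -> bytes:
    """Hypothetical stand-in for a shared columnar encoder.

    Transposes rows into one array per column and serializes deterministically.
    A real pipeline would write Parquet here; the point is that it runs once per batch.
    """
    table = {col: [row.get(col) for row in rows] for col in columns}
    return json.dumps(table, sort_keys=True, separators=(",", ":")).encode("utf-8")


class MemorySink:
    """Hypothetical destination that simply records the objects it receives."""

    def __init__(self, name: str):
        self.name = name
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, payload: bytes) -> None:
        self.objects[key] = payload


def deliver(rows: List[Row], columns: List[str], key: str, sinks: List[MemorySink]) -> str:
    payload = encode_batch(rows, columns)  # encode once...
    for sink in sinks:                     # ...then hand the same bytes to every destination
        sink.put(key, payload)
    return hashlib.sha256(payload).hexdigest()
```

Because the column list goes to the single encoder, adding a column changes the output for every destination at once, which is the consistency property described above.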

The encoder handles ISA-95 namespace mapping, data type preservation, and metadata embedding - once, at the edge.
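ISA-95 namespace mapping can be pictured as turning a hierarchical tag path into explicit columns. This sketch assumes a hypothetical `enterprise/site/area/line/cell/tag` path convention and a hypothetical `map_namespace` helper; it does not describe the exact mapping KŌJŌ Stack performs.

```python
from typing import Dict

# Assumed hierarchy levels. ISA-95 defines Enterprise, Site, and Area, followed by
# work-center and work-unit levels that many plants label "line" and "cell".
LEVELS = ("enterprise", "site", "area", "line", "cell")


def map_namespace(path: str) -> Dict[str, str]:
    """Split a hypothetical 'enterprise/site/area/line/cell/tag' path into named columns."""
    parts = [p for p in path.strip("/").split("/") if p]
    if len(parts) != len(LEVELS) + 1:
        raise ValueError(f"expected {len(LEVELS) + 1} segments, got {len(parts)}: {path!r}")
    columns = dict(zip(LEVELS, parts[:-1]))
    columns["tag"] = parts[-1]
    return columns
```

Storing each level as its own column lets queries filter on `site` or `area` directly instead of parsing strings, and low-cardinality columns like these suit Parquet's dictionary encoding.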

Cloud Destinations

KŌJŌ Stack delivers structured industrial data to multiple lakehouse destinations:

Amazon S3 - batch delivery in JSONL, CSV, or Parquet with configurable partitioning
Amazon S3 Tables - Apache Iceberg table format with Parquet files, enabling time travel and schema evolution via the S3 Tables API
Google Cloud Storage - the same formats with BigQuery external-table compatibility, enabling serverless analytics without data movement
Apache Kafka - for event streaming architectures that feed real-time lakehouse ingestion
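For the object-store destinations, configurable partitioning often means a Hive-style `key=value` path layout, which Athena and BigQuery external tables can both be configured to read as partitions. The helper below is a hypothetical sketch: the `site`, `date`, and `hour` partition keys and the `part-NNNNN` file naming are assumptions, not KŌJŌ Stack's actual layout.

```python
from datetime import datetime, timezone


def object_key(prefix: str, site: str, ts: datetime, seq: int, ext: str = "parquet") -> str:
    """Build a hypothetical Hive-style key: prefix/site=.../date=YYYY-MM-DD/hour=HH/part-NNNNN.ext."""
    if ts.tzinfo is None:
        # Naive timestamps are ambiguous at the edge; require an explicit time zone.
        raise ValueError("ts must be timezone-aware")
    ts = ts.astimezone(timezone.utc)  # partition on UTC so every site uses the same boundaries
    return (
        f"{prefix.rstrip('/')}/site={site}/date={ts:%Y-%m-%d}/hour={ts:%H}/"
        f"part-{seq:05d}.{ext}"
    )
```

Partitioning on UTC keeps hour boundaries identical across plants in different time zones, so a query for one hour scans the same prefixes everywhere.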

Each destination has independent buffering and delivery guarantees. A network outage affecting one destination does not impact delivery to others.
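Independent buffering can be sketched as one queue per destination, each flushed on its own. `DestinationBuffer` and `publish` are hypothetical names, and the drop-oldest overflow policy is one assumption among several possible ones; the sketch does not represent KŌJŌ Stack's actual delivery guarantees.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class DestinationBuffer:
    """Hypothetical per-destination buffer: each destination queues and retries on its own."""

    def __init__(self, name: str, send: Callable[[bytes], None], max_items: int = 10_000):
        self.name = name
        self.send = send
        # Bounded queue; when full, appending silently drops the oldest batch.
        self.queue: Deque[bytes] = deque(maxlen=max_items)

    def enqueue(self, batch: bytes) -> None:
        self.queue.append(batch)

    def flush(self) -> int:
        """Send queued batches in order; stop at the first failure and keep the rest."""
        sent = 0
        while self.queue:
            try:
                self.send(self.queue[0])
            except ConnectionError:
                break  # destination unreachable: keep the batch for the next flush
            self.queue.popleft()
            sent += 1
        return sent


def publish(batch: bytes, buffers: List[DestinationBuffer]) -> Dict[str, int]:
    """Fan a batch out to every buffer, then flush each one independently."""
    for buf in buffers:
        buf.enqueue(batch)
    return {buf.name: buf.flush() for buf in buffers}
```

If one destination's `send` raises `ConnectionError`, only that buffer holds its batch; the other buffers still flush, which mirrors the isolation described above.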

What Changes

When the data plane structures data at the edge and delivers it in open formats:

Industrial data is decoupled from any single cloud provider
Multi-cloud analytics operate on identical, consistent datasets
Adding a new cloud destination is a configuration change - not a data engineering project
Compliance and data sovereignty requirements are met by controlling where structured data is delivered
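Treating destinations as configuration, with residency rules attached, might look like the sketch below. `Destination`, `route`, the site names, and the bucket and broker URIs are all hypothetical. The point is that adding a destination, or restricting where a site's data may go, changes the entry list and not the encoder.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass(frozen=True)
class Destination:
    """Hypothetical destination entry, as it might appear in configuration."""
    name: str
    kind: str  # e.g. "s3", "s3tables", "gcs", "kafka"
    uri: str
    allowed_sites: FrozenSet[str] = field(default_factory=frozenset)  # empty means all sites

    def accepts(self, site: str) -> bool:
        return not self.allowed_sites or site in self.allowed_sites


# Hypothetical configuration: US plants deliver only to the US lake, EU plants only to the EU lake.
DESTINATIONS: List[Destination] = [
    Destination("lake-us", "s3", "s3://example-us-lake/telemetry", frozenset({"plant-tx", "plant-oh"})),
    Destination("lake-eu", "gcs", "gs://example-eu-lake/telemetry", frozenset({"plant-de", "plant-fr"})),
]


def route(site: str, destinations: List[Destination]) -> List[str]:
    """Return the names of destinations permitted to receive data from a given site."""
    return [d.name for d in destinations if d.accepts(site)]
```

With this configuration, data from `plant-de` is routed only to `lake-eu`; appending a new `Destination` entry is enough to start delivering to an additional target.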

The lakehouse is not a product - it is an architectural pattern. The data plane determines what enters the lakehouse and in what form. Control the first mile, and the lakehouse serves every downstream consumer - regardless of which cloud it runs on.

S3, GCS, and Iceberg: Delivering Industrial Data to the Lakehouse