Enterprise Integration Patterns for Streaming and AI Architectures [2026]

Publish-subscribe, content-based routing, CDC, event sourcing — the patterns haven’t changed, but the architectures have. How each enterprise integration pattern applies in modern streaming, event-driven, and AI agent systems.

Boyd Stowe
Solutions Engineering
[Figure: Message flow diagram showing enterprise integration patterns, including routing, filtering, enrichment, and pub-sub distribution to AI agents and ML models]

TL;DR: Enterprise integration patterns — codified by Hohpe & Woolf in 2003 — are proven solutions for connecting systems and routing data between them. The core messaging patterns (publish-subscribe, content-based routing, pipes-and-filters, aggregator, scatter-gather) evolved from ESBs to event streaming platforms like Kafka. Data integration patterns (CDC, event sourcing, materialized views) determine freshness: batch ETL gives hours, CDC gives seconds, streaming materialized views give sub-second. AI agents are the new integration consumers — every pattern applies directly to agent architecture design.

Every modern enterprise runs on integration. Dozens of disparate systems — backend databases, web services, external APIs, SaaS platforms, data warehouses, ML models — need to exchange data continuously, and the business processes built on top of them depend on that data arriving fresh and correct. When integration works, data flows from where it’s produced to where it’s needed without friction. When it breaks, decisions stall, pipelines fail, and teams waste weeks debugging data inconsistencies between disparate systems that can’t agree on the current state of reality.

The architectural patterns for solving these problems were codified over twenty years ago. In 2003, Gregor Hohpe and Bobby Woolf published Enterprise Integration Patterns, cataloging 65 patterns for connecting disparate systems through messaging. The book became the standard reference for integration architecture — and its patterns are more relevant now than ever.

Why? Because the systems doing the integrating have changed. The consumers of integrated data now include AI agents that make autonomous decisions, streaming pipelines that process millions of events per second, and ML models that need fresh features at inference time. The patterns endure. The implementations have evolved from centralized middleware to event streaming platforms, from batch file transfers to log-based CDC.

This guide covers what enterprise integration patterns are, how the core patterns work in modern data architectures, and how to choose the right integration design pattern for the business needs and latency requirements of your specific use case.

What Are Enterprise Integration Patterns?

Enterprise integration patterns are proven solutions to common problems in connecting software systems, both internal services and external ones. They also give architects, engineers, and data teams a shared vocabulary for designing, discussing, and documenting how systems interact and exchange data. Hohpe and Woolf cataloged the messaging patterns (publish-subscribe, content-based routing, pipes-and-filters); together with later data integration patterns such as change data capture and event sourcing, they now power modern streaming architectures, event-driven systems, and AI agent coordination.

Hohpe and Woolf identified four fundamental integration styles, each representing a different approach to connecting two systems or more:

File Transfer — systems exchange data by writing and reading files. One system produces a file (CSV, JSON, XML), another system picks it up. Simple but slow, with no real-time capability and significant freshness problems. Still common in legacy batch ETL pipelines.

Shared Database — multiple systems read from and write to the same database. Tight coupling, schema contention, and scaling bottlenecks make this approach fragile at scale. But it’s tempting because it’s simple — until it isn’t.

Remote Procedure Call (RPC) — one system calls another directly, synchronous request-response. REST APIs, gRPC, and GraphQL are modern forms of RPC exposed through an application programming interface. This style works well for point-to-point interactions between two systems but creates tight coupling between caller and callee. If the downstream service is slow or unavailable, the upstream system blocks or fails.

Messaging — systems communicate by sending messages through an intermediary (a message broker or event streaming platform). The producer doesn’t need to know who consumes the message or when. This decoupling is the foundation of scalable, resilient integration.

Messaging won. Not because the other styles disappeared — file transfer and RPC are still everywhere — but because messaging is the only style that scales to hundreds of producers and consumers while maintaining loose coupling and supporting real-time data flow. Modern event streaming platforms like Apache Kafka are the direct evolution of the messaging patterns Hohpe and Woolf described.

The enterprise integration patterns built on top of messaging — routing, transformation, aggregation, splitting — form the toolkit for designing data flows between any set of systems. Understanding these integration design patterns is essential for anyone building data pipelines, event-driven architectures, or real-time AI systems.

Core Messaging Patterns

The messaging patterns are the foundation layer. Every other enterprise integration pattern builds on these four primitives:

| Pattern | What It Does | Classic Implementation | Modern Implementation |
| --- | --- | --- | --- |
| Message Channel | A named conduit that carries messages from sender to receiver | JMS queue or topic | Kafka topic, Pulsar topic, Kinesis stream |
| Message Router | Reads a message and decides which channel to forward it to based on content or rules | ESB routing engine | Stream processor with conditional logic, Kafka Streams branching |
| Message Translator | Converts a message from one format to another so different systems can communicate | XSLT transformation in ESB | Schema registry with Avro/Protobuf evolution, stream processing transformations |
| Message Endpoint | The connection point where an application sends or receives messages | JMS producer/consumer | Kafka producer/consumer client, CDC connector, webhook receiver |

The shift from classic to modern implementations isn’t just a technology swap. It reflects a fundamental change in how integration messaging works. Classic message brokers (ActiveMQ, RabbitMQ, and JMS providers generally) were designed for transactional messaging: guaranteed delivery of individual messages between known senders and receivers. Modern event streaming platforms are designed for continuous data flow: high-throughput, ordered, replayable streams of events consumed by an unknown and evolving set of consumers.

This distinction matters because it changes which enterprise integration patterns are most important. In a transactional messaging world, the critical patterns are guaranteed delivery, message acknowledgment, and dead letter queues. In a streaming world, the critical patterns are partitioning, consumer groups, exactly-once semantics, and stream processing topologies.

Asynchronous Messaging Patterns

Asynchronous messaging replaced synchronous RPC as the default for connecting backend systems at scale. The reason is structural: a synchronous call blocks the caller until the callee responds, creating coupling between distributed systems that should be independent. When the downstream service is slow, unavailable, or simply busy, the upstream system blocks or fails. Asynchronous messaging architectures remove that coupling — the producer hands a message to a broker and continues; the consumer reads when it’s ready.

Three asynchronous messaging patterns dominate modern integration architecture:

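Message Queue. A producer writes a message to a named queue, and each message is delivered to exactly one consumer, even when several consumers compete for work on the same queue. Classic queues remove a message once the consumer acknowledges it; Kafka approximates the pattern with a consumer group, where each partition is read in order by one group member and the log is retained rather than deleted. This pattern fits work distribution (order processing, payment settlement, background jobs) where each message represents a unit of work that should take effect once. Because most brokers deliver at least once, that guarantee usually comes from idempotent consumers. Modern messaging solutions like Amazon SQS, RabbitMQ queues, and Kafka with consumer groups all implement this pattern.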

Publish-Subscribe. One producer writes; multiple consumers each read a copy. This is the foundational pattern for event-driven distributed applications. A single order event flows to the fulfillment system, the analytics warehouse, the fraud detection pipeline, and the customer notification service — each consumer independent, each able to fail or scale without affecting the others.

Request-Reply (Async). The asynchronous variant of RPC. The requester sends a message to a request channel and includes a return address; the responder processes the request and sends the reply to that address. The requester continues working until the reply arrives. This pattern enables call-and-response interactions across distributed systems without the tight coupling of synchronous APIs.
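
As a rough illustration of the request-reply mechanics described above, here is a minimal in-memory sketch. It assumes Python’s standard `queue` and `threading` modules stand in for a real broker; the names `request_channel`, `responder`, `send_request`, and `request_reply_demo` are hypothetical and exist only for this example.

```python
import queue
import threading
import uuid

# Hypothetical in-memory channel standing in for a broker's request queue.
request_channel = queue.Queue()


def responder(handler):
    """Serve requests until a None sentinel arrives."""
    while True:
        message = request_channel.get()
        if message is None:
            break
        reply = {
            "correlation_id": message["correlation_id"],
            "body": handler(message["body"]),
        }
        # The return address travels with the request, so the responder
        # needs no prior knowledge of who is asking.
        message["reply_to"].put(reply)


def send_request(body):
    """Enqueue a request and return (correlation_id, reply_channel) without blocking."""
    reply_channel = queue.Queue()
    correlation_id = str(uuid.uuid4())
    request_channel.put(
        {"correlation_id": correlation_id, "reply_to": reply_channel, "body": body}
    )
    return correlation_id, reply_channel


def request_reply_demo():
    worker = threading.Thread(target=responder, args=(str.upper,))
    worker.start()
    correlation_id, reply_channel = send_request("price check")
    # The requester is free to do other work here before collecting the reply.
    reply = reply_channel.get(timeout=2)
    request_channel.put(None)  # stop the responder
    worker.join()
    return correlation_id, reply
```

The correlation ID is what lets a requester with several requests in flight match each reply to the request that caused it.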

The shift to asynchronous messaging architectures is what made modern integration platforms scalable. When ten systems each call ten other systems synchronously, latency compounds and any single failure cascades. When those same systems communicate through a robust messaging system with built-in fault tolerance — dead letter queues, retry policies, idempotency keys — each runs at its own pace and failures stay isolated.

Integration Design Patterns for Event-Driven Systems

These are the enterprise integration patterns that matter most when you’re building on event streaming and real-time data pipelines. Each pattern solves a specific routing, transformation, or composition problem:

Publish-Subscribe. The most important enterprise messaging pattern for modern architectures. A producer publishes events to a topic. Any number of consumers subscribe and receive a copy. The producer doesn’t know or care who consumes the events. This is the pattern that enables decoupled, scalable data distribution — one event stream feeding dashboards, ML models, search indexes, and AI agents simultaneously.

Classic implementation: JMS topics with durable subscribers. Modern implementation: Kafka topics with consumer groups. The key difference is retention: Kafka keeps events for a configurable period (often days or weeks), so new consumers can replay history, whereas a classic broker discards a message once its subscribers have received it.
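
The retention difference is easier to see in a toy model. The sketch below is not Kafka’s API; it is a hypothetical `RetainedTopic` class that keeps an append-only log and a separate read offset per consumer group, which is the idea behind replayable pub-sub.

```python
class RetainedTopic:
    """Append-only log; each consumer group tracks its own read offset."""

    def __init__(self):
        self._log = []
        self._offsets = {}

    def publish(self, event):
        self._log.append(event)

    def poll(self, group, from_beginning=True):
        """Return events the group has not yet seen and advance its offset."""
        # A group seen for the first time starts at offset 0 (replay history)
        # or at the end of the log (new events only).
        if group not in self._offsets:
            self._offsets[group] = 0 if from_beginning else len(self._log)
        start = self._offsets[group]
        self._offsets[group] = len(self._log)
        return self._log[start:]
```

A group that subscribes after events were published still receives them, because reading never removes anything from the log. Real platforms add expiry, partitions, and durable storage on top of this idea.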

Content-Based Router. Inspects each message, evaluates business rules against its content, and routes it to different channels based on the result. An order event might route to the fraud detection pipeline if the amount exceeds $10,000, to the standard fulfillment pipeline otherwise. In event-driven architectures, this is implemented as branching logic in a stream processor — read from one topic, evaluate conditions, write to different output topics.
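
A minimal sketch of that routing rule, assuming order events are plain dictionaries with an `amount` field. The channel names `orders.fraud-review` and `orders.fulfillment` are made up for illustration; in a real stream processor they would be output topics.

```python
from collections import defaultdict

FRAUD_REVIEW_THRESHOLD = 10_000  # dollars, from the example above


def route_order(order):
    """Return the name of the output channel for an order event."""
    if order["amount"] > FRAUD_REVIEW_THRESHOLD:
        return "orders.fraud-review"
    return "orders.fulfillment"


def dispatch(orders):
    """Group orders by destination channel, as a stream processor would write them."""
    channels = defaultdict(list)
    for order in orders:
        channels[route_order(order)].append(order)
    return dict(channels)
```

Keeping the rule in a small pure function like `route_order` makes the routing logic testable without a running broker.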

Pipes and Filters. Decomposes a complex processing task into a sequence of independent stages, each performing a single transformation. An event enters the pipeline, passes through validation, enrichment, transformation, and aggregation stages, and exits as a processed result. Each stage is independently deployable and scalable. This is the architecture of every modern streaming data pipeline: a directed acyclic graph (DAG) of processing steps connected by message channels.
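
One way to picture the stages is as chained Python generators, where each generator is a filter and the iteration between them is the pipe. The field names (`user_id`, `amount`, `region`) and the `regions` lookup table are assumptions for this sketch, not part of any particular platform.

```python
def validate(events):
    """Drop events that lack a user or carry a non-positive amount."""
    for event in events:
        if "user_id" in event and event.get("amount", 0) > 0:
            yield event


def enrich(events, regions):
    """Attach a region looked up from a (hypothetical) reference table."""
    for event in events:
        yield {**event, "region": regions.get(event["user_id"], "unknown")}


def transform(events):
    """Add an integer cents field alongside the original amount."""
    for event in events:
        yield {**event, "amount_cents": round(event["amount"] * 100)}


def run_pipeline(events, regions):
    """Compose the stages; each one sees only the previous stage's output."""
    return list(transform(enrich(validate(events), regions)))
```

In a deployed pipeline each stage would read from and write to its own channel, which is what allows stages to scale and fail independently.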

Aggregator. Collects related messages and combines them into a single composite message. In streaming systems, this typically means windowed aggregations — collect all events within a 5-minute window and compute a summary. Critical for building real-time analytics, session tracking, and feature computation for ML models.
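
The windowing arithmetic can be sketched in a few lines. This version assumes events arrive as `(timestamp_seconds, key, amount)` tuples and processes a finished batch; a real stream processor would also deal with late events and window expiry, which are left out here.

```python
from collections import defaultdict

WINDOW_SECONDS = 300  # the 5-minute window from the text


def tumbling_window_totals(events):
    """Sum amounts per (window_start, key) for (timestamp, key, amount) events."""
    totals = defaultdict(float)
    for timestamp, key, amount in events:
        # Round the timestamp down to the start of its window.
        window_start = timestamp - (timestamp % WINDOW_SECONDS)
        totals[(window_start, key)] += amount
    return dict(totals)
```

An event at second 299 lands in the window starting at 0, while one at second 300 opens the next window.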

Splitter. The inverse of the aggregator — takes a single composite message and breaks it into individual messages. An order containing multiple line items becomes individual item events. A batch API response becomes individual records. Splitters are essential at the boundary between batch and streaming systems, decomposing bulk data into the granular events that streaming pipelines consume.
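
A small sketch of the order example, assuming an order is a dictionary with an `order_id` and a list of `items`. The `item_seq` and `item_count` fields are additions for this illustration.

```python
def split_order(order):
    """Yield one event per line item, carrying the parent order's identifiers."""
    items = order["items"]
    for position, item in enumerate(items, start=1):
        yield {
            "order_id": order["order_id"],
            "item_seq": position,
            "item_count": len(items),
            **item,
        }
```

Carrying the sequence number and total count on every item event lets a downstream aggregator tell when it has seen the whole order.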

Scatter-Gather. Broadcasts a request to multiple recipients, then aggregates their responses into a single result. In multi-agent architectures, this pattern enables parallel execution — fan a question out to multiple specialized agents, collect their answers, and synthesize a response. It’s also the pattern behind parallel enrichment: enrich an event by querying multiple services concurrently, then merge the results.
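
Here is one possible shape for scatter-gather using `asyncio` from the standard library. The recipients `research_agent` and `slow_agent` are hypothetical stand-ins for real services or agents, and the per-recipient timeout is one simple policy for partial failure among several reasonable ones.

```python
import asyncio


async def scatter_gather(request, recipients, timeout_seconds):
    """Send the request to every recipient concurrently.

    Returns (replies, failed_names); recipients that time out or raise
    end up in failed_names instead of aborting the whole gather.
    """

    async def call(name, handler):
        return name, await asyncio.wait_for(handler(request), timeout_seconds)

    outcomes = await asyncio.gather(
        *(call(name, handler) for name, handler in recipients.items()),
        return_exceptions=True,
    )
    replies = dict(o for o in outcomes if not isinstance(o, BaseException))
    failed = sorted(set(recipients) - set(replies))
    return replies, failed


# Hypothetical recipients for illustration only.
async def research_agent(question):
    await asyncio.sleep(0.01)
    return f"sources for: {question}"


async def slow_agent(question):
    await asyncio.sleep(1.0)
    return "too late"
```

Returning the failed names alongside the replies leaves the merge step free to decide whether a partial result is good enough.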

Data Integration Patterns

While messaging patterns move events between systems in real time, data integration patterns solve the broader problem of keeping data synchronized across multiple sources — operational databases, data warehouses, feature stores, search indexes, and other downstream data sources. These patterns determine how fresh, complete, and consistent your data is across the organization, and they shape the business processes that depend on that data.

Change Data Capture (CDC). Captures row-level changes (inserts, updates, deletes) from a source database and publishes them as events in standard data formats like Avro, Protobuf, or JSON. Instead of running a batch job that queries the full table every hour, CDC streams individual changes as they happen — typically by reading the database’s write-ahead log. The result: downstream systems see changes within seconds instead of hours.

CDC is the modern replacement for batch ETL in most data integration scenarios. It eliminates the staleness inherent in batch processing, reduces load on source databases (no more expensive full-table scans), and produces a stream of events that any number of downstream consumers can process. Tools like Debezium, Fivetran, and Airbyte have made CDC accessible to teams that previously relied on nightly batch jobs.
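
To make the consuming side concrete, here is a sketch of applying row-level change events to a downstream replica. The event shape (`op`, `key`, `after`) is a simplified assumption for this example and is not the exact format emitted by Debezium or any other tool.

```python
def apply_change(replica, change):
    """Apply one row-level change event to a replica keyed by primary key."""
    op, key = change["op"], change["key"]
    if op in ("insert", "update"):
        # Upsert, so replaying the same event twice leaves the same state.
        replica[key] = change["after"]
    elif op == "delete":
        replica.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")
    return replica


def apply_changes(replica, changes):
    """Apply a sequence of change events in order."""
    for change in changes:
        apply_change(replica, change)
    return replica
```

Treating inserts and updates as upserts, and tolerating deletes of missing rows, keeps the consumer safe under at-least-once delivery.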

Event Sourcing. Instead of storing the current state of an entity, store the complete sequence of events that produced that state. A customer account isn’t a row with a balance — it’s a sequence of deposits, withdrawals, and transfers. The current state of the domain model is derived by replaying the event log.

Event sourcing gives you a complete audit trail, the ability to reconstruct state at any point in time (similar to time travel queries), and the flexibility to build new projections from historical events. The tradeoff is complexity — rebuilding state from thousands of events requires careful design, and eventual consistency between the event log and materialized views demands robust handling.
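
The account example can be sketched as a fold over the event log. This assumes events are dictionaries ordered by an `at` timestamp with integer amounts in cents; the event type names are illustrative.

```python
def apply_event(balance, event):
    """Return the balance after one account event."""
    kind, amount = event["type"], event["amount"]
    if kind in ("deposit", "transfer_in"):
        return balance + amount
    if kind in ("withdrawal", "transfer_out"):
        return balance - amount
    raise ValueError(f"unknown event type: {kind}")


def balance_as_of(events, as_of=None):
    """Replay events in order; stop after timestamp as_of (inclusive) when given."""
    balance = 0
    for event in events:
        if as_of is not None and event["at"] > as_of:
            break
        balance = apply_event(balance, event)
    return balance
```

Passing `as_of` reconstructs the state at an earlier point in time from the same log. For long-lived entities, periodic snapshots are a common way to avoid replaying from the first event every time.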

Materialized View. Precompute and store query results derived from a source data stream. Rather than running an expensive join or aggregation at query time, maintain a continuously updated view that reflects the latest state. In streaming architectures, materialized views are built by stream processors that consume events and update a serving layer — a database table, a cache, or a search index.

This pattern is critical for data freshness. A materialized view backed by a streaming pipeline can be fresh to within seconds. The same view built by a batch job is only as fresh as the last run. For AI and ML use cases where feature freshness directly impacts model accuracy, streaming materialized views are a requirement, not an optimization.
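
A minimal sketch of incremental view maintenance, assuming order events carry a `customer_id` and an integer `amount`. The class name `OrderTotalsView` and the in-memory dictionary are stand-ins for a real serving layer.

```python
class OrderTotalsView:
    """Per-customer order count and total, updated one event at a time."""

    def __init__(self):
        self._rows = {}

    def apply(self, event):
        """Fold one order event into the precomputed row for its customer."""
        row = self._rows.setdefault(event["customer_id"], {"orders": 0, "total": 0})
        row["orders"] += 1
        row["total"] += event["amount"]

    def get(self, customer_id):
        # Reads are a key lookup; no join or aggregation runs at query time.
        return dict(self._rows.get(customer_id, {"orders": 0, "total": 0}))
```

The cost of the aggregation is paid once per event on the write path, so the view is as fresh as the last event applied.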

| Pattern | Latency | Complexity | Best For |
| --- | --- | --- | --- |
| Batch ETL | Hours | Low | Historical analytics, large backfills, compliance reporting |
| Change Data Capture | Seconds | Medium | Real-time replication, streaming analytics, fresh feature serving |
| Event Sourcing | Milliseconds (writes) | High | Audit trails, temporal queries, event-driven microservices |

Designing, Building, and Deploying Messaging Solutions

Designing, building, and deploying messaging solutions involves more than choosing a pattern — it requires getting the operational details right. Even a well-designed integration architecture fails in production when teams underestimate the work of operating it.

Topology design. How many topics or queues, what partitioning strategy, what message granularity? Topics that are too coarse-grained force consumers to filter every message; topics that are too fine-grained explode operational overhead. Most teams converge on event-type-per-topic with key-based partitioning, but the right answer depends on access patterns and consumer concurrency requirements. Production deployments commonly span multiple cloud services across regions, and the topology must account for cross-region replication and failover.

Schema and contract management. Producers and consumers need agreement on message format. Without enforced contracts, schema drift breaks consumers silently — a producer renames a field and downstream systems start dropping data. A schema registry combined with data contracts enforces compatibility at deployment time. Avro and Protobuf are the dominant data formats for streaming because they support schema evolution without breaking existing consumers.
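
The renamed-field failure can be caught with a very small check. The sketch below uses a deliberately simplified rule and a made-up schema representation (a dictionary of field name to `type` and optional `default`); real schema registries apply fuller rules, such as type promotion and aliases.

```python
def backward_compatibility_problems(old_schema, new_schema):
    """List reasons a reader on new_schema could not decode data written with old_schema."""
    problems = []
    for name, spec in new_schema.items():
        if name not in old_schema:
            # Old data has no value for this field, so the reader needs a default.
            if "default" not in spec:
                problems.append(f"new field '{name}' has no default")
        elif old_schema[name]["type"] != spec["type"]:
            problems.append(f"field '{name}' changed type")
    return problems
```

Run at deployment time, a check like this turns a silent rename into a failed build: the renamed field appears as a new field with no default.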

Error handling and partial failures. Production messaging implementations must handle the cases where things go wrong — a consumer crashes mid-message, a downstream API is rate-limited, an upstream producer emits a malformed payload. Dead letter queues capture unprocessable messages for inspection. Retry policies with exponential backoff handle transient failures. Idempotency keys ensure that retries don’t duplicate side effects. Building and deploying messaging solutions without these primitives means data loss in production.
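
The three primitives fit together in a single consumer loop. This is a sketch under simplifying assumptions: the processed-key set and dead letter list live in memory, every exception is treated as retryable, and `failing_first` is a hypothetical handler factory included only to exercise the logic.

```python
import time


def process_with_retry(message, handler, processed_keys, dead_letters,
                       max_attempts=3, base_delay=0.01, sleep=time.sleep):
    """Return 'duplicate', 'ok', or 'dead-lettered' for one message."""
    key = message["idempotency_key"]
    if key in processed_keys:
        return "duplicate"  # a redelivery; the side effect already happened
    for attempt in range(max_attempts):
        try:
            handler(message)
        except Exception as error:
            if attempt == max_attempts - 1:
                # Out of attempts: park the message for inspection.
                dead_letters.append({"message": message, "error": repr(error)})
                return "dead-lettered"
            sleep(base_delay * 2 ** attempt)  # exponential backoff
        else:
            processed_keys.add(key)
            return "ok"


def failing_first(times):
    """Hypothetical handler for the demo: fails `times` times, then succeeds."""
    remaining = {"failures": times}

    def handler(message):
        if remaining["failures"] > 0:
            remaining["failures"] -= 1
            raise RuntimeError("transient failure")

    return handler
```

In production the processed-key set would need to be durable and shared across consumer instances, and malformed payloads would usually skip the retries and go straight to the dead letter queue.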

Load balancing and consumer scaling. A single consumer rarely keeps up with a high-throughput stream. Consumer groups distribute partitions across multiple instances; load balancing across consumers happens automatically when partitioning is correct. The constraint: ordering is preserved only within a partition. If global ordering matters (rare, but real for some financial use cases), the topic is limited to a single partition, and therefore to a single active consumer per group, regardless of throughput needs.
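
Key-based partitioning and group assignment can be approximated as follows. The SHA-256 hash and round-robin assignment are stand-ins chosen for this sketch; real clients and brokers use their own hash functions and assignment strategies.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a key to a partition with a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def assign_partitions(num_partitions, consumers):
    """Spread partitions round-robin across the consumers in one group."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        assignment[consumers[partition % len(consumers)]].append(partition)
    return assignment
```

Because a given key always maps to the same partition, all events for one customer stay in order. With a single partition, only one consumer in the group receives any work, which is the throughput ceiling that global ordering imposes.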

Observability and operational visibility. Every production deployment of messaging solutions needs metrics on consumer lag, throughput per topic, and broker health. Without these, integration failures surface as downstream data freshness problems with no clear root cause. The major cloud services provide native observability for their managed messaging products, but the work of defining what to measure — and what alert thresholds map to real business outcomes — is owned by the integration team.

Designing, building, and deploying messaging solutions correctly is the difference between integration architecture that delivers business outcomes and integration that becomes its own operational burden. Modern enterprises running real-time data systems invest heavily in this layer because every dollar of decision quality depends on it.

Implementation Frameworks: Spring Integration, MuleSoft, Apache Camel

The enterprise integration patterns catalog is implementation-agnostic — the patterns describe what to build, not what to build it with. A handful of integration solutions emerged specifically to provide implementations of these patterns as composable building blocks: Spring Integration, MuleSoft, and Apache Camel. Each integration solution packages the integration design patterns from Hohpe and Woolf so developers configure integration logic rather than implement it from scratch.

Spring Integration. Built on the Spring Framework, Spring Integration provides Java-native implementations of the Hohpe-Woolf patterns as configurable components. Its message channels, routers, transformers, and aggregators connect through XML or annotation-based configuration. The model excels when integration logic lives inside Java applications: connecting internal services, transferring data between modules, or exposing application integration patterns through internal APIs. It is particularly common in financial services and enterprise Java shops, where its tight integration with the rest of the Spring ecosystem reduces the cognitive overhead of mixing integration concerns with business logic.

MuleSoft (Anypoint Platform). A commercial integration platform that packages connectors for hundreds of SaaS and on-premise systems. MuleSoft positions itself as the modern descendant of the classic ESB — centralized integration tooling, but built on contemporary cloud services rather than legacy middleware. MuleSoft fits organizations with heterogeneous integration types (Salesforce ↔ NetSuite ↔ internal databases ↔ partner web services) that need a managed integration solution rather than per-integration custom code.

Apache Camel. An open-source integration framework that implements the Hohpe-Woolf patterns as a Domain-Specific Language. Camel routes describe how messages flow between systems using a fluent API. Camel runs embedded in applications, as standalone services, or on Kubernetes via Camel-K, and supports a long list of integration technologies — JMS, AMQP, Kafka, HTTP, file, FTP, JDBC, and dozens more — out of the box. Apache Camel fits teams that want declarative integration logic without committing to a specific commercial integration platform vendor.

These integration solutions solve a real problem — they reduce the cost of implementing enterprise integration patterns from “build it yourself” to “configure a known component.” What they don’t solve is the architectural gap between integration patterns running in framework code and the underlying data freshness requirements of modern real-time distributed applications. A Spring Integration route can move data from Kafka to a database, but it doesn’t change the freshness of that data once it lands. The framework handles the routing; the architectural decisions about where data lives and how fresh it stays are upstream of these tools.

From ESB to Event Streaming: How Integration Evolved

Understanding where enterprise integration patterns came from helps explain where they’re going.

Point-to-point integration (1990s). Each system connected directly to every other system it needed data from. N systems required up to N(N-1)/2 connections. This worked at small scale but became unmanageable as organizations added systems. Every new connection meant custom code, and changes to one system cascaded to every system connected to it.

Enterprise Service Bus (2000s). The ESB centralized integration logic and consolidated multiple integration technologies — message brokers, transformation engines, adapters for legacy and external systems — into a single middleware layer. Systems connected to the bus, and the bus handled routing, transformation, and orchestration. This solved the point-to-point problem but created a new one: the centralized middleware became a monolithic bottleneck. All business logic for integration lived in one place, owned by one team, deployed as one artifact. Changes were slow. Scaling was expensive. The bus was a single point of failure for how data moves through the organization.

Event streaming (2010s–present). Apache Kafka and similar platforms introduced a fundamentally different model: a distributed, persistent log of events. Producers write events to topics. Consumers read from topics at their own pace. The platform handles durability, ordering, and scalability. Integration logic moves out of a central bus and into the producers and consumers themselves — each team owns its own event processing.

This shift didn’t make enterprise integration patterns obsolete. It made them more accessible. The patterns — routing, transformation, aggregation, pub-sub — are the same. But instead of configuring how data moves through a monolithic ESB, teams implement these flows in stream processors, microservices, and data pipelines that they own and deploy independently.

The current frontier is the convergence of event streaming with real-time analytics and AI: systems that can ingest events, process them through integration patterns, and serve the results with millisecond latency, not just to dashboards but to AI agents making real-time decisions. The integration design patterns stay the same. The performance requirements and the consumers have changed.

Enterprise Integration Patterns for AI Agents

AI agents are integration consumers. When an agent queries a database, calls an API, retrieves context from a vector store, and takes an action, it’s executing a chain of integration patterns. Understanding this reframes enterprise integration from a “backend plumbing” concern into a critical component of AI system design.

Agents as event consumers. An agent that subscribes to an event stream receives real-time context as events happen — a new customer order, a price change, a fraud alert. This is the publish-subscribe pattern applied to AI. Instead of an agent polling a database every 30 seconds (and getting stale data between polls), the agent consumes a continuous stream of events. Every decision is based on the latest available information.

Request-Reply for tool use. When an agent calls a tool — a database query, an API, a calculator — it’s executing the request-reply pattern. The agent sends a request, waits for a response, and incorporates the result into its reasoning. The integration design challenge is latency: if the tool call takes 5 seconds because the underlying data system is slow, the agent’s overall response time degrades. This is why low-latency data serving matters for agent architectures.

Scatter-Gather for multi-agent coordination. A coordinator agent fans a question out to multiple specialized agents — a research agent, a calculation agent, a compliance agent — and aggregates their responses. This is the scatter-gather pattern. The integration challenge is handling partial failures (what if one agent times out?) and merging heterogeneous responses into a coherent result.

Content-Based Routing for agent dispatch. A router examines an incoming request and dispatches it to the most appropriate agent based on content — customer service questions go to the support agent, technical questions go to the engineering agent, billing questions go to the finance agent. This is content-based routing applied to multi-agent systems.

The common thread: agents need fresh, reliable data delivered through well-designed integration patterns. An agent consuming stale data from a poorly designed integration pipeline makes stale decisions — confidently, at scale, with no indication that anything is wrong. The enterprise integration patterns that ensure data freshness and data quality in traditional architectures are even more critical when the consumer is an autonomous agent.

Choosing the Right Integration Pattern

The right enterprise integration pattern depends on your latency requirements, coupling tolerance, business needs, and the nature of the data flow. Here’s a decision framework for the most common integration scenarios:

| Integration Scenario | Recommended Pattern | Key Tradeoff |
| --- | --- | --- |
| Sync operational DB to analytics warehouse | Change Data Capture (CDC) | Low latency, but requires schema compatibility and careful handling of deletes |
| Distribute events to multiple independent consumers | Publish-Subscribe | Maximum decoupling, but consumers must handle out-of-order delivery and idempotency |
| Route events to different processing pipelines by type | Content-Based Router | Flexible routing logic, but routing rules can become complex and hard to test |
| Build real-time features for ML models | Streaming Materialized View | Sub-second freshness, but requires stream processing infrastructure and state management |
| Process events through multiple transformation stages | Pipes and Filters | Each stage is independently deployable, but end-to-end latency increases with each hop |
| Aggregate responses from multiple services or agents | Scatter-Gather | Parallel execution reduces total latency, but partial failure handling adds complexity |
| Maintain complete audit trail of state changes | Event Sourcing | Full history and temporal queries, but higher storage cost and rebuild complexity |
| Move large datasets on a schedule | Batch ETL (File Transfer) | Simple and well-understood, but stale by design: only as fresh as the last run |

In practice, most modern architectures use multiple patterns together. A typical real-time data platform might use CDC to capture changes from operational databases, pipes-and-filters to transform and enrich the events, publish-subscribe to distribute them to multiple consumers, and materialized views to serve the results with low latency.

The integration design pattern you choose also determines your data freshness profile. Batch ETL gives you freshness measured in hours. CDC gives you seconds. Streaming materialized views give you sub-second. If your downstream consumer is a dashboard refreshed daily, batch is fine. If it’s an AI agent making real-time fraud decisions, anything less than CDC with streaming materialized views is a liability.

One principle that holds across all enterprise integration patterns: favor asynchronous, event-driven communication over synchronous RPC wherever possible. Synchronous calls create tight coupling, cascading failures, and latency dependencies. Asynchronous messaging, built on the patterns described above, gives you loose coupling, independent scaling, and resilience. The systems that need real-time data get it through event streams, not through chains of synchronous API calls.

The Tacnode Approach: Integration Patterns at Streaming Speed

Most organizations implement enterprise integration patterns by stitching together multiple systems — a CDC tool to capture changes, a streaming platform to move them, a stream processor to transform them, a serving layer to query them. Each hop adds latency, operational complexity, and freshness risk.

The Tacnode Context Lake collapses this stack into a single platform that implements the critical integration patterns natively:

Pub-sub ingestion. Events stream in from any source — CDC connectors, APIs, application events — and are immediately available to all consumers. No separate streaming platform required.

Real-time transformation and materialization. Integration logic — filtering, enrichment, aggregation — runs as continuous queries inside the platform. Results materialize into queryable tables with sub-second freshness.

Instant queryability. Integrated data is queryable the moment it arrives. No waiting for batch jobs, no stale caches, no pipeline delays. When an AI agent or ML model needs context, it queries current reality.

Data contracts at the boundary. Schema validation and quality checks enforce data quality at ingestion time — before bad data enters the integration pipeline.

For teams building real-time AI applications, this means the enterprise integration patterns they need — CDC, pub-sub, materialized views, content-based routing — work at the speed their agents and models require, without the operational burden of managing five separate systems.

Key Takeaways

Enterprise integration patterns are a proven vocabulary for solving the recurring challenges of connecting systems and synchronizing data. They were codified two decades ago, but the problems they solve — routing, transformation, aggregation, decoupling — are more relevant now than ever.

Messaging is the dominant integration style. Publish-subscribe, content-based routing, pipes-and-filters, and aggregation are the patterns that power modern event-driven and streaming architectures. The shift from ESBs to event streaming platforms changed the implementation, not the patterns.

Data integration patterns — CDC, event sourcing, and materialized views — determine how fresh, complete, and consistent your data is across systems. CDC has replaced batch ETL as the standard for real-time data synchronization.

AI agents are integration consumers. Every pattern described in this guide — pub-sub for real-time context, request-reply for tool use, scatter-gather for multi-agent coordination, content-based routing for agent dispatch — applies directly to the design of agent architectures. Agents that consume data through well-designed integration patterns make better decisions. Agents that consume stale, poorly integrated data make confident mistakes.

The patterns are stable. The implementations keep evolving. Choose the integration design patterns that match your latency requirements, coupling tolerance, and consumer needs — and build them on infrastructure that can deliver the freshness your most demanding consumers require.

Written by Boyd Stowe

Former Couchbase and IBM. Two decades helping enterprises adopt new database paradigms.