Growth | January 26, 2023

CDPs: Streaming vs batch?

Many Customer Data Platform (CDP) vendors describe themselves as real-time, but not all of them support the real-time use cases that businesses want to execute. This article breaks down the technical capabilities required to support real-time use cases and shares tips on how to identify the right CDP for your needs.


Ever since Apache Kafka was open-sourced in January 2011, businesses have been implementing real-time architectures to power their systems and applications. In contrast to batch pipelines, real-time data streaming continuously ingests, processes, and manages streams of data from one or more sources. But while real-time data was once an innovation, it’s now a necessity. As consumers have come to expect real-time experiences such as triggered transactional messages, adaptive personalization, and instantaneous video content, it’s become imperative for businesses to implement real-time data systems.

The increased desire for real-time has prompted many SaaS vendors to design real-time solutions for customers. As the adage goes, ‘during a gold rush, sell shovels.’ But when everyone is in a rush to dig gold, you can’t always trust the shovel salesmen. To capitalize on market interest in real-time data, many vendors describe their offerings as “real-time” even though they can’t process data in real time from end to end. This is particularly true in the Customer Data Platform market, where numerous vendors label their solutions “real-time” even though only some of their features process data continuously. As a result, the term “real-time” has been reduced from a technical feat to a ubiquitous marketing buzzword, and buyers looking for solutions to support real-time use cases are finding it difficult to evaluate the market.

This article explains what real-time streaming is, how real-time data can be used to drive business results, and where many offerings labeled as “real-time” fall short. Our aim is to make it easier for buyers to navigate the market, avoid common traps, and choose the right solutions for their needs.

Not all real-time systems are real-time

While the concept of real time has been interpreted liberally, for the sake of argument we will define real time as a latency of 50 milliseconds or less. Real-time data streaming is the process of ingesting and processing a continuous stream of data. The key word here is continuous. In batch processing, data must be ingested and stored before it can be processed in groups; in stream processing, data is analyzed continuously as it is ingested.
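To make the distinction concrete, here is a minimal Python sketch (not any vendor’s implementation) that computes the same per-user event count in both styles. The event shape, a dict with a hypothetical `user_id` field, is an assumption for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable


def batch_counts(events: Iterable[dict]) -> Dict[str, int]:
    """Batch: store every event first, then process the stored group at once."""
    stored = list(events)  # ingestion must finish before processing can start
    counts: Dict[str, int] = defaultdict(int)
    for event in stored:
        counts[event["user_id"]] += 1
    return dict(counts)


class StreamingCounter:
    """Streaming: update state incrementally as each event is ingested."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = defaultdict(int)

    def on_event(self, event: dict) -> int:
        # The count for this user is current the moment the event arrives.
        self.counts[event["user_id"]] += 1
        return self.counts[event["user_id"]]


events = [{"user_id": "a"}, {"user_id": "b"}, {"user_id": "a"}]
print(batch_counts(events))  # {'a': 2, 'b': 1}, available only after all events are stored
counter = StreamingCounter()
print([counter.on_event(e) for e in events])  # [1, 1, 2], updated per event
```

Both approaches reach the same final totals; the difference is that the streaming version has an up-to-date answer after every single event, while the batch version has nothing until the whole group has been collected.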

There are four phases of the data lifecycle: ingestion, processing, access, and egress. To support real-time use cases, such as keeping 360-degree customer profiles up to date continuously and delivering adaptive customer experiences, every stage of the lifecycle needs to be performed in real time. In other words, the system is only as fast as its weakest (or slowest) link: if any of these stages is batch-based, the system, by definition, is not real-time.

  • Ingestion: Data is collected from multiple sources, such as mobile apps, websites, OTT platforms, and server-side environments. Real-time data ingestion requires data to be collected from all sources continuously.
  • Processing: Data is arranged, unified, sorted, and transformed to make it valuable to downstream systems. Real-time processing requires data to be transformed instantly as it is ingested into the system, rather than on a regular schedule.
  • Access: Data is retrieved, reviewed, queried, and/or moved by data-consuming teams, such as Marketing and Product. For data access to be real-time, actionable data needs to be available to users within milliseconds of ingestion, and all previously defined profiles, reports, and queries must be updated continuously as new data is ingested.
  • Egress: Data is streamed from a database to external systems. Real-time data egress requires that raw data, customer profiles, and audience list updates be shared with external systems dynamically. 
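The weakest-link argument can be expressed as simple arithmetic: a batch stage can hold an event for up to one full batch interval, so the worst-case end-to-end delay is roughly the sum of each stage’s processing time plus any batch intervals. The sketch below uses hypothetical per-stage latencies chosen for illustration, together with the 50 ms threshold defined above.

```python
from dataclasses import dataclass
from typing import List

REAL_TIME_BUDGET_MS = 50.0  # the working definition of "real time" used in this article


@dataclass
class Stage:
    name: str
    processing_ms: float            # time to handle one event once the stage runs
    batch_interval_ms: float = 0.0  # 0 means the stage runs continuously

    def worst_case_ms(self) -> float:
        # A batch stage can hold an event for a full interval before it runs.
        return self.batch_interval_ms + self.processing_ms


def end_to_end_worst_case_ms(stages: List[Stage]) -> float:
    return sum(stage.worst_case_ms() for stage in stages)


def slowest_stage(stages: List[Stage]) -> Stage:
    return max(stages, key=lambda stage: stage.worst_case_ms())


def is_real_time(stages: List[Stage], budget_ms: float = REAL_TIME_BUDGET_MS) -> bool:
    return end_to_end_worst_case_ms(stages) <= budget_ms


# Hypothetical per-stage latencies, for illustration only.
streaming = [Stage("ingestion", 5), Stage("processing", 10),
             Stage("access", 5), Stage("egress", 10)]
hourly_processing = [Stage("ingestion", 5), Stage("processing", 10, 3_600_000),
                     Stage("access", 5), Stage("egress", 10)]

print(end_to_end_worst_case_ms(streaming), is_real_time(streaming))  # 30.0 True
print(end_to_end_worst_case_ms(hourly_processing),
      slowest_stage(hourly_processing).name)  # 3600030.0 processing
```

Switching just one stage to an hourly batch moves the worst case from 30 ms to roughly an hour, regardless of how fast the other three stages are.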

As data moves through these steps, the viability of a real-time system depends on every step being performed continuously. Again, the end-to-end process is only as strong as the weakest link in the chain. If a system can perform access and egress in real time but can’t ingest and process data in real time, for example, it will not be able to support real-time use cases. Such a system would allow marketers to serve an ad to a customer in real time, but would prevent them from personalizing that experience successfully, because the experiences being delivered are not up to date with the customer’s most recent actions and consent preferences. In this instance, an ad may be served in real time, but it may be delivered just after the customer has bought the product, or worse, just after they’ve opted out of advertising. For a system to successfully support real-time use cases, every stage of the data lifecycle must run in real time.
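The stale-ad scenario can be modeled with a small helper that checks whether an event is reflected in the profile by the time a decision is made. This is a simplified sketch. It assumes that batch runs happen at fixed multiples of their interval and that each run picks up every event that occurred before it. The timings are hypothetical.

```python
import math
from typing import Optional


def visible_at(event_time_s: float, decision_time_s: float,
               batch_interval_s: Optional[float]) -> bool:
    """Is an event reflected in the profile when a decision is made?

    batch_interval_s=None models continuous (streaming) ingestion. Otherwise we
    assume batch runs happen at multiples of the interval and an event lands at
    the first run strictly after it occurred.
    """
    if batch_interval_s is None:
        return event_time_s <= decision_time_s
    next_run = (math.floor(event_time_s / batch_interval_s) + 1) * batch_interval_s
    return next_run <= decision_time_s


def should_serve_product_ad(purchase_time_s: float, decision_time_s: float,
                            batch_interval_s: Optional[float]) -> bool:
    # Suppress the ad only if the purchase has already reached the profile.
    return not visible_at(purchase_time_s, decision_time_s, batch_interval_s)


print(should_serve_product_ad(600, 720, None))   # False: streaming suppresses the ad
print(should_serve_product_ad(600, 720, 3600))   # True: hourly batch still serves it
```

Here the purchase happens at 600 s and the ad decision follows two minutes later. Because the access and egress stages are fast in both cases, the outcome depends entirely on how the purchase was ingested.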

How does real-time architecture deliver business value?

Having real-time data streaming in place allows you to increase operational efficiency and deliver experiences that meet modern customer expectations. There are three primary use cases through which real-time streaming creates business value: integrating data across systems, continuously keeping customer profiles up-to-date, and powering personalized experiences that are up-to-date in real time.

1. Real-time customer data pipelines help save the company time and money

The advantage of a customer data pipeline over a generalized real-time data pipeline is that it addresses the nuances specific to customer data. Generalized pipelines may offer greater configurability, while a customer data pipeline’s opinionated approach provides simplicity through a well-defined happy path. Either one, however, will serve real-time use cases better than a batch-based pipeline.

As teams adopt numerous best-in-class marketing tech applications, powering those tools with high-quality, real-time data from various sources and systems is critical to maximizing their value. It therefore becomes essential to provide dial-tone reliability, ensuring that no latency or data loss is introduced into the value chain. With the ability to collect, transform, validate, and govern data in real time, data consumers can access customer data when they need it. Speed thus becomes a competitive advantage: teams can operate with greater agility and capitalize on opportunities as they happen. Additionally, data producers can implement data collection once and reduce overhead and maintenance costs.

With high-quality, real-time data available across the marketing tech stack, it also becomes easier to govern data flows and protect against potential privacy and compliance violations, in a way that batch systems are not designed for.
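Per-event handling of this kind can be sketched as a chain of small functions applied to each event as it arrives. The schema (`REQUIRED_FIELDS`), the allow-list of event names (`ALLOWED_EVENTS`), and the normalization rule are hypothetical placeholders for whatever data plan a team defines. To keep things simple, this sketch drops rejected events, whereas a production pipeline would more likely quarantine or report them.

```python
from typing import Optional

# Hypothetical data plan: required fields and their expected types.
REQUIRED_FIELDS = {"user_id": str, "event_name": str, "timestamp_ms": int}
# Hypothetical governance rule: event names this pipeline may forward downstream.
ALLOWED_EVENTS = {"page_view", "add_to_cart", "purchase"}


def transform(event: dict) -> dict:
    # Normalize the event name so every downstream tool sees one spelling.
    out = dict(event)
    if isinstance(out.get("event_name"), str):
        out["event_name"] = out["event_name"].strip().lower()
    return out


def validate(event: dict) -> bool:
    # Every required field must be present with the expected type.
    return all(isinstance(event.get(name), kind) for name, kind in REQUIRED_FIELDS.items())


def govern(event: dict) -> bool:
    # Only events on the allow-list are forwarded.
    return event["event_name"] in ALLOWED_EVENTS


def process(event: dict) -> Optional[dict]:
    """Run one event through transform -> validate -> govern as it arrives."""
    event = transform(event)
    if not validate(event) or not govern(event):
        return None  # this sketch simply drops rejected events
    return event


print(process({"user_id": "u1", "event_name": " Purchase ", "timestamp_ms": 1}))
```

Because each event is checked on arrival, a malformed or disallowed event is stopped before it reaches any downstream tool, rather than being discovered in the next batch audit.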

2. A continuous 360-degree customer view is the foundation for success

High-quality customer profiles are the foundation for effective personalization. And the most valuable view of the customer is based on their most recent interactions, independent of source or channel.

Real-time identity resolution architecture allows teams to resolve cross-channel data to 360-degree customer profiles as it is ingested. These profile updates can be used to inform targeting and personalization, improving the customer experience. For example, if a user views a product on your website and then later in the day purchases the same product in your mobile app, resolving cross-channel data in real time will allow you to view the entire customer journey and inform follow-up actions such as confirmation emails, future product recommendations, and ad targeting.
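Deterministic identity resolution is often modeled as a union-find (disjoint-set) structure over identifiers. The sketch below uses that textbook technique as an illustration. It is not mParticle’s identity algorithm, and the identifier formats (`cookie:abc`, `email:ana@example.com`, `device:xyz`) are hypothetical.

```python
from typing import Dict, List


class IdentityGraph:
    """Minimal union-find over identifiers such as cookie IDs, emails, and device IDs."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, identifier: str) -> str:
        self.parent.setdefault(identifier, identifier)
        # Path halving keeps lookups fast as the graph grows.
        while self.parent[identifier] != identifier:
            self.parent[identifier] = self.parent[self.parent[identifier]]
            identifier = self.parent[identifier]
        return identifier

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def ingest(self, identifiers: List[str]) -> str:
        """Link all identifiers seen on one event; return the resolved profile key."""
        first = identifiers[0]
        for other in identifiers[1:]:
            self.union(first, other)
        return self.find(first)


graph = IdentityGraph()
graph.ingest(["cookie:abc", "email:ana@example.com"])  # web product view, logged in
graph.ingest(["device:xyz", "email:ana@example.com"])  # later app purchase
print(graph.find("cookie:abc") == graph.find("device:xyz"))  # True
```

In this example, the web event carries a cookie ID plus a logged-in email, and the later app purchase carries a device ID plus the same email. Both events resolve to one profile as soon as the second event is ingested, so follow-up actions can use the full journey.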

As more consumer privacy regulations are introduced and enforced around the world, solving for data privacy is becoming increasingly important. Real-time data governance and consent management solutions allow teams to keep customer profiles up to date as users change their consent preferences. These profile updates can then keep customer experiences in line with users’ consent preferences as those preferences change. Without the ability to collect and apply consent updates in real time, brands risk engaging with customers immediately after they’ve opted out, a mistake that is sure to break customer trust irreparably.
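Here is a minimal sketch of applying consent updates as they arrive and checking them before any engagement. It assumes consent is tracked per hypothetical purpose (such as `"advertising"`) and that missing consent is treated as a refusal. Real consent frameworks define their own purposes and defaults.

```python
from typing import Dict


class ConsentStore:
    """Keeps each profile's consent state current as consent events arrive."""

    def __init__(self) -> None:
        # user_id -> {purpose: granted}; purpose names are hypothetical.
        self.consent: Dict[str, Dict[str, bool]] = {}

    def on_consent_event(self, user_id: str, purpose: str, granted: bool) -> None:
        # Apply the change the moment it is ingested, not at the next batch run.
        self.consent.setdefault(user_id, {})[purpose] = granted

    def may_engage(self, user_id: str, purpose: str) -> bool:
        # Assumed default: no recorded consent means no engagement.
        return self.consent.get(user_id, {}).get(purpose, False)


store = ConsentStore()
store.on_consent_event("u1", "advertising", True)
store.on_consent_event("u1", "advertising", False)  # user opts out
print(store.may_engage("u1", "advertising"))  # False
```

Because the opt-out is applied on ingestion, any ad or email decision made after it sees the updated preference.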

3. Real-time personalization allows for rapid iteration and accelerated growth 

The most impactful application of real-time data is delivering personalized experiences that are up to date with users’ interests and preferences at any given point in time. Powering personalization with real-time data has been shown to increase customer engagement, customer lifetime value, and ROI. Several personalization use cases in particular benefit from being powered with real-time data.

As users browse products and services on your website and mobile apps, delivering triggered experiences is a great way to maximize engagement. If a user adds a product to their cart but ends the session without checking out, for example, triggering a cart reminder email in real time based on the session end is a great way to reduce cart abandonment. Fail to trigger this reminder email in real time and deliver it an hour later instead, and your potential customer may have already purchased a product from your competitor. Triggered experiences are also valuable after purchase to confirm orders, provide delivery updates (particularly important for food delivery experiences), and alert customers to fraudulent activity.
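The session-end trigger can be sketched as a small event handler. The event names (`add_to_cart`, `purchase`, `session_end`) and the `send_email` callback are hypothetical stand-ins for a real event schema and messaging integration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Session:
    cart: Set[str] = field(default_factory=set)
    purchased: bool = False


class CartAbandonmentTrigger:
    def __init__(self, send_email: Callable[[str, List[str]], None]) -> None:
        self.sessions: Dict[str, Session] = {}
        self.send_email = send_email  # hypothetical messaging integration

    def on_event(self, user_id: str, name: str, product_id: Optional[str] = None) -> None:
        session = self.sessions.setdefault(user_id, Session())
        if name == "add_to_cart" and product_id:
            session.cart.add(product_id)
        elif name == "purchase":
            session.purchased = True
        elif name == "session_end":
            # React the moment the session ends instead of waiting for a batch job.
            if session.cart and not session.purchased:
                self.send_email(user_id, sorted(session.cart))
            del self.sessions[user_id]


trigger = CartAbandonmentTrigger(lambda user, items: print("remind", user, items))
trigger.on_event("u1", "add_to_cart", "sku-1")
trigger.on_event("u1", "session_end")  # prints: remind u1 ['sku-1']
```

A session that ends with a purchase, or with an empty cart, sends nothing. The handler keeps only open-session state, so memory use is proportional to the number of active sessions.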

Real-time data is also valuable for personalizing website and app experiences. As users engage, leveraging behavioral data to tailor website banners, content cards, and product recommendations in real time is a fantastic way to increase engagement. And as product stock changes over time, updating recommendations based on what’s available for purchase at any point in time increases the chances that customers will be able to purchase successfully. 
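One simple way to keep recommendations in line with stock is to apply inventory updates as they stream in and filter a precomputed ranking at request time. The ranking input and the method names below are hypothetical, and how the ranking itself is produced is outside the scope of this sketch.

```python
from typing import Dict, List


class InventoryAwareRecommender:
    def __init__(self, ranked_candidates: Dict[str, List[str]]) -> None:
        # Hypothetical precomputed ranking per user, e.g. derived from browsing behavior.
        self.ranked = ranked_candidates
        self.stock: Dict[str, int] = {}

    def on_stock_update(self, product_id: str, quantity: int) -> None:
        # Inventory changes are applied as they stream in.
        self.stock[product_id] = quantity

    def recommend(self, user_id: str, k: int = 3) -> List[str]:
        # Filter at request time so out-of-stock items are never shown.
        in_stock = [p for p in self.ranked.get(user_id, []) if self.stock.get(p, 0) > 0]
        return in_stock[:k]


recs = InventoryAwareRecommender({"u1": ["a", "b", "c", "d"]})
for product in "abcd":
    recs.on_stock_update(product, 5)
recs.on_stock_update("b", 0)  # "b" sells out
print(recs.recommend("u1"))  # ['a', 'c', 'd']
```

Products with no known stock count are treated as unavailable, which errs on the side of not recommending items customers can’t buy.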

When users engage across channels, personalizing the experience in one channel based on actions taken in another channel in real time is essential for delivering a consistent customer experience. When a customer places a food pickup order on web, for example, delivering mobile messaging to confirm their order and update on order status throughout the pickup experience increases the chances of a successful collection. On the flip side, customer data can be utilized cross-channel to suppress personalized offers based on user engagements in real time. For example, once a customer makes a purchase, real-time audience updates will instantaneously stop ads and emails offering that product from being sent to the customer.
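The suppression example can be sketched as an audience that is updated per event. The event names and the `on_change` hook, which stands in for syncing membership changes to downstream ad and email tools, are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Set, Tuple


class AudienceManager:
    """Per-product audiences of users who viewed a product but have not bought it."""

    def __init__(self, on_change: Callable[[str, str, str], None]) -> None:
        self.audiences: Dict[str, Set[str]] = defaultdict(set)
        self.purchased: Set[Tuple[str, str]] = set()
        self.on_change = on_change  # hypothetical egress hook to ad/email tools

    def on_event(self, user_id: str, name: str, product_id: str) -> None:
        audience = self.audiences[product_id]
        if name == "product_view":
            # Buyers are never re-added to the audience for a product they own.
            if (user_id, product_id) not in self.purchased and user_id not in audience:
                audience.add(user_id)
                self.on_change(product_id, user_id, "add")
        elif name == "purchase":
            self.purchased.add((user_id, product_id))
            if user_id in audience:
                # Remove the buyer immediately so offers for this product stop.
                audience.discard(user_id)
                self.on_change(product_id, user_id, "remove")

    def is_targeted(self, user_id: str, product_id: str) -> bool:
        return user_id in self.audiences.get(product_id, set())


manager = AudienceManager(lambda *change: print("sync", change))
manager.on_event("u1", "product_view", "sku-1")  # sync ('sku-1', 'u1', 'add')
manager.on_event("u1", "purchase", "sku-1")      # sync ('sku-1', 'u1', 'remove')
```

Emitting a change only when membership actually changes keeps the downstream sync traffic proportional to real audience updates.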


When evaluating whether or not a tool can support real-time use cases, it’s important to confirm that it’s able to perform all stages of the data lifecycle in real time. 

Often, tools labeled as “real-time” can perform certain functions in real time but cannot process data in real time across the entire lifecycle. For example, they may be able to stream data to external systems in real time but unable to ingest data from multiple sources in real time. Such solutions cause problems for buyers because they make it impossible to deliver personalized customer experiences that adapt as customers engage over time.

For a solution to support real-time use cases, such as keeping customer profiles up-to-date continuously and delivering adaptive customer experiences, all stages of the data lifecycle—ingestion, processing, access, and egress—need to be performed in real time.
