mParticle Product—September 30, 2019

Real-time event processing with Kafka

Learn how mParticle's Kafka integration can help you stream customer data to systems and applications with event data forwarding, advanced filtering and compliance, distributed event notification, and event sourcing.

Overview

As consumers engage with digital properties, hundreds of billions of events are produced. These events can trigger workflows in any number of applications and systems. Processing events between services at this scale cannot be left to traditional data pipelines, which are orchestrated around a request-response model.

Instead, enterprise data architects are increasingly adopting a data streaming platform, like Apache Kafka, to publish and subscribe to streams of records in a way that resembles a message queue or enterprise messaging system. This approach lets data be streamed at scale and in real time while remaining fault tolerant.

mParticle’s integration with Kafka publishes customer event data from mParticle into Kafka, enabling systems and applications to subscribe to real-time customer event data and react to streams of incoming user events.

Support for event-driven architectures

Using mParticle’s integration with Kafka, enterprise architects can easily scale their customer data for the following use cases:

Event data forwarding
mParticle captures user engagement data across your entire digital stack, and you can set it up to forward that data automatically to Kafka as events in standard JSON format. All mParticle-generated events are forwarded into a Kafka topic and are assigned an mParticle user ID as the partition key. This ensures that each user’s events are sent to the same partition and are received in order, which supports durability and replayability. This automated event data pipeline removes manual processes and keeps the customer data in your Kafka instances up to date.
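To illustrate why keying by user ID preserves per-user ordering, here is a minimal sketch of key-based partitioning. It is not mParticle's or Kafka's actual code: Kafka's default partitioner hashes the key bytes with murmur2, and CRC32 stands in here only because it ships with Python. The `mpid` field name, the partition count, and the sample events are assumptions made for illustration.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 6  # hypothetical partition count for the topic


def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key to a partition index.

    CRC32 is a stable stand-in for Kafka's murmur2-based key hashing.
    """
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


def route(events):
    """Append each event to the partition selected by its user ID."""
    partitions = defaultdict(list)
    for event in events:
        partitions[partition_for(event["mpid"])].append(event)
    return partitions


# Hypothetical events; "mpid" is an assumed name for the user ID field.
events = [
    {"mpid": "user-1", "event": "app_open", "seq": 1},
    {"mpid": "user-2", "event": "app_open", "seq": 1},
    {"mpid": "user-1", "event": "add_to_cart", "seq": 2},
    {"mpid": "user-1", "event": "purchase", "seq": 3},
]
routed = route(events)
```

Because the hash depends only on the key, every event for `user-1` lands in the same partition in the order it was produced, even when other users' events are interleaved.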

Advanced filtering and compliance
mParticle provides a simple way to control the flow of data to your Kafka instances. This advanced filtering ensures that you send only the most pertinent customer data and helps curb the costs of importing unnecessary data. This granular control over event data forwarding also helps you comply with data privacy regulations. mParticle keeps your entire data ecosystem compliant with GDPR data subject rights by managing and fulfilling your data subject deletion requests. Using this integration, companies can make the most of their customer data while respecting their customers' privacy.
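The filtering idea can be sketched as a function that runs before forwarding. This is an illustration only and does not reflect how mParticle's filters are implemented or configured; the allow-list, the blocked attribute names, and the event shape are all hypothetical.

```python
ALLOWED_EVENT_TYPES = {"purchase", "add_to_cart"}  # hypothetical allow-list
BLOCKED_ATTRIBUTES = {"email", "phone"}  # hypothetical attributes to withhold


def filter_event(event):
    """Return a copy of the event that is safe to forward, or None to drop it."""
    if event["event_type"] not in ALLOWED_EVENT_TYPES:
        return None
    attributes = {
        key: value
        for key, value in event.get("attributes", {}).items()
        if key not in BLOCKED_ATTRIBUTES
    }
    return {**event, "attributes": attributes}


def forwardable(events):
    """Yield only the events that pass the filter, with blocked fields removed."""
    for event in events:
        filtered = filter_event(event)
        if filtered is not None:
            yield filtered
```

Dropping whole event types reduces volume, while stripping individual attributes limits which personal data reaches the downstream system.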

Distributed event notifications
Once event data is streamed from mParticle into Kafka topics, you can update distributed downstream systems and applications whenever a specified event occurs, so that they can react to incoming user events in real time. The Kafka topics are available for subscription for a range of use cases, including real-time processing, real-time monitoring, and loading into Hadoop or offline data warehousing systems for offline processing and reporting.
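The fan-out behavior described above follows from the way Kafka tracks a separate read position (offset) for each consumer group. The sketch below mimics that with an in-memory log; it is a simplified stand-in for a single-partition topic, not a Kafka client, and the class, group names, and events are hypothetical.

```python
class TopicLog:
    """In-memory stand-in for a single-partition Kafka topic."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer group name -> next offset to read

    def publish(self, record):
        """Append a record to the end of the log."""
        self._records.append(record)

    def poll(self, group):
        """Return records the group has not yet seen and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:]
        self._offsets[group] = len(self._records)
        return batch


topic = TopicLog()
topic.publish({"event": "purchase", "mpid": "user-1"})
topic.publish({"event": "refund", "mpid": "user-1"})

# Two independent subscribers each receive the full stream.
monitoring_batch = topic.poll("monitoring")
warehouse_batch = topic.poll("warehouse-loader")
```

Each group reads at its own pace, so a slow warehouse loader does not hold back a real-time monitor reading the same topic.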

Event sourcing
mParticle’s integration with Kafka ensures that all changes to user state are stored as a sequence of events in Kafka. This event log can be queried, and it can also be replayed to reconstruct past states of a user’s data. Designing an application around such a time-ordered sequence of records is known as Event Sourcing.
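As a minimal sketch of event sourcing, the example below rebuilds a user's state at a given point in time by replaying events in order. The event types, field names, and timestamps are hypothetical and are not mParticle's event schema.

```python
def apply_event(state, event):
    """Return the new user state after one event, leaving the input unchanged."""
    kind = event["type"]
    if kind == "profile_updated":
        return {**state, **event["changes"]}
    if kind == "item_added":
        return {**state, "cart": state.get("cart", []) + [event["sku"]]}
    if kind == "cart_cleared":
        return {**state, "cart": []}
    return state  # unknown event types leave the state unchanged


def state_at(events, as_of):
    """Rebuild state by replaying time-ordered events with ts <= as_of."""
    state = {}
    for event in events:
        if event["ts"] > as_of:
            break
        state = apply_event(state, event)
    return state


# Hypothetical, time-ordered event log for one user.
log = [
    {"ts": 1, "type": "profile_updated", "changes": {"tier": "free"}},
    {"ts": 2, "type": "item_added", "sku": "A-100"},
    {"ts": 3, "type": "profile_updated", "changes": {"tier": "gold"}},
    {"ts": 4, "type": "cart_cleared"},
]
```

Calling `state_at(log, 2)` replays only the first two events, which is how a past state can be recovered without ever having stored a snapshot of it.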

Minimize downtime

mParticle tolerates integration downtime by maintaining your data for up to 30 days without an active connection. In the case of extended downtime of your Kafka implementation, we can perform a replay of your data to Kafka.

Summary of Setup Instructions

Enable the Kafka integration in the mParticle Directory.
During set-up you will need a comma-separated list of bootstrap servers, which identifies an initial subset of the servers, known as “Brokers,” in your Kafka cluster. “Brokers” store the messages and serve them to producers and consumers.
Kafka organizes messages into “Topics.” mParticle is a producer that pushes event data into a “Topic” hosted on a “Broker.” You have to provide the topic name during the configuration set-up within mParticle.
Kafka topics can be divided into “Partitions.” Events forwarded to a Kafka topic are assigned an mParticle user ID as the “Partitioning key.” A “Broker” can hold multiple “Partitions,” but each partition has only one broker acting as its leader at any given time.
Systems and applications act as consumers that pull event data from a “Topic” via a “Broker.”
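The two values the setup steps ask for, a comma-separated bootstrap server list and a topic name, can be validated before you enter them. The helper below is a hypothetical sketch and not part of mParticle or any Kafka client; the host names and topic name are placeholders, and 9092 is assumed as the port only because it is Kafka's conventional default.

```python
def parse_bootstrap_servers(value, default_port=9092):
    """Split a comma-separated server list into (host, port) tuples.

    Handles "host:port" and bare "host" entries; IPv6 literals are not handled.
    """
    servers = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if not sep:  # no port given, fall back to the default
            host, port = entry, default_port
        servers.append((host, int(port)))
    if not servers:
        raise ValueError("at least one bootstrap server is required")
    return servers


def build_settings(bootstrap_servers, topic):
    """Collect the values needed during set-up into one validated dict."""
    if not topic.strip():
        raise ValueError("a topic name is required")
    return {
        "bootstrap_servers": parse_bootstrap_servers(bootstrap_servers),
        "topic": topic.strip(),
    }


# Placeholder values for illustration only.
settings = build_settings(
    "broker1.example.com:9092, broker2.example.com", "customer-events"
)
```

The list only needs to name a subset of brokers, since a client uses it to discover the rest of the cluster.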

Try it!

Skip the setup! Check out our Kafka documentation to learn how easy it is to get started with mParticle and Kafka. If you are using Kafka for real-time event data pipelines and want to learn how mParticle can help, contact us.

Learn more about mParticle's Pathways partner program here.

Kafka is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.