Data strategy · December 16, 2020

How to harness a CDP for machine learning: Part 1

Learn how an infrastructural Customer Data Platform can help you overcome common machine learning challenges in part one of this three-part series.

By making off-the-rack machine learning models accessible for anyone to use, cloud ML services like Amazon Personalize help make ML-driven customer experiences available to teams at any scale. Brands no longer need in-house data science and machine learning experts to get the benefit of propensity scoring or product recommendations.

Key challenges with machine learning

However, while models can be outsourced, your data can't. The effectiveness of machine learning insights will always be limited by the quality and completeness of the data they are based on. Cloud ML platforms, by themselves, leave three key challenges unsolved:

  • Collecting and supplying quality user data to train and update your model.
  • Making the insights gained from the model available where they are needed.
  • Knowing how well your ML-driven experiences are working.

These are infrastructure challenges, and they can only be overcome with a secure data infrastructure such as a Customer Data Platform (CDP). The goal of a CDP, like mParticle, is to get customer data from wherever it is, organize it into a single view of the customer, and make that view available to all services that need it. Instead of treating machine learning as just another data silo, you can use a CDP to build machine learning insights into your core data infrastructure by connecting ML-driven learnings to additional external services for activation.

Part two of this series will include a detailed tutorial on spinning up a simple machine learning model using Amazon Personalize and hooking it into the data infrastructure provided by mParticle. But first, let's dig into how a CDP can help you solve each of the three infrastructure challenges.

Collecting and supplying quality user data

To train an ML model, you need accurate data about user behavior, and lots of it. Let’s break data quality down into three components:

  1. Identity Resolution
    To generate recommendations based on all of a user's actions, across all channels and touchpoints, you need to be able to resolve user identity. Many off-the-rack ML solutions skip this requirement, tracking activity on a particular device and calculating insights for that device only. This method is convenient, but it doesn't reflect a customer's true history of interaction with your brand across your website, mobile apps, stores and support channels, and can therefore lead to incomplete insights.
    Identity resolution is a core capability of a CDP. For example, mParticle can resolve cookies, device identifiers, social IDs and emails to a single universal ID (mParticle ID), based on rules you define. By making the mParticle ID the primary key used to train your ML models, you can generate insights for a real customer, not for a phone or browser session.
  2. Consistency across platforms
    Once you solve the identity resolution challenge, you still need to map data from all those different sources to a single schema that you can use to train your model. This means getting multiple teams of developers, working across multiple languages and platforms, to collect data under that one schema.
    mParticle data plans make it easy to collect the specific user actions needed to train your model. mParticle also provides a suite of developer tools to help you instrument the plan perfectly, first time. These tools include linting plugins and SmartType, a tool for turning your data plan into type-safe libraries for each of your app platforms.
  3. Updating in real time
    Finally, you need to upload all that data to your ML platform and keep updating it in close to real time, or your recommendations will quickly become outdated.
    mParticle gives you the integration connections needed to connect user data to your ML platform without performing manual ETL jobs. These connections stream data in real time and can be managed without writing code.
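
To make the identity resolution step more concrete, here is a minimal Python sketch of resolving several identifiers to one customer ID using a priority rule. It is an illustration only, not mParticle's implementation or API: the `IdentityResolver` class, the `IDENTITY_PRIORITY` order and the identifier names are all assumptions made for this example.

```python
from dataclasses import dataclass, field

# Hypothetical priority order standing in for "rules you define".
# This is NOT mParticle's configuration format.
IDENTITY_PRIORITY = ["email", "social_id", "device_id", "cookie"]


@dataclass
class IdentityResolver:
    """Toy resolver that maps known identifiers to a single customer ID."""

    _index: dict = field(default_factory=dict)  # (id_type, value) -> customer ID
    _next_id: int = 1

    def resolve(self, identifiers: dict) -> int:
        """Return the customer ID for these identifiers, creating one if needed."""
        customer_id = None
        # Check identifiers in priority order; the first known one wins.
        for id_type in IDENTITY_PRIORITY:
            value = identifiers.get(id_type)
            if value is not None and (id_type, value) in self._index:
                customer_id = self._index[(id_type, value)]
                break
        if customer_id is None:
            customer_id = self._next_id
            self._next_id += 1
        # Link identifiers that have not been seen before to this customer.
        for id_type, value in identifiers.items():
            self._index.setdefault((id_type, value), customer_id)
        return customer_id


resolver = IdentityResolver()
web = resolver.resolve({"cookie": "c-123", "email": "ana@example.com"})
app = resolver.resolve({"device_id": "d-456", "email": "ana@example.com"})
anon = resolver.resolve({"cookie": "c-999"})
```

In this sketch the web session and the app session share an email, so they resolve to the same customer ID, while the anonymous cookie gets its own. Training data keyed on that ID would then describe a customer rather than a single device.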

Making ML insights available and actionable

Just as the data that powers an ML model can come from any platform, the insights that machine learning models generate are most valuable when they can be used to power personalized experiences for your website, apps, brick-and-mortar stores, call centers, etc.

Without modern customer data infrastructure, making ML actionable is a huge challenge. For example, say you've used ML to generate churn risk scores for your customers. Without the ability to automatically connect those insights to additional systems, can your call center automation system treat high-risk customers differently? Do your customer support representatives know when they're speaking with a customer at high risk of churning? Can your website surface retention offers? Can you segment on churn risk in your ESP? Without the data connections provided by a CDP, making your ML scores available where they’re needed would require dedicated development work and additional cost.

mParticle solves this problem by helping you maintain a single master record of each customer, including your ML insights. mParticle then takes care of making your customer data available wherever it's needed.
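
As a rough illustration of the idea, the Python sketch below keeps one record per customer, including an ML-generated churn score, and derives a payload for each downstream system from it. Everything here is hypothetical: the profile fields, the 0.7 threshold, the destination names and the payload shapes are assumptions for the example, not mParticle's data model or any vendor's API.

```python
# Hypothetical master records keyed by customer ID (field names are assumptions).
profiles = {
    "cust-1": {"email": "ana@example.com", "churn_risk": 0.82},
    "cust-2": {"email": "ben@example.com", "churn_risk": 0.10},
}

HIGH_RISK_THRESHOLD = 0.7  # assumed cutoff, chosen only for illustration


def call_center_payload(customer_id, profile):
    """Flag high-risk customers so calls can be routed differently."""
    return {
        "customer": customer_id,
        "priority_queue": profile["churn_risk"] >= HIGH_RISK_THRESHOLD,
    }


def esp_payload(customer_id, profile):
    """Assign an email segment based on the same churn score."""
    high_risk = profile["churn_risk"] >= HIGH_RISK_THRESHOLD
    return {
        "email": profile["email"],
        "segment": "high_churn_risk" if high_risk else "standard",
    }


# Hypothetical destinations; each one reads from the same master record.
DESTINATIONS = {"call_center": call_center_payload, "esp": esp_payload}


def fan_out(profiles, destinations):
    """Build one payload per customer for every destination."""
    return {
        name: [build(cid, profile) for cid, profile in profiles.items()]
        for name, build in destinations.items()
    }


outbound = fan_out(profiles, DESTINATIONS)
```

The point of the design is that the score is written once, to the customer record, and every destination derives what it needs from that record instead of each system requiring its own integration with the ML platform.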

Tracking success

Once you're able to generate ML insights based on quality data and deliver experiences based on those insights across all your channels, you're still left with the question: "Is this working?"

When you integrate an ML platform with mParticle, every relevant datapoint can be stored on the mParticle user profile: which services and campaigns are employed for each customer, whether they are part of an A/B test, and what their current propensity scores or product recommendations are at the time of any action. Whenever mParticle forwards event data to another service, including analytics services like Amplitude, Mixpanel and Google Analytics, it enriches that event data with a complete set of user data as context. This makes it simple to answer questions such as "Which of my three product recommendation recipes generates the most sales?" and "How often does a product recommendation lead to a product view?"
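
The enrichment idea can be sketched in a few lines of Python. This is a simplified illustration with made-up data: the `recommendation_recipe` attribute, the event names and the `enrich` helper are assumptions for the example and do not reflect mParticle's event format.

```python
from collections import Counter

# Hypothetical user profiles recording which recommendation recipe each user sees.
profiles = {
    "cust-1": {"recommendation_recipe": "A"},
    "cust-2": {"recommendation_recipe": "B"},
    "cust-3": {"recommendation_recipe": "A"},
}

# Hypothetical raw events, carrying only a customer ID and an event name.
events = [
    {"customer_id": "cust-1", "name": "purchase"},
    {"customer_id": "cust-2", "name": "product_view"},
    {"customer_id": "cust-3", "name": "purchase"},
    {"customer_id": "cust-2", "name": "purchase"},
    {"customer_id": "cust-1", "name": "purchase"},
]


def enrich(event, profiles):
    """Return a copy of the event with the user's profile attached as context."""
    return {**event, "user": dict(profiles.get(event["customer_id"], {}))}


enriched = [enrich(event, profiles) for event in events]

# With the recipe attached to every event, the comparison is a simple count.
purchases_by_recipe = Counter(
    event["user"].get("recommendation_recipe", "unknown")
    for event in enriched
    if event["name"] == "purchase"
)
```

With this sample data, `purchases_by_recipe` counts three purchases for recipe A and one for recipe B. A real analysis would also need to account for how many users saw each recipe before drawing conclusions.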

Next up

In part two of this series, I walk you through a real end-to-end example of integrating a simple machine learning campaign into the CDP infrastructure provided by mParticle.
