Should you build or buy a Customer Data Platform?
Customer Data Platforms are a critical piece of modern data infrastructure. Learn what it takes to build a Customer Data Platform and how to determine whether building a solution or working with a leading vendor is the right path for your organization.
For marketing and product teams, customer data is the foundational asset that drives decision-making. High-quality customer data allows teams to understand their user journeys, launch targeted campaigns, improve the product experience, and more. To support these initiatives, engineers are frequently needed to instrument complex ETL (Extract, Transform, Load) tasks, pulling data from a source and placing it into a destination.
Customer Data Platforms (CDPs) have risen in popularity in recent years because they allow non-technical stakeholders to perform ETL tasks by accessing high-quality customer data through a simple workflow in the UI and connecting that data to the tools they’re using. CDPs save organizations from using precious and expensive engineering resources while allowing non-technical teams to get the data they need where they need it, faster.
Because a CDP is a foundational piece of data infrastructure and handles arguably your most valuable asset, customer data, many organizations have considered whether they’d be better off building their own solution.
This post will establish what’s required to build a CDP and walk through the respective benefits of building and buying a platform, with the aim of helping you understand which is the better path for you. First, let’s dive deeper into what a CDP is.
What is a Customer Data Platform, and what’s required to build one?
A Customer Data Platform is a centralized data infrastructure that aggregates a company’s customer data to build persistent customer profiles and optimize customer engagement through integration with tools and systems in your growth stack. There are four key functions of a CDP:
- Data connections: The ability to ingest first-party, individual-level customer data from multiple sources with a single API connection and forward user events, attributes and identities to external tools and systems through pre-built integrations
- Profile unification and data quality protection: The ability to protect data quality and unify events and attributes to unique profiles at the individual level as data is collected
- Segmentation: An interface that enables users to build and manage audience segments
- Activation: The ability to send audience segments and forward data to external tools and systems
The first function that a CDP must perform is data collection. To build the unified customer profiles that they promise, CDPs must be able to collect data from multiple sources, such as mobile apps, websites, OTT devices, systems, and applications, via native SDKs, API connections and webhooks.
CDPs collect data from sources utilizing a geographically distributed Content Delivery Network (CDN) and a suite of tooling (Edge infrastructure) to support fast data ingestion speeds. Data is then routed into a message queue to be processed. A variety of services pick up the data from the queue to perform transformations, enrichment, validation against an established data model, and more. At this point, a real-time view of incoming data should also be available within the UI so that users can monitor activity. Customer data is then stored for the long term in different data repositories depending on the type of data and the intended purpose.
Customer Data Platform data collection architecture.
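The collection flow described above (ingest, queue, process, store) can be sketched in a few lines of Python. This is only an in-process illustration under stated assumptions, not a production design: the standard-library queue stands in for a distributed message queue, and the names `ingest`, `process_queue`, `EXPECTED_EVENTS`, `live_stream` and `long_term_store` are hypothetical.

```python
import queue
import time

# Hypothetical data model: the event names the pipeline expects to see.
EXPECTED_EVENTS = {"page_view", "purchase"}

message_queue = queue.Queue()   # stand-in for a distributed message queue
live_stream = []                # recent events, e.g. to power a real-time view in the UI
long_term_store = []            # stand-in for long-term storage


def ingest(event):
    """Accept a raw event at the edge and enqueue it for processing."""
    message_queue.put(event)


def process_queue():
    """Drain the queue: validate, enrich, then store each event."""
    while not message_queue.empty():
        event = message_queue.get()
        # Validation against the established data model.
        if event.get("name") not in EXPECTED_EVENTS:
            continue
        # Enrichment: attach a processing timestamp.
        enriched = {**event, "processed_at": time.time()}
        live_stream.append(enriched)
        long_term_store.append(enriched)


ingest({"name": "page_view", "user_id": "u1"})
ingest({"name": "unknown_event", "user_id": "u1"})
process_queue()
```

In this sketch only the `page_view` event reaches storage. A real system would run many consumers in parallel and write to different repositories depending on the data type.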
Profile unification and data quality protection
As data is ingested into the CDP, it needs to be tied to unique customer profiles that can be accessed from within a UI and programmatically through a REST API. These profiles should contain events, attributes, device information, user consent and more. Different customer engagement channels (mobile, web, OTT, etc.) all track users with different kinds of identities, but all events should nonetheless be unified to a single customer profile in your CDP. Having a correct user profile with all information consolidated is the measure of quality for your data. Data quality is extremely important to CDP functionality because the quality of data forwarded to external systems has a big impact on how effective that data is when used for marketing and analytics purposes.
CDPs must be able to resolve identities successfully by taking into account factors such as anonymous vs. logged-in user behaviors, user tracking across multiple devices, user tracking across multiple apps, and most importantly, end-users’ privacy choices.
Giving engineers the ability to perform certain identity resolution operations programmatically is equally important. For example, an identity API can be used to search existing user records or perform user aliasing to make sure data being collected in-app is being sent to the correct user profile.
Customer Data Platform identity architecture.
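To make profile unification concrete, here is a minimal sketch of identity resolution in Python. It assumes a very simple rule (any shared identity links two records, and the oldest profile wins on merge) and ignores consent and privacy choices entirely, which a real CDP cannot do. The class `IdentityResolver` and its methods are hypothetical and do not represent any vendor's identity API.

```python
import itertools


class IdentityResolver:
    """Maps (identity_type, value) pairs to a single profile ID."""

    def __init__(self):
        self._index = {}      # (type, value) -> profile_id
        self._profiles = {}   # profile_id -> {"identities": set, "events": list}
        self._ids = itertools.count(1)

    def resolve(self, identities):
        """Return the profile matching any known identity, creating one if needed."""
        keys = set(identities.items())
        matches = {self._index[k] for k in keys if k in self._index}
        if matches:
            profile_id = min(matches)   # assumption: the oldest profile wins
        else:
            profile_id = next(self._ids)
            self._profiles[profile_id] = {"identities": set(), "events": []}
        # Fold any other matched profiles into the chosen one (aliasing).
        for other in matches - {profile_id}:
            merged = self._profiles.pop(other)
            self._profiles[profile_id]["events"].extend(merged["events"])
            keys |= merged["identities"]
        for k in keys:
            self._index[k] = profile_id
        self._profiles[profile_id]["identities"] |= keys
        return profile_id

    def track(self, identities, event):
        """Attach an event to whichever profile the identities resolve to."""
        profile_id = self.resolve(identities)
        self._profiles[profile_id]["events"].append(event)
        return profile_id

    def profile(self, profile_id):
        return self._profiles[profile_id]


resolver = IdentityResolver()
anon = resolver.track({"device_id": "d-1"}, "app_open")         # anonymous app session
web = resolver.track({"email": "a@example.com"}, "web_login")   # known user on the web
# Logging in on the device links both identities to one profile.
unified = resolver.track({"device_id": "d-1", "email": "a@example.com"}, "app_login")
```

After the last call, the anonymous app activity and the web activity sit on the same profile, which is the behavior the aliasing example above describes.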
As noted earlier, customer profiles are compromised if data tied to them is not accurate and consistent. Therefore, having a system in place to control data quality is important.
First, you’ll need to establish a data schema that outlines the events you are expecting to collect and what you expect them to look like. This schema can be created within your CDP UI and/or by using an HTTP API. As data is being collected into the platform, best-in-class CDPs offer the ability to compare those events to your data schema for compile-time and runtime data quality verification.
Customer Data Platform data quality management architecture.
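A runtime check of events against a data schema might look like the following Python sketch. The schema format (event name mapped to required attributes and their types) and the `validate` function are assumptions made for illustration; real CDPs define their own schema formats.

```python
# Hypothetical data schema: event name -> required attributes and their types.
SCHEMA = {
    "purchase": {"order_id": str, "total": float},
    "page_view": {"url": str},
}


def validate(event):
    """Return a list of violations; an empty list means the event matches the schema."""
    name = event.get("name")
    if name not in SCHEMA:
        return [f"unplanned event: {name}"]
    errors = []
    attributes = event.get("attributes", {})
    # Every planned attribute must be present and have the expected type.
    for attr, expected_type in SCHEMA[name].items():
        if attr not in attributes:
            errors.append(f"missing attribute: {attr}")
        elif not isinstance(attributes[attr], expected_type):
            errors.append(f"wrong type for {attr}")
    # Attributes that are not in the schema are flagged as unplanned.
    for attr in attributes:
        if attr not in SCHEMA[name]:
            errors.append(f"unplanned attribute: {attr}")
    return errors


ok = validate({"name": "purchase", "attributes": {"order_id": "A1", "total": 19.99}})
bad = validate({"name": "purchase", "attributes": {"order_id": 42, "coupon": "X"}})
```

Here `ok` is an empty list, while `bad` reports a wrong type, a missing attribute, and an unplanned attribute.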
Automatically enforcing data quality by blocking unplanned user events and attributes, so that they are never included in your user profiles or forwarded to downstream systems, is also an advanced capability. An additional step can be to quarantine this data for inspection and review, and then replay it once it’s fixed.
Customer Data Platform data point validation architecture.
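The block, quarantine and replay steps can be illustrated with a short Python sketch. The names `PLANNED_EVENTS`, `accept` and `replay` are hypothetical, and the in-memory lists stand in for real storage.

```python
PLANNED_EVENTS = {"purchase", "page_view"}   # hypothetical data plan

profile_events = []   # events accepted into profiles and forwarded downstream
quarantine = []       # events held back for inspection


def accept(event):
    """Route planned events onward and hold unplanned ones in quarantine."""
    if event["name"] in PLANNED_EVENTS:
        profile_events.append(event)
    else:
        quarantine.append(event)


def replay(fix):
    """Apply a fix to each quarantined event, then run it through `accept` again."""
    held = list(quarantine)
    quarantine.clear()
    for event in held:
        accept(fix(event))


accept({"name": "purchase"})
accept({"name": "Purchase "})   # malformed name from a buggy client

# After review, normalize the name and play the event back.
replay(lambda e: {**e, "name": e["name"].strip().lower()})
```

Events that still fail after the fix simply land back in quarantine, so nothing unplanned reaches profiles by accident.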
A key function of Customer Data Platforms, especially for marketing team use cases, is centralized audience segmentation. A CDP should provide segmentation capabilities within the UI that make it easy for non-technical users to build cohorts using any customer-associated data you have collected.
CDP segmentation tools can vary depending on factors such as the kind of data being included in audience segments, as well as how quickly audiences need to be calculated. To pull customer identities from the event and attribute conditions set in the audience builder, the CDP needs to be able to query the identities in the NoSQL identity database. For audience memberships to be updated in real time as new customers qualify into the audience (or existing customers qualify out), the CDP needs to be able to access a real-time database, or hot storage database. For audience segments that do not need to be compiled in real time, records can be pulled in bulk from a cold storage database.
Advanced CDPs’ segmentation tools should include the capability to estimate the size of an audience as it is being built. If you are building your CDP to enable multiple workspaces, you’ll need to include the capability for users to specify whether an audience segment pulls data from a single workspace or multiple workspaces.
A Customer Data Platform should enable users to build complex audience segments within a UI.
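As a rough illustration of real-time membership updates and audience size estimation, consider the Python sketch below. It assumes profiles are plain dictionaries and that an audience is defined by a single predicate; the `Audience` class, the `estimate_size` helper and the sampling approach are all hypothetical simplifications of what a hot-storage-backed audience engine would do.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Audience:
    """An audience defined by a predicate over a profile (a plain dict here)."""
    name: str
    rule: Callable[[dict], bool]
    members: set = field(default_factory=set)

    def update(self, profile_id, profile):
        """Re-evaluate one profile and qualify it into or out of the audience."""
        if self.rule(profile):
            self.members.add(profile_id)
        else:
            self.members.discard(profile_id)


def estimate_size(rule, sample, total_profiles):
    """Estimate audience size by scaling the match rate of a sample of profiles."""
    if not sample:
        return 0
    hits = sum(1 for profile in sample if rule(profile))
    return round(total_profiles * hits / len(sample))


big_spenders = Audience("big_spenders", lambda p: p.get("lifetime_value", 0) >= 100)
profiles = {"u1": {"lifetime_value": 150}, "u2": {"lifetime_value": 20}}
for pid, p in profiles.items():
    big_spenders.update(pid, p)

# A new purchase arrives and u2 qualifies into the audience.
profiles["u2"]["lifetime_value"] = 120
big_spenders.update("u2", profiles["u2"])

sample = [{"lifetime_value": v} for v in (150, 20, 5, 300)]
estimate = estimate_size(big_spenders.rule, sample, total_profiles=1000)
```

Re-evaluating only the profile that changed is what keeps membership current without recomputing the whole audience; the estimate trades accuracy for speed while the audience is still being built.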
Audience activation and data forwarding
Finally, integration connections make it possible to get data out of the CDP. The connections that a CDP has, as well as the type of integration built for each connection, depend on the use cases. For example, to use a CDP to send events to an email service provider to power transactional messaging after a purchase has been made, the CDP will need a real time API integration. Less timely use cases, such as exporting raw event data to a data warehouse for BI reporting, may allow for bulk forwarding.
When forwarding audiences, it’s helpful to be able to build an audience once and connect it to multiple outputs. This is valuable for teams that are running cross-channel campaigns, need to suppress their targeting in real time, or would like to A/B test multiple channels with the same audience segment. Additionally, advanced CDPs allow users to schedule when an audience is forwarded to an external system.
It’s important to consider that integration needs change over time. CDPs should enable flexibility by making it easy to begin sending data to new destinations and/or stop sending data to existing connections.
A Customer Data Platform should allow users to connect data to external tools and systems within a UI.
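The difference between real-time and bulk forwarding, and the ability to add or remove connections over time, can be sketched as follows in Python. The `Destination` and `Connections` classes are hypothetical; in a real CDP each destination would wrap a partner's API rather than append to a list.

```python
class Destination:
    """Hypothetical output: real-time destinations send immediately, others batch."""

    def __init__(self, name, realtime):
        self.name = name
        self.realtime = realtime
        self.sent = []      # payloads delivered so far
        self._buffer = []   # payloads awaiting the next bulk flush

    def forward(self, payload):
        if self.realtime:
            self.sent.append(payload)
        else:
            self._buffer.append(payload)

    def flush(self):
        """Deliver everything buffered as one bulk export."""
        self.sent.extend(self._buffer)
        self._buffer.clear()


class Connections:
    """Fans each payload out to every connected destination."""

    def __init__(self):
        self._outputs = {}

    def connect(self, destination):
        self._outputs[destination.name] = destination

    def disconnect(self, name):
        self._outputs.pop(name, None)

    def forward(self, payload):
        for destination in self._outputs.values():
            destination.forward(payload)


email = Destination("email_provider", realtime=True)
warehouse = Destination("data_warehouse", realtime=False)
connections = Connections()
connections.connect(email)
connections.connect(warehouse)

connections.forward({"event": "purchase", "user_id": "u1"})
sent_before_flush = len(warehouse.sent)   # nothing delivered until the bulk flush
warehouse.flush()

# Integration needs change: stop sending to the email provider.
connections.disconnect("email_provider")
connections.forward({"event": "page_view", "user_id": "u1"})
```

The purchase event reaches the email provider immediately, while the warehouse only receives it on flush. After the disconnect, new events no longer reach the email provider.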
Building a Customer Data Platform in-house
Companies sometimes consider whether they would be better off building their own CDP in-house.
What are the benefits of building your own CDP?
If your team has domain expertise in building customer data infrastructure, there are some benefits to building your own Customer Data Platform. First, you’re able to build a system that is completely customized to your environment: you can prioritize building the connections that are most important to you, and you can ensure that your CDP works well with your existing architecture. Second, the product trajectory of your in-house Customer Data Platform will be completely in your hands. If there are any features that you’d like to add to your CDP, or any existing aspects that you’d like to change, you can update the platform yourself and don’t have to ask a CDP vendor to modify their product roadmap.
Unfortunately, these benefits can also introduce complications.
What are the downsides of building your own CDP?
First, even if you do have deep domain expertise in building customer data infrastructure, developing a Customer Data Platform in-house is at best a 6-12 month project, according to the Customer Data Platform Institute. Without deep domain expertise, the build can easily take 18-24 months, if not more. There are many risks that can compromise such a long term project, such as personnel turnover, shifts in organizational priorities, and budget constraints. End users that requested the platform initially may not get access to a solution for a year or more.
It’s also a complicated project. Success will depend on collaboration between multiple departments across the organization, and on working with external partners to build API connections. Gartner notes that “homegrown CDPs require the full range of IT development roles, including the ability to build a business-friendly user interface.” The team members who will have to dedicate significant time to building the CDP are often the senior engineers with the most experience of your existing architecture and organization, which increases the cost of each hour invested.
Second, your Customer Data Platform’s evolution will be yours to execute once the initial platform is live. New API connections, product features, and more will likely be requested from end users of your CDP on an ongoing basis. If there are any changes in government regulations or market conditions, your organization’s platform requirements may change. Frequent CDP maintenance and integration builds will pull resources away from core development, increasing CDP cost and impacting the productivity of the team as a whole. As your customer base grows, you will need to ensure that your CDP has the data processing scalability required. Although you will have the flexibility to focus on updates that are most important to your organization, it will also be up to you to build them.
Buying a Customer Data Platform
Many organizations, both enterprises and startups, that turn to Customer Data Platforms consider working with a leading CDP vendor. There are several benefits to doing so.
One of the biggest benefits is the speed at which you can have your CDP up and running. Instead of allocating resources to build your Customer Data Platform, you’re able to implement packaged SDKs in your digital properties or set up API connections once and then return to core development.
For an example of what these look like, you can see mParticle’s SDK and API documentation here.
Furthermore, some CDP vendors will offer professional services support to assist with the implementation and ensure that adopting a CDP doesn’t necessitate strains on engineering. mParticle offers implementation support designed to help you go from kickoff call to production in 90 days or less.
Here is an outline of our four-step Quick Start implementation roadmap:
Outline of mParticle's Quick Start implementation roadmap.
Working with a leading Customer Data Platform vendor can also be cost effective. As noted previously, building a CDP internally is expensive, as it requires dedicated hours from senior engineers and alignment of stakeholders across the organization for a long period of time. Any delays in the build process will only make the project more expensive. Working with a leading CDP vendor will allow you to access the capabilities you and your team need at a subscription cost, eliminating the opportunity cost of the build process.
Beyond the initial launch, working with a CDP vendor will also reduce costs on an ongoing basis. Packaged CDPs with extensive integration ecosystems and friendly UIs will make it easy for non-technical teams to connect data to new tools without engineering support. CDPs with data quality and audience segmentation tooling will allow non-technical stakeholders to build high quality audience segments without having to submit requests to data engineering or data science.
Flexibility for end users
Data strategies evolve over time. It’s important for Customer Data Platforms to accelerate your team’s evolution, not inhibit it.
With secure data collection and extensive integration ecosystems, packaged CDPs make it easy to shift or evolve your data pipeline. Business teams are able to collect data from new sources, such as a POS system, with little-to-no engineering work required, and trial or A/B test new vendors by sending limited data sets to them in just a few clicks.
When you need to instrument a new event, CDP developer tools such as mParticle’s Smartype make it easier for you to translate your data schema into type-safe code to help you ensure proper event collection at run time. Business users are able to get access to high quality customer data sooner, and you’re able to work faster while reducing technical debt.
How to determine the right path
Evaluating whether to build or buy a CDP can be a difficult decision. Here are a few guiding questions that may be helpful as you decide what’s best for you.
What are the competitive advantages you seek by having a CDP?
If you’re competing in a mature market, an excellent way to differentiate is by investing in technology that enables better customer experiences. You may have some of the resources to build your own solution, but legacy infrastructure or organizational restrictions may make it difficult to move forward.
If you’re operating in an emerging market, speed to market is important. The sooner you can implement a viable solution, the sooner you can begin delivering results.
What available resources do you have, and what is your track record of tech development?
Many organizations consistently opt to build their own tools, and therefore have the resources and processes in place to do so again successfully. If you don’t have a track record of developing data infrastructure, however, it’s important to consider whether you have the resources and processes in place to build something as foundational as a Customer Data Platform.
What are your CDP time-to-market requirements?
As noted earlier, there is a significant time differential between building and buying a Customer Data Platform. If you are operating in an evolving industry and are facing pressure to keep up with the competition, it may be difficult to allocate 12-18 months to building your own solution (if not longer). Working with the right CDP vendor can allow you to get to market in as few as 90 days.
What will the labor costs of developing and maintaining a solution be for your organization?
Once your initial CDP solution is built, you’ll still be responsible for building new integrations, maintaining existing connections, ensuring your infrastructure can scale, and introducing the feature updates your team needs. To properly forecast a CDP build, it’s important to estimate the number of updates you’ll need to make per year (based on your team’s historical vendor selections) as well as your projected tracked user growth, and calculate what the labor costs of those updates will be.
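A back-of-the-envelope version of that forecast could be written as the Python sketch below. The formula (updates per year times hours per update, plus scaling work, times an hourly rate) is a simplifying assumption, and every number shown is a placeholder for illustration only, not a benchmark.

```python
def annual_maintenance_cost(updates_per_year, hours_per_update, hourly_rate,
                            scaling_hours_per_year=0):
    """Estimate the yearly labor cost of maintaining an in-house CDP."""
    hours = updates_per_year * hours_per_update + scaling_hours_per_year
    return hours * hourly_rate


# Placeholder inputs only; substitute your own estimates.
cost = annual_maintenance_cost(
    updates_per_year=6,          # e.g. new integrations and feature updates
    hours_per_update=80,
    hourly_rate=100,
    scaling_hours_per_year=200,  # work driven by tracked user growth
)
```

With these placeholder inputs the estimate comes to 68,000 per year in whatever currency the hourly rate uses. The value of the exercise lies in comparing that figure, computed with your own numbers, against a vendor's subscription cost.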
Interested in learning more about how a Customer Data Platform is built? You can access mParticle’s developer docs here.