Growth | April 04, 2022

What is Data Chaos? And how to solve it

As you successfully drive growth, the complexity of your data landscape increases significantly. Unpredictable changes, both internal and external, make it difficult to execute your customer data strategy at scale. This is data chaos, and to solve it you need to be able to adapt to the changes.


There’s a well-known parable about a monkey and a pedestal, in which a team is promised a large sum of money if they can teach a monkey to recite Shakespeare while standing on a pedestal. Eager to get started, the team begins with the easy part: building the pedestal. When asked for a status update, the team appears to be making progress because they have a nicely made pedestal to present. In reality, the pedestal is nothing more than evidence that they are avoiding the hard part: teaching a monkey to recite Shakespeare.

In 2022, the business imperative to invest in digital experience is a foregone conclusion. But most companies still don’t have a coherent data strategy, choosing to put off the hard part. In the NewVantage Partners annual survey, only 30% of respondents said their companies actually had a data strategy.

Creating a data strategy

A customer data strategy defines all of the relevant Events that need to be tracked, the set of Identities that are accessible, the data objects that should be captured to achieve business goals and support specific use cases, and the applications that consume the data.
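One way to make such a strategy concrete is to write it down as a declarative plan that incoming data can be checked against. The following is a minimal sketch in Python, not any vendor's API: the `DataPlan` and `EventSpec` classes, the event shape, and the example names (`add_to_cart`, `device_id`, and so on) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EventSpec:
    """One tracked Event and the attributes it must carry (hypothetical shape)."""
    name: str
    required_attributes: frozenset


@dataclass
class DataPlan:
    """Hypothetical container for the pieces of a customer data strategy."""
    events: dict = field(default_factory=dict)      # event name -> EventSpec
    identities: set = field(default_factory=set)    # identity types, e.g. "email"
    destinations: set = field(default_factory=set)  # applications that consume the data

    def add_event(self, name, required_attributes):
        self.events[name] = EventSpec(name, frozenset(required_attributes))

    def violations(self, event):
        """Return a list of human-readable problems with an incoming event dict."""
        spec = self.events.get(event.get("name"))
        if spec is None:
            return [f"unplanned event: {event.get('name')!r}"]
        problems = []
        missing = spec.required_attributes - set(event.get("attributes", {}))
        for attr in sorted(missing):
            problems.append(f"missing attribute: {attr!r}")
        if not self.identities & set(event.get("identities", {})):
            problems.append("no known identity type on event")
        return problems


plan = DataPlan(identities={"email", "device_id"},
                destinations={"analytics", "email_tool"})
plan.add_event("add_to_cart", ["sku", "price"])

good = {"name": "add_to_cart",
        "attributes": {"sku": "A1", "price": 9.5},
        "identities": {"device_id": "d-1"}}
renamed = {"name": "AddToCart", "attributes": {}, "identities": {}}
incomplete = {"name": "add_to_cart",
              "attributes": {"sku": "A1"},
              "identities": {"email": "a@example.com"}}

print(plan.violations(good))        # []
print(plan.violations(renamed))     # ["unplanned event: 'AddToCart'"]
print(plan.violations(incomplete))  # ["missing attribute: 'price'"]
```

Even a plan this small catches the kind of drift that appears when different platform teams name the same Event differently.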

Creating a customer data strategy is critical for long-term success, and the path to creating your data strategy is bespoke to your business. Teams often start with a single use case around customer data and don't establish a comprehensive data strategy when they set out to build their digital presence. But this increases risk exposure when market conditions change.

Case in point: the calculus around customer acquisition has changed dramatically over the past 12 months because of the changes Apple introduced in its iOS 14.5 update. The impact of iOS 14.5 on the targeting capabilities offered by the walled gardens has created a massive ripple effect. It’s becoming a lot harder and more expensive to acquire new customers, so teams have shifted focus to building a first-party data strategy and improving retention.

Enter data chaos

For companies beginning to develop their customer data strategy, starting with a limited number of Events, Identities, and required integrations is manageable. As teams successfully drive growth, however, the diversity and magnitude of these data objects naturally expand. Over time, more objects and more use cases create significantly more complexity. 

Inside most companies, a typical scenario emerges:

  • Websites and apps change, with landing pages and app screens being added, optimized, or removed.
  • New campaigns are run, often requiring new data flows to measure success. New tools are required to optimize performance, thus requiring new integrations. 
  • Event tracking changes as developers across different platforms work in silos over time.
  • Customers toggle between known and anonymous states, and across different devices, which requires dynamic identity resolution capabilities. 
  • Users update their consent preferences to opt out of personalized experiences, and their information will need to be extracted from certain flows and tools. 
  • Models are built and experiments are run which force several of these steps to be repeated. 

And in the market:

  • New privacy regulations such as GDPR and CCPA fundamentally change the way you can collect, manage, and activate data.
  • Apple and Google create new platform rules that change how you can access cookies and device identifiers.
  • API requirements change as vendors continually update their offerings and specs. 

Teams end up making a Faustian bargain in their pursuit of growth. The challenge becomes adapting to the changes: more use cases require more tools, the number of data objects multiplies, and the maintenance caused by API changes increases, not to mention the introduction of new privacy requirements. It all becomes quite overwhelming.

This is data chaos, and it is a universal phenomenon.

How to tackle data chaos

The biggest mistake teams make is that they believe the problems they have today will be the same problems they’ll have in the future, or that they’ll just have a greater number of the same problems. The logic is flawed, unfortunately, because complexity compounds as the amount of data, sources and destinations, workflows, privacy restrictions, and data quality challenges multiply at scale. 

The question teams have to address is how to maintain data trust along the way, and how to scale their personalization efforts amid ever-changing internal and external demands across both the business and privacy realms.

Or, more simply put, the activation challenge is how easily teams can adapt to solve the problems of tomorrow when they look nothing like the problems of today: the unknown unknowns.

The answer is that teams need a trusted data pipeline that can connect customer data to and from internal systems as well as the digital ecosystem, and that can address the perpetual complexity while maintaining data trust. More specifically, they need a customer data pipeline that can address dynamic needs across the organization, support complexity at scale, protect data quality and privacy, and connect data in real time to any other tool. (Generalized pipelines won’t do, but more on that shortly.)

Most people think about the Customer Data Platform (CDP) as a system for activating data. This view misses the point. The CDP should be a system for adaptability, ensuring that the activation of data is done properly, as the constant state of change (data chaos) is what undermines a data strategy and inhibits activation. 

The opportunity around activation is like putting together a puzzle in which a number of pieces need to come together. The problem, however, is that the puzzle pieces are constantly being moved. Teams must first solve for the moving pieces, and only then focus on building the puzzle.

Why the alternatives fall short

In the perpetually evolving ecosystem, the idiosyncratic nature of customer data challenges the capabilities of generalized data pipelines. This is because customer data exists in far more heterogeneous contexts (various media platforms, device types, business units, countries, privacy regimes, etc.) than other types of business data. These contexts create requirements that go beyond the ‘unstructured’ and ‘interchange’ capabilities of traditional data pipelines.

A few of the unique challenges customer data poses for data pipelines and data management systems:

  • Privacy: With various privacy regulations, teams may be required to cease or restrict data collection on known users depending on device, changes to consent status, etc. In many cases, the same data from the same user must be handled differently (e.g. in logged-in vs. logged-out states). Additionally, consumers may interact under different privacy regimes: separate rules may apply depending on media (Video Privacy Protection Act) and geographic context (GDPR, CCPA, LGPD, etc.). This is well beyond the traditional ETL management scope. These conditions must be managed appropriately, or the data must be reduced to a lower granularity to ensure compliance.

  • Data modality: Customer data is created and consumed across several systems and applications, and each of these technologies offers a proprietary specification by which data is stored and processed. Inconsistencies in data structure and formatting requirements, combined with the high likelihood of manual error, create a non-linear relationship between data integrity and value. For example, when data quality is high, complexity is reduced dramatically. When issues are introduced, value is quickly eroded.

  • Identity complexity: With multi-channel customer experiences, Identity has become increasingly complex. Traditional architectures typically have rigid identity schemes (Golden Records, MDM, etc.) and have difficulty dealing with mutable identities, multiple identities, and state transitions, all of which are common in the new customer journey. For example, customers may have multiple emails or may switch email addresses, social handles, devices, mobile numbers, etc. Without accounting for this behavior, any customer data platform is going to be restricted in its use.

  • Governance: Personalization creates complicated governance issues for more complex companies, especially media and multinational corporations. This is primarily because consumer behavior, unlike other business processes, does not align to corporate structures. Customer data must be either siloed or effectively coordinated across business units, media properties, and regional lines. Most platform deployments default to the former, as data schemas do not take this ‘cross unit’ consideration into account. Privacy considerations complicate matters further.

  • Channel-specific requirements: As consumers’ expectations of brands continue to increase, teams are challenged to leverage various engagement channels to their full potential. These channels each have unique and highly-nuanced features, along with semantic behaviors that must be addressed in data management. Generalized data management capabilities largely fail to address these needs. 
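To illustrate the identity complexity challenge above, here is a minimal sketch of how identifiers seen together can be merged into one profile as a customer moves from anonymous to known and later switches email addresses. It uses a generic union-find structure as a stand-in technique; it is not how any particular platform resolves identities, and the `IdentityGraph` class and the identifier values are hypothetical.

```python
class IdentityGraph:
    """Toy identity resolver: identifiers observed together share a profile."""

    def __init__(self):
        self._parent = {}

    def _find(self, node):
        # Unseen identifiers start out as their own single-member profile.
        self._parent.setdefault(node, node)
        while self._parent[node] != node:
            # Path halving keeps later lookups shallow.
            self._parent[node] = self._parent[self._parent[node]]
            node = self._parent[node]
        return node

    def observe(self, identifiers):
        """Link every (type, value) pair seen on one event to the same profile."""
        roots = [self._find(pair) for pair in identifiers]
        for other in roots[1:]:
            self._parent[self._find(other)] = self._find(roots[0])

    def same_profile(self, a, b):
        return self._find(a) == self._find(b)


graph = IdentityGraph()
# Anonymous browsing on a device.
graph.observe([("device_id", "d-1")])
# The user logs in on that device: anonymous history becomes known.
graph.observe([("device_id", "d-1"), ("email", "old@example.com")])
# The user changes their email address.
graph.observe([("email", "old@example.com"), ("email", "new@example.com")])

print(graph.same_profile(("device_id", "d-1"), ("email", "new@example.com")))  # True
print(graph.same_profile(("device_id", "d-2"), ("email", "new@example.com")))  # False
```

Note what this sketch leaves out: it can only merge, never split. Shared devices, logouts, and consent withdrawals all require un-linking identifiers, which is exactly where rigid identity schemes tend to struggle.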

Reverse ETL offers an interesting activation solution for data engineering teams building a data architecture around the cloud data warehouse. The reduction in the cost of both storage and compute has made this appealing on the surface. And while this approach may solve some of the technical challenges around customer data activation, it too ignores the market realities of data chaos as well as the idiosyncratic nature of customer data. More specifically, it ignores the operational challenges caused by perpetual change, as well as the huge opex burden created by the need to manage those changes. Lastly, basic questions such as “how is data quality managed?” and “how is privacy incorporated?” are usually brushed aside entirely.

To clarify a bit further, the gap between the operational and technical challenges consists of the following:

  • Updates to Event tracking, naming, schema management, and how those changes propagate to downstream APIs.
  • The dynamic nature of integrations, and the sheer volume of changes and maintenance required by partner APIs.
  • Identity resolution related challenges which dynamically address complex state transitions between known and anonymous states, and the impact on personalization.
  • Privacy requirements, and how mutable identities impact certain use cases, and the treatment of related data across multiple integrated systems.
  • Workflows and interfaces to better address organizational and information silos.
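As a rough illustration of the privacy item in the list above, the sketch below filters the destinations an event may be forwarded to based on a user's current consent state. The destination names, consent purposes, and function names are all hypothetical, and real consent models (per-regulation, per-region, time-stamped) are considerably richer than a flat dictionary.

```python
# Hypothetical mapping: the consent purpose each destination depends on.
DESTINATION_PURPOSE = {
    "warehouse": "analytics",
    "ad_network": "advertising",
    "email_tool": "personalization",
}


def allowed_destinations(consent, destination_purpose=DESTINATION_PURPOSE):
    """Return the destinations whose purpose the user has explicitly allowed."""
    return sorted(
        dest
        for dest, purpose in destination_purpose.items()
        if consent.get(purpose) is True  # a missing purpose counts as no consent
    )


def route(event, consent_by_user):
    """Map each permitted destination to the event it should receive."""
    consent = consent_by_user.get(event["user_id"], {})
    return {dest: event for dest in allowed_destinations(consent)}


consent_by_user = {
    "u1": {"analytics": True, "advertising": True, "personalization": True},
}
event = {"user_id": "u1", "name": "add_to_cart"}

before = list(route(event, consent_by_user))
print(before)  # ['ad_network', 'email_tool', 'warehouse']

# The user opts out of advertising and personalized experiences.
consent_by_user["u1"].update(advertising=False, personalization=False)

after = list(route(event, consent_by_user))
print(after)  # ['warehouse']
```

Forwarding is only half of the problem. When consent changes, data already delivered to a destination may also need to be removed there, which is an operational task that this sketch does not attempt.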

On the other end of the spectrum, you have Application CDPs and Marketing Suites focused on providing powerful tooling for marketers, including rich audience insights and segmentation capabilities. These capabilities are valuable only when there is a strong foundation of adaptability rooted in data quality and governance.

What’s required is a solution that helps teams adapt to this perpetual state of change. The best dashboards and segmentation capabilities will not make an impact if the data is bad or the pipelines break. Garbage in, garbage out, and when you're talking about petabytes of data moving through the system in real time out to a dozen applications, that's a lot of garbage.

The final piece is real-time personalization, which is critical to most digital strategies. Marketing in the moments that matter is often the difference between your success and your competitor’s. The vast majority of the solutions mentioned above aren't built on real-time architecture from ingestion all the way through to connection. They don’t offer integrated identity resolution capabilities and can't drive impact in the moments that matter, leaving brands susceptible to competitive conquest.

If brands want to solve the activation challenge, they need to begin by solving the data chaos challenge, including both its technical and its operational aspects. Only customer data infrastructure (CDI) helps solve these challenges. CDI should work alongside your Cloud Data Warehouse, as a complement rather than a competitor, and should underpin your activation tools, which can even include your Application CDP.
