Engineering—May 27, 2015

How to improve data control with SDK consolidation

SDK consolidation can make all of the difference for companies looking to improve their data consistency and decrease technical debt.

Lots of great companies are building compelling technology to help app owners do more, grow faster, and become smarter. As the ecosystem of app services continues to expand, many apps have begun to invest in data infrastructure solutions, such as mParticle, that are designed to help them take better control of their data and adapt more easily to the changing landscape.

In some instances, this involves consolidating the SDKs that capture data within the app into a single solution and moving to server-side integrations. Consolidation can create data consistency and pay down some of the technical debt accumulated along the way, but it also involves tradeoffs. In this post, I'll highlight the value and risk of both consolidated and distributed setups.

Distributed SDKs:

Value

By embedding a number of disparate SDKs, an app can (in theory) diversify its dependence across multiple partners. Embedding SDKs directly is probably suitable if there are few or no overlapping data requirements across vendors and there are no plans to add new partners anytime soon. This is a set-it-and-forget-it approach, which can work for a period of time.

Additionally, by maintaining multiple embedded end points, one could argue there is no "consolidation risk" or single point of failure. Consider a hypothetical situation in which the infrastructure provider suffers a technical outage or one day ceases to do business altogether, which would, in theory, create acute operational pain for the client. The implication is that such an event could happen without warning and would require all automated end points to be reconfigured manually, which would be time-consuming. By embedding and maintaining multiple end points, the disruption from any one partner's interruption remains localized to that partner.

Risk

By choosing to embed each SDK individually, an app creates, and must then support, an extremely expensive operational process. Maintaining individual SDKs introduces engineering inefficiency, creates data inconsistency, and limits vendor effectiveness. This setup creates massive opportunity costs as well as switching costs. By embedding multiple end points, you lose control over your data and devalue your engineers' time. These costs compound and can become paralyzing to the operation over time.

While this approach may reduce a certain type of perceived risk, it simply trades perceived consolidation risk for actual operational risk.

What happens to your historical data if you want to move from one system to another? Your data becomes trapped, and you lose control over it. The idea of control and diversification is therefore misguided: in this setup, vendor lock-in becomes a significant risk and a real operational hurdle over time.

Next, what happens to the code your engineering team wrote to embed the original service? Can you reuse it? The answer is no. Parts of the data plan are reusable, but the code itself is not. Embedding SDKs on a one-off basis is a slow, expensive process that devalues your engineers' time and code.

Additionally, a fragmented or distributed setup provides no assurance that risk can be properly mitigated. Services can, and often do, still fail individually, and many are not compatible with other services, not to mention that you are trading actual control for perceived control. Often, different stakeholders across different groups are the reason multiple SDKs are required in the first place, and if any of those SDKs fails for any reason, multiple groups may need to coordinate to find a solution and, if required, a replacement.

Consolidated Setup:

Value

By investing in infrastructure that moves data distribution server-side, an app gains three main benefits: speed, control, and economies of scale from a single engineering initiative.

By investing in data automation, an app gets access to an ecosystem of tools and services from a single client-side deployment. Apps can save time (engineering time, time to onboard new partners, etc.), save money (by reducing data inefficiencies across vendors, using data sampling, etc.), and create additional value by using data to improve marketing effectiveness.
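The "single client-side deployment" idea can be illustrated with a small fan-out sketch. This is a hypothetical illustration, not mParticle's actual architecture: `ServerHub`, `Event`, and the vendor names are invented here, and each vendor integration is assumed to be a simple function that accepts an event. The app sends one event stream to the hub. The server forwards each event to every enabled vendor, so enabling or disabling a vendor becomes a server-side configuration change rather than an app release.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    name: str
    user_id: str
    attributes: Dict[str, str] = field(default_factory=dict)


# Hypothetical: each vendor integration is modeled as a function that accepts an event.
Forwarder = Callable[[Event], None]


class ServerHub:
    """Receives the app's single event stream and fans it out to vendors server-side."""

    def __init__(self) -> None:
        self._forwarders: Dict[str, Forwarder] = {}
        self._enabled: Dict[str, bool] = {}

    def register(self, vendor: str, forwarder: Forwarder, enabled: bool = True) -> None:
        self._forwarders[vendor] = forwarder
        self._enabled[vendor] = enabled

    def set_enabled(self, vendor: str, enabled: bool) -> None:
        # A server-side configuration change; the app binary is untouched.
        if vendor not in self._forwarders:
            raise KeyError(f"unknown vendor: {vendor}")
        self._enabled[vendor] = enabled

    def ingest(self, event: Event) -> List[str]:
        """Forward one event to every enabled vendor and return the vendors it reached."""
        delivered = []
        for vendor, forward in self._forwarders.items():
            if self._enabled[vendor]:
                forward(event)
                delivered.append(vendor)
        return delivered


# Usage: two vendors, one of which is switched off without touching client code.
received: Dict[str, List[Event]] = {"analytics": [], "attribution": []}
hub = ServerHub()
hub.register("analytics", received["analytics"].append)
hub.register("attribution", received["attribution"].append)
hub.ingest(Event("purchase", "user-1"))  # reaches both vendors
hub.set_enabled("attribution", False)
hub.ingest(Event("purchase", "user-2"))  # reaches only "analytics"
```

The design choice this illustrates is that the client integration is written once, against the hub. Adding, removing, or pausing a vendor only touches the hub's registry.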

The typical vendor vetting process can last several months, and the risk of choosing the wrong partner is very real. By moving to server-side connections, you can turn services on and off with point-and-click simplicity and accelerate the optimization of vendor configurations. You can create A/B tests and run vendor bake-offs to make informed decisions, all without code changes or App Store approval.
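A vendor bake-off needs a way to split users between competing vendors so that each user consistently lands in the same group. The following is a minimal sketch, assuming a hash-based split. It is one common technique, not necessarily how any particular platform implements A/B tests. `assign_vendor`, the experiment name, and the vendor names are hypothetical. Hashing the experiment name together with the user ID gives each user a stable bucket, and the split is computed server-side, so it can change without an app update.

```python
import hashlib
from typing import Dict, Sequence


def assign_vendor(user_id: str, experiment: str, vendors: Sequence[str]) -> str:
    """Deterministically assign a user to one vendor in a bake-off.

    The same (experiment, user) pair always hashes to the same bucket, so a user
    stays with one vendor for the whole test. A different experiment name
    produces an independent split.
    """
    if not vendors:
        raise ValueError("at least one vendor is required")
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    return vendors[bucket % len(vendors)]


# Usage: split 1,000 hypothetical users evenly between two candidate vendors.
users = [f"user-{i}" for i in range(1000)]
counts: Dict[str, int] = {"vendor_a": 0, "vendor_b": 0}
for uid in users:
    counts[assign_vendor(uid, "attribution-bakeoff", ["vendor_a", "vendor_b"])] += 1
# Each vendor receives roughly half the users; the exact split depends on the hash.
```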

There is also control over data. Without a consolidated solution, it's impossible to send historical data to new services, so any time you change vendors you are divorcing yourself from your most valuable asset. Additionally, privacy and security become much easier to control without additional code changes. Finally, predictability of engineering timelines is probably the greatest form of control.

Operationally, it also removes uncertainty across multiple groups, individual owners, and vendors.

Risk

The main argument against consolidating end points is that it creates a single point of failure. If the infrastructure provider goes down or an API changes, what happens? The data platform should have a comprehensive notification system in place to assess and triage the situation. It is important to understand which controls are in place to mitigate this risk, so make sure to ask about them.

Another question to ask is what happens if data cannot be transmitted from the device to the server, or from the server to any of the service providers. In mobile, any system must be built to handle limited connectivity. Data collected on the client should be stored locally until a network connection is restored. The API connections to services must be set up with proper alerts and notifications. Tools such as data replay provide additional redundancy, so that historical data can be sent to any partner at any time.
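Local buffering and data replay can be sketched together. This is a minimal, hypothetical illustration, not mParticle's implementation. `OfflineEventQueue`, `replay`, and `FakeServer` are names invented here, and an in-memory queue stands in for on-device storage. The `send` callback is assumed to report whether the server accepted an event. The client keeps events until delivery succeeds and preserves their order. The server archives what it receives so the history can later be replayed to a newly added partner.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class Event:
    name: str
    timestamp: int  # illustrative integer timestamp


class OfflineEventQueue:
    """Client-side buffer that holds events until the server accepts them.

    Assumption: an in-memory deque stands in for on-device storage; a real
    client would persist pending events to disk so they survive app restarts.
    """

    def __init__(self, send: Callable[[Event], bool], max_events: int = 1000) -> None:
        self._send = send  # returns True once the server has accepted the event
        # Bounded so a long offline period cannot grow without limit;
        # when full, appending drops the oldest buffered event.
        self._pending: Deque[Event] = deque(maxlen=max_events)

    def log(self, event: Event) -> None:
        self._pending.append(event)

    def flush(self) -> int:
        """Deliver buffered events in order, stopping at the first failure."""
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break  # still offline: keep the event and retry on the next flush
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._pending)


def replay(archive: List[Event], forward: Callable[[Event], None], since: int) -> int:
    """Server side: resend archived events at or after `since` to a partner."""
    count = 0
    for event in archive:
        if event.timestamp >= since:
            forward(event)
            count += 1
    return count


class FakeServer:
    """Illustrative stand-in for the network and the server-side archive."""

    def __init__(self) -> None:
        self.online = False
        self.archive: List[Event] = []

    def receive(self, event: Event) -> bool:
        if not self.online:
            return False
        self.archive.append(event)
        return True


# Usage: events logged while offline are delivered once connectivity returns,
# then the archive is replayed to a newly added partner.
server = FakeServer()
queue = OfflineEventQueue(server.receive)
queue.log(Event("app_open", 100))
queue.log(Event("purchase", 105))
first_flush = queue.flush()   # offline: 0 sent, both events remain buffered
server.online = True
second_flush = queue.flush()  # connection restored: 2 sent, in order
new_partner: List[Event] = []
replayed = replay(server.archive, new_partner.append, since=0)
```

Stopping the flush at the first failure keeps events in their original order. The bounded queue is a deliberate tradeoff: during a very long offline period, the oldest events are sacrificed rather than letting storage grow without limit.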

With proper alerting, notification, and redundancy, it's easy to address the single point of failure. We recommend that anyone looking to consolidate end points ask these important questions.

Conclusions:

With technical debt, as with any kind of debt, there is a big difference between consolidation and settlement. Some risk will always be present, and you have to analyze the tradeoff between value and risk.

The biggest argument on both sides is about control. With a consolidated approach, there is control over data, cost structure, engineering roadmap, privacy, security, and the vendor management process. A distributed setup provides control over a hypothetical situation; its value is localized, and significant risk remains in distributed SDKs.

In the end, we believe that the harder you try to not fail, the less likely it is that you will succeed.

Author: Michael Katz, CEO & Co-Founder