
Data Management Platforms: Three Costly Errors and How to Avoid Them

Three mistakes we see again and again in data management platform projects — and the practical ways to avoid each one before they eat your budget.

John Lane 2025-12-02 6 min read

Data management platform projects are one of the most reliable ways we know for a mid-market organization to spend a lot of money slowly and arrive in year two with less usable data than they had at the start. That sounds harsh, and it is, but we have watched the pattern enough times over the last twenty-three years that we feel comfortable being direct about it.

The good news is that the failure modes are remarkably consistent. There are three errors we see in nearly every troubled project, and each one has a practical fix that is cheap if you apply it early and expensive if you apply it late. This post is about those three errors and the lessons we have learned from helping customers recover from them.

Error 1: Buying the platform before defining the business question

The most common failure mode starts with a vendor pitch. Someone in leadership sees a compelling presentation — usually from a Snowflake, Databricks, Fabric, or BigQuery account team — and authorizes a proof of concept. The platform gets provisioned, the data engineering team is asked to "start loading data," and six months later the organization has an expensive data lake with no clear answers to any specific business question.

This pattern fails because it inverts the correct sequence. The right order is: define the business question first, identify the decisions that the answer will drive, identify the data needed to answer the question, and only then choose a platform capable of hosting and serving that data. Start from "we want to know which customers are likely to churn in the next 90 days" and work backward. Do not start from "we need a data platform" and work forward.

The cost of getting this wrong is substantial. We have seen customers run six-figure annual Snowflake bills for a dataset they could have served from a single SQL Server instance, because nobody asked whether the problem they were actually trying to solve required a warehouse at all. We have also seen customers invest in expensive data engineering capability for use cases that were ultimately served by a Power BI dashboard against an operational database.

The fix is discipline at the front of the project. Before you sign the platform contract, write down the three business questions the platform will answer in the first year. Assign an owner from the business side, not from IT, for each question. If you cannot produce the list, you are not ready to buy the platform.
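One way to make that front-of-project discipline concrete is a checklist that fails loudly when the three questions and their business-side owners are missing. This is an illustrative sketch, not a prescription; the questions and owner titles below (other than the churn example from above) are hypothetical.

```python
# Hypothetical intake record: three business questions, each with a named
# owner from the business side. Entries here are example placeholders.
questions = [
    {"question": "Which customers are likely to churn in the next 90 days?",
     "owner": "VP Customer Success"},
    {"question": "Which product lines drive negative-margin orders?",
     "owner": "Director of Finance"},
    {"question": "Where does lead response time exceed 24 hours?",
     "owner": "Head of Sales Operations"},
]

def ready_to_buy(questions) -> bool:
    """At least three questions, each with a named owner, or no contract."""
    return len(questions) >= 3 and all(q.get("owner") for q in questions)

print(ready_to_buy(questions))  # → True
print(ready_to_buy([]))         # → False
```

If the function returns False, the platform conversation is premature; that is the whole point of the gate.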

Error 2: Underestimating the data quality and integration work

The second failure mode is a math error on the project plan. Vendors — all of them, across all the data platform categories — systematically under-represent how much of the work is in the data integration and data quality layer. The platform itself gets provisioned in a few weeks. The data loaded into it gets cleaned, standardized, de-duplicated, joined, and made trustworthy over months or years.

We have seen project plans that allocate two months to "data ingestion" for a data warehouse implementation that, in practice, took eighteen months to produce anything resembling clean, queryable, trustworthy data. This is not a criticism of the team that did the work. It is a criticism of the project plan that lied to leadership about the timeline.

The reason the estimates are consistently wrong is that nobody knows how messy the source data is until they start loading it. Systems that looked tidy from the outside turn out to have decades of workarounds, inconsistent naming, duplicate records, nullable fields that are actually required, and free-text fields that were supposed to be categorical. Every one of these has to be understood and handled before the data is usable downstream.

The fix has two parts. First, always do a data profiling exercise against the key source systems before you commit to a platform migration timeline. Spend two or three weeks running profiling tools (the dbt-profiler package, ad-hoc SQL queries, Power Query's column profiling, whatever fits your stack) against the biggest, dirtiest sources you plan to include. This will surface most of the skeletons before they become sprint-ending surprises. Second, build a timeline that dedicates 50 to 70 percent of total project effort to data quality and integration work. If the vendor's proposal allocates less than that, push back hard.
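The profiling pass does not need heavy tooling to start. A minimal sketch, assuming a SQL source reachable from Python (here an in-memory SQLite table stands in for a messy source system; the table and column names are hypothetical):

```python
import sqlite3

def profile_column(conn, table, column):
    """Return row count, null rate, and distinct count for one column."""
    cur = conn.execute(
        f"SELECT COUNT(*), "
        f"       SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END), "
        f"       COUNT(DISTINCT {column}) "
        f"FROM {table}"
    )
    rows, nulls, distinct = cur.fetchone()
    return {
        "rows": rows,
        "null_rate": (nulls or 0) / rows if rows else 0.0,
        "distinct": distinct,
    }

# Demo: a stand-in for a source table with nulls and duplicate values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "a@x.com"), (2, None), (3, "a@x.com"), (4, "b@x.com")],
)
print(profile_column(conn, "customers", "email"))
# → {'rows': 4, 'null_rate': 0.25, 'distinct': 2}
```

Run something like this against every key column of the biggest sources: a "required" field with a 25 percent null rate, as in the toy data above, is exactly the kind of skeleton that turns a two-month ingestion estimate into an eighteen-month one.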

Error 3: Building the platform without a governance and access model

The third failure mode happens later in the project, typically six to twelve months in. The platform is working, data is flowing, dashboards are being built, and the organization realizes it has accidentally built a parallel data silo with no clear rules about who can access what, who owns which datasets, who is allowed to publish a dashboard, and what "the number" is when two different dashboards give different answers to the same question.

This failure looks like a governance problem at first glance. It is actually a design problem. The governance and access model should have been part of the platform design from day one, not bolted on after the first dashboards are in production.

The concrete practices that avoid this error are not new. Assign an owner for every dataset at the time it is ingested. Define sensitivity classifications — public, internal, confidential, regulated — and apply them to datasets before anyone queries the data. Decide who approves new dashboards that reference sensitive data. Decide how metric definitions are standardized (the "revenue" column has to mean the same thing everywhere, and the definition has to live in a place people can find). Most of this is boring. All of it is necessary.

The fix is to set up the governance scaffolding before loading the first real dataset. At minimum: a metadata catalog that every dataset lands in, a standard set of sensitivity tags, and a named data steward from the business side for each major domain. Platforms like Microsoft Purview, Atlan, Collibra, and dbt's semantic layer all help, but the platform is a small part of the solution. The ownership and the rules matter more.
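What the minimum scaffolding amounts to can be sketched in a few lines, whatever catalog product you end up using. This is an illustrative shape, not any particular tool's API; the field names and the refusal-to-ingest rule are our assumptions about a sensible default.

```python
from dataclasses import dataclass

# The four sensitivity classifications described above.
SENSITIVITY_TAGS = {"public", "internal", "confidential", "regulated"}

@dataclass(frozen=True)
class DatasetRecord:
    name: str
    owner: str        # a named business-side steward, not a team alias
    domain: str
    sensitivity: str

def register(record: DatasetRecord, catalog: dict) -> None:
    """Refuse to land a dataset that lacks an owner or a valid tag."""
    if not record.owner:
        raise ValueError(f"{record.name}: no owner assigned")
    if record.sensitivity not in SENSITIVITY_TAGS:
        raise ValueError(f"{record.name}: unknown tag {record.sensitivity!r}")
    catalog[record.name] = record

catalog = {}
register(DatasetRecord("sales.orders", "j.doe", "sales", "internal"), catalog)
```

The design choice that matters is the hard stop: ingestion fails, rather than warns, when ownership or classification is missing. That is what "before loading the first real dataset" looks like in practice.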

A fourth error, honorable mention

We could have written this as a list of four. The fourth common error is treating the data platform project as an IT initiative when it is actually a change-management initiative. The platform will only be valuable if people across the organization change how they make decisions, and that change doesn't happen because IT stood up a dashboard. It happens because leadership insists on the data being used, celebrates people who use it well, and tolerates the discomfort of the first few months when the data contradicts a long-held assumption.

Every successful data platform deployment we have seen has an executive sponsor who actively uses the data themselves. Every unsuccessful one has a sponsor who delegated the project to IT and then asked for status updates.

What a successful project looks like

To put the three errors in context, here is what the successful pattern looks like instead.

It starts with three named business questions and three named business owners. It commits to answering those questions in a defined timeframe, usually 90 to 180 days for the first answer. It does a real data profiling exercise against the sources and allocates the majority of the timeline to data quality work. It sets up the governance scaffolding — ownership, classification, metric definitions — before the first dataset lands. And it has an executive sponsor who is personally using the dashboards within 30 days of the first one going live.

That pattern delivers value. The patterns that skip any of those elements tend to deliver an expensive piece of infrastructure and a pile of unfulfilled promises.

Three takeaways

  1. Start with the business question, not the platform. If you cannot name the three questions you want answered in the first year, you are not ready to buy anything.
  2. Plan for the data quality work honestly. It is 50 to 70 percent of the real timeline. Vendors will not tell you this. Build your project plan around it anyway.
  3. Governance is a day-one design decision. Dataset ownership, sensitivity classification, and metric definitions have to exist before the platform goes live. Bolting them on later is painful.

Talk with us about your infrastructure

Schedule a consultation with a solutions architect.
