10 Costly Data Integration Project Mistakes


Every data integration project is different, but they're all susceptible to some of the same design and implementation mistakes. 

CloverDX Senior Consultant Kevin Scott walked through some of the common pitfalls in our webinar, plus gave some tips on how to avoid them in your projects. Watch the webinar or read the transcript below. 


1. Starting too quickly

One common mistake we see is starting a data integration project too quickly, sometimes referred to as ‘Ready, Shoot, Aim’. When deadlines are looming, it’s tempting to start making immediate progress.

The danger here is wasted effort, because you’re likely proceeding without a firm set of requirements. And when those requirements do surface or change, it’s going to be costly to go back and rework.

Try to have some objectives and measures in place at the outset to avoid the temptation to rush in.

2. Not thinking about solution scaling

When you’re approaching a project, the temptation is often to focus on the current state – the current character of the data – and then to design a solution around that state.

But the data you’re working with is dynamic. It will grow, and its structures and formats will likely change too. Your processes might also change. Maybe today you need a daily batch of data to move in a two-hour window, or to handle 1000 new events per hour throughout the day, but these are rarely static numbers.

It's important to consider this moving target at the beginning. 
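As a rough illustration of that moving target, the sketch below estimates how long a batch job keeps fitting inside its processing window as volume grows. The function name, the compound monthly growth model, and the ten-year horizon are all assumptions made for this example, not measurements from a real project.

```python
def months_until_window_exceeded(rows_today, rows_per_hour, window_hours,
                                 monthly_growth, horizon_months=120):
    """Return the first month in which the batch no longer fits its window.

    All inputs are hypothetical planning figures. Returns None if the
    batch still fits at the end of the horizon.
    """
    # Most rows the current solution can process inside the window.
    capacity = rows_per_hour * window_hours
    rows = rows_today
    for month in range(horizon_months + 1):
        if rows > capacity:
            return month
        # Assumed model: volume compounds by a fixed rate each month.
        rows *= 1 + monthly_growth
    return None


if __name__ == "__main__":
    # Made-up numbers: one million rows today, a two-hour window,
    # 600,000 rows per hour of throughput, 5% growth per month.
    print(months_until_window_exceeded(1_000_000, 600_000, 2, 0.05))
```

Even a crude estimate like this can turn "it will grow" into a date you can plan around.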


10 > 16 Costly Mistakes 

So our first two mistakes: don’t rush in, and spend some time thinking about scale. 

And when I applied these two lessons to this talk, I discovered I actually have 16 mistakes, not 10.

Let’s start with some mistakes that can happen as you initially begin to define a data problem and possible solution. I already started with the two above.

Preliminary Mistakes in Data Integration Projects:

3. Incorrect view of solution lifespan

Closely related to the scaling problem is misunderstanding the lifespan of your data solution. It’s too easy to think of a data integration (DI) project as a one-off, with definite start and end dates. The reality is that most DI projects are better thought of as an ongoing initiative.

Most data integration projects are never done. Maybe counter-intuitively, this is particularly true for successful projects, as people will often want more.

Healthy business processes continually evolve, and the data processes need to evolve in parallel. It is a mistake to plan for a DI project to have a certain specific end-point. 

4. Magical thinking about technology

There’s a great marketplace of software tools available for your data integration project, and you'll probably spend significant effort evaluating and choosing the right data tools.

But it’s a mistake to confuse that effort with the effort of actually building your solution. The tools will make you more efficient, but the hard work of designing your data pipeline will still be hard work.

No tool is magic – be mindful of what your tool can do for you and what it can’t. And remember there will still be work to do.

5. Mis-alignment with your business user

As you define your data integration project, you will need to think carefully about your end users. If your solution is not aligned with user expectations, either in their needs or their capabilities, your success will be limited.

If your users think the solution is too difficult or time-consuming to use, your effort will be wasted when no-one uses it. Underestimating need can also be a problem if the solution gets used more than you had planned for and maybe ends up failing under the load.

Take the time to understand your users and align your solution to (1) their skills (2) their work context and of course (3) their needs. 

6. Mis-interpreting the motivation for change


A data integration project doesn’t often start with a clean slate. More likely is that there are processes in place that are no longer working. Maybe they’re too slow, maybe broken, maybe too difficult to maintain.

Not taking the time to fully recognize, understand and memorialize this pain can be a mistake.  

Fail to understand the motivating pain and you risk building a solution that may be better, but not better in ways that actually address that pain. We’ve seen projects that make some technology change, or move from on-prem to the cloud, without really addressing the root motivation. 

Take time to translate the motivation for change into actual measurable objectives for the project. 

Mistakes in Assembling the Team:

Let's move on to a new class of mistakes, ones that tend to surface as you assemble the team that is going to work on the project.

7. Missing business owner

Your project needs a business owner. Someone that understands the business goals of the project and can answer questions related to functionality, schedule and tradeoffs – and someone with authority to change the project scope.  

Equally importantly, the business owner must be actively involved in the project. Too often business owners are figureheads, attending a weekly status meeting or keeping only a passing interest in progress. 

Passive business owners attending weekly rubber-stamp status meetings will invariably cause project delays that will quickly accumulate.

8. Overbuying technology


The choice of data integration software can be dizzying. It’s tempting to try and choose tools that will meet all your data management needs -  for this project and others, and for needs that might occur in the future.   

In our experience, it can be better to take a more measured approach to software selection. Don’t insist on features you won’t need in the next year, or are unable to use. For example, we’ve seen this a lot with big data technologies: until recently every data integration project added Hadoop to the end of its tool requirements list, seemingly regardless of any plans for big data analytics.

Focus on the core capabilities that will get you to your project goals the quickest.

9. Skipping training

When you assemble a team, they will of course be confident, smart and well-suited for the project. It can be tempting to presume that they can learn new tools quickly by themselves and don’t need any formal training.  

In our experience it can be a mistake to skip this training. In training, you’ll likely learn tips and tricks that don’t surface in self-guided study, and expert training can often provide more than just tool expertise. Your vendor’s training staff often has valuable experience with many similar projects, and a session with them can be a source of domain expertise.

Establishing a personal relationship with technical staff from your vendor can also be a valuable resource as you execute.

Take time to choose your trainer carefully. Ask your vendor about the trainer, and make sure they are the right fit and understand your project.

10. Deciding how much to DIY 

How much third party help do you want for your project? This depends on many factors, from the skill set and availability of your internal staff to the aggressiveness of your schedule.  

There is a spectrum of building-it-yourself options. What parts of the project do you want to build? What parts do you want to buy? Spend the time necessary to choose where on that spectrum to position your project. 

At one extreme is a 100% custom solution built without any tools. Maybe you have a strong internal team of developers who may be able to make rapid initial progress on core project goals – the goals that address the pain we talked about earlier.

The mistake we see with this approach is underestimating the ‘assumed’ features that surround a data integration solution: things such as data quality assessment, logging, error handling and reporting, security, monitoring and so on. These foundational features can cause your solution to quickly gain weight beyond the core requirements.

Your programming team may be able to build it, but is that the best use of their time?

Be realistic about the true scope of your project, and reserve your internal development resources to build specific business features, augmenting with third party tools where necessary or desirable.

11. Misjudging cost

Closely related to the decision of where to place your project on the DIY spectrum is your assessment of total project cost.  Misjudging costs is a common mistake. 

At the outset it is tempting to equate the cost of the tool with the cost of the solution.

In addition to the cost of development, you need to consider the cost of operating the finished solution, the maintenance cost of fixing issues, and the opportunity cost of experiencing an incident.

12. Choosing the wrong solution architecture


Data integration architectures can of course run from the very basic and project-specific to the more generic and framework-centric.

I need to be careful not to over-generalize, but in my experience, the more DIY flavor a project has, the more likely it is to tend towards project-specific architecture.

When I say framework, I mean some abstract or generic approach to architecture, often driven by a configuration file. So instead of writing code that reads a specific customer file and inserts its content into a customer table in a database, you create a framework that reads an arbitrary file and inserts its content into an arbitrary table, based on some guidance in a configuration file.
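To make the framework idea concrete, here is a minimal sketch of a configuration-driven loader in Python, using only the standard library. The `CONFIG` layout, the `load_file` and `quote` helpers, and the table and column names are all hypothetical, chosen for illustration; a real framework would add type handling, logging and error reporting.

```python
import csv
import io
import sqlite3

# Hypothetical configuration. In practice this might live in a JSON or YAML file.
CONFIG = {
    "table": "customer",
    # Maps a column header in the file to a column name in the table.
    "columns": {"id": "customer_id", "name": "full_name"},
}


def quote(identifier):
    """Quote a table or column name, rejecting anything unusual."""
    if not identifier.isidentifier():
        raise ValueError(f"unsafe identifier: {identifier!r}")
    return f'"{identifier}"'


def load_file(conn, source, config):
    """Insert rows from a CSV file object into the table named in config."""
    mapping = config["columns"]
    columns = ", ".join(quote(c) for c in mapping.values())
    placeholders = ", ".join("?" for _ in mapping)
    sql = f"INSERT INTO {quote(config['table'])} ({columns}) VALUES ({placeholders})"
    reader = csv.DictReader(source)
    # Pick out only the configured fields, in the configured order.
    rows = [tuple(record[field] for field in mapping) for record in reader]
    conn.executemany(sql, rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (customer_id TEXT, full_name TEXT)")
    sample = io.StringIO("id,name\n1,Ada\n2,Grace\n")
    print(load_file(conn, sample, CONFIG), "rows loaded")
```

Loading a different file into a different table then means writing a new configuration entry rather than new code, which is where the flexibility of a framework comes from.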

Data integration frameworks can appear intimidating and unnecessarily complex. Initial progress can seem slow, when compared to a more direct treatment of project specific requirements.   

Project-specific architecture is easier to conceive and faster to develop, but is less flexible and extensible.  

Frameworks are also more likely to support corporate data governance and auditability goals, and to be more tolerant of, and even welcoming to, business process changes and the resulting data changes.

Don’t underestimate the value of generic frameworks, particularly if you expect growth in the amount and type of data you will ultimately process. 

Mistakes in Project Execution:

13. Skipping a proof-of-concept (POC) 

A POC can:

  • Serve to confirm your planned approach
  • Help clarify and validate your needs
  • Provide early assessment for any outside consultants

But be careful how you represent a POC to stakeholders. POCs are great for validating requirements, but can also convey a false sense of progress to the casual observer.

A POC should address the most important parts of the project, regardless of whether or not they are the hardest parts.

14. Lack of transparency during the process

The goal is to avoid surprises. Demo as much as possible, as early as possible, to as many stakeholders as possible. Call it an Agile or Lean approach, borrowed from software development.

Demo early and often. No big reveals.

15. Omitting processes for handling bad data


Handling bad data is often an afterthought in data projects and that can be a costly mistake.

When bad data creeps into our systems it can affect credibility and can be expensive to repair. Bad data handling should be a core part of your DI project.

Remedies can include adding explicit validation stages that look for bad data throughout the pipeline and catch it as early as possible, and defining processes that not only detect but also correct bad data.
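An explicit validation stage might look something like the following sketch. The field names (`customer_id`, `email`) and the two rules are assumptions made up for this example; the point is only that rejected records are kept, together with the reasons, so they can be reviewed and corrected rather than silently dropped.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    valid: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (record, reasons) pairs


def check_record(record):
    """Return a list of problems found in one record (empty if it is clean)."""
    reasons = []
    if not record.get("customer_id"):
        reasons.append("missing customer_id")
    # Deliberately simple rule for illustration, not a full email check.
    if "@" not in (record.get("email") or ""):
        reasons.append("malformed email")
    return reasons


def validate(records):
    """Split records into a clean stream and a reject stream."""
    result = ValidationResult()
    for record in records:
        reasons = check_record(record)
        if reasons:
            result.rejected.append((record, reasons))
        else:
            result.valid.append(record)
    return result


if __name__ == "__main__":
    batch = [
        {"customer_id": "1", "email": "ada@example.com"},
        {"customer_id": "", "email": "not-an-email"},
    ]
    outcome = validate(batch)
    print(len(outcome.valid), "valid,", len(outcome.rejected), "rejected")
```

Only the valid stream continues down the pipeline. The reject stream feeds whatever correction process you define, whether that is an automated fix or a report for a data steward.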

Don’t just focus on the happy path in your data pipelines.


16. Omitting a formal testing phase

I have left discussion of testing as the last item.  Ironic, because this is also often left as the final step in DI projects - which is a mistake.

The biggest mistake we see in the testing phase is the failure to obtain relevant and valid test data.

Robust test data is vital to ensure your solution works, but at the same time remarkably difficult to obtain.  The best test data is actual production data, but that is almost always off limits, because (a) it is in production on production infrastructure and (b) it may hold sensitive or PII data. Synthesizing or anonymizing test data is the next best thing, but make sure you plan and budget for this effort as well.
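As one possible starting point for the anonymizing route, the sketch below replaces sensitive values with keyed hashes using Python's standard `hmac` and `hashlib` modules. The field list and function names are hypothetical. Strictly speaking this is pseudonymization rather than full anonymization, so check it against your own privacy requirements before relying on it.

```python
import hashlib
import hmac

# Hypothetical list of fields treated as sensitive in this example.
SENSITIVE_FIELDS = {"name", "email"}


def pseudonymize(value, secret):
    """Replace a value with a keyed hash so the original cannot be read back."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Shortened for readability; keep the full digest if collisions matter.
    return digest[:16]


def anonymize_record(record, secret, sensitive_fields=SENSITIVE_FIELDS):
    """Return a copy of the record with the sensitive fields replaced."""
    return {
        key: pseudonymize(str(val), secret) if key in sensitive_fields else val
        for key, val in record.items()
    }


if __name__ == "__main__":
    secret = b"keep-this-key-out-of-the-test-environment"
    row = {"id": 42, "name": "Ada Lovelace", "email": "ada@example.com"}
    print(anonymize_record(row, secret))
```

Because the same input always produces the same token for a given key, values that matched across tables in production still match in the test data, so joins and lookups can be exercised realistically.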


We have seen projects delayed months due to the inability to obtain test data of quality sufficient to perform the required QA assessment. 

Don’t leave QA to the end of your project. 

And that wraps up the list. I am sure you have seen many of these before, but I trust at least a couple of them have given you something new to think about.

Want to avoid these mistakes in your project?

At CloverDX we have been helping clients plan and execute data integration projects for nearly 20 years, and we welcome the opportunity to talk with you in more detail about your specific challenges, and help you avoid some of these costly mistakes.  



Posted on February 17, 2020
