CloverDX Blog on Data Integration

16 costly data integration project mistakes (and how to avoid them)

Written by Kevin Scott | February 17, 2020

Every data integration project is different, but they're all susceptible to some of the same design and implementation mistakes. 

CloverDX Senior Consultant Kevin Scott walked through some of the common pitfalls in our webinar and shared tips on how to avoid them in your projects. Watch the webinar or read the transcript below.

1. Starting too quickly

One of the most common data integration mistakes is starting a project too quickly. We refer to this as ‘Ready, Shoot, Aim’. When deadlines are looming, it’s tempting to start making immediate progress.

The danger here is wasted effort, because you’re likely proceeding without a firm set of requirements. And when those requirements do surface or change, it’s going to be costly to go back and rework.

Try to have some objectives and measures in place at the outset to avoid the temptation to rush in.

2. Not thinking about solution scaling

When you’re approaching a project, the temptation is often to focus on the current state – the current character of the data – and then to design a solution around that state.

But the data you’re working with is dynamic. It will grow. It'll change structures and formats, and your processes might also change.

Today, you might need a daily batch of data to move in a two-hour window, or handle 1000 new events per hour throughout the day, but these are rarely static numbers.

It's important to consider this moving target at the beginning.
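To make that moving target concrete, here is a rough sketch of how you might estimate when a batch outgrows its window. It assumes runtime grows in proportion to data volume and that volume compounds at a steady monthly rate; both are simplifying assumptions, and the function name and figures are hypothetical.

```python
def months_until_window_exceeded(current_minutes, window_minutes,
                                 monthly_growth, horizon=120):
    """Return the first month in which the batch no longer fits its window.

    Assumes runtime scales linearly with volume and volume compounds
    monthly (illustrative assumptions). Returns None if the window is
    not exceeded within `horizon` months.
    """
    runtime = current_minutes
    for month in range(horizon + 1):
        if runtime > window_minutes:
            return month
        runtime *= 1 + monthly_growth
    return None


# Hypothetical figures: a 60-minute batch, a two-hour window, 5% monthly growth.
months_left = months_until_window_exceeded(60, 120, 0.05)
```

Even a crude estimate like this gives you a date to design against, rather than a vague sense that the data will grow.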

10 > 16 costly mistakes 

To summarize our first two mistakes: don’t rush in, and don't forget to spend some time thinking about scale. 

However, when I applied these two lessons to this talk, I discovered I actually have 16 mistakes, not 10.

Let’s start with some mistakes that can happen as you initially begin to define a data problem and possible solution. 

Preliminary mistakes in data integration projects

3. Incorrect view of solution lifespan

Misunderstanding the lifespan of your data solution is another data integration project mistake.

It’s too easy to think of a data integration (DI) project as a one-off, with definite start and end dates. The reality is that most DI projects are better thought of as an ongoing initiative.

Most data integration projects are never done. (Maybe counter-intuitively, this is particularly true for successful projects as people will often want more).

Healthy business processes continually evolve, and the data processes need to evolve in parallel. It is a mistake to plan for a DI project to have a certain specific end-point.

4. Magical thinking about technology

There’s a great marketplace of software tools available for your data integration project. You'll probably spend significant effort evaluating and choosing the right data tools.

But it’s a mistake to confuse that effort with the effort of actually building your solution. The tools will make you more efficient, but the hard work of designing your data pipeline will still be hard work.

No tool is magic – be mindful of what your tool can do for you and what it can’t. And remember there will still be work to do.


5. Misalignment with your business user

As you define your data integration project, you'll need to think carefully about your end users. After all, if your solution is not aligned with user expectations, in either their needs or their capabilities, your success will be limited.

If your users think the solution is too difficult or time-consuming to use, your effort will be wasted when no-one uses it. Underestimating need can also cause problems if the solution gets used more than you had planned for and maybe ends up failing under the load.

Take the time to understand your users and align your solution to: (1) their skills, (2) their work context, and (3) their needs.

6. Misinterpreting the motivation for change

A data integration project doesn’t often start with a clean slate. Often, there are processes in place that are no longer working. Maybe they’re too slow, broken, or too difficult to maintain.

Not taking the time to fully recognize, understand and memorialize this pain can be a mistake.  

If you fail to understand the motivating pain, you risk building a solution that may be better, but not better in ways that actually address that specific pain. We’ve seen projects that make some technology change, or move from on-prem to the cloud, without really addressing the root motivation. 

Take time to translate the motivation for change into actual measurable objectives for the project.

 

Mistakes in assembling the team

Let's move on to a new class of mistakes, ones that tend to surface as you assemble your data integration project teams.

7. Missing business owner

Your project needs a business owner. This should be someone who understands the business goals of the project and can answer questions related to functionality, schedule and tradeoffs. Ideally, they should also be someone with authority to change the project scope.

Equally, the business owner must be actively involved in the project. Too often business owners are figureheads, attending a weekly status meeting or keeping only a passing interest in progress. 

Passive business owners attending weekly rubber-stamp status meetings will invariably cause project delays that will quickly accumulate.

8. Overbuying technology

The choice of data integration software can be dizzying. It’s tempting to choose tools that will meet all your data management needs, both for this project and others, and for needs that might occur in the future.

In our experience, it can be better to take a more measured approach to software selection. Don’t insist on features you don’t need in the next year, or are unable to use. For example, we’ve seen this a lot with big data technologies: until recently every data integration project added Hadoop to the end of its tool requirements list, seemingly regardless of any plans for big data analytics.

Focus on the core capabilities that will get you to your project goals the quickest.

9. Skipping training

When you assemble a team, they will of course be confident, smart and well-suited for the project. It can be tempting to presume that they can learn new tools quickly by themselves and don’t need any formal training.  

In our experience, it's a mistake to skip this training. In training, you’ll likely learn tips and tricks that don’t surface in self-guided study, and expert training can often provide more than just tool expertise. Your vendor’s training staff often has valuable experience with many similar projects, and a session with them can be a source of domain expertise.

Establishing a personal relationship with technical staff from your vendor can also be a valuable resource as you execute.

Take time to choose a trainer carefully. Ask your vendor about the trainer and make sure they are the right fit and that they understand your project.

10. Deciding how much to DIY 

How much third-party help do you want for your project? This depends on many factors, from the skill set and availability of your internal staff to the aggressiveness of your schedule.

There is a spectrum of build-it-yourself options. What parts of the project do you want to build? What parts do you want to buy? Spend the time necessary to choose where on that spectrum to position your project. 

At one extreme is a 100% custom solution built without any tools. Maybe you have a strong internal team of developers who can make rapid initial progress on core project goals – the goals that address the pain we talked about earlier.

The mistake we see with this approach is underestimating the ‘assumed’ features that surround a data integration solution – things such as data quality assessment, logging, error handling and reporting, security, monitoring and so on. These foundation features can cause your solution to quickly gain weight beyond the core requirements.

Your programming team may be able to build it, but is that the best use of their time?

Be realistic about the true scope of your project, and reserve your internal development resources to build specific business features, augmenting with third party tools where necessary or desirable.

11. Misjudging cost

Closely related to the decision of where to place your project on the DIY spectrum is your assessment of total project cost.  Misjudging costs is a common mistake. 

At the outset, it's tempting to equate the cost of the tool with the cost of the solution.  

In addition to the cost of development, you need to consider the cost of operating the finished solution; the maintenance cost of fixing an issue; and the opportunity cost of experiencing an incident.

12. Choosing the wrong solution architecture

Data integration architectures can of course range from a very basic, project-specific design to a more generic framework.

I need to be careful not to over-generalize, but in my experience, the more DIY flavor a project has, the more likely it is to tend towards project-specific architecture.

When I say framework, I mean some generic approach to architecture, often driven by some configuration file. So instead of writing code that reads a specific customer file and inserts its content into a customer table in a database, you create a framework which reads an arbitrary file. This then inserts its contents into an arbitrary table, based on some guidance in a configuration file. 
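As a minimal sketch of that idea, the loader below knows nothing about customers: the target table and the column mapping come from a configuration object. The config layout, the `load` function and the use of CSV and SQLite are all assumptions made for illustration, not a description of any particular product.

```python
import csv
import io
import sqlite3

# Hypothetical configuration; in practice this might live in a JSON or
# YAML file. Keys map source file columns to target table columns.
CONFIG = {
    "table": "customer",
    "columns": {"id": "customer_id", "name": "full_name"},
}


def load(source, conn, config):
    """Insert every row of a CSV source into the table named in config."""
    mapping = config["columns"]
    target_cols = ", ".join(f'"{col}"' for col in mapping.values())
    placeholders = ", ".join("?" for _ in mapping)
    # Identifiers are taken from trusted configuration, never from the data.
    sql = f'INSERT INTO "{config["table"]}" ({target_cols}) VALUES ({placeholders})'
    rows = [tuple(rec[src] for src in mapping) for rec in csv.DictReader(source)]
    conn.executemany(sql, rows)
    conn.commit()
    return len(rows)


# Usage with an in-memory database and an in-memory "file".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id TEXT, full_name TEXT)")
sample = io.StringIO("id,name\n1,Ada\n2,Grace\n")
loaded = load(sample, conn, CONFIG)
```

Pointing the same function at an orders file and an orders table would only need a new configuration entry, which is the flexibility the framework approach is buying.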

Data integration frameworks can appear intimidating and unnecessarily complex. Initial progress can seem slow when compared to a more direct treatment of project-specific requirements.

Project-specific architecture is easier to conceive and faster to develop, but is less flexible and extensible.  

Additionally, frameworks are more likely to support corporate data governance and auditability goals, and to be more tolerant of, and even welcoming to, business process changes and the resulting data changes.

Don’t underestimate the value of generic frameworks, particularly if you expect growth in the amount and type of data you will ultimately process. 

 

Mistakes in project execution

13. Skipping a proof-of-concept (POC) 

A POC can:

  • Serve to confirm your planned approach
  • Help clarify and validate your needs
  • Provide early assessment for any outside consultants

But be careful how you represent a POC to stakeholders. POCs are great for validating requirements, but can also convey a false sense of progress to the casual observer.

A POC should address the most important parts of the project, regardless of whether they are the hardest parts.

14. Lack of transparency during the process

The goal is to avoid surprises. Demo as much as possible as early as possible to as many stakeholders as possible. Call it Agile or Lean development, borrowed from software.

Demo early and often. No big reveals.

15. Omitting processes for handling bad data

Handling bad data is often an afterthought in data projects and that can be a costly mistake.

When bad data creeps into your systems, it can affect credibility and can be expensive to repair. Bad data handling should be a core part of your DI project.

Remedies can include adding explicit validation stages to look for bad data throughout the pipeline and catch it as early as possible, and defining processes to not only detect but to correct bad data. 
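An explicit validation stage can be quite small. The sketch below splits incoming records into clean ones and rejected ones, keeping the reasons for rejection so that a correction process has something to work with. The field names and rules are hypothetical; your own rules will depend on your data.

```python
def validate(record):
    """Return a list of problems found in one record (empty means clean)."""
    problems = []
    email = record.get("email")
    if not email or "@" not in email:
        problems.append("invalid email")
    try:
        if float(record.get("amount", "")) < 0:
            problems.append("negative amount")
    except (TypeError, ValueError):
        problems.append("amount is not a number")
    return problems


def validation_stage(records):
    """Split records into (good, rejected); rejected entries keep their reasons."""
    good, rejected = [], []
    for rec in records:
        problems = validate(rec)
        if problems:
            rejected.append({"record": rec, "problems": problems})
        else:
            good.append(rec)
    return good, rejected


# Hypothetical sample data.
records = [
    {"email": "a@example.com", "amount": "10.5"},
    {"email": "bad", "amount": "x"},
    {"email": "b@example.com", "amount": "-3"},
]
good, rejected = validation_stage(records)
```

Routing rejects to their own output, rather than dropping them or failing the whole run, is one way to make detection and correction part of the pipeline instead of an afterthought.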


Don’t just focus on the happy path in your data pipelines.

 

16. Omitting a formal testing phase

I have left the discussion of testing as the last item.  Ironic, because this is also often left as the final step in DI projects.

The biggest mistake we see in the testing phase is the failure to obtain relevant and valid test data.

Robust test data is vital to ensure your solution works, but it's remarkably difficult to obtain. The best test data is actual production data, but that is almost always off limits, because: a) it is in production on production infrastructure, and b) it may hold sensitive data or PII. Synthesizing or anonymizing test data is the next best thing, but make sure you plan and budget for this effort as well.
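As one illustration of the effort involved, the sketch below replaces sensitive fields with keyed hashes so that the same input always maps to the same token and joins between test tables still line up. Strictly speaking this is pseudonymization rather than full anonymization, and it may not satisfy your regulatory requirements on its own. The key, field names and function names are all hypothetical.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; keep real keys out of code
SENSITIVE_FIELDS = ("name", "email")        # assumption: which fields hold PII


def pseudonymize(value, key=SECRET_KEY):
    """Return a stable 16-character token for a sensitive string value."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def mask_record(record, fields=SENSITIVE_FIELDS):
    """Copy a record, replacing sensitive fields with tokens."""
    return {k: pseudonymize(v) if k in fields else v for k, v in record.items()}


masked = mask_record({"name": "Ada", "email": "ada@example.com", "plan": "gold"})
```

Tokens like these no longer look like real names or email addresses, so tests that depend on data format will need synthesized values instead.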

 

We have seen projects delayed months due to the inability to obtain quality test data needed to perform the required QA assessment. 

Don’t leave QA to the end of your project. 

And that wraps up the list. I am sure you have seen many of these before, but I trust at least a couple of them have given you something new to think about.

Want to avoid these mistakes in your project?

At CloverDX, we've helped clients plan and execute data integration projects for nearly 20 years.  We welcome the opportunity to talk with you in more detail about your specific challenges, and help you avoid some of these costly mistakes.