
Automating data pipelines: Why you should move on from scripts or Excel

Data Automation | Replacing Legacy Systems
Posted October 06, 2023
8 min read

You’ve got a simple task – move some data or prepare it for use. It seems like a pretty quick job, so you solve the problem with some Excel, or maybe some scripting. Job done, right?

But over time the simple task may have revealed itself to be… not so simple. Maybe it’s evolved to become more complex, or perhaps it’s become too crucial to the business to be handled casually. If so, it might be time to consider moving on from homegrown ETL solutions to something more robust and reliable that will let you handle more data, more easily.

This post is based on a recent webinar on Automating Data Pipelines. You can watch the full video here, or read the post below.

 

What do we mean by homegrown ETL solutions?

When we talk about ‘homegrown’ or DIY ETL solutions, we mean something that’s been designed and developed in-house to solve a particular data problem. It’s often a combination of several things:

  • Manual process: Your ETL solution might begin life as just a set of manual processes. Grab some data from a source; drop it into Excel; do some magic with copy/paste, text-to-columns, formulas, or Vlookups; export the result to CSV and pass that along to some other process or data endpoint.
  • Scripts: Or you might have a power user skilled with scripting – using a shell, Excel macros, Python or SQL – who can write code to shape, map and prepare data (see the sketch after this list).
  • Custom applications: Larger companies might have in-house IT departments who can build custom solutions to address specific requirements.
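
To make this concrete, here’s a minimal sketch of what a homegrown script like this often looks like (the file names and column layout below are hypothetical):

```python
# A typical homegrown ingest script: read a CSV export, reshape it,
# and write a cleaned file for the next step. File names and columns
# here are hypothetical.
import csv

with open("clients_export.csv", newline="") as src, \
     open("clients_clean.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=["client_id", "name", "revenue"])
    writer.writeheader()
    for row in reader:
        # The "magic" that would otherwise live in Excel formulas:
        # normalize fields and drop bad rows.
        if not row.get("ClientID"):
            continue  # skipping bad rows with no trace - a typical hidden risk
        writer.writerow({
            "client_id": row["ClientID"].strip(),
            "name": row["Name"].title(),
            "revenue": row["Revenue"].replace(",", ""),
        })
```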

There are very understandable reasons you may have originally decided to use any of the above approaches:

  • Naïve assessment of the task: It may have seemed like a simple task at the outset
  • Urgency: Maybe there was a tight deadline and you didn’t have the time to research or go through the buying process for a third-party tool
  • Exceptional requirements: You might have thought your requirements were too challenging for an off-the-shelf solution to handle
  • Exceptional team: Maybe you had a highly skilled dev team already – they understood the project and were eager to build something bespoke
  • Habit: Or it could be that this is the way you or your company has always done these projects

But at some point these DIY solutions might start causing problems. Some might be manageable, but others might pose real risks to your business.

The risks of homegrown data scripts

Homegrown scripted data processes can pose risks in several different areas:

  • Not being able to keep up with increasing volumes of data: When scripted processes take too long to run – and often have issues that cause them to fail – your users can be left without the data they need.
  • Ad-hoc solutions that might stop working: When the business is relying on a scripted process, it’s vulnerable to changes that are out of your control. We had one customer who came to us worried about Excel’s increasing restrictions on macros and VBA, given how much their organization relied on Excel-based data processing. As he described it, “If Microsoft turned off that VBA, our finance department would be in tears. They wouldn’t be able to do their job.”

  • Wasting time on repetitive tasks: Processes that might not even be very complex can take up a significant amount of time. A couple of hours on a report every week add up quickly when there are multiple reports to prepare.
  • Lack of visibility into errors: It can be hard to know whether your data is reliable. Often, the first indication of a problem with a scripted solution is when the end user reports a problem with their data. (That’s if they even notice – often no one realizes that, say, a particular source isn’t working, so the analysis is unknowingly built on incomplete data.)
  • Tricky troubleshooting: Assuming someone has realized the data is wrong, it’s often time-consuming and frustrating to figure out where the problem lies. There can be a lot of combing through logs, and even then it’s not always apparent where the error occurred, so it takes even more time to try and reverse-engineer the issue.
  • A key person leaving the company: When processes have been built and maintained by one person, the knowledge of how things work, or why they were built a particular way, often exists only in their head. If they leave the company, the business can be in a tricky spot (and the data process becomes even more unreliable, because everyone else is too scared to touch it).

Often this situation reaches a tipping point where the time required to manage the process, the unreliability of the data, or the risk to the business becomes too much. But how do you know when you've reached this point?

How to spot when it's time to upgrade your data pipelines

There are several warning signs or triggers that might make you realize you’ve outgrown your original solution and need to explore more robust, reliable alternatives.

  • Feature gaps: You need new features that are hard – or impossible – to build. Or the endpoints the solution interacts with may have evolved: what used to be in Excel is now in a database, for example. Or new data quality constraints may have come into play.
  • Lack of transparency: You need more information about how the solution is operating, and better visibility over whether things are running reliably and on time.
  • Age: Maybe your ETL has accumulated too much technical debt, or the systems it runs on are becoming obsolete.
  • Maintenance costs: The costs – direct or indirect – of keeping things running are increasing. Maybe the dev team has other projects they should or could be working on.
  • Scaling issues: Needing to meet increased demand is often a driver for seeking a more robust solution.

How a data integration platform can help

Using a data integration platform to replace your scripted or Excel-based data processing can eliminate a lot of the risks you might be facing, and bring several benefits.

  • Fully automate your jobs – run full end-to-end processes on a schedule, on a trigger (e.g. files arriving in a certain location – see the watcher sketch after this list), or at the click of a button, giving you flexibility and saving time.
  • Better visibility – a platform that enables visual design makes it easier for everyone to understand what’s happening with the data process.
  • Instil confidence – monitoring, automatic error alerts, detailed logs, and full visibility into exactly where problems occurred make your data more reliable and troubleshooting much faster.
  • Free up significant time – spend less time on manual, repetitive tasks and painful troubleshooting, and focus more on higher value activities.
  • Enable less technical staff to work with data jobs – a data integration platform can make it easier for non-technical users to operate and use data jobs, without having to deal with the complexity of the IT side.
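
To make the trigger idea concrete, here’s a minimal sketch of the kind of file-arrival watcher you’d otherwise hand-roll in a script (the directory and the process_file() handoff are hypothetical stand-ins for your own pipeline entry point):

```python
# Minimal file-arrival trigger, hand-rolled: poll a drop directory and
# hand each new file to a processing step. DROP_DIR and process_file()
# are hypothetical stand-ins.
import time
from pathlib import Path

DROP_DIR = Path("/data/incoming")

def process_file(path: Path) -> None:
    print(f"Processing {path.name}...")  # replace with the real pipeline

seen: set[Path] = set()
while True:
    for path in sorted(DROP_DIR.glob("*.csv")):
        if path not in seen:
            seen.add(path)
            process_file(path)
    time.sleep(30)  # poll every 30 seconds
```

Note what the sketch doesn’t have: alerting, retries, logging, or any record of what ran – exactly the gaps a platform fills.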

Case study of an automated data pipeline in a data integration platform

This example is a real-life solution from a FinTech customer using CloverDX to ingest data. They receive data from multiple clients on a regular basis, and each client’s data comes in a different format that needs to be transformed and loaded into the warehouse.

The manual option

Of course, it's possible to do all of this manually - grab files from an FTP program, open them in Excel, visually inspect them, clean up the data, maybe remove, split or merge some columns, and use a data import wizard to get them where you need them.

But this quickly becomes impractical. Huge data files, too many files, needing to do it too frequently – doing it manually isn’t really an option at any kind of scale.

You could use scripts to handle these tasks – and maybe more scripts to stitch each step together. But that’s still quite ad-hoc and error-prone, as well as not very transparent. And although it saves some time, it still ties up resources.

Or you could use a data integration platform and take people out of the loop completely. You can automate the entire process, monitor it all and alert people to errors that need intervention. You save significant time, remove sources of error, and get complete transparency.

Let’s look at how this process works in CloverDX.

We want to automate the whole end-to-end process of these steps:

  • Detecting arrival of client files to be processed
  • Detecting format and layout of client files
  • Reading files
  • Transforming/mapping
  • Assessing quality
  • Loading to target
  • Detecting and logging errors at every step

And we want a reusable process. The ideal is a single generic job that can be used in many situations, rather than creating a new version of the job every time you want to ingest data from a new client.
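
In script terms, the “one generic job” idea means driving the same pipeline from per-client configuration rather than per-client code. A rough sketch, with hypothetical config fields and a toy quality rule:

```python
# Generic ingest skeleton: one job driven by per-client configuration
# rather than a new script per client. All names here are hypothetical.
import csv

CLIENT_CONFIGS = {
    "acme":   {"delimiter": ";", "target_table": "acme_raw"},
    "globex": {"delimiter": ",", "target_table": "globex_raw"},
}

def ingest(client: str, path: str) -> None:
    cfg = CLIENT_CONFIGS[client]  # detect layout from config, not code
    good, bad = [], []
    with open(path, newline="") as f:
        for record in csv.reader(f, delimiter=cfg["delimiter"]):
            # Toy quality rule: reject records with an empty first field
            (good if record and record[0] else bad).append(record)
    # A real job would load `good` into cfg["target_table"] and route
    # `bad` to a rejects store; here we just log the outcome.
    print(f"{client}: {len(good)} accepted, {len(bad)} rejected")

ingest("acme", "acme_2023-10-06.csv")  # hypothetical file name
```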

This is what that pipeline looks like in CloverDX. It handles every step and captures errors that occur at any stage. You can run multiple files through the pipeline, and collect info on any that are rejected (and even handle those rejected files automatically).

[Image: Orchestrating the ingest process]

The Ingest File process in more detail

This is what the Ingest File component, which handles the record-level processing, looks like when we drill down into it. If you were doing this manually, this is the activity you’d most likely be doing in Excel.

[Image: Ingest process details]

This is what this part of the process is doing:

  1. Read the file
  2. Make sure the record count is roughly what we expect. If not, we can fail the file and/or alert someone to the error.
  3. Encrypt or mask certain fields
  4. Apply a set of data quality rules (filtering out records that are suspect)
  5. Collect the invalid records so they can be addressed
  6. Create a newly shaped file that is ready to load into the warehouse
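
For comparison, here’s roughly what those six steps might look like as a homegrown script – a sketch only, assuming hypothetical field names (ssn, amount), a hash-based masking rule and an expected record count:

```python
# Script-form sketch of the six steps above; the field names, masking
# rule and expected record count are all hypothetical.
import csv
import hashlib

EXPECTED_COUNT = 10_000  # rough expectation for this feed

with open("client_feed.csv", newline="") as f:
    rows = list(csv.DictReader(f))                          # 1. read the file

if not 0.9 * EXPECTED_COUNT <= len(rows) <= 1.1 * EXPECTED_COUNT:
    raise SystemExit("Record count out of range - failing the file")  # 2.

valid, invalid = [], []
for row in rows:
    row["ssn"] = hashlib.sha256(row["ssn"].encode()).hexdigest()  # 3. mask field
    if row["amount"].replace(".", "", 1).isdigit():         # 4. quality rule
        valid.append(row)
    else:
        invalid.append(row)                                 # 5. collect rejects

with open("client_feed_clean.csv", "w", newline="") as f:   # 6. reshaped output
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(valid)
```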

You could do all this in Excel, or a script, but the visual nature of this job allows us to see every step – much easier than trying to understand the process by looking at an Excel workbook.

[Image: Data validation in CloverDX]

Automation of data processes in CloverDX

This all becomes even more valuable when you run jobs automatically and unattended – whether scheduled, triggered by a specific event, or launched on demand (e.g. via an automatically generated API endpoint or a simple user interface).

And you get full visibility over all activity – alerts for failure, logs of every execution, and graphical inspection of a run to see exactly where the job failed and why.

[Image: Monitoring data jobs in the dashboard]

 

If you want to discover more about how you can automate your data pipelines and move on from scripts and Excel, request a demo of CloverDX and see how you can eliminate business risk and save time and money.
