Data validation in CloverDX

Data Quality Data Ingest
Posted April 14, 2022
5 min read

Data validation in your data ingestion process is vital to keep your data pipelines flowing smoothly and to ensure that the data you’re working with is correct and useful.

We outlined more of the objectives of data validation in a separate post. Here we’ll show you how to build these validation steps into your pipelines with CloverDX to maintain the quality of the data you're ingesting.

Data profiling and data validation to ensure data quality

Both data profiling and data validation play an important role in managing your data quality.

  • Data profiling is a statistical analysis of the data. It gives you a holistic assessment of your entire data set, so it not only stops you from processing suspect data but can also help you detect trends in your data, or data quality decay over time.
  • Data validation is an assessment of your data at the record level. It involves defining business-specific rules (e.g. all records must have a date, or be formatted in a specific way) so that you can identify records that don’t meet the criteria, and get actionable error messages and fix issues. You can also define what happens to the records, for example continuing to process the good records and rerouting the bad ones.
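To make the record-level idea concrete, here is a minimal Python sketch of a validation rule that routes good and bad records separately. It is an illustration only, not CloverDX code: the field names, the `YYYY-MM-DD` date format and the functions `validate_record` and `route_records` are all assumptions made for this example.

```python
from datetime import datetime

def validate_record(record):
    """Return a list of error messages for one record (empty list = valid)."""
    errors = []
    date_value = record.get("date")
    if not date_value:
        errors.append("missing date")
    else:
        try:
            # Hypothetical business rule: dates must be formatted YYYY-MM-DD.
            datetime.strptime(date_value, "%Y-%m-%d")
        except ValueError:
            errors.append(f"date '{date_value}' is not in YYYY-MM-DD format")
    return errors

def route_records(records):
    """Split records into (good, bad); bad records keep their error messages."""
    good, bad = [], []
    for record in records:
        errors = validate_record(record)
        if errors:
            bad.append({"record": record, "errors": errors})
        else:
            good.append(record)
    return good, bad

# Made-up sample data for illustration.
records = [
    {"student_id": "S1", "date": "2022-04-14"},
    {"student_id": "S2", "date": ""},
    {"student_id": "S3", "date": "14/04/2022"},
]
good, bad = route_records(records)
for item in bad:
    print(item["record"]["student_id"], item["errors"])
```

The good records carry on through the pipeline, while each bad record keeps a message that tells the user what to fix.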

The data ingestion process with CloverDX

In this example we’ll walk through both the profiling and validation steps in a CloverDX data ingestion pipeline.

This case study involves a platform where multiple schools upload data, either to an FTP site or via email, and get an analyzed view of that data returned.

The ingest process involves several steps, in a single pipeline:

  • Monitoring both sources for incoming files
  • Copying incoming files from the FTP
  • Unzipping the files
  • Checking against a manifest
  • Profiling the data
  • Transforming it
  • Loading to target
  • Logging the results

So our basic process looks like this (with the validation steps highlighted):

[Image: data ingestion process]

And we build that out into an executable data pipeline in CloverDX that looks like this:

[Image: data ingestion process in CloverDX]

The section highlighted in green above is where we’re performing all those steps, and we can see each individual part of that section by opening it up:

[Images: validation steps 1-3; validation steps 3-6]

There are several places where we’re doing some kind of check on our data:

  • Checking whether the incoming file is zipped, because we want to handle zipped files differently. We can unzip them if needed without logging any errors, adjusting on the fly without manual intervention
  • Checking against a manifest to make sure what we’re receiving is what we’re expecting
  • Filtering to make sure we’re only checking the files we want
  • And the profiling step, which is where the bulk of the additional validation takes place.
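The first two checks can be sketched in a few lines of Python. This is a rough illustration under our own assumptions rather than how the CloverDX pipeline implements them: `read_incoming` and `check_manifest` are hypothetical helpers, and the manifest is assumed to be a plain list of expected file names.

```python
import io
import zipfile

def read_incoming(payload, filename):
    """Return {name: bytes}, unzipping transparently if the payload is a zip archive."""
    buffer = io.BytesIO(payload)
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            # Skip directory entries, which end with a slash.
            return {
                name: archive.read(name)
                for name in archive.namelist()
                if not name.endswith("/")
            }
    # Not zipped: no error, just pass the file through as-is.
    return {filename: payload}

def check_manifest(received, expected):
    """Compare received file names against the names the manifest expects."""
    received, expected = set(received), set(expected)
    return {
        "missing": sorted(expected - received),
        "unexpected": sorted(received - expected),
    }

# Build a small zip archive in memory to stand in for an uploaded file.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("students.csv", "id,name\n1,Ada\n")

files = read_incoming(demo.getvalue(), "upload.zip")
report = check_manifest(files, ["students.csv", "teachers.csv"])
print(report)
```

Zipped and unzipped uploads come out in the same shape, so the later steps do not need to know which kind arrived.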

And we can drill down further into that profiling step:

[Image: profiling step]

This is our validation step. We’re taking a flat file and running two data pipelines against it: the profiler, to give us that holistic, overarching view; and the business rules validation, where we validate our data against specific rules. We don’t necessarily stop the pipeline if we encounter any suspect data, but we do log it to Excel files.

Watch what this process looks like in CloverDX

Profiling

When we run this process in CloverDX we can see the number of files flowing through the process, and how that number changes at each step. We can also see all the individual records that are coming across, and can see that we’re profiling based on a number of different criteria: in this example, where we’re looking at school data, we have records based on classes, enrolments, students, teachers and so on.

We can also look at the validation results for each record. For instance, in this example we’re checking for null records. We’ve defined a threshold of 10% - if we’re getting fewer than 10% null records we continue processing, but as soon as it hits that threshold we log an error and alert the user, pulling those results into a spreadsheet.
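As a rough sketch of that threshold check, the Python below computes the share of null values in one field and flags an error once it reaches 10%. The record layout and the names `null_ratio` and `profile_field` are assumptions made for this example, not part of CloverDX.

```python
NULL_THRESHOLD = 0.10  # the 10% threshold from the example above

def null_ratio(records, field):
    """Share of records whose value for `field` is missing or empty."""
    if not records:
        return 0.0
    nulls = sum(1 for record in records if record.get(field) in (None, ""))
    return nulls / len(records)

def profile_field(records, field, threshold=NULL_THRESHOLD):
    """Keep processing below the threshold; report an error once it is reached."""
    ratio = null_ratio(records, field)
    status = "error" if ratio >= threshold else "ok"
    return {"field": field, "null_ratio": ratio, "status": status}

# Made-up data: 20 records, of which 1 and then 2 have no teacher_id.
one_null = [{"teacher_id": "T1"}] * 19 + [{"teacher_id": None}]
two_nulls = [{"teacher_id": "T1"}] * 18 + [{"teacher_id": ""}, {"teacher_id": None}]

print(profile_field(one_null, "teacher_id"))
print(profile_field(two_nulls, "teacher_id"))
```

With one null in twenty records (5%) processing continues. With two (10%) the threshold is hit and the result is marked as an error.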

[Image: results in spreadsheet]

And we can see that even though we haven’t yet reached that critical threshold, the data is trending in the wrong direction – we’re getting more and more null counts. With that information, your end user or support team can address those errors before they even reach critical status – enabling you to fix your data before it even fails.
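One simple way to turn that observation into an automatic early warning is to compare the null ratios from the last few runs. The sketch below is our own illustration: the three-run window and the function names are assumptions, and a real pipeline might use a more robust trend measure.

```python
def is_trending_up(ratios, window=3):
    """True if the last `window` values are strictly increasing."""
    recent = ratios[-window:]
    if len(recent) < window:
        return False
    return all(earlier < later for earlier, later in zip(recent, recent[1:]))

def early_warning(ratios, threshold=0.10, window=3):
    """Warn while still below the threshold if null ratios keep rising."""
    return bool(ratios) and ratios[-1] < threshold and is_trending_up(ratios, window)

# Made-up null ratios from consecutive pipeline runs.
print(early_warning([0.02, 0.04, 0.07]))  # rising but still below 10%
print(early_warning([0.05, 0.05, 0.05]))  # flat
```

The first series triggers a warning before the critical threshold is reached. The second does not.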

Business rules validation

Each school in our example has different business rules, for instance different ways of formatting teacher or class IDs, each very specific to a particular system or data source. The challenge in ingesting data from multiple sources is how to scale that up without needing to build a new data pipeline for each source.

In this video you can see how, instead of having those validation rules for each source explicitly listed in the pipeline, we’re externalizing the rules.

We look at a file for each school and, depending on which school it is, change the variables. This lets you change the data validation rules on the fly for each source system.
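The idea of externalized rules can be sketched in Python as a rules document that is loaded at run time, so the validation code itself never changes. Everything specific here is hypothetical: the JSON layout, the school names and the ID patterns are invented for illustration and are not the format used in the demo.

```python
import json
import re

# Stand-in for per-school rules kept outside the pipeline (e.g. one file per school).
RULES_JSON = """
{
  "school_a": {"teacher_id": "^T[0-9]{4}$", "class_id": "^[A-Z]{3}-[0-9]{2}$"},
  "school_b": {"teacher_id": "^[0-9]{6}$", "class_id": "^C[0-9]{3}$"}
}
"""

def load_rules(text):
    """Parse the rules document and compile each pattern once."""
    raw = json.loads(text)
    return {
        school: {field: re.compile(pattern) for field, pattern in fields.items()}
        for school, fields in raw.items()
    }

def validate(record, school, rules):
    """Check one record against the rules of the school it came from."""
    errors = []
    for field, pattern in rules[school].items():
        value = str(record.get(field) or "")
        if not pattern.fullmatch(value):
            errors.append(f"{field}={value!r} does not match {pattern.pattern}")
    return errors

rules = load_rules(RULES_JSON)
record = {"teacher_id": "T1234", "class_id": "MAT-01"}
print(validate(record, "school_a", rules))
print(validate(record, "school_b", rules))
```

The same record passes for one school and fails for the other. Onboarding a new school then means adding a rules entry, not building a new pipeline.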

 

Error handling

As mentioned above, in CloverDX you can define how you want to handle errors. Some errors should stop the pipeline, while softer errors need not. It’s still useful to inform users about those soft errors in a way that’s meaningful and allows them to take action.

In our example we’re creating two Excel sheets, one showing errors by school, and one showing errors by date. You can imagine how for instance you might want to send the by-school report to those schools so they can fix any common errors. The errors-by-date report can be useful internally, for instance to check how your pipelines are running over time.
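The two reports boil down to grouping the same error log by different keys. Here is a minimal Python sketch, using CSV text as a stand-in for the Excel sheets. The error log layout and the helper names `count_by` and `to_csv` are assumptions made for this example.

```python
import csv
import io
from collections import Counter

# Made-up soft errors collected during a run.
errors = [
    {"school": "school_a", "date": "2022-04-13", "message": "missing date"},
    {"school": "school_a", "date": "2022-04-14", "message": "bad teacher_id"},
    {"school": "school_b", "date": "2022-04-14", "message": "missing date"},
]

def count_by(errors, key):
    """Count errors grouped by the given key, e.g. 'school' or 'date'."""
    return Counter(error[key] for error in errors)

def to_csv(counts, header):
    """Render grouped counts as CSV text, sorted by group name."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([header, "error_count"])
    for group, count in sorted(counts.items()):
        writer.writerow([group, count])
    return out.getvalue()

by_school = count_by(errors, "school")
by_date = count_by(errors, "date")
print(to_csv(by_school, "school"))
print(to_csv(by_date, "date"))
```

Keeping the raw error log and deriving each report from it means a new view, say errors by message type, is one more call to `count_by`.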

Data validation with CloverDX

Building data ingestion pipelines with CloverDX gives you complete flexibility in how you manage your data and what data validation rules you want to apply. Building data validation into your automated processes enables you to:

  • Adapt on the fly to different data sources, all within a single data pipeline
  • Spot errors early, with reporting that allows users to pinpoint and fix problems
  • Scale up the number of customers or data sources you can handle, without needing additional technical resource
  • Automate the entire end-to-end data ingestion process, right through from detecting incoming files, to data quality checks, processing, loading data to a target and logging every action

The whole demo of data validation in data ingestion processes is also available to watch as a video. And if you want to talk about how you can build and automate your data ingestion pipelines in CloverDX, just get in touch with us.
