CloverDX Data Quality: An Introduction to Validator

Data Quality
Posted November 25, 2013
5 min read

Before your data enters the ETL process, it's in your best interest to work only with "good" data: data that conforms to its domain rules, ranges, allowed values, and perhaps other unusual restrictions. If it doesn't, you'll want to log and remove all erroneous records so that they don't pollute your transformation, and so that you have the means to report and fix the data later on.

Having data checked and verified in the transformation goes hand in hand with CloverDX's mission of rapid data integration. With the right tools out of the box, along with easy setup and immediately actionable results, you reach the expected outcomes with less time and less trial and error.

Let's take a look at how this all works in CloverDX.

What is Validator?

Validator, a part of the CloverDX Data Quality package, is a comprehensive filtering tool that lets you visually define data quality rules. What does this mean? Simply put, Validator is a component where you specify a set of checks that filter incoming data. Anything that the filter doesn't let through is reported along with detailed information about the reasons why. You can use this output as a basis for correcting problems; what's great about this is even the non-technical team members in your organization can work with and understand the process. Imagine putting the output into a spreadsheet and sending it back to the accounting department to fix the problems – no "translation" needed!

Built-in data validation rules can check various data quality characteristics, like date format, numeric value, interval match, phone number validity and format, and more. If you have special requirements, you can implement custom validation rules in CTL (CloverDX Transformation Language).

How Validator Fits into Data Quality

Validator nicely complements the Data Profiler in the Data Quality package. Generally, you start with Data Profiler to assess the overall condition of your data using statistics. Seeing variations of formats, missing values, and excessive ranges indicates which fields will need special care when setting up Validator. This will make sure no bad records get through. Validator acts as the second stage of checks for specific problems in your data and reports each one of them to you. Used in conjunction, these tools allow for efficient, comprehensive data quality.

Validation Rules

The single most important concept in Validator is the validation rule. You can think of a rule as a Boolean check that inspects field values and returns either success or failure. You can build complex combinations by putting multiple rules into groups, which are evaluated either as "all rules in the group must pass" (AND) or as "at least one rule must pass" (OR).
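To make the AND/OR grouping concrete, here is a minimal sketch in plain Python. It is not CloverDX code or CTL; the helper names (not_empty, all_of, any_of) and the sample fields are made up for illustration, and Validator itself is configured visually rather than in code.

```python
from typing import Callable, Dict

Record = Dict[str, str]
Rule = Callable[[Record], bool]


def not_empty(field: str) -> Rule:
    """Hypothetical rule: passes when the field is present and not blank."""
    return lambda rec: bool(rec.get(field, "").strip())


def all_of(*rules: Rule) -> Rule:
    """AND group: every rule in the group must pass."""
    return lambda rec: all(rule(rec) for rule in rules)


def any_of(*rules: Rule) -> Rule:
    """OR group: at least one rule in the group must pass."""
    return lambda rec: any(rule(rec) for rule in rules)


# A record is valid when it has a name AND (an email OR a phone number).
contact_rule = all_of(
    not_empty("name"),
    any_of(not_empty("email"), not_empty("phone")),
)

good = {"name": "Ann", "email": "", "phone": "555-0100"}
bad = {"name": "Bob", "email": "", "phone": ""}
print(contact_rule(good), contact_rule(bad))
```

Because a group is itself just another rule, groups can be nested to any depth, which is roughly what a validation tree of groups and rules expresses.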

Adding New Rules

To add a new active rule, drag one of the available rules onto the active rules list, then click the corresponding Input fields property and select the field to which you want to apply the rule. You can also use drag and drop to set Input fields – just drag fields from Input metadata onto any active rule. Some rules can be further configured on the Properties tab below the list.

Basic Rules

Along with basic checks, you can do pretty cool things with two interesting rules in the Basic rules palette. First, with the Lookup rule, you can match field values against any type of lookup table. (Note: You'll need to have the lookup table previously defined or linked in your graph.) Second, the Expression rule lets you do a little bit of programming and define a simple custom filter expression to validate the data. This rule works like the ExtFilter component with a filter expression.
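The idea behind these two rules can be sketched in plain Python as follows. This is an illustration only, not the CloverDX API: the lookup table is modelled as a simple set, and the field names, table contents, and function names are all assumptions.

```python
from typing import Callable, Dict, Set

Record = Dict[str, str]
Rule = Callable[[Record], bool]

# Hypothetical lookup table: the set of country codes we accept.
COUNTRY_CODES: Set[str] = {"US", "GB", "CZ"}


def lookup_rule(field: str, table: Set[str]) -> Rule:
    """Pass when the field value is found in the lookup table."""
    return lambda rec: rec.get(field) in table


def expression_rule(predicate: Rule) -> Rule:
    """Pass when a custom boolean expression over the record is true."""
    return predicate


rules = [
    lookup_rule("country", COUNTRY_CODES),
    # Custom filter expression: quantity must be a positive whole number.
    expression_rule(
        lambda rec: rec["quantity"].isdecimal() and int(rec["quantity"]) > 0
    ),
]

records = [
    {"country": "CZ", "quantity": "3"},
    {"country": "XX", "quantity": "3"},  # fails the lookup
    {"country": "US", "quantity": "0"},  # fails the expression
]
passed = [rec for rec in records if all(rule(rec) for rule in rules)]
print(passed)
```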

String Rules and Conversions

String rules are a special category of rules that apply to string fields and can be used not only as checks, but also as conversion functions. For example, let's take a look at a case where the raw data you get as a CSV file is a complete mess. If you were to create proper metadata for this file, you would lose a lot of records solely because the parser would skip all those misspelled numbers, ad-hoc date formats, etc.

Instead, define the metadata for reading as "all string fields", and let Validator try to find out which are good, and thus convertible, and which are bad. You can connect an output edge with non-string type fields and use these rules to convert the values for you automatically - e.g. string to date or string to number.
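The "read everything as strings, then try to convert" approach can be sketched in plain Python like this. It is a rough analogy rather than how Validator is implemented; the field names, the date format, and the error messages are assumptions chosen for the example.

```python
from datetime import date, datetime
from typing import Dict, List, Optional, Tuple


def to_date(text: str, fmt: str = "%Y-%m-%d") -> Optional[date]:
    """Return the parsed date, or None when the string does not match fmt."""
    try:
        return datetime.strptime(text.strip(), fmt).date()
    except ValueError:
        return None


def to_int(text: str) -> Optional[int]:
    """Return the parsed integer, or None when the string is not a number."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def convert(row: Dict[str, str]) -> Tuple[Optional[dict], List[str]]:
    """Convert an all-string row to typed values, collecting any problems."""
    errors: List[str] = []
    ordered = to_date(row["ordered"])
    amount = to_int(row["amount"])
    if ordered is None:
        errors.append("ordered is not a date")
    if amount is None:
        errors.append("amount is not a number")
    if errors:
        return None, errors
    return {"ordered": ordered, "amount": amount}, []


rows = [
    {"ordered": "2013-11-25", "amount": "42"},
    {"ordered": "25/11/13", "amount": "fourty-two"},
]
results = [convert(row) for row in rows]
for typed, problems in results:
    print(typed, problems)
```

The first row comes out with typed values, while the second is rejected together with the reasons, instead of silently disappearing at parse time.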

Configuring Rules

Every active rule can be, and typically needs to be, configured. The configuration of a rule usually specifies which input field the rule validates, then sets rule-specific parameters, such as the date format mask for rules that parse validated strings as dates.

By default, there is a "Copy all by name" rule at the beginning of the validation tree. This ensures that Validator will send validated values to output fields with matching names and types. All rules can also write the validated input value to any explicitly specified output field.

As said earlier, Validator can also convert incoming string values to a specific target data type – for example, date or integer. This can be done in rules like "Is Date" and "Is Number". These allow you to specify an output field that's of the desired target type.

There are two special Assignment rules available to you, both of which copy data from selected input fields into desired output fields. The Copy rule copies a single input field to a single output field (1:1), while the Transform rule allows you to specify a CTL transformation that can do an arbitrary field assignment (M:N). With this, Validator can transform your input data based on conditions found in your data. By placing the Assignment rules into different branches of your validation tree, you can control which fields will be selected from the input data and which are not going to be used in Validator's output.
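The difference between the two Assignment rules can be illustrated with a small Python sketch. This is not CTL and not the Validator API; copy_rule, transform_rule, and all field names are hypothetical stand-ins for the 1:1 and M:N cases described above.

```python
from typing import Callable, Dict

Record = Dict[str, str]
Assignment = Callable[[Record, Record], None]


def copy_rule(src: str, dst: str) -> Assignment:
    """1:1 assignment: copy one input field to one output field."""
    def apply(inp: Record, out: Record) -> None:
        out[dst] = inp[src]
    return apply


def transform_rule(fn: Assignment) -> Assignment:
    """M:N assignment: an arbitrary mapping from input to output fields."""
    return fn


def split_name(inp: Record, out: Record) -> None:
    # Derive two output fields from a single input field.
    first, _, last = inp["full_name"].partition(" ")
    out["first_name"] = first
    out["last_name"] = last


assignments = [copy_rule("id", "customer_id"), transform_rule(split_name)]

inp = {"id": "7", "full_name": "Ada Lovelace", "notes": "unused"}
out: Record = {}
for assign in assignments:
    assign(inp, out)
print(out)
```

Only the fields that an assignment touches reach the output, so the notes field is left out, which mirrors how Assignment rules select the fields that appear in the output.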

Error Output Mapping

The second important configuration setting of the Validator component is "Error output mapping". Here, various validation error details can be mapped onto the component's second output port fields.

By connecting a writer component to this port, you can generate your validation error reports. In these reports, you can include the number of the record that did not pass validation, the validation message that describes the problem found in your data, the name of the rule that detected the problem, the names and actual values of the validated fields, and the identification of this Validator component. Error reports serve as an important entry point for fixing problems in your data.
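As a rough picture of what such a report might contain, the following Python sketch splits records into a valid stream and an error stream, then writes the errors as CSV. The column names, rule names, and component identifier are assumptions, and the sketch reports every failed rule per record, which may differ from how a given Validator configuration behaves.

```python
import csv
import io
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
# (rule name, validated field, check, message); all names are illustrative.
NamedRule = Tuple[str, str, Callable[[str], bool], str]

RULES: List[NamedRule] = [
    ("Not empty", "name", lambda v: bool(v.strip()), "value is empty"),
    ("Is number", "amount", lambda v: v.strip().isdecimal(), "value is not a number"),
]
COLUMNS = ["record_no", "rule", "field", "value", "message", "component"]


def validate(records: List[Record], component_id: str = "VALIDATOR_1"):
    """Return (valid records, error rows); one error row per failed rule."""
    valid: List[Record] = []
    errors: List[dict] = []
    for number, rec in enumerate(records, start=1):
        failures = [
            {
                "record_no": number,
                "rule": name,
                "field": field,
                "value": rec[field],
                "message": message,
                "component": component_id,
            }
            for name, field, check, message in RULES
            if not check(rec[field])
        ]
        if failures:
            errors.extend(failures)
        else:
            valid.append(rec)
    return valid, errors


records = [{"name": "Ann", "amount": "10"}, {"name": "", "amount": "ten"}]
valid, errors = validate(records)

# Write the error stream as CSV, e.g. to hand back to the data owner.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
writer.writerows(errors)
report = buffer.getvalue()
print(report)
```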

Enhance your data quality activities 

Ensure high data quality with CloverDX's Validator. By defining a clear set of rules, you avoid inconsistent data and end up with cleaner, more workable data.

We hope you've found this run-through useful. If you need more help, don't hesitate to get in touch. 
