CloverDX's blog for developers and data experts

Data Quality Examples on GitHub

Written by Branislav Repček | Nov 13, 2025 10:53:50 AM

As our first set of examples, we published a project called DataQualityExamples, which demonstrates how to address data quality issues with CloverDX. The project provides several introductory examples to help you get familiar with the CloverDX tools used for data quality tasks.

You can find the project in our GitHub repository. It is split into four folders, each focusing on a slightly different topic:

  • Basic validations – introduction to the Validator component
  • Data format exception handling – built-in format exception handling
  • Data profiling – role of data profiling in data quality
  • Custom data quality checks – data quality is not only about off-the-shelf components

Basic validations (1-basic-validations folder)

The examples in this folder will help you get familiar with the core data quality component in CloverDX – Validator. Four different examples are provided to show you how to configure the Validator component in different situations.

With the Validator component, you can define validation rules to measure the quality of the data sent to the component. Validator comes with a set of predefined data quality rules that cover the most common validation tasks. You can, of course, add your own rules as needed. The component also offers an easy-to-use visual interface to manage the rules and their settings.
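To make the idea of rule-based validation concrete, here is a minimal sketch in plain Python. It is not CloverDX code (Validator is configured visually in the Designer); the `Rule` class, the helper functions, and the sample fields are all hypothetical and only illustrate how a set of named rules can split records into valid and invalid ones, with the failed rule names attached.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


@dataclass
class Rule:
    name: str                        # label reported when the rule fails
    check: Callable[[Record], bool]  # returns True when the record passes


def is_not_empty(field: str) -> Callable[[Record], bool]:
    """Build a check that passes when the field has a non-blank value."""
    return lambda rec: bool(rec.get(field, "").strip())


def is_integer_in_range(field: str, low: int, high: int) -> Callable[[Record], bool]:
    """Build a check that passes when the field parses as an int in [low, high]."""
    def check(rec: Record) -> bool:
        try:
            return low <= int(rec.get(field, "")) <= high
        except (ValueError, TypeError):
            return False
    return check


def validate(
    records: List[Record], rules: List[Rule]
) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split records into valid ones and (record, failed rule names) pairs."""
    valid, invalid = [], []
    for rec in records:
        failed = [rule.name for rule in rules if not rule.check(rec)]
        if failed:
            invalid.append((rec, failed))
        else:
            valid.append(rec)
    return valid, invalid


# Hypothetical rules and sample data, for illustration only.
RULES = [
    Rule("name is required", is_not_empty("name")),
    Rule("age must be 0-130", is_integer_in_range("age", 0, 130)),
    Rule("email must contain @", lambda rec: "@" in rec.get("email", "")),
]

SAMPLE = [
    {"name": "Alice", "age": "34", "email": "alice@example.com"},
    {"name": "", "age": "abc", "email": "bob@example.com"},
]

valid_records, invalid_records = validate(SAMPLE, RULES)
```

The second sample record fails two rules at once, which is why the sketch collects every failed rule name per record rather than stopping at the first failure.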

Data format exception handling (2-data-format-validation folder)

You will often face situations where your data is in such bad shape that you cannot process it any further. In these situations, CloverDX helps you handle data format exceptions, or at least log them and react accordingly. This exception handling, called Data policy, is a built-in feature of most Reader components.

A typical use case is a file that, besides standard records, occasionally contains records with an alternative structure. In such a case, you can chain readers and process the different data structures separately, while also handling records that do not match any of your expected formats.
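The chaining idea can be sketched outside of CloverDX as well. The following plain-Python example is only an illustration under assumed formats: `parse_standard`, `parse_alternative`, and the semicolon-delimited layouts are hypothetical. Each line is offered to the parsers in order; lines that no parser accepts are collected with their line number instead of stopping the whole run.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Record = Dict[str, object]
Parser = Callable[[str], Optional[Record]]


def parse_standard(line: str) -> Optional[Record]:
    """Assumed primary format: id;name;amount"""
    parts = line.split(";")
    if len(parts) != 3:
        return None
    try:
        return {"id": int(parts[0]), "name": parts[1], "amount": float(parts[2])}
    except ValueError:
        return None


def parse_alternative(line: str) -> Optional[Record]:
    """Assumed alternative format: id;name (no amount)"""
    parts = line.split(";")
    if len(parts) != 2:
        return None
    try:
        return {"id": int(parts[0]), "name": parts[1]}
    except ValueError:
        return None


def read_chained(
    lines: Sequence[str], parsers: Sequence[Parser]
) -> Tuple[List[List[Record]], List[Tuple[int, str]]]:
    """Try each parser in order; keep unparseable lines for logging."""
    outputs: List[List[Record]] = [[] for _ in parsers]
    rejected: List[Tuple[int, str]] = []
    for line_no, line in enumerate(lines, start=1):
        for index, parser in enumerate(parsers):
            record = parser(line)
            if record is not None:
                outputs[index].append(record)
                break
        else:
            # No parser accepted the line: record it and carry on.
            rejected.append((line_no, line))
    return outputs, rejected


LINES = ["1;Alice;10.5", "2;Bob", "oops", "x;Carol;3.0"]
(standard, alternative), rejected = read_chained(
    LINES, [parse_standard, parse_alternative]
)
```

Here the first line lands in `standard`, the second in `alternative`, and the last two end up in `rejected`, where they can be logged or routed to an error file.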

Data profiling (3-profiling-and-validation folder)

The third set of examples deals with more real-world data processes. Record validation is certainly the first thing that comes to mind for most data engineers, but before you start processing any data, you typically want to make sure that the entire dataset meets some basic level of quality. Data profiling is therefore what you do before you proceed to Validator.

To help you evaluate a data set rather than individual records, CloverDX provides the ProfilerProbe component. This component can help you measure your data sets: minimum and maximum values, value lengths, common patterns, histograms, and much more. This information can then be used to verify whether the data set matches your expectations before you do any more work on the data.
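As a rough illustration of what profiling a single field means, here is a small plain-Python sketch. It does not use ProfilerProbe or reproduce its metrics exactly; the `profile`, `value_pattern`, and `meets_expectations` functions, the thresholds, and the sample ZIP codes are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List


def value_pattern(value: str) -> str:
    """Collapse a value into a shape: digits become 9, letters become A."""
    shape = []
    for ch in value:
        if ch.isdigit():
            shape.append("9")
        elif ch.isalpha():
            shape.append("A")
        else:
            shape.append(ch)
    return "".join(shape)


def profile(values: List[str]) -> Dict[str, object]:
    """Compute simple whole-field statistics for a list of string values."""
    non_empty = [v for v in values if v != ""]
    lengths = [len(v) for v in non_empty]
    return {
        "count": len(values),
        "empty": len(values) - len(non_empty),
        # min/max here compare strings lexicographically
        "min": min(non_empty) if non_empty else None,
        "max": max(non_empty) if non_empty else None,
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "patterns": Counter(value_pattern(v) for v in non_empty),
    }


def meets_expectations(
    stats: Dict[str, object],
    expected_pattern: str,
    max_empty_ratio: float = 0.25,
    min_pattern_ratio: float = 0.9,
) -> bool:
    """Decide whether the field looks good enough to continue processing."""
    count = stats["count"]
    non_empty = count - stats["empty"]
    if non_empty == 0:
        return False
    empty_ratio = stats["empty"] / count
    pattern_ratio = stats["patterns"][expected_pattern] / non_empty
    return empty_ratio <= max_empty_ratio and pattern_ratio >= min_pattern_ratio


# Hypothetical ZIP code column with one empty and one malformed value.
ZIP_CODES = ["12345", "90210", "", "1234A", "54321"]
stats = profile(ZIP_CODES)
ok_to_continue = meets_expectations(stats, expected_pattern="99999")
```

With this sample, only three of the four non-empty values match the `99999` pattern, so the check fails and the data set would be held back before any record-level validation starts.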

Custom data quality checks (4-custom-dq-check folder)

The last set of examples shows that data quality does not need to be implemented using only the Data Quality components in CloverDX. Organizations very commonly have specific data quality rules that are too complex for components like Validator or ProfilerProbe. In such cases, you can use other CloverDX components such as Map, Denormalizer, or RESTConnector.

The examples also show one other very important concept: the subgraph. Subgraphs allow you to wrap your logic in a reusable component that can be easily called from many different places. This makes your graphs much easier to maintain, since it removes the need for copy-paste, and it makes changes to shared logic easier because you only need to make them once.
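The same two ideas, a custom cross-record check and wrapping it for reuse, can be sketched in plain Python. This is an analogy rather than CloverDX code: a function plays the role of the subgraph, and the order data, field names, and `find_mismatched_orders` helper are hypothetical. The check groups line items by order and compares each sum with the expected order total, which is the kind of rule that does not fit a per-record validation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_mismatched_orders(
    lines: Iterable[Dict[str, object]],
    expected_totals: Dict[str, float],
    tolerance: float = 0.01,
) -> List[str]:
    """Return IDs of orders whose line items do not add up to the expected total."""
    sums: Dict[str, float] = defaultdict(float)
    for line in lines:
        sums[line["order_id"]] += line["amount"]
    return sorted(
        order_id
        for order_id, expected in expected_totals.items()
        if abs(sums.get(order_id, 0.0) - expected) > tolerance
    )


# The same check is reused by two hypothetical pipelines, much like one
# subgraph can be called from several graphs.
nightly_lines = [
    {"order_id": "A", "amount": 10.0},
    {"order_id": "A", "amount": 5.0},
    {"order_id": "B", "amount": 7.0},
]
nightly_bad = find_mismatched_orders(nightly_lines, {"A": 15.0, "B": 9.0, "C": 1.0})

api_lines = [{"order_id": "X", "amount": 2.5}, {"order_id": "X", "amount": 2.5}]
api_bad = find_mismatched_orders(api_lines, {"X": 5.0})
```

If the tolerance or the grouping rule ever changes, only the one function needs to be updated, and both callers pick up the change.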