Data Quality with CloverDX

Data quality is not a stand-alone task or a one-off event. It is a process, and to keep your data as good as it can be, that process has to run continuously and everywhere.

The Data Quality Process

  1. Discover: Collect information (metadata) about your data, such as its location, format, purpose and usage.
  2. Profile and measure: Know your data quality by measuring your data against the dimensions of completeness, accuracy, consistency, validity, integrity and timeliness.
  3. Define data quality rules: Describe the technical and business rules that the data must comply with.
  4. Monitor: Keep track of how your data quality changes by evaluating your data against your rules and error thresholds, and create exceptions when errors are found.
  5. Report: Inform stakeholders about data quality issues so that they can be resolved.
  6. Fix: Correct or clean your data based on the reported errors as soon as possible (ideally in the source system).
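
As a rough illustration of the measure and monitor steps, the sketch below computes completeness for one field and compares the resulting error rate with a threshold. It is plain Python rather than CloverDX code, and the field name, sample records and 10% threshold are assumptions made up for the example.

```python
def completeness(records, field):
    """Return the fraction of records where `field` is present and non-empty."""
    if not records:
        return 1.0
    filled = 0
    for record in records:
        value = record.get(field)
        if value is not None and str(value).strip() != "":
            filled += 1
    return filled / len(records)


def check_threshold(records, field, max_error_rate):
    """Compare the share of incomplete records with an error threshold."""
    error_rate = 1.0 - completeness(records, field)
    return {
        "field": field,
        "error_rate": error_rate,
        "threshold": max_error_rate,
        "passed": error_rate <= max_error_rate,
    }


# Hypothetical sample data: one of four records has no email address.
customers = [
    {"id": 1, "email": "ann@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "bob@example.com"},
    {"id": 4, "email": "eve@example.com"},
]

result = check_threshold(customers, "email", max_error_rate=0.10)
```

Here a quarter of the records are incomplete, which is above the assumed 10% threshold, so the check fails and would be a candidate for the reporting step.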

Improving your Data Quality with CloverDX

The CloverDX Data Integration Platform has several key components designed to help you address specific points in the data quality process. When it comes to profiling and measuring your data, monitoring your data quality, reporting on errors and fixing bad data, CloverDX is designed to help you build your process and maintain your data quality.

Save time and money by validating email addresses

Recognizing which email addresses in your data are valid (and which aren't) can help improve your business efficiency, help target marketing more effectively and prevent system misuse. 

CloverDX's Email Filter component verifies specified fields for valid email addresses, and sends out valid ones through an output port. You can also send out specified fields from rejected inputs as well as information about any errors, so you can see details of exactly which email addresses are causing problems.
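
To make the accept/reject idea concrete, here is a minimal Python sketch of the same pattern: valid records go to one output, rejected records go to another together with an error message. It does not show how the Email Filter component works internally. The regular expression is a deliberately simplified assumption, and the field names and sample records are hypothetical.

```python
import re

# Deliberately simple pattern: local part, "@", then a domain with at least one dot.
# Real-world address rules are more permissive and more complex than this.
EMAIL_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")


def filter_emails(records, field="email"):
    """Split records into (valid, rejected); rejected records carry an error."""
    valid, rejected = [], []
    for record in records:
        value = (record.get(field) or "").strip()
        if not value:
            rejected.append({**record, "error": "missing email address"})
        elif not EMAIL_PATTERN.match(value):
            rejected.append({**record, "error": "invalid email syntax"})
        else:
            valid.append(record)
    return valid, rejected


# Hypothetical input records.
contacts = [
    {"id": 1, "email": "ann@example.com"},
    {"id": 2, "email": "bob@example"},
    {"id": 3, "email": ""},
]
valid, rejected = filter_emails(contacts)
```

The rejected list keeps the original fields next to the error text, so it is easy to see which addresses are causing problems and why.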

Documentation: Read more about the CloverDX Email Filter

 

Get a quick summary of your data, even within complex workflows

Measure and perform statistical analyses on the data as it flows through the ProfilerProbe component to easily get information on data from various sources. 

The ProfilerProbe gives you a quick reporting summary of basic data properties such as min/max values, short/long strings and data patterns.
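
The kind of summary described above can be approximated in a few lines of Python. This is only a sketch of the general profiling idea, not the ProfilerProbe's implementation or output format. The pattern notation (9 for digits, A for letters) and the sample postal codes are assumptions chosen for the example.

```python
from collections import Counter


def value_pattern(text):
    """Generalize a string: digits become 9, letters become A, the rest is kept."""
    return "".join(
        "9" if ch.isdigit() else "A" if ch.isalpha() else ch for ch in text
    )


def profile_strings(values):
    """Summarize a list of strings: counts, shortest, longest and top patterns."""
    non_empty = [v for v in values if v]
    patterns = Counter(value_pattern(v) for v in non_empty)
    return {
        "count": len(values),
        "empty": len(values) - len(non_empty),
        "shortest": min(non_empty, key=len) if non_empty else None,
        "longest": max(non_empty, key=len) if non_empty else None,
        "top_patterns": patterns.most_common(3),
    }


# Hypothetical postal codes from mixed sources.
postcodes = ["90210", "10001", "SW1A 1AA", "", "1234"]
summary = profile_strings(postcodes)
```

In this sample the most common pattern is five digits, while the four-digit and alphanumeric values stand out as candidates for a closer look.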

Being able to use the ProfilerProbe component in your CloverDX workflow makes profiling accessible in complex workflows including data integration, data cleansing and other processing tasks. 

With the CloverDX Server, you can share findings with your team, customers or data providers, enabling a clear, cooperative workflow. And because the full history of profiling results is stored, you can use this to analyze trends or detect anomalies.

Documentation: Read more about the ProfilerProbe

Automatically filter data based on your validation rules

Ensure clean, valid data by checking for invalid records as they come in - automatically. Minimizing the need for human intervention saves time and money, and reduces errors. 

Anything the filter doesn't let through is reported, along with detailed information about the reasons why, giving you what you need to fix any data quality issues. Even non-technical team members can work with and understand the process: for instance, you can put the output into a spreadsheet and send it back to your accounts team to fix a problem.

The CloverDX Validator component comes with pre-built rules to validate your data, checking against criteria such as:

  • Empty or missing values
  • Date or phone number format
  • Numeric values
  • Interval match
  • Phone number validity

and more. You can also add and configure your own validation rules, either through a graphical interface or by entering them manually as XML.

Being able to define your own rules means you can create complex validation logic that checks multiple fields at once. And once you've created your rules, you can share and reuse them across all your transformations.
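
The following Python sketch illustrates the general idea of reusable validation rules, including one rule that checks two fields together. It is not the Validator's rule format or API. The rule names, field names and sample orders are all hypothetical.

```python
from datetime import datetime


def not_empty(field):
    """Rule factory: the field must contain a non-blank value."""
    return lambda rec: bool(str(rec.get(field, "")).strip())


def is_date(field, fmt="%Y-%m-%d"):
    """Rule factory: the field must parse as a date in the given format."""
    def check(rec):
        try:
            datetime.strptime(str(rec.get(field, "")), fmt)
            return True
        except ValueError:
            return False
    return check


def in_interval(field, low, high):
    """Rule factory: the field must be a number between low and high."""
    def check(rec):
        try:
            return low <= float(rec.get(field)) <= high
        except (TypeError, ValueError):
            return False
    return check


def ship_after_order(rec):
    """Multi-field rule; ISO dates (YYYY-MM-DD) compare correctly as strings."""
    return str(rec.get("ship_date", "")) >= str(rec.get("order_date", ""))


# A shared rule set that several pipelines could reuse.
RULES = [
    ("customer is present", not_empty("customer")),
    ("order_date is a valid date", is_date("order_date")),
    ("quantity is between 1 and 1000", in_interval("quantity", 1, 1000)),
    ("ship_date is not before order_date", ship_after_order),
]


def validate(records, rules=RULES):
    """Return (accepted, rejected); rejected records list every failed rule."""
    accepted, rejected = [], []
    for rec in records:
        failures = [name for name, rule in rules if not rule(rec)]
        if failures:
            rejected.append({**rec, "reasons": "; ".join(failures)})
        else:
            accepted.append(rec)
    return accepted, rejected


# Hypothetical orders: the second one breaks every rule.
orders = [
    {"customer": "Acme", "order_date": "2024-03-01",
     "ship_date": "2024-03-04", "quantity": "12"},
    {"customer": "", "order_date": "2024-02-30",
     "ship_date": "2024-02-01", "quantity": "0"},
]
accepted, rejected = validate(orders)
```

Keeping the rules in one named list is what makes them easy to share: any pipeline that imports the list applies the same checks and produces the same rejection reasons.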

[Screenshot: Validation dialog]

You can extend the power of the Validator component by validating data against third party sources - for example, validating addresses against an address database.

Documentation: Read more about the Validator component

Custom Data Validation

You can use many of the CloverDX components to implement your own custom data validation. For example:

  • Dedup to detect duplicates
  • Join components to verify relational data integrity
  • Reformat to implement custom validation rules

Any of these components can be used on an input stream to reject all data that doesn’t conform to the specifications, on an output stream to check whether assumptions about the data still hold, or both.
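
As a language-neutral illustration of the first two checks, the Python sketch below detects duplicates on a normalized key and finds records that reference a missing parent. It shows the underlying logic only, not the Dedup or join components themselves. The field names and sample data are hypothetical.

```python
def find_duplicates(records, key_fields):
    """Return records whose normalized key was already seen earlier."""
    seen, duplicates = set(), []
    for rec in records:
        key = tuple(str(rec.get(f, "")).strip().lower() for f in key_fields)
        if key in seen:
            duplicates.append(rec)
        else:
            seen.add(key)
    return duplicates


def find_orphans(children, parents, child_key, parent_key):
    """Return child records that reference a parent key that does not exist."""
    known = {p[parent_key] for p in parents}
    return [c for c in children if c.get(child_key) not in known]


# Hypothetical data: customer 2 repeats customer 1's email with different
# casing and a trailing space; order 11 points at a customer that does not exist.
customers = [
    {"customer_id": 1, "email": "ann@example.com"},
    {"customer_id": 2, "email": "Ann@Example.com "},
    {"customer_id": 3, "email": "bob@example.com"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 7},
]

duplicates = find_duplicates(customers, ["email"])
orphans = find_orphans(orders, customers, "customer_id", "customer_id")
```

The same two functions can run on incoming data to reject bad records, or on outgoing data as a final check that uniqueness and referential integrity still hold.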

The flexible, customizable nature of CloverDX means you can design your projects for whatever particular data quality requirements you have, without being limited to what comes out of the box.

Data Quality Reporting

Reporting is an important part of the data quality process. CloverDX can help you collect measurements and errors and report the results via email, a database, reject files, or anything else you need (such as custom web applications or messaging systems).
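
A reject file can be as simple as a CSV that lists each rejected record next to its error message. The Python sketch below shows one possible layout. The column names and sample rejects are assumptions for illustration, and an in-memory buffer stands in for a real file.

```python
import csv
import io


def write_reject_report(rejected, out):
    """Write rejected records and their error messages as CSV to `out`."""
    # Use every key that appears in any record, in a stable order.
    fieldnames = sorted({key for rec in rejected for key in rec})
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rejected)
    return len(rejected)


# Hypothetical rejects produced by an earlier validation step.
rejected = [
    {"id": 2, "email": "bob@example", "error": "invalid email syntax"},
    {"id": 3, "email": "", "error": "missing email address"},
]

# StringIO stands in for a file opened with open(path, "w", newline="").
buffer = io.StringIO()
written = write_reject_report(rejected, buffer)
report = buffer.getvalue()
```

A file like this opens directly in a spreadsheet, which makes it easy to hand over to the people who own the source data.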

Data Quality Project Examples Using CloverDX

Data cleansing for a logistics company

A huge supply-chain management company needed a solution to improve the quality and accuracy of their address data. We worked with them to create a data validation solution that processes all orders and cleanses most addresses automatically, removing a manual bottleneck in their process and enabling the business to grow.

Project Highlights:

  • Rule engine implementation for complex validation rules
    • Needed to handle addresses for multiple different countries with different languages and address formats
    • Address validation via Google Maps, HERE Maps and Baidu Maps REST APIs
    • Complex algorithm for evaluating address quality
  • Learns common addresses and fixes data automatically
    • The company now has a huge database of valid addresses across the region
  • Had to meet tight, high-performance SLAs for data latency
  • For the minority of addresses that couldn't be automatically repaired, we created a custom web application that allows for manual data correction

Address validation for Canada’s New Democratic Party

Project Highlights:

  • Worked with more than 300 databases and tens of millions of records to build a new centralized database
  • Automatically matched, merged and deduped data
  • Users submitted addresses via a contact form on the web
  • Eliminated the manual effort needed to cleanse data, enabling a more effective campaign

Read more about the CloverDX Data Integration Platform, or book a demo to see how you can use it to improve the data quality in your own processes.

Request a CloverDX demo