We recently released the second public beta of a new member of the CloverDX family – CloverDX Data Profiler. In this article we’d like to share an overview of the technical architecture of CloverDX Data Profiler – the why and how of the design.
The CloverDX Data Profiler is a data profiling application, i.e. it gives users various information about their data, such as averages, the number of empty values, histogram-like charts etc. When designing the application we had several goals in mind:
Let’s have a look at the main building blocks of CloverDX Data Profiler:
All parts of CloverDX Data Profiler are currently bundled in one simple-to-use package as a standalone application – just start the CloverDX Data Profiler and it automatically launches an embedded Result Storage, Reporting Server etc. However, we’ve laid the foundations for separating the building blocks for bigger deployment scenarios and better integration with the rest of the CloverDX family.
Now let’s have a look at the nitty-gritty details of actually profiling data. The basic premise is that we already have all the tools needed for profiling in CloverDX – a transformation graph has all the expressive power needed to profile data:
So anyone could manually create a CloverDX graph that profiles data. But that is quite a complex task, and it would take the user’s focus away from the core questions: which data source do they want to profile, and which metrics do they want to use? In the CloverDX Data Profiler, the user describes this core information in a profiling job, and the CloverDX Profiler Engine then transforms it into a CloverDX transformation graph. The profiling job is defined in a relatively simple XML file that can be edited in a graphical editor. The job primarily contains the following information:
The above picture demonstrates the process of running a profiling job:
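To make the idea of a profiling job more concrete, here is a minimal sketch in Python. It is not CloverDX code and does not use any CloverDX API: the `JOB` dictionary, the metric names and the `run_job` function are all hypothetical stand-ins, assuming a job boils down to "which metrics to compute for which field". The real Profiler Engine works from an XML job file and generates a transformation graph rather than looping over records like this.

```python
from collections import Counter

# Hypothetical job description: field name -> metrics to compute.
# The real profiling job is an XML file; a dict keeps the sketch short.
JOB = {
    "age": ["average", "empty_count"],
    "country": ["empty_count", "histogram"],
}


def is_empty(value):
    """Treat missing values and empty strings as empty."""
    return value is None or value == ""


def average(values):
    """Arithmetic mean of the non-empty values, or None if there are none."""
    present = [float(v) for v in values if not is_empty(v)]
    return sum(present) / len(present) if present else None


def empty_count(values):
    """Number of empty values in the column."""
    return sum(1 for v in values if is_empty(v))


def histogram(values):
    """Occurrence count per distinct non-empty value."""
    return dict(Counter(v for v in values if not is_empty(v)))


# Registry of the metrics this sketch knows about (names are assumptions).
METRICS = {
    "average": average,
    "empty_count": empty_count,
    "histogram": histogram,
}


def run_job(job, records):
    """Compute the requested metrics for each field named in the job."""
    results = {}
    for field, metric_names in job.items():
        column = [record.get(field) for record in records]
        results[field] = {name: METRICS[name](column) for name in metric_names}
    return results


# Made-up sample input, standing in for a real data source.
records = [
    {"age": "30", "country": "CZ"},
    {"age": "", "country": "US"},
    {"age": "40", "country": "CZ"},
    {"age": "20", "country": ""},
]

report = run_job(JOB, records)
print(report)
```

The point of the sketch is the separation of concerns: the job only says what to measure, and a generic engine decides how to compute it.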
This article is a brief introduction to the architecture of CloverDX Data Profiler. As you can see, we’ve saved a lot of effort by using the power of CloverDX at the core. The CloverDX Data Profiler can also be seen as a successful example of embedding CloverDX.