I am sure you often find yourself at work in a situation where you have data "in your hands" and would like to transform it into a proper form, clean it up, or load it into a data destination (e.g. a database). But you do not want to reinvent the wheel again and again. That is when you start looking for a suitable technology that saves you from developing what has already been developed, tuned, and stabilized. Most ETL tools satisfy all these requirements, and CloverETL is no exception. In addition, CloverETL Engine offers a few tricky approaches for processing your data in slightly unconventional ways.
cat employees.txt | clover -D:metadata="employees.fmt" -D:filter='$salary>10000' filter.grf > beFired.txt
//prepare channel with the data for ETL processing
InputStream inputData = getInputDataStream();
//prepare channel where the resulting data will be formatted
OutputStream outputData = prepareOutputDataStream();
//create graph instance based on grf file and initialize it
TransformationGraph graph = TransformationGraphXMLReaderWriter.read(File);
EngineInitializer.initGraph(graph);
//initialize graph dictionary
//our input channel will be registered under "inputStream" key
graph.getDictionary().setValue("inputStream", "ReadableChannel", inputData);
//our output channel will be registered under "outputStream" key
graph.getDictionary().setValue("outputStream", "WritableChannel", outputData);
//execute graph – output data will be pushed to output stream during graph run
runGraph.executeGraph(graph);
Now you are probably asking yourself how the graph knows that the input data are waiting in the dictionary under the "inputStream" key, and how it knows where to write the resulting output data. The answer is simple: the fileURL attribute of UniversalDataReader/Writer supports a special syntax for dictionary entries. The reader's fileURL is set to "dict:inputStream", and the writer's fileURL to "dict:outputStream". That is all. The CloverETL engine takes care of transferring data between your streams and the CloverETL graph automatically. Data prepared in the input stream are parsed by the data reader and passed as Clover data records to the next components down the graph for further processing. Conversely, records arriving at the data writer are formatted into the channel you registered in the dictionary under the "outputStream" key.
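To make the "dict:" mechanism more concrete, here is a minimal sketch of the idea in plain Java. A toy dictionary maps keys to streams, and a "reader" and "writer" resolve their fileURL values ("dict:inputStream", "dict:outputStream") to those streams. `DictUrlResolver`, its methods, and the semicolon-delimited `name;salary` input layout are hypothetical illustrations, not CloverETL's implementation. The filter simply mirrors the `$salary>10000` condition from the command-line example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of "dict:" URL resolution; NOT CloverETL's implementation.
public class DictUrlResolver {
    private static final String DICT_PREFIX = "dict:";
    private final Map<String, Object> entries = new HashMap<>();

    // Register a stream under a key, like putting it into the graph dictionary.
    public void setValue(String key, Object value) {
        entries.put(key, value);
    }

    // A reader resolves fileURL="dict:<key>" to the InputStream stored under <key>.
    public InputStream openForRead(String fileUrl) {
        return lookup(fileUrl, InputStream.class);
    }

    // A writer resolves fileURL="dict:<key>" to the OutputStream stored under <key>.
    public OutputStream openForWrite(String fileUrl) {
        return lookup(fileUrl, OutputStream.class);
    }

    private <T> T lookup(String fileUrl, Class<T> type) {
        if (!fileUrl.startsWith(DICT_PREFIX)) {
            throw new IllegalArgumentException("Not a dictionary URL: " + fileUrl);
        }
        String key = fileUrl.substring(DICT_PREFIX.length());
        Object value = entries.get(key);
        if (!type.isInstance(value)) {
            throw new IllegalStateException("No " + type.getSimpleName() + " under key '" + key + "'");
        }
        return type.cast(value);
    }

    // Hypothetical end-to-end run: the caller only fills the dictionary,
    // and the "graph" (this method body) finds its input and output by key.
    public static String filterViaDictionary(String input) {
        DictUrlResolver dict = new DictUrlResolver();
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        dict.setValue("inputStream", new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8)));
        dict.setValue("outputStream", result);

        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                     dict.openForRead("dict:inputStream"), StandardCharsets.UTF_8));
             Writer writer = new OutputStreamWriter(
                     dict.openForWrite("dict:outputStream"), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Assumed (hypothetical) record layout: name;salary
                String[] fields = line.split(";");
                if (Integer.parseInt(fields[1].trim()) > 10000) {
                    writer.write(line + "\n");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(result.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(filterViaDictionary("alice;12000\nbob;9000\n")); // prints alice;12000
    }
}
```

The point of the design is that the processing code never names a file: swapping the streams registered under the two keys changes where data come from and go to, without touching the "graph".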
As already mentioned, the Dictionary can handle various data types. Besides the data streams described above, it can store all basic Clover data types: string, integer, long, number (Java double), decimal (Java BigDecimal), byte, and boolean. The Dictionary can therefore be used for passing input values to a graph as well as for inter-component communication: the first component writes an intermediate result into the Dictionary, and a second component picks up this value for further processing.
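Both uses (passing an input value into a graph and handing an intermediate result from one component to another) can be sketched with a small typed map in plain Java. `TypedDictionary`, the keys `budget` and `totalSalary`, and the two static "component" methods are hypothetical. They model the idea only and are not CloverETL's Dictionary API. `BigDecimal` stands in for the decimal type.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified typed dictionary; NOT CloverETL's Dictionary API.
public class TypedDictionary {
    private final Map<String, Object> values = new HashMap<>();

    public void set(String key, Object value) {
        values.put(key, value);
    }

    // Typed read: fails fast if the key is missing or holds a different type.
    public <T> T get(String key, Class<T> type) {
        Object value = values.get(key);
        if (value == null) {
            throw new IllegalStateException("Missing key: " + key);
        }
        return type.cast(value); // throws ClassCastException on a type mismatch
    }

    // First "component": computes an intermediate result and stores it.
    public static void sumComponent(List<BigDecimal> salaries, TypedDictionary dict) {
        BigDecimal total = BigDecimal.ZERO;
        for (BigDecimal salary : salaries) {
            total = total.add(salary);
        }
        dict.set("totalSalary", total);
    }

    // Second "component": picks up the intermediate result plus an input value.
    public static boolean overBudgetComponent(TypedDictionary dict) {
        BigDecimal total = dict.get("totalSalary", BigDecimal.class);
        BigDecimal budget = dict.get("budget", BigDecimal.class);
        return total.compareTo(budget) > 0;
    }

    public static void main(String[] args) {
        TypedDictionary dict = new TypedDictionary();
        dict.set("budget", new BigDecimal("20000")); // input value passed to the "graph"
        sumComponent(Arrays.asList(new BigDecimal("12000"), new BigDecimal("9000")), dict);
        System.out.println(overBudgetComponent(dict)); // prints true (21000 > 20000)
    }
}
```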
Probably the most advanced way to exploit the Dictionary is the ability to define your own proprietary dictionary data types. Like components, connections, CTL functions, and so on, dictionary entry types are fully pluggable, so you can easily introduce your own type that fits your needs. For example, you can extend CTL with your own function that reads such a value within Clover and converts it into a CloverETL data record, the basic data element processed by the CloverETL engine. You can also create a new set of components that understand your specific data format: while such a component runs, your custom data are retrieved from the Dictionary, transformed into a standard CloverETL data record, and passed to an output port for subsequent processing. We have used this approach successfully in several projects where the data format was completely incompatible with CloverETL records.
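A pluggable entry type boils down to a contract that a registry can look up by name, plus a way to turn the custom value into a record. The sketch below models that in plain Java under clearly hypothetical names (`DictionaryEntryType`, `LegacyPayload`, and a made-up `key=value|key=value` format). A `Map` stands in for a CloverETL data record, and the code does not reproduce CloverETL's actual plugin interfaces.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of pluggable dictionary entry types; NOT CloverETL's plugin API.
public class CustomDictionaryTypes {

    // Hypothetical plugin contract for a custom dictionary entry type.
    public interface DictionaryEntryType {
        String name();

        boolean accepts(Object value);

        // Converts a value of this type into a record-like structure
        // (field name -> value), standing in for a CloverETL data record.
        Map<String, Object> toRecord(Object value);
    }

    // Hypothetical proprietary format: a legacy "key=value|key=value" payload.
    public static final class LegacyPayload {
        final String raw;

        public LegacyPayload(String raw) {
            this.raw = raw;
        }
    }

    public static final class LegacyPayloadType implements DictionaryEntryType {
        @Override
        public String name() {
            return "legacyPayload";
        }

        @Override
        public boolean accepts(Object value) {
            return value instanceof LegacyPayload;
        }

        @Override
        public Map<String, Object> toRecord(Object value) {
            Map<String, Object> record = new LinkedHashMap<>(); // keeps field order
            for (String pair : ((LegacyPayload) value).raw.split("\\|")) {
                String[] kv = pair.split("=", 2);
                record.put(kv[0], kv.length > 1 ? kv[1] : null);
            }
            return record;
        }
    }

    // Registry of types (looked up by name) and the stored entries.
    private final Map<String, DictionaryEntryType> types = new HashMap<>();
    private final Map<String, Object> entries = new HashMap<>();
    private final Map<String, String> entryTypes = new HashMap<>();

    public void registerType(DictionaryEntryType type) {
        types.put(type.name(), type);
    }

    public void setValue(String key, String typeName, Object value) {
        DictionaryEntryType type = types.get(typeName);
        if (type == null) {
            throw new IllegalArgumentException("Unknown type: " + typeName);
        }
        if (!type.accepts(value)) {
            throw new IllegalArgumentException("Value not accepted by type " + typeName);
        }
        entries.put(key, value);
        entryTypes.put(key, typeName);
    }

    // What a custom component might do: fetch the entry and convert it into a record.
    public Map<String, Object> readAsRecord(String key) {
        String typeName = entryTypes.get(key);
        if (typeName == null) {
            throw new IllegalStateException("No entry: " + key);
        }
        return types.get(typeName).toRecord(entries.get(key));
    }

    public static void main(String[] args) {
        CustomDictionaryTypes dict = new CustomDictionaryTypes();
        dict.registerType(new LegacyPayloadType());
        dict.setValue("legacy", "legacyPayload", new LegacyPayload("id=7|name=Alice|salary=12000"));
        System.out.println(dict.readAsRecord("legacy")); // prints {id=7, name=Alice, salary=12000}
    }
}
```

Keeping the format-specific parsing inside the type object means the registry and any consuming component stay generic: supporting another proprietary format only requires registering one more `DictionaryEntryType`.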
I hope this somewhat technical look into the CloverETL engine inspires you to use it in situations that have seemed unsuitable until now.