The UniversalDataReader is designed for reading files in various formats. We use this component for many purposes, one of which is parsing an Apache access log. Such a file normally contains records in the commonly used Combined Log Format, e.g.:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
Fields in the record are delimited by a space. However, a space can also occur inside some quoted fields, such as "GET /apache_pb.gif HTTP/1.0", so a single space cannot serve as the delimiter for every field. Fortunately, CloverDX allows you to define a different delimiter for each field in metadata, so parsing the log depends only on setting the metadata properly on the output edge of the reader. In our case, we defined the following delimiters: space, space, space+left square bracket, right square bracket+space+quotation mark, quotation mark+space, etc.
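To make the idea of per-field delimiters concrete, here is a minimal Python sketch of the same technique outside CloverDX. It is not how UniversalDataReader is implemented. The field names and the delimiters after the first five are assumptions, filled in by following the pattern described above for the Combined Log Format.

```python
# Each field is paired with the delimiter that terminates it.
# Field names are illustrative (assumed), not taken from the CloverDX example.
FIELDS = [
    ("host", " "),
    ("ident", " "),
    ("user", " ["),        # space + left square bracket
    ("timestamp", '] "'),  # right square bracket + space + quotation mark
    ("request", '" '),     # quotation mark + space
    ("status", " "),
    ("bytes", ' "'),       # assumed: space + quotation mark
    ("referer", '" "'),    # assumed: quotation mark + space + quotation mark
    ("user_agent", '"'),   # assumed: closing quotation mark
]


def parse_line(line):
    """Split one access log line using a different delimiter for each field."""
    line = line.rstrip("\n")
    record = {}
    pos = 0
    for name, delim in FIELDS:
        end = line.find(delim, pos)
        if end == -1:
            raise ValueError(f"delimiter {delim!r} for field {name!r} not found")
        record[name] = line[pos:end]
        pos = end + len(delim)  # continue right after the consumed delimiter
    return record


sample = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)
record = parse_line(sample)
print(record["request"])
```

Because each delimiter is searched for only after the previous field has been consumed, the spaces inside the timestamp and the quoted request do not split those fields. The sketch does not handle escaped quotation marks inside quoted fields.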
The complete example, which additionally computes the most visited pages and the most frequently visiting IP addresses, can be found in Advanced Examples (AccessLogParsing.grf) included in CloverDX Designer, or you can download all examples from SourceForge.
Data integration software and ETL tools provided by the CloverDX platform (formerly known as CloverETL) offer solutions for data management tasks such as data integration, data migration, or data quality. CloverDX is a vital part of enterprise solutions such as data warehousing, business intelligence (BI) or master data management (MDM). CloverDX Designer (formerly known as CloverETL Designer) is a visual data transformation designer that helps define data flows and transformations in a quick, visual, and intuitive way. CloverDX Server (formerly known as CloverETL Server) is an enterprise ETL and data integration runtime environment. It offers a set of enterprise features such as automation, monitoring, user management, real-time ETL, data API services, clustering, or cloud data integration.