Customizing metadata propagation

Metadata propagation, i.e. the ability to push metadata out from connected components, has been in the product since CloverETL 4.0.0. A new addition in CloverDX 5.3.0 allows programmers to enable this feature in custom Java components. This makes it possible to integrate custom components into projects more seamlessly, make them more generic and user friendly, or unlock solutions that were not possible before.

Since it is a relatively new feature, let’s explore some already implemented uses, lessons learned and how CloverDX metadata propagation actually works. This article is predominantly technical and expects at least basic knowledge of Java; we will be talking about Java-driven components after all.

General metadata propagation algorithm

When the CloverDX engine runs metadata resolution, it works iteratively, from the obvious (user-defined) metadata to component-provided ones (templates or propagated), and keeps repeating until a round produces no changes in metadata assignments. Why is this point so important? If your algorithm does not return a stable result, it is very easy to end up in an infinite loop which is very hard to detect and, in most cases, makes Designer unresponsive until it eventually has to be killed.

Further along, we will see when to look out for this situation and how to prevent it.
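
To make the iterative nature of resolution more tangible, here is a minimal, self-contained sketch of such a loop. It is not the engine's actual implementation: the class PropagationRounds, the resolve method and the representation of edges as an array of metadata names are all made up for illustration. It only shows why one resolution can take several rounds and why it ends only when a round changes nothing.

```java
import java.util.Arrays;

public class PropagationRounds {

    /**
     * Simplified propagation over a linear chain of pass-through components.
     * edges[i] holds the metadata name on edge i, or null when nothing is assigned yet.
     * Returns the number of rounds executed, including the last one that changed nothing.
     * (Hypothetical model, not CloverDX code.)
     */
    static int resolve(String[] edges) {
        int rounds = 0;
        boolean changed;
        do {
            changed = false;
            rounds++;
            // Visiting components from the end of the chain means that
            // each round moves metadata forward by one edge only.
            for (int i = edges.length - 1; i > 0; i--) {
                if (edges[i] == null && edges[i - 1] != null) {
                    edges[i] = edges[i - 1];
                    changed = true;
                }
            }
        } while (changed); // stop once a whole round made no new assignment
        return rounds;
    }

    public static void main(String[] args) {
        // Only the first edge has user-defined metadata.
        String[] edges = {"customer", null, null, null};
        int rounds = resolve(edges);
        System.out.println(rounds + " rounds: " + Arrays.toString(edges));
    }
}
```

With one user-defined edge followed by three empty ones, the sketch needs three rounds to fill the chain and a fourth one to confirm that nothing changes any more. A rule that keeps changing its own result would never reach that last, quiet round.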

Propagation interface

The method public void propagateMetadata() is called in every resolution iteration and should therefore be as lightweight as possible, preferably without any network calls or anything else that would slow down the propagation algorithm. If such an expensive operation is inevitable, cache its results. The cache will not survive between two resolutions but will be valid over all iterations of a single resolution. Since there can easily be 10 iterations per resolution, the savings can be significant.
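
One way to cache an expensive lookup could look like the following sketch. It assumes that the same component instance serves all iterations of one resolution, so a plain instance field is enough to hold the result. CachedLookup is a hypothetical helper written for this article, not part of the CloverDX API.

```java
import java.util.function.Supplier;

public class CachedLookup<T> {

    private final Supplier<T> expensiveLookup;
    private T cached;
    private boolean loaded; // separate flag, so that a null result is cached as well

    public CachedLookup(Supplier<T> expensiveLookup) {
        this.expensiveLookup = expensiveLookup;
    }

    /** Runs the expensive lookup on the first call only and reuses its result afterwards. */
    public T get() {
        if (!loaded) {
            cached = expensiveLookup.get();
            loaded = true;
        }
        return cached;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stands in for a network call; assumed to live in a field of the component.
        CachedLookup<String> schema = new CachedLookup<>(() -> {
            calls[0]++;
            return "remote-schema";
        });

        // Ten iterations of one resolution, each asking for the same value.
        for (int round = 0; round < 10; round++) {
            schema.get();
        }
        System.out.println("expensive calls: " + calls[0]); // prints "expensive calls: 1"
    }
}
```

The explicit loaded flag matters here: a lookup which legitimately returns null would otherwise be repeated in every iteration.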

Slow resolution affects both the job design experience and initialisation before job execution. The method has no return value, but is expected to call either or both of

setInputMetadata(<port number>,<DataRecordMetadata instance>),
setOutputMetadata(<port number>,<DataRecordMetadata instance>)

to push metadata into a job from the desired ports. An important detail to remember: when the algorithm does not assign metadata during a propagateMetadata() call, it is the same as setting null metadata (= do not propagate metadata). This often leads to unpredictable propagation behaviour or, in the worst case, to an infinite propagation loop, and it is the reason why your algorithm should always return a stable result.

Working with metadata records

The Java package org.jetel.metadata contains a couple of handy classes which can be used to manipulate metadata records. For example, org.jetel.metadata.DataRecordMetadataXMLReaderWriter helps to easily read and write external metadata files.

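import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.jetel.metadata.DataRecordMetadata;
import org.jetel.metadata.DataRecordMetadataXMLReaderWriter;

DataRecordMetadata loadExternalMetadata(String path) {
  if (path != null && !path.isEmpty()) {
    try {
      File addMetadata = getFile(path);

      // When metadata file is provided, load its definition
      if (addMetadata != null) {
        try (InputStream is = new FileInputStream(addMetadata)) {
          return DataRecordMetadataXMLReaderWriter.readMetadata(is);
        } catch (IOException e) {
          getComponent().getLog().warn("Unable to read metadata file: " + addMetadata.getAbsolutePath(), e);
        }
      }
    } catch (Exception e) {
      // Keep the cause in the log so an invalid URL is easier to diagnose
      getComponent().getLog().warn("Invalid metadata file URL: " + path, e);
    }
  }

  // No path given or the file could not be loaded
  return null;
}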

When building metadata records from scratch, remember to set the record type and all other required properties (e.g. delimiters), otherwise the result will be invalid metadata. Such metadata will not cause an error during checkConfig but may cause issues when used to write into or read from a file. For example, the following procedure still produces invalid metadata (the default delimiter is missing):

public static DataRecordMetadata createMetadata(Properties props, String metadataName) {
  DataRecordMetadata metadata = new DataRecordMetadata(metadataName);

  for (Object variableName : props.keySet()) {
    // Placeholder name and no delimiter; the user-provided text goes into the label
    DataFieldMetadata field = new DataFieldMetadata("xxx", DataFieldType.STRING, null);
    field.setLabel((String) variableName);
    metadata.addField(field);
  }

  // Derives valid field names from the labels (among other things)
  metadata.normalize();
  return metadata;
}

Notice the static “xxx” used as the field name, while the variable input goes into the label via DataFieldMetadata.setLabel(). This is because the input may contain characters which are not valid in field names, whereas the label is more permissive. You can save yourself some input validation if field names come from user input. Calling DataRecordMetadata.normalize() will (amongst other things) convert field labels into valid field names for you.
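
To get a feeling for what converting labels into field names involves, consider the following sketch. It is emphatically not the implementation of DataRecordMetadata.normalize(), whose exact rules are not covered here. The class LabelNormalizer, the replacement character and the numeric suffix scheme are assumptions chosen for illustration, as is the rule that a valid name consists of letters, digits and underscores and does not start with a digit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LabelNormalizer {

    /**
     * Turns free-form labels into unique identifier-like names.
     * Illustrative rules only (assumed, not taken from CloverDX).
     */
    static List<String> toFieldNames(List<String> labels) {
        List<String> names = new ArrayList<>();
        Set<String> used = new HashSet<>();

        for (String label : labels) {
            // Replace every character outside [A-Za-z0-9_] with an underscore.
            String base = label.replaceAll("[^A-Za-z0-9_]", "_");
            // A name must not be empty or start with a digit.
            if (base.isEmpty() || Character.isDigit(base.charAt(0))) {
                base = "_" + base;
            }
            // Append a numeric suffix until the name is unique.
            String name = base;
            int suffix = 1;
            while (!used.add(name)) {
                name = base + "_" + suffix++;
            }
            names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("first name", "first-name", "2nd address", "e-mail");
        System.out.println(toFieldNames(labels));
        // prints [first_name, first_name_1, _2nd_address, e_mail]
    }
}
```

Doing this by hand for every user-provided name is exactly the kind of input validation that the label plus normalize() approach lets you skip.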

Propagation strategies

In general, there are 3 different strategies in either direction (in or out):

Propagate through,
Provide custom,
Do not propagate.

The first two can be either enforced (e.g. the error output of FlatFileReader or the output port of Filter) or conditional.

Even though enforced propagation tries to overwrite all metadata on a port, it may not succeed, as the priorities of metadata assignment still apply. Do not rely on custom metadata in your transformation algorithm! If you need to check that metadata were successfully propagated, use public ConfigurationStatus checkConfig(ConfigurationStatus status) in your transformation algorithm.

With conditional propagation, the condition should not “change its mind” too often, as that may increase the number of propagation rounds. For example, the following algorithm is extremely dangerous:

@Override
public void propagateMetadata() { 
  if (getMetadataFromOutputPort(0) == null) {
    DataRecordMetadata metadata = new DataRecordMetadata("dummy");
    setOutputMetadata(0, metadata);
  }
}

In the first situation, there are going to be infinitely many rounds of metadata propagation: the edge between the two components will alternate between null and dummy. The second example will finish in a couple of rounds with HTTPConnector_Request metadata, which may come as a surprise, but let’s take a look at what is happening.

Custom component -> dummy (left input), HTTPConnector -> HTTPConnector_Request
Custom component -> null, HTTPConnector -> HTTPConnector_Request (only input)
Custom component -> null, HTTPConnector -> HTTPConnector_Request (only input)

A better algorithm may look like this:

@Override
public void propagateMetadata() {
  DataRecordMetadata existing = getMetadataFromOutputPort(0);

  setOutputMetadata(0, existing == null ?
    new DataRecordMetadata("dummy") : existing
  );
}

This is because the first instance would end up with dummy, and so would the second one, in one round less:

Custom component -> dummy (left input), HTTPConnector -> HTTPConnector_Request
Custom component -> dummy (left input), HTTPConnector -> HTTPConnector_Request
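
The difference between the two variants can be replayed outside of CloverDX with a small simulation of a single edge. This is a simplified model under stated assumptions: the class PropagationStability and its members are hypothetical, metadata are represented by their names only, and "assigned nothing" is modelled as null, as described in the Propagation interface section.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.UnaryOperator;

public class PropagationStability {

    // Each operator models one propagateMetadata() call: it receives the metadata
    // currently seen on the output port and returns what the component assigns
    // (null means it assigned nothing in this round).

    // Dangerous variant: assigns "dummy" only while the port is still empty.
    static final UnaryOperator<String> DANGEROUS = seen -> seen == null ? "dummy" : null;

    // Stable variant: always assigns something, re-using what is already there.
    static final UnaryOperator<String> STABLE = seen -> seen == null ? "dummy" : seen;

    /** Records the metadata on the edge after each round, up to maxRounds rounds. */
    static List<String> simulate(UnaryOperator<String> propagate, int maxRounds) {
        List<String> history = new ArrayList<>();
        String edge = null;
        for (int round = 0; round < maxRounds; round++) {
            String next = propagate.apply(edge);
            history.add(String.valueOf(next));
            if (Objects.equals(next, edge)) {
                break; // nothing changed, resolution is finished
            }
            edge = next;
        }
        return history;
    }

    public static void main(String[] args) {
        System.out.println("dangerous: " + simulate(DANGEROUS, 6));
        // prints dangerous: [dummy, null, dummy, null, dummy, null]
        System.out.println("stable:    " + simulate(STABLE, 6));
        // prints stable:    [dummy, dummy]
    }
}
```

The dangerous variant stops only because the simulation caps the number of rounds, while the stable one settles as soon as the second round repeats the first.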

To wrap it up

Today, we looked at the propagation interface and highlighted some caveats which may not be obvious when developing a metadata propagation algorithm, talked about some helpful Java packages and metadata record methods, and finished with some final thoughts about propagation implications.

Pavel Švec

May 31, 2021
