Customizing metadata propagation

Metadata propagation, i.e. the ability to push metadata out from connected components, has been in the product since CloverETL 4.0.0. A new addition in CloverDX 5.3.0 allows programmers to enable this feature in custom Java components. This allows more seamless integration of custom components into projects, makes them more generic and user friendly, and can unlock solutions that were previously not possible.

Since it is a relatively new feature, let’s explore some already implemented uses, lessons learned, and how CloverDX metadata propagation actually works. This article is predominantly technical and assumes at least basic knowledge of Java; we will be talking about Java-driven components, after all.

General metadata propagation algorithm

When the CloverDX engine runs metadata resolution, it does so iteratively, from the obvious (user-defined) metadata to component-provided ones (templates or propagated), and it keeps iterating until there are no more changes in metadata assignments. Why is this point so important? If your algorithm does not return a stable result, it is very easy to end up in an infinite loop, which is very hard to detect and, in most cases, makes Designer unresponsive until it eventually has to be killed.

Further along, we will see when to look out for this situation and how to prevent it.
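To make the "iterate until nothing changes" idea more tangible, here is a minimal sketch of such a fixed-point loop. It does not use the CloverDX API at all: the class FixedPointResolutionSketch, the chain of edges modelled as a String array, and the maxRounds safety limit are all assumptions made purely for illustration.

```java
import java.util.Arrays;

final class FixedPointResolutionSketch {

    /**
     * edges[i] holds the metadata name on the i-th edge of a linear chain;
     * null means "nothing assigned yet". Returns the number of rounds run,
     * including the final round that detected no change.
     */
    static int resolve(String[] edges, int maxRounds) {
        for (int round = 1; round <= maxRounds; round++) {
            String[] before = edges.clone(); // snapshot every component reads this round
            for (int i = 0; i + 1 < edges.length; i++) {
                // Pass-through in both directions, but only onto edges that are
                // still empty, so user-defined metadata are never overwritten.
                if (before[i] != null && edges[i + 1] == null) {
                    edges[i + 1] = before[i];
                }
                if (before[i + 1] != null && edges[i] == null) {
                    edges[i] = before[i + 1];
                }
            }
            if (Arrays.equals(before, edges)) {
                return round; // fixed point reached: nothing changed in this round
            }
        }
        // Hypothetical guard; the text above describes what happens without one.
        throw new IllegalStateException("No stable result after " + maxRounds + " rounds");
    }

    public static void main(String[] args) {
        String[] edges = {null, "customer", null, null}; // one user-defined edge
        int rounds = resolve(edges, 20);
        System.out.println(rounds + " rounds: " + Arrays.toString(edges));
    }
}
```

The loop only terminates because every round either changes something or confirms that nothing changed. A component whose output keeps flipping would never let the comparison succeed.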

Propagation interface

For every resolution iteration, the method public void propagateMetadata() is called, so it should be as lightweight as possible, preferably without any network calls or anything else that would slow down the propagation algorithm. If such an expensive operation is unavoidable, cache its results. The cache will not survive between two resolutions, but it will be valid across all iterations of one resolution. Since there can easily be 10 iterations per resolution, the savings can be significant.

Slow resolution affects both the job design experience and initialisation before job execution. propagateMetadata() has no return value, but it is expected to call one or both of
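The caching advice can be sketched with a plain instance field. This is only an illustration under assumptions: CachedLookupSketch and metadataForIteration() are hypothetical names, the Supplier stands in for whatever expensive call you need, and the sketch assumes the object holding the cache lives for exactly one resolution.

```java
import java.util.function.Supplier;

final class CachedLookupSketch {
    private final Supplier<String> expensiveLookup; // e.g. a remote schema fetch
    private String cached;
    private boolean loaded; // separate flag so that a null result is cached too

    CachedLookupSketch(Supplier<String> expensiveLookup) {
        this.expensiveLookup = expensiveLookup;
    }

    /** Stand-in for the work done inside propagateMetadata(). */
    String metadataForIteration() {
        if (!loaded) {
            cached = expensiveLookup.get(); // runs once per instance
            loaded = true;
        }
        return cached;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        CachedLookupSketch component = new CachedLookupSketch(() -> {
            calls[0]++;
            return "remoteSchema";
        });
        for (int iteration = 0; iteration < 10; iteration++) {
            component.metadataForIteration();
        }
        System.out.println("expensive calls: " + calls[0]);
    }
}
```

Besides saving time, returning the same cached value in every iteration also helps keep the propagated result stable.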

  • setInputMetadata(<port number>,<DataRecordMetadata instance>),
  • setOutputMetadata(<port number>,<DataRecordMetadata instance>)

to push metadata into the job from the desired ports. An important detail to remember: when the algorithm does not assign metadata during a propagateMetadata() call, it is the same as setting null metadata (= do not propagate metadata). This often leads to unpredictable propagation behaviour or, in the worst case, to an infinite propagation loop, and it is the reason why your algorithm should always return a stable result.

Working with metadata records

The Java package org.jetel.metadata contains a couple of handy classes that can be used to manipulate metadata records. For example, org.jetel.metadata.DataRecordMetadataXMLReaderWriter makes it easy to read and write external metadata files.

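// Imports needed by this snippet:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.jetel.metadata.DataRecordMetadata;
import org.jetel.metadata.DataRecordMetadataXMLReaderWriter;

DataRecordMetadata loadExternalMetadata(String path) {
  if (path != null && !path.isEmpty()) {
    try {
      // getFile() is a helper of the surrounding class that resolves the path
      File addMetadata = getFile(path);

      // When a metadata file is provided, load its definition
      if (addMetadata != null) {
        try (InputStream is = new FileInputStream(addMetadata)) {
          return DataRecordMetadataXMLReaderWriter.readMetadata(is);
        } catch (IOException e) {
          getComponent().getLog().warn("Unable to read metadata file: " + addMetadata.getAbsolutePath(), e);
        }
      }
    } catch (Exception e) {
      // Log the cause as well, so an invalid path is easier to diagnose
      getComponent().getLog().warn("Invalid metadata file URL: " + path, e);
    }
  }

  // No path given, or loading failed: nothing to propagate
  return null;
}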

When building metadata records from scratch, remember to set the record type and all other required properties (e.g. delimiters); otherwise, the result will be invalid metadata. Such metadata will not cause an error during checkConfig, but they may cause issues when used to write into or read from a file. For example, the following procedure will still produce invalid metadata (the default delimiter is missing):

public static DataRecordMetadata createMetadata(Properties props, String metadataName) {
  // Note: no record type or default delimiter is set, so the result is still invalid
  DataRecordMetadata metadata = new DataRecordMetadata(metadataName);

  for (Object variableName : props.keySet()) {
    // Placeholder name; the real name is derived from the label by normalize()
    DataFieldMetadata field = new DataFieldMetadata("xxx", DataFieldType.STRING, null);
    field.setLabel((String) variableName);
    metadata.addField(field);
  }

  metadata.normalize();
  return metadata;
}

Notice the static “xxx” used as the field name and the variable input passed to DataFieldMetadata.setLabel(). This is because the input may contain characters that are not valid in field names, whereas the label is more permissive, so you can save yourself some input validation if field names come from user input. Calling DataRecordMetadata.normalize() will (amongst other things) convert field labels into valid field names for you.
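To illustrate why a permissive label plus a derived name is convenient, here is a rough sketch of the kind of clean-up such a conversion involves. It is not the actual normalize() implementation. The class LabelToNameSketch, the assumed naming rule (letters, digits and underscores, not starting with a digit) and the numeric suffix used for duplicates are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LabelToNameSketch {

    /** Turns an arbitrary label into an identifier-like name (assumed rule). */
    static String toFieldName(String label) {
        String name = label.trim().replaceAll("[^A-Za-z0-9_]", "_");
        if (name.isEmpty() || Character.isDigit(name.charAt(0))) {
            name = "_" + name; // names must not be empty or start with a digit
        }
        return name;
    }

    /** Maps derived field name -> original label, keeping names unique. */
    static Map<String, String> namesFor(List<String> labels) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String label : labels) {
            String base = toFieldName(label);
            String name = base;
            for (int n = 1; result.containsKey(name); n++) {
                name = base + "_" + n; // hypothetical de-duplication scheme
            }
            result.put(name, label);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(namesFor(Arrays.asList("order id", "order-id", "2nd.line")));
    }
}
```

Writing and testing this kind of logic yourself is exactly the work the label-based approach lets you skip.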

Propagation strategies

In general, there are three possible strategies in either direction (in or out):

  1. Propagate through,
  2. Provide custom,
  3. Do not propagate.

The first two can be either enforced (e.g. the error output of FlatFileReader or the output port of Filter) or conditional.

Even though enforced propagation tries to overwrite all metadata on a port, it may not succeed, because the priorities of metadata assignment still apply. Do not rely on custom metadata in your transformation algorithm! If you need to check that metadata were propagated successfully, use public ConfigurationStatus checkConfig(ConfigurationStatus status) in the transformation algorithm.

With conditional propagation, the condition should not “change its mind” too often, as that may increase the number of propagation rounds. For example, the following algorithm is extremely dangerous:
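The three strategies, and the reason why even enforced propagation may not win, can be sketched as follows. This is a toy model rather than CloverDX code: PropagationStrategySketch, its Strategy enum and the simple "user-defined beats propagated" rule are assumptions used only to illustrate that metadata you push are not guaranteed to end up on the edge.

```java
final class PropagationStrategySketch {

    enum Strategy { PROPAGATE_THROUGH, PROVIDE_CUSTOM, DO_NOT_PROPAGATE }

    /** What a component would push to its port under each strategy. */
    static String propagated(Strategy strategy, String inputMetadata, String customMetadata) {
        switch (strategy) {
            case PROPAGATE_THROUGH:
                return inputMetadata;
            case PROVIDE_CUSTOM:
                return customMetadata;
            default:
                return null; // DO_NOT_PROPAGATE
        }
    }

    /** Assumed priority rule: an explicit user assignment always wins. */
    static String effective(String userDefined, String propagated) {
        return userDefined != null ? userDefined : propagated;
    }

    public static void main(String[] args) {
        String pushed = propagated(Strategy.PROVIDE_CUSTOM, "inputRecord", "customRecord");
        System.out.println("free edge:          " + effective(null, pushed));
        System.out.println("user-assigned edge: " + effective("userRecord", pushed));
    }
}
```

In the second case the component still "provided custom" metadata, but the edge carries something else, which is why the check belongs in checkConfig rather than in an assumption inside the transformation.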

@Override
public void propagateMetadata() { 
  if (getMetadataFromOutputPort(0) == null) {
    DataRecordMetadata metadata = new DataRecordMetadata("dummy");
    setOutputMetadata(0, metadata);
  }
}

In the first situation, there will be infinitely many rounds of metadata propagation: the edge between the two components alternates between null and dummy. The second example finishes in a couple of rounds with HTTPConnector_Request metadata, which may come as a surprise, so let’s take a look at what is happening in each round:

  • Custom component -> dummy (left input), HTTPConnector -> HTTPConnector_Request
  • Custom component -> null, HTTPConnector -> HTTPConnector_Request (only input)
  • Custom component -> null, HTTPConnector -> HTTPConnector_Request (only input)

A better algorithm may look like this:

@Override
public void propagateMetadata() {
  DataRecordMetadata existing = getMetadataFromOutputPort(0);

  setOutputMetadata(0, existing == null ?
    new DataRecordMetadata("dummy") : existing
  );
}

With this version, the first situation ends up with dummy, and so does the second, in one round less:

  • Custom component -> dummy (left input), HTTPConnector -> HTTPConnector_Request
  • Custom component -> dummy (left input), HTTPConnector -> HTTPConnector_Request
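The difference between the two versions of propagateMetadata() can be reproduced with a small simulation. It is a toy model under assumptions, not the real engine: PropagationStabilityDemo is a hypothetical class, metadata are reduced to strings, the edge is fed by a single component, and "not calling setOutputMetadata()" is modelled as returning null, as described earlier.

```java
import java.util.Objects;
import java.util.function.UnaryOperator;

final class PropagationStabilityDemo {

    /** Returned when the edge never settles within the round limit. */
    static final int NOT_CONVERGED = -1;

    // First version: assigns "dummy" only while the edge is empty; otherwise it
    // assigns nothing, which counts as propagating null.
    static final UnaryOperator<String> DANGEROUS = edge -> edge == null ? "dummy" : null;

    // Second version: always assigns something, re-using what is already there.
    static final UnaryOperator<String> STABLE = edge -> edge == null ? "dummy" : edge;

    /**
     * Each round the component sees the current edge metadata and its answer
     * becomes the new edge metadata. Returns the round in which nothing changed.
     */
    static int roundsUntilStable(UnaryOperator<String> component, int maxRounds) {
        String edge = null;
        for (int round = 1; round <= maxRounds; round++) {
            String next = component.apply(edge);
            if (Objects.equals(next, edge)) {
                return round;
            }
            edge = next;
        }
        return NOT_CONVERGED;
    }

    public static void main(String[] args) {
        System.out.println("dangerous: " + roundsUntilStable(DANGEROUS, 50));
        System.out.println("stable:    " + roundsUntilStable(STABLE, 50));
    }
}
```

The only difference between the two lambdas is what happens once metadata are already present, and that is exactly what decides whether the result is stable.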

To wrap it up

Today, we looked at the propagation interface and highlighted some caveats that may not be obvious when developing a metadata propagation algorithm, talked about some helpful Java packages and metadata record methods, and finished with some final thoughts about the implications of propagation.
