CloverDX Blog on Data Integration

Improving subgraphs in CloverDX 4.1

Written by Jan Sedláček | June 30, 2015

Last year, we introduced subgraphs as the most important innovation in CloverDX 4.0 (formerly CloverETL). Since then, we've received positive feedback from our customers, and we've continued to refine the feature in subsequent releases.

We strive for CloverDX to become a more versatile tool, capable of tackling new data challenges. In release 4.1, we added many new features.

In this post, we focus on the new features that improve subgraphs: optional ports and the ability to dynamically disable components based on the connected edges.

With optional ports, you can create a single elegant subgraph that works in many different scenarios, rather than maintaining several specific subgraphs to achieve the same goal. Optional ports also let an experienced developer change the behavior of a subgraph based on different conditions.

We are very fond of these features, so we re-engineered the “Read and Analyze Tweets” example from CloverDX 4.0 to incorporate optional ports and demonstrate the enhanced capability.

Let's take a look at the example.

As you can see, it is a pretty straightforward graph. It reads the Twitter feed and then uses the “Sentiment classify” subgraph to sort tweets as positive, negative or neutral based on keywords.

Here is the “Sentiment classify” subgraph processing the Twitter feed.

We are using two keyword lists in this example. The lists are simple text files, but you can use any data source. The “Default sentiment keywords” reader is located inside the subgraph, and “My sentiment keywords” is located before the subgraph. Users who want to use custom keywords simply connect “My sentiment keywords” to the subgraph.

It is really easy to set ports as optional. You simply select a port, either directly in the subgraph or in the outline, and set it as required or optional.

This example also illustrates another new feature: dynamically disabled components. Right-click the “Default keyword list” reader and choose Enable.

This reader is conditionally disabled because we do not want to mix the two keyword lists. As soon as you connect the edge carrying the custom keywords, the default keywords reader is automatically disabled.

Optional Ports

Previously, every port in a subgraph had to have an edge connected, regardless of whether any data was coming through that edge. Now a port can be defined as optional.

As our example shows, there are several behavioral options for ports. If you declare a port as optional, you need to choose its behavior. The options are a) remove the edge if the port is not connected, or b) keep the edge, but discard all records. Each behavior type has a specific use. You can learn more about each behavior in our other blog post dedicated to the practical use of optional ports.
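The two behaviors can be pictured with a small Python model. This is only an illustrative sketch and not CloverDX code: the names UnconnectedBehavior and inner_edge_state are hypothetical, and the state labels are our own shorthand for options a) and b) above.

```python
from enum import Enum


class UnconnectedBehavior(Enum):
    """Hypothetical labels for the two optional-port behaviors."""
    REMOVE_EDGE = "a"       # option a: drop the edge when the port is unconnected
    DISCARD_RECORDS = "b"   # option b: keep the edge, but discard all records


def inner_edge_state(port_connected: bool, behavior: UnconnectedBehavior) -> str:
    """Describe what happens to the edge behind an optional port."""
    if port_connected:
        # A connected port behaves like an ordinary required port.
        return "active"
    if behavior is UnconnectedBehavior.REMOVE_EDGE:
        return "removed"
    return "kept, records discarded"


if __name__ == "__main__":
    for behavior in UnconnectedBehavior:
        print(behavior.name, "->", inner_edge_state(False, behavior))
```

In this model, a connected port always yields an active edge, so the chosen behavior matters only when the parent graph leaves the port unconnected.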

Dynamically enabled/disabled components

While implementing optional ports, we encountered a few issues. What if removing an edge in the subgraph prevents some component within the subgraph from working properly? How do you avoid such trouble?

The solution we came up with was to expand the existing feature for disabling components. In addition to the existing options to disable a component either manually or based on a graph parameter, we've added the capability to dynamically disable a component based on whether an edge is connected to a port.

We used this feature in the Twitter example above. The “Default sentiment keywords” reader was conditionally disabled based on the connection of the other edge. Once you connect the edge with “My sentiment keywords”, the subgraph automatically disables “Default sentiment keywords”.
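The rule from the Twitter example can be sketched in a few lines of Python. This is a simplified model under our own assumptions, not the CloverDX API: the function names are hypothetical, and an unconnected port is represented by None.

```python
from typing import List, Optional


def default_reader_enabled(custom_keywords: Optional[List[str]]) -> bool:
    """The default reader runs only when no custom keyword edge is connected."""
    return custom_keywords is None


def active_keywords(custom_keywords: Optional[List[str]],
                    default_keywords: List[str]) -> List[str]:
    """Return the keyword list the classifier would see; the lists are never mixed."""
    if default_reader_enabled(custom_keywords):
        return list(default_keywords)
    return list(custom_keywords)


if __name__ == "__main__":
    defaults = ["good", "bad"]
    print(active_keywords(None, defaults))                 # port left unconnected
    print(active_keywords(["awesome", "awful"], defaults))  # custom edge connected
```

Note that in this model a connected edge that carries no keywords still disables the default reader, because the decision depends on the connection and not on the data.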

Although we introduced this as a subgraph feature to complement optional ports, it can be very handy in ordinary graphs as well.

If you are a veteran user of CloverDX, you will probably notice that the “pass through” option from previous versions is missing. We decided that disabling a component is the same as “pass through”, so we combined the two options. If you disable a component now, the data passes through it as if the component were not there.

We believe these new additions will make your data processes easier to prepare and faster to run.
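As a rough illustration of the combined behavior, here is a small Python model of a chain of components in which a disabled component is skipped. The Component class and run_chain function are hypothetical names for this sketch and do not correspond to CloverDX classes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


@dataclass
class Component:
    """A hypothetical single-input, single-output transformation step."""
    name: str
    transform: Callable[[Record], Record]
    enabled: bool = True


def run_chain(records: Iterable[Record], components: List[Component]) -> List[Record]:
    """Send each record through the enabled components; disabled ones pass it through."""
    output = []
    for record in records:
        for component in components:
            if component.enabled:
                record = component.transform(record)
            # A disabled component leaves the record untouched.
        output.append(record)
    return output


if __name__ == "__main__":
    upper = Component("uppercase", lambda r: {k: v.upper() for k, v in r.items()})
    tweets = [{"text": "great release"}]
    print(run_chain(tweets, [upper]))   # component enabled
    upper.enabled = False
    print(run_chain(tweets, [upper]))   # component disabled, record unchanged
```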

These new features and improvements are available in our latest release of CloverDX 4.1. Download your evaluation copy today.