Deployment templating for CloverDX Server

As more and more companies move towards cloud or container deployments, CloverDX has introduced a number of features that support both infrastructure as code and templating approaches. As some of these features may not be obvious, I’d like to summarize them in this blog post.

The same features are used in CloverDX cloud marketplace offerings and to spin internal test instances up and down. They fall into these groups:

Environment-aware configuration
Centralized logging
Server settings auto-import
Sandbox auto-deployment

Server properties via config file placeholders

It is possible to inject system and environment properties into the clover config file. This file can be used to alter any server-related property, and besides exact values it also accepts two types of placeholders:

${sys:property.name} and
${env:property.name}.

The only difference between these notations lies in where the value is defined. System properties (keyword sys) are Java properties, defined in JAVA_OPTS of the application server. Environment properties (keyword env) map to operating system environment variables, like $PATH (Linux) or %PATH% (Windows).

Therefore, if an environment variable is set on Linux as

export CLOVERDX_SNDBX_DIR="/data/sandboxes"

or on the Windows command line (or in system properties) as

setx CLOVERDX_SNDBX_DIR "C:\Users\Clover\sandboxes"

it is possible to set up the sandbox directory in the clover config file as

# Configure sandboxes root directory
sandboxes.home=${env:CLOVERDX_SNDBX_DIR}

In both infrastructure as code and container deployments, it is more common to use environment variables, as both approaches offer a convenient way to set them. To escape a placeholder definition, use $$ instead of a single $. This is useful, for example, when the value of a configuration property contains placeholders of its own (like an LDAP query string).
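As a rough sketch of the sys variant and of escaping, the script below passes a Java system property through JAVA_OPTS and writes a config file that references it. The property names cloverdx.sandboxes.dir and example.ldap.filter are made up for illustration, and the paths are example values only.

```shell
#!/bin/sh
# Sketch only: property names and paths below are illustrative assumptions.

# 1) Pass a Java system property to the application server via JAVA_OPTS.
#    "cloverdx.sandboxes.dir" is a made-up property name.
JAVA_OPTS="${JAVA_OPTS:-} -Dcloverdx.sandboxes.dir=/data/sandboxes"
export JAVA_OPTS

# 2) Reference it from the clover config file with the sys: placeholder.
#    The quoted 'EOF' stops the shell from expanding ${...} or $$ itself.
CONFIG_FILE="$(mktemp)"
cat > "$CONFIG_FILE" <<'EOF'
# Resolved from the Java system property set in JAVA_OPTS
sandboxes.home=${sys:cloverdx.sandboxes.dir}
# Escaped placeholder: $$ is meant to keep a literal ${username} in the value
example.ldap.filter=(uid=$${username})
EOF

cat "$CONFIG_FILE"
```

Note the quoted heredoc delimiter: without it, the shell would try to interpret the placeholders before they ever reach the file.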

Direct property injection

Besides placeholders, it is also possible to configure properties with environment variables only, bypassing the clover config file completely. To do this, the variable name must be the property name with the clover. prefix. Because not all environments support dots in variable names (typically Linux), underscore notation is also allowed. As a consequence, the environment variables clover.sandboxes.home and clover_sandboxes_home both configure the same property, sandboxes.home.
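A minimal sketch of direct injection on Linux, using the underscore notation because a dot is not valid in a shell variable name. The path is an example value.

```shell
#!/bin/sh
# Sketch: configure sandboxes.home without touching the clover config file.
# The path is an example value.

# Underscore notation is a valid shell identifier, so a plain export works.
export clover_sandboxes_home="/data/sandboxes"

# Show what a process started from this shell would inherit.
printenv clover_sandboxes_home
```

The variable has to be exported in the environment from which the application container is started, otherwise the server process never sees it.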

Templating server configuration

Even though it does not matter which approach or combination you use, I’d highly encourage you to choose one and stick with it. A combined approach may increase maintenance overhead, and over time it may become difficult to track the origins of configuration errors or even to make simple configuration changes, as those might collide.

I prefer using a custom clover config file with placeholders, mainly because:

Even static values can be specified conveniently, in one place
Most of the configuration is usually static/common anyway
The file can be version controlled
Config files support comments, so documentation can live next to the settings
The clover config file can be ported between environments with no additional effort
To apply changes, only CloverDX’s application container has to be restarted

This approach is illustrated in CloverDX’s public Docker example (see architecture diagram). You may notice that there are other directories and resources on a persistent volume. JNDI, HTTPS and JMX configuration files are mostly related to the application container and Java settings, but this directory may also contain license file(s) for your planned installations (e.g. serverA.lic, serverB.lic, ...). Of course, the active license can be switched using a system property or environment variable, like any other clover property.

Also noteworthy is the presence of 2 directories, sandboxes and log. I’ll discuss logging in a later section, but keeping sandboxes separate is an architectural choice that depends heavily on how software deployments work within your organization. This decision also has some impact on job design.

With sandboxes in a remote directory, it is possible to run CloverDX Server in a cluster without any limitations. It also makes projects stateful, resilient to environment rebuilds and restarts.

Storing sandboxes locally, on the other hand, allows more packaged, self-contained solutions to be delivered to production. This deployment mode may be useful when the server acts as a single-purpose application.
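To make the templating idea concrete, here is a small sketch of a portable config file: environment-specific values come from placeholders, static values are written directly. The variable name CLOVERDX_LICENSE_FILE and the license.file property are assumptions made for this example; check the server documentation for the exact property name before relying on it.

```shell
#!/bin/sh
# Sketch of a version-controlled, environment-agnostic clover config file.
# CLOVERDX_LICENSE_FILE and the license.file property are assumptions.

TEMPLATE="$(mktemp)"
cat > "$TEMPLATE" <<'EOF'
# --- Environment-specific values, resolved at startup ---
sandboxes.home=${env:CLOVERDX_SNDBX_DIR}
license.file=${env:CLOVERDX_LICENSE_FILE}

# --- Static values, common to all environments ---
sandboxes.autoimport=true
EOF

# Per-environment values live outside the file, e.g. in the container spec.
export CLOVERDX_SNDBX_DIR="/data/sandboxes"
export CLOVERDX_LICENSE_FILE="/data/config/serverA.lic"

cat "$TEMPLATE"
```

Switching an installation from serverA.lic to serverB.lic then means changing one environment variable, while the file itself stays identical everywhere.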

First start

Not all configuration aspects of CloverDX Server can be provided via property files; the most notable exceptions are user and group permissions.

That’s why, when the server starts up for the first time (when the database is initialized), it attempts to find and import the file referenced by the configuration.autoimport.file property, which is defined as an absolute URL. The input file is compatible with the one generated by the server configuration export tool and is therefore capable of importing both server and sandbox (project) configuration.
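Because the import only happens on the first start, it can be worth checking that the file is really in place beforehand. The sketch below assumes a file: URL and a hypothetical path; it prints the property line and checks the local path derived from the URL.

```shell
#!/bin/sh
# Pre-flight sketch for the auto-import file. The URL is an example value.
AUTOIMPORT_URL="file:///data/config/server_configuration.xml"

# The line that would go into the clover config file.
echo "configuration.autoimport.file=${AUTOIMPORT_URL}"

# Strip the file:// scheme to get the local path and check it exists.
AUTOIMPORT_PATH="${AUTOIMPORT_URL#file://}"
if [ -f "$AUTOIMPORT_PATH" ]; then
    echo "auto-import file present: $AUTOIMPORT_PATH"
else
    echo "auto-import file missing: $AUTOIMPORT_PATH"
fi
```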

However, it is not recommended to use the server configuration file to keep sandbox-related configuration.

Sandbox scanning for unattended deployment

This feature can be enabled or disabled via the sandboxes.autoimport property. With this property enabled, the server scans sandboxes.home for projects during its first startup. When a project is recognized, its root directory is checked for a sandbox_configuration.xml file. If found, the configuration related to the sandbox (schedules, data service endpoints, execution parameters, etc.) is imported from it.

As the sandbox configuration file can be moved around and version controlled with job files, it is the preferred way to keep sandbox-specific server configuration. Once a reference environment is set up, the sandbox configuration file can be created directly from the server UI, in the Sandboxes module.

The Export configuration button creates a file in the sandbox’s root directory with all server settings related to that sandbox. If the project is also connected to Designer, refresh the project for the file to appear in the Navigator view.
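Before a first start, a quick check of which projects actually carry a sandbox_configuration.xml can save a rebuild. The helper below only mimics the idea of the scan from the outside; it is not how the server implements it, and the demo directory names are invented.

```shell
#!/bin/sh
# Pre-flight sketch: report which project directories under a sandboxes
# root contain a sandbox_configuration.xml file.
check_sandboxes() {
    for project in "$1"/*/; do
        # Skip the unexpanded pattern when the root has no subdirectories.
        [ -d "$project" ] || continue
        name="$(basename "$project")"
        if [ -f "${project}sandbox_configuration.xml" ]; then
            echo "$name: configuration found"
        else
            echo "$name: no configuration"
        fi
    done
}

# Demo with a throwaway directory; project names are hypothetical.
DEMO_HOME="$(mktemp -d)"
mkdir -p "$DEMO_HOME/billing" "$DEMO_HOME/reporting"
touch "$DEMO_HOME/billing/sandbox_configuration.xml"
check_sandboxes "$DEMO_HOME"
```

In a real deployment the function would be pointed at the directory used for sandboxes.home, for example check_sandboxes "$CLOVERDX_SNDBX_DIR".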

Logging options and configuration

CloverDX server produces multiple types of logs:

Server status and runtime
Graph execution
Worker & Server core garbage collection

Each of these is configured differently, but all of them can be controlled through system properties.

Server status and runtime

These logs contain information about server and worker failures, performance history, access logs etc. and use log4j2 configuration file(s). The files can be found in the server web archive (WAR):

[clover.war]/WEB-INF/log4j2.xml

and

[clover.war]/WEB-INF/worker/cloveretl.server.worker.jar/log4j2.xml

for the server and the worker, respectively.

To change the configuration, extract the file from the original WAR and point the system property log4j.configurationFile to the new file. Although it is the same property, it needs to be configured twice if you wish to change both configurations:

Server logging:
Set via JAVA_OPTS of your application container (consult the documentation of your software). In Apache Tomcat deployments, this property can be set using setenv.sh or setenv.bat.
Worker logging:
Set via the clover properties file, specifically the worker.jvmOptions property, which is portable between environments (Windows, Linux, …).
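Put together, the two settings might look like the sketch below: the first part would live in a Tomcat setenv.sh, the second prints the matching line for the clover config file. The directory and file names are example values, and the extraction command in the comment assumes clover.war sits in the current directory.

```shell
#!/bin/sh
# Sketch for a Tomcat setenv.sh plus the matching clover config line.
# All paths and file names are example values.
LOG_CONF_DIR="/data/config/logging"

# One possible way to extract the server file from the WAR (run manually):
#   unzip -p clover.war WEB-INF/log4j2.xml > "$LOG_CONF_DIR/server-log4j2.xml"

# Server core: system property on the application container JVM.
JAVA_OPTS="${JAVA_OPTS:-} -Dlog4j.configurationFile=${LOG_CONF_DIR}/server-log4j2.xml"
export JAVA_OPTS

# Worker: the same property, passed through worker.jvmOptions.
WORKER_LINE="worker.jvmOptions=-Dlog4j.configurationFile=${LOG_CONF_DIR}/worker-log4j2.xml"
echo "$WORKER_LINE"
```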

If you only want to change the base directory for both server and worker log files, configure the clover.clover.home system property. This causes logs to be stored in <clover.clover.home>/cloverlogs.

Important! Changing the clover.home parameter also affects the server’s classpath and default sandbox location.

Graph execution

Logs related to job execution are configured separately from server status logs. Even though the default target is the same for both, changing one will not affect the other.

To change the target directory for job logs, set the clover.graph.logs_path system property or the graph.logs_path CloverDX property. The property is propagated from server core to worker and therefore does not have to be configured separately.
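The two alternatives can be sketched side by side as follows; only one of them is needed, and the path is an example value.

```shell
#!/bin/sh
# Sketch: two alternative ways to redirect job logs. The path is an example.
GRAPH_LOGS="/data/logs/graphs"

# Alternative 1: Java system property on the application container JVM.
JAVA_OPTS="${JAVA_OPTS:-} -Dclover.graph.logs_path=${GRAPH_LOGS}"
export JAVA_OPTS

# Alternative 2: the equivalent line for the clover config file.
echo "graph.logs_path=${GRAPH_LOGS}"
```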

Garbage collection

Garbage collection logging has very little to do with CloverDX Server itself, but it may help to identify performance issues, memory and connection leaks, unresponsiveness or frequent worker restarts.

These logs can be configured separately for server core and worker.

Server core

Garbage collection logs for server core are configured exclusively at the application server level (i.e. Tomcat, JBoss, WebSphere, etc.) as a Java execution argument; see the Java documentation for the -Xloggc argument and the documentation of the respective application server.
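For a Tomcat-style setup, this could be sketched as one more JAVA_OPTS entry, for example in setenv.sh. The log path is an example value. Keep in mind that on Java 9 and later -Xloggc is deprecated in favor of unified logging (-Xlog:gc*:file=...), so the exact flag depends on the Java version in use.

```shell
#!/bin/sh
# Sketch: enable GC logging for server core. The path is an example value.
GC_LOG="/data/logs/gc/server-core-gc.log"

# Java 8 style flag, as referenced in the text.
JAVA_OPTS="${JAVA_OPTS:-} -Xloggc:${GC_LOG}"
export JAVA_OPTS

echo "$JAVA_OPTS"
```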

Worker

Since the worker generally does most of the heavy lifting (and is started by the server), garbage collection logging is enabled for it by default. The log goes to the same directory as graph execution logs, but again this is only a consequence of shared default values.

The configuration process and the accepted properties are exactly the same as for server core; the only difference is where they are configured. Since server core is the process managing the worker, they need to be added to the worker.jvmOptions CloverDX property.
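Since worker.jvmOptions may already carry other options, such as a custom log4j configuration, a sketch of building the combined value could look like this. It assumes the options are separated by spaces, as on a regular Java command line; the paths are example values.

```shell
#!/bin/sh
# Sketch: collect worker JVM options into a single worker.jvmOptions line.
# Assumes space-separated options; paths are example values.
WORKER_OPTS="-Xloggc:/data/logs/gc/worker-gc.log"
WORKER_OPTS="$WORKER_OPTS -Dlog4j.configurationFile=/data/config/logging/worker-log4j2.xml"

# The line that would go into the clover config file.
WORKER_GC_LINE="worker.jvmOptions=${WORKER_OPTS}"
echo "$WORKER_GC_LINE"
```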

Pavel Švec

May 31, 2021

