• Blog
  • Podcast
  • Contact
  • Sign in
CloverDX Logo
Product
  • OVERVIEW
  • Discover CloverDX Data Integration Platform###Automate data pipelines, empower business users.
  • Deploy in Cloud
  • Deploy on Premise
  • Deploy on Docker
  • Plans & Pricing
  • Release Notes
  • Documentation
  • Customer Portal
  • More Resources
  • CAPABILITIES
  • Sources and Targets###Cloud and On-premise storage, Files, APIs, messages, legacy sources…
  • AI-enabled Transformations###Full code or no code, debugging, mapping
  • Automation & Orchestration###Full workflow management and robust operations
  • MDM & Data Stewardship###Reference data management
  • Manual Intervention###Manually review, edit and approve data
  • ROLES
  • Data Engineers###Automated Data Pipelines
  • Business Experts###Self-service & Collaboration
  • Data Stewards###MDM & Data Quality
clip-mini-card

 

Ask us anything!

We're here to walk you through how CloverDX can help you solve your data challenges.

 

Request a demo
Solutions
  • Solutions
  • On-Premise & Hybrid ETL###Flexible deployment & full control
  • Data Onboarding###Accelerate setup time for new data
  • Application Integration###Integrate operational data & systems
  • Replace Legacy Tooling###Modernize slow, unreliable or ad-hoc data processes
  • Self-Service Data Prep###Empower business users to do more
  • MDM & Data Stewardship###Give domain experts more power over data quality
  • Data Migration###Flexible, repeatable migrations - cloud, on-prem or hybrid
  • By Industry
  • SaaS
  • Healthcare & Insurance
  • FinTech
  • Government
  • Consultancy
zywave-3

How Zywave freed up engineer time by a third with automated data onboarding

Read case study
Services
  • Services
  • Onboarding & Training
  • Professional Services
  • Customer Support

More efficient, streamlined data feeds

Discover how Gain Theory automated their data ingestion and improved collaboration, productivity and time-to-delivery thanks to CloverDX.

 

Read case study
Customers
  • By Use Case
  • Analytics and BI
  • Data Ingest
  • Data Integration
  • Data Migration
  • Data Quality
  • Data Warehousing
  • Digital Transformation
  • By Industry
  • App & Platform Providers
  • Banking
  • Capital Markets
  • Consultancy & Advisory
  • E-Commerce
  • FinTech
  • Government
  • Healthcare
  • Logistics
  • Manufacturing
  • Retail
Migrating data to Workday - case study
Case study

Effectively Migrating Legacy Data Into Workday

Read customer story
Company
  • About CloverDX
  • Our Story & Leadership
  • Contact Us
  • Partners
  • CloverDX Partners
  • Become a Partner
Pricing
Demo
Trial

Parallel Data Processing with CloverDX Cluster

Data Integration Data Transformation CloverDX
Posted November 04, 2009
3 min read
Parallel Data Processing with CloverDX Cluster

For the upcoming release of CloverDX 2.9, we are working on improvements in CloverDX Server which will allow run transformations in parallel on multiple cluster nodes.

CloverDX Server already supports clustering, so more instances may cooperate to each other. Current stable version already implements common cluster features: fail-over/high-availability and scalability of lots of requests which are load-balanced on available cluster nodes. These features are actually implemented since version 1.3.

The basic concept of new parallelism
Transformation may be automatically executed in parallel on more cluster nodes according to configuration and each of these "worker" transformations processes just its part of data. Because there is one "master" transformation, which manages the other transformations and which gathers tracking data from "worker" transformations, the parallelism is transparent for CloverDX Server client. Client by default "sees" just one (master) execution and aggregated tracking data. However there are still logs and tracking data for each of "worker" transformations, so it's still possible to inspect details of this parallel execution. "Worker" transformations outputs are gathered to the "master", thus client has one single transformation output which may be processed further.

So how to get parts of input data?
Basically, transformation can process data which is already partitioned, which is the best case and there is no overhead with partitioning of data, or CloverDX Server itself can partition input data from one single source and distribute data on the fly (during the transformation) to several cluster nodes using the network connection. Overhead of this operation depends on the speed of network communication and other conditions.

Design changes in the graph
We aim to keep the transformation graph almost the same as it would be for "standalone" execution. Thus there will be just a couple of extra components in the graph which is intended to run in parallel. These components will handle partitioning/departitioning of data in case it's not already partitioned.

Scalability
The new parallelism in CloverDX Server is a giant leap for scalability of the transformations. Ever since the graph is designed for paraller run, the number of computers which run this transformation depends just on cluster configuration. Graph itself is still the same. Configuration of the parallelism includes:

  • working CloverDX Server cluster, thus standalone server instances won't be able to handle such execution
  • "partitioned" sandbox(see below) with list of locations

New sandbox types
On server side, graphs and related files are organized in so-called sandboxes. Until version 2.8, there was just one type: "shared" sandbox. It means that it contains the same files and directory structure on all cluster nodes. Since version 2.9 there will be two more types:

  • "local" sandbox - is (locally) accessible on just one cluster node. It's intended for huge input/output data which is not intended to be shared/replicated among multiple cluster nodes.
  • "partitioned" sandbox - each of its physical location contains just part of data. It's intended as a storage for partitioned input/output data of transformations which are supposed to run in parallel. List of physical locations actually specifies nodes which will run "worker" transformations.

Master - worker responsibilities
Master observes all related workers and when some transformation phase is finished on all workers, it's master's responsibility to allow the workers to process next phase. When any of the workers fails from any reason, it's master's responsibility to abort all the other workers and select whole execution as failed. Master/worker - These terms have meaning only in the scope of one transformation. Since 2.9 there is no privileged node configured as "master" in the cluster, but it doesn't mean that all the nodes are equal. There may be differences between nodes in accessibility to physical sources. Configuration of sandboxes should reflect it.

Share

Facebook icon Twitter icon LinkedIn icon Email icon
Behind the Data  Learn how data leaders solve complex problems every day

Newsletter

Subscribe

Join 54,000+ data-minded IT professionals. Get regular updates from the CloverDX blog. No spam. Unsubscribe anytime.

Related articles

Back to all articles
Data Integration
4 min read

Bringing a human perspective to data integration, mapping and AI

Continue reading
How AI is shaping the future of data integration
Data Integration
4 min read

How AI is shaping the future of data integration

Continue reading
Data visualization with a dashboard on a computer screen
Data Integration
4 min read

Data literacy, visualization and the challenges of data integration

Continue reading
CloverDX logo
Book a demo
Get the free trial
  • Company
  • Our Story
  • Contact
  • Partners
  • Our Partners
  • Become a Partner
  • Product
  • Platform Overview
  • Plans & Pricing
  • Customers
  • By Use Case
  • By Industry
  • Deployment
  • AWS
  • Azure
  • Google Cloud
  • Services
  • Onboarding & Training
  • Professional Services
  • Customer Support
  • Resources
  • Customer Portal
  • Documentation
  • Downloads & Licenses
  • Webinars
  • Academy & Training
  • Release Notes
  • CloverDX Forum
  • CloverDX Blog
  • Behind the Data Podcast
  • Tech Blog
  • CloverDX Marketplace
  • Other resources
Blog
The vital importance of data governance in the age of AI
Data Governance
Bringing a human perspective to data integration, mapping and AI
Data Integration
How AI is shaping the future of data integration
Data Integration
How to say ‘yes’ to all types of data and embark on a data-driven transformation journey
Data Ingest
© 2025 CloverDX. All rights reserved.
  • info@cloverdx.com
  • sales@cloverdx.com
  • ●
  • Legal
  • Privacy Policy
  • Cookie Policy
  • EULA
  • Support Policy