AI in CloverDX

AI-powered data transformation in CloverDX

Our philosophy at CloverDX has always been to give our users maximum control and customizability, and to build features that help you work smarter and more productively - the integration of AI into CloverDX is no different.

AI in CDX

AI features in CloverDX

Transform data with the power of AI and NLP in CloverDX

PRIVACY AND CONTROL

Locally hosted AI text classification

Perform data classification and anonymization tasks - such as sentiment analysis or PII identification - using locally hosted NLP models. 

Plug in your own model, or grab one from the CloverDX Marketplace, and get full control over your data. 

Benefits

  • Secure and private: Using locally-hosted models means data never leaves your CloverDX Server, so you don’t have any extra worries about security or governance.

Need to know

  • Runs on your AI-capable infrastructure: For maximum performance we recommend hosting CloverDX Server on infrastructure equipped with NVIDIA GPU(s)
THE POWER OF LLMS

Transform data with OpenAI

A new OpenAI component for CloverDX enables you to integrate ChatGPT to your data transformations, with the ability to handle back-and-forth prompt chaining chat to process and refine responses.

Benefits

  • Stay in control: You’re always in full control over what data you send to OpenAI.
  • Bring your own key: Use your own OpenAI key and select which OpenAI model you want to use.

Need to know

  • Cost and governance: Keep in mind when using OpenAI integration, data might be leaving the CloverDX platform to a 3rd party service. 
ASSISTANT FOR PRODUCTIVITY

Data transformation in plain English

With the Clover AI Assistant in Wrangler, you can just tell it what you want to do, and get suggested steps to prep and transform your data.

Benefits

  • No technical knowledge needed: Convert or reformat data, correct errors, conditionally filter rows, and more - just ask in plain English.
  • Build transformations faster: No need to go step-by-step, the Assistant can suggest every step at once.

Need to know

  • No black box: The Assistant suggests individual steps that you can accept or reject - so your transformation runs with transparent steps for predictable, reliable results.

All the power of OpenAI

Data transformation using OpenAI/ChatGPT

The CloverDX OpenAI component allows you to use OpenAI LLMs in your data workflows, with customizable response processing, for power and flexibility.

Define your custom prompt and the data you want to send, and then react to the LLM response – essentially allowing you to have a back-and-forth conversation with the LLM.

For example: if you ask for a JSON response from the LLM, you can detect if the response is valid, and if not, send instructions to OpenAI to fix the error.

  • OpenAIClient: compose and send queries to ChatGPT, and process the response
Locally hosted AI

Privacy-first AI data transformation with local-run models

Four new components available now to use in your CloverDX Designer workflows.

These components are essentially wrappers around models - you plug in various machine learning models to implement different use cases.

Models run locally – either on your own hardware or in your own cloud environment, so your data stays in-house, for complete privacy and governance.

You can download models from the CloverDX Marketplace, or use your own. 

Data classification components

3 components that you can use for various data classification use cases. The use case is defined by the model you insert to the component.

  • AITextClassifier: scores input text field(s) against pre-trained set of classes.
  • AIZeroShotClassifier: allows you to define your own classes, and scores input text against them.
  • AITokenClassifier: breaks input text into sub-word units (tokens) and scores them against a pre-trained set of classes.

Anonymization component

Mask data identified by the model, without needing to send your data to a 3rd party for anonymization.

  • AIAnonymizer: allows you to run a token classification model, e.g. to identify PII, and then masks the identified tokens in the output.
.
illustration-cdx7--data-analyze@2x
Clover AI Assistant

Wrangler data transformations in natural language

The Clover AI Assistant helps you to build data transformation jobs in Wrangler, just by asking the Assistant.

Instead of having to find the right steps yourself, just tell the Assistant what you want and it will suggest the steps needed.

The Assistant won’t do anything without you asking it to – you can accept or reject any of its suggestions so you’re always in control. And it builds regular Wrangler jobs, with transparent individual steps that you can always see, edit, or delete – no black box.

What’s shared with the AI? 

Unless you choose to, no data itself is shared with the AI, the Assistant can work just by knowing metadata about e.g. which columns exist in your dataset.

Data transformation with Clover AI Assistant
Note: Incubation features in CloverDX
All the components listed here are currently in Incubation
What does Incubation mean?
Incubation features in CloverDX are tested, supported and available for use, but they’re under active development and will likely see changes.

AI in CloverDX: Frequently asked questions

Here are some questions we've been asked about data transformation with AI in CloverDX. If you have questions that aren't listed here, we're always happy to answer them - just get in touch.

You can choose. If you use the OpenAI component, then you define exactly what data you want to send to AI for analysis or transformation.

But if you don't want to send data to OpenAI (or any other 3rd party), you can run AI models locally, so your data all stays in-house.

AI features in CloverDX don't have any extra cost, and you don't need any extra licences to use them.

If you use the OpenAIClient component, you'll need your own OpenAI account, and you'll pay via that as you usually would for any other OpenAI processing.

Using any AI features in CloverDX is completely optional - no data is shared with 3rd parties unless you choose to send data to OpenAI with the OpenAI component.

You can still benefit from AI capabilities by using the locally-hosted AI models for specific data classification and anonymization tasks. Because these models run completely in your environment, data stays under your control.

Locally-hosted AI components allow you to 'plug in' AI models to perform specific tasks, such as data classification. These models live on your hardware and run locally - no data is sent to any 3rd parties.

The OpenAI component does use a 3rd party (OpenAI) to process data. The advantage of this approach is you get the full power and flexibility of OpenAI, and you don't need your own powerful hardware to run queries. The CloverDX OpenAI component allows you to always specify exactly what data you send to OpenAI.

Read more about the different ways to use AI for data workflows in this post: AI in data transformation: Solving data privacy concerns

No - the AI models available through the CloverDX Marketplace (designed to be plugged into the locally-run AI components) are publicly available models that have been adapted to work with CloverDX by wrapping into a CloverDX library.

Information on each model, including its source and license information, is available on each model's detail panel.

No, we're not developing any models ourselves right now.

But if you have an existing model of your own, we can help you wrap it to make it compatible with CloverDX so you can use it as part of your CloverDX data workflows. 

If you want to talk to us about this, just get in touch.

You can either download an optimized AI model from the CloverDX Marketplace, get your own models (e.g. from HuggingFace, GitHub, etc.), or use your own built and trained models.

Models can then be plugged into the CloverDX locally-run AI components to run on your own hardware. 

Running AI models locally can be slow and resource intensive (CPU variants). But you can use NVIDIA GPU for maximum performance.

To make this easier, we created a new Docker image available on DockerHub.

This image is pre configured with all the dependencies so that you can take advantage of GPU acceleration for your machine learning workloads.

The image can be deployed on any machine where all software and hardware requirements are met. The machine requires NVIDIA GPU with properly configured drivers, CUDA, container toolkit and more.

The easiest way to use this image is to run it on AWS in AWS Deep Learning Base GPU AMI (Ubuntu 24.04) which runs on AWS EC2 G4dn or AWS EC2 G5 instances.

By default, the Clover AI Assistant does not share your data outside the CloverDX platform, unless you specifically ask it to (e.g. to improve the quality of the Assistant's answers).

System administrators can globally disable the option to share data with the AI provider, making Clover AI Assistant safe to use in regulated industries with rigid data privacy and governance rules.

Read more about the Clover AI Assistant

We're continuing to develop more features that use AI to help you improve productivity and results.

The latest release of CloverDX introduced the Clover AI Assistant for Wrangler, and updates to CloverDX AI functionality are planned for future releases.

Sign up to our product information mailing list to be the first to hear about new feature releases, and get invites to our live release walkthroughs with our VP Product.

Be first to see what’s new

Sign up to our product info mailing list to be notified of all the new features, and get invites to our first-look live release walk-throughs.

Stay up to date with CloverDX