When you're onboarding customer data into your platform, you're performing the same actions every time, but there are often important variations in what your clients are sending you.
You could ask your customers to send their data in a format that exactly matches what your system requires, but that's often time-consuming and frustrating for them (and can sometimes be impossible if they don't have the necessary technical skills).
Or you could build a data ingestion framework that will handle data in whatever format it's submitted, reducing the burden on your clients. That framework can also empower your less-technical staff to manage data onboarding, and enable you to create a repeatable onboarding process that you can adjust to support the small but important differences between multiple clients.
To examine this in more detail, let’s take a look at three real-world use cases where we worked with clients to build processes in CloverDX that enabled them to automate and speed up their data onboarding.
Each of these case studies shows how a data ingestion workflow can be designed for resilience: handling variability in input format, and managing the whole process automatically, from detecting the arrival of incoming files to ingesting the data and providing robust reporting and error handling.
1. Onboarding legal data
Our client had ambitious objectives for getting data into their legal case management platform. They wanted the platform to handle data in a variety of formats, without having to know in advance what the format was. They wanted to land that data into staging tables in a relational database. And they also wanted to simplify retries, without the need for technical support.
Here is a visual representation of the onboarding process they wanted to achieve:
And here’s this process visually represented in CloverDX Designer. You can see how designing data pipelines using CloverDX keeps the process in line with the original onboarding objectives:
By using CloverDX for their data onboarding, the client can now inspect incoming files, auto-detect their structure, and populate a staging table, all without the need for up-front transformation. They also receive error reports that let them make the necessary adjustments and rerun processes without involving a development team.
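CloverDX implements this detection as a visual pipeline, but to make the idea concrete, here's a minimal Python sketch of the same pattern, assuming a delimited text file and a SQLite staging database (the `stg_` naming convention and the 4 KB sample size are illustrative choices, not the client's actual setup):

```python
import csv
import sqlite3
from pathlib import Path

def load_to_staging(path: str, conn: sqlite3.Connection) -> int:
    """Auto-detect the structure of a delimited file and land it in a staging table."""
    sample = Path(path).read_text(encoding="utf-8")[:4096]
    dialect = csv.Sniffer().sniff(sample)              # detect delimiter and quoting
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, dialect)
        header = next(reader)
        table = "stg_" + Path(path).stem               # hypothetical naming convention
        # Stage every column as TEXT: no up-front transformation required.
        cols = ", ".join(f'"{col}" TEXT' for col in header)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
        placeholders = ", ".join("?" for _ in header)
        count = 0
        for row in reader:
            conn.execute(f'INSERT INTO "{table}" VALUES ({placeholders})', row)
            count += 1
    conn.commit()
    return count
```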
Those error reports also tell them how long a run took, which files were ingested, how many records were created and rejected, and why a run failed. They can then take this information and adjust the metadata to perform a rerun, without having to change the pipeline itself.
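The report itself can be as simple as a structured summary assembled around the run. Purely as a sketch (the field names and the `ingest_one` callable are hypothetical stand-ins for the real pipeline), it might capture those details like this:

```python
import json
import time

def run_with_report(files, ingest_one):
    """Run ingestion over a batch of files and build a summary report.

    `ingest_one` is a hypothetical callable returning (created_count, reject_list)
    for one file, where each reject carries a human-readable reason.
    """
    started = time.time()
    report = {"files": [], "failure": None}
    try:
        for path in files:
            created, rejects = ingest_one(path)
            report["files"].append({
                "file": path,
                "records_created": created,
                "records_rejected": len(rejects),
                "reject_reasons": sorted({r["reason"] for r in rejects}),
            })
    except Exception as exc:                 # capture why the run failed
        report["failure"] = str(exc)
    report["duration_seconds"] = round(time.time() - started, 2)
    return json.dumps(report, indent=2)
```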
The result is faster, more efficient data onboarding and better service for their clients.
2. Onboarding operational data from schools
Class schedules, enrollment figures, attendance records—schools deal with a lot of dynamic data. They also have a responsibility to share this data with parents, teachers, and members of the school board.
Unsurprisingly, most educators are not trained in data management. So when a platform dealing with data from a network of K-12 schools came to us, they were relying on a bespoke Python system that, as you can imagine, was challenging for its users.
The company wanted to update their processes to provide a portal that their stakeholders could easily access. They wanted the data ingestion process to be handled automatically, so the data they were providing back to those stakeholders would be accurate and up-to-date.
However, the schools and stakeholders still needed to send and receive data through a variety of channels, including FTP and email. As such, those channels had to be monitored continually for incoming files. And the company wanted to orchestrate the entire process without creating a new pipeline every time they added a new school.
With CloverDX, they were able to create a new onboarding framework, including monitoring an FTP site to automatically detect and process incoming files. But sometimes schools would also send data via email, so the ingestion process also includes a workflow that automatically scans an inbox for emails that meet their criteria.
These emails are then automatically pushed into the FTP process:
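Under the hood, that kind of inbox-to-FTP bridge can be surprisingly small. As a rough Python illustration (the hosts, mailbox name, and subject-line criterion below are all hypothetical placeholders), it could look something like this:

```python
import email
import imaplib
import io
from ftplib import FTP

IMAP_HOST, MAILBOX = "imap.example.com", "onboarding"   # hypothetical inbox
FTP_HOST, DROP_DIR = "ftp.example.com", "/incoming"     # hypothetical drop site

def forward_matching_attachments(imap_user, imap_pass, ftp_user, ftp_pass):
    """Scan an inbox for data submissions and push attachments to the FTP drop
    directory, where the existing file-detection workflow picks them up."""
    imap = imaplib.IMAP4_SSL(IMAP_HOST)
    imap.login(imap_user, imap_pass)
    imap.select(MAILBOX)
    # Hypothetical matching rule: unread messages with a known subject line.
    _, data = imap.search(None, '(UNSEEN SUBJECT "data submission")')
    ftp = FTP(FTP_HOST)
    ftp.login(ftp_user, ftp_pass)
    ftp.cwd(DROP_DIR)
    for num in data[0].split():
        _, msg_data = imap.fetch(num, "(RFC822)")
        msg = email.message_from_bytes(msg_data[0][1])
        for part in msg.walk():
            filename = part.get_filename()
            if filename:                              # only parts that are attachments
                payload = part.get_payload(decode=True)
                ftp.storbinary(f"STOR {filename}", io.BytesIO(payload))
    ftp.quit()
    imap.logout()
```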
In fact, CloverDX orchestrates the entire data pipeline, including:
- File unzipping
- Quality checking
- Sanity checking
- Data transformation
- Pushing of data to APIs
It also takes the data files and pushes them into an S3 bucket.
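To give a feel for what that orchestration involves, here's a condensed Python sketch of the same sequence of steps, with an illustrative API endpoint, bucket name, and column check standing in for the real configuration:

```python
import csv
import zipfile
from pathlib import Path

import boto3      # AWS SDK, used here for the S3 upload
import requests

API_URL = "https://api.example.com/records"   # hypothetical downstream API
BUCKET = "school-data-archive"                # hypothetical S3 bucket

def process_incoming(zip_path: str, required_columns: set, transform) -> None:
    """Orchestrate one delivery: unzip, check, transform, push to API, archive."""
    work_dir = Path("work")
    work_dir.mkdir(exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:                      # 1. file unzipping
        zf.extractall(work_dir)
    s3 = boto3.client("s3")
    for data_file in work_dir.glob("*.csv"):
        with open(data_file, newline="", encoding="utf-8") as f:
            rows = list(csv.reader(f))
        header = rows[0]
        missing = required_columns - set(header)               # 2. quality/sanity checks
        if missing:
            raise ValueError(f"{data_file.name} is missing columns: {missing}")
        records = [transform(dict(zip(header, row))) for row in rows[1:]]  # 3. transform
        resp = requests.post(API_URL, json=records, timeout=30)            # 4. push to API
        resp.raise_for_status()
        s3.upload_file(str(data_file), BUCKET, data_file.name)  # archive raw file in S3
```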
What’s more, the pipeline is entirely reusable, so the platform owners don't need to create new pipelines when a new school is onboarded.
3. Onboarding data to a consumer debt collection platform
Debt collection is a highly competitive business. So when a debt collection platform wanted to automate their data onboarding, it was crucial that we remove any barriers to client acquisition.
We needed to allow for the fact that clients would provide data in a variety of formats. And of course, we had to empower non-technical users to onboard the data easily, taking into account that they would be updating debtor information on an ad hoc basis too.
And there was the additional factor that clients would be providing three categories of files: new debtors, debtors with status changes, and debtors who no longer required collection. As such, the pipeline needed to be able to look up client-specific mapping and transformation rules.
Using an Excel file that the onboarding team could configure for each client meant that those non-technical users didn't have to write code; they could simply define the mapping in the spreadsheet.
The pipeline then consults the spreadsheet to do the data mapping.
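As an illustration of what that lookup can involve, here's a small pandas sketch, assuming a workbook with one sheet per client and `source_field`, `target_field`, and `transform` columns (all hypothetical names, as is the set of transform functions):

```python
import pandas as pd

# Hypothetical transform names the onboarding team can reference in the sheet.
TRANSFORMS = {
    "upper": str.upper,
    "strip": str.strip,
    "cents_to_dollars": lambda v: f"{int(v) / 100:.2f}",
}

def apply_client_mapping(data: pd.DataFrame, mapping_path: str, client: str) -> pd.DataFrame:
    """Map a client's input columns onto the platform schema, driven entirely by
    the spreadsheet, so changing a mapping never means changing the pipeline."""
    mapping = pd.read_excel(mapping_path, sheet_name=client)
    out = pd.DataFrame()
    for rule in mapping.itertuples():
        column = data[rule.source_field]
        fn = TRANSFORMS.get(getattr(rule, "transform", None) or "")
        out[rule.target_field] = column.map(fn) if fn else column
    return out
```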
We also provided a web app where those non-technical users can upload an input file, triggering the same data pipeline to run automatically.
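A minimal version of such an upload endpoint, sketched in Flask and assuming the pipeline already watches a drop directory (both the route and the directory are illustrative, not the actual app):

```python
from pathlib import Path

from flask import Flask, request

# Hypothetical: uploads land in the same drop directory the pipeline watches,
# so a web upload triggers exactly the same automated run as an FTP delivery.
DROP_DIR = Path("/data/incoming")
app = Flask(__name__)

@app.route("/upload", methods=["POST"])
def upload():
    f = request.files["datafile"]
    f.save(DROP_DIR / f.filename)
    return {"status": "queued", "file": f.filename}, 202
```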
The solution also provides rich logging, so users can view reports on their data onboarding, including which records were rejected and why. This makes debugging easy and lets users rerun the process in the event of any errors.
Automated data ingestion frameworks with CloverDX
Although each of these real-world use cases is slightly different, they all used CloverDX to give them:
- End-to-end orchestration - a completely automated, unattended process to allow new data to be onboarded with no extra effort.
- A system that can handle variations in input - being lenient in what you accept, without having to stop the data process or burden clients by asking them to change the structure of what they're sending.
- A solution that can be managed by a non-technical team. By using configuration files to drive the underlying pipeline, onboarding teams (who have the best knowledge of the data) can do more of the work themselves, without needing to code or to rely on a development team.
To chat to us about building an automated data ingestion pipeline to onboard your customer data, just drop us a line here.
You can watch the whole video of the webinar this post is based on here: How setting up a data ingestion framework helps automate and speed up data onboarding