Using AI effectively requires more than new tools. It often involves rethinking data strategies from the ground up.
On a recent episode of the Behind the Data podcast, we sat down with Andy Coulson, a cloud architect on Epicor's Auto Catalog team, to discuss how AI is transforming data engineering and the complexities of building a new, cloud-native platform from scratch.
With more than two decades of experience and a keen focus on cutting-edge developments, Andy shared his insights on using AI for data processing, integrating new technology, and the complexities of transitioning from legacy systems to modern architectures.
Understanding the challenges and opportunities of AI in data engineering
AI is dominating the conversation in data engineering, but integrating it effectively is a nuanced challenge. Andy explains how the implementation of AI-driven systems, like chatbots, is more than just coding. It’s about understanding how to prompt AI correctly.
“Instead of writing procedural code that goes X then Y then Z, you’re just giving it (AI) instructions in a prompt ahead of time,” Andy says. But building effective AI solutions isn’t always straightforward. Getting chatbots and other systems to follow instructions precisely requires expertise in prompt engineering, a skill many traditional developers are still mastering.
One of the biggest challenges with AI integration is managing cost-effectiveness and response speed. As Andy mentions, “It’s not as fast as a Google search. There’s a time delay for it to analyze and generate responses.” Striking the right balance between data quality, response time, and cost is critical when designing these systems.
Using retrieval-augmented generation (RAG) to boost AI relevance and accuracy
Retrieval-augmented generation (RAG) is transforming how AI models interact with data. Rather than relying solely on pre-trained knowledge, RAG allows AI systems to tap into specific datasets or APIs to retrieve information on demand. This enables AI models to provide more contextually accurate and relevant results by referencing live data sources.
For Epicor, this approach is proving to be a game-changer. By leveraging RAG, the team can prompt the AI to access its extensive automotive parts catalog, retrieve the exact data needed, and deliver precise, meaningful interactions for users. This level of integration ensures more customized responses and improves overall system performance, making AI systems significantly more reliable in real-world applications.
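At its core, the RAG pattern is retrieve-then-ground: fetch the relevant records first, then hand them to the model inside the prompt. The sketch below fakes the parts catalog with an in-memory dict and uses naive keyword-overlap scoring purely to show the shape of the pattern; a production system would query a real datastore or API and use proper search ranking:

```python
# Minimal RAG sketch. The "catalog" and part numbers are invented
# stand-ins, and the scoring is deliberately naive keyword overlap.
CATALOG = {
    "P-1001": "Front brake pad set fits 2015-2020 F-150",
    "P-2002": "Oil filter fits 2018-2023 Civic",
    "P-3003": "Rear shock absorber fits 2012-2017 Camry",
}

def retrieve(query: str, k: int = 2) -> list:
    """Rank catalog entries by how many query words they share."""
    words = set(query.lower().split())
    scored = sorted(
        CATALOG.items(),
        key=lambda item: -len(words & set(item[1].lower().split())),
    )
    return scored[:k]

def build_prompt(query: str) -> str:
    """Ground the model's answer in retrieved rows, not trained memory."""
    context = "\n".join(f"{pid}: {desc}" for pid, desc in retrieve(query))
    return f"Answer using only this catalog data:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("brake pad for F-150")
```

Because the retrieved rows are injected at request time, the model can answer from live catalog data it was never trained on.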
Moving from legacy systems to cloud-native platforms
Epicor’s Auto Catalog has been a trusted resource for decades, but it’s now undergoing a significant transformation. The current system is based on a 20-year-old Windows application that still relies on local installations and DVDs for data distribution—a method that Andy admits is “quite antiquated.”
To modernize, Epicor is building a cloud-native platform using a mix of AWS, graph databases, and advanced search technologies. The choice to go with a graph database is unusual, Andy notes, but it makes sense for linking complex data types like parts, labor specifications, and service intervals. However, graph databases also come with their own set of challenges, particularly around performance and handling wildcard queries.
“We use a search engine like Elasticsearch to narrow down primary keys, and then go after the detailed data in the graph database,” Andy explains. By combining the strengths of multiple technologies, they’re building a platform that’s flexible enough to scale with new demands.
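The two-step lookup Andy describes can be sketched as follows. Both stores are faked with plain dicts here for illustration; real code would issue an Elasticsearch query for step one and a graph-database query for step two, and all the part IDs and fields are invented:

```python
# Step 1: a search index handles the fuzzy matching and returns only
# primary keys. Step 2: the graph store is hit by exact key for the
# rich, linked detail (labor, fitment, service intervals).
SEARCH_INDEX = {          # term -> matching part IDs (stand-in for Elasticsearch)
    "brake": ["P-1001", "P-4004"],
    "filter": ["P-2002"],
}
GRAPH = {                 # part ID -> linked records (stand-in for the graph DB)
    "P-1001": {"labor_hours": 1.5, "fits": ["F-150"], "interval_mi": 30000},
    "P-2002": {"labor_hours": 0.3, "fits": ["Civic"], "interval_mi": 7500},
    "P-4004": {"labor_hours": 2.0, "fits": ["Camry"], "interval_mi": 30000},
}

def lookup(term: str) -> list:
    keys = SEARCH_INDEX.get(term.lower(), [])       # step 1: narrow to keys
    return [{"id": k, **GRAPH[k]} for k in keys]    # step 2: detail by exact key

results = lookup("brake")
```

Keeping wildcard and free-text matching in the search engine means the graph database is only ever queried by exact key, sidestepping the wildcard-query performance problems mentioned above.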
Streamlining data workflows with CloverDX
With billions of data entries coming in from thousands of parts suppliers, handling and normalizing this information is no small feat. That’s where CloverDX comes into play. Andy shares that CloverDX’s flexibility and ease of use have been instrumental in managing diverse data sources, from JSON feeds to CSVs and Microsoft Access databases.
“CloverDX really accelerates our ability to onboard new datasets,” Andy says. It allows their team to rapidly ramp up on new data sources without needing to build everything from scratch. And as Epicor continues to expand, this agility is essential for keeping up with evolving demands.
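CloverDX expresses this kind of mapping in visual pipelines rather than hand-written code, but the underlying normalization step looks roughly like the sketch below: heterogeneous supplier feeds (here a JSON feed and a CSV feed, with invented field names) mapped onto one target schema:

```python
import csv
import io
import json

# Illustrative only: each source format gets a small adapter that maps
# its own field names onto a shared target schema. The supplier field
# names ("sku", "PartNumber", etc.) are hypothetical.
def from_json(raw: str) -> list:
    return [{"part_no": r["sku"], "desc": r["name"]} for r in json.loads(raw)]

def from_csv(raw: str) -> list:
    return [{"part_no": r["PartNumber"], "desc": r["Description"]}
            for r in csv.DictReader(io.StringIO(raw))]

json_feed = '[{"sku": "P-1001", "name": "Brake pad set"}]'
csv_feed = "PartNumber,Description\nP-2002,Oil filter\n"

# Downstream processing sees one uniform shape regardless of source.
normalized = from_json(json_feed) + from_csv(csv_feed)
```

Onboarding a new supplier then means writing (or configuring) one more adapter, not rebuilding the pipeline.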
Key lessons learned on data modernization
One of the biggest lessons Andy has learned is the importance of understanding data requirements upfront. “The more you can do upfront, the less you have to do down the road,” he says. Proper planning helps avoid scalability and performance issues, making future enhancements smoother.
He also emphasizes the need for a balanced approach between innovation and stability. While moving to a cloud-native, containerized system offers tremendous benefits, it’s not without risks. It's essential to thoroughly plan and understand the data before jumping into development.
Building a future-proof data platform with AI and automation
Epicor’s journey with their Auto Catalog product, from a legacy system to a cutting-edge cloud platform, is a work in progress. However, it’s clear that AI and modern data integration tools are paving the way for new capabilities and efficiencies. With projects like building advanced AI-driven search capabilities and using machine learning for data quality, Epicor is setting the stage for even more transformative changes.
For Andy, the future is about pushing the boundaries of what AI and data can achieve together. Whether it’s making complex data interactions feel more natural or using AI to glean new insights from existing data, the possibilities are expanding rapidly.
To hear more from Andy and other experts on the forefront of data innovation, check out the Behind the Data podcast and stay tuned for more insights on how to make the most of your data assets.
Ready to see how CloverDX can transform your data operations? Get in touch with our team for a demo and start your data journey today.