About this episode:
In this episode of Behind the Data, Matthew Stibbe interviews Keith Belanger, a senior data architect at DataOps.live and host of the podcast Behind the Cape. They discuss the emerging trends in the Snowflake ecosystem, particularly the integration of AI through Cortex AI, and the importance of data modeling in agile environments. Keith emphasizes the need for a strong data culture and strategy to avoid pitfalls in data management and decision-making. The conversation also touches on the challenges of trusting AI in business decisions and the future of data operations within Snowflake.
AI-generated transcript
Matthew Stibbe (00:01) Hello and welcome to Behind the Data with CloverDX. I'm your host, Matthew Stibbe, and today I'm talking to Keith Belanger, who is a Snowflake Data Superhero, excellent title, and by profession a Senior Data Architect, and he works at our friends and Articulate Marketing client DataOps.live. Great to have you on the show, Keith.
Keith Belanger (00:21) I'm happy to be here. Thanks for the invite. I'm always happy to talk data.
Matthew Stibbe (00:24) Yeah, good. Let's talk data. Well, let's talk about talking about data first, because you also run a podcast in the world of data. So do tell us about that. Where can we find it?
Keith Belanger (00:35) Yeah, so currently I host a show called Behind the Cape, which is with other Snowflake Data Superheroes. So for those who do not know, there are about 100 Data Superheroes globally who get that title from Snowflake. I host the show to give them a platform and an opportunity to showcase, as I call it, their superpowers. They range from people doing AI, data architecture, DBA work, data engineering, you name it - a huge breadth of the Snowflake space. I do it in conjunction with Snowflake, the folks who run the YouTube channel for Snowflake, and those episodes are hosted on the Snowflake developer channel. So you can find them there: go to the Snowflake developer channel, look for Behind the Cape, and there's a series. We try to get something out every month, but it's touch and go because everybody gets busy. But yeah, we enjoy it. It's fun.
Matthew Stibbe (01:41) tell me about it.
I'm going to go check that out. Thank you. And just in case anyone was wondering, I did not know about Behind the Cape when we came up with the name Behind the Data. was just a delicious coincidence. But we both invented calculus at the same time. tell me, I'm interested in the world of Snowflake data. What are the big emerging trends or ideas that you're getting excited about at the moment?
Keith Belanger (01:52)
Well, obviously the big trend, I think, in the Snowflake space has been the usage and leveraging of Cortex AI. There's been a lot of hype and, you know, interest in a lot of organizations: how can we leverage Cortex AI? I also host the Boston user group with some folks here, and that's been a strong topic of interest there too - the Cortex and AI space within Snowflake. So, yeah, that's been the hot trend.
Matthew Stibbe (02:41) And for people who are not in the Snowflake world, what is Cortex AI?
Keith Belanger (02:42)
So Cortex AI is their built-in AI solution, natively within Snowflake. So you don't have to leverage any kind of API to use some other AI solution that sits outside of Snowflake. I mean, you could, but this was Snowflake bringing that capability and functionality inside the Snowflake ecosystem. So if you're doing your own coding, think of it this way: you can write SQL, straight SQL, that interacts with the models that are built into Snowflake. And again, it's all-encompassing - whether you want to do things with Python or notebooks or SQL, you can leverage the Snowflake ecosystem and its built-in AI capabilities. So there's a lot of experimenting and folks playing in that space.
Matthew Stibbe (03:42) What are the couple of cool experiments or things that you've seen using it?
Keith Belanger (03:47) Well, the ones I personally have seen are people trying to figure out how to leverage it - it could be with help desk stuff, it could be with customer sentiment. There's a lot of interest in unstructured data - you get things that have a lot of text. How can we use that to understand what the sentiment of the customer was? How should our organization navigate an incoming call, things like that? So how do you automate that? Those are some of the initiatives I've seen, but there are people trying to do all kinds of stuff with AI. I mean, product companies trying to do RFP analysis using AI.
So I think there's a wide breadth of, I'm gonna call it experimenting, going on in this space - taking what people have been trying to do manually over the years and increasing the speed at which they can do it. That's at least what I've seen, and I think the imaginations of folks can go wild in that space. But then again, I'd say it's cautiously tiptoeing into that space.
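(Editorial aside: for readers who want a concrete picture, here is a minimal sketch of the "straight SQL" Cortex usage Keith describes, applied to the customer-sentiment idea. The table and column names are hypothetical; SNOWFLAKE.CORTEX.SENTIMENT and SNOWFLAKE.CORTEX.COMPLETE are Snowflake's built-in Cortex functions, though the available models and exact behavior vary by account and release.)

```sql
-- Hypothetical example: score raw support-ticket text for customer sentiment
SELECT
    ticket_id,
    ticket_text,
    SNOWFLAKE.CORTEX.SENTIMENT(ticket_text) AS sentiment_score  -- roughly -1 (negative) to 1 (positive)
FROM support.tickets
WHERE created_at >= DATEADD(day, -7, CURRENT_DATE());

-- Hypothetical example: prompt a built-in model directly from SQL
SELECT
    ticket_id,
    SNOWFLAKE.CORTEX.COMPLETE(
        'mistral-large',
        'Summarize the main complaint in this support ticket: ' || ticket_text
    ) AS complaint_summary
FROM support.tickets
LIMIT 10;
```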
Matthew Stibbe (05:16) How do we get to a place where we can be more confident about using AI? What is required for corporations to actually roll that out? I mean, I'm still thinking in the world of Snowflake and data, rather than ChatGPT writing your emails.
Keith Belanger (05:27) Yeah.
I mean, from my perspective, you have to test and test and test. And, you know, I like to say trust but verify - and lean heavily on the verify at the moment. What is the accuracy of the data coming through? You know, I heard this said, and I loved the perspective: AI is going to look at a problem and it's going to make a determination, like it's X, right?
I might look at that same data, you might look at the data, and I say it's A and you say it's X. Okay, AI said it was X. What does that mean? Because if we were doing it manually, I was A and you were X, but AI is always gonna say it's X. So again, what is that accuracy? Is that what you would want? People are saying, well, some AI can get 70%, some of it 80%, maybe you can get up to 90%.
You tell me if 70%, 80% or 90% accuracy is something you want to make business decisions off of. And the only other way to determine whether it's accurate is for people to manually look at the results. So from my perspective, we're all still learning about AI. And the other part of it: it's not inexpensive. You're using a whole different level of compute - GPUs instead of CPUs and all this other stuff.
And so you don't want to waste that either. So, you know, what's the quality of your data? Are you giving it good data? Are you poisoning the model? And you can't just easily untrain what you've trained into it. So I think there are a lot of people, like I say, tiptoeing, being cautiously optimistic, but...
I don't think it's an area where, at the moment, people are just like, hey, I trust AI, we're going to jump in and make a multi-million dollar decision - and next thing you know, you've flopped.
Matthew Stibbe (07:37) It's like Apple News with their AI summaries of news running into problems where it was mis-summarizing stories - and, you know, there's reputation at stake.
Keith Belanger (07:48)
And I see right now there are two areas of AI that, in my head, I divide, right? There's leveraging AI to build AI solutions for your own business. But then there's also me using AI to do my job, right? So I can use AI to do AI, right? And I find that using AI as an accelerator to do my own job is huge.
For me to sit there and say, yeah, I'm gonna write this stored procedure that's five pages long - back then, you're typing and finding all the miscues and it takes you three days. I can write that same stored procedure in five minutes with AI, because I know how to explain it, I know how to talk about it - especially as somebody who has years and years of experience in knowing what you want to do. Over the years I had to explain it to somebody who was gonna code it. Now I can code it myself, because I'm not going to sit there and write the code; I just know what I want. So from my perspective, AI is playing a dual role: how can I use AI to make business decisions, but also how can I use AI as a data architect, as a data engineer, to accelerate my ability to do my job and deliver to the business.
Matthew Stibbe (09:07) It's been a long time since I was professionally or semi-professionally writing code, long, long time, 20 something years. But in order to pass a HubSpot exam, I had to write a little piece of code to interface with HubSpot's API. And it's astonishing, you have your coding window, and there's a chat bot. You're going, why didn't that work? And why did I get that error message? And can you rewrite it to, you know? And I know just enough to ask those questions.
But I could imagine, if someone like you is an expert, it could amplify your expertise. But is there a danger that, if I tried to do what you do with my zero level of knowledge about data modeling, you end up with AI amplifying my incompetence? Is that a risk?
Keith Belanger (09:52) Well, yeah, it's true. I mean, it'll just amplify your errors and your mistakes and stuff like that. So if you don't know, that's a problem. Me, I can say what I want to do, quickly review the code and say no - I will literally tell the AI agent, you're being an idiot, right? No, I want to use these metadata columns. I need this underscore. No, let's do this. No, you didn't calculate the hashing correctly. You still have to have the experience - it's not gonna overcome your experience, right?
And so from my perspective, it's trust but verify - you have to do the verify. I'm not just gonna take something that it generated, copy it, paste it, run it and think everything's great. I just look at it as: I've been doing this for close to 29 years, and I'm tired of writing. I can remember when, you know, create table - you type out column one, type out column two, type out column three, and 45 minutes later you've done one table.
Now I can sit there and say, yeah, I want a table called party. I want first name, last name. I can tell it what I want for class words, I can tell it what I want for data types, every time. And it remembers that. So the next time I do another table, it says, yeah, Keith told me he wants these metadata columns, he wants these data types, we're doing Snowflake, dah, dah, dah. It remembers, and again, that accelerates things.
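(Editorial aside: a minimal sketch of the kind of standardized table definition being described. All names, metadata columns and data types here are hypothetical examples of house conventions that an assistant can be told to repeat - not Keith's actual standards.)

```sql
-- Hypothetical "party" table following house naming and data-type conventions
CREATE TABLE IF NOT EXISTS edw.party (
    party_id        NUMBER        NOT NULL,              -- surrogate key
    first_name      VARCHAR(100),
    last_name       VARCHAR(100),
    -- example metadata columns every table carries under the convention
    _load_ts        TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
    _record_source  VARCHAR(200),
    _batch_id       NUMBER
);
```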
So what used to take me a long period of time - probably mostly because of monotony, you know, that's just how long it takes - now I can just accelerate all of that. But to your point, if you don't know what you're doing, then you're just gonna accelerate the mistakes. I guess you could say you're gonna fail fast, right?
Matthew Stibbe (11:43) Fail fast, fail often. As long as you stop failing at some point, it's probably okay.
Keith Belanger (11:47) Well, that's it - you don't want that. I've been there when I've done something, let's say the old school way, and you deliver it to the business and there are ramifications, right? Because it's sometimes harder to recover: we've been loading that table for a week and now we've got a week of backtracking to do because we did something wrong. So obviously it doesn't eliminate the need to test. And the other great thing with AI - I can generate tests, right, to validate what I'm doing. So to me, where I've seen the bigger impact right now in the AI space is more that it can help me do my job.
Matthew Stibbe (12:17) Hmm.
Keith Belanger (12:30) Using AI in the critical business space, in business decision making - like I said, I've seen it more people in the, they're in the POC trying, but they haven't wanted to make that big leap because it could cost the company millions, billions of dollars. If you've said, hey, this is X and you make a decision on it and you come to find out, you trained your model with bad data or it has a hiccup and next thing you know, you can't come back from that type of mistake. It's people are being very cautiously optimistic and it's fun, you know, it's as I say, is my job going to be replaced by AI? I don't think so. I mean, you can never say never, but I would say if you're not a data practitioner in some way, you're not leveraging AI, then you you might get bumped by somebody who is using AI. To me it's, if you're in this space, you need to understand it. Yeah.
Matthew Stibbe (13:27) Yeah, more likely to be overtaken by people who are using it well than replaced by it. Yeah. So we were talking earlier about data modeling, which is your superpower. Tell me a little bit about what you have been doing over the last year in the world of data modeling. And then maybe we can dive into the project.
Keith Belanger (13:33) Yes, absolutely. As you said, I mean, the first thing I ever learned to do in data was data modeling, right? So that was like almost 20... I had the benefit of having a mentor who got me into the fundamentals and understanding of data design, right? Which has nothing to do with the database, or whether it's structured or semi-structured. It's just understanding the relational concepts of data and how they relate from your business perspective.
And I've seen over the years - I mean, every organization used to do data modeling and data design. In the real world, we design things like houses before we build them. We used to do that in the data world: we would design it and then build it. I've seen over the years that, whether it's because of Agile or other things, it's kind of become a lost art. Now we just build things, right?
And the cloud industry has made it very easy to just build things. 'Cause I can remember the days when you only had so many bytes to name your table and only so many bytes of storage, so you had to be very critical in your thinking about what you were doing. And now it's like, we have unlimited storage, unlimited compute - just have at it.
The last year I've been spending time working with organizations trying to bring things back to basics, as I like to say - how do you reintroduce data modeling? From my perspective, it should be a core practice in every data organization. But I get people saying, Keith, we don't have time, we move too fast. And I'll argue that - I can tell you right now, I personally have been in very agile, very large enterprise organizations that have 200-something data engineers, and we had data modeling as a part of that life cycle.
Matthew Stibbe (15:33) How do you integrate that slightly more rigorous data modeling into a fast-paced agile development process?
Keith Belanger (15:43) Well, one of the biggest parts, I like to say, is communication. You need to be in communication with your business, right? If you're finding out today that you need to build X data product, and you're just finding out today, then you haven't been communicating with your business. Because in the businesses I've worked with, it's like: this is our objective for the year, this is our yearly objective, this is our quarterly objective. And that quarterly objective then turns into the work - everything usually relates back to that.
I mean, yes, there are going to be those surprises. But if you know, hey, we want to do customer sentiment analysis coming into Q1, start those conversations, right? Talk with the business and understand: what does that mean to you? Where does that data come from? What is it? And you start formulating that vision. By the time it gets to the data engineer to build it, you should be able to sit there and say, here's the blueprint. It starts with, hey, this is the house I want, sketched on a napkin, right? You draw it out, and then: here's the blueprint - here, builder, build this house. It's the same thing. And we would build in design sprints. So if you have a build sprint, then maybe the sprint before it, or the one before that, is the final design sprint.
So you need to build that practice into your development life cycle and into the overall culture of your organization. I like to say there's people, process, technology - and I also want to say there's culture, right? If it's ingrained in your culture, then when I go to build something from a data engineering perspective, in a transformation tool, I know what I'm building. Or is that data engineer making it up, right? And oftentimes you'll see data engineers making up their own table, their own target table.
How do you handle joins? Are you doing hashing? Are you doing sequences? There's so much that needs to be settled so that they can just worry about the business transformations of that use case, and not how it fits into the big picture. To me, this is fundamentals. But yeah, for the last year that's really been it. People ask me, Keith, what's the future of data modeling and data design? And to me, the future is going backwards, right?
And it's never been more important than now, because - back to the AI conversation - AI is going to be based on what you give it. If you tell it this is "party," it's going to think it's "party." But if "party" really is the payer or the claimant or the purchaser or the patient, you have to tell it that. Typically, source data does not represent our business concepts, because you're usually using, like you said, HubSpot or all these other tools that are generic.
They're just generic, but in your business you might call it this or that. You have to transform that data into your business language, and that's where modeling comes in, I'd say. So as you can tell, I'm very passionate about it, and it's something where I get the "you're too old school." But I usually say: let me guess, you have duplicate data, you have this, you have this. Yes, we do. Okay, well, you're not doing data modeling, right? You're just building data solutions that are silos, independent and not integrated into the overall big picture.
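(Editorial aside: a minimal sketch of what "transforming source data into your business language" can look like in practice. The source table, column names and the "patient" concept are hypothetical stand-ins for whatever a generic tool exports versus what the business actually calls things.)

```sql
-- Hypothetical example: a generic CRM export renamed into the business's own language
CREATE OR REPLACE VIEW edw.patient AS
SELECT
    c.contact_id      AS patient_id,
    c.firstname       AS first_name,
    c.lastname        AS last_name,
    c.lifecyclestage  AS care_stage      -- source-specific term mapped to a business concept
FROM raw.crm_contacts AS c
WHERE c.contact_type = 'patient';
```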
Matthew Stibbe (19:11) Is that because developers doing their own little bit of this month's sprint, to get this functionality out, are thinking: I don't know what the overarching master data plan is, I'm just going to build this thing that does this now? I mean, how do you get the discipline and the backbone into the process from the start?
Keith Belanger (19:24) Well, if it's not there, it's a challenge. I can tell you that when I've gone in and people are doing things a certain way - you know, hey, we're successful - if you're talking at the data engineering level, you're at such a micro level of the big picture of the organization. You've got to start from the top down and get them to understand that, from their perspective, they're doing a great job. And oftentimes I'm not criticizing the data engineers themselves; they're basically dealing with the situation they've been given: here's my source data, I need you to build this. They might not even be aware of the big picture. But to me, as an organization you have to sit there and say: I'm building a skyscraper - my data warehouse - and my data engineer is decorating room number one on the first floor. Then I have to trickle that down.
When I oversaw, I was a senior architect of a very large team at a Fortune 100 insurance company, we had that - I had the big picture, then I had architects under me in different domains. They own that domain, but each architect understood the big picture. And then each sprint team understood their perspective of the big picture. And it had to all come together.
It's a lot of moving parts and it's not without its challenges, I'll say that - I think we all know what coordinating a large group of people is like. But it was ingrained in our processes, ingrained in our practices, ingrained in our standards. And when a data engineer came on, they were educated on that, right? So we gave them the right level of autonomy to do their job, but there were certain things like: you must put this, you must do this, you must do that. We can't have every data engineer hashing a data value or building hash diffs completely differently.
So if they're all doing hash diffs, then they all have to be doing hash diffs the same way. And it's up to us as architects to make it so that they don't have to think about it - they just add it in. Whatever your tool, most tools have a way to say: here's the hash diff function, and everybody uses it.
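(Editorial aside: a minimal sketch of one way to centralize a hash diff in Snowflake SQL so every engineer computes it identically. The function name, schema and delimiter convention are hypothetical; the point is a single shared definition rather than per-engineer variations.)

```sql
-- One shared hash diff definition instead of every engineer rolling their own
CREATE OR REPLACE FUNCTION edw.hash_diff(cols ARRAY)
RETURNS VARCHAR
AS
$$
    MD5(ARRAY_TO_STRING(cols, '||'))
$$;

-- Usage in a load: everyone calls the same function the same way
SELECT
    party_id,
    edw.hash_diff(ARRAY_CONSTRUCT(
        COALESCE(first_name, ''),
        COALESCE(last_name, '')
    )) AS hash_diff
FROM staging.party_src;
```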
Matthew Stibbe (21:51) You've been working with companies bringing this data modeling back to basics approach. Over the time you've been doing that, could you distill out one or two lessons? What are the most common mistakes and what are the most valuable insights that people need to know?
Keith Belanger (22:14) Well, I don't know if this answers your question, but I'll say what I've seen, which was relatively surprising in this space: just how unknowledgeable or unaware people were of what data modeling even is. You know, I'm going to blame the educational system. Now you can go to school to be a data scientist, right? Or you can go for data analytics. All we had when I went to school was computer science, which is, you know, a jack of all trades, master of none type of scenario. But people are coming out and data modeling is just not a skillset, or even a known thing. It's like they just want to take data, write Python and do stuff to it.
So what was surprising to me is just the lack of resources available to teach and educate people on how to do data modeling. There are some great books and great people that really can teach you data modeling, but you have to be a self-learner. And there are some classes out there, but, like me, you can't graduate from the university of - you know, insert name here - and come out knowing how to do a data model. So to me, part of it is that some people have a fear of the unknown.
And so how do we get people to understand data modeling? When I started doing it, you had that one modeler, right, and everybody went to this person. Now you can't do that. Now you have to say: I have to empower these people to have some extent of that capability. So that was a bit surprising to me over the last year. It's easy for me to say we need to do data modeling, but 90% of the people are just like, well, what is that? How do I do it?
Or they're using DBT and they say we are data modeling. Well, that's just a different type of model. It's not data modeling.
Matthew Stibbe (24:16) And if you're going to empower people, but you want some overarching consistency, that comes down to some shared doctrine of understanding or culture, doesn't it?
Keith Belanger (24:28) Absolutely. You have to have people who can take that methodology and concept and then trickle it down, right? And teach and mentor. Like I said, it's the people, the process, the technologies - but you also have to have a culture. You have to have a data culture that everybody buys into.
You know, I was fortunate many years ago on a large EDW project at a company that didn't have it. We started from greenfield and we were able to build that culture; if you looked back 15 years later, everybody was speaking a lingo that 15 years earlier didn't exist. We built that. But if you're an organization that's been doing things without any real structure - without a real data strategy, right? You need a business strategy that's backed up by your data strategy, and then everybody's under that, you know? And if you don't have that, you just have: here's Snowflake, here's a bunch of tools, people just solve this business problem. You're just doing popcorn architecture, as some people like to call it - you're just building technical debt on top of technical debt. And how do you unwind that?
And, you know, I don't think I've ever been in that situation, 'cause typically I'm like, ooh, I'm going to walk away from this one. But I think there are many organizations that are there: we've been doing it this way. I'll never forget, I was on a project - they had just gotten Snowflake, and I was coming in to bring... they literally got Snowflake, gave out the keys to the car, basically empowered every data engineer: here's Snowflake. And I was like, you know, back in the day, you didn't do anything without the DBA.
And now we just give everybody the freedom inside of Snowflake to do anything... and that's dangerous, right? Next thing you know... Snowflake's happy, because people are doing a lot of compute and using a lot of storage, but you have to have some structure. I can remember taking the keys away - no, we need to have some level of control and structure around this.
Matthew Stibbe (26:51) So we're approaching the end of the time we have. And before we get to the end, I wanted to just ask you, thinking about Snowflake, at DataOps, your self-proclaimed title, at least on LinkedIn, is frictionless czar. So just unpack that for me. What's going on with frictionless at DataOps?
Keith Belanger (27:04) Yes, so recently, you know, Snowflake did announce that they have invested in DataOps.live. The reason being, Snowflake has recognized that there is an area in the DataOps space - you know, CI/CD, pipeline management - that doesn't exist natively in their world, which is causing pain and, quote unquote, friction for Snowflake customers, right?
So many of us who have used Snowflake over the years know I can design and create something, but then the requirements - whether it's regulatory requirements or your own organization's security requirements - say, well, everything needs to be infrastructure as code, or something similar. So the idea that you can just log into Snowflake, as we were saying, go to a worksheet, paste in my create table statement, hit the button and we're off - no, many organizations say we need to have separation of duties. You have to do dev, test, QA, prod.
And what's happened over the years is, because Snowflake doesn't have those capabilities natively inside, you have to resort to third-party solutions, open source solutions, or build your own. I can remember my first Snowflake implementation. Back in the day, it took weeks to buy infrastructure, find room in the data center, get power and air conditioning, yada, yada, yada. Well, with Snowflake, it was: here's your URL, and minutes later you have a full-blown enterprise data warehousing solution. Great. Then we spent six months trying to figure out how we were going to get stuff into it, manage it and roll it back. So that is causing friction for Snowflake customers. The initiative with frictionless is to bring those DataOps.live capabilities natively into Snowflake, right? So you could be a Snowflake customer, go to the marketplace and say, I want to use CI/CD, click a button, and there it is, ready for you to go. There's a lot more to come on that - depending on when this is published and when people are listening, we might already have more to share - but we're working closely with the Snowflake product roadmap team on what this roadmap will look like. So just like you can use Snowpipe to natively load data in, in the future these DataOps-type solutions should be readily available for folks to use right inside Snowflake as well. So that's where the frictionless czar comes in - I'm kind of leading point on how we bring what's already available today in DataOps.live into the Snowflake ecosystem.
Matthew Stibbe (30:11) Right. Amazing. Well, may your reign be a long and happy one. And I think on that note, that brings this episode nicely to a close. Thank you very much for joining us, Keith.
Keith Belanger (30:16) Thank you. It was a pleasure being here and I enjoyed it. Thanks.
Matthew Stibbe (30:27) So if you're listening and you'd like to get more practical data insights and learn more about CloverDX, please visit cloverdx.com/behind-the-data. And of course, check out Keith's show, Behind the Cape. Thank you very much, everybody, for listening, and goodbye.