Get You a Data Engineer
Hey there. My intended audience for this post is someone who is a decision maker at an organization,
Whatever it was, the prospect of using machine learning and AI to solve business problems has you intrigued. Machine learning offers many benefits. The raison d’être of this blog post is to convince you that (1) your instincts were right and ML has tangible benefits, and (2) one of the larger benefits is actually an update of your systems. Let’s work backwards on those.
ML as a Harbinger
Most businesses will earnestly answer the question “do you make data-driven decisions?” or, even more broadly, “do you have data?” with a resounding yes. This is well and good until someone explains what “data” means in a new context. In the past, a “data-driven decision” meant something on a spectrum from “we have a dashboard that tells us an aggregated statistic that we use downstream for decisions” to “we have someone who measures the statistical significance of differences between distributions of our sales.” This is an outdated view of “data-driven decision making,” though. Things have improved. If you haven’t had a data scientist on staff, these changes might have happened without your knowing.
Machine learning is nothing new. Two widely used models, XGBoost and CART, have been around for almost a decade and since the 80s, respectively. Some other models are even older. Many modeling techniques have roots in methods from the 1800s, so what is the big differentiator? Data. Namely: the collection, storage, and processing of that data. Take Ronald Fisher’s Studies in Crop Variation, written in 1921, for example. Data collection has evolved much since then, but many of the statistical techniques used there have survived. So how is anything different?
The ability to store and transform data is the game changer here. Since the early 2000s, when Google published its papers on horizontally scaling data storage and processing (ideas that would later inspire Hadoop), we have seen massive improvements over the datastores of old.
Sexy Jobs and Plumbers
My strong argument is that Data Engineers are the unsung heroes here. Data Scientists are sexy, but there’s a foundation that needs to be in place before they are really at their most valuable. Monica Rogati penned a brilliant article on this, and it’s a must-read (ref). We can think of data as progressing layer upon layer, like a pyramid.