How much can Machine Learning be democratised?
I liked this post by Robert Sahlin and Johan Gunnarsson about how they used BigQueryML to create alerts for cost anomalies.
Since I too had a hackathon at work this week, a colleague and I implemented the same thing. It only took a couple of hours, so by lunch we had something working.
I then wanted to understand more about exactly how it was working. What is this model? Am I using it right? I seemed to be using the same data for both the training and the alert, is that right? I don’t know much about ML but that didn’t sound right…
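To make that worry concrete, here is a minimal sketch in Python. It is not what BigQueryML does under the hood: it swaps in a plain z-score check as a stand-in, and the function name, the threshold and the cost figures are all made up for illustration. It only shows why scoring a value with a model that was fitted on that same value is worth questioning.

```python
from statistics import mean, stdev


def is_anomaly(history, value, threshold=3.0):
    """Flag `value` if it is more than `threshold` sample standard
    deviations away from the mean of `history` (a simple z-score check)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No variation in the history: anything different is unusual.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical daily costs; the last day is an obvious spike.
daily_costs = [100, 102, 98, 101, 99, 103, 97, 100, 250]
history, today = daily_costs[:-1], daily_costs[-1]

# "Train" on past days only, then score today.
held_out = is_anomaly(history, today)

# "Train" on everything, including the day being scored.
# The spike inflates both the mean and the standard deviation.
same_data = is_anomaly(daily_costs, today)

print(held_out, same_data)
```

With these made-up numbers the spike is flagged when it is held out, but not when it is included in the data the statistics are computed from. A real time-series model may handle this very differently, so this is a reason to ask the question rather than an answer to it.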
It took me all afternoon to try to answer those questions, reading articles and documentation and understanding them as best I could. While I think what we have is right (and I have no reason to doubt Robert and Johan!), I’m still not 100% certain.
Which makes me think: how close can we get to truly democratising ML?
This is something I’ve been thinking about for a while, and wrote about back in 2020.
The tools are getting better and more accessible, with BigQueryML being a great example of that. All I needed to know was SQL and I got something working in a few hours, without needing any other infrastructure beyond BigQuery. In theory anyone who knows SQL (which is a huge number of people across data and software) can do the same, bringing ML into new domains and powering features great and small.
However, without understanding the models and how they work, it’s difficult to have confidence in our results.
Will this get easier over time so that ML is truly democratised? Or will we always need a data scientist to support the deployment of any non-trivial ML application?