3. Over the past months, I have repeatedly come across Machine Learning platforms,
including AutoML, that promise to be code-free. Among other things, they tell companies
that their data scientists no longer need to be proficient in software development
(no code needed). (The promise is also that your data scientists can do data science
without understanding it, but that is a completely different topic, which I will
comment on in another post.)
The dangerous part of this promise is that AutoML solves only the easiest
part. AutoML is fine, but it does not automatically solve your problems.
I do not want to dismiss the value of further research into such tools, but I would like
to add some nuance to the picture being painted of AutoML tools.
The machine learning code is only a small part of a production ML system, and AutoML
automates only a small part of that code.
I believe it is more important to have a functioning pipeline that provides value and
can be improved incrementally. That requires more from data scientists than
not needing to code or not needing to understand data science.
Google has published a set of best-practice rules for working with ML; not
surprisingly, they are more about engineering problems than data science problems.
https://developers.google.com/machine-learning/guides/rules-of-ml/
5. Everyone wants to do the model work, not the data work, hence data
cascades in high-stakes AI (AI models applied to areas such as health,
banking, etc.).
Data analysis, business understanding and feature engineering must be
first-class citizens when working on Machine Learning problems.
Read this insightful paper from Google, which analyses the effects
of data cascades in high-stakes AI.
https://research.google/pubs/pub49953/
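As a minimal sketch of what treating data work as a first-class step can look like, the Python below checks records for missing values and implausible ranges before any model is trained. It assumes records arrive as dictionaries; the function name `check_records`, the field names and the valid ranges are all hypothetical, and real checks would be driven by business understanding of the data.

```python
from typing import Any


def check_records(
    records: list[dict[str, Any]],
    required: list[str],
    ranges: dict[str, tuple[float, float]],
) -> list[str]:
    """Return a list of data-quality issues found in `records`."""
    issues: list[str] = []
    for i, row in enumerate(records):
        # Flag missing or null values in fields the model depends on.
        for field in required:
            if row.get(field) is None:
                issues.append(f"row {i}: missing '{field}'")
        # Flag values outside a plausible range (often entry or unit errors).
        for field, (low, high) in ranges.items():
            value = row.get(field)
            if value is not None and not low <= value <= high:
                issues.append(f"row {i}: '{field}'={value} outside [{low}, {high}]")
    return issues


# Hypothetical patient records; in practice these come from your data source.
patients = [
    {"age": 42, "heart_rate": 70},
    {"age": None, "heart_rate": 65},
    {"age": 130, "heart_rate": 300},
]
issues = check_records(
    patients,
    required=["age", "heart_rate"],
    ranges={"age": (0, 120), "heart_rate": (20, 250)},
)
for issue in issues:
    print(issue)
```

Checks like these are cheap to run on every new batch, which is where problems that would otherwise cascade downstream can be caught early.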
7. Over the past five years, I have been following the development and
emergence of countless machine learning platforms for the enterprise.
The challenge for companies is that there is no standard approach.
Machine Learning is still in its infancy (yes, even after roughly two to three
decades). The current platform initiatives are still at an early stage of the
technology lifecycle, meaning we have not yet converged on a dominant
design for Machine Learning platforms.
This leaves many companies in chaos when embarking on their ML
journey: there are simply too many platforms out there and no dominant
design. Companies have to choose from countless platforms and tools,
and each platform has its own opinion on how we should work with
Machine Learning.
My suggestion for companies is:
👌 Get a clear view of the field and formulate an action plan that is grounded in reality.
👌 Accept that there is no best practice for how your Machine Learning platform should work
(worth keeping in mind when hiring big consultancy firms to tell you what the latest best practices are).
👌 If you are at the beginning of your Machine Learning journey, hire an evangelist, not a senior data
scientist with a PhD in nuclear physics, or worse.
9. You definitely do not need to be afraid to launch a product without
machine learning, even if you think machine learning might be the
answer.
Machine learning is great; however, it requires data. If you don't have
data or are not comfortable implementing machine learning, use
heuristics.
Heuristics will get you a long way in any case. They also force you to
understand the business problem better and can even help you discover
which data might be important if you do pursue a
machine learning solution later.
A rule of thumb for most cases: if machine learning would give you a
100% boost, heuristics will give you at least 50%, which is not too bad.
In some cases, the heuristic approach will even be better in terms of
simplicity and performance.
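To make "use heuristics" concrete, here is a minimal Python sketch for a hypothetical recommendation problem: instead of training a model, recommend the most popular items the user has not bought yet. The function name `popular_items` and the purchase log are illustrative assumptions, not something from a specific product.

```python
from collections import Counter


def popular_items(purchases: list[tuple[str, str]], user: str, k: int = 3) -> list[str]:
    """Heuristic baseline: recommend the k most purchased items the user does not own yet."""
    counts = Counter(item for _, item in purchases)
    owned = {item for buyer, item in purchases if buyer == user}
    # most_common() sorts by count, breaking ties by first occurrence.
    ranked = [item for item, _ in counts.most_common() if item not in owned]
    return ranked[:k]


# Hypothetical (user, item) purchase log.
purchases = [
    ("ann", "tea"), ("bob", "tea"), ("bob", "coffee"),
    ("cat", "tea"), ("cat", "cake"), ("dan", "coffee"),
]
print(popular_items(purchases, "ann"))  # ['coffee', 'cake']
```

A baseline like this ships without any training pipeline, and the purchase log it relies on is exactly the kind of data a later machine learning model would need.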
11. Heuristics can get you a long way; however, once the heuristics
get complex, the solution becomes unmaintainable and
hard to debug and improve.
It is a good idea to start with heuristics if you are missing data; once you
do have data, move on to machine learning.
In software engineering, your team will want to periodically update the
solution (heuristic or machine learning model).
Machine learning models will be easier to update and maintain
than complex heuristics, because updating a model mostly means
retraining it on fresh data instead of rewriting rules by hand.
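A deliberately tiny Python sketch of why a learned model is easier to update than hand-tuned rules: the "model" here is a single score threshold learned from labelled data, so updating it means refitting on new data rather than editing a hard-coded constant. The function `fit_threshold` and the scores and labels are hypothetical; real models have many more parameters, but the update step is analogous.

```python
def fit_threshold(scores: list[float], labels: list[int]) -> float:
    """Learn the cut-off on `scores` that maximises accuracy on labelled data.

    Predicts 1 when score >= threshold; the candidates are the observed scores.
    """
    best_threshold, best_accuracy = scores[0], -1.0
    for threshold in sorted(set(scores)):
        correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
        accuracy = correct / len(scores)
        if accuracy > best_accuracy:  # strict '>' keeps the lowest best threshold
            best_threshold, best_accuracy = threshold, accuracy
    return best_threshold


# Fit on this month's hypothetical labelled data.
threshold = fit_threshold([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(threshold)  # 0.35

# Next month: retrain on fresh data instead of rewriting rules.
threshold = fit_threshold([0.2, 0.5, 0.6, 0.9], [0, 0, 1, 1])
print(threshold)  # 0.6
```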
13. Today, many companies use RPA (Robotic Process Automation) not only
for process automation but also as an integration layer to quickly connect
different emerging systems, where applications would typically be
integrated using APIs.
RPA is applied in critical areas such as insurance, banking, health,
policing, etc.
The widespread use of RPA reveals that there are too many legacy systems
with no interfaces that are useful for integration. It is better to make this
analysis early on and plan for more robust solutions.
RPA is not a long-term solution. RPA solutions inherently lack
almost every software engineering practice that has been developed over
decades to ensure security and maintainability.
Instead of doing frontend automation (RPA), do backend automation
(good old software development).
15. Kats is a new time-series #python toolkit from Facebook. It is comparable
to Prophet, with the difference that Kats is Python-specific (Prophet offers
both Python and R interfaces), and you can even use Prophet from within Kats.
https://facebookresearch.github.io/Kats/
https://facebook.github.io/prophet/
16. If you find my posts useful, follow me on LinkedIn.