Big data, data science, and machine learning rose to prominence around a decade ago and have become cemented in the tech landscape as the size and complexity of data continue to increase. However, many companies are still confused about how best to use their data, and try to hire "all-in-one" superstars who can do everything from research to building efficient ETL pipelines to maintaining machine learning projects in production. In this talk, we'll explain why most data science and machine learning work should be a collaboration between dedicated data scientists and data engineers. We'll cover the core responsibilities of each role, where each can add the most value to a company, and where the roles overlap. We'll also discuss where specializations such as machine learning scientists, machine learning engineers, DBAs, and MLOps fit in, and whether the future seems to be heading towards generalization or specialization. We'll end with recommendations on how to build the best combination of these roles for your company's needs.
Avoid hiring data ninja rockstars: how to build effective data teams
1. Big Data LDN, September 2022
Dr. Jodie Burchell, Data Science Developer Advocate
Pasha Finkelshteyn, Big Data Developer Advocate
17. Data scientist skills
● Data + science!
● Automate manual processes
● Build data products
● Mathematics
○ Statistics
○ Machine and deep learning
● Programming
● Data visualisation
19. Data scientist responsibilities
4. Model building
● Feature engineering
● Iteratively find acceptable model
5. Handover model for productionisation
6. Monitor model metrics
21. BI analyst skills
● Deep understanding of business
● Data communication and visualisation
● Data wrangling
● Data exploration and analysis
22. BI analyst responsibilities
1. Understand feasibility of business goals
2. Design metrics to measure business success
3. Request data sample from data engineer
23. BI analyst responsibilities
5. Request tables needed for reporting
6. Monitor business metrics
● Creating dashboards
● Ad hoc reporting
25. Data engineer skills
● Data lifecycle
● Strong engineering skills
● Monitoring data pipelines
● Manage data needs for multiple teams
● (Sometimes) responsible for ML models in production
27. Data engineer responsibilities
4. Input as to whether model is technically feasible
5. Production:
● Build data pipelines
● (Potentially) implement model
6. Monitor data metrics
28. What is the definition of done of the model?
Overlaps
30. Summary
1. There are many responsibilities around data
2. These responsibilities are broad
3. Work can be split across three dimensions
4. Hire team according to needs