It’s been 15 years since the job of data scientist became one of the most sought-after professions. Nevertheless, if you ask data scientists what their profession is, you will get very different answers, depending mostly on the kinds of companies they work for.
Talk given at CodeTalks 2023 in Hamburg.
So how does one learn Data Science when the definition of the field is open to interpretation? And if you put all the job descriptions together, becoming a Data Scientist would seem to require knowing all the theory and every new approach, and being able to use hundreds of tools.
In this talk, we will explore the fundamentals needed to be a data scientist, from the perspective of theory, tooling, and approaches. We will talk about some common misconceptions people have when they start learning data science, and about some of the reframing I have seen successful learners do on their path to data science.
And what role do large language models play in the future of learning data science?
3. AN “OFFICIAL” PROFESSION SINCE 2008
WHAT IS A DATA SCIENTIST?
DJ Patil and Jeff Hammerbacher of LinkedIn and Facebook made "Data Scientist" an official buzzword. They were looking for a job title that sounded neither too Wall Street (Data Analyst) nor too academic (researcher).
HTTP://ETC.CH/Q7YT
“THE TITLE SOUNDS SOPHISTICATED AND JUST VAGUE ENOUGH TO TRANSCEND INDUSTRIES AND BE TAKEN SERIOUSLY, EVEN BY PEOPLE WHO HAVE NO IDEA WHAT IT IS.”
6. FIRST USED IN 1974 IN “THE CONCISE SURVEY OF COMPUTER METHODS”
WHAT IS DATA SCIENCE?
Peter Naur defined Data Science as
“THE USEFULNESS OF DATA AND DATA PROCESSES DERIVES FROM THEIR APPLICATION IN BUILDING AND HANDLING MODELS OF REALITY.”
13. /IN/TEREZA-IOFCIU
SOME PEOPLE THINK…
YOU NEED TO SHOW ALL THE THINGS YOU KNOW
- Hardly ever. You need to show how you understand and tackle a problem with one approach: create a useful baseline, then work iteratively towards a better solution.
- In an interview you have limited time, so ask what matters most (speed, impact, or tech debt) and suggest a solution that addresses it.
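To make "create a useful baseline, then iterate" concrete, here is a minimal sketch in plain Python. It assumes a toy, hypothetical binary-classification dataset with a single numeric feature, and the helper names (`majority_baseline`, `threshold_model`) are made up for illustration. In practice you would more likely use scikit-learn's `DummyClassifier` as the baseline and a proper train/test split, but plain Python keeps the idea visible: first measure what a trivial predictor achieves, then check whether a simple model actually beats it.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train):
    """Return a predictor that always outputs the most common training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return lambda xs: [majority for _ in xs]


def threshold_model(x_train, y_train):
    """Pick the cut-off on a single feature that maximises training accuracy."""
    best_cut, best_acc = None, -1.0
    for cut in sorted(set(x_train)):
        acc = accuracy(y_train, [int(x >= cut) for x in x_train])
        if acc > best_acc:  # strict '>' keeps the first best cut-off
            best_cut, best_acc = cut, acc
    return lambda xs: [int(x >= best_cut) for x in xs]


# Hypothetical toy data: one numeric feature, binary label.
x_train = [1, 2, 3, 4, 5, 6, 7, 8]
y_train = [0, 0, 0, 0, 1, 0, 1, 1]
x_test = [1.5, 3.5, 5.5, 7.5]
y_test = [0, 0, 1, 1]

baseline = majority_baseline(y_train)
model = threshold_model(x_train, y_train)

baseline_acc = accuracy(y_test, baseline(x_test))
model_acc = accuracy(y_test, model(x_test))
print(f"baseline: {baseline_acc:.2f}, threshold model: {model_acc:.2f}")
# → baseline: 0.50, threshold model: 1.00
```

The baseline sets the bar. Each later iteration (more features, a tree model, tuning) only counts as progress if it beats that bar on held-out data.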
15. THEORY-DATA
STATISTICS & PROBABILITY
• Understand distributions.
• Understand insights from other specialists and develop your analytical business knowledge.
• You often start by getting data, cleaning it, doing EDA (exploratory data analysis), and feature engineering. You get better at these when you understand your data and the business.
• You might need to design an A/B test to validate your solution.
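As an illustration of where distributions and A/B testing meet, here is a minimal sketch of a two-sided two-proportion z-test in plain Python. It assumes a binary conversion-style metric, independent users, and samples large enough for the difference in rates to be approximately normally distributed. The function name and the counts are hypothetical. Libraries such as statsmodels (`proportions_ztest`) offer a ready-made version.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (normal approximation).

    Assumes independent samples and large enough counts; returns (z, p_value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis that both variants are equal.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value, P(|Z| >= |z|) for a standard normal Z, equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: variant A converts 200 of 2000 users, variant B 250 of 2000.
z, p_value = two_proportion_z_test(200, 2000, 250, 2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Note that the test only answers "is this difference likely to be noise?". Choosing the metric, the sample size, and the significance level before running the experiment is part of designing the test.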
16. THEORY-ML
BASICS ARE ESSENTIAL
• Really understand the basic models (linear and logistic regression, random forest, gradient boosting, ridge regression) rather than rushing to LLMs and neural networks.
• Decision-making in industry is mostly based on tabular data, which calls for models that are not just accurate but also efficient and interpretable.
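One reason the basic models stay so useful is that their parameters can be read directly. As a small sketch, here is ridge regression with a single feature written out in closed form in plain Python. For centred data the slope is Σ(x - x̄)(y - ȳ) / (Σ(x - x̄)² + α), the intercept is left unpenalised, and α = 0 recovers ordinary least squares. The function name and the toy data are hypothetical. For real work with many features, scikit-learn's `LinearRegression` and `Ridge` are the usual choices.

```python
from statistics import mean


def fit_ridge_1d(xs, ys, alpha=0.0):
    """Fit y ~ intercept + slope * x with an L2 penalty `alpha` on the slope.

    alpha=0 gives ordinary least squares; the intercept is not penalised.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    # Work with centred data so the penalty only shrinks the slope.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

ols = fit_ridge_1d(xs, ys)                 # (1.0, 2.0): each extra unit of x adds 2 to y
ridge = fit_ridge_1d(xs, ys, alpha=10.0)   # (4.0, 1.0): the penalty shrinks the slope
print("OLS:", ols, "ridge:", ridge)
```

The fitted numbers are directly explainable to a stakeholder ("one more unit of x is worth about 2 units of y"), and α makes the accuracy-versus-stability trade-off explicit.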
17. TOOLING
BASICS ARE ESSENTIAL
• Strong 🐍 Python skills beyond the notebook, including code testing and code reviews.
• Effective use of pandas, NumPy, SciPy, and matplotlib (or other visualization libraries) is essential.
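Going "beyond the notebook" mostly means turning notebook cells into small, named functions with tests that a reviewer can read and a CI job can run. Here is a minimal sketch, assuming a hypothetical cleaning helper `fill_missing_with_median` that works on a plain list. With pandas the same step would typically be `s.fillna(s.median())` on a Series. The `test_*` functions follow the naming convention pytest uses to collect tests, but they also run as plain Python.

```python
from statistics import median


def fill_missing_with_median(values):
    """Replace None entries with the median of the observed values.

    Raises ValueError if every value is missing, instead of failing silently.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: all values are missing")
    fill = median(observed)
    return [fill if v is None else v for v in values]


def test_fill_missing_with_median():
    # Median of [1, 3] is 2.0, which compares equal to 2.
    assert fill_missing_with_median([1, None, 3]) == [1, 2, 3]
    assert fill_missing_with_median([None, 4, 10, 5]) == [5, 4, 10, 5]
    # Nothing missing: the input comes back unchanged.
    assert fill_missing_with_median([7]) == [7]


def test_all_missing_raises():
    try:
        fill_missing_with_median([None, None])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

Small, pure functions like this are also far easier to review than a long notebook, because the expected behaviour, including the edge case, is written down next to the code.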
18. TOOLING
COMMUNICATION IS 🔑
Data is cross-functional. Whatever your role, you will need to be able to:
• Clearly gather requirements and scope your problem
• Use storytelling and visualization to demonstrate impact
• Get projects and budgets approved
19. PICK AN APPROACH THAT WORKS FOR YOU AND YOUR EXPERIENCE, OR A COMBINATION
GETTING STARTED
• Short programs
• Long programs / degrees
• Self-learning
28. THANK YOU
Special thanks to Noa Tamir for helping out with this talk 🫶
@VIS.SOCIAL@TEREZAIF
/IN/TEREZA-IOFCIU
📧HELLO@TEREZAIOFCIU.COM
ICONS FROM THE NOUN PROJECT & ICONS8