FROM PHYSICS TO DATA SCIENCE
Martina Pugliese
17 December 2015
Scotland Data Science & Technology
An outline of what we will discuss
THE PARTS ABOUT ME, MY JOB, MY BACKGROUND
WHAT (I LEARNED) IT MEANS TO DO DATA SCIENCE
WHAT IS DATA SCIENCE AND ITS (AMBIGUOUS) RELATIONSHIP TO RESEARCH
WHO AM I?
Why am I here?
What do I want?
THE BORING BACKGROUND
➤ I did a Bachelor’s degree in Physics
I thought I wanted to do particle physics
➤ Then I did a Master’s degree in Physics (Statistical Mechanics)
I studied the evolution of the influenza virus
[Figure: plot panels with axis labels S, E, βM and pM0.55]
Numerical model (using a genetic algorithm) simulating how the pathogen creates new variants
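As a rough illustration of the genetic-algorithm idea mentioned on this slide, here is a minimal Python sketch. It is not the model from the Master's work: the bit-string genomes, the `target` sequence standing in for a fitness landscape, and every name and parameter value below are assumptions made purely for illustration.

```python
import random


def fitness(genome, target):
    """Count the positions where the genome matches the (hypothetical) target."""
    return sum(g == t for g, t in zip(genome, target))


def mutate(genome, rate, rng):
    """Flip each bit independently with probability `rate`; returns a new list."""
    return [1 - g if rng.random() < rate else g for g in genome]


def crossover(a, b, rng):
    """Single-point crossover of two equal-length parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(target, pop_size=30, generations=40, rate=0.02, seed=0):
    """Evolve random bit strings towards `target`.

    Returns the best genome found and the best fitness recorded at each step.
    """
    rng = random.Random(seed)
    n = len(target)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda g: fitness(g, target), reverse=True)
        history.append(fitness(population[0], target))
        parents = population[: pop_size // 2]   # truncation selection
        children = [population[0]]              # elitism: the best survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rate, rng))
        population = children
    population.sort(key=lambda g: fitness(g, target), reverse=True)
    history.append(fitness(population[0], target))
    return population[0], history


target = [1] * 20                 # toy stand-in for a "fit" variant
best, history = evolve(target)
print("best fitness, first vs last generation:", history[0], "->", history[-1])
```

Because the best genome is always copied into the next generation, the recorded best fitness can never decrease; mutation and crossover supply the new variants.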
THE BORING BACKGROUND
➤ Then I did a PhD in Physics
I explored how natural language evolves over time
[Figure: four panels, one per verb (burn, dwell, hide, sing), with axis labels I and fsum]
Verbs changing inflection over time:
hide became irregular
sing stayed irregular
burn stayed regular
dwell oscillates
Data Mining & Simulations
THE BORING BACKGROUND
➤ I wanted a job in industry, as a Data Scientist, so …
I did a bootcamp in London, S2DS, working on a commercial DS problem [1]
Physics gave me:
the ability to model reality (mathematically)
a brain trained to deal with data
ideas about lots more things to study
the scientific method to carry out experiments
from_physics_to_data_science

  • 7. DATA SCIENCE: Trend or Hype? What do you mean by “science”?
  • 8. “The key word in ‘Data Science’ is not Data, it is Science.” - Jeff Leek
  • 9. DATA SCIENCE: A BABY COME OF AGE? [Chart: NGram Viewer data] There’s lots of talk these days about several buzzwords containing “data”, but the science of extracting information out of raw data is much older than some think.
  • 10. A WEE BIT OF HISTORY
    ➤ The ’60s: Data Analysis bashfully starts branching out of Statistics as an empirical science [1]
    ➤ The ’70s: Establishing the idea of converting data into knowledge
    ➤ The ’80s: G. Piatetsky-Shapiro founds the KDD (Knowledge Discovery in Databases) conferences
    ➤ The ’90s: Companies have lots of data on customers! The term Data Science is first used in a conference name [2]
    ➤ The 2000s: Academic endeavours to define the field [3]. Statistical models (the “irrelevant theory”) vs. Algorithms
    ➤ The 2010s: The BOOM! The “sexiest job of the 21st century” [4]. Big Data is the new innovation [5]. Growth in “analytics” and Data Science educational programs [6]. Data Science in Business should be called “Decision Science” [7]
  • 11. But today, this is what’s happening: [Intel, What happens in an Internet Minute? 2012]
  • 12. So there came the need to have (many more) specialised people in industry to understand this dirty, varied, large data and leverage it to provide solutions. The data we agree to give to the services we use (social networks, apps …) is used to sell us tailored experiences. There is a saying in Italian which translates as “I know you like my pockets”. It should now become something like “I know you like your phone”. Where to get all these people from? DS academic programs; research on the rise; ???
  • 14. The ugly fact: research has no room for all PhD graduates. Growth of PhD graduates in S&E fields over time vs. growth of research positions [8]. The academic bottleneck comes after the PhD. PhDs do not have real “transferable” skills (The Economist, [8]).
  • 15. Is this alone a reason to transfer a PhD to industry? NO. A PhD is an academic qualification. It is meant to train people for research. And for the new challenges ahead, we need lots of scientists to study new solutions: climate change, ageing of the population, sustainable energy sources, the human brain, data science algorithms … Does it mean access to PhD programs should change? MAYBE.
  • 16. Can we suggest Academia and industry should cooperate more? CERTAINLY. Considering them as separate worlds does not help. Google cooperates with (and hires from) Academia a lot. They’re shaping the innovation landscape. They’re contributing to “traditional” academic research (Quantum Annealing, [9]). They’re pushing the current borders of AI (deep learning, anyone?).
  • 17. THE (OBVIOUS) DISADVANTAGES OF A PHD GRADUATE
    ➤ The “overqualified and inexperienced” curse
    ➤ The “age” and “expectations” problems
    THE (NOT-SO-OBVIOUS) ADVANTAGES OF A PHD GRADUATE
    ➤ Research trains you to sustain and cope with failure
    ➤ You know how to quickly learn new stuff alone (I’d argue this is the best skill to have today)
    ➤ You have a long history of communicating your findings
    www.phdcomics.com
  • 19. I believe the most important skill one needs in this role is the ability to learn quickly, and the passion for doing so.
  • 20. BUT PRACTICALLY SPEAKING…
    ➤ Mathematics & Statistics foundations: this is the brain training you need to understand it all. I won’t list all the needed stuff because it wouldn’t make sense, but in short: Linear Algebra (matrix operations); Probability Theory, the concepts; Graph Theory, the concepts; proficiency with Calculus and Mathematical Methods; Statistical Tests and Techniques; …
    ➤ Machine Learning: you need to be able to understand an algorithm with pen and paper, otherwise it’s just pushing a button on an ML library. With practice you learn which to choose for what, and how to assess its performance. As for libraries, it depends, but scikit-learn is great and very well documented, including the Maths behind the algorithms, so it’s a great resource.
  • 21. BUT PRACTICALLY SPEAKING…
    ➤ Programming: it’s essential to code quickly and produce reusable, robust scripts. I have a thing for Python. I also use R sometimes for stats analyses. Proficiency with shell commands helps a lot to save time. Numerical simulations: something like C++ is very useful. Basics of web development and of the software development process.
    ➤ Data visualisation tools: visualisations help you and others around you understand information. I use Python libraries for simple things, but the beauty of D3 is unbeatable.
    ➤ Big Data Technologies: this is the bit about which there’s lots of talk these days. Analytical skills also mean you learn the technologies (Hadoop/Spark/Mahout…) with practice.
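To make the Machine Learning point above concrete, here is a small sketch of an algorithm simple enough to follow with pen and paper, plus a hold-out check of its performance. The choice of a 1-nearest-neighbour classifier, the toy points and the function names are all assumptions for illustration; they do not come from the talk, and in practice a library such as scikit-learn would do this work.

```python
import math


def nearest_neighbour_predict(train_points, train_labels, query):
    """Return the label of the training point closest to `query` (Euclidean distance)."""
    best_label, best_dist = None, math.inf
    for point, label in zip(train_points, train_labels):
        dist = math.dist(point, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def accuracy(train_points, train_labels, test_points, test_labels):
    """Fraction of held-out points whose predicted label matches the true label."""
    correct = sum(
        nearest_neighbour_predict(train_points, train_labels, p) == y
        for p, y in zip(test_points, test_labels)
    )
    return correct / len(test_labels)


# Toy, made-up data: two clusters labelled "a" and "b".
train_points = [(0, 0), (0, 1), (5, 5), (6, 5)]
train_labels = ["a", "a", "b", "b"]

# Held-out points; the last one sits between the clusters and is misclassified.
test_points = [(1, 0), (5, 6), (3, 3)]
test_labels = ["a", "b", "a"]

print("hold-out accuracy:", accuracy(train_points, train_labels, test_points, test_labels))
```

Scoring on points the model never saw during "training" is the simplest form of the performance assessment the slide refers to.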
  • 22. RIGHT, BUT WHAT EXACTLY DO YOU DO? Tell me about your job!
  • 23. Mallzee is the fashion app for everyone. You swipe a product right (like) or left (dislike). You can create your own style feeds. You can search for specific products and favourite brands. You can buy products. We have millions of “swipes” plus user data.
  • 24. WHAT I DO IN MY JOB: Follow the DS mantra. [Diagram labels: Exploratory Analyses; Model; Data pre-processing; Product; Insights; Model Validation; takes long time… [8]; produce visualisations; produce software]
  • 25. THE ROLE CONSISTS OF SEVERAL THINGS: understand user behaviour in all parts of the app; predict what drives retention/usage; analyse numerical data on swipes to see what’s hot this season; improve the product with tailored-to-you features; use Computer Vision to see which image features perform best, for what sorts and for whom; measure all indicators across the business; recommendations.
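The "analyse numerical data on swipes" item can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `(brand, direction)` record layout, the brand names and the function names are hypothetical and do not describe Mallzee's actual data or code. The only detail taken from the talk is that a right swipe means "like" and a left swipe means "dislike".

```python
from collections import defaultdict


def like_rates(swipes):
    """Map each brand to its share of right swipes, given (brand, direction) pairs."""
    likes = defaultdict(int)
    totals = defaultdict(int)
    for brand, direction in swipes:
        totals[brand] += 1
        if direction == "right":  # right swipe = like
            likes[brand] += 1
    return {brand: likes[brand] / totals[brand] for brand in totals}


def hottest(swipes):
    """Brands sorted from highest to lowest like rate."""
    return sorted(like_rates(swipes).items(), key=lambda item: item[1], reverse=True)


# Made-up swipe log for two hypothetical brands.
swipes = [
    ("brand_x", "right"),
    ("brand_x", "left"),
    ("brand_y", "right"),
    ("brand_y", "right"),
]

for brand, rate in hottest(swipes):
    print(brand, rate)
```

With millions of swipes one would also want a minimum-count threshold before trusting a brand's rate, since a rate computed from a handful of swipes is very noisy.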
  • 26. THE REFERENCES
    ➤ [1] Something I wrote for S2DS
    ➤ [1] Tukey, The Future of Data Analysis
    ➤ [2] Data Science, Classification and Related Methods, Kobe, Japan, 1996
    ➤ [3] Leo Breiman, Statistical Modeling: The Two Cultures
    ➤ [4] HBR, Data Scientist: The Sexiest Job of the 21st Century
    ➤ [5] McKinsey, Big Data: The Next Frontier for Innovation
    ➤ [6] KDNuggets, The boom in analytics education
    ➤ [7] TechCrunch, Why Decision Science matters
    ➤ [8] Nature Biotechnology, The missing piece to changing the university culture
    ➤ [8] The Economist, The disposable academic
    ➤ [9] What is the computational value of finite range tunnelling?
    ➤ [8] NY Times, The “Janitor work” is key hurdle for insight
    ➤ [8] M. Loukides, What is Data Science?
    ➤ [9] The Edison European Project
  • 27. Thanks! … and a special thanks to W. Kandinsky