APS GDS data science talk by Trevor Rhone

A beginner’s guide to using
data science for physicists
Trevor David Rhone
Department of Physics, Applied Physics and Astronomy, Rensselaer Polytechnic Institute
1
I Keep Six Honest Serving-Men
I keep six honest serving-men
(They taught me all I knew);
Their names are What and Why and When
And How and Where and Who.
- by Rudyard Kipling
2
What is Data Science?
Statistics
Computer Science
Digital Data
Knowledge base
3
What is Data Science?
Data Science
Machine Learning
Statistics
Visualization
Databases
Data mining
AI
4
Why do we care about data science?
o Netflix movie recommendations
o Materials Discovery
• Experiments slow
• Calculations expensive
• No analytical solution
o Uncover physical insights
o Targeted advertisements
o Self-driving cars
5
The When and Where of data science
o Thales of Miletus, Ancient Greece (624 BC – 546 BC)
o Age of big data
• Data are accessible
• Data analytics tools are accessible
o Observational astronomy
o Bioinformatics
o Social media and targeted advertising
6
The essential guide
How to do data science?
1. Get data
o Kaggle
o Google dataset search
2. What are good descriptors?
o Mathematical representation of the data
o Domain knowledge
3. Data visualization
4. Model selection
5. Model validation
6. Model exploitation
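The six steps above can be sketched end to end with scikit-learn. This is a minimal illustration, not the workflow used in the talk: the bundled diabetes dataset stands in for data from Kaggle or a dataset search, and ridge regression is an arbitrary model choice.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# 1. Get data (a small dataset bundled with scikit-learn, used only as a stand-in)
X, y = load_diabetes(return_X_y=True)

# 2. Descriptors: the columns of X are taken as given here
# 3. Data visualization is replaced by a quick numerical summary
print("samples, descriptors:", X.shape)
print("target mean and std:", y.mean(), y.std())

# 4. Model selection: ridge regression is an arbitrary choice for illustration
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = Ridge(alpha=1.0).fit(X_train, y_train)

# 5. Model validation: R^2 on data the model has not seen
test_r2 = model.score(X_test, y_test)
print("test R^2:", test_r2)

# 6. Model exploitation: predict the target for new inputs
predictions = model.predict(X_test[:5])
```

In practice steps 2 and 3 take most of the effort, since the choice of descriptors depends on domain knowledge.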
Photo: https://commons.wikimedia.org/
7
Data Science Ecosystem
How to do data science?
8
Data science ~ data visualization + machine learning
y = f(x1, x2, …, xN) + 𝜺
y: target property
x1, x2, …, xN: inputs of machine learning model
How to do data science?
Goal: learn or quantify some relationship
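As a minimal sketch of learning such a relationship, the example below generates synthetic data from an assumed linear f with two inputs and Gaussian noise, then recovers the coefficients by least squares. The form of f, the coefficient values and the noise level are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed "true" relationship, chosen only for this sketch: f(x1, x2) = 2*x1 - 3*x2
true_coef = np.array([2.0, -3.0])
X = rng.normal(size=(200, 2))            # inputs x1, x2
noise = rng.normal(scale=0.1, size=200)  # the epsilon term
y = X @ true_coef + noise                # target property

# Learn the relationship by ordinary least squares
learned_coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# The spread of the residuals estimates the size of the noise term
residual_std = np.std(y - X @ learned_coef)
print(learned_coef, residual_std)
```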
9
How to do data science?
o What are good descriptors?
o Data visualization
A. Baldominos et al., Appl. Sci. 2018, 8(11), 2321
Housing Prices
10
How to do data science?
[Figure: data points plotted in the (x1, x2) plane]
Supervised versus Unsupervised learning
11
How to do data science?
[Figure: data points plotted in the (x1, x2) plane]
Supervised versus Unsupervised learning
12
How to do data science?
[Figure: two panels of data points plotted in the (x1, x2) plane]
Supervised versus Unsupervised learning
13
How to do data science?
[Figure: two panels of data points plotted in the (x1, x2) plane]
Supervised versus Unsupervised learning
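The distinction can be sketched on the same set of points: an unsupervised method sees only the coordinates, while a supervised method is also given a label for each point. The data are synthetic, and k-means and k-nearest neighbours are illustrative choices, not necessarily the methods shown on the slides.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Two synthetic groups of points in the (x1, x2) plane
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[4.0, 4.0], scale=0.5, size=(50, 2))
X = np.vstack([group_a, group_b])
labels = np.array([0] * 50 + [1] * 50)

# Unsupervised: only X is used; the algorithm proposes its own grouping
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Supervised: X and the known labels are used to train a classifier
classifier = KNeighborsClassifier(n_neighbors=3).fit(X, labels)
predicted = classifier.predict([[0.2, -0.1], [3.8, 4.3]])
```

The cluster indices returned by k-means are arbitrary, so cluster 0 need not correspond to label 0.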
14
[Figure: decision tree. Root node: age (young, middle-aged, senior). Internal nodes: Student? and Check rating?, each with yes/no branches. Leaf labels: yes, no.]
How to do data science?
Statistical models
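A decision tree of this kind can be fitted with scikit-learn. The sketch below uses a made-up labelling rule over the three attributes named on the slide; the rule and the integer encoding are hypothetical, since the slide's leaf assignments are not recoverable from the text.

```python
from itertools import product
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical encoding: age 0=young, 1=middle-aged, 2=senior;
# student and rating are 0=no, 1=yes
def toy_label(age, student, rating):
    # Hypothetical rule, made up for this sketch
    if age == 0:
        return student
    if age == 1:
        return 1
    return rating

# Enumerate every combination of the three attributes
X = [list(row) for row in product([0, 1, 2], [0, 1], [0, 1])]
y = [toy_label(*row) for row in X]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Print the learned splits as text
print(export_text(tree, feature_names=["age", "student", "rating"]))
```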
15
Machine learning models: Regression
[Figure: training data plotted as y versus x]
Goal: Build predictive model
How to do data science?
𝑓(𝑥) = 𝑚𝑥 + 𝑐
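A minimal sketch of fitting f(x) = mx + c to training data with NumPy follows. The data are synthetic, generated from an assumed line with m = 2 and c = 1 plus noise.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic training data from an assumed line with m = 2 and c = 1
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# A degree-1 polynomial fit returns the slope and intercept [m, c]
m, c = np.polyfit(x, y, deg=1)

def f(x_new):
    """Predictive model built from the fitted slope and intercept."""
    return m * x_new + c

print(m, c, f(12.0))
```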
16
ethen8181.github.io
How to do data science?
Model validation techniques
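One common validation technique is k-fold cross-validation, sketched below with NumPy for a straight-line model. The helper name `k_fold_mse`, the synthetic data and the choice of k = 5 are assumptions for illustration.

```python
import numpy as np

def k_fold_mse(x, y, k=5, seed=0):
    """Return the held-out mean squared error of a straight-line fit for each of k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th, then evaluate on the i-th
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        m, c = np.polyfit(x[train_idx], y[train_idx], deg=1)
        residuals = y[test_idx] - (m * x[test_idx] + c)
        scores.append(np.mean(residuals ** 2))
    return np.array(scores)

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 100)
y = 3.0 * x - 0.5 + rng.normal(scale=0.1, size=x.size)

scores = k_fold_mse(x, y)
print(scores, scores.mean())
```

Each point is used for testing exactly once, so the average score uses all of the data without ever testing on points the model was fitted to.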
[Figure: training data and test data plotted as y versus x]
Goal: Build predictive model
How to do data science?
𝑓(𝑥) = 𝑚𝑥 + 𝑐
Machine learning models: Regression
18
G.A. Landrum, H. Genin, Journal of Solid State Chemistry 176 (2003) 587–593
Ferromagnetism in ordered binary transition metal alloys
Machine learning models: Classification
How to do data science?
19
G.A. Landrum, H. Genin, Journal of Solid State Chemistry 176 (2003) 587–593
Ferromagnetism in ordered binary transition metal alloys
Machine learning models: Classification
How to do data science?
20
G.A. Landrum, H. Genin, Journal of Solid State Chemistry 176 (2003) 587–593
Ferromagnetism in ordered binary transition metal alloys
Machine learning models: Classification
How to do data science?
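A classification model assigns each input to a class, for example ferromagnetic or not. The sketch below trains a logistic-regression classifier by gradient descent on synthetic two-class data. It does not use the alloy data from the cited paper, and logistic regression is an illustrative choice of classifier.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)

# Synthetic two-class data in two dimensions (not the alloy data)
X = np.vstack([
    rng.normal(-2.0, 1.0, size=(100, 2)),
    rng.normal(2.0, 1.0, size=(100, 2)),
])
y = np.concatenate([np.zeros(100), np.ones(100)])

# Append a column of ones so the last weight acts as the intercept
X_aug = np.hstack([X, np.ones((200, 1))])

# Gradient descent on the mean logistic loss
w = np.zeros(3)
for _ in range(500):
    grad = X_aug.T @ (sigmoid(X_aug @ w) - y) / len(y)
    w -= 0.1 * grad

# Classify by thresholding the predicted probability at 0.5
predicted = (sigmoid(X_aug @ w) >= 0.5).astype(int)
accuracy = np.mean(predicted == y)
print(accuracy)
```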
21
Overfitting
How to do data science?
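Overfitting can be illustrated by fitting polynomials of increasing degree to a small noisy training set and comparing the error on training data with the error on fresh test data. The sine-plus-noise ground truth and the degrees 1, 3 and 10 are assumptions chosen for this sketch.

```python
import numpy as np

rng = np.random.default_rng(4)

def make_data(n):
    # Assumed ground truth for this sketch: a sine curve plus Gaussian noise
    x = rng.uniform(-1.0, 1.0, size=n)
    return x, np.sin(np.pi * x) + rng.normal(scale=0.2, size=n)

x_train, y_train = make_data(15)
x_test, y_test = make_data(200)

def mse(coeffs, x, y):
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

# Store (training error, test error) for each polynomial degree
errors = {}
for degree in (1, 3, 10):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    errors[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(degree, errors[degree])
```

The training error can only go down as the degree increases, while the test error typically rises once the model starts fitting the noise.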
22
Regularization
o y = f(x) + 𝛆
o Constraints on coefficients of a model
o LASSO
• Linear regression with constraints: minimize ‖y − Xβ‖² + λ‖β‖₁ (L1 penalty on the coefficients β)
o Neural Networks
• Dropout
How to do data science?
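The effect of the LASSO constraint can be sketched with scikit-learn by comparing it with unconstrained linear regression. The data are synthetic, with only two of ten descriptors influencing the target, and the penalty strength `alpha=0.1` is an arbitrary choice.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(5)

# Synthetic data: 10 descriptors, only the first two influence the target
# (an assumption made for this sketch)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)  # alpha sets the strength of the L1 penalty

print("nonzero OLS coefficients:  ", np.count_nonzero(ols.coef_))
print("nonzero LASSO coefficients:", np.count_nonzero(lasso.coef_))
```

The L1 penalty drives the coefficients of uninformative descriptors to exactly zero, which is why LASSO is also used for descriptor selection.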
23
Hidden layer
Output layer
How to do data science?
Neural Networks
24
Hidden layer Output layer
Andrew Ng, cs229.stanford.edu/notes/
Neural Networks
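The hidden layer and output layer can be written as two matrix operations. The sketch below is a forward pass only, with random untrained weights; the layer sizes and the tanh activation are assumptions chosen for illustration.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """One hidden layer with tanh activation followed by a linear output layer."""
    hidden = np.tanh(x @ W1 + b1)  # hidden layer
    return hidden @ W2 + b2        # output layer

rng = np.random.default_rng(6)
n_inputs, n_hidden, n_outputs = 3, 4, 1  # layer sizes chosen arbitrarily

W1 = rng.normal(size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, n_outputs))
b2 = np.zeros(n_outputs)

# A batch of 5 input vectors gives 5 outputs
batch = rng.normal(size=(5, n_inputs))
output = forward(batch, W1, b1, W2, b2)
print(output.shape)
```

Training would adjust W1, b1, W2 and b2 to minimize a loss. Libraries such as TensorFlow, Keras and PyTorch automate that step.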
25
Neural Networks
Architectures
Perceptron
Feed Forward NN
Deep NN
Autoencoder
Recurrent NN
26
M. Mattheakis, P. Protopapas, D. Sondak, M. Di Giovanni, E. Kaxiras, arXiv:1904.08991
Neural Networks
Architectures that incorporate physical principles
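One simple way to build a physical principle into a network is to enforce a symmetry by construction. The sketch below symmetrizes the output of a small untrained network so that it is an even function of its input. This is an illustration of the general idea only and is not the architecture of the cited paper.

```python
import numpy as np

rng = np.random.default_rng(7)
W1, b1 = rng.normal(size=(1, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 1)), rng.normal(size=1)

def network(x):
    """Small feed-forward network with random, untrained weights."""
    return np.tanh(x @ W1 + b1) @ W2 + b2

def even_network(x):
    """Symmetrized output: equal at x and -x by construction."""
    return 0.5 * (network(x) + network(-x))

x = np.linspace(-2.0, 2.0, 9).reshape(-1, 1)
print(np.allclose(even_network(x), even_network(-x)))
```

Because the symmetry holds for any weights, the network does not have to learn it from data.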
27
A2B2X6 crystal structure
Transition metal trichalcogenides are magnetic 2D crystals
1. Sivadas et al., PRB 91, 235425 (2015)
2. C. Gong et al., Nature 546, 265 (2017)
o CrGeTe3 is a ferromagnet (FM) [1,2]
o CrSiTe3 is a zigzag antiferromagnet (zigzag-AFM) [1]
Machine Learning for Materials studies
A Case Study: Magnetic 2D crystals
Magnetic moment of A2B2X6
[Figure panels: X = Te, X = Se, X = S; axis: magnetic moment [𝜇B]]
T.D. Rhone et al., arXiv:1806.07989
Magnetic moment, X=Te
[Figure: predicted 𝝁 versus DFT 𝝁 for training data and test data]
Top 6 descriptors: var(# spin↑), var(# valence e’), maxdif(# valence e’), mean(polariz.), chem. space BoB, mean(# spin↑)
Machine learning predictions
T.D. Rhone et al., arXiv:1806.07989
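Descriptors such as var, mean and maxdif summarize a per-atom property over the atoms in a formula unit. The sketch below shows one plausible way to compute them. The function name and the input values are hypothetical, and the exact definitions used in the cited paper may differ.

```python
import numpy as np

def composition_descriptors(values):
    """Summarize a per-atom property over a formula unit as mean, variance and max difference."""
    values = np.asarray(values, dtype=float)
    return {
        "mean": values.mean(),
        "var": values.var(),
        "maxdif": values.max() - values.min(),
    }

# Hypothetical per-atom property values for a 10-atom A2B2X6 formula unit
example_values = [6, 6, 4, 4, 6, 6, 6, 6, 6, 6]
descriptors = composition_descriptors(example_values)
print(descriptors)
```

Each material then becomes a fixed-length vector of such numbers, which serves as the input to the machine learning model.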
Machine learning results
Magnetic moment Formation Energy
Who can do data science?
o Programmers (Python)
o Data wranglers
o Computer scientists
o Statisticians
o Physicists!!!
32
Resources
Data
o Kaggle
o Google’s dataset search
o Citrine datasets
o Materials Project
Self-learning
o Coursera
• Andrew Ng (Machine learning)
o Trevor Hastie (Intro to Statistical Learning)
o Citrine newsletter
Workshops
o IPAM @ UCLA
o IACS ComputeFest @ Harvard
Data science tools
o Jupyter notebook
o Scikit-learn
o TensorFlow
o Keras
o PyTorch
materials-intelligence.com
33
Outlook
o Growing interest in data science
o Google and Facebook seeking collaboration with physicists
o Create machine learning tools with physics principles
o AI for knowledge discovery
• Beyond ‘black box’ model predictions
• Use AI to understand physics
34
Data science resources
For additional resources please visit:
https://materials-intelligence.com/
35