1. A beginner’s guide to using data science for physicists
Trevor David Rhone
Department of Physics, Applied Physics and Astronomy, Rensselaer Polytechnic Institute
2. I Keep Six Honest Serving-Men
I keep six honest serving-men
(They taught me all I knew);
Their names are What and Why and When
And How and Where and Who.
- by Rudyard Kipling
3. What is Data Science?
[Diagram labels: Statistics; Computer Science; Digital Data; Knowledge base]
4. What is Data Science?
[Diagram labels: Data Science; Machine Learning; Statistics; Visualization; Databases; Data mining; AI]
5. Why do we care about data science?
o Netflix movie recommendations
o Targeted advertisements
o Self-driving cars
o Materials discovery
• Experiments are slow
• Calculations are expensive
• No analytical solution
o Uncover physical insights
6. The When and Where of data science
o Thales of Miletus, Ancient Greece (624 BC – 546 BC)
o Age of big data
• Data are accessible
• Data analytics tools are accessible
o Observational astronomy
o Bioinformatics
o Social media and targeted advertising
7. The essential guide
How to do data science?
1. Get data
o Kaggle
o Google dataset search
2. What are good descriptors?
o Mathematical representation of the data
o Domain knowledge
3. Data visualization
4. Model selection
5. Model validation
6. Model exploitation
Photo: https://commons.wikimedia.org/
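The six steps listed on this slide can be sketched end to end. This is a minimal illustration on synthetic data, not the workflow used in the talk: the quadratic ground truth, the noise level, the polynomial descriptors, and the 150/50 train/validation split are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Get data (synthetic here; a real study would load a dataset instead).
x = rng.uniform(-1.0, 1.0, size=200)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0.0, 0.1, size=x.size)

# 2-3. Descriptors are powers of x (assumed). Visualization, e.g. a scatter
# plot of y against x, is omitted to keep the sketch dependency-free.

# Hold out 50 of the 200 points for validation.
idx = rng.permutation(x.size)
train, val = idx[:150], idx[150:]

# 4-5. Model selection and validation: fit polynomials of increasing degree
# on the training set and score each one on the held-out set.
val_mse = {}
for degree in range(1, 6):
    coeffs = np.polyfit(x[train], y[train], degree)
    pred = np.polyval(coeffs, x[val])
    val_mse[degree] = float(np.mean((pred - y[val]) ** 2))

best_degree = min(val_mse, key=val_mse.get)

# 6. Model exploitation: predict the target at a new input.
best_coeffs = np.polyfit(x[train], y[train], best_degree)
y_new = float(np.polyval(best_coeffs, 0.5))

print("validation MSE by degree:", val_mse)
print("selected degree:", best_degree, "prediction at x=0.5:", y_new)
```

Scoring on held-out data rather than on the training set is what keeps the selection step from always preferring the most flexible model.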
9. Data science ~ data visualization + machine learning
How to do data science?
Goal: learn or quantify some relationship
y = f(x1, x2, …, xN) + 𝜺
o y: target property
o x1, x2, …, xN: inputs of machine learning model
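As a minimal sketch of this goal, suppose f is linear in three inputs and 𝜺 is Gaussian noise. Both are assumptions made for illustration, as are the coefficient values; the slide does not specify a form for f. Ordinary least squares then recovers the coefficients, and the residual spread estimates the size of 𝜺.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic inputs x1, x2, x3 and an assumed linear relationship f.
n_samples = 500
X = rng.normal(size=(n_samples, 3))
true_w = np.array([1.5, -2.0, 0.5])          # hypothetical coefficients
eps = rng.normal(0.0, 0.1, size=n_samples)   # noise term
y = X @ true_w + eps                         # target property

# Learn the relationship: least-squares estimate of the coefficients.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# The spread of the residuals estimates the noise level.
residual_std = float(np.std(y - X @ w_hat))

print("estimated coefficients:", w_hat)
print("residual standard deviation:", residual_std)
```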
10. How to do data science?
o What are good descriptors?
o Data visualization
A. Baldominos et al., Appl. Sci. 2018, 8(11), 2321
Housing Prices
11. How to do data science?
Supervised versus Unsupervised learning
[Figure: scatter plot of data points in the (x1, x2) plane]
13. How to do data science?
Supervised versus Unsupervised learning
[Figure: two scatter plots of data points in the (x1, x2) plane]
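The distinction can be illustrated on points in the (x1, x2) plane. In the sketch below, the supervised model sees the class labels and learns one centroid per class, while the unsupervised model (k-means with k = 2) sees only the coordinates. The two Gaussian clusters, their positions, and the choice of a nearest-centroid classifier and k-means are assumptions for illustration, not methods named in the talk.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two well-separated clusters in the (x1, x2) plane (synthetic data).
blob_a = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])
labels = np.array([0] * 50 + [1] * 50)


def predict(points, centroids):
    """Assign each point to the index of its nearest centroid."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


# Supervised: labels are known, so learn one centroid per labelled class.
class_centroids = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])
supervised_pred = predict(X, class_centroids)

# Unsupervised: labels are hidden. Run k-means with k = 2, starting from
# the points with the smallest and largest x1 (a simple deterministic choice).
centroids = X[[X[:, 0].argmin(), X[:, 0].argmax()]].copy()
for _ in range(10):
    assignment = predict(X, centroids)
    centroids = np.array([X[assignment == k].mean(axis=0) for k in (0, 1)])
cluster_pred = predict(X, centroids)

# Cluster indices are arbitrary, so compare up to a swap of the two names.
agreement = float((cluster_pred == labels).mean())
print("supervised accuracy:", float((supervised_pred == labels).mean()))
print("cluster/label agreement:", max(agreement, 1.0 - agreement))
```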
19. How to do data science?
Machine learning models: Classification
Ferromagnetism in ordered binary transition metal alloys
G.A. Landrum, H. Genin, Journal of Solid State Chemistry 176 (2003) 587–593
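A classification model maps descriptors to a discrete label, such as "ferromagnetic" or "not ferromagnetic". The sketch below trains a logistic regression by gradient descent on synthetic data. The two descriptors, the labelling rule, and the choice of logistic regression are all hypothetical and are not taken from the cited paper.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two hypothetical, standardized descriptors per sample (synthetic data).
n = 200
X = rng.normal(size=(n, 2))
# Synthetic labels: class 1 when a linear combination of descriptors is positive.
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)


def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + np.exp(-z))


# Logistic regression trained by gradient descent on the mean log-loss.
w = np.zeros(2)
b = 0.0
learning_rate = 0.5
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= learning_rate * (X.T @ (p - y) / n)
    b -= learning_rate * np.mean(p - y)

# Classify by thresholding the predicted probability at 0.5.
pred = (sigmoid(X @ w + b) >= 0.5).astype(float)
accuracy = float(np.mean(pred == y))
print("training accuracy:", accuracy)
```

The accuracy printed here is measured on the training data; a real study would report it on held-out samples.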
23. Regularization
o y = f(x) + 𝛆
o Constraints on coefficients of a model
o LASSO
• Linear regression with constraints:
o Neural Networks
• Dropout
How to do data science?
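In its standard penalized form, LASSO minimizes (1/(2n)) ||y - Xw||^2 + alpha ||w||_1, where the L1 term pushes the coefficients of uninformative descriptors to exactly zero. The sketch below solves this with proximal gradient descent (ISTA). The solver choice, the value of alpha, and the synthetic data with three informative descriptors out of ten are assumptions for illustration; the formula shown on the original slide was not recoverable.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of the L1 norm: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_ista(X, y, alpha, n_iter=500):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 with ISTA."""
    n = X.shape[0]
    # Step size 1/L, where L is the Lipschitz constant of the smooth part.
    step = n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * alpha)
    return w


rng = np.random.default_rng(4)

# Synthetic data: only the first 3 of 10 descriptors influence the target.
X = rng.normal(size=(200, 10))
true_w = np.zeros(10)
true_w[:3] = [3.0, -2.0, 1.5]
y = X @ true_w + rng.normal(0.0, 0.1, size=200)

w_lasso = lasso_ista(X, y, alpha=0.1)
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)  # unregularized, for comparison

print("nonzero LASSO coefficients:", np.count_nonzero(w_lasso))
print("nonzero least-squares coefficients:", np.count_nonzero(w_ols))
```

The surviving coefficients are shrunk slightly toward zero, which is the price paid for the sparsity.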
27. Neural Networks
Architectures that incorporate physical principles
M. Mattheakis, P. Protopapas, D. Sondak, M. Di Giovanni, E. Kaxiras, arXiv:1904.08991
28. Machine Learning for Materials studies
A Case Study: Magnetic 2D crystals
Transition metal trichalcogenides are magnetic 2D crystals
A2B2X6 crystal structure
o CrGeTe3 is a ferromagnet (FM) [1,2]
o CrSiTe3 is a zigzag antiferromagnet (zigzag-AFM) [1]
[1] Sivadas et al., PRB 91 235425 (2015)
[2] C. Gong et al., Nature 546, 265 (2017)
29. Magnetic moment of A2B2X6
[Figure: magnetic moment [𝜇B]; panels for X = Te, X = Se, X = S]
T.D. Rhone et al., arXiv:1806.07989
30. Machine learning predictions
Magnetic moment, X=Te
[Figure: DFT 𝝁 and predicted 𝝁, for training data and test data]
Top 6 descriptors: var(# spin↑), var(# valence e’), maxdif(# valence e’), mean(polariz.), chem. space BoB, mean(# spin↑)
T.D. Rhone et al., arXiv:1806.07989
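Descriptor names such as var(# valence e’) and maxdif(# valence e’) suggest statistics of an atomic property taken over the atoms in a formula unit. The sketch below computes such statistics for Cr2Ge2Te6 under that reading. The exact definitions used in the cited work, the helper name `composition_descriptors`, and the valence electron counting convention are assumptions made for illustration.

```python
import numpy as np

# Valence electron counts per element (a common counting convention,
# assumed here for illustration).
VALENCE_ELECTRONS = {"Cr": 6, "Ge": 4, "Si": 4, "Te": 6}


def composition_descriptors(formula_counts, atomic_property):
    """Mean, variance, and max difference of a property over all atoms.

    formula_counts maps element symbol -> number of atoms per formula unit.
    """
    values = np.array(
        [atomic_property[el] for el, count in formula_counts.items() for _ in range(count)],
        dtype=float,
    )
    return {
        "mean": float(values.mean()),
        "var": float(values.var()),  # population variance over the atoms
        "maxdif": float(values.max() - values.min()),
    }


# A2B2X6 with A = Cr, B = Ge, X = Te.
cr2ge2te6 = {"Cr": 2, "Ge": 2, "Te": 6}
desc = composition_descriptors(cr2ge2te6, VALENCE_ELECTRONS)
print(desc)
```

Each such number becomes one input x_i of the machine learning model, so a material is represented by a fixed-length vector regardless of which elements occupy the A, B, and X sites.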
32. Who can do data science?
o Programmers (Python)
o Data wranglers
o Computer scientists
o Statisticians
o Physicists!!!
33. Resources
Data
o Kaggle
o Google’s dataset search
o Citrine datasets
o Materials project
Self-learning
o Coursera
• Andrew Ng (Machine Learning)
o Trevor Hastie (Intro to Statistical Learning)
o Citrine newsletter
Workshops
o IPAM @ UCLA
o IACS computeFest @ Harvard
Data science tools
o Jupyter notebook
o Scikit-learn
o TensorFlow
o Keras
o PyTorch
materials-intelligence.com
34. Outlook
o Growing interest in data science
o Google and Facebook seeking collaboration with physicists
o Create machine learning tools with physics principles
o AI for knowledge discovery
• Beyond ‘black box’ model predictions
• Use AI to understand physics