Data Analysis, Statistics, Machine Learning
Leland Wilkinson
Adjunct Professor
UIC Computer Science
Chief Scienst
H2O.ai
leland.wilkinson@gmail.com
2
Data Analysis
o What is data analysis?
o Summaries of batches of data
o Methods for discovering paerns in data
o Methods for visualizing data
o Benefits
o Data analysis helps us support supposions
o Data analysis helps us discredit false explanaons
o Data analysis helps us generate new ideas to invesgate
http://blog.martinbellander.com/post/115411125748/the-colors-of-paintings-blue-is-the-new-orange
Copyright © 2016 Leland Wilkinson
3
Stascs
o What is (are) stascs?
o Summaries of samples from populaons
o Methods for analyzing samples
o Making inferences based on samples
o Benefits
o Stascs help us avoid false conclusions when evaluang evidence
o Stascs protect us from being fooled by randomness
o Stascs help us find paerns in nonrandom events
o Stascs quanfy risk
o Stascs counteract ingrained bias in human judgment
o Stascal models are understandable by humans
http://www.bmj.com/content/342/bmj.d671
Copyright © 2016 Leland Wilkinson
4
Machine Learning
o What is machine learning?
o Data mining systems
o Discover paerns in data
o Learning systems
o Adapt models over me
o Benefits
o ML helps to predict outcomes
o ML oen outperforms tradional stascal predicon methods
o ML models do not need to be understood by humans
o Most ML results are unintelligible (the excepons prove the rule)
o ML people care about the quality of a predicon, not the meaning of the result
o ML is hot (Deep Learning!, Big Data!)
http://swift.cmbi.ru.nl/teach/B2/bioinf_24.html
Copyright © 2016 Leland Wilkinson
5
Course Outline
1. Introducon
2. Data
3. Visualizing
4. Exploring
5. Summarizing
6. Distribuons
7. Inference
8. Predicng
9. Smoothing
10. Time Series
11. Comparing
12. Reducing
13. Grouping
14. Learning
15. Anomalies
16. Analyzing
Copyright © 2016 Leland Wilkinson

data science introduction sGDADGSAsghja.pdf

  • 1.
    Data Analysis, Statistics,Machine Learning Leland Wilkinson Adjunct Professor UIC Computer Science Chief Scienst H2O.ai leland.wilkinson@gmail.com
  • 2.
    2 Data Analysis o Whatis data analysis? o Summaries of batches of data o Methods for discovering paerns in data o Methods for visualizing data o Benefits o Data analysis helps us support supposions o Data analysis helps us discredit false explanaons o Data analysis helps us generate new ideas to invesgate http://blog.martinbellander.com/post/115411125748/the-colors-of-paintings-blue-is-the-new-orange Copyright © 2016 Leland Wilkinson
  • 3.
    3 Stascs o What is(are) stascs? o Summaries of samples from populaons o Methods for analyzing samples o Making inferences based on samples o Benefits o Stascs help us avoid false conclusions when evaluang evidence o Stascs protect us from being fooled by randomness o Stascs help us find paerns in nonrandom events o Stascs quanfy risk o Stascs counteract ingrained bias in human judgment o Stascal models are understandable by humans http://www.bmj.com/content/342/bmj.d671 Copyright © 2016 Leland Wilkinson
  • 4.
    4 Machine Learning o Whatis machine learning? o Data mining systems o Discover paerns in data o Learning systems o Adapt models over me o Benefits o ML helps to predict outcomes o ML oen outperforms tradional stascal predicon methods o ML models do not need to be understood by humans o Most ML results are unintelligible (the excepons prove the rule) o ML people care about the quality of a predicon, not the meaning of the result o ML is hot (Deep Learning!, Big Data!) http://swift.cmbi.ru.nl/teach/B2/bioinf_24.html Copyright © 2016 Leland Wilkinson
  • 5.
    5 Course Outline 1. Introducon 2.Data 3. Visualizing 4. Exploring 5. Summarizing 6. Distribuons 7. Inference 8. Predicng 9. Smoothing 10. Time Series 11. Comparing 12. Reducing 13. Grouping 14. Learning 15. Anomalies 16. Analyzing Copyright © 2016 Leland Wilkinson