
Numerical Relativity as preparation for Industrial Data Science: a personal perspective


Invited talk presented by Ken Smith, CIO/CTO of Applied Technical Systems, at the American Physical Society's 2014 April Meeting in Savannah, GA.

Abstract:
Much of the conversation in commercial enterprises these days revolves around industry buzzwords such as Big Data, Data Science, and being Data Driven. Beyond the hype surrounding these terms, there is a real, continuously growing movement for organizations to make better use of their data assets to inform decisions, strategy, and policy. This push is not unique to the commercial sector; governmental and academic organizations are also embracing such initiatives. The skills required to staff a Data Science project typically come from a number of disciplines, ranging from computer science, statistics, and modeling and simulation to information technology, but the emerging wisdom in the community is that the rigor and discipline of a scientific background often make for the best data scientists. In this talk, I will offer a personal perspective on making the transition from a career in computational physics (specifically Numerical Relativity) to a career in industry, where I have focused on helping organizations make more informed decisions through better access to, and analysis of, the data at their disposal. I will identify the skills and training that carry over from a background in physics, discuss the gaps in that preparation, hypothesize about where this industry is headed, and offer a frank look at life outside of academia.

Published in: Technology, Education


  1. Numerical Relativity as preparation for Industrial Data Science: a personal perspective. Ken Smith, CIO/CTO. APS April Meeting, 2014-04-06
  2. Overview: Who am I? What is data science? Why is it a viable (maybe even desirable) career option for physicists? How do you get started? Note: all image attributions appear at the end of the slide deck.
  3. Who am I? [Timeline figure, 2002 to 2014. Role labels: grad student, lecturer, sr. scientist, CIO, sr. scientist, architect. Area labels: physics education, numerical relativity / astrophysics, machine learning, natural language processing, software architecture.]
  4. Selected projects
     • Automatically categorizing text documents into topics based solely on content
     • Improving entity (person, location, organization) extraction techniques for large bodies of text within the US Army
     • Developing new tools for US Patent Examiners within the USPTO
     • Modeling and linking disparate datasets associated with supply & maintenance of US Navy systems
     • Designing systems to organize and visualize the skills mix of employees within a company
  5. WHAT IS DATA SCIENCE? SKILLS, TRENDS, ACTIVITIES
  6. The sexiest job? “I keep saying the sexy job in the next ten years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s? The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be a hugely important skill in the next decades.” Hal Varian, Chief Economist, Google, January 2009. http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1 http://www.mckinsey.com/insights/innovation/hal_varian_on_how_the_web_challenges_managers
  7. Data Science Skills & Disciplines. http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
  8. Data Science post-Prism. http://joelgrus.com/2013/06/09/post-prism-data-science-venn-diagram
  9. Trends: Data Storage. IBM 350 in 1956: 3.75 MB; 6.4 kB/s data transfer; fifty 24-inch-diameter disk platters; > 1 ton; leased for $3200/mo. http://old-photos.blogspot.com/2011/06/hard-drive.html
  10. Trends: Data Storage. http://www.mkomo.com/cost-per-gigabyte-update
  11. Trends: Open Source Software. https://github.com/blog/1724-10-million-repositories
  12. Trends: Quantized Self. The 2012 Feltron Report. http://feltron.com/ar12_02.html
  13. Trends: Quantized Self. The 2012 Feltron Report. http://feltron.com/ar12_02.html
  14. Trends: Quantized Self & Ubiquitous Sensors
  15. Trends: Digital Exhaust
  16. Trends: Targeted Marketing. A father walks into a Minneapolis Target store: “My daughter got this in the mail!” he said. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?” The manager apologizes and calls back a few days later to apologize again. “I had a talk with my daughter,” the father said. “It turns out there’s been some activities in my house I haven’t been completely aware of. She’s due in August. I owe you an apology.” Data mining determined a set of signals that a pregnant shopper may be getting near to her due date:
     • larger quantities of unscented lotion
     • supplements like calcium, magnesium and zinc
     • scent-free soap
     • extra-big bags of cotton balls
     • hand sanitizers
     • washcloths
     http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html
  17. What data scientists do: “What differentiates data science from statistics is that data science is a holistic approach. We’re increasingly finding data in the wild, and data scientists are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others.” http://www.oreilly.com/data/free/what-is-data-science.csp
  18. What does a data scientist do? http://strata.oreilly.com/2013/09/data-analysis-just-one-component-of-the-data-science-workflow.html
  19. WHY IS DATA SCIENCE VIABLE FOR PHYSICISTS?
  20. Where do data scientists come from? “People often assume that data scientists need a background in computer science. In my experience, that hasn’t been the case: my best data scientists have come from very different backgrounds. The inventor of LinkedIn’s People You May Know was an experimental physicist. A computational chemist on my decision sciences team had solved a 100-year-old problem on energy states of water. An oceanographer made major impacts on the way we identify fraud. Perhaps most surprising was the neurosurgeon who turned out to be a wizard at identifying rich underlying trends in the data.” DJ Patil, former Chief Scientist for LinkedIn. http://radar.oreilly.com/2011/09/building-data-science-teams.html
  21. Insight Data Science Fellows: an intensive six-week post-doctoral training fellowship bridging the gap between academia and data science. http://insightdatascience.com/
  22. Projected Data Science Demand. https://www.mckinsey.com/~/media/McKinsey/dotcom/Insights%20and%20pubs/MGI/Research/Technology%20and%20Innovation/Big%20Data/MGI_big_data_exec_summary.ashx
  23. Recent NSF data on employment at PhD award. http://www.nsf.gov/statistics/sed/digest/2012/
  24. AIP Physics Career Statistics. http://aip.org/statistics/data-graphics/physics-phds-starting-salaries-classes-2009-2010 http://aip.org/statistics/physics-trends/physics-phds-1-year-later
  25. Physics prep for Data Science (warning: gross generalizations)
     What you have:
     • Analytical/problem-solving mindset
     • Presentation skills (oral, written, & graphical)
     • Mathematical preparation
     • Curiosity
     • Understanding that reference frames can only ever be local
     What you are missing:
     • Sufficient training in statistics: regression beyond linear, classification techniques, machine learning
     • SQL (Database)
     • Information Visualization (psychology of design)
     • Business/Finance acumen
  26. Curriculum Recommendations: Introduce statistical analysis techniques into the graduate (possibly undergraduate) core physics curriculum. Make computer science courses available in high school. The ability to program is becoming a foundational skill along with reading, writing, and arithmetic. http://www.amazon.com/Mathematical-Methods-Physicists-Fourth-Edition/dp/0120598159 http://csedweek.org/promote
  27. HOW DO YOU GET STARTED?
  28. http://nirvacana.com/thoughts/becoming-a-data-scientist/
  29. Resources available
     • Insight Data Science Fellows Program: http://insightdatascience.com/
     • Coursera: Stanford Machine Learning: https://www.coursera.org/course/ml
     • Coursera: U. Washington Intro to Data Science: https://www.coursera.org/course/datasci
     • Coursera: Princeton Algorithms Part I: https://www.coursera.org/course/algs4partI
     • General Assembly Data Science: https://generalassemb.ly/education/data-science
  30. Learn and compete! “Kaggle is the world's largest community of data scientists. They compete with each other to solve complex data science problems, and the top competitors are invited to work on the most interesting and sensitive business problems from some of the world’s biggest companies through Masters competitions.” www.kaggle.com/about
  31. Thanks! Twitter: @Ken_2scientists http://www.atsid.com http://slidesha.re/1idf43d
  32. Image Sources (by slide)
     • Slide 7: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
     • Slide 8: http://joelgrus.com/2013/06/09/post-prism-data-science-venn-diagram
     • Slide 9: http://old-photos.blogspot.com/2011/06/hard-drive.html
     • Slide 10: http://www.mkomo.com/cost-per-gigabyte-update
     • Slide 11: https://github.com/blog/1724-10-million-repositories
     • Slides 12, 13: http://feltron.com/ar12_02.html
     • Slide 14: http://www.fitbit.com
     • Slide 15: https://chrome.google.com/webstore/detail/collusion-for-chrome/ganlifbpkcplnldliibcbegplfmcfigp
     • Slide 18: http://strata.oreilly.com/2013/09/data-analysis-just-one-component-of-the-data-science-workflow.html
     • Slide 21: http://insightdatascience.com/
  33. Image Sources (by slide, continued)
     • Slide 22: https://www.mckinsey.com/~/media/McKinsey/dotcom/Insights%20and%20pubs/MGI/Research/Technology%20and%20Innovation/Big%20Data/MGI_big_data_exec_summary.ashx
     • Slide 23: http://www.nsf.gov/statistics/sed/digest/2012/
     • Slide 24: http://aip.org/statistics/data-graphics/physics-phds-starting-salaries-classes-2009-2010 http://aip.org/statistics/physics-trends/physics-phds-1-year-later
     • Slide 26: http://www.amazon.com/Mathematical-Methods-Physicists-Fourth-Edition/dp/0120598159 http://csedweek.org/promote
     • Slide 28: http://nirvacana.com/thoughts/becoming-a-data-scientist/
     • Slide 30: http://www.kaggle.com/competitions
