Dark Data: A Data Scientist's Exploration of the Unknown by Rob Witoff, PyData SV 2014

Modern Data Science is enabling NASA's engineers to uncover actionable information from our "dark" data coffers. From starting small to operating at scale, Rob will discuss applications in telemetry, workforce analytics and liberating data from the Mars Rovers. Tools include IPython, Pandas, Boto and more.

Transcript

  • 1. Dark Data: A Data Scientist’s Exploration of the Unknown. Rob Witoff, Data Scientist, IT CTO Office, @rwitoff. Jet Propulsion Laboratory, California Institute of Technology
  • 2. NASA Explores our Universe
  • 3. Exploration brings home data …a lot of it
  • 4. This talk is about how we’re enabling _ to uncover and act on our Dark Data
  • 5. The Situation
  • 6. Goldstone, USA; Madrid, Spain; Canberra, Australia
  • 7. Diagram labels: Amazon SWF; Decider; Data Processing Tasks; File Transfer Tasks; Decision Tasks; Data Processing Workers; EC2 (x4); S3; JPL Data Center; Decider; File Transfer Workers; Data Processing Workers; Polyphony; Cloud Computing
  • 9. 1967
  • 10. Today
  • 11. 209 documented Data Sources
  • 12. Dark Matter: 84.5% of the known Universe
  • 13. Dark Data: ??% of our universe
  • 14. Dark Data is the data all around us that we can’t use today.
  • 15. can’t use: find, access, interpret, understand, share, re-interface, process, store, question, experiment
  • 16. can’t imagine
  • 17. “If this species is to survive indefinitely we need to become a multi-planet species. We need to go to Mars, and Mars is a stepping stone to other solar systems.” -NASA Administrator Charlie Bolden
  • 18. Curiosity: Successful Landing
  • 19. Mars 2020
  • 20. Mars Sample Return
  • 21. And Beyond!
  • 22. Towards a solution
  • 23. “After exploring the universe for decades, what unfound discoveries lie in our data?”
  • 24. The Data Scientist
  • 25. The Data Scientist Hypothesize Experiment Explore
  • 26. http://www.cgtrader.com/3d-models/character-people/man/nasa-astronaut-apollo Tools: Software & Libs; World Watching: Open Source; Gloves: Interpreted Languages; Camera: Data Viz; Mission Control: Active Communities; Vehicles: Scalable Storage & Compute
  • 27. One: Liberate Dark Data. Two: Enable Engineers.
  • 28. Liberate
  • 29. Diagram labels: JPL Data Center; Decider; File Transfer Workers; Data Processing Workers; Polyphony; Amazon SWF; Decider; Data Processing Tasks; File Transfer Tasks; Decision Tasks; Data Processing Workers; EC2 (x4); S3; Cloud Computing
  • 30. http://json.jpl.nasa.gov
  • 31. Closed Silos
  • 32. How Do you Liberate New Data?
  • 33. Greedy Solution: Explore For Data. Wise Solution: Explore For Problems. Best Solution: Explore For Questions.
  • 34. Greedy Solution: Explore For Data. Find APIs!
  • 35. Greedy Solution: Explore For Data. Explore, Find, Get Excited! Find APIs! Miss the point.
  • 36. Greedy Solution: Explore For Data. Explore, Find :-( Data is a means, not the end.
  • 37. Wise Solution: Explore For Problems. Explore, Integrate, Increment! Incremental Successes!
  • 38. Wise Solution: Explore For Problems. Explore, Integrate :-/ Incremental Expectations.
  • 39. Best Solution: Explore For Questions. Explore, Rapid Prototype Like it’s 2014, Reset Expectations.
  • 40. Hacker News: Heatmaps! Humanity!
  • 41. DataTau ML! Optimization! Hum anL! Lessons! Hum anL! Markov Chains! NoSQL! Sports!
  • 42. Reddit!
  • 43. Know What’s Out There.
  • 44. Solve our Problems Together.
  • 45. Best Solution: Explore For Questions
  • 46. Data: What can we do with this brain data?
  • 47. Data, Problem: How healthy is my lobe?
  • 48. Data, Problem, Question: What if we could see the brain evolve?
  • 49. Simple Sankey Flow Diagram
  • 50. Added more data.
  • 51. Added dimensions.
  • 52. Data, Problem, Idea: What about the rest of our brain data?
  • 53. What about this? Or this?
  • 54. What about this? Or this? ?
  • 55. People ? Projects
  • 56. People Projects ? Investments Projects People Engineering Science ? Exploration
  • 57. ? Investments Projects People Engineering Science Exploration Calendars Degrees Orgs HR Helpdesk Finance Sentiment? Resumes Recruiting ? Confer
  • 58. ? Engineering Science Exploration Calendars Degrees HR Helpdesk Finance Sentiment? Resumes Recruiting ? Confer Investments Projects People Orgs
  • 59. ? Investments Helpdesk Finance Sentiment? Resumes Recruiting People Engineering Science Exploration Degrees Confer HR Projects Orgs Calendars
  • 60. ? Investments Helpdesk Finance Sentiment? Resumes Recruiting People Engineering Science Exploration Degrees Confer HR Projects Orgs Calendars. Connect Your Dots.
  • 61. Enable
  • 62. Expertise Data Expertise Data
  • 63. Expertise Data
  • 64. Python + REPL Remote Browser
  • 65. AWESOME + Python REPL Remote Browser
  • 66. https://github.com/ipython/nbviewer
  • 67. https://github.com/ipython/nbviewer
  • 68. “Human Problems Won’t be Solved by Root Mean Square Error” -Drew Conway
  • 69. Engage
  • 70. Through Visualization
  • 71. Engage before you Answer
  • 72. http://xkcd.com/1356/
  • 73. Pandas, Vincent, Vega, D3
  • 74. Outside the Notebook
  • 75. 12k Interesting Files; 12k Documents; Dynamo; Results; ReST
  • 76. Making a Difference
  • 77. Diagram labels: Amazon SWF; Decider; Data Processing Tasks; File Transfer Tasks; Decision Tasks; Data Processing Workers; EC2 (x4); S3; JPL Data Center; Decider; File Transfer Workers; Data Processing Workers; Polyphony
  • 78. Q: What if we could interact with ALL of our Data?
  • 79. Q: What if we were even closer to our data?
  • 80. http://json.jpl.nasa.gov
  • 81. Liberate your Dark Data. Enable your Engineers. Let’s Grow Data Science Together.
  • 82. Thank you! @rwitoff
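
Slides 49 to 51 mention a simple Sankey flow diagram that later gained more data and more dimensions, but the transcript does not show how it was built. The sketch below is one possible way to prepare the input such a diagram needs: a count of how many items flow from each source node to each target node. Everything in it is an assumption for illustration, not the speaker's method: the sample records, the (person, year, project) layout, and the function name `sankey_links` are all hypothetical, and only the Python standard library is used.

```python
from collections import Counter

# Hypothetical sample records: (person, year, project).
# Invented for illustration; not data from the talk.
records = [
    ("ana", 2012, "Rover Ops"),
    ("ana", 2013, "Rover Ops"),
    ("ben", 2012, "Rover Ops"),
    ("ben", 2013, "Telemetry"),
    ("cy", 2012, "Telemetry"),
    ("cy", 2013, "Telemetry"),
    ("cy", 2014, "Data Science"),
    ("ben", 2014, "Data Science"),
]


def sankey_links(rows):
    """Count moves between consecutive observations of each person.

    Returns a Counter mapping ((year, project), (next_year, next_project))
    to the number of people who made that move. Each key is one Sankey
    link and each count is that link's width.
    """
    # Group each person's observations together.
    by_person = {}
    for person, year, project in rows:
        by_person.setdefault(person, []).append((year, project))

    links = Counter()
    for history in by_person.values():
        history.sort()  # order each person's observations by year
        # Pair every observation with the one that follows it.
        for source, target in zip(history, history[1:]):
            links[(source, target)] += 1
    return links


links = sankey_links(records)
for (source, target), value in sorted(links.items()):
    print(source, "->", target, value)
```

Each printed line is one link. A plotting layer such as the D3 or Vega tools named on slide 73 would take these source, target and value triples as input; that step is left out here because the transcript does not say which one was used.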