Dark Data: A Data Scientist’s Exploration of the Unknown by Rob Witoff, PyData SV 2014

Modern data science is enabling NASA's engineers to uncover actionable information from our "dark" data coffers. From starting small to operating at scale, Rob will discuss applications in telemetry, workforce analytics and liberating data from the Mars rovers. Tools include IPython, Pandas, Boto and more.

  1. Dark Data: A Data Scientist’s Exploration of the Unknown. Rob Witoff, Data Scientist, IT CTO Office (@rwitoff), Jet Propulsion Laboratory, California Institute of Technology
  2. NASA Explores our Universe
  3. Exploration brings home data… a lot of it
  4. This talk is about how we’re enabling _ to uncover and act on our Dark Data
  5. The Situation
  6. Goldstone, USA; Madrid, Spain; Canberra, Australia
  7. [Architecture diagram] Amazon SWF: Decider, Decision Tasks, File Transfer Tasks, Data Processing Tasks; Data Processing Workers on EC2 (×4); S3; JPL Data Center: Decider, File Transfer Workers, Data Processing Workers; Polyphony; Cloud Computing
  8. [Images only]
  9. 1967
  10. Today
  11. 209 Data Sources documented
  12. Dark Matter: 84.5% of the matter in the known Universe
  13. Dark Data: ?? % of our universe
  14. Dark Data is the data all around us that we can’t use today.
  15. Can’t find, access, interpret, understand, share, re-interface, process, store, question, experiment: can’t use
  16. Can’t imagine
  17. “If this species is to survive indefinitely we need to become a multi-planet species. We need to go to Mars, and Mars is a stepping stone to other solar systems.” - NASA Administrator Charlie Bolden
  18. Curiosity: Successful Landing
  19. Mars 2020
  20. Mars Sample Return
  21. And Beyond!
  22. Towards a solution
  23. “After exploring the universe for decades, what unfound discoveries lie in our data?”
  24. The Data Scientist
  25. The Data Scientist: Hypothesize, Experiment, Explore
  26. Tools → Software & Libs; World Watching → Open Source; Gloves → Interpreted Languages; Camera → Data Viz; Mission Control → Active Communities; Vehicles → Scalable Storage & Compute (astronaut image: http://www.cgtrader.com/3d-models/character-people/man/nasa-astronaut-apollo)
  27. One: Liberate Dark Data. Two: Enable Engineers
  28. Liberate
  29. [Architecture diagram] JPL Data Center: Decider, File Transfer Workers, Data Processing Workers; Polyphony; Amazon SWF: Decider, Decision Tasks, File Transfer Tasks, Data Processing Tasks; Data Processing Workers on EC2 (×4); S3; Cloud Computing
  30. http://json.jpl.nasa.gov
  31. Closed Silos
  32. How Do You Liberate New Data?
  33. Greedy Solution: Explore For Data. Wise Solution: Explore For Problems. Best Solution: Explore For Questions
  34. Greedy Solution: Explore For Data. Find APIs!
  35. Greedy Solution: Explore For Data. Explore, Find, Get Excited! Find APIs! ...and miss the point
  36. Greedy Solution: Explore For Data. Explore, Find :-( Data is a means, not the end
  37. Wise Solution: Explore For Problems. Explore, Integrate, Increment! Incremental Successes!
  38. Wise Solution: Explore For Problems. Explore, Integrate :-/ Incremental Expectations.
  39. Best Solution: Explore For Questions. Explore, Rapid Prototype Like it’s 2014, Reset Expectations
  40. Hacker News: Heatmaps! Humanity!
  41. DataTau: ML! Optimization! Hum anL! Lessons! Hum anL! Markov Chains! NoSQL! Sports!
  42. Reddit!
  43. Know What’s Out There.
  44. Solve our Problems Together.
  45. Best Solution: Explore For Questions
  46. Data: What can we do with this brain data?
  47. Data → Problem: How healthy is my lobe?
  48. Data → Problem → Question: What if we could see the brain evolve?
  49. Simple Sankey Flow Diagram
  50. Added more data.
  51. Added dimensions.
  52. Data → Problem → Idea: What about the rest of our brain data?
  53. What about this? Or this?
  54. What about this? Or this? ?
  55. People ? Projects
  56. People Projects ? Investments Projects People Engineering Science ? Exploration
  57. ? Investments Projects People Engineering Science Exploration Calendars Degrees Orgs HR Helpdesk Finance Sentiment? Resumes Recruiting ? Confer
  58. ? Engineering Science Exploration Calendars Degrees HR Helpdesk Finance Sentiment? Resumes Recruiting ? Confer Investments Projects People Orgs
  59. ? Investments Helpdesk Finance Sentiment? Resumes Recruiting People Engineering Science Exploration Degrees Confer HR Projects Orgs Calendars
  60. ? Investments Helpdesk Finance Sentiment? Resumes Recruiting People Engineering Science Exploration Degrees Confer HR Projects Orgs Calendars. Connect Your Dots
  61. Enable
  62. Expertise Data Expertise Data
  63. Expertise Data
  64. Python + REPL + Remote + Browser
  65. AWESOME: Python + REPL + Remote + Browser
  66. https://github.com/ipython/nbviewer
  67. https://github.com/ipython/nbviewer
  68. “Human Problems Won’t be Solved by Root Mean Square Error” - Drew Conway
  69. Engage
  70. Through Visualization
  71. Engage before you Answer
  72. http://xkcd.com/1356/
  73. Pandas! Vincent, Vega, D3
  74. Outside the Notebook
  75. 12k Interesting Files, 12k Documents, Dynamo Results, ReST
  76. Making a Difference
  77. [Architecture diagram] Amazon SWF: Decider, Decision Tasks, File Transfer Tasks, Data Processing Tasks; Data Processing Workers on EC2 (×4); S3; JPL Data Center: Decider, File Transfer Workers, Data Processing Workers; Polyphony
  78. Q: What if we could interact with ALL of our Data?
  79. Q: What if we were even closer to our data?
  80. http://json.jpl.nasa.gov
  81. Liberate your Dark Data. Enable your Engineers. Let’s Grow Data Science Together
  82. Thank you! @rwitoff
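Slides 7, 29 and 77 appear to show an Amazon SWF decider handing decision, file-transfer and data-processing tasks to workers in the JPL data center and on EC2, with S3 as storage. The sketch below is a hypothetical, local, in-memory simulation of that decider/worker split. It uses only the Python standard library and does not call the SWF API, Boto, or Polyphony. The task kinds, the item names, and the two-step policy (transfer, then process) are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    kind: str     # hypothetical task types: "transfer" or "process"
    payload: str  # hypothetical data-product name


@dataclass
class Workflow:
    items: list                                   # data products to handle
    history: list = field(default_factory=list)  # completed (kind, item) events


def decide(wf):
    """Decider: read the event history and schedule the next task per item.

    Assumed policy: each item is transferred first, then processed.
    """
    done = set(wf.history)
    tasks = []
    for item in wf.items:
        if ("transfer", item) not in done:
            tasks.append(Task("transfer", item))
        elif ("process", item) not in done:
            tasks.append(Task("process", item))
    return tasks


def transfer_worker(task):
    # Stand-in for copying a file between storage locations.
    return ("transfer", task.payload)


def process_worker(task):
    # Stand-in for a data-processing step.
    return ("process", task.payload)


WORKERS = {"transfer": transfer_worker, "process": process_worker}


def run(wf):
    """Alternate decider calls and worker rounds until nothing is left to schedule."""
    while True:
        queue = deque(decide(wf))
        if not queue:
            return wf.history
        while queue:
            task = queue.popleft()
            # Each worker reports a completion event back into the history.
            wf.history.append(WORKERS[task.kind](task))


history = run(Workflow(items=["image_001", "image_002"]))
print(history)
```

The decider only reads history and emits tasks, and the workers hold no workflow state. This separation is what lets workers run on premises or in the cloud. In SWF itself the service stores the history and task lists, so they do not live in a Python object.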
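Slides 46 to 51 move from a question ("What if we could see the brain evolve?") to a simple Sankey flow diagram. Sankey renderers usually take a list of (source, target, value) links. The sketch below builds such links by counting how entities move between categories across two snapshots. The slides do not say what the underlying data was. The ids, the categories ("healthy", "at risk"), and the function name are all hypothetical.

```python
from collections import Counter


def sankey_links(before, after):
    """Turn two snapshots into (source, target, value) links for a Sankey diagram.

    before/after map an entity id to its category at two points in time.
    Only ids present in both snapshots are counted.
    Node names get a "(t0)"/"(t1)" suffix so the same category appears
    as a separate node on each side of the diagram.
    """
    flows = Counter(
        (f"{before[k]} (t0)", f"{after[k]} (t1)")
        for k in before.keys() & after.keys()
    )
    # Largest flows first; ties broken alphabetically for stable output.
    return sorted(
        ((src, dst, n) for (src, dst), n in flows.items()),
        key=lambda link: (-link[2], link[0], link[1]),
    )


# Hypothetical snapshots: ids and categories are invented for illustration.
t0 = {"a": "healthy", "b": "healthy", "c": "at risk", "d": "at risk"}
t1 = {"a": "healthy", "b": "at risk", "c": "healthy", "d": "healthy", "e": "healthy"}

links = sankey_links(t0, t1)
for link in links:
    print(link)
```

Adding further snapshots extends the diagram with more columns of links. The triples can then be passed to whichever Sankey renderer is at hand. The talk mentions Vincent, Vega and D3 for visualization.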
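Slides 55 to 60 ("Connect Your Dots") link People, Projects and Investments through sources such as HR, Calendars and Finance. A minimal way to connect such dots is a chain of key-based joins. The sketch below assumes three hypothetical tables: person names, person-to-project assignments, and per-project funding. The field layouts, names and amounts are invented for illustration.

```python
def connect(people, assignments, funding):
    """Chain two key-based joins: person -> project -> funding.

    people:      {person_id: name}
    assignments: [(person_id, project_id), ...]
    funding:     {project_id: amount}
    Pairs whose person or project is missing from either table are dropped,
    like an inner join. Returns {name: [(project_id, amount), ...]}.
    """
    result = {}
    for person_id, project_id in assignments:
        if person_id in people and project_id in funding:
            name = people[person_id]
            result.setdefault(name, []).append((project_id, funding[project_id]))
    # Sort each person's projects so the output does not depend on input order.
    return {name: sorted(projects) for name, projects in result.items()}


# Hypothetical records standing in for HR, calendar and finance sources.
people = {1: "alice", 2: "bob"}
assignments = [(2, "P-B"), (1, "P-A"), (2, "P-A"), (3, "P-C")]
funding = {"P-A": 100, "P-B": 250}

print(connect(people, assignments, funding))
```

With pandas, which the talk lists among its tools, the same chain would be two `DataFrame.merge` calls with `how="inner"`.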
