
Data Engine

Slides to accompany Patrick McSweeney's winning pitch in the Open Repositories 2012 DevCSI Developer Challenge.

More information about this entry can be found at http://devcsi.ukoln.ac.uk/or2012-developer-challenge-data-engine


  1. Data Engine, by Patrick McSweeney
  2. Dave Mills
     ● PhD, electrical engineering
     ● Number of experiments per month: 10-15
     ● Raw data: 1 GB
     ● Processed data: 5-10 MB
     ● Processed with MATLAB
     ● Raw data when zipped: 450 MB
     [Photos: Dave, Patrick]
  3. State of the onion
     Researchers are using many different methods to collect or generate data, from sensors and CCDs to supercomputers and particle colliders. When the data finally shows up in your computer, what do you do with all this information that is now in your digital shoebox? People are continually seeking me out and saying, “Help! I’ve got all this data. What am I supposed to do with it? My Excel spreadsheets are getting out of hand!” The suggestion that I have been making is that we now have terrible data management tools for most of the science disciplines. Commercial organizations like Walmart can afford to build their own data management software, but in science we do not have that luxury. At present, we have hardly any data visualization and analysis tools. Some research communities use MATLAB, for example, but the funding agencies in the U.S. and elsewhere need to do a lot more to foster the building of tools to make scientists more productive. When you go and look at what scientists are doing, day in and day out, in terms of data analysis, it is truly dreadful. And I suspect that many of you are in the same state that I am in, where essentially the only tools I have at my disposal are MATLAB and Excel!
  4. State of the onion
  5. Data imported
  6. Data provenance
  7. Data manipulation
  8. Choose visualisation
  9. Save visualisation
  10. Lots of possibilities
  11. Take home
      ● An important step on the road to data science
      ● Make the repository a tool
      ● Get the data at the point of creation
      ● Repeatable experiments
  12. The outlook is good
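Slides 5 to 9 name the workflow only by title (data imported, data provenance, data manipulation, choose visualisation, save visualisation), and the take-home slide stresses repeatable experiments. The Python below is a minimal sketch of one way such a pipeline could carry a provenance trail alongside the data, so that a saved chart records every step that produced it. It is not Data Engine's code: the names `Dataset`, `import_csv`, `manipulate`, `choose_visualisation` and `save_visualisation`, the numeric CSV input, the SHA-256 checksum and the JSON output are all hypothetical choices made for illustration.

```python
import csv
import hashlib
import io
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List

# Hypothetical sketch: none of these names come from Data Engine itself.
Row = Dict[str, float]


def _step(action: str, detail: str) -> Dict[str, str]:
    """One provenance entry: what was done, with what, and when (UTC)."""
    return {"action": action, "detail": detail,
            "time": datetime.now(timezone.utc).isoformat()}


@dataclass
class Dataset:
    """Hypothetical container pairing the data rows with their provenance trail."""
    rows: List[Row]
    provenance: List[Dict[str, str]] = field(default_factory=list)


def import_csv(text: str, source: str) -> Dataset:
    """Import: parse numeric CSV and log the source name plus a SHA-256 checksum."""
    rows = [{k: float(v) for k, v in r.items()}
            for r in csv.DictReader(io.StringIO(text))]
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return Dataset(rows, [_step("import", f"{source} sha256={digest}")])


def manipulate(ds: Dataset, name: str, fn: Callable[[Row], Row]) -> Dataset:
    """Manipulation: apply fn to each row and return a NEW dataset with the step logged."""
    new_rows = [fn(dict(r)) for r in ds.rows]
    return Dataset(new_rows, ds.provenance + [_step("manipulate", name)])


def choose_visualisation(ds: Dataset, kind: str, x: str, y: str) -> Dict[str, object]:
    """Choose a chart type and axes; returns a plain description (no plotting library assumed)."""
    return {"kind": kind, "x_label": x, "y_label": y,
            "x": [r[x] for r in ds.rows], "y": [r[y] for r in ds.rows]}


def save_visualisation(ds: Dataset, spec: Dict[str, object]) -> str:
    """Save: serialise the chart together with the full trail that produced it."""
    choice = f"{spec['kind']} of {spec['y_label']} against {spec['x_label']}"
    trail = ds.provenance + [_step("visualise", choice)]
    return json.dumps({"chart": spec, "provenance": trail}, indent=2)


if __name__ == "__main__":
    raw = "time,voltage\n0,0.10\n1,0.25\n2,0.40\n"
    ds = import_csv(raw, "run-42.csv")  # hypothetical file name
    ds = manipulate(ds, "voltage V -> mV",
                    lambda r: {**r, "voltage": r["voltage"] * 1000})
    spec = choose_visualisation(ds, "line", "time", "voltage")
    saved = json.loads(save_visualisation(ds, spec))
    print([p["action"] for p in saved["provenance"]])
    # prints ['import', 'manipulate', 'visualise']
```

Each manipulation returns a new `Dataset` instead of editing rows in place, so earlier stages and their trails stay intact. In this sketch a step is logged only by name; a version aiming at truly repeatable experiments would also need to store each step's parameters or code.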
