Platforms for Data Science - Computing on the Brink
Talk at
Transcript

  • 1. There is no magic. There is only awesome. Platforms for data science. Deepak Singh
  • 2. bioinformatics (image: Ethan Hein)
  • 3. 3
  • 4. collection
  • 5. curation
  • 6. analysis
  • 7. what’s the big deal?
  • 8. Source: http://www.nature.com/news/specials/bigdata/index.html
  • 9. Image:Yael Fitzpatrick (AAAS)
  • 10. Image:Yael Fitzpatrick (AAAS)
  • 11. lots of data
  • 12. lots of people
  • 13. lots of places
  • 14. constant change
  • 15. we want to make our data more effective
  • 16. versioning
  • 17. provenance
  • 18. filter
  • 19. aggregate
  • 20. extend
  • 21. mashup
  • 22. human interfaces
  • 23. image: Leo Reynolds
  • 24. hard problem
  • 25. really hard problem
  • 26. so how do we get there?
  • 27. information platforms
  • 28. Image: Drew Conway
  • 29. dataspaces. Further reading: Jeff Hammerbacher, "Information Platforms and the Rise of the Data Scientist", Beautiful Data
  • 30. the unreasonable effectiveness of data (Halevy et al., IEEE Intelligent Systems 24, 8-12, 2009)
  • 31. accept all data formats
  • 32. evolve APIs
  • 33. beyond databases and the data warehouse
  • 34. data as a programmable resource
  • 35. data is a royal garden
  • 36. compute is a fungible commodity
  • 37. optimizing the most valuable resource
  • 38. compute, storage, workflows, memory, transmission, algorithms, cost, …
  • 39. people (credit: Pieter Musterd, under a CC-BY-NC-ND license)
  • 40. Image: Chris Dagdigian
  • 41. my bias
  • 42. cloud services
  • 43. distributed systems
  • 44. scale
  • 45. global
  • 46. consumption models
  • 47. on-demand
  • 48. what is the value of your data?
  • 49. Credit: Angel Pizarro, U. Penn
  • 50. mapreduce for genomics http://bowtie-bio.sourceforge.net/crossbow/index.shtml http://contrail-bio.sourceforge.net http://bowtie-bio.sourceforge.net/myrna/index.shtml
  • 51. Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
  • 52. 30,472 cores
  • 53. $1279/hr
  • 54. http://cloudbiolinux.org/
  • 55. http://usegalaxy.org/cloud
  • 56. in summary
  • 57. large-scale data requires a rethink
  • 58. data architecture
  • 59. compute architecture
  • 60. distributed, programmable infrastructure
  • 61. cloud services
  • 62. remove constraints
  • 63. can we build data science platforms?
  • 64. there is no magic, there is only awesome
  • 65. deesingh@amazon.com Twitter: @mndoci http://slideshare.net/mndoci http://mndoci.com Inspiration and ideas from Matt Wood & Larry Lessig. Credit: Oberazzi, under a CC-BY-NC-SA license
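Slide 50 points to Hadoop-based genomics tools (Crossbow, Contrail, Myrna). The sketch below is not taken from any of them; it is a minimal, single-process illustration of the MapReduce pattern applied to a simple genomics task, counting k-mers across sequencing reads. It assumes reads are plain nucleotide strings, and the function names (`map_read`, `shuffle`, `reduce_counts`, `count_kmers`) are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_read(read: str, k: int) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (k-mer, 1) for every k-length substring of one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group emitted values by key (a framework like Hadoop does this)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the partial counts for a single k-mer."""
    return key, sum(values)


def count_kmers(reads: Iterable[str], k: int) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence within one process."""
    mapped = chain.from_iterable(map_read(r, k) for r in reads)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, vals) for key, vals in grouped.items())


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGT"]
    print(count_kmers(reads, 3))  # {'ACG': 2, 'CGT': 2, 'GTA': 2, 'TAC': 2}
```

In a real cluster the map and reduce functions would run on many machines and the framework would handle the shuffle; keeping the three phases as separate functions here mirrors that split, so each read can be mapped independently and each k-mer reduced independently.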

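Slides 52 and 53 give a cluster size (30,472 cores) and an hourly price ($1279/hr). Assuming the price is the total for the whole cluster (the slides do not state this explicitly), the implied cost per core-hour works out as:

```latex
\frac{\$1279\ \text{per hour}}{30{,}472\ \text{cores}} \approx \$0.042\ \text{per core-hour}
```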