
Platforms for data science


Life science research, data platforms and cloud computing

Published in: Technology
  • Interesting presentation. Data science platforms like Kaggle and GingerBrain are also quite interesting!

Platforms for data science

  1. Platforms for data science. Deepak Singh, Ph.D., Amazon Web Services. Data transmission for international genomics projects, 2010
  2. the new reality
  3. lots and lots and lots and lots and lots of data
  4. lots and lots and lots and lots and lots of people
  5. lots and lots and lots and lots and lots of places
  6. constant change
  7. science in a new reality
  8. science in a new reality
  9. data science in a new reality
  10. data as a programmable resource
  11. versioning
  12. provenance capture
  13. filter
  14. aggregate
  15. integrate
  16. extend
  17. mashup
  18. automate
  19. human interfaces
  20. tough problem
  21. really tough problem in the new reality
  22. goal
  23. optimize the most valuable resource
  24. compute, storage, workflows, memory, transmission, algorithms, cost, …
  25. people. Credit: Pieter Musterd, under a CC-BY-NC-ND license
  26. enter the cloud
  27. what is the cloud?
  28. infrastructure
  29. scalable
  30. highly available
  31. dynamic
  32. extensible
  33. secure
  34. a utility
  35. programmable
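  36. class Instance
        attr_accessor :aws_hash, :elastic_ip

        # hash: instance attributes as returned by the AWS API wrapper
        def initialize(hash, elastic_ip = nil)
          @aws_hash = hash
          @elastic_ip = elastic_ip
        end

        def public_dns
          @aws_hash[:dns_name] || ""
        end

        # NOTE: `status` is not defined on this slide; it is presumably
        # defined elsewhere in the original class.
        def friendly_name
          public_dns.empty? ? status.capitalize : public_dns.split(".")[0]
        end

        def id
          @aws_hash[:aws_instance_id]
        end
      end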
  37. include_recipe "packages"
      include_recipe "ruby"
      include_recipe "apache2"

      if platform?("centos","redhat")
        if dist_only?
          # just the gem, we'll install the apache module within apache2
          package "rubygem-passenger"
          return
        else
          package "httpd-devel"
        end
      else
        %w{ apache2-prefork-dev libapr1-dev }.each do |pkg|
          package pkg do
            action :upgrade
          end
        end
      end

      gem_package "passenger" do
        version node[:passenger][:version]
      end

      execute "passenger_module" do
        command 'echo -en "\n\n\n\n" | passenger-install-apache2-module'
        creates node[:passenger][:module_path]
      end
  38. a data science platform
  39. dataspaces. Further reading: Jeff Hammerbacher, "Information Platforms and the Rise of the Data Scientist", in Beautiful Data
  40. accept all data formats
  41. evolve APIs
  42. beyond the database and the data warehouse
  43. move compute to the data
  44. data is a royal garden
  45. compute is a fungible commodity
  46. “I terminate the instance and relaunch it. That's my error handling” Source: @jtimberman on Twitter
  47. the cloud is an architectural and cultural fit for data science
  48. amazon web services
  49. your data science platform
  50. s3://1000genomes
  51. Credit: Angel Pizarro, U. Penn
  53. mapreduce for genomics
  54. AWS knows massively scalable infrastructure
  55. you know the needs of the science
  56. we can make this work together
  57. Twitter: @mndoci. Inspiration and ideas from Matt Wood, James Hamilton & Larry Lessig. Credit: Oberazzi, under a CC-BY-NC-SA license
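
Slides 10 to 14 treat data as a programmable resource that can be versioned, filtered and aggregated while its provenance is captured. The slides do not show how this might look in code, so the following is only a minimal sketch in Ruby, the language of the deck's own code slides. The `Dataset` class, its method names, the content-hash versioning scheme and the sample records are all hypothetical and made up for illustration; none of them come from the talk or from any AWS API.

```ruby
require "digest"

# Hypothetical illustration: an immutable dataset that remembers how it was derived.
class Dataset
  attr_reader :records, :provenance

  def initialize(records, provenance = [])
    @records = records.freeze
    @provenance = provenance.freeze
  end

  # Versioning: a short content hash identifies this exact set of records.
  def version
    Digest::SHA256.hexdigest(Marshal.dump(@records))[0, 12]
  end

  # Filter: keep the records for which the block returns true.
  def filter(label, &block)
    derive("filter: #{label}", @records.select(&block))
  end

  # Aggregate: count the records per distinct value of `key`.
  def aggregate(label, key)
    counts = @records.group_by { |r| r[key] }
                     .map { |value, rows| { key => value, count: rows.size } }
    derive("aggregate: #{label}", counts)
  end

  private

  # Provenance capture: every derived dataset records the step that produced it
  # and the version of the dataset it was derived from.
  def derive(step, new_records)
    Dataset.new(new_records, @provenance + [{ step: step, parent: version }])
  end
end

# Made-up sample records, for illustration only.
samples = Dataset.new([
  { id: "s1", population: "GBR", coverage: 4.1 },
  { id: "s2", population: "YRI", coverage: 2.3 },
  { id: "s3", population: "GBR", coverage: 5.0 }
])

deep = samples.filter("coverage >= 4") { |r| r[:coverage] >= 4 }
by_population = deep.aggregate("count by population", :population)

puts by_population.records.inspect
by_population.provenance.each { |entry| puts "#{entry[:step]} (from #{entry[:parent]})" }
```

Because every operation returns a new `Dataset` instead of mutating the old one, each intermediate result keeps its own version and its full derivation chain, which is one plausible reading of what the "versioning" and "provenance capture" slides ask of a platform.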