Cloud computing and Hadoop introduction

    Cloud computing and Hadoop introduction - Presentation Transcript

    1. BioCloud: Random large-scale tools that you can use
    2. Disclaimer: I work in computer security research, with no biology background anywhere in my field, not even in computer viruses ;) While working, I stumbled across Hadoop for scalable web spidering. I'm not a bioinformatician (yet), but I saw a powerful tool that could be useful in your research field(s): "biodatacrunching"?
    3. Glossary • Cluster (beowulf) • Grid • Cloud
    4. Biology and computer science • Increasingly resource-hungry applications o Nowadays, they can be approached by "brute force" o More data means more "iron" to crunch it • Neither the local IT team nor the budget can keep up with this pace o €€€ spent on new hardware o €€€ spent on IT personnel o Isn't it wiser to scale one machine at a time? • Developers get angry or frustrated by o Delays in software installation and configuration o Unscheduled downtime o Delays caused by insufficient computing power
    5. What is cloud computing? In plain English: http://www.youtube.com/watch?v=XdBd14rjcs0
    6. Infrastructure layer
    7. Cloud niche
    8. Infrastructure • Amazon o EC2 o S3 o AMI  Recently added bioinformatics appliances  Public data sets • Eucalyptus o Open source server-side implementation of EC2 + AMI o We run it for our internal projects • Enomalism • RightScale & Service Cloud o Tools/consultants for the upcoming cloud issues
    9. Application layer • Technologies for parallelizing applications
    10. Application layer • Hadoop o Open source MapReduce implementation o Java-based, but any language can be used • Cloudburst-bio o MapReduce implementation fine-tuned for bioinformatics (XXX)
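The "any language can be used" point on slide 10 most likely refers to Hadoop Streaming, where the mapper and reducer are ordinary programs that read lines on standard input and write tab-separated key/value lines on standard output. The following is only a minimal sketch of that line protocol, not a real job: the function names `mapper` and `reducer`, the 3-mer counting task, and the local `sorted()` call standing in for Hadoop's shuffle/sort step are all assumptions made for illustration.

```python
from itertools import groupby


def mapper(lines, k=3):
    """Emit one 'kmer<TAB>1' line for every k-mer in each input line."""
    for line in lines:
        seq = line.strip().upper()
        for i in range(len(seq) - k + 1):
            yield f"{seq[i:i + k]}\t1"


def reducer(sorted_lines):
    """Sum the counts of consecutive lines that share the same key.

    Assumes the input is already sorted by key, which is the job of
    the framework's shuffle/sort step between map and reduce.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


# Local stand-in for a cluster run: map, sort by key, then reduce.
reads = ["ATGATG"]
for line in reducer(sorted(mapper(reads))):
    print(line)
```

In an actual streaming job each function would live in its own script that iterates over `sys.stdin` and prints its output, and the sorting between the two stages would be done by Hadoop rather than by `sorted()`.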
    11. Easy MapReduce
    12. What is Hadoop? Quotation from the official web page: "Hadoop is a software platform that lets one easily write and run applications that process vast amounts of data." "vast amounts of data (ATGTTAG...)" + "easily" = sounds good, doesn't it? Or is it vaporware?
    13. What is it used for? • Attacking problems that involve several GB, TB or even PB of data • The programmer does not have to care about job management o The focus is on data transformation and piping (useful work) • Not intended for real-time processing • Suitable for offloading long batch jobs from databases
    14. What is MapReduce? See the "Joel on Software" explanation. Useful for crunching *tons* of data; parallelized by design
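To make "parallelized by design" on slide 14 concrete, here is a small single-process sketch of the MapReduce model. It is not Hadoop's API: `map_reduce`, `read_length` and the sample reads are hypothetical names and data chosen for illustration. The point is that every record is mapped independently and every key is reduced independently, so both phases could be spread over many machines without changing the user's two functions.

```python
from collections import defaultdict
from functools import reduce
from operator import add


def map_reduce(records, map_fn, reduce_fn):
    """Run map, shuffle and reduce in one process (no real parallelism)."""
    groups = defaultdict(list)
    for record in records:
        # Map: each record yields zero or more (key, value) pairs.
        for key, value in map_fn(record):
            # Shuffle: gather all values that share a key.
            groups[key].append(value)
    # Reduce: fold each key's values into a single result.
    return {key: reduce(reduce_fn, values) for key, values in groups.items()}


def read_length(record):
    """Map one (sample_id, sequence) record to (sample_id, length)."""
    sample_id, sequence = record
    return [(sample_id, len(sequence))]


# Hypothetical input: sequencing reads tagged with a sample name.
reads = [("s1", "ATGTTAG"), ("s2", "ATG"), ("s1", "AT"), ("s2", "ATGTT")]

longest = map_reduce(reads, read_length, max)   # longest read per sample
total = map_reduce(reads, read_length, add)     # total bases per sample
print(longest, total)
```

Swapping `max` for `add` changes the question being asked without touching the map step, which is the kind of "focus on data transformation" the previous slide describes.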
    15. HDFS: Hadoop Distributed FileSystem
    16. What about job control?
    17. Who is using it? • Google o Lots of internal projects (proprietary MapReduce)  Gmail spam machine learning  Google Maps  ... • Yahoo o Internal web graph (powers the search engine) o Pig (SQL-like abstraction) o Sorted 1 terabyte of data in 209 seconds • Facebook o Large user graph, used for data mining (Hive)
    18. Hadoop has (lots of) new friends • Nutch • Mahout • HBase • Hama • Pig • ZooKeeper • SmartFrog • ...
    19. Next steps? Identify resource-hungry applications (batch vs. interactive). Migrate apps to the cloud: 1) Allocate a fixed amount of money 2) Give Amazon EC2 a try 3) Optional: build a (local) Rocks cluster with a Eucalyptus cloud. Test, deploy, automate, automate and automate... Puppet?
    20. (a few) References http://www.cloudera.com/hadoop-training-thinking-at-scale http://www.slideshare.net/tag/hadoop http://sourceforge.net/projects/cloudburst-bio/ http://hadoop.apache.org/core/ http://people.apache.org/~rdonkin/hadoop-talk/hadoop.html
    christian.perez


    © All Rights Reserved
