Cloud platforms - Cloud Computing

  • Nutch's architecture wouldn't scale to index billions of pages. The paper about GFS provided the information needed to solve their storage needs for the very large files generated as part of the web crawl and indexing process. In particular, GFS would free up time being spent on administrative tasks such as managing storage nodes. NDFS was an open-source implementation of the GFS. Google introduced MapReduce to the world, and by mid-2005 the Nutch project had developed an open-source implementation. Doug Cutting joined Yahoo!, which provided a dedicated team and the resources to turn Hadoop into a system that ran at web scale. This was demonstrated in February 2008, when Yahoo! announced that its production search index was being generated by a 10,000-core Hadoop cluster. The NY Times used Amazon's EC2 compute cloud to crunch through 4 terabytes of scanned archives from the paper, converting them to PDFs for the Web. The processing took less than 24 hours using 100 machines, and the project probably wouldn't have been embarked on without the combination of Amazon's pay-by-the-hour model and Hadoop's easy-to-use parallel programming model. Hadoop broke a world record to become the fastest system to sort a terabyte of data: running on a 910-node cluster, it sorted one terabyte in 209 seconds. In November of the same year, Google announced that its MapReduce implementation had sorted one terabyte in 68 seconds. By 2009, Yahoo! used Hadoop to sort one terabyte in 62 seconds.
  • Yahoo! – 10,000-core Linux cluster. Facebook – claims to have the largest Hadoop cluster in the world at 30 PB.

    1. Cloud Platforms in Industry
    2. Cloud Computing
       • Allows the use of computing resources
       • Services are delivered over the network
       • Services are divided into:
         • Infrastructure-as-a-Service (IaaS)
         • Platform-as-a-Service (PaaS)
         • Software-as-a-Service (SaaS)
    3. Defining a Cloud
       • No physical object; an electronic structure
       • Servers behave as one large storage space and processor
       • Server clusters can provide a cloud setup
    4. Benefits of Cloud Hosting
       • Elasticity
       • Pay by use
       • Self service
    5. Popular Providers
       • Amazon Web Services
       • Salesforce.com
       • Microsoft Azure
       • Google App Engine
       • Hadoop
       • Manjrasoft Aneka
    6. Amazon Web Services (AWS)
       • AWS is a collection of Web services providing:
         • Compute power
         • Storage
         • Content delivery
       • Services available in the AWS ecosystem are:
         • Compute service
         • Storage service
         • Communication service
         • Additional services
    7. Amazon EC2
       • Offers compute service and delivers IaaS
       • EC2 deploys servers as virtual machines
       • Signature features:
         • Amazon Machine Image (AMI)
         • EC2 instance and environment
         • AWS CloudFormation
         • AWS Elastic Beanstalk
    8. Amazon Machine Image (AMI)
       • Template used to create virtual machines
       • Contains:
         • Physical file system layout: Amazon Ramdisk Image
         • Predefined OS installed: Amazon Kernel Image
       • AMIs created are stored in an S3 bucket
       • A product code can be associated for revenue
    9. EC2 Instance
       • Represents a virtual machine
       • Created by selecting:
         • No. of cores
         • Computing power
         • Installed memory
       • Currently available configurations:
         • Standard instances
         • Micro instances
         • Cluster GPU instances
       • EC2 instances can be run by using:
         • Command-line tools
         • AWS console
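The AMI-to-instance relationship above can be pictured with a small toy model: a template plus a chosen instance type yields a running virtual machine record. This is an illustrative sketch only; the type names and sizes are made up, and the real service is driven through the AWS API or console.

```python
# Toy model of launching EC2 instances from an AMI (illustrative only;
# instance-type names and sizes here are hypothetical, not AWS's).

# Hypothetical catalogue: type name -> (virtual cores, memory in GiB)
INSTANCE_TYPES = {
    "standard": (2, 7.5),
    "micro": (1, 0.6),
    "cluster-gpu": (8, 22.0),
}

def launch_instance(ami_id: str, instance_type: str) -> dict:
    """Create a record for a virtual machine booted from the given AMI."""
    cores, memory_gib = INSTANCE_TYPES[instance_type]
    return {
        "ami": ami_id,
        "type": instance_type,
        "cores": cores,
        "memory_gib": memory_gib,
        "state": "running",
    }

vm = launch_instance("ami-1234abcd", "micro")
print(vm["cores"], vm["state"])  # 1 running
```

The same AMI can be launched many times with different instance types, which is the point of separating the image (what to run) from the instance (how much hardware to run it on).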
    10. EC2 Environment
       • EC2 instances are executed in a virtual environment
       • The EC2 environment is in charge of:
         • Allocating addresses
         • Attaching storage volumes
         • Configuring security
    11. Amazon S3
       • Amazon Simple Storage Service (S3) is a distributed object store
       • S3 provides services for data storage and information management
       • The components are:
         • Buckets
         • Objects
    12. Amazon S3 vs. a Distributed File System
       • Storage is organized as a two-level hierarchy (buckets containing objects), not a full directory tree
       • Objects cannot be manipulated like standard files
       • Content is not immediately available to users
       • Requests will occasionally fail
    13. Features of S3
       • Resource naming
       • Buckets
       • Objects and metadata
       • Access control and security
       • Advanced features
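The bucket/object model above can be sketched as a tiny in-memory object store. This is a hypothetical illustration of the flat key-to-object mapping, not Amazon's implementation or API; the class and method names are invented for the sketch.

```python
# Minimal in-memory sketch of an S3-style object store (illustrative only).

class ObjectStore:
    def __init__(self):
        # bucket name -> {object key -> (data, metadata)}
        self.buckets = {}

    def create_bucket(self, name: str) -> None:
        self.buckets[name] = {}

    def put_object(self, bucket: str, key: str, data: bytes, **metadata) -> None:
        # Keys may look like paths ("2012/q1.pdf"), but the namespace is
        # flat: the whole key maps directly to one object, with no
        # intermediate directories to list or manipulate.
        self.buckets[bucket][key] = (data, metadata)

    def get_object(self, bucket: str, key: str) -> bytes:
        data, _ = self.buckets[bucket][key]
        return data

store = ObjectStore()
store.create_bucket("reports")
store.put_object("reports", "2012/q1.pdf", b"%PDF-", owner="archive")
print(store.get_object("reports", "2012/q1.pdf"))  # b'%PDF-'
```

The two-level structure (bucket, then key) is what distinguishes this model from a file system: there is no rename, append, or directory traversal, only put and get of whole objects.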
    14. Google App Engine
       • PaaS implementation
       • Distributed and scalable runtime environment
       • Usage can be metered
    15. Components of the Google App Engine platform
       • Runtime environment
       • Set of scalable services
       • Google storage
       • All running on Google's infrastructure
    16. Runtime Environment
       • Sandboxing
       • Supported runtimes
    17. Distributed Meme: Divide & Conquer — specialized services
       • Memcache
       • URL Fetch
       • Mail
       • XMPP
       • Task Queue
       • Images
       • Datastore
       • Cron jobs
       • User Service
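The role Memcache plays among these services — a fast cache sitting in front of the persistent Datastore — can be sketched as a read-through cache in plain Python. This is an illustrative model under assumed names, not the App Engine API.

```python
# Toy read-through cache, mimicking how Memcache sits in front of the
# Datastore in App Engine (illustrative only; dicts stand in for both).

datastore = {"greeting": "hello"}   # stand-in for the persistent Datastore
cache = {}                          # stand-in for Memcache
datastore_reads = 0                 # count the expensive round trips

def get(key):
    global datastore_reads
    if key in cache:                # cache hit: no datastore round trip
        return cache[key]
    datastore_reads += 1            # cache miss: fetch and populate cache
    value = datastore[key]
    cache[key] = value
    return value

get("greeting")
get("greeting")
print(datastore_reads)  # 1 — the second read was served from the cache
```

The design choice is the usual one for such services: trade a little staleness for far fewer reads against the slow, durable store.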
    18. Hadoop
       • Software platform that lets one easily write and run applications that process vast amounts of data
       • It includes:
         • MapReduce – offline computing engine
         • HDFS – Hadoop Distributed File System
    19. What does it do?
       • Implements Google's MapReduce, using HDFS
       • MapReduce divides applications into many small blocks of work
       • HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster
       • MapReduce can then process the data where it is located
       • Hadoop's target is to run on clusters on the order of 10,000 nodes
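The divide-into-small-blocks idea can be illustrated with the canonical word-count job, written here as plain Python functions in the style of a Hadoop mapper and reducer. This is a sketch of the programming model, not an actual Hadoop job; the framework's distributed shuffle is played by a simple `sorted()`.

```python
# Word count in the MapReduce style (illustrative sketch of the model).
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Map phase: emit (word, 1) for every word in an input line."""
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    """Reduce phase: sum the counts emitted for one word."""
    return (word, sum(counts))

lines = ["the quick brown fox", "the lazy dog", "the fox"]

# The framework would shuffle and sort map output between the phases
# across the cluster; here sorted() plays that role locally.
pairs = sorted(kv for line in lines for kv in mapper(line))
result = dict(
    reducer(word, (c for _, c in group))
    for word, group in groupby(pairs, key=itemgetter(0))
)
print(result["the"], result["fox"])  # 3 2
```

Because each map call sees only one line and each reduce call only one word's counts, the same functions run unchanged whether the input is three lines on one machine or terabytes spread over thousands of nodes.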
    20. What Hadoop provides
       • Ability to read and write data in parallel to or from multiple disks
       • Enables applications to work with thousands of nodes and petabytes of data
       • A reliable shared storage and analysis system (HDFS and MapReduce)
       • Advantages:
         • Scalable
         • Economical
         • Efficient
         • Reliable
    21. Who uses Hadoop?
    22. Presented By: Aditi Rai, Annapurna Tiwari
