On Cloud Computing….

  “We in academia and the government labs have not
    kept up with the times, Universities really need to
    get on board.”
    - Randal E. Bryant, Dean of the School of
    Computer Science at Carnegie Mellon University.




  source: http://www.nytimes.com/2007/10/08/technology/08cloud.html
What is Amazon?




Amazon.com and AWS
[Figure: bandwidth consumed by Amazon Web Services compared with bandwidth consumed by Amazon’s global websites, plotted by year (axis labels span 1996 to 2008)]
AWS Customer Momentum (490,000)
[Figure: number of AWS customers, in thousands (axis 0 to 600), at Q1 2006, Q1 2007, Q1 2008 and Q4 2008]
Amazon S3 Momentum
Total Objects Stored in Amazon S3
   Q2 2006:        800,000,000
   Q2 2007:      5,000,000,000
   Q3 2007:     10,000,000,000
   Q4 2008:     40,000,000,000
Why Are People So Excited?
Most Companies Worry About This
Your Idea -> Undifferentiated “Heavy Lifting” -> Successful Product
Undifferentiated “heavy lifting”:
   Power/Cooling
   Hardware Management
   Bandwidth Management
   Contract Negotiations
   Maintenance
   Deployment
   Purchasing Decisions
   Load Balancing/Scaling
   Managing Growth
70/30 Switch
Focus on Innovation
Your Idea -> Undifferentiated “Heavy Lifting” -> Successful Product
[Diagram: Cloud Computing shown beneath the “Heavy Lifting” stage]
Amazon Cloud Computing
   Elastic Unlimited Capacity -> Get Big Fast
   Pay As You Go -> Spend Cash Wisely
   Simple, Reliable, Fast -> Focus On Your Idea
   Amazon EC2
   Amazon SQS
   Amazon S3
   Amazon SimpleDB
   Amazon EC2-EBS
ANIMOTO.COM
Scale: 50 servers to 5000 servers in 3 days
[Figure: number of EC2 instances from 4/12/2008 to 4/20/2008. Annotations: steady state of ~40 instances; launch of Facebook modification; peak of 5000 instances; Amazon EC2 easily scaled to handle additional traffic]
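The headline numbers imply how quickly capacity had to grow. This back-of-the-envelope sketch assumes smooth exponential growth over exactly 72 hours (an idealization; the real instance curve was not that smooth) and estimates the average doubling interval. The function name is illustrative.

```python
import math


def doubling_time_hours(start: int, end: int, hours: float) -> float:
    """Average doubling interval, assuming smooth exponential growth."""
    doublings = math.log2(end / start)
    return hours / doublings


# 50 -> 5000 servers in 3 days (72 hours), per the slide.
print(f"{doubling_time_hours(50, 5000, 72):.1f} hours per doubling")
```

Going from 50 to 5000 servers is a 100x increase, about 6.6 doublings, so under this assumption the fleet doubled roughly every 11 hours.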
“TimesMachine” from NY Times
                  1851-1922 Articles
                  TIFF -> PDF
                  Input: 11 Million
                  Articles (4 TB of data)
                  What did he do?
                    100 EC2 Instances for
                    24 hours
                    All data on S3
                    Output: 1.5 TB of Data
                    Hadoop, iText, JetS3t
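The figures on this slide are enough for a quick throughput estimate. The sketch below assumes the work was spread evenly over all 100 instances for the full 24 hours and uses decimal units (1 TB = 1000 GB); the variable names are illustrative.

```python
INSTANCES = 100          # EC2 instances, from the slide
HOURS = 24               # run length, from the slide
ARTICLES = 11_000_000    # input articles, from the slide
INPUT_TB = 4             # input size, from the slide

instance_hours = INSTANCES * HOURS
articles_per_instance_hour = ARTICLES / instance_hours
articles_per_second = ARTICLES / (HOURS * 3600)          # whole cluster
gb_per_instance_hour = INPUT_TB * 1000 / instance_hours  # decimal units

print(f"{instance_hours} instance-hours")
print(f"~{articles_per_instance_hour:,.0f} articles per instance-hour")
print(f"~{articles_per_second:,.0f} articles per second across the cluster")
print(f"~{gb_per_instance_hour:.2f} GB of input per instance-hour")
```

That works out to 2,400 instance-hours, roughly 4,600 articles per instance-hour, or about 127 articles per second for the cluster as a whole.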
CS290F: Scalable Internet Services
                 UCSB, Fall 2006
                   The professor created an app to manage team usage
                   Ruby on Rails
                   Complete stack: from load balancer and app server to DB
                   Learn how to scale: simulated load, generated graphs
                   All course contents, student assignments, and lessons
                   learned are on the wiki
CS345a: Data Mining @ Stanford

  Tools used:
    Shell/Linux/Java
    Hadoop on EC2
    Data set on S3
    Datasets: Netflix, Alexa, IR datasets from TREC
  Class organization:
    Stanford Winter 2007
    30-35 students
    Each team spawns 10-15 Hadoop slave nodes
    TA created Getting-Started AMIs (& scripts)
    TA managed the students' usage
Bioinformatics @ Northwestern University


  • Using Hadoop to perform sequence
    alignments on large genomic datasets
    – Northwestern University (Flatow & Lin) presented
      a talk at the Next-gen Sequencing Data Analysis
      meeting
       • “An understanding of the industrial strength map-
         reduce paradigm will be invaluable to those looking to
         cope with the next-generation datasets. Combined with
         the power of elastic computing clouds, many of the
         potential barriers to dealing with such large-scale data
         can be completely eliminated.”

Cloud Architectures


[Figure: hardware infrastructure/cost plotted against time, with job execution time marked]
Shrink your processing time
[Figure: CPUs plotted against time]
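Shrinking processing time rests on simple arithmetic: with pay-by-the-hour billing, an embarrassingly parallel job costs about the same whether it runs on one machine for a long time or on many machines for a short time. This minimal model rests on stated assumptions, none of which come from the slides: perfect parallelism, billing for every started instance-hour, and a hypothetical rate expressed in integer cents so the arithmetic stays exact.

```python
import math
from typing import Tuple


def run_plan(cpu_hours: float, instances: int,
             cents_per_instance_hour: int) -> Tuple[float, int]:
    """Return (wall-clock hours, cost in cents) for a perfectly parallel job.

    Illustrative assumptions: the work splits evenly across instances,
    and each instance is billed for every hour it has started.
    """
    wall_clock_hours = cpu_hours / instances
    billed_hours = math.ceil(wall_clock_hours)  # partial hours round up
    return wall_clock_hours, instances * billed_hours * cents_per_instance_hour


for n in (1, 10, 100):
    hours, cents = run_plan(cpu_hours=100, instances=n, cents_per_instance_hour=10)
    print(f"{n:>3} instances: {hours:6.1f} h, {cents} cents")
```

All three plans cost the same, yet the 100-instance plan finishes 100 times sooner. An instance count that leaves a partial hour (for example 40 instances, 2.5 h) pays for the unused remainder under this billing assumption.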
Main Problems

 Technical (Hadoop, Web Services)
   • How do I coordinate jobs between machines (distributed processing)?
   • What if a machine fails?
   • How will I scale out?

 Business (Cloud Computing)
   • How do I get management sign-off?
   • Where do I find the resources to manage the infrastructure?
   • How do I get rid of idle infrastructure?
GrepTheWeb
What’s so cool about GrepTheWeb?
[Figure: a RegEx run against the WWW]
Examples of Patterns
    Source Code
      int x = 40 + i
    Anything with punctuation
      “Hey!” he said, “Are you ok?”
    Case Sensitive
      Function CallOrderController()
    Equations
      f(x) = x^2
    Other Patterns
      (dis)integration of life, Email Address
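Each category above is the kind of pattern a GrepTheWeb user might submit. This small sketch uses Python's `re` module with hypothetical patterns written for the sample strings on this slide, not the service's actual expressions. It shows how each category is expressed and why case sensitivity and escaping of metacharacters matter.

```python
import re
from typing import Dict, List

# Hypothetical patterns, one per category on the slide.
PATTERNS: Dict[str, re.Pattern] = {
    # source code: an int declaration with an addition
    "source_code": re.compile(r"\bint\s+\w+\s*=\s*\d+\s*\+\s*\w+"),
    # punctuation: a quoted phrase ending in ! or ?
    "punctuation": re.compile(r"“[^”]*[!?]”"),
    # case sensitive: no re.IGNORECASE flag, so a lowercase spelling won't match
    "case_sensitive": re.compile(r"\bCallOrderController\b"),
    # equations: "(", ")" and "^" are metacharacters and must be escaped
    "equation": re.compile(r"f\(x\)\s*=\s*x\^2"),
    # literal text containing parentheses, escaped automatically by re.escape
    "literal_parens": re.compile(re.escape("(dis)integration")),
    # a deliberately simple e-mail pattern (not a complete RFC 5322 grammar)
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def grep(text: str) -> List[str]:
    """Return the sorted names of all patterns that match somewhere in text."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))


print(grep("Function CallOrderController()"))
print(grep("the (dis)integration of life"))
```

Without `re.escape`, the pattern `(dis)integration` would be read as a capture group and would match plain "disintegration" instead of the literal parenthesized text.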
Zoom Level 1
   Inputs: RegEx; input dataset (list of document URLs) from Alexa
   GrepTheWeb Service
   GetStatus -> subset of document URLs that matched the RegEx
Zoom Level 2
   Flow: StartGrep (RegEx) -> Amazon SQS -> Controller -> Amazon EC2 cluster.
   The cluster reads the input files (Alexa crawl) from Amazon S3 and writes its
   output back to Amazon S3. GetStatus reads job status from Amazon SimpleDB;
   Get Output retrieves the results from Amazon S3.

   Amazon SQS: distributed transient buffer; never lose a message; ideal for
     small, short-lived messages; access control; message locking
   Amazon S3: infinitely scalable storage in the cloud; highly available,
     durable and reliable; private and public storage; pay by the GB
   Amazon SimpleDB: lightweight, query-able database in the cloud; distributed
     and partitioned; stores user info and job status info; pay by GB, pay per query
   Amazon EC2: resizable computing capacity in the cloud; spawn server instances
     using a web service call; root-level access; pay by the hour
   Controller: manages phases (launch, monitor, shutdown)
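Message locking is what lets a queue "never lose a message" even when a worker dies mid-task. This in-memory sketch models the idea, assuming SQS-style visibility-timeout semantics: a received message is hidden (locked) for a fixed period and reappears unless the consumer deletes it. `LockingQueue` and its methods are hypothetical stand-ins, not the Amazon SQS API, and time is passed in explicitly to keep the behaviour easy to follow.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class _Message:
    body: str
    visible_at: float = 0.0  # time after which the message can be received again


class LockingQueue:
    """Hypothetical in-memory queue with SQS-style message locking."""

    def __init__(self, visibility_timeout: float) -> None:
        self.visibility_timeout = visibility_timeout
        self._messages: Dict[int, _Message] = {}
        self._ids = itertools.count()

    def send(self, body: str) -> int:
        message_id = next(self._ids)
        self._messages[message_id] = _Message(body)
        return message_id

    def receive(self, now: float) -> Optional[Tuple[int, str]]:
        """Return one visible message and lock it, or None if none is visible."""
        for message_id, message in self._messages.items():
            if message.visible_at <= now:
                message.visible_at = now + self.visibility_timeout  # lock it
                return message_id, message.body
        return None

    def delete(self, message_id: int) -> None:
        """Called by a worker once it has finished; removes the message for good."""
        self._messages.pop(message_id, None)


queue = LockingQueue(visibility_timeout=30)
job = queue.send("StartGrep job-1")
print(queue.receive(now=0))    # worker A takes the job; it is locked until t=30
print(queue.receive(now=10))   # still locked, so no other worker sees it
print(queue.receive(now=31))   # worker A never deleted it, so the job reappears
queue.delete(job)
print(queue.receive(now=100))  # deleted after success, so nothing is left
```

The design choice is that deletion, not receipt, marks a message as done; a crash between the two simply lets the lock expire and another worker retry.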
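Zoom Level 3
   Amazon SQS queues: Launch Queue, Monitor Queue, Shutdown Queue, Billing Queue
   StartGrep -> Launch Queue -> Launch Controller -> Monitor Queue -> Monitor Controller
     -> Shutdown Queue -> Shutdown Controller -> Billing Queue -> Billing Controller
     -> Billing Service
   Along the way the controllers insert the JobID, status, and EC2 info into the
   status DB (Amazon SimpleDB), launch the Hadoop cluster on Amazon EC2 (master M,
   slaves N, HDFS), ping it, get EC2 info, check for results, and shut it down.
   The Hadoop cluster gets its input files (Alexa crawl) from Amazon S3 and puts
   the output file back into Amazon S3.
   GetStatus reads the status DB; Get Output retrieves the results from Amazon S3.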
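The queue-and-controller pipeline can be modelled in a few lines. In this minimal sketch, which is assumed rather than taken from GrepTheWeb's code, deques stand in for the SQS queues and a dict stands in for the SimpleDB status domain. Each controller is reduced to "record a status, pass the job to the next queue". The real controllers run independently and concurrently, each polling its own queue; the sequential loop here is a simplification, and `start_grep`, `get_status`, and `run_controllers_once` are hypothetical names.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

# Hypothetical in-memory stand-ins: one deque per SQS queue and a dict
# playing the role of the SimpleDB job-status domain.
queues: Dict[str, Deque[str]] = {
    name: deque() for name in ("launch", "monitor", "shutdown", "billing")
}
status_db: Dict[str, str] = {}
billing_service: List[str] = []

# Each phase: (queue it reads, status it records, queue it writes, or None).
PHASES: List[Tuple[str, str, Optional[str]]] = [
    ("launch", "running", "monitor"),      # real system: start Hadoop on EC2
    ("monitor", "completed", "shutdown"),  # real system: ping master, check results
    ("shutdown", "shut down", "billing"),  # real system: terminate the instances
    ("billing", "billed", None),           # real system: call the billing service
]


def start_grep(job_id: str) -> None:
    status_db[job_id] = "queued"
    queues["launch"].append(job_id)


def get_status(job_id: str) -> str:
    return status_db.get(job_id, "unknown")


def run_controllers_once() -> bool:
    """Let every controller drain its queue once; return True if any work was done."""
    did_work = False
    for source, new_status, target in PHASES:
        queue = queues[source]
        while queue:
            job_id = queue.popleft()
            status_db[job_id] = new_status
            if target is None:
                billing_service.append(job_id)
            else:
                queues[target].append(job_id)
            did_work = True
    return did_work


start_grep("job-1")
start_grep("job-2")
while run_controllers_once():
    pass
print(get_status("job-1"), billing_service)
```

Because each phase talks to its neighbours only through a queue, a controller can be restarted or scaled out on its own without the others noticing.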
Zoom Level 4
   The service runs each user's request as a separate Hadoop job:
     User1: StartJob1 -> Hadoop Job (map tasks -> combine -> reduce) -> StopJob1
     User2: StartJob2 -> Hadoop Job (map tasks -> combine -> reduce) -> StopJob2
   Each job stores its status and results; the user then gets the result.
SideTrack: WordCount Example
 MAPPER: For each input record, extract a set of key/value pairs that we care
 about in each record.
   “Hi Hadoop, Bye Hadoop” -> (“Hi”, 1), (“Hadoop”, 1), (“Bye”, 1), (“Hadoop”, 1)

 REDUCER: For each extracted key/value pair, combine it with other values that
 share the same key.
   (“Hadoop”, [1,1]) -> (“Hadoop”, 2)

 [Figure: Input -> Map (input key/value pairs) -> intermediate keys with their values -> Aggregate (each key with all its values) -> Reduce -> final key/values]

                     Source: Doug Cutting’s Slide Deck on Hadoop
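The same flow fits in a few lines of plain Python. This is a single-process sketch of the MapReduce model, not Hadoop's Java API: `mapper`, `shuffle`, and `reducer` are hypothetical names, `shuffle` stands in for the grouping Hadoop performs between the map and reduce phases, and words are assumed to be runs of `\w` characters so that the comma is dropped.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(record: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every word; punctuation such as "," is skipped.
    for word in re.findall(r"\w+", record):
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all values that share a key: here, add up the 1s.
    return key, sum(values)


def word_count(records: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for record in records for pair in mapper(record))
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())


print(word_count(["Hi Hadoop, Bye Hadoop"]))
```

Running it prints `{'Hi': 1, 'Hadoop': 2, 'Bye': 1}`; the intermediate group for "Hadoop" is `[1, 1]`, matching the slide.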
Zoom Level 5 (Hadoop MapReduce)
 MAPPER: For each input record, extract a set of key/value pairs that we care
 about in each record.
   (LineNumber, s3pointer) -> (s3pointer, [matches])

 REDUCER: For each extracted key/value pair, combine it with other values that
 share the same key.
   Identity Function

 [Figure: Input -> Map (input key/value pairs) -> intermediate keys with their values -> Aggregate -> Reduce -> final key/values]

                       Source: Doug Cutting’s Slide Deck on Hadoop
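Applied to GrepTheWeb, the map step fetches the document behind each S3 pointer and emits the pointer with its regex matches, while the reducer passes pairs through unchanged. This single-process sketch assumes a dict (`FAKE_S3`) in place of Amazon S3 and uses hypothetical `s3://crawl/...` pointers and function names; it illustrates the data flow, not Hadoop's API.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical stand-in for Amazon S3: object pointer -> document text.
FAKE_S3: Dict[str, str] = {
    "s3://crawl/doc1": "contact: a@example.com and b@example.org",
    "s3://crawl/doc2": "no addresses here",
    "s3://crawl/doc3": "admin@example.net",
}


def grep_mapper(record: Tuple[int, str],
                pattern: re.Pattern) -> Iterator[Tuple[str, List[str]]]:
    _line_number, s3pointer = record  # the line-number key is not needed
    matches = pattern.findall(FAKE_S3[s3pointer])
    if matches:  # emit only documents that matched
        yield s3pointer, matches


def identity_reducer(key: str,
                     values: List[List[str]]) -> Iterator[Tuple[str, List[str]]]:
    # Identity function: pass every (key, value) pair through unchanged.
    for value in values:
        yield key, value


def grep_job(records: Iterable[Tuple[int, str]],
             regex: str) -> List[Tuple[str, List[str]]]:
    pattern = re.compile(regex)
    grouped: Dict[str, List[List[str]]] = {}
    for record in records:
        for key, value in grep_mapper(record, pattern):
            grouped.setdefault(key, []).append(value)
    output: List[Tuple[str, List[str]]] = []
    for key in sorted(grouped):  # Hadoop sorts keys before they reach a reducer
        output.extend(identity_reducer(key, grouped[key]))
    return output


records = list(enumerate(FAKE_S3))  # (LineNumber, s3pointer) pairs
print(grep_job(records, r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"))
```

The identity reducer is enough here because each S3 pointer appears in exactly one input record, so there is nothing to combine; the reduce phase mainly collects and orders the output.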
Cloud Architectures - Jinesh Varia - GrepTheWeb
Cloud Architectures - Jinesh Varia - GrepTheWeb

More Related Content

What's hot

Cloud architecture
Cloud architectureCloud architecture
Cloud architecture
Adeel Javaid
 
System Models in Software Engineering SE7
System Models in Software Engineering SE7System Models in Software Engineering SE7
System Models in Software Engineering SE7
koolkampus
 
Cloud Computing Principles and Paradigms: 11 t-systems cloud-based solutions ...
Cloud Computing Principles and Paradigms: 11 t-systems cloud-based solutions ...Cloud Computing Principles and Paradigms: 11 t-systems cloud-based solutions ...
Cloud Computing Principles and Paradigms: 11 t-systems cloud-based solutions ...
Majid Hajibaba
 
System models for distributed and cloud computing
System models for distributed and cloud computingSystem models for distributed and cloud computing
System models for distributed and cloud computing
purplesea
 

What's hot (20)

6.distributed shared memory
6.distributed shared memory6.distributed shared memory
6.distributed shared memory
 
Fault tolerance
Fault toleranceFault tolerance
Fault tolerance
 
Unit 1
Unit 1Unit 1
Unit 1
 
Key management and distribution
Key management and distributionKey management and distribution
Key management and distribution
 
Naming in Distributed System
Naming in Distributed SystemNaming in Distributed System
Naming in Distributed System
 
Cloud architecture
Cloud architectureCloud architecture
Cloud architecture
 
System models in distributed system
System models in distributed systemSystem models in distributed system
System models in distributed system
 
Centralized shared memory architectures
Centralized shared memory architecturesCentralized shared memory architectures
Centralized shared memory architectures
 
System Models in Software Engineering SE7
System Models in Software Engineering SE7System Models in Software Engineering SE7
System Models in Software Engineering SE7
 
Cloud Computing Principles and Paradigms: 11 t-systems cloud-based solutions ...
Cloud Computing Principles and Paradigms: 11 t-systems cloud-based solutions ...Cloud Computing Principles and Paradigms: 11 t-systems cloud-based solutions ...
Cloud Computing Principles and Paradigms: 11 t-systems cloud-based solutions ...
 
Introduction to Distributed System
Introduction to Distributed SystemIntroduction to Distributed System
Introduction to Distributed System
 
Cloud Computing: Virtualization
Cloud Computing: VirtualizationCloud Computing: Virtualization
Cloud Computing: Virtualization
 
Distributed Computing system
Distributed Computing system Distributed Computing system
Distributed Computing system
 
Introduction to Google App Engine
Introduction to Google App EngineIntroduction to Google App Engine
Introduction to Google App Engine
 
CS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSCS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMS
 
Cloud Computing and Service oriented Architecture (SOA)
Cloud Computing and Service oriented Architecture (SOA)Cloud Computing and Service oriented Architecture (SOA)
Cloud Computing and Service oriented Architecture (SOA)
 
System models for distributed and cloud computing
System models for distributed and cloud computingSystem models for distributed and cloud computing
System models for distributed and cloud computing
 
Cloud Computing - Benefits and Challenges
Cloud Computing - Benefits and ChallengesCloud Computing - Benefits and Challenges
Cloud Computing - Benefits and Challenges
 
Cloud Ecosystem
Cloud EcosystemCloud Ecosystem
Cloud Ecosystem
 
Introduction to Parallel and Distributed Computing
Introduction to Parallel and Distributed ComputingIntroduction to Parallel and Distributed Computing
Introduction to Parallel and Distributed Computing
 

Viewers also liked

Anaptixi didaskalias mikromathimatos [λειτουργία συμβατότητας]
Anaptixi didaskalias mikromathimatos [λειτουργία συμβατότητας]Anaptixi didaskalias mikromathimatos [λειτουργία συμβατότητας]
Anaptixi didaskalias mikromathimatos [λειτουργία συμβατότητας]
Georgios Kiriakidis
 
Content, context, and community
Content, context, and communityContent, context, and community
Content, context, and community
Eric Reiss
 
Ellis Island History
Ellis Island HistoryEllis Island History
Ellis Island History
awltech
 
Lo Que Se Puede Hacer
Lo Que Se Puede HacerLo Que Se Puede Hacer
Lo Que Se Puede Hacer
Sibiu86
 
Textile Storyboard Version 3 Guru
Textile Storyboard Version 3 GuruTextile Storyboard Version 3 Guru
Textile Storyboard Version 3 Guru
guestc8832a4
 
Presentacion evaluation third project meeting in spain
Presentacion evaluation third project meeting in spainPresentacion evaluation third project meeting in spain
Presentacion evaluation third project meeting in spain
Jose Luis Leon Gonzalez
 

Viewers also liked (20)

The Cloud as a Platform
The Cloud as a PlatformThe Cloud as a Platform
The Cloud as a Platform
 
Anaptixi didaskalias mikromathimatos [λειτουργία συμβατότητας]
Anaptixi didaskalias mikromathimatos [λειτουργία συμβατότητας]Anaptixi didaskalias mikromathimatos [λειτουργία συμβατότητας]
Anaptixi didaskalias mikromathimatos [λειτουργία συμβατότητας]
 
อีเลิร์นนิ่งสำหรับผู้บริหารโรงเรียนสังกัด กทม.
อีเลิร์นนิ่งสำหรับผู้บริหารโรงเรียนสังกัด กทม.อีเลิร์นนิ่งสำหรับผู้บริหารโรงเรียนสังกัด กทม.
อีเลิร์นนิ่งสำหรับผู้บริหารโรงเรียนสังกัด กทม.
 
Innovation, Service, and Shared References
Innovation, Service, and Shared ReferencesInnovation, Service, and Shared References
Innovation, Service, and Shared References
 
Catavento cultural 31
Catavento cultural 31Catavento cultural 31
Catavento cultural 31
 
EPiServer Update October 2013
EPiServer Update October 2013EPiServer Update October 2013
EPiServer Update October 2013
 
Spain performance assessment of students
Spain performance assessment of studentsSpain performance assessment of students
Spain performance assessment of students
 
Of brains and buttons (UXCE, Berlin, Germany)
Of brains and buttons (UXCE, Berlin, Germany)Of brains and buttons (UXCE, Berlin, Germany)
Of brains and buttons (UXCE, Berlin, Germany)
 
Italian Version: Disasters 2.0: Collaborazione in Tempo Reale: Documentazione...
Italian Version: Disasters 2.0: Collaborazione in Tempo Reale: Documentazione...Italian Version: Disasters 2.0: Collaborazione in Tempo Reale: Documentazione...
Italian Version: Disasters 2.0: Collaborazione in Tempo Reale: Documentazione...
 
His m07t03c
His m07t03cHis m07t03c
His m07t03c
 
Ochoa marmex
Ochoa marmexOchoa marmex
Ochoa marmex
 
Content, context, and community
Content, context, and communityContent, context, and community
Content, context, and community
 
Ellis Island History
Ellis Island HistoryEllis Island History
Ellis Island History
 
wtfux?
wtfux?wtfux?
wtfux?
 
Lo Que Se Puede Hacer
Lo Que Se Puede HacerLo Que Se Puede Hacer
Lo Que Se Puede Hacer
 
Textile Storyboard Version 3 Guru
Textile Storyboard Version 3 GuruTextile Storyboard Version 3 Guru
Textile Storyboard Version 3 Guru
 
Pledge Drive Workshop
Pledge Drive WorkshopPledge Drive Workshop
Pledge Drive Workshop
 
Situational Awareness 2.0 #EMAG2011
Situational Awareness 2.0 #EMAG2011 Situational Awareness 2.0 #EMAG2011
Situational Awareness 2.0 #EMAG2011
 
Batxillerat1011v.5 05-10
Batxillerat1011v.5 05-10Batxillerat1011v.5 05-10
Batxillerat1011v.5 05-10
 
Presentacion evaluation third project meeting in spain
Presentacion evaluation third project meeting in spainPresentacion evaluation third project meeting in spain
Presentacion evaluation third project meeting in spain
 

Similar to Cloud Architectures - Jinesh Varia - GrepTheWeb

Cloud computing with AWS
Cloud computing with AWS Cloud computing with AWS
Cloud computing with AWS
ikanow
 
Jeff Barr Amazon Services Cloud Computing
Jeff Barr Amazon Services Cloud ComputingJeff Barr Amazon Services Cloud Computing
Jeff Barr Amazon Services Cloud Computing
deimos
 
Masterworks talk on Big Data and the implications of petascale science
Masterworks talk on Big Data and the implications of petascale scienceMasterworks talk on Big Data and the implications of petascale science
Masterworks talk on Big Data and the implications of petascale science
Deepak Singh
 

Similar to Cloud Architectures - Jinesh Varia - GrepTheWeb (20)

Jeff barr Seattle_interactive_2011_q4
Jeff barr Seattle_interactive_2011_q4Jeff barr Seattle_interactive_2011_q4
Jeff barr Seattle_interactive_2011_q4
 
Serving Media From The Edge - Miles Ward - AWS Summit 2012 Australia
Serving Media From The Edge - Miles Ward - AWS Summit 2012 AustraliaServing Media From The Edge - Miles Ward - AWS Summit 2012 Australia
Serving Media From The Edge - Miles Ward - AWS Summit 2012 Australia
 
Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom
Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcomRethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom
Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom
 
Cloud computing with AWS
Cloud computing with AWS Cloud computing with AWS
Cloud computing with AWS
 
Analytics in the Cloud
Analytics in the CloudAnalytics in the Cloud
Analytics in the Cloud
 
Keynote: Your Future With Cloud Computing - Dr. Werner Vogels - AWS Summit 2...
Keynote: Your Future With Cloud Computing - Dr. Werner Vogels  - AWS Summit 2...Keynote: Your Future With Cloud Computing - Dr. Werner Vogels  - AWS Summit 2...
Keynote: Your Future With Cloud Computing - Dr. Werner Vogels - AWS Summit 2...
 
Introduction to Amazon Web Services
Introduction to Amazon Web ServicesIntroduction to Amazon Web Services
Introduction to Amazon Web Services
 
2011 State of the Cloud: A Year's Worth of Innovation in 30 Minutes - Jinesh...
2011 State of the Cloud:  A Year's Worth of Innovation in 30 Minutes - Jinesh...2011 State of the Cloud:  A Year's Worth of Innovation in 30 Minutes - Jinesh...
2011 State of the Cloud: A Year's Worth of Innovation in 30 Minutes - Jinesh...
 
Think Big Data, Think Cloud - AWS Presentation - AWS Cloud Storage for the En...
Think Big Data, Think Cloud - AWS Presentation - AWS Cloud Storage for the En...Think Big Data, Think Cloud - AWS Presentation - AWS Cloud Storage for the En...
Think Big Data, Think Cloud - AWS Presentation - AWS Cloud Storage for the En...
 
Why Scale Matters and How the Cloud is Really Different (at scale)
Why Scale Matters and How the Cloud is Really Different (at scale)Why Scale Matters and How the Cloud is Really Different (at scale)
Why Scale Matters and How the Cloud is Really Different (at scale)
 
Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...
Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...
Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...
 
Scalable Database Options on AWS
Scalable Database Options on AWSScalable Database Options on AWS
Scalable Database Options on AWS
 
AWS Webcast - What is Cloud Computing?
AWS Webcast - What is Cloud Computing?AWS Webcast - What is Cloud Computing?
AWS Webcast - What is Cloud Computing?
 
AWS Boot Camp in Taipei
AWS Boot Camp in TaipeiAWS Boot Camp in Taipei
AWS Boot Camp in Taipei
 
Carlos Condè - Amazon Web Services
Carlos Condè - Amazon Web ServicesCarlos Condè - Amazon Web Services
Carlos Condè - Amazon Web Services
 
AWS Overview - Cloud for the Enterprise - AWS Enterprise Tour - SF - 2010, D...
AWS Overview  - Cloud for the Enterprise - AWS Enterprise Tour - SF - 2010, D...AWS Overview  - Cloud for the Enterprise - AWS Enterprise Tour - SF - 2010, D...
AWS Overview - Cloud for the Enterprise - AWS Enterprise Tour - SF - 2010, D...
 
Jeff Barr Amazon Services Cloud Computing
Jeff Barr Amazon Services Cloud ComputingJeff Barr Amazon Services Cloud Computing
Jeff Barr Amazon Services Cloud Computing
 
Masterworks talk on Big Data and the implications of petascale science
Masterworks talk on Big Data and the implications of petascale scienceMasterworks talk on Big Data and the implications of petascale science
Masterworks talk on Big Data and the implications of petascale science
 
(DAT303) Oracle on AWS and Amazon RDS: Secure, Fast, and Scalable
(DAT303) Oracle on AWS and Amazon RDS: Secure, Fast, and Scalable(DAT303) Oracle on AWS and Amazon RDS: Secure, Fast, and Scalable
(DAT303) Oracle on AWS and Amazon RDS: Secure, Fast, and Scalable
 
Werner Vogels
Werner Vogels Werner Vogels
Werner Vogels
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Recently uploaded (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Powerful Google developer tools for immediate impact! (2023-24 C)

Cloud Architectures - Jinesh Varia - GrepTheWeb

  • 2. On Cloud Computing… “We in academia and the government labs have not kept up with the times, Universities really need to get on board.” (Randal E. Bryant, Dean of the School of Computer Science at Carnegie Mellon University.) Source: http://www.nytimes.com/2007/10/08/technology/08cloud.html
  • 4. Amazon.com and AWS: chart comparing bandwidth consumed by Amazon Web Services with bandwidth consumed by Amazon’s global websites over time (two offset year axes, 1996 to 2002 and 2001 to 2008).
  • 5. AWS Customer Momentum (490,000): chart of AWS customer growth at Q1 2006, Q1 2007, Q1 2008 and Q4 2008.
  • 6. Amazon S3 Momentum: total objects stored in Amazon S3 grew from 800,000,000 (Q2 2006) to 5,000,000,000 (Q2 2007), 10,000,000,000 (Q3 2007) and 40,000,000,000 (Q4 2008).
  • 7. Why Are People So Excited?
  • 9. Most Companies Worry About This: between Your Idea and a Successful Product sits the undifferentiated “heavy lifting”: power/cooling, hardware management, bandwidth management, contract negotiations, maintenance, deployment, purchasing decisions, load balancing/scaling and managing growth.
  • 11. Focus on Innovation: Your Idea → undifferentiated “heavy lifting” (handled by Cloud Computing) → Successful Product.
  • 12. Amazon Cloud Computing: Elastic (unlimited capacity, get big fast); Pay As You Go (spend cash wisely); Simple, Reliable, Fast (focus on your idea).
  • 13. Amazon Web Services building blocks: Amazon EC2, Amazon SQS, Amazon S3, Amazon SimpleDB and Amazon EBS (for EC2).
  • 15. Scale: 50 servers to 5000 servers in 3 days. Amazon EC2 easily scaled to handle additional traffic. (Chart: number of EC2 instances from 4/12/2008 to 4/20/2008, rising from a steady state of ~40 instances, after the launch of the Facebook modification, to a peak of 5000 instances.)
  • 21. “TimesMachine” from the NY Times: articles from 1851 to 1922, converted from TIFF to PDF. Input: 11 million articles (4 TB of data). What did he do? 100 EC2 instances for 24 hours, with all data on S3. Output: 1.5 TB of data. Tools: Hadoop, iText, JetS3t.
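The TimesMachine figures support a quick sanity check of the machine time involved. The Python sketch below does that arithmetic. The constant and function names are ours. It also assumes that the 11 million articles were spread evenly over the 100 instances for the full 24 hours, which the slide does not state.

```python
# Back-of-the-envelope figures taken from the TimesMachine slide.
# Assumption (not on the slide): work was spread evenly over all
# instances for the whole run.
ARTICLES = 11_000_000
INSTANCES = 100
HOURS = 24


def instance_hours(instances: int, hours: int) -> int:
    """Total machine time consumed, in instance-hours."""
    return instances * hours


def articles_per_instance_hour(articles: int, instances: int, hours: int) -> float:
    """Average conversion rate per instance, assuming perfectly even work."""
    return articles / instance_hours(instances, hours)


if __name__ == "__main__":
    total = instance_hours(INSTANCES, HOURS)
    rate = articles_per_instance_hour(ARTICLES, INSTANCES, HOURS)
    print(f"{total} instance-hours")                  # 2400 instance-hours
    print(f"~{rate:.0f} articles per instance-hour")  # ~4583 articles per instance-hour
```

Because the slide does not show the actual distribution of work, a real run would likely have had stragglers and uneven batches.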
  • 29. CS290F: Scalable Internet Services, UCSB, Fall 2006. The professor created an app to manage team usage. Ruby on Rails, with the complete stack from load balancer and app server to DB. Students learned how to scale with simulated load and generated graphs. All course contents, student assignments and lessons learned are on the wiki.
  • 30. CS345a: Data Mining @ Stanford, Winter 2007, 30-35 students. Tools used: Shell/Linux/Java, Hadoop on EC2, data sets on S3. Datasets: Netflix, Alexa and IR datasets from TREC. Class organization: each team spawns 10-15 Hadoop slave nodes; the TA created Getting-Started AMIs (& scripts) and managed the students’ usage.
  • 31. Bioinformatics @ Northwestern University • Using Hadoop to perform sequence alignments on large genomic datasets – Northwestern University (Flatow & Lin) presented a talk at the Next-gen Sequencing Data Analysis meeting • “An understanding of the industrial strength map-reduce paradigm will be invaluable to those looking to cope with the next-generation datasets. Combined with the power of elastic computing clouds, many of the potential barriers to dealing with such large-scale data can be completely eliminated.”
  • 36. Shrink your processing time (chart: CPUs vs. time)
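The CPUs-versus-time trade-off can be put into numbers. The Python sketch below makes two assumptions, neither of which the slide states: the work parallelizes perfectly, and each machine is billed per started hour. Under those assumptions, 1 machine for 1000 hours and 100 machines for 10 hours use the same instance-hours. A split that leaves a partial hour on every machine wastes some. The function names are hypothetical.

```python
import math


def wall_clock_hours(total_cpu_hours: float, cpus: int) -> float:
    """Elapsed time if the work splits perfectly across `cpus` machines."""
    return total_cpu_hours / cpus


def billed_instance_hours(total_cpu_hours: float, cpus: int) -> int:
    """Instance-hours billed, assuming each machine pays per started hour."""
    per_machine = math.ceil(wall_clock_hours(total_cpu_hours, cpus))
    return per_machine * cpus


if __name__ == "__main__":
    for n in (1, 100, 300):
        print(n, wall_clock_hours(1000, n), billed_instance_hours(1000, n))
```

With 1000 CPU-hours of work, 300 machines finish in about 3.3 hours but are billed 1200 instance-hours. So with per-hour billing, the parallelism pays off most when the job divides into whole hours.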
  • 38. Main Problems
    – Technical (addressed by Hadoop, distributed processing): How to co-ordinate jobs between machines? What if a machine fails? How will I scale out?
    – Business (addressed by Web Services, Cloud Computing): How do I get management sign-off? Resources to manage the infrastructure? How do I get rid of the idle infrastructure?
  • 40. What’s so cool about GrepTheWeb? (diagram: a RegEx run against the WWW)
  • 41. Examples of Patterns: source code (int x = 40 + i); anything with punctuation (“Hey!” he said, “Are you ok?”); case-sensitive (function CallOrderController()); equations (f(x) = x^2); other patterns ((dis)integration of life, email addresses).
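The pattern categories above correspond to ordinary regular expressions. The Python sketch below uses the standard `re` module to show one regex for several of the categories. The patterns are our own illustrations, not the ones GrepTheWeb actually ran. The email pattern is deliberately simplified.

```python
import re

# Illustrative patterns only; not taken from GrepTheWeb itself.
PATTERNS = {
    # A C-like declaration with assignment, e.g. "int x = 40 + i".
    "source_code": re.compile(r"\bint\s+\w+\s*=\s*[^;\n]+"),
    # Case-sensitive: matches "CallOrderController(" but not "callordercontroller(".
    "function_call": re.compile(r"\b[A-Z][A-Za-z0-9]*Controller\("),
    # A simple equation of the form "f(x) = <expression>".
    "equation": re.compile(r"\b[a-z]\(x\)\s*=\s*[^\s;]+"),
    # Simplified email address (far from a complete RFC 5322 grammar).
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def grep(text: str) -> dict:
    """Return every match of every pattern in `text`, keyed by pattern name."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


if __name__ == "__main__":
    sample = ("int x = 40 + i; CallOrderController(); callordercontroller(); "
              "f(x) = x^2; mail jane.doe@example.com")
    for name, hits in grep(sample).items():
        print(name, hits)
```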
  • 42. Zoom Level 1: the GrepTheWeb service takes an input dataset (a list of document URLs from Alexa) and a RegEx, and returns the subset of document URLs that matched the RegEx. GetStatus queries the status of a request.
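At Zoom Level 1 the service is a black box: URLs and a RegEx go in, and matching URLs come out. The Python sketch below is a toy, in-process stand-in for that interface.

The method names `start_grep`, `get_status` and `get_output` are hypothetical mirrors of StartGrep, GetStatus and Get Output. The `documents` mapping from URL to page text is also hypothetical. The real service read an Alexa crawl from Amazon S3 and ran the work asynchronously on Hadoop, whereas this sketch runs it inline.

```python
import itertools
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Job:
    status: str = "PENDING"
    matched_urls: List[str] = field(default_factory=list)


class GrepService:
    """Toy stand-in for the Zoom Level 1 view (hypothetical API)."""

    def __init__(self, documents: Dict[str, str]):
        self._documents = documents          # URL -> page text (assumed input)
        self._jobs: Dict[str, Job] = {}
        self._ids = itertools.count(1)

    def start_grep(self, urls: List[str], regex: str) -> str:
        job_id = f"job-{next(self._ids)}"
        job = Job(status="RUNNING")
        self._jobs[job_id] = job
        pattern = re.compile(regex)
        # Runs inline here; the real system ran this step on Hadoop.
        job.matched_urls = [u for u in urls
                            if pattern.search(self._documents.get(u, ""))]
        job.status = "COMPLETED"
        return job_id

    def get_status(self, job_id: str) -> str:
        return self._jobs[job_id].status

    def get_output(self, job_id: str) -> List[str]:
        return list(self._jobs[job_id].matched_urls)
```

Returning a job ID instead of the results keeps the interface the same if the work later moves to a cluster and callers have to poll for status.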
  • 43. Zoom Level 2 (architecture)
    – Amazon SQS: distributed transient buffer; never lose a message; highly available, durable and reliable; ideal for small, short-lived messages; message locking using a web service call. The controller uses it to manage the phases (Launch, Monitor, Shutdown).
    – Amazon S3: infinitely scalable storage in the cloud; private and public storage; access control; pay by the GB. Holds the input files (Alexa crawl) and the output.
    – Amazon SimpleDB: lightweight, query-able database in the cloud; stores user info and job status info as attributes; pay by GB, pay per query.
    – Amazon EC2: resizable computing capacity in the cloud; root-level access; pay by the hour. The controller spawns server instances as a distributed, partitioned Amazon EC2 cluster.
    – Flow: StartGrep (with the RegEx) → controller → EC2 cluster reads input from S3 and writes output to S3; GetStatus queries SimpleDB; Get Output fetches the results from S3.
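The “never lose a message” and “message locking” properties of SQS come from its visibility timeout. A received message is hidden from other consumers but not deleted. If the consumer crashes before deleting it, the message reappears after the timeout.

The Python sketch below imitates that behaviour in memory. `ToyQueue` is hypothetical. It deletes by message ID for simplicity, whereas real SQS deletes by receipt handle. Time is passed in explicitly so the example is deterministic.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class _Entry:
    body: str
    visible_at: float = 0.0  # time from which the message can be received again


class ToyQueue:
    """In-memory imitation of SQS-style message locking (hypothetical)."""

    def __init__(self, visibility_timeout: float):
        self._timeout = visibility_timeout
        self._entries: Dict[str, _Entry] = {}
        self._ids = itertools.count(1)

    def send(self, body: str) -> str:
        msg_id = f"m{next(self._ids)}"
        self._entries[msg_id] = _Entry(body)
        return msg_id

    def receive(self, now: float) -> Optional[Tuple[str, str]]:
        """Lock and return one visible (msg_id, body), or None if none is visible."""
        for msg_id, entry in self._entries.items():
            if entry.visible_at <= now:
                entry.visible_at = now + self._timeout  # hide it: the "lock"
                return msg_id, entry.body
        return None

    def delete(self, msg_id: str) -> None:
        """Called only after the work succeeds; removes the message for good."""
        self._entries.pop(msg_id, None)
```

A worker that receives a job at t=0 and dies leaves the message locked until t=30, with a 30-second timeout. After that, another worker can pick the job up.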
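  • 44. Zoom Level 3 (queues and controllers)
    – Amazon SQS queues: Launch Queue, Monitor Queue, Shutdown Queue and Billing Queue, each feeding its own controller (Launch, Monitor, Shutdown and Billing Controller).
    – StartGrep places a message on the Launch Queue. The Launch Controller launches a Hadoop cluster on Amazon EC2 (master M, N slaves, HDFS) and inserts EC2 info into the status DB (Amazon SimpleDB).
    – The Monitor Controller gets EC2 info, pings the cluster and checks for results. The Shutdown Controller shuts the cluster down. The Billing Controller inserts the JobID and status and passes them to the Billing Service.
    – The cluster gets the input files (Alexa crawl) from Amazon S3 and puts the output file back into Amazon S3. GetStatus reads the status DB; Get Output reads the output from S3.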
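The Zoom Level 3 chain of queues and controllers can be sketched in a single process. The Python code below models it with the standard `queue.Queue`. Each controller drains its queue, records its phase in a status table, and forwards the job to the next queue.

This is a hypothetical simplification. `status_db` is a plain dict standing in for Amazon SimpleDB, the phases run one after another, and no EC2 or billing calls are made. The real controllers ran concurrently against SQS, EC2 and SimpleDB.

```python
from queue import Queue
from typing import Dict, List

# Phase order from the slide: Launch -> Monitor -> Shutdown -> Billing.
PHASES = ["launch", "monitor", "shutdown", "billing"]


def run_pipeline(job_ids: List[str]) -> Dict[str, List[str]]:
    """Push jobs through all phases; return the phases recorded per job."""
    queues = {phase: Queue() for phase in PHASES}
    status_db: Dict[str, List[str]] = {job: [] for job in job_ids}

    # StartGrep: each new job goes onto the Launch Queue.
    for job in job_ids:
        queues["launch"].put(job)

    for i, phase in enumerate(PHASES):
        q = queues[phase]
        while not q.empty():
            job = q.get()
            status_db[job].append(phase)  # stands in for "insert status" in SimpleDB
            if i + 1 < len(PHASES):
                queues[PHASES[i + 1]].put(job)  # hand off to the next controller
    return status_db
```

Because the controllers share no state except the queues and the status table, each phase can be scaled or restarted on its own. This appears to be the point of decoupling them with SQS.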
  • 45. Zoom Level 4: users (User1, User2) call StartJob and StopJob on the service; each StartJob runs its own Hadoop job (many Map tasks, Combine, Reduce). The service stores status and results, which users retrieve with Get Result.
  • 46. SideTrack: WordCount Example
    – MAPPER: for each input record, extract a set of key/value pairs that we care about. Map: “Hi Hadoop, Bye Hadoop” → (“Hi”, 1), (“Hadoop”, 1), (“Bye”, 1), (“Hadoop”, 1).
    – REDUCER: for each extracted key/value pair, combine it with other values that share the same key. Reduce: (“Hadoop”, [1, 1]) → (“Hadoop”, 2).
    – Flow: input records → Map → key/value pairs → aggregate all values per key → Reduce → final key/values.
    Source: Doug Cutting’s slide deck on Hadoop
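The WordCount steps above can be mimicked in plain Python to show the map, group-by-key and reduce stages. This is a single-machine sketch, not Hadoop's Java API. The `shuffle` function stands in for the grouping that Hadoop performs between map and reduce. Splitting on `\w+` is our choice of tokenizer.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_words(record: str) -> List[Tuple[str, int]]:
    """MAPPER: emit (word, 1) for every word in one input record."""
    return [(word, 1) for word in re.findall(r"\w+", record)]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values by key (done by the framework in Hadoop)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """REDUCER: combine all values that share the same key."""
    return key, sum(values)


def word_count(records: List[str]) -> Dict[str, int]:
    pairs = [pair for record in records for pair in map_words(record)]
    return dict(reduce_counts(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    print(map_words("Hi Hadoop, Bye Hadoop"))
    print(word_count(["Hi Hadoop, Bye Hadoop"]))
```

For the slide's record, the mapper emits (“Hadoop”, 1) twice. The grouping step produces (“Hadoop”, [1, 1]) and the reducer turns it into (“Hadoop”, 2).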
  • 47. Zoom Level 5 (Hadoop MapReduce)
    – MAPPER: for each input record, extract a set of key/value pairs that we care about. Map: (LineNumber, s3pointer) → (s3pointer, [matches]).
    – REDUCER: for each extracted key/value pair, combine it with other values that share the same key. Reduce: identity function.
    Source: Doug Cutting’s slide deck on Hadoop
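The GrepTheWeb job maps each input record to its matches and passes them straight through the reducer. The Python sketch below imitates that shape: the mapper turns (LineNumber, s3pointer) into (s3pointer, [matches]), and the reducer is the identity function.

`fetch` is a hypothetical callable that returns the document stored at a pointer; in the real system that was a read from Amazon S3. The `s3://bucket/...` keys in the usage are made up. The mapper ignores the line number, as the slide's input key suggests.

```python
import re
from typing import Callable, Iterable, Iterator, List, Tuple


def grep_mapper(line_number: int, s3pointer: str,
                fetch: Callable[[str], str],
                pattern: re.Pattern) -> Iterator[Tuple[str, List[str]]]:
    """MAPPER: (LineNumber, s3pointer) -> (s3pointer, [matches]).
    The line number is only the input key and is not used."""
    matches = pattern.findall(fetch(s3pointer))
    if matches:  # emit nothing for documents without a match
        yield s3pointer, matches


def identity_reducer(key, values: Iterable) -> Iterator[tuple]:
    """REDUCER: identity function; passes each (key, value) through unchanged."""
    for value in values:
        yield key, value


def run_job(pointers: List[str], fetch: Callable[[str], str], regex: str):
    pattern = re.compile(regex)
    out = []
    for n, ptr in enumerate(pointers):
        for key, value in grep_mapper(n, ptr, fetch, pattern):
            out.extend(identity_reducer(key, [value]))
    return out


if __name__ == "__main__":
    docs = {"s3://bucket/a": "write to ann@example.com", "s3://bucket/b": "nothing"}
    print(run_job(list(docs), docs.__getitem__, r"\w+@\w+\.com"))
```

All the real work happens in the map step. That suggests why an identity reduce is enough here: each document's matches are already complete when the mapper emits them.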