BI on Cloud Computing
Rohit Chatter
BI & Data Architect
Data & Insights Group
Yahoo India R & D, Bangalore
Agenda

•   Cloud Computing – Quick look
•   Cloud@Yahoo – The Yahoo! way
•   Case study of BI on Cloud @ Yahoo – Live case
•   My personal views – sharing experience
•   Q&A
Cloud Computing

• What is it?
  – style of computing where massively scalable IT-related capabilities are
    provided “as a service” using Internet technologies to multiple
    external customers
  – E.g. Amazon EC2/S3, Yahoo, Google

• Key Features
  – Multi-Tenancy, On demand resources, Device & location independence, API
    based, scalability and others

• A Perspective
Focus on the business!

• Desire - start a new website, iwanttosell.com
• Service - provide listings of items for sale, jobs, etc.
• Business does well and more features are needed
    – How do we scale for demand?
•   Store listings as (key, category, description)
•   Customers quickly ask for keyword search
•   Add photos to listings
•   And the business continues to grow and grow!
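
As a rough illustration of the growth path above, the sketch below stores listings as (key, category, description) records and adds keyword search with a small in-memory inverted index. The `Listing` and `ListingStore` names and the sample data are hypothetical. The point is only that each new feature adds state that eventually has to scale; this is not a description of how any Yahoo! service is built.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    key: str
    category: str
    description: str


class ListingStore:
    """Hypothetical in-memory store: lookup by key plus a keyword index."""

    def __init__(self):
        self._by_key = {}
        self._index = defaultdict(set)  # keyword -> set of listing keys

    def add(self, listing):
        # Store the record, then index every word of its description.
        self._by_key[listing.key] = listing
        for word in listing.description.lower().split():
            self._index[word].add(listing.key)

    def get(self, key):
        return self._by_key.get(key)

    def search(self, keyword):
        # Return matching listing keys in a stable (sorted) order.
        return sorted(self._index.get(keyword.lower(), set()))


store = ListingStore()
store.add(Listing("a1", "for-sale", "Red mountain bike"))
store.add(Listing("a2", "jobs", "Bike courier wanted"))
print(store.search("bike"))
```

On a single machine this works until the index or the photos no longer fit, which is where the question of scaling for demand comes in.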
Cloud Computing – a perspective




• Copyright University of California at Berkeley
Internet Scale Generates Big Data
• Yahoo is the most visited site on the Internet
   –   600M+ Unique Visitors per Month
   –   Billions of Page Views per Day
   –   Billions of Searches per Month
   –   Billions of Emails per Month
   –   Terabytes of Data per Day!

• And we crawl the Web
   – 100+ Billion Pages
   – 5+ Trillion Links
   – Petabytes of data
• Reading 100 Terabytes could be overwhelming
       Std PC – 100 MB/s        Server – 10 Gbps   1000 Std PCs
          ~ 11 days                 ~ 1 day        ~ 15 mins
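
The read times above can be checked with simple arithmetic. This is a minimal sketch, assuming decimal units (1 TB = 10^12 bytes) and treating the standard PC rate as 100 megabytes per second, which is the reading that reproduces the ~11 days figure. The function and constant names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def read_time_seconds(total_bytes, bytes_per_second, machines=1):
    """Time to read total_bytes when the work is split evenly across machines."""
    return total_bytes / (bytes_per_second * machines)


DATA_BYTES = 100 * 10**12      # 100 TB (decimal)
PC_RATE = 100 * 10**6          # assumed: 100 MB/s per standard PC
SERVER_RATE = 10 * 10**9 / 8   # 10 Gbps converted to bytes per second

pc_days = read_time_seconds(DATA_BYTES, PC_RATE) / SECONDS_PER_DAY
server_days = read_time_seconds(DATA_BYTES, SERVER_RATE) / SECONDS_PER_DAY
cluster_minutes = read_time_seconds(DATA_BYTES, PC_RATE, machines=1000) / 60

print(f"1 PC: {pc_days:.1f} days")
print(f"1 server: {server_days:.1f} days")
print(f"1000 PCs: {cluster_minutes:.1f} minutes")
```

Under these assumptions the results are about 11.6 days, 0.9 days, and 16.7 minutes, in line with the rounded figures on the slide. The 1000-PC case assumes perfectly parallel reads with no coordination overhead.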
How is Yahoo seeing the space?

•   Yahoo sees two kinds of cloud services:
     – Horizontal Cloud Services
         • Functionality enabling tenants to build applications or new
            services on top of the cloud
         • The focus of CCDI
     – Functional Cloud Services
         • Functionality that is useful in and of itself to tenants
               – Yahoo!’s IndexTools; Yahoo! properties aimed at end-users,
                   e.g., flickr, Groups, Mail, News, Shopping
         • Could be built on top of horizontal cloud services or from scratch

•   Technology – Open Source adoption
     – Hadoop – Grid
     – PIG – Programming language
     – ZooKeeper – High-Availability Directory and Configuration Service
     – Oozie – Workflow engine
BI on Cloud – Case study

• Motivation
    •   Report & Data requirements unknown
    •   Evolving needs
    •   Large data processing on demand
    •   Web based access

•   Architecture
•   Functional View
•   What is computed where?
•   Few screenshots
•   Benefits
BI on Cloud – Architecture

  Functional View                                                          What is computed where

  Apache Web Server, PHP                  Load balanced web
  Microstrategy/Home Grown                App Server – BI layer            Derived Metrics – CTR, Depth, RPM, Coverage
  Oracle RDBMS, BI Aggregates (H,D,W,M)   Aggregates & Metadata layer      Rollups, Type 2 Dimension, Alerts & Messaging
  Utility Computing, Build Aggregates     Hadoop Grid + PIG (Cloud)        Metrics – Impressions, Revenue, Clicks, Conversions, Quality Score, Top keywords
  Data Source, Dimension & Fact           Data – 100+ Gigabytes/Day
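
The split between layers can be illustrated in a few lines: additive metrics (impressions, clicks, revenue) are aggregated first and rolled up from hourly to daily grain, and ratio metrics are derived last from the rolled-up totals. The sketch below assumes the common definitions CTR = clicks / impressions and RPM = revenue per 1,000 impressions, which the slide does not spell out. The row layout, sample values, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical hourly aggregates: (day, hour, impressions, clicks, revenue)
hourly = [
    ("2010-01-01", 0, 1000, 20, 5.0),
    ("2010-01-01", 1, 3000, 30, 7.0),
    ("2010-01-02", 0, 2000, 10, 4.0),
]


def rollup_daily(rows):
    """Sum additive metrics from hourly grain up to daily grain."""
    totals = defaultdict(lambda: {"impressions": 0, "clicks": 0, "revenue": 0.0})
    for day, _hour, impressions, clicks, revenue in rows:
        bucket = totals[day]
        bucket["impressions"] += impressions
        bucket["clicks"] += clicks
        bucket["revenue"] += revenue
    return dict(totals)


def derive(metrics):
    """Compute ratio metrics from already-aggregated totals."""
    impressions = metrics["impressions"]
    if impressions == 0:
        return {"ctr": 0.0, "rpm": 0.0}
    return {
        "ctr": metrics["clicks"] / impressions,
        "rpm": 1000 * metrics["revenue"] / impressions,
    }


daily = rollup_daily(hourly)
for day in sorted(daily):
    print(day, daily[day], derive(daily[day]))
```

Deriving ratios from the rolled-up totals, rather than averaging hourly ratios, keeps low-traffic hours from being weighted the same as busy ones.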
BI on Cloud – Screenshot
In My Opinion

• Is Cloud ready for DW & BI?
• Pros & Cons of BI on Cloud
• Options looked at:
  – Custom solution & Hive
  – Microstrategy & Hive
  – Pentaho
‘Determine that the thing can and shall be done, and then we shall find the way.’
    A. Lincoln
