How Salesforce.com uses Hadoop


  Narayan Bharadwaj
  Data Science
      @nadubharadwaj

  Jed Crosby
  Data Science
      @JedCrosby

  #forcewebinar
                   Follow us @forcedotcom
Safe Harbor
  Safe harbor statement under the Private Securities Litigation Reform Act of 1995:

  This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such
  uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ
  materially from the results expressed or implied by the forward-looking statements we make. All statements other than
  statements of historical fact could be deemed forward-looking, including any projections of product or service availability,
  subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of
  management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or
  technology developments and customer contracts or use of our services.

  The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and
  delivering new functionality for our service, new products and services, our new business model, our past operating losses,
  possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our
  security measures, the outcome of any litigation, risks associated with completed and any possible mergers and
  acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain,
  and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our
  limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further
  information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report
  on Form 10-K for the most recent fiscal year ended January 31, 2011 and in our quarterly report on Form 10-Q for the most
  recent fiscal quarter ended October 31, 2011. These documents and others containing important disclosures are available
  on the SEC Filings section of the Investor Information section of our Web site.

  Any unreleased services or features referenced in this or other presentations, press releases or public statements are not
  currently available and may not be delivered on time or at all. Customers who purchase our services should make the
  purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does
  not intend to update these forward-looking statements.




Agenda

 §  Hadoop use cases
 §  Use case 1 - Product Metrics*
 §  Technology
 §  Use case 2 - Collaborative Filtering*
 §  Q&A




             *Every time you see the elephant, we will attempt to
             explain a Hadoop-related concept.


Got “Cloud Data”?




              130k customers      780 million transactions/day
              Millions of users   Terabytes/day




Hadoop Overview

 §  Started by Doug Cutting and Mike Cafarella (as part of Nutch); developed further at Yahoo!
 §  Based on two Google papers
     –  Google File System (GFS): http://research.google.com/archive/gfs.html
     –  Google MapReduce: http://research.google.com/archive/mapreduce.html


 §  Hadoop is an open source Apache project
     –  Hadoop Distributed File System (HDFS)
     –  Distributed Processing Framework (MapReduce)


 §  Several related projects
     –  HBase, Hive, Pig, Flume, ZooKeeper, Mahout, Oozie, HCatalog




Hadoop use cases


 §  Product Metrics
 §  User behavior analysis
 §  Capacity planning
 §  Monitoring intelligence
 §  Performance analysis
 §  Security
 §  Ad-hoc log searches
 §  Collaborative Filtering
 §  Search Relevancy



Product Metrics
Product Metrics – Problem Statement



 §  Track feature usage/adoption across 130k+ customers
    –  E.g., Accounts, Contacts, Visualforce, Apex,…


 §  Track standard metrics across all features
    –  E.g., #Requests, #UniqueOrgs, #UniqueUsers,
       AvgResponseTime,…


 §  Track features and metrics across all channels
    –  API, UI, Mobile


 §  Primary audience: Executives, Product Managers

Data Pipeline

 Stages shown in the diagram:

 §  Feature (What?)
 §  Feature Metadata (Instrumentation)
 §  Storage & Processing
 §  Crunch it (How?)
 §  Daily Summary (Output)
 §  Fancy UI (Visualize)
 §  Collaborate & Iterate




Product Metrics Pipeline

 Components shown in the diagram:

 §  User Input (Page Layout), Collaboration (Chatter), Reports and Dashboards
 §  Workflow, Formula Fields
 §  Feature Metrics (Custom Object), Trend Metrics (Custom Object)
 §  API (two connections)
 §  Client Machine: Java Program, Pig script generator
 §  Hadoop: Workflow, Log Pull, Log Files
Feature Metrics (Custom Object)


Id      Feature Name     PM      Instrumentation     Metric1      Metric2     Metric3      Metric4   Status
F0001   Accounts         John    /001                #requests    #UniqOrgs   #UniqUsers   AvgRT     Dev
F0002   Contacts         Nancy   /003                #requests    #UniqOrgs   #UniqUsers   AvgRT     Review
F0003   API              Eric    A                   #requests    #UniqOrgs   #UniqUsers   AvgRT     Deployed
F0004   Visualforce      Roger   V                   #requests    #UniqOrgs   #UniqUsers   AvgRT     Decom
F0005   Apex             Kim     axapx               #requests    #UniqOrgs   #UniqUsers   AvgRT     Deployed
F0006   Custom Objects   Chun    /aXX                #requests    #UniqOrgs   #UniqUsers   AvgRT     Deployed
F0008   Chatter          Jed     chcmd               #requests    #UniqOrgs   #UniqUsers   AvgRT     Deployed
F0009   Reports          Steve   R                   #requests    #UniqOrgs   #UniqUsers   AvgRT     Deployed




Feature Metrics (Custom Object)




User Input (Page Layout)
 Screenshot callouts: Formula Field, Workflow Rule




User Input (Child Custom Object)




 Screenshot callout: Child Objects




Apache Pig
Basic Pig script construct

  -- Define UDFs
  DEFINE GFV GetFieldValue('/path/to/udf/file');

  -- Load data
  A = LOAD '/path/to/cloud/data/log/files' USING PigStorage();

  -- Filter data: keep only log records of type 'U'
  B = FILTER A BY GFV(*, 'logRecordType') == 'U';

  -- Extract fields (aliases let later steps refer to C.orgId)
  C = FOREACH B GENERATE GFV(*, 'orgId') AS orgId, GFV(*, 'userId') AS userId ……..

  -- Group
  G = GROUP C BY ……

  -- Compute output metrics
  O = FOREACH G {
          orgs = C.orgId; uniqueOrgs = DISTINCT orgs;

      }

  -- Store or dump results
  STORE O INTO '/path/to/user/output';
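
For readers who do not know Pig, the same filter, group, and aggregate shape can be sketched in plain Python. This is only an illustration, not the pipeline's code: the record layout and the field names `feature` and `responseTimeMs` are assumptions made for the example (the slide only shows `logRecordType`, `orgId`, and `userId`).

```python
from collections import defaultdict

# Hypothetical, already-parsed log records. The "feature" and
# "responseTimeMs" fields are assumptions made for this sketch.
LOG_RECORDS = [
    {"logRecordType": "U", "feature": "Accounts", "orgId": "org1", "userId": "u1", "responseTimeMs": 120},
    {"logRecordType": "U", "feature": "Accounts", "orgId": "org1", "userId": "u2", "responseTimeMs": 80},
    {"logRecordType": "U", "feature": "Accounts", "orgId": "org2", "userId": "u3", "responseTimeMs": 100},
    {"logRecordType": "U", "feature": "Contacts", "orgId": "org1", "userId": "u1", "responseTimeMs": 50},
    {"logRecordType": "A", "feature": "Accounts", "orgId": "org3", "userId": "u9", "responseTimeMs": 999},
]


def daily_feature_metrics(records):
    """Filter, group by feature, then compute the four standard metrics."""
    grouped = defaultdict(list)
    for rec in records:
        if rec["logRecordType"] == "U":  # same filter as the Pig FILTER step
            grouped[rec["feature"]].append(rec)

    summary = {}
    for feature, recs in grouped.items():
        summary[feature] = {
            "requests": len(recs),
            "unique_orgs": len({r["orgId"] for r in recs}),
            "unique_users": len({r["userId"] for r in recs}),
            "avg_response_time_ms": sum(r["responseTimeMs"] for r in recs) / len(recs),
        }
    return summary


METRICS = daily_feature_metrics(LOG_RECORDS)
```

The set comprehensions play the role of Pig's `DISTINCT` inside the nested `FOREACH`.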



Java Pig Script Generator (Client)




Trend Metrics (Custom Object)



 Id      Date         #Requests   #UniqueOrgs   #UniqueUsers   AvgResponseTime
 F0001   06/01/2012   <big>       <big>         <big>          <little>
 F0002   06/01/2012   <big>       <big>         <big>          <little>
 F0003   06/01/2012   <big>       <big>         <big>          <little>
 F0001   06/02/2012   <big>       <big>         <big>          <little>
 F0002   06/02/2012   <big>       <big>         <big>          <little>
 F0003   06/03/2012   <big>       <big>         <big>          <little>




Upload to Trend Metrics (Custom Object)




Visualization (Reports & Dashboards)




Visualization (Reports & Dashboards)




Collaborate, Iterate (Chatter)




Recap

 The Product Metrics Pipeline diagram, repeated: user input, workflow, and
 formula fields feed the Feature Metrics and Trend Metrics custom objects;
 a client-machine Java program (the Pig script generator) connects to them
 through the API; Hadoop pulls and processes the log files.
Technology
Hadoop ecosystem




      Apache Hadoop
      Version=0.20.2




Contributions

     @pRaShAnT1784 : Prashant Kommireddi




    Lars Hofhansl                         @thefutureian : Ian Varley




Data Science tools ecosystem




       Apache Pig
       Version=0.9.1




Collaborative Filtering
Collaborative Filtering – Problem Statement




 §  Show similar files within an organization
    –  Content-based approach
    –  Community-based approach




Popular File




Related File




We found this relationship using item-to-item collaborative
filtering




 §  Amazon published this algorithm in 2003.
    –  Amazon.com Recommendations: Item-to-Item Collaborative Filtering,
       by Gregory Linden, Brent Smith, and Jeremy York. IEEE Internet
       Computing, January-February 2003.

 §  At Salesforce, we adapted this algorithm for Hadoop,
     and we use it to recommend files to view and users to
     follow.




Example: CF on 5 files

 §  Annual Report
 §  Vision Statement
 §  Dilbert Comic
 §  Darth Vader Cartoon
 §  Disk Usage Report




View History Table




                  Annual   Vision      Dilbert   Darth Vader   Disk Usage
                  Report   Statement   Cartoon   Cartoon       Report
 Miranda (CEO)    1        1           1         0             0
 Bob (CFO)        1        1           1         0             0
 Susan (Sales)    0        1           1         1             0
 Chun (Sales)     0        0           1         1             0
 Alice (IT)       0        0           1         1             1




Relationships between the files




 Graph: the five files drawn as nodes (Annual Report, Vision Statement,
 Dilbert Cartoon, Darth Vader Cartoon, Disk Usage Report).



Relationships between the files



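 Graph: each pair of files is joined by an edge labeled with the number of
 users who viewed both files.

 Annual Report / Vision Statement     2
 Annual Report / Dilbert Cartoon      2
 Annual Report / Darth Vader Cartoon  0
 Annual Report / Disk Usage Report    0
 Vision Statement / Dilbert Cartoon   3
 Vision Statement / Darth Vader       1
 Vision Statement / Disk Usage        0
 Dilbert Cartoon / Darth Vader        3
 Dilbert Cartoon / Disk Usage         1
 Darth Vader / Disk Usage             1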



Sorted relationships for each file




 Annual Report      Vision Statement   Dilbert Cartoon    Darth Vader Cartoon   Disk Usage Report
 Dilbert (2)        Dilbert (3)        Vision Stmt. (3)   Dilbert (3)           Dilbert (1)
 Vision Stmt. (2)   Annual Rpt. (2)    Darth Vader (3)    Vision Stmt. (1)      Darth Vader (1)
                    Darth Vader (1)    Annual Rpt. (2)    Disk Usage (1)
                                       Disk Usage (1)



              The popularity problem: Dilbert appears first in every other file's list.
              This is probably not what we want.


              The solution: divide each relationship tally by the square root of the
              product of the two files' popularities (the cosine formula in the Appendix).



Normalized relationships between the files



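 Graph: the same edges, now labeled with the normalized score.

 Annual Report / Vision Statement     .82
 Annual Report / Dilbert Cartoon      .63
 Annual Report / Darth Vader Cartoon  0
 Annual Report / Disk Usage Report    0
 Vision Statement / Dilbert Cartoon   .77
 Vision Statement / Darth Vader       .33
 Vision Statement / Disk Usage        0
 Dilbert Cartoon / Darth Vader        .77
 Dilbert Cartoon / Disk Usage         .45
 Darth Vader / Disk Usage             .58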



Sorted relationships for each file, normalized by file popularities




 Annual Report        Vision Statement      Dilbert Cartoon       Darth Vader Cartoon   Disk Usage Report
 Vision Stmt. (.82)   Annual Report (.82)   Darth Vader (.77)     Dilbert (.77)         Darth Vader (.58)
 Dilbert (.63)        Dilbert (.77)         Vision Stmt. (.77)    Disk Usage (.58)      Dilbert (.45)
                      Darth Vader (.33)     Annual Report (.63)   Vision Stmt. (.33)
                                            Disk Usage (.45)




          High relationship tallies AND similar popularity values now drive closeness.
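
The numbers in the two sorted tables can be reproduced in a few lines of Python. This is an in-memory sketch under stated assumptions, not the Hadoop job: the view history is hard-coded from the View History Table, and ties are broken alphabetically, which is a choice made here and not something the slides specify.

```python
from itertools import combinations
from math import sqrt

# View history copied from the View History Table (1 = viewed).
VIEWS = {
    "Miranda": ["Annual Report", "Vision Statement", "Dilbert"],
    "Bob": ["Annual Report", "Vision Statement", "Dilbert"],
    "Susan": ["Vision Statement", "Dilbert", "Darth Vader"],
    "Chun": ["Dilbert", "Darth Vader"],
    "Alice": ["Dilbert", "Darth Vader", "Disk Usage"],
}


def popularity(views):
    """Number of users who viewed each file."""
    counts = {}
    for files in views.values():
        for name in files:
            counts[name] = counts.get(name, 0) + 1
    return counts


def tallies(views):
    """Number of users who viewed both files of each pair."""
    counts = {}
    for files in views.values():
        for pair in combinations(sorted(files), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


def normalized(views):
    """Tally divided by the square root of the product of popularities."""
    pop = popularity(views)
    return {(a, b): n / sqrt(pop[a] * pop[b]) for (a, b), n in tallies(views).items()}


def ranked(scores, target):
    """Files related to target, best first; ties broken alphabetically (an assumption)."""
    related = []
    for (a, b), score in scores.items():
        if a == target:
            related.append((b, score))
        elif b == target:
            related.append((a, score))
    return sorted(related, key=lambda item: (-item[1], item[0]))


RAW_TOP = ranked(tallies(VIEWS), "Annual Report")[0][0]
NORM_TOP = ranked(normalized(VIEWS), "Annual Report")[0][0]
```

For Annual Report, the raw tallies put Dilbert at the top, while the normalized scores put Vision Statement first, as in the tables.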



The item-to-item CF algorithm




 1)  Compute file popularities
 2)  Compute relationship tallies and divide by file
     popularities
 3)  Sort and store the results




MapReduce Overview
    Map                        Shuffle                       Reduce




      (adapted from http://code.google.com/p/mapreduce-framework/wiki/MapReduce)
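
The three phases can be imitated on a single machine to see what each one contributes. The sketch below is a toy model, not Hadoop's API: `map_reduce`, `word_mapper`, and `count_reducer` are names invented for this example, and real Hadoop runs the phases in parallel across a cluster.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Toy single-process model of the Map, Shuffle, and Reduce phases."""
    # Map: every input record yields zero or more (key, value) pairs.
    mapped = []
    for record in records:
        mapped.extend(mapper(record))

    # Shuffle: collect all values that share a key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # Reduce: each key and its list of values yields output pairs.
    results = []
    for key in sorted(groups):
        results.extend(reducer(key, groups[key]))
    return results


def word_mapper(line):
    return [(word, 1) for word in line.split()]


def count_reducer(word, ones):
    return [(word, sum(ones))]


COUNTS = map_reduce(["hadoop pig hadoop", "pig hive"], word_mapper, count_reducer)
```

Word count is used here because step 2b of the algorithm is described on its slide as "just a word count".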
1. Compute File Popularities



                                       <user, file>


                                                     Inverse identity map



                                    <file, List<user>>


                                                      Reduce



                                    <file, (user count)>


 Result is a table of (file, popularity) pairs that you store in the Hadoop distributed cache.


Example: File popularity for Dilbert




  (Miranda, Dilbert), (Bob, Dilbert), (Susan, Dilbert), (Chun, Dilbert), (Alice, Dilbert)



                                                   Inverse identity map



                     <Dilbert, {Miranda, Bob, Susan, Chun, Alice}>



                                                   Reduce



                                         (Dilbert, 5)
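
A single-machine sketch of step 1 on the Dilbert example, assuming the view history arrives as (user, file) pairs. The function names are hypothetical, and the `shuffle` helper only stands in for what Hadoop does between map and reduce.

```python
from collections import defaultdict

VIEW_PAIRS = [
    ("Miranda", "Dilbert"),
    ("Bob", "Dilbert"),
    ("Susan", "Dilbert"),
    ("Chun", "Dilbert"),
    ("Alice", "Dilbert"),
]


def inverse_map(user, file):
    """Swap key and value so the shuffle groups users by file."""
    return (file, user)


def shuffle(pairs):
    """Stand-in for Hadoop's shuffle: group values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def popularity_reduce(file, users):
    """Popularity is the number of distinct users who viewed the file."""
    return (file, len(set(users)))


GROUPED = shuffle(inverse_map(user, file) for user, file in VIEW_PAIRS)
POPULARITIES = dict(popularity_reduce(file, users) for file, users in GROUPED.items())
```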




2a. Compute relationship tallies - find all relationships in view
history table



                                <user, file>

                                             Identity map


                             <user, List<file>>

                                             Reduce


                         <(file1, file2), Integer(1)>,
                         <(file1, file3), Integer(1)>,
                         …
                         <(file(n-1), file(n)), Integer(1)>


           Relationships have their file IDs in alphabetical order
           to avoid double counting.
Example 2a: Miranda’s (CEO) file relationship votes




     (Miranda, Annual Report), (Miranda, Vision Statement), (Miranda, Dilbert)


                                                Identity map


              <Miranda, {Annual Report, Vision Statement, Dilbert}>

                                                 Reduce


                      <(Annual Report, Dilbert), Integer(1)>,
                      <(Annual Report, Vision Statement), Integer(1)>,
                      <(Dilbert, Vision Statement), Integer(1)>
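
The reduce step for one user can be sketched with `itertools.combinations`. Sorting the file names first gives the alphabetical ordering that avoids double counting. `pair_votes` is a name made up for this illustration.

```python
from itertools import combinations


def pair_votes(user, files):
    """Emit one vote per unordered pair of files the user viewed.

    The user is the reduce key; it is not needed to build the pairs.
    """
    return [((a, b), 1) for a, b in combinations(sorted(set(files)), 2)]


MIRANDA_VOTES = pair_votes("Miranda", ["Annual Report", "Vision Statement", "Dilbert"])
```

A user who viewed n files emits n*(n-1)/2 votes, which is where the worst-case cost noted in the Appendix comes from.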




2b. Tally the relationship votes - just a word count, where each
relationship occurrence is a word




                              <(file1, file2), Integer(1)>


                                                   Identity map


                            <(file1, file2), List<Integer(1)>>



                                                   Reduce: count and
                                                   divide by popularities


          <file1, (file2, similarity score)>, <file2, (file1, similarity score)>


  Note that we emit each result twice, once for each file that belongs to the
  relationship.
Example 2b: the Dilbert/Darth Vader relationship




                           <(Dilbert, Vader), Integer(1)>,
                           <(Dilbert, Vader), Integer(1)>,
                           <(Dilbert, Vader), Integer(1)>


                                                Identity map


                           <(Dilbert, Vader), {1, 1, 1}>



                                                Reduce: count and
                                                divide by popularities


            <Dilbert, (Vader, sqrt(3/5))>, <Vader, (Dilbert, sqrt(3/5))>
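
A sketch of this reducer, assuming the popularity table from step 1 is available as a plain dictionary (on the cluster it would come from the distributed cache). The popularities 5 and 3 are read off the View History Table; `similarity_reduce` is a hypothetical name.

```python
from math import sqrt

# Popularities from step 1 (Dilbert: 5 viewers, Darth Vader: 3 viewers).
POPULARITY = {"Dilbert": 5, "Vader": 3}


def similarity_reduce(pair, votes, popularity):
    """Count the votes, normalize, and emit the score once per file."""
    file1, file2 = pair
    tally = sum(votes)
    score = tally / sqrt(popularity[file1] * popularity[file2])
    return [(file1, (file2, score)), (file2, (file1, score))]


RESULT = similarity_reduce(("Dilbert", "Vader"), [1, 1, 1], POPULARITY)
```

Here 3 / sqrt(5 * 3) equals sqrt(3/5), roughly .77, which is the value shown on the slide.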




3. Sort and store results



                        <file1, (file2, similarity score)>


                                                Identity map



                     <file1, List<(file2, similarity score)>>


                                                Reduce


                          <file1, {top n similar files}>




                  Store the results in your location of choice


Example 3: Sorting the results for Dilbert


                               <Dilbert, (Annual Report, .63)>,
                               <Dilbert, (Vision Statement, .77)>,
                               <Dilbert, (Disk Usage, .45)>,
                               <Dilbert, (Darth Vader, .77)>


                                                      Identity map


<Dilbert, {(Annual Report, .63), (Vision Statement, .77), (Disk Usage, .45), (Darth Vader, .77)}>


                                                      Reduce


                  <Dilbert, {Darth Vader, Vision Statement}> (Top 2 files)




                                        Store results
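
The final reducer is a sort followed by a cut-off. In this sketch `top_n_reduce` is a hypothetical name, and the alphabetical tie-break between the two .77 scores is an assumption; the slide does not say how ties are ordered.

```python
def top_n_reduce(file, scored, n=2):
    """Keep the n most similar files, highest score first."""
    ranked = sorted(scored, key=lambda item: (-item[1], item[0]))
    return (file, [name for name, _ in ranked[:n]])


DILBERT_TOP = top_n_reduce(
    "Dilbert",
    [
        ("Annual Report", 0.63),
        ("Vision Statement", 0.77),
        ("Disk Usage", 0.45),
        ("Darth Vader", 0.77),
    ],
)
```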
Appendix




§  Cosine formula and normalization trick to avoid the
    distributed cache

    \cos\theta_{AB} = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{A}{\|A\|} \cdot \frac{B}{\|B\|}

§  Mahout has CF
§  Asymptotic order of the algorithm is O(M*N^2) in the worst
    case, but is helped by sparsity.
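
As a worked check of the cosine formula against the earlier example, assume A and B are the 0/1 columns of the View History Table (the names pop and tally are shorthand introduced here). The norm of each column is then the square root of its popularity, and the dot product is the relationship tally:

```latex
% Assumption: A and B are 0/1 view vectors, one entry per user.
A \cdot A = \mathrm{pop}(A) \;\Rightarrow\; \|A\| = \sqrt{\mathrm{pop}(A)},
\qquad A \cdot B = \mathrm{tally}(A, B)

\cos\theta_{AB} = \frac{\mathrm{tally}(A, B)}{\sqrt{\mathrm{pop}(A)\,\mathrm{pop}(B)}}

% Dilbert (5 viewers) and Darth Vader (3 viewers) share 3 viewers:
\cos\theta = \frac{3}{\sqrt{5 \cdot 3}} = \sqrt{\tfrac{3}{5}} \approx 0.77
```

One reading of the second equality is that each file vector can be scaled to unit length up front, so the dot product of the scaled vectors is already the similarity score and the popularity table is not needed at reduce time.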




Summary




          Hadoop                                       Cloud Data




    Hadoop + Force.com =                        Recommendation algorithms




@forcedotcom / #forcewebinar


Developer Force Group


facebook.com/forcedotcom


Developer Force – Force.com
Community

Upcoming Events

§  June 26 – Mobile CodeTalk
   –  http://bit.ly/mct-wr


§  June 27 – Painless Mobile App
    Development
   –  http://bit.ly/mobileapp-hp




                             http://bit.ly/mdc-hp
Q&A
                     http://bit.ly/hadoopsurvey

Narayan Bharadwaj    Jed Crosby            Prashant Kommireddi   Santosh Rau
@nadubharadwaj       @JedCrosby            @pRaShAnT1784         @santoshrau

                              @SalesforceEng

More Related Content

What's hot

Modelling Software Requirements: Important diagrams and templates (lecture sl...
Modelling Software Requirements: Important diagrams and templates (lecture sl...Modelling Software Requirements: Important diagrams and templates (lecture sl...
Modelling Software Requirements: Important diagrams and templates (lecture sl...
Dagmar Monett
 
Oracle Implementation Project Template
Oracle Implementation Project TemplateOracle Implementation Project Template
Oracle Implementation Project Template
acribe
 
Requirements Management
Requirements ManagementRequirements Management
Requirements Management
Mohamed Mobarak
 
Better Together: Delivering Graph Value with AWS & Neo4j - Antony Prasad The...
Better Together:  Delivering Graph Value with AWS & Neo4j - Antony Prasad The...Better Together:  Delivering Graph Value with AWS & Neo4j - Antony Prasad The...
Better Together: Delivering Graph Value with AWS & Neo4j - Antony Prasad The...
Neo4j
 
Product metrics
Product metricsProduct metrics
Product metrics
Amey Phutane
 
e-Government Interoperability Framework
e-Government Interoperability Frameworke-Government Interoperability Framework
e-Government Interoperability Framework
Adrian Stevenson
 
Reference Architecture-Validated & Tested Approach to Define Network Design
Reference Architecture-Validated & Tested Approach to Define Network DesignReference Architecture-Validated & Tested Approach to Define Network Design
Reference Architecture-Validated & Tested Approach to Define Network Design
DataWorks Summit
 
Software Designing - Software Engineering
Software Designing - Software EngineeringSoftware Designing - Software Engineering
Software Designing - Software Engineering
Purvik Rana
 
MG6088 SOFTWARE PROJECT MANAGEMENT
MG6088 SOFTWARE PROJECT MANAGEMENTMG6088 SOFTWARE PROJECT MANAGEMENT
MG6088 SOFTWARE PROJECT MANAGEMENT
Kathirvel Ayyaswamy
 
NEXT generation enterprise applications
NEXT generation enterprise applicationsNEXT generation enterprise applications
NEXT generation enterprise applications
Dr. Jimmy Schwarzkopf
 
Rapid Application Development Model
Rapid Application Development ModelRapid Application Development Model
Rapid Application Development Model
Damian T. Gordon
 
SOA Service Oriented Architecture
SOA Service Oriented ArchitectureSOA Service Oriented Architecture
SOA Service Oriented Architecture
Vinay Rajadhyaksha
 
API Strategy Introduction
API Strategy IntroductionAPI Strategy Introduction
API Strategy Introduction
Doug Gregory
 
IT Project Management - Study Notes
IT Project Management - Study NotesIT Project Management - Study Notes
IT Project Management - Study Notes
Marius FAILLOT DEVARRE
 
Loc and function point
Loc and function pointLoc and function point
Loc and function point
Mitali Chugh
 
Software Project Management
Software Project ManagementSoftware Project Management
Software Project Management
karthikeyanC40
 
Building Business Platforms Using an API Driven Marketplace
Building Business Platforms Using an  API Driven MarketplaceBuilding Business Platforms Using an  API Driven Marketplace
Building Business Platforms Using an API Driven Marketplace
WSO2
 
Event-Driven Architecture (EDA)
Event-Driven Architecture (EDA)Event-Driven Architecture (EDA)
Event-Driven Architecture (EDA)
WSO2
 
Software Development Plan
Software Development PlanSoftware Development Plan
Software Development Plan
Ronald Dove
 
Bringing API Management to AWS Powered Backends
Bringing API Management to AWS Powered BackendsBringing API Management to AWS Powered Backends
Bringing API Management to AWS Powered Backends
Apigee | Google Cloud
 

What's hot (20)

Modelling Software Requirements: Important diagrams and templates (lecture sl...
Modelling Software Requirements: Important diagrams and templates (lecture sl...Modelling Software Requirements: Important diagrams and templates (lecture sl...
Modelling Software Requirements: Important diagrams and templates (lecture sl...
 
Oracle Implementation Project Template
Oracle Implementation Project TemplateOracle Implementation Project Template
Oracle Implementation Project Template
 
Requirements Management
Requirements ManagementRequirements Management
Requirements Management
 
Better Together: Delivering Graph Value with AWS & Neo4j - Antony Prasad The...
Better Together:  Delivering Graph Value with AWS & Neo4j - Antony Prasad The...Better Together:  Delivering Graph Value with AWS & Neo4j - Antony Prasad The...
Better Together: Delivering Graph Value with AWS & Neo4j - Antony Prasad The...
 
Product metrics
Product metricsProduct metrics
Product metrics
 
e-Government Interoperability Framework
e-Government Interoperability Frameworke-Government Interoperability Framework
e-Government Interoperability Framework
 
Reference Architecture-Validated & Tested Approach to Define Network Design
Reference Architecture-Validated & Tested Approach to Define Network DesignReference Architecture-Validated & Tested Approach to Define Network Design
Reference Architecture-Validated & Tested Approach to Define Network Design
 
Software Designing - Software Engineering
Software Designing - Software EngineeringSoftware Designing - Software Engineering
Software Designing - Software Engineering
 
MG6088 SOFTWARE PROJECT MANAGEMENT
MG6088 SOFTWARE PROJECT MANAGEMENTMG6088 SOFTWARE PROJECT MANAGEMENT
MG6088 SOFTWARE PROJECT MANAGEMENT
 
NEXT generation enterprise applications
NEXT generation enterprise applicationsNEXT generation enterprise applications
NEXT generation enterprise applications
 
Rapid Application Development Model
Rapid Application Development ModelRapid Application Development Model
Rapid Application Development Model
 
SOA Service Oriented Architecture
SOA Service Oriented ArchitectureSOA Service Oriented Architecture
SOA Service Oriented Architecture
 
API Strategy Introduction
API Strategy IntroductionAPI Strategy Introduction
API Strategy Introduction
 
IT Project Management - Study Notes
IT Project Management - Study NotesIT Project Management - Study Notes
IT Project Management - Study Notes
 
Loc and function point
Loc and function pointLoc and function point
Loc and function point
 
Software Project Management
Software Project ManagementSoftware Project Management
Software Project Management
 
Building Business Platforms Using an API Driven Marketplace
Building Business Platforms Using an  API Driven MarketplaceBuilding Business Platforms Using an  API Driven Marketplace
Building Business Platforms Using an API Driven Marketplace
 
Event-Driven Architecture (EDA)
Event-Driven Architecture (EDA)Event-Driven Architecture (EDA)
Event-Driven Architecture (EDA)
 
Software Development Plan
Software Development PlanSoftware Development Plan
Software Development Plan
 
Bringing API Management to AWS Powered Backends
Bringing API Management to AWS Powered BackendsBringing API Management to AWS Powered Backends
Bringing API Management to AWS Powered Backends
 


How Salesforce.com uses Hadoop

  • 1. How Salesforce.com uses Hadoop Narayan Bharadwaj Data Science @nadubharadwaj Jed Crosby Data Science @JedCrosby #forcewebinar Follow us @forcedotcom
  • 2. Safe Harbor Safe harbor statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services. The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with completed and any possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year ended January 31, 2011 and in our quarterly report on Form 10-Q for the most recent fiscal quarter ended October 31, 2011. These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site. Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
  • 3. Agenda
      - Hadoop use cases
      - Use case 1 - Product Metrics*
      - Technology
      - Use case 2 - Collaborative Filtering*
      - Q&A
      *Every time you see the elephant, we will attempt to explain a Hadoop-related concept.
  • 4. Got “Cloud Data”? 130k customers, 780 million transactions/day, millions of users, terabytes/day
  • 5. Hadoop Overview
      - Started by Doug Cutting at Yahoo!
      - Based on two Google papers
          - Google File System (GFS): http://research.google.com/archive/gfs.html
          - Google MapReduce: http://research.google.com/archive/mapreduce.html
      - Hadoop is an open source Apache project
          - Hadoop Distributed File System (HDFS)
          - Distributed Processing Framework (MapReduce)
      - Several related projects
          - HBase, Hive, Pig, Flume, ZooKeeper, Mahout, Oozie, HCatalog
  • 6. Hadoop use cases
      - Product Metrics
      - User behavior analysis
      - Capacity planning
      - Monitoring intelligence
      - Performance analysis
      - Security
      - Ad-hoc log searches
      - Collaborative Filtering
      - Search Relevancy
  • 8. Product Metrics – Problem Statement
      - Track feature usage/adoption across 130k+ customers
          - E.g., Accounts, Contacts, Visualforce, Apex, …
      - Track standard metrics across all features
          - E.g., #Requests, #UniqueOrgs, #UniqueUsers, AvgResponseTime, …
      - Track features and metrics across all channels
          - API, UI, Mobile
      - Primary audience: Executives, Product Managers
  • 9. Data Pipeline
      - Feature (What?)
      - Feature Metadata (Instrumentation)
      - Crunch it (How?)
      - Storage & Processing
      - Daily Summary (Output)
      - Fancy UI (Visualize)
      - Collaborate & Iterate
  • 10. Product Metrics Pipeline (diagram components)
      - User Input (Page Layout), Formula Fields, Workflow
      - Feature Metrics (Custom Object), Trend Metrics (Custom Object)
      - Collaboration (Chatter), Reports, Dashboards
      - API (to and from the custom objects)
      - Client Machine: Java Program, Pig script generator, Workflow, Log Pull
      - Hadoop, Log Files
  • 11. Feature Metrics (Custom Object)
      Id     Feature Name    PM      Instrumentation  Metric1    Metric2    Metric3     Metric4  Status
      F0001  Accounts        John    /001             #requests  #UniqOrgs  #UniqUsers  AvgRT    Dev
      F0002  Contacts        Nancy   /003             #requests  #UniqOrgs  #UniqUsers  AvgRT    Review
      F0003  API             Eric    A                #requests  #UniqOrgs  #UniqUsers  AvgRT    Deployed
      F0004  Visualforce     Roger   V                #requests  #UniqOrgs  #UniqUsers  AvgRT    Decom
      F0005  Apex            Kim     axapx            #requests  #UniqOrgs  #UniqUsers  AvgRT    Deployed
      F0006  Custom Objects  Chun    /aXX             #requests  #UniqOrgs  #UniqUsers  AvgRT    Deployed
      F0008  Chatter         Jed     chcmd            #requests  #UniqOrgs  #UniqUsers  AvgRT    Deployed
      F0009  Reports         Steve   R                #requests  #UniqOrgs  #UniqUsers  AvgRT    Deployed
  • 12. Feature Metrics (Custom Object)
  • 13. User Input (Page Layout): Formula Field, Workflow Rule
  • 14. User Input (Child Custom Object): Child Objects
  • 16. Basic Pig script construct
      -- Define UDFs
      DEFINE GFV GetFieldValue('/path/to/udf/file');
      -- Load data
      A = LOAD '/path/to/cloud/data/log/files' USING PigStorage();
      -- Filter data
      B = FILTER A BY GFV(*, 'logRecordType') == 'U';
      -- Extract fields
      C = FOREACH B GENERATE GFV(*, 'orgId'), GFV(*, 'userId') ……
      -- Group
      G = GROUP C BY ……
      -- Compute output metrics
      O = FOREACH G {
          orgs = C.orgId;
          uniqueOrgs = DISTINCT orgs;
          GENERATE group, COUNT(uniqueOrgs);
      }
      -- Store or dump results
      STORE O INTO '/path/to/user/output';
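      The same filter, group, and distinct-count steps can be sketched in plain Python to show what the standard metrics mean. This is only an illustration, not the production pipeline: the record keys `feature` and `responseTimeMs` are hypothetical names, and the records are assumed to be already parsed into dictionaries.

```python
from collections import defaultdict


def feature_metrics(records, record_type="U"):
    """Compute per-feature metrics from parsed log records.

    Each record is a dict. The keys 'feature' and 'responseTimeMs' are
    hypothetical; 'logRecordType', 'orgId' and 'userId' follow the slide.
    """
    # Filter, then group the surviving rows by feature.
    groups = defaultdict(list)
    for rec in records:
        if rec["logRecordType"] == record_type:
            groups[rec["feature"]].append(rec)

    # Compute the four standard metrics for each group.
    metrics = {}
    for feature, rows in groups.items():
        metrics[feature] = {
            "requests": len(rows),
            "unique_orgs": len({r["orgId"] for r in rows}),
            "unique_users": len({r["userId"] for r in rows}),
            "avg_response_time": sum(r["responseTimeMs"] for r in rows) / len(rows),
        }
    return metrics


if __name__ == "__main__":
    sample = [
        {"logRecordType": "U", "feature": "Accounts", "orgId": "o1", "userId": "u1", "responseTimeMs": 100},
        {"logRecordType": "U", "feature": "Accounts", "orgId": "o1", "userId": "u2", "responseTimeMs": 300},
        {"logRecordType": "X", "feature": "Accounts", "orgId": "o2", "userId": "u3", "responseTimeMs": 900},
    ]
    print(feature_metrics(sample))
```

      In the Pig version the distinct counts come from the nested DISTINCT inside FOREACH; here Python sets play that role.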
  • 17. Java Pig Script Generator (Client)
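      The slide shows the generator only as a screenshot, so the following is a rough sketch of the idea rather than the actual Java program: fill a script template from one Feature Metrics row. The template reuses only statements from the "Basic Pig script construct" slide, and treating the Instrumentation value as a `logRecordType` is an assumption made purely for illustration. The function name, dictionary keys, and paths are hypothetical.

```python
# Hypothetical template built only from statements shown on the
# "Basic Pig script construct" slide; elided steps are left out, not guessed.
PIG_TEMPLATE = """\
-- Feature {feature_id}: {feature_name}
DEFINE GFV GetFieldValue('{udf_path}');
A = LOAD '{log_path}' USING PigStorage();
B = FILTER A BY GFV(*, 'logRecordType') == '{record_type}';
STORE B INTO '{output_path}/{feature_id}';
"""


def generate_pig_script(feature, udf_path, log_path, output_path):
    """Render a Pig script for one feature row (a dict with hypothetical keys).

    Assumption: the row's 'instrumentation' value is used as the log record
    type to filter on. The real mapping is not described in the slides.
    """
    return PIG_TEMPLATE.format(
        feature_id=feature["id"],
        feature_name=feature["name"],
        record_type=feature["instrumentation"],
        udf_path=udf_path,
        log_path=log_path,
        output_path=output_path,
    )


if __name__ == "__main__":
    row = {"id": "F0003", "name": "API", "instrumentation": "A"}
    print(generate_pig_script(row, "/path/to/udf/file",
                              "/path/to/cloud/data/log/files",
                              "/path/to/user/output"))
```

      Keeping the script as a template means a product manager can add a feature by adding a row of metadata, without anyone hand-writing a new Pig script.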
  • 18. Trend Metrics (Custom Object)
      Id     Date        #Requests  #Unique Orgs  #Unique Users  Avg ResponseTime
      F0001  06/01/2012  <big>      <big>         <big>          <little>
      F0002  06/01/2012  <big>      <big>         <big>          <little>
      F0003  06/01/2012  <big>      <big>         <big>          <little>
      F0001  06/02/2012  <big>      <big>         <big>          <little>
      F0002  06/02/2012  <big>      <big>         <big>          <little>
      F0003  06/03/2012  <big>      <big>         <big>          <little>
  • 19. Upload to Trend Metrics (Custom Object)
  • 20. Visualization (Reports & Dashboards)
  • 21. Visualization (Reports & Dashboards)
  • 22. Collaborate, Iterate (Chatter)
  • 23. Recap [architecture diagram; components shown] User Input (Page Layout); Collaboration (Chatter); Reports, Dashboards; Formula Fields; Workflow; Feature Metrics (Custom Object); Trend Metrics (Custom Object); API; Client Machine; Java Program; Pig script generator; Workflow; Log Pull; Hadoop; Log Files
  • 25. Hadoop ecosystem Apache Hadoop Version=0.20.2
  • 26. Contributions @pRaShAnT1784 : Prashant Kommireddi Lars Hofhansl @thefutureian : Ian Varley
  • 27. Data Science tools ecosystem Apache Pig Version=0.9.1
  • 29. Collaborative Filtering – Problem Statement §  Show similar files within an organization –  Content-based approach –  Community-based approach
  • 30. Popular File
  • 31. Related File
  • 32. We found this relationship using item-to-item collaborative filtering §  Amazon published this algorithm in 2003. –  Amazon.com Recommendations: Item-to-Item Collaborative Filtering, by Gregory Linden, Brent Smith, and Jeremy York. IEEE Internet Computing, January-February 2003. §  At Salesforce, we adapted this algorithm for Hadoop, and we use it to recommend files to view and users to follow.
  • 33. Example: CF on 5 files Vision Statement Annual Report Dilbert Comic Darth Vader Cartoon Disk Usage Report
  • 34. View History Table
                       Annual  Vision     Dilbert  Darth Vader  Disk Usage
                       Report  Statement  Cartoon  Cartoon      Report
      Miranda (CEO)    1       1          1        0            0
      Bob (CFO)        1       1          1        0            0
      Susan (Sales)    0       1          1        1            0
      Chun (Sales)     0       0          1        1            0
      Alice (IT)       0       0          1        1            1
  • 35. Relationships between the files Annual Report Vision Statement Darth Vader Cartoon Dilbert Cartoon Disk Usage Report
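  • 36. Relationships between the files (each tally is the number of users who viewed both files)
      Annual Report / Vision Statement: 2
      Annual Report / Dilbert Cartoon: 2
      Annual Report / Darth Vader Cartoon: 0
      Annual Report / Disk Usage Report: 0
      Vision Statement / Dilbert Cartoon: 3
      Vision Statement / Darth Vader Cartoon: 1
      Vision Statement / Disk Usage Report: 0
      Dilbert Cartoon / Darth Vader Cartoon: 3
      Dilbert Cartoon / Disk Usage Report: 1
      Darth Vader Cartoon / Disk Usage Report: 1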
  • 37. Sorted relationships for each file
      Annual Report:        Dilbert (2), Vision Stmt. (2)
      Vision Statement:     Dilbert (3), Annual Rpt. (2), Darth Vader (1)
      Dilbert Cartoon:      Vision Stmt. (3), Darth Vader (3), Annual Rpt. (2), Disk Usage (1)
      Darth Vader Cartoon:  Dilbert (3), Vision Stmt. (1), Disk Usage (1)
      Disk Usage Report:    Dilbert (1), Darth Vader (1)
      The popularity problem: notice that Dilbert appears first in every other file's list. This is probably not what we want.
      The solution: divide the relationship tallies by file popularities.
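  • 38. Normalized relationships between the files
      Annual Report / Vision Statement: .82
      Annual Report / Dilbert Cartoon: .63
      Annual Report / Darth Vader Cartoon: 0
      Annual Report / Disk Usage Report: 0
      Vision Statement / Dilbert Cartoon: .77
      Vision Statement / Darth Vader Cartoon: .33
      Vision Statement / Disk Usage Report: 0
      Dilbert Cartoon / Darth Vader Cartoon: .77
      Dilbert Cartoon / Disk Usage Report: .45
      Darth Vader Cartoon / Disk Usage Report: .58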
  • 39. Sorted relationships for each file, normalized by file popularities
      Annual Report:        Vision Stmt. (.82), Dilbert (.63)
      Vision Statement:     Annual Report (.82), Dilbert (.77), Darth Vader (.33)
      Dilbert Cartoon:      Darth Vader (.77), Vision Stmt. (.77), Annual Report (.63), Disk Usage (.45)
      Darth Vader Cartoon:  Dilbert (.77), Disk Usage (.58), Vision Stmt. (.33)
      Disk Usage Report:    Darth Vader (.58), Dilbert (.45)
      High relationship tallies AND similar popularity values now drive closeness.
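As a cross-check on the example, the raw and normalized rankings can be recomputed from the view history in a few lines of Python. This is a minimal single-machine sketch, not the Hadoop implementation. It assumes the normalization is the tally divided by the square root of the product of the two popularities (which matches the sqrt(3/5) value in the deck), and it breaks ties alphabetically, which is an arbitrary choice. Function names and the shortened file names are illustrative.

```python
from itertools import combinations
from math import sqrt

# View history from the example: user -> set of files viewed.
VIEWS = {
    "Miranda": {"Annual Report", "Vision Statement", "Dilbert"},
    "Bob": {"Annual Report", "Vision Statement", "Dilbert"},
    "Susan": {"Vision Statement", "Dilbert", "Darth Vader"},
    "Chun": {"Dilbert", "Darth Vader"},
    "Alice": {"Dilbert", "Darth Vader", "Disk Usage"},
}


def popularities(views):
    """Number of distinct users who viewed each file."""
    pop = {}
    for files in views.values():
        for f in files:
            pop[f] = pop.get(f, 0) + 1
    return pop


def tallies(views):
    """Number of users who viewed both files of each (alphabetically ordered) pair."""
    counts = {}
    for files in views.values():
        for pair in combinations(sorted(files), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


def ranked(views, normalize):
    """Per-file list of (other file, score), best first; ties broken by name."""
    pop, result = popularities(views), {}
    for (a, b), n in tallies(views).items():
        score = round(n / sqrt(pop[a] * pop[b]), 2) if normalize else n
        result.setdefault(a, []).append((b, score))
        result.setdefault(b, []).append((a, score))
    for neighbours in result.values():
        neighbours.sort(key=lambda item: (-item[1], item[0]))
    return result


RAW = ranked(VIEWS, normalize=False)
NORMALIZED = ranked(VIEWS, normalize=True)
```

With raw tallies, Dilbert leads the Vision Statement list; after normalization, Annual Report moves ahead of it.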
  • 40. The item-to-item CF algorithm 1)  Compute file popularities 2)  Compute relationship tallies and divide by file popularities 3)  Sort and store the results
  • 41. MapReduce Overview Map Shuffle Reduce (adapted from http://code.google.com/p/mapreduce-framework/wiki/MapReduce)
  • 42. 1. Compute File Popularities
      <user, file>
        -> Inverse identity map -> <file, List<user>>
        -> Reduce -> <file, (user count)>
      Result is a table of (file, popularity) pairs that you store in the Hadoop distributed cache.
  • 43. Example: File popularity for Dilbert
      (Miranda, Dilbert), (Bob, Dilbert), (Susan, Dilbert), (Chun, Dilbert), (Alice, Dilbert)
        -> Inverse identity map -> <Dilbert, {Miranda, Bob, Susan, Chun, Alice}>
        -> Reduce -> (Dilbert, 5)
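Step 1 can be simulated locally to see what each phase contributes. The sketch below is plain Python rather than Hadoop code, and the function names `map_phase`, `shuffle`, and `reduce_phase` are illustrative stand-ins for the framework's phases. It counts distinct users per file, on the assumption that a repeated view by the same user should count once.

```python
from collections import defaultdict

# <user, file> records for the five-file example.
VIEW_LOG = [
    ("Miranda", "Annual Report"), ("Miranda", "Vision Statement"), ("Miranda", "Dilbert"),
    ("Bob", "Annual Report"), ("Bob", "Vision Statement"), ("Bob", "Dilbert"),
    ("Susan", "Vision Statement"), ("Susan", "Dilbert"), ("Susan", "Darth Vader"),
    ("Chun", "Dilbert"), ("Chun", "Darth Vader"),
    ("Alice", "Dilbert"), ("Alice", "Darth Vader"), ("Alice", "Disk Usage"),
]


def map_phase(records):
    """Inverse identity map: swap each <user, file> into <file, user>."""
    for user, file in records:
        yield file, user


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: count the distinct users for each file."""
    return {file: len(set(users)) for file, users in grouped.items()}


POPULARITY = reduce_phase(shuffle(map_phase(VIEW_LOG)))
```

The resulting dictionary plays the role of the (file, popularity) table that the deck stores in the distributed cache.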
  • 44. 2a. Compute relationship tallies - find all relationships in view history table
      <user, file>
        -> Identity map -> <user, List<file>>
        -> Reduce -> <(file1, file2), Integer(1)>, <(file1, file3), Integer(1)>, … <(file(n-1), file(n)), Integer(1)>
      Relationships have their file IDs in alphabetical order to avoid double counting.
  • 45. Example 2a: Miranda's (CEO) file relationship votes
      (Miranda, Annual Report), (Miranda, Vision Statement), (Miranda, Dilbert)
        -> Identity map -> <Miranda, {Annual Report, Vision Statement, Dilbert}>
        -> Reduce -> <(Annual Report, Dilbert), Integer(1)>, <(Annual Report, Vision Statement), Integer(1)>, <(Dilbert, Vision Statement), Integer(1)>
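The vote-emitting reducer of step 2a might look roughly like the following in plain Python. It is a local sketch with illustrative names, not Hadoop code. Sorting each user's file list before pairing is what keeps the file IDs in alphabetical order, so (Annual Report, Dilbert) is never also emitted as (Dilbert, Annual Report).

```python
from collections import defaultdict
from itertools import combinations

MIRANDA_VIEWS = [
    ("Miranda", "Annual Report"),
    ("Miranda", "Vision Statement"),
    ("Miranda", "Dilbert"),
]


def shuffle(pairs):
    """Group values by key; the identity map passes records through unchanged."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def emit_votes(files):
    """Reducer: one <(file1, file2), 1> vote per unordered pair a user viewed."""
    for pair in combinations(sorted(set(files)), 2):
        yield pair, 1


VOTES = [
    vote
    for files in shuffle(MIRANDA_VIEWS).values()
    for vote in emit_votes(files)
]
```

A user with k viewed files produces k(k-1)/2 votes, which is where the quadratic worst-case cost of the algorithm comes from.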
  • 46. 2b. Tally the relationship votes - just a word count, where each relationship occurrence is a word
      <(file1, file2), Integer(1)>
        -> Identity map -> <(file1, file2), List<Integer(1)>>
        -> Reduce (count and divide by popularities) -> <file1, (file2, similarity score)>, <file2, (file1, similarity score)>
      Note that we emit each result twice, once for each file that belongs to a relationship.
  • 47. Example 2b: the Dilbert/Darth Vader relationship
      <(Dilbert, Vader), Integer(1)>, <(Dilbert, Vader), Integer(1)>, <(Dilbert, Vader), Integer(1)>
        -> Identity map -> <(Dilbert, Vader), {1, 1, 1}>
        -> Reduce (count and divide by popularities) -> <Dilbert, (Vader, sqrt(3/5))>, <Vader, (Dilbert, sqrt(3/5))>
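A sketch of the step 2b reducer for the Dilbert/Vader pair follows, again in local Python with illustrative names. The `POPULARITY` dictionary stands in for the table read from the distributed cache, and the divisor is assumed to be the square root of the product of the two popularities, which reproduces the sqrt(3/5) score.

```python
from collections import defaultdict
from math import sqrt

# Stand-in for the (file, popularity) table from step 1.
POPULARITY = {"Dilbert": 5, "Vader": 3}

# Three users viewed both files, so step 2a emitted three votes for this pair.
VOTES = [(("Dilbert", "Vader"), 1)] * 3


def shuffle(pairs):
    """Group the votes by relationship key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def similarity_reducer(pair, votes, popularity):
    """Count the votes, normalize, and emit the score once per file in the pair."""
    file1, file2 = pair
    score = sum(votes) / sqrt(popularity[file1] * popularity[file2])
    yield file1, (file2, score)
    yield file2, (file1, score)


RESULTS = [
    out
    for pair, votes in shuffle(VOTES).items()
    for out in similarity_reducer(pair, votes, POPULARITY)
]
```

Emitting the score under both file keys lets the next job group by a single file and see all of its neighbours.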
  • 48. 3. Sort and store results
      <file1, (file2, similarity score)>
        -> Identity map -> <file1, List<(file2, similarity score)>>
        -> Reduce -> <file1, {top n similar files}>
      Store the results in your location of choice.
  • 49. Example 3: Sorting the results for Dilbert
      <Dilbert, (Annual Report, .63)>, <Dilbert, (Vision Statement, .77)>, <Dilbert, (Disk Usage, .45)>, <Dilbert, (Darth Vader, .77)>
        -> Identity map -> <Dilbert, {(Annual Report, .63), (Vision Statement, .77), (Disk Usage, .45), (Darth Vader, .77)}>
        -> Reduce -> <Dilbert, {Darth Vader, Vision Statement}> (Top 2 files)
      Store results
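The final step reduces to a group-and-sort. A minimal local sketch, assuming ties are broken alphabetically (the slides do not say how equal scores are ordered) and using the illustrative name `top_n`:

```python
from collections import defaultdict

# <file1, (file2, similarity score)> records for Dilbert from step 2b.
SCORES = [
    ("Dilbert", ("Annual Report", 0.63)),
    ("Dilbert", ("Vision Statement", 0.77)),
    ("Dilbert", ("Disk Usage", 0.45)),
    ("Dilbert", ("Darth Vader", 0.77)),
]


def top_n(records, n):
    """Group scored neighbours by file and keep the n highest-scoring names."""
    grouped = defaultdict(list)
    for file1, scored in records:
        grouped[file1].append(scored)
    return {
        file1: [name for name, _ in sorted(items, key=lambda s: (-s[1], s[0]))[:n]]
        for file1, items in grouped.items()
    }


TOP = top_n(SCORES, 2)
```

For large neighbour lists, a bounded heap such as `heapq.nlargest` would avoid sorting the whole list, at the cost of a slightly different tie-breaking rule.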
  • 50. Appendix
      §  Cosine formula and normalization trick to avoid the distributed cache: $\cos\theta_{AB} = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{A}{\|A\|} \cdot \frac{B}{\|B\|}$
      §  Mahout has CF
      §  Asymptotic order of the algorithm is $O(M \cdot N^2)$ in the worst case, but is helped by sparsity.
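To connect the cosine formula to the tallies and popularities used in the example, here is a short worked instance. It assumes each file is represented as a 0/1 vector over users, so that the dot product of two files counts the users who viewed both and the squared norm of a file equals its popularity. The labels tally and pop are introduced here for illustration.

```latex
\[
\cos\theta_{AB}
  = \frac{A \cdot B}{\|A\|\,\|B\|}
  = \frac{\mathrm{tally}(A,B)}{\sqrt{\mathrm{pop}(A)\,\mathrm{pop}(B)}}
\]
\[
\cos\theta_{\mathrm{Dilbert},\,\mathrm{Vader}}
  = \frac{3}{\sqrt{5 \cdot 3}}
  = \sqrt{\tfrac{3}{5}}
  \approx 0.77
\]
```

Under the same assumption, dividing each file vector by its norm up front makes the plain dot product equal to the cosine, which is presumably why the reducer would no longer need a popularity lookup.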
  • 51. Summary Hadoop Cloud Data Hadoop + Force.com = Recommendation algorithms
  • 52. @forcedotcom / #forcewebinar Developer Force Group facebook.com/forcedotcom Developer Force – Force.com Community
  • 53. Upcoming Events §  June 26 – Mobile CodeTalk –  http://bit.ly/mct-wr §  June 27 – Painless Mobile App Development –  http://bit.ly/mobileapp-hp http://bit.ly/mdc-hp
  • 54. Q&A
      http://bit.ly/hadoopsurvey
      Narayan Bharadwaj (@nadubharadwaj), Jed Crosby (@JedCrosby), Prashant Kommireddi (@pRaShAnT1784), Santosh Rau (@santoshrau)
      @SalesforceEng