How Salesforce.com uses Hadoop


  Narayan Bharadwaj
  Data Science
      @nadubharadwaj

  Jed Crosby
  Data Science
      @JedCrosby

  #forcewebinar
                   Follow us @forcedotcom
Safe Harbor
  Safe harbor statement under the Private Securities Litigation Reform Act of 1995:

  This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such
  uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ
  materially from the results expressed or implied by the forward-looking statements we make. All statements other than
  statements of historical fact could be deemed forward-looking, including any projections of product or service availability,
  subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of
  management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or
  technology developments and customer contracts or use of our services.

  The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and
  delivering new functionality for our service, new products and services, our new business model, our past operating losses,
  possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our
  security measures, the outcome of any litigation, risks associated with completed and any possible mergers and
  acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain,
  and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our
  limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further
  information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report
  on Form 10-K for the most recent fiscal year ended January 31, 2011 and in our quarterly report on Form 10-Q for the most
  recent fiscal quarter ended October 31, 2011. These documents and others containing important disclosures are available
  on the SEC Filings section of the Investor Information section of our Web site.

  Any unreleased services or features referenced in this or other presentations, press releases or public statements are not
  currently available and may not be delivered on time or at all. Customers who purchase our services should make the
  purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does
  not intend to update these forward-looking statements.




Agenda

 §  Hadoop use cases
 §  Use case 1 - Product Metrics*
 §  Technology
 §  Use case 2 - Collaborative Filtering*
 §  Q&A




             *Every time you see the elephant, we will attempt to
             explain a Hadoop-related concept.


Got “Cloud Data”?




              130k customers      780 million transactions/day
              Millions of users   Terabytes/day




Hadoop Overview

 §  Started by Doug Cutting and Mike Cafarella; developed at scale at Yahoo!
 §  Based on two Google papers
     –  Google File System (GFS): http://research.google.com/archive/gfs.html
     –  Google MapReduce: http://research.google.com/archive/mapreduce.html


 §  Hadoop is an open source Apache project
     –  Hadoop Distributed File System (HDFS)
     –  Distributed Processing Framework (MapReduce)


 §  Several related projects
     –  HBase, Hive, Pig, Flume, ZooKeeper, Mahout, Oozie, HCatalog




Hadoop use cases


 §  Product Metrics
 §  User behavior analysis
 §  Capacity planning
 §  Monitoring intelligence
 §  Performance analysis
 §  Security
 §  Ad-hoc log searches
 §  Collaborative Filtering
 §  Search Relevancy



Product Metrics
Product Metrics – Problem Statement



 §  Track feature usage/adoption across 130k+ customers
    –  E.g., Accounts, Contacts, Visualforce, Apex, …


 §  Track standard metrics across all features
    –  E.g., #Requests, #UniqueOrgs, #UniqueUsers, AvgResponseTime, …


 §  Track features and metrics across all channels
    –  API, UI, Mobile


 §  Primary audience: Executives, Product Managers

Data Pipeline

                                    Collaborate &          Fancy UI
        Feature (What?)
                                        Iterate           (Visualize)




        Feature Metadata                                Daily Summary
        (Instrumentation)                                  (Output)




                                     Crunch it
                                      (How?)




                            Storage & Processing




Product Metrics Pipeline

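  [Diagram: components of the product metrics pipeline]
   §  Salesforce side: User Input (Page Layout), Workflow, Feature Metrics (Custom Object),
      Trend Metrics (Custom Object), Formula Fields, Reports and Dashboards,
      Collaboration (Chatter)
   §  Client machine: Java program (Pig script generator), linked to both custom
      objects through the API
   §  Hadoop: Workflow, Log Pull, Log Files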
Feature Metrics (Custom Object)


Id      Feature Name     PM      Instrumentation     Metric1      Metric2     Metric3      Metric4   Status


F0001   Accounts         John    /001                #requests    #UniqOrgs   #UniqUsers   AvgRT     Dev

F0002   Contacts         Nancy   /003                #requests    #UniqOrgs   #UniqUsers   AvgRT     Review

F0003   API              Eric    A                   #requests    #UniqOrgs   #UniqUsers   AvgRT     Deployed



F0004   Visualforce      Roger   V                   #requests    #UniqOrgs   #UniqUsers   AvgRT     Decom



F0005   Apex             Kim     axapx               #requests    #UniqOrgs   #UniqUsers   AvgRT     Deployed

F0006   Custom Objects   Chun    /aXX                #requests    #UniqOrgs   #UniqUsers   AvgRT     Deployed



F0008   Chatter          Jed     chcmd               #requests    #UniqOrgs   #UniqUsers   AvgRT     Deployed

F0009   Reports          Steve   R                   #requests    #UniqOrgs   #UniqUsers   AvgRT     Deployed




Feature Metrics (Custom Object)




User Input (Page Layout)
                                                    Formula
                                                    Field




                                                      Workflow
                                                      Rule




User Input (Child Custom Object)




                                                  Child
                                                  Objects




Apache Pig
Basic Pig script construct

  -- Define UDFs
  DEFINE GFV GetFieldValue('/path/to/udf/file');

  -- Load data
  A = LOAD '/path/to/cloud/data/log/files' USING PigStorage();
  -- Filter data: keep only records of log record type 'U'
  B = FILTER A BY GFV(*, 'logRecordType') == 'U';

  -- Extract fields (aliases let the nested block below refer to them by name)
  C = FOREACH B GENERATE GFV(*, 'orgId') AS orgId, GFV(*, 'userId') AS userId ……..
  -- Group
  G = GROUP C BY ……
  -- Compute output metrics
  O = FOREACH G {
          orgs = C.orgId;
          uniqueOrgs = DISTINCT orgs;
          -- a nested FOREACH block must end with GENERATE
          GENERATE group, COUNT(uniqueOrgs);
      }
  -- Store or dump results
  STORE O INTO '/path/to/user/output';
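
For readers without a Pig cluster at hand, the same filter, distinct, and aggregate pattern can be sketched in plain Python. This is only an illustration: the record layout (dicts with `logRecordType`, `orgId`, `userId`, `responseTimeMs`) and the sample values are assumptions, not the actual log format.

```python
# Hypothetical, already-parsed log records; all field names are assumed.
records = [
    {"logRecordType": "U", "orgId": "org1", "userId": "u1", "responseTimeMs": 120},
    {"logRecordType": "U", "orgId": "org1", "userId": "u2", "responseTimeMs": 80},
    {"logRecordType": "U", "orgId": "org2", "userId": "u3", "responseTimeMs": 100},
    {"logRecordType": "A", "orgId": "org3", "userId": "u4", "responseTimeMs": 50},
]


def feature_metrics(records, record_type):
    """Filter by record type, then compute the four standard metrics."""
    rows = [r for r in records if r["logRecordType"] == record_type]  # FILTER
    orgs = {r["orgId"] for r in rows}      # DISTINCT orgs
    users = {r["userId"] for r in rows}    # DISTINCT users
    avg_rt = sum(r["responseTimeMs"] for r in rows) / len(rows) if rows else 0.0
    return {
        "requests": len(rows),
        "uniqueOrgs": len(orgs),
        "uniqueUsers": len(users),
        "avgResponseTime": avg_rt,
    }


metrics = feature_metrics(records, "U")
print(metrics)
```

The Pig version expresses the same steps declaratively, so Hadoop can spread them over many log files.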



Java Pig Script Generator (Client)




Trend Metrics (Custom Object)



Id      Date         #Requests   #UniqueOrgs   #UniqueUsers   AvgResponseTime

F0001   06/01/2012   <big>       <big>         <big>          <little>
F0002   06/01/2012   <big>       <big>         <big>          <little>
F0003   06/01/2012   <big>       <big>         <big>          <little>
F0001   06/02/2012   <big>       <big>         <big>          <little>
F0002   06/02/2012   <big>       <big>         <big>          <little>
F0003   06/03/2012   <big>       <big>         <big>          <little>




Upload to Trend Metrics (Custom Object)




Visualization (Reports & Dashboards)




Visualization (Reports & Dashboards)




Collaborate, Iterate (Chatter)




Recap

  [Diagram: the Product Metrics Pipeline, repeated as a recap. Components: User Input
   (Page Layout), Workflow, Feature Metrics (Custom Object), API, Client Machine with
   Java Program (Pig script generator), Hadoop with Workflow, Log Pull and Log Files,
   Trend Metrics (Custom Object), Formula Fields, Reports and Dashboards,
   Collaboration (Chatter).]
Technology
Hadoop ecosystem




      Apache Hadoop
      Version=0.20.2




Contributions

     @pRaShAnT1784 : Prashant Kommireddi




    Lars Hofhansl                         @thefutureian : Ian Varley




Data Science tools ecosystem




       Apache Pig
       Version=0.9.1




Collaborative Filtering
Collaborative Filtering – Problem Statement




 §  Show similar files within an organization
    –  Content-based approach
    –  Community-based approach




Popular File




Related File




We found this relationship using item-to-item collaborative
filtering




 §  Amazon published this algorithm in 2003.
    –  Amazon.com Recommendations: Item-to-Item Collaborative Filtering,
       by Gregory Linden, Brent Smith, and Jeremy York. IEEE Internet
       Computing, January-February 2003.

 §  At Salesforce, we adapted this algorithm for Hadoop,
     and we use it to recommend files to view and users to
     follow.




Example: CF on 5 files

                                                         Vision Statement
                Annual Report




Dilbert Comic

                                                                Darth Vader Cartoon




                                Disk Usage Report




View History Table




                   Annual    Vision      Dilbert   Darth Vader   Disk Usage
                   Report    Statement   Cartoon   Cartoon       Report
 Miranda (CEO)       1         1           1         0             0
 Bob (CFO)           1         1           1         0             0
 Susan (Sales)       0         1           1         1             0
 Chun (Sales)        0         0           1         1             0
 Alice (IT)          0         0           1         1             1




Relationships between the files




                   Annual Report                      Vision Statement




                                                                         Darth Vader
                                                                         Cartoon
         Dilbert
         Cartoon




                                        Disk Usage
                                        Report



Relationships between the files



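  [Graph: each edge is labeled with its relationship tally, the number of users
   who viewed both files]

     Annual Report        -  Vision Statement       2
     Annual Report        -  Dilbert Cartoon        2
     Annual Report        -  Darth Vader Cartoon    0
     Annual Report        -  Disk Usage Report      0
     Vision Statement     -  Dilbert Cartoon        3
     Vision Statement     -  Darth Vader Cartoon    1
     Vision Statement     -  Disk Usage Report      0
     Dilbert Cartoon      -  Darth Vader Cartoon    3
     Dilbert Cartoon      -  Disk Usage Report      1
     Darth Vader Cartoon  -  Disk Usage Report      1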



Sorted relationships for each file




Annual                Vision               Dilbert                Darth Vader        Disk Usage
Report                Statement            Cartoon                Cartoon            Report
Dilbert (2)           Dilbert (3)          Vision Stmt. (3)       Dilbert (3)        Dilbert (1)
Vision Stmt. (2)      Annual Rpt. (2)      Darth Vader (3)        Vision Stmt. (1)   Darth Vader (1)
                      Darth Vader (1)      Annual Rpt. (2)        Disk Usage (1)
                                           Disk Usage (1)



              The popularity problem: notice that Dilbert appears first in every list.
              This is probably not what we want.


              The solution: divide the relationship tallies by file popularities.



Normalized relationships between the files



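  [Graph: each edge is labeled with its normalized relationship score]

     Annual Report        -  Vision Statement       .82
     Annual Report        -  Dilbert Cartoon        .63
     Annual Report        -  Darth Vader Cartoon    0
     Annual Report        -  Disk Usage Report      0
     Vision Statement     -  Dilbert Cartoon        .77
     Vision Statement     -  Darth Vader Cartoon    .33
     Vision Statement     -  Disk Usage Report      0
     Dilbert Cartoon      -  Darth Vader Cartoon    .77
     Dilbert Cartoon      -  Disk Usage Report      .45
     Darth Vader Cartoon  -  Disk Usage Report      .58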



Sorted relationships for each file, normalized by file popularities




Annual                Vision                Dilbert               Darth Vader          Disk Usage
Report                Statement             Cartoon               Cartoon              Report
Vision Stmt. (.82)    Annual Report (.82)   Darth Vader (.77)     Dilbert (.77)        Darth Vader (.58)
Dilbert (.63)         Dilbert (.77)         Vision Stmt. (.77)    Disk Usage (.58)     Dilbert (.45)
                      Darth Vader (.33)     Annual Report (.63)   Vision Stmt. (.33)
                                            Disk Usage (.45)




          High relationship tallies AND similar popularity values now drive closeness.



The item-to-item CF algorithm




 1)  Compute file popularities
 2)  Compute relationship tallies and divide by file
     popularities
 3)  Sort and store the results




MapReduce Overview
    Map                        Shuffle                       Reduce




      (adapted from http://code.google.com/p/mapreduce-framework/wiki/MapReduce)
1. Compute File Popularities



                                       <user, file>


                                                     Inverse identity map



                                    <file, List<user>>


                                                      Reduce



                                    <file, (user count)>


 Result is a table of (file, popularity) pairs that you store in the Hadoop distributed cache.


Example: File popularity for Dilbert




  (Miranda, Dilbert), (Bob, Dilbert), (Susan, Dilbert), (Chun, Dilbert), (Alice, Dilbert)



                                                   Inverse identity map



                     <Dilbert, {Miranda, Bob, Susan, Chun, Alice}>



                                                   Reduce



                                         (Dilbert, 5)
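
The map, shuffle, and reduce phases of this step can be imitated in a few lines of Python. The `run_mapreduce` helper below is a hypothetical in-memory stand-in written for this illustration, not the Hadoop API, and the function names are assumptions.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """In-memory stand-in for map -> shuffle -> reduce; not the Hadoop API."""
    shuffled = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            shuffled[out_key].append(out_value)  # shuffle groups values by key
    output = []
    for out_key in sorted(shuffled):
        output.extend(reducer(out_key, shuffled[out_key]))
    return output


def inverse_identity_map(user, file):
    # Swap key and value so that the shuffle groups users by file.
    yield file, user


def count_viewers(file, users):
    # Popularity is the number of distinct users who viewed the file.
    yield file, len(set(users))


view_pairs = [
    ("Miranda", "Dilbert"), ("Bob", "Dilbert"), ("Susan", "Dilbert"),
    ("Chun", "Dilbert"), ("Alice", "Dilbert"),
    ("Susan", "Darth Vader"), ("Chun", "Darth Vader"), ("Alice", "Darth Vader"),
]

popularity = dict(run_mapreduce(view_pairs, inverse_identity_map, count_viewers))
print(popularity)
```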




2a. Compute relationship tallies - find all relationships in view
history table



                                <user, file>

                                             Identity map


                             <user, List<file>>

                                             Reduce


                         <(file1, file2), Integer(1)>,
                         <(file1, file3), Integer(1)>,
                         …
                         <(file(n-1), file(n)), Integer(1)>


           Relationships have their file IDs in alphabetical order
           to avoid double counting.
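
As a rough sketch of the reduce side of this step, the function below emits one vote per unordered pair of files that a single user viewed. The name `relationship_votes` is hypothetical; sorting the file names first is what keeps each pair in alphabetical order.

```python
from itertools import combinations


def relationship_votes(files_viewed):
    """Reduce step for one user: emit ((file_a, file_b), 1) for every pair."""
    ordered = sorted(set(files_viewed))  # alphabetical order, no duplicates
    return [(pair, 1) for pair in combinations(ordered, 2)]


miranda = ["Annual Report", "Vision Statement", "Dilbert"]
votes = relationship_votes(miranda)
for vote in votes:
    print(vote)
```

A user who viewed n files produces n*(n-1)/2 votes, which is where the quadratic worst case noted in the Appendix comes from.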
Example 2a: Miranda’s (CEO) file relationship votes




     (Miranda, Annual Report), (Miranda, Vision Statement), (Miranda, Dilbert)


                                                Identity map


              <Miranda, {Annual Report, Vision Statement, Dilbert}>

                                                 Reduce


                      <(Annual Report, Dilbert), Integer(1)>,
                      <(Annual Report, Vision Statement), Integer(1)>,
                      <(Dilbert, Vision Statement), Integer(1)>




2b. Tally the relationship votes - just a word count, where each
relationship occurrence is a word




                              <(file1, file2), Integer(1)>


                                                   Identity map


                            <(file1, file2), List<Integer(1)>>



                                                   Reduce: count and
                                                   divide by popularities


          <file1, (file2, similarity score)>, <file2, (file1, similarity score)>


  Note that we emit each result twice, one for each file that belongs to a
  relationship.
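
A minimal sketch of this reducer, assuming the popularity table from step 1 is available as a plain dict (standing in for the distributed cache). The function name is hypothetical, and the divisor sqrt(popularity_a * popularity_b) is inferred from the sqrt(3/5) result in the Dilbert/Darth Vader example rather than stated on the slide.

```python
from math import sqrt


def similarity_reducer(pair, votes, popularity):
    """Count the votes for one pair, normalize, and emit the result twice."""
    file_a, file_b = pair
    tally = sum(votes)
    score = tally / sqrt(popularity[file_a] * popularity[file_b])
    return [(file_a, (file_b, score)), (file_b, (file_a, score))]


popularity = {"Dilbert": 5, "Vader": 3}  # from step 1
results = similarity_reducer(("Dilbert", "Vader"), [1, 1, 1], popularity)
for key, value in results:
    print(key, value)
```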
Example 2b: the Dilbert/Darth Vader relationship




                           <(Dilbert, Vader), Integer(1)>,
                           <(Dilbert, Vader), Integer(1)>,
                           <(Dilbert, Vader), Integer(1)>


                                                Identity map


                           <(Dilbert, Vader), {1, 1, 1}>



                                                Reduce: count and
                                                divide by popularities


            <Dilbert, (Vader, sqrt(3/5))>, <Vader, (Dilbert, sqrt(3/5))>




3. Sort and store results



                        <file1, (file2, similarity score)>


                                                Identity map



                     <file1, List<(file2, similarity score)>>


                                                Reduce


                          <file1, {top n similar files}>




                  Store the results in your location of choice


Example 3: Sorting the results for Dilbert


                               <Dilbert, (Annual Report, .63)>,
                               <Dilbert, (Vision Statement, .77)>,
                               <Dilbert, (Disk Usage, .45)>,
                               <Dilbert, (Darth Vader, .77)>


                                                      Identity map


<Dilbert, {(Annual Report, .63), (Vision Statement, .77), (Disk Usage, .45), (Darth Vader, .77)}>


                                                      Reduce


                  <Dilbert, {Darth Vader, Vision Statement}> (Top 2 files)




                                        Store results
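
The final reduce for Dilbert can be sketched as a sort followed by a cut at n. The function name is hypothetical, and breaking ties alphabetically is an assumption chosen so that the output matches the order shown above.

```python
def top_n_similar(scored_files, n):
    """Return the names of the n highest-scoring files."""
    ranked = sorted(scored_files, key=lambda item: (-item[1], item[0]))
    return [name for name, _score in ranked[:n]]


dilbert = [
    ("Annual Report", 0.63),
    ("Vision Statement", 0.77),
    ("Disk Usage", 0.45),
    ("Darth Vader", 0.77),
]
top2 = top_n_similar(dilbert, 2)
print(top2)
```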
Appendix




§  Cosine formula and normalization trick to avoid the
    distributed cache

              \cos\theta_{AB} = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{A}{\|A\|} \cdot \frac{B}{\|B\|}

§  Mahout has CF
§  Asymptotic order of the algorithm is O(M*N^2) in the worst
    case, but is helped by sparsity.
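
One way to read the normalization trick, offered here as an assumption rather than as the presenters' implementation: if every view is first rewritten as a weight 1/||file||, then summing the products of weights per pair yields the cosine directly, and the pair reducer never has to look up popularities. A small in-memory sketch on the five-file example:

```python
from itertools import combinations
from math import sqrt

views = {
    "Miranda": ["Annual Report", "Vision Statement", "Dilbert"],
    "Bob": ["Annual Report", "Vision Statement", "Dilbert"],
    "Susan": ["Vision Statement", "Dilbert", "Darth Vader"],
    "Chun": ["Dilbert", "Darth Vader"],
    "Alice": ["Dilbert", "Darth Vader", "Disk Usage"],
}

# Pass 1: popularity of each file, i.e. the squared norm of its 0/1 view vector.
popularity = {}
for files in views.values():
    for f in files:
        popularity[f] = popularity.get(f, 0) + 1

# Pass 2: rewrite each view as the normalized weight 1/||file||.
weighted = {
    user: {f: 1 / sqrt(popularity[f]) for f in files}
    for user, files in views.items()
}

# Pass 3: sum the products of weights per pair; the sum is already the cosine.
cosine = {}
for weights in weighted.values():
    for a, b in combinations(sorted(weights), 2):
        cosine[(a, b)] = cosine.get((a, b), 0.0) + weights[a] * weights[b]

print(round(cosine[("Darth Vader", "Dilbert")], 2))
```

This is the second form of the cosine formula above: normalize A and B first, then take the dot product.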




Summary




          Hadoop                                       Cloud Data




    Hadoop + Force.com =                        Recommendation algorithms




@forcedotcom / #forcewebinar


Developer Force Group


facebook.com/forcedotcom


Developer Force – Force.com
Community

Upcoming Events

§  June 26 – Mobile CodeTalk
   –  http://bit.ly/mct-wr


§  June 27 – Painless Mobile App
    Development
   –  http://bit.ly/mobileapp-hp




                             http://bit.ly/mdc-hp
Q&A
                     http://bit.ly/hadoopsurvey

Narayan Bharadwaj    Jed Crosby            Prashant Kommireddi   Santosh Rau
@nadubharadwaj       @JedCrosby            @pRaShAnT1784         @santoshrau

                              @SalesforceEng

More Related Content

What's hot

20191126 AWS Black Belt Online Seminar Amazon AppStream 2.0
20191126 AWS Black Belt Online Seminar Amazon AppStream 2.020191126 AWS Black Belt Online Seminar Amazon AppStream 2.0
20191126 AWS Black Belt Online Seminar Amazon AppStream 2.0Amazon Web Services Japan
 
"Kong Summit, Japan 2022" パートナーセッション:Kong on AWS で実現するスケーラブルな API 基盤の構築
"Kong Summit, Japan 2022" パートナーセッション:Kong on AWS で実現するスケーラブルな API 基盤の構築"Kong Summit, Japan 2022" パートナーセッション:Kong on AWS で実現するスケーラブルな API 基盤の構築
"Kong Summit, Japan 2022" パートナーセッション:Kong on AWS で実現するスケーラブルな API 基盤の構築Junji Nishihara
 
Leading Practices in Multi-Pillar Oracle Cloud Implementations
Leading Practices in Multi-Pillar Oracle Cloud ImplementationsLeading Practices in Multi-Pillar Oracle Cloud Implementations
Leading Practices in Multi-Pillar Oracle Cloud ImplementationsAlithya
 
10 features to check out in your subscription management solution
10 features to check out in your subscription management solution10 features to check out in your subscription management solution
10 features to check out in your subscription management solutionTechcello
 
Creating a Context-Aware solution, Complex Event Processing with FIWARE Perseo
Creating a Context-Aware solution, Complex Event Processing with FIWARE PerseoCreating a Context-Aware solution, Complex Event Processing with FIWARE Perseo
Creating a Context-Aware solution, Complex Event Processing with FIWARE PerseoFernando Lopez Aguilar
 
Amazon SageMaker 紹介 & ハンズオン(2018/07/03 実施)
Amazon SageMaker 紹介 & ハンズオン(2018/07/03 実施)Amazon SageMaker 紹介 & ハンズオン(2018/07/03 実施)
Amazon SageMaker 紹介 & ハンズオン(2018/07/03 実施)Amazon Web Services Japan
 
Hosting And Co Location
Hosting And Co LocationHosting And Co Location
Hosting And Co Locationmcini
 
SAP BI Requirements Gathering Process
SAP BI Requirements Gathering ProcessSAP BI Requirements Gathering Process
SAP BI Requirements Gathering Processsilvaft
 
ERP Implementation cycle
ERP Implementation cycleERP Implementation cycle
ERP Implementation cycleMantavya Gajjar
 
AWS Black Belt Techシリーズ Amazon Redshift
AWS Black Belt Techシリーズ  Amazon RedshiftAWS Black Belt Techシリーズ  Amazon Redshift
AWS Black Belt Techシリーズ Amazon RedshiftAmazon Web Services Japan
 
Solr Query Parsing
Solr Query ParsingSolr Query Parsing
Solr Query ParsingErik Hatcher
 
オンプレミスRDBMSをAWSへ移行する手法
オンプレミスRDBMSをAWSへ移行する手法オンプレミスRDBMSをAWSへ移行する手法
オンプレミスRDBMSをAWSへ移行する手法Amazon Web Services Japan
 
20211203 AWS Black Belt Online Seminar AWS re:Invent 2021アップデート速報
20211203 AWS Black Belt Online Seminar AWS re:Invent 2021アップデート速報20211203 AWS Black Belt Online Seminar AWS re:Invent 2021アップデート速報
20211203 AWS Black Belt Online Seminar AWS re:Invent 2021アップデート速報Amazon Web Services Japan
 
マイクロサービス時代の認証と認可 - AWS Dev Day Tokyo 2018 #AWSDevDay
マイクロサービス時代の認証と認可 - AWS Dev Day Tokyo 2018 #AWSDevDayマイクロサービス時代の認証と認可 - AWS Dev Day Tokyo 2018 #AWSDevDay
マイクロサービス時代の認証と認可 - AWS Dev Day Tokyo 2018 #AWSDevDay都元ダイスケ Miyamoto
 
[AKIBA.AWS] VGWのルーティング仕様
[AKIBA.AWS] VGWのルーティング仕様[AKIBA.AWS] VGWのルーティング仕様
[AKIBA.AWS] VGWのルーティング仕様Shuji Kikuchi
 

What's hot (20)

20191126 AWS Black Belt Online Seminar Amazon AppStream 2.0
20191126 AWS Black Belt Online Seminar Amazon AppStream 2.020191126 AWS Black Belt Online Seminar Amazon AppStream 2.0
20191126 AWS Black Belt Online Seminar Amazon AppStream 2.0
 
"Kong Summit, Japan 2022" パートナーセッション:Kong on AWS で実現するスケーラブルな API 基盤の構築
"Kong Summit, Japan 2022" パートナーセッション:Kong on AWS で実現するスケーラブルな API 基盤の構築"Kong Summit, Japan 2022" パートナーセッション:Kong on AWS で実現するスケーラブルな API 基盤の構築
"Kong Summit, Japan 2022" パートナーセッション:Kong on AWS で実現するスケーラブルな API 基盤の構築
 
AWS Black Belt - AWS Glue
AWS Black Belt - AWS GlueAWS Black Belt - AWS Glue
AWS Black Belt - AWS Glue
 
Leading Practices in Multi-Pillar Oracle Cloud Implementations
Leading Practices in Multi-Pillar Oracle Cloud ImplementationsLeading Practices in Multi-Pillar Oracle Cloud Implementations
Leading Practices in Multi-Pillar Oracle Cloud Implementations
 
10 features to check out in your subscription management solution
10 features to check out in your subscription management solution10 features to check out in your subscription management solution
10 features to check out in your subscription management solution
 
IBM Maximo Asset Management
IBM Maximo Asset ManagementIBM Maximo Asset Management
IBM Maximo Asset Management
 
Creating a Context-Aware solution, Complex Event Processing with FIWARE Perseo
Creating a Context-Aware solution, Complex Event Processing with FIWARE PerseoCreating a Context-Aware solution, Complex Event Processing with FIWARE Perseo
Creating a Context-Aware solution, Complex Event Processing with FIWARE Perseo
 
Amazon SageMaker 紹介 & ハンズオン(2018/07/03 実施)
Amazon SageMaker 紹介 & ハンズオン(2018/07/03 実施)Amazon SageMaker 紹介 & ハンズオン(2018/07/03 実施)
Amazon SageMaker 紹介 & ハンズオン(2018/07/03 実施)
 
Hosting And Co Location
Hosting And Co LocationHosting And Co Location
Hosting And Co Location
 
SAP BI Requirements Gathering Process
SAP BI Requirements Gathering ProcessSAP BI Requirements Gathering Process
SAP BI Requirements Gathering Process
 
AtoM, Authenticity, and the Chain of Custody
AtoM, Authenticity, and the Chain of CustodyAtoM, Authenticity, and the Chain of Custody
AtoM, Authenticity, and the Chain of Custody
 
ERP Implementation cycle
ERP Implementation cycleERP Implementation cycle
ERP Implementation cycle
 
AWS Black Belt Techシリーズ Amazon Redshift
AWS Black Belt Techシリーズ  Amazon RedshiftAWS Black Belt Techシリーズ  Amazon Redshift
AWS Black Belt Techシリーズ Amazon Redshift
 
Solr Query Parsing
Solr Query ParsingSolr Query Parsing
Solr Query Parsing
 
Togaf 9 template requirements impact assessment
Togaf 9 template   requirements impact assessmentTogaf 9 template   requirements impact assessment
Togaf 9 template requirements impact assessment
 
oSC22ww4.pdf
oSC22ww4.pdfoSC22ww4.pdf
oSC22ww4.pdf
 
オンプレミスRDBMSをAWSへ移行する手法
オンプレミスRDBMSをAWSへ移行する手法オンプレミスRDBMSをAWSへ移行する手法
オンプレミスRDBMSをAWSへ移行する手法
 
20211203 AWS Black Belt Online Seminar AWS re:Invent 2021アップデート速報
20211203 AWS Black Belt Online Seminar AWS re:Invent 2021アップデート速報20211203 AWS Black Belt Online Seminar AWS re:Invent 2021アップデート速報
20211203 AWS Black Belt Online Seminar AWS re:Invent 2021アップデート速報
 
マイクロサービス時代の認証と認可 - AWS Dev Day Tokyo 2018 #AWSDevDay
マイクロサービス時代の認証と認可 - AWS Dev Day Tokyo 2018 #AWSDevDayマイクロサービス時代の認証と認可 - AWS Dev Day Tokyo 2018 #AWSDevDay
マイクロサービス時代の認証と認可 - AWS Dev Day Tokyo 2018 #AWSDevDay
 
[AKIBA.AWS] VGWのルーティング仕様
[AKIBA.AWS] VGWのルーティング仕様[AKIBA.AWS] VGWのルーティング仕様
[AKIBA.AWS] VGWのルーティング仕様
 

How Salesforce.com uses Hadoop

  • 1. How Salesforce.com uses Hadoop Narayan Bharadwaj Data Science @nadubharadwaj Jed Crosby Data Science @JedCrosby #forcewebinar Follow us @forcedotcom
  • 2. Safe Harbor Safe harbor statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services. The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with completed and any possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. 
is included in our annual report on Form 10-K for the most recent fiscal year ended January 31, 2011 and in our quarterly report on Form 10-Q for the most recent fiscal quarter ended October 31, 2011. These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site. Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
  • 3. Agenda
      - Hadoop use cases
      - Use case 1: Product Metrics*
      - Technology
      - Use case 2: Collaborative Filtering*
      - Q&A
      *Every time you see the elephant, we will attempt to explain a Hadoop-related concept.
  • 4. Got “Cloud Data”?
      - 130k customers
      - 780 million transactions/day
      - Millions of users
      - Terabytes/day
  • 5. Hadoop Overview
      - Started by Doug Cutting at Yahoo!
      - Based on two Google papers
          - Google File System (GFS): http://research.google.com/archive/gfs.html
          - Google MapReduce: http://research.google.com/archive/mapreduce.html
      - Hadoop is an open source Apache project
          - Hadoop Distributed File System (HDFS)
          - Distributed Processing Framework (MapReduce)
      - Several related projects
          - HBase, Hive, Pig, Flume, ZooKeeper, Mahout, Oozie, HCatalog
  • 6. Hadoop use cases
      - User behavior analysis
      - Product Metrics
      - Capacity planning
      - Monitoring intelligence
      - Performance analysis
      - Security
      - Ad-hoc log searches
      - Collaborative Filtering
      - Search Relevancy
  • 8. Product Metrics – Problem Statement
      - Track feature usage/adoption across 130k+ customers
          - E.g., Accounts, Contacts, Visualforce, Apex, …
      - Track standard metrics across all features
          - E.g., #Requests, #UniqueOrgs, #UniqueUsers, AvgResponseTime, …
      - Track features and metrics across all channels
          - API, UI, Mobile
      - Primary audience: Executives, Product Managers
  • 9. Data Pipeline
      (Diagram labels)
      - Feature (What?)
      - Feature Metadata (Instrumentation)
      - Crunch it (How?)
      - Storage & Processing
      - Daily Summary (Output)
      - Fancy UI (Visualize)
      - Collaborate & Iterate
  • 10. Product Metrics Pipeline
      (Diagram labels)
      - User Input (Page Layout)
      - Collaboration (Chatter)
      - Reports, Dashboards
      - Formula Fields
      - Workflow
      - Feature Metrics (Custom Object)
      - Trend Metrics (Custom Object)
      - API (two connections)
      - Client Machine
      - Java Program (Pig script generator)
      - Workflow
      - Log Pull
      - Hadoop
      - Log Files
  • 11. Feature Metrics (Custom Object)
      Id     Feature Name    PM     Instrumentation  Metric1    Metric2    Metric3     Metric4  Status
      F0001  Accounts        John   /001             #requests  #UniqOrgs  #UniqUsers  AvgRT    Dev
      F0002  Contacts        Nancy  /003             #requests  #UniqOrgs  #UniqUsers  AvgRT    Review
      F0003  API             Eric   A                #requests  #UniqOrgs  #UniqUsers  AvgRT    Deployed
      F0004  Visualforce     Roger  V                #requests  #UniqOrgs  #UniqUsers  AvgRT    Decom
      F0005  Apex            Kim    axapx            #requests  #UniqOrgs  #UniqUsers  AvgRT    Deployed
      F0006  Custom Objects  Chun   /aXX             #requests  #UniqOrgs  #UniqUsers  AvgRT    Deployed
      F0008  Chatter         Jed    chcmd            #requests  #UniqOrgs  #UniqUsers  AvgRT    Deployed
      F0009  Reports         Steve  R                #requests  #UniqOrgs  #UniqUsers  AvgRT    Deployed
  • 12. Feature Metrics (Custom Object)
  • 13. User Input (Page Layout)
      (Screenshot labels: Formula Field, Workflow Rule)
  • 14. User Input (Child Custom Object)
      (Screenshot label: Child Objects)
  • 16. Basic Pig script construct
      -- Define UDFs
      DEFINE GFV GetFieldValue('/path/to/udf/file');
      -- Load data
      A = LOAD '/path/to/cloud/data/log/files' USING PigStorage();
      -- Filter data
      B = FILTER A BY GFV(*, 'logRecordType') == 'U';
      -- Extract fields
      C = FOREACH B GENERATE GFV(*, 'orgId') AS orgId, GFV(*, 'userId') AS userId ……
      -- Group
      G = GROUP C BY ……
      -- Compute output metrics
      O = FOREACH G {
          orgs = C.orgId;
          uniqueOrgs = DISTINCT orgs;
          GENERATE group, COUNT(uniqueOrgs);
      }
      -- Store or dump results
      STORE O INTO '/path/to/user/output';
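The filter, extract, and aggregate steps of the Pig script can be illustrated outside Hadoop with a minimal single-machine Python sketch. The record layout, the sample values, and the `responseTimeMs` field are assumptions made for illustration; only `logRecordType`, `orgId`, and `userId` appear on the slide, and the real pipeline runs as Pig on Hadoop rather than in Python.

```python
# Hypothetical parsed log records. Field names follow the slide where
# possible; responseTimeMs and all values are invented for illustration.
LOG_RECORDS = [
    {"logRecordType": "U", "orgId": "org1", "userId": "u1", "responseTimeMs": 120},
    {"logRecordType": "U", "orgId": "org1", "userId": "u2", "responseTimeMs": 80},
    {"logRecordType": "U", "orgId": "org2", "userId": "u3", "responseTimeMs": 100},
    {"logRecordType": "A", "orgId": "org3", "userId": "u4", "responseTimeMs": 50},
]


def compute_metrics(records, record_type):
    """Filter by record type, then compute the four standard metrics."""
    # Equivalent of the FILTER step.
    selected = [r for r in records if r["logRecordType"] == record_type]
    # Equivalent of DISTINCT on the extracted fields.
    orgs = {r["orgId"] for r in selected}
    users = {r["userId"] for r in selected}
    total_rt = sum(r["responseTimeMs"] for r in selected)
    return {
        "requests": len(selected),
        "unique_orgs": len(orgs),
        "unique_users": len(users),
        "avg_response_time": total_rt / len(selected) if selected else None,
    }


metrics = compute_metrics(LOG_RECORDS, "U")
print(metrics)
```

The four keys correspond to the #Requests, #UniqueOrgs, #UniqueUsers, and AvgResponseTime metrics named in the problem statement.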
  • 17. Java Pig Script Generator (Client)
  • 18. Trend Metrics (Custom Object)
      Id     Date        #Requests  #Unique Orgs  #Unique Users  Avg ResponseTime
      F0001  06/01/2012  <big>      <big>         <big>          <little>
      F0002  06/01/2012  <big>      <big>         <big>          <little>
      F0003  06/01/2012  <big>      <big>         <big>          <little>
      F0001  06/02/2012  <big>      <big>         <big>          <little>
      F0002  06/02/2012  <big>      <big>         <big>          <little>
      F0003  06/03/2012  <big>      <big>         <big>          <little>
  • 19. Upload to Trend Metrics (Custom Object)
  • 20. Visualization (Reports & Dashboards)
  • 21. Visualization (Reports & Dashboards)
  • 22. Collaborate, Iterate (Chatter)
  • 23. Recap
      (Diagram labels)
      - User Input (Page Layout)
      - Collaboration (Chatter)
      - Reports, Dashboards
      - Formula Fields
      - Workflow
      - Feature Metrics (Custom Object)
      - Trend Metrics (Custom Object)
      - API (two connections)
      - Client Machine
      - Java Program (Pig script generator)
      - Workflow
      - Log Pull
      - Hadoop
      - Log Files
  • 25. Hadoop ecosystem (Apache Hadoop, Version=0.20.2)
  • 26. Contributions: Prashant Kommireddi (@pRaShAnT1784), Lars Hofhansl, Ian Varley (@thefutureian)
  • 27. Data Science tools ecosystem (Apache Pig, Version=0.9.1)
  • 29. Collaborative Filtering – Problem Statement
      - Show similar files within an organization
          - Content-based approach
          - Community-based approach
  • 30. Popular File
  • 31. Related File
  • 32. We found this relationship using item-to-item collaborative filtering
      - Amazon published this algorithm in 2003.
          - Amazon.com Recommendations: Item-to-Item Collaborative Filtering, by Gregory Linden, Brent Smith, and Jeremy York. IEEE Internet Computing, January-February 2003.
      - At Salesforce, we adapted this algorithm for Hadoop, and we use it to recommend files to view and users to follow.
  • 33. Example: CF on 5 files: Vision Statement, Annual Report, Dilbert Comic, Darth Vader Cartoon, Disk Usage Report
  • 34. View History Table
                       Annual Report  Vision Statement  Dilbert Cartoon  Darth Vader Cartoon  Disk Usage Report
      Miranda (CEO)    1              1                 1                0                    0
      Bob (CFO)        1              1                 1                0                    0
      Susan (Sales)    0              1                 1                1                    0
      Chun (Sales)     0              0                 1                1                    0
      Alice (IT)       0              0                 1                1                    1
  • 35. Relationships between the files (graph with one node per file: Annual Report, Vision Statement, Darth Vader Cartoon, Dilbert Cartoon, Disk Usage Report)
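  • 36. Relationships between the files (graph edges labeled with the number of users who viewed both files)
      Annual Report / Vision Statement          2
      Annual Report / Dilbert Cartoon           2
      Annual Report / Darth Vader Cartoon       0
      Annual Report / Disk Usage Report         0
      Vision Statement / Dilbert Cartoon        3
      Vision Statement / Darth Vader Cartoon    1
      Vision Statement / Disk Usage Report      0
      Dilbert Cartoon / Darth Vader Cartoon     3
      Dilbert Cartoon / Disk Usage Report       1
      Darth Vader Cartoon / Disk Usage Report   1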
  • 37. Sorted relationships for each file
      Annual Report     Vision Statement  Dilbert Cartoon   Darth Vader Cartoon  Disk Usage Report
      Dilbert (2)       Dilbert (3)       Vision Stmt. (3)  Dilbert (3)          Dilbert (1)
      Vision Stmt. (2)  Annual Rpt. (2)   Darth Vader (3)   Vision Stmt. (1)     Darth Vader (1)
                        Darth Vader (1)   Annual Rpt. (2)   Disk Usage (1)
                                          Disk Usage (1)
      The popularity problem: notice that Dilbert appears first in every other file's list. This is probably not what we want.
      The solution: divide the relationship tallies by file popularities.
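  • 38. Normalized relationships between the files (graph edges labeled with tallies divided by file popularities)
      Annual Report / Vision Statement          .82
      Annual Report / Dilbert Cartoon           .63
      Annual Report / Darth Vader Cartoon       0
      Annual Report / Disk Usage Report         0
      Vision Statement / Dilbert Cartoon        .77
      Vision Statement / Darth Vader Cartoon    .33
      Vision Statement / Disk Usage Report      0
      Dilbert Cartoon / Darth Vader Cartoon     .77
      Dilbert Cartoon / Disk Usage Report       .45
      Darth Vader Cartoon / Disk Usage Report   .58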
  • 39. Sorted relationships for each file, normalized by file popularities
      Annual Report       Vision Statement     Dilbert Cartoon      Darth Vader Cartoon  Disk Usage Report
      Vision Stmt. (.82)  Annual Report (.82)  Darth Vader (.77)    Dilbert (.77)        Darth Vader (.58)
      Dilbert (.63)       Dilbert (.77)        Vision Stmt. (.77)   Disk Usage (.58)     Dilbert (.45)
                          Darth Vader (.33)    Annual Report (.63)  Vision Stmt. (.33)
                                               Disk Usage (.45)
      High relationship tallies AND similar popularity values now drive closeness.
  • 40. The item-to-item CF algorithm
      1) Compute file popularities
      2) Compute relationship tallies and divide by file popularities
      3) Sort and store the results
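Before looking at the MapReduce version, the three steps can be sketched in memory on the five-file example. This is a minimal illustration, not the presenters' Hadoop code. It assumes that "divide by file popularities" means dividing each tally by the square root of the product of the two popularities, which reproduces the normalized scores in the slides; the function name `item_to_item` and the alphabetical tie-break are assumptions.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# View history from the example: user -> set of files viewed.
views = {
    "Miranda": {"Annual Report", "Vision Statement", "Dilbert"},
    "Bob": {"Annual Report", "Vision Statement", "Dilbert"},
    "Susan": {"Vision Statement", "Dilbert", "Darth Vader"},
    "Chun": {"Dilbert", "Darth Vader"},
    "Alice": {"Dilbert", "Darth Vader", "Disk Usage"},
}

def item_to_item(views, top_n=2):
    # 1) File popularity = number of users who viewed the file.
    popularity = Counter(f for files in views.values() for f in files)

    # 2) Tally co-views per file pair (alphabetical order avoids double
    #    counting), then normalize by the popularities of both files.
    tallies = Counter()
    for files in views.values():
        for pair in combinations(sorted(files), 2):
            tallies[pair] += 1
    related = {f: [] for f in popularity}
    for (a, b), n in tallies.items():
        score = n / sqrt(popularity[a] * popularity[b])
        related[a].append((b, score))
        related[b].append((a, score))

    # 3) Sort by descending score (ties broken by name) and keep the top n.
    return {f: sorted(r, key=lambda p: (-p[1], p[0]))[:top_n]
            for f, r in related.items()}

result = item_to_item(views)
for name, score in result["Dilbert"]:
    print(f"{name}: {score:.2f}")
```

On this data the top two files for Dilbert are Darth Vader and Vision Statement, both at about .77, matching the sorted, normalized table in the slides.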
  • 41. MapReduce Overview: Map, Shuffle, Reduce (adapted from http://code.google.com/p/mapreduce-framework/wiki/MapReduce)
  • 42. 1. Compute File Popularities
      <user, file>
        -> Inverse identity map -> <file, List<user>>
        -> Reduce -> <file, (user count)>
      Result is a table of (file, popularity) pairs that you store in the Hadoop distributed cache.
  • 43. Example: File popularity for Dilbert
      (Miranda, Dilbert), (Bob, Dilbert), (Susan, Dilbert), (Chun, Dilbert), (Alice, Dilbert)
        -> Inverse identity map -> <Dilbert, {Miranda, Bob, Susan, Chun, Alice}>
        -> Reduce -> (Dilbert, 5)
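The popularity job can be simulated locally to make the map, shuffle, and reduce roles concrete. This is a plain-Python sketch, not Hadoop API code; the helper names `inverse_identity_map`, `shuffle`, and `popularity_reduce` are hypothetical.

```python
from collections import defaultdict

# (user, file) view records taken from the view history table.
view_pairs = [
    ("Miranda", "Annual Report"), ("Miranda", "Vision Statement"), ("Miranda", "Dilbert"),
    ("Bob", "Annual Report"), ("Bob", "Vision Statement"), ("Bob", "Dilbert"),
    ("Susan", "Vision Statement"), ("Susan", "Dilbert"), ("Susan", "Darth Vader"),
    ("Chun", "Dilbert"), ("Chun", "Darth Vader"),
    ("Alice", "Dilbert"), ("Alice", "Darth Vader"), ("Alice", "Disk Usage"),
]

def inverse_identity_map(user, file):
    """Swap key and value so the shuffle groups users by file."""
    yield file, user

def shuffle(mapped):
    """Stand-in for the framework's shuffle: group values by key."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups

def popularity_reduce(file, users):
    """Count the distinct users who viewed the file."""
    return file, len(set(users))

mapped = (kv for user, file in view_pairs for kv in inverse_identity_map(user, file))
popularity = dict(popularity_reduce(f, users) for f, users in shuffle(mapped).items())
```

The resulting `popularity` dictionary plays the role of the (file, popularity) table that the slides store in the distributed cache.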
  • 44. 2a. Compute relationship tallies: find all relationships in the view history table
      <user, file>
        -> Identity map -> <user, List<file>>
        -> Reduce -> <(file1, file2), Integer(1)>, <(file1, file3), Integer(1)>, … <(file(n-1), file(n)), Integer(1)>
      Relationships have their file IDs in alphabetical order to avoid double counting.
  • 45. Example 2a: Miranda's (CEO) file relationship votes
      (Miranda, Annual Report), (Miranda, Vision Statement), (Miranda, Dilbert)
        -> Identity map -> <Miranda, {Annual Report, Vision Statement, Dilbert}>
        -> Reduce -> <(Annual Report, Dilbert), Integer(1)>, <(Annual Report, Vision Statement), Integer(1)>, <(Dilbert, Vision Statement), Integer(1)>
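The reducer in step 2a amounts to emitting every unordered pair from one user's file list. A minimal sketch follows, with the hypothetical function name `relationship_votes`; sorting the files first is one simple way to get the alphabetical ordering the slides call for.

```python
from itertools import combinations

def relationship_votes(user, files):
    """Emit one vote per pair of files viewed by the same user.

    Sorting before pairing puts every pair in alphabetical order, so
    (A, B) and (B, A) can never be counted as different relationships.
    The user key is not needed once the files are grouped.
    """
    for pair in combinations(sorted(set(files)), 2):
        yield pair, 1

votes = list(relationship_votes(
    "Miranda", ["Annual Report", "Vision Statement", "Dilbert"]))
```

A user with n viewed files produces n(n-1)/2 votes, which is where the quadratic worst case mentioned in the appendix comes from.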
  • 46. 2b. Tally the relationship votes: just a word count, where each relationship occurrence is a word
      <(file1, file2), Integer(1)>
        -> Identity map -> <(file1, file2), List<Integer(1)>>
        -> Reduce: count and divide by popularities -> <file1, (file2, similarity score)>, <file2, (file1, similarity score)>
      Note that we emit each result twice, once for each file that belongs to a relationship.
  • 47. Example 2b: the Dilbert/Darth Vader relationship
      <(Dilbert, Vader), Integer(1)>, <(Dilbert, Vader), Integer(1)>, <(Dilbert, Vader), Integer(1)>
        -> Identity map -> <(Dilbert, Vader), {1, 1, 1}>
        -> Reduce: count and divide by popularities -> <Dilbert, (Vader, sqrt(3/5))>, <Vader, (Dilbert, sqrt(3/5))>
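The step 2b reducer can be sketched as a count followed by a normalization. The sketch assumes the score is the tally divided by the square root of the product of the two popularities, which gives 3 / sqrt(5 × 3) = sqrt(3/5) for this pair, as in the slide; the function name `tally_reduce` and the in-memory `popularity` lookup (standing in for the distributed cache) are assumptions.

```python
from math import sqrt

# Popularities from step 1, standing in for the distributed cache lookup.
popularity = {"Dilbert": 5, "Darth Vader": 3}

def tally_reduce(pair, ones, popularity):
    """Sum the votes for a pair, normalize, and emit once per file."""
    file1, file2 = pair
    count = sum(ones)
    score = count / sqrt(popularity[file1] * popularity[file2])
    yield file1, (file2, score)
    yield file2, (file1, score)

results = list(tally_reduce(("Darth Vader", "Dilbert"), [1, 1, 1], popularity))
```

Emitting the score under both file keys lets the next job group all candidates for a file without a second pass over the pairs.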
  • 48. 3. Sort and store results
      <file1, (file2, similarity score)>
        -> Identity map -> <file1, List<(file2, similarity score)>>
        -> Reduce -> <file1, {top n similar files}>
      Store the results in your location of choice.
  • 49. Example 3: Sorting the results for Dilbert
      <Dilbert, (Annual Report, .63)>, <Dilbert, (Vision Statement, .77)>, <Dilbert, (Disk Usage, .45)>, <Dilbert, (Darth Vader, .77)>
        -> Identity map -> <Dilbert, {(Annual Report, .63), (Vision Statement, .77), (Disk Usage, .45), (Darth Vader, .77)}>
        -> Reduce -> <Dilbert, {Darth Vader, Vision Statement}> (Top 2 files)
        -> Store results
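The final reducer only has to rank the scored candidates for one file and keep the best n. A minimal sketch, with the hypothetical name `top_n_reduce`; breaking ties alphabetically is an assumption, since the slides do not say how equal scores are ordered.

```python
def top_n_reduce(file, scored, n=2):
    """Keep the n highest-scoring related files for one file.

    Sort by descending score; equal scores fall back to name order
    so the output is deterministic.
    """
    ranked = sorted(scored, key=lambda pair: (-pair[1], pair[0]))
    return file, [name for name, _ in ranked[:n]]

dilbert_top = top_n_reduce("Dilbert", [
    ("Annual Report", 0.63),
    ("Vision Statement", 0.77),
    ("Disk Usage", 0.45),
    ("Darth Vader", 0.77),
])
```

With the Dilbert inputs from the slide this keeps Darth Vader and Vision Statement.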
  • 50. Appendix
      - Cosine formula and normalization trick to avoid the distributed cache:
          $\cos\theta_{AB} = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{A}{\|A\|} \cdot \frac{B}{\|B\|}$
      - Mahout has CF
      - Asymptotic order of the algorithm is O(M*N^2) in the worst case, but is helped by sparsity.
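The cosine identity in the appendix can be checked numerically on the Dilbert and Darth Vader columns of the view history table. The slide does not spell out the trick itself; one plausible reading, taken as an assumption here, is that each file's view vector is scaled to unit length first, so the later dot product needs no popularity lookup. The helper names are hypothetical.

```python
from math import sqrt

# Columns of the view history table (Miranda, Bob, Susan, Chun, Alice).
dilbert = [1, 1, 1, 1, 1]
vader = [0, 0, 1, 1, 1]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return sqrt(dot(a, a))

def normalize(a):
    n = norm(a)
    return [x / n for x in a]

# Left-hand form: dot product divided by the product of the norms.
cos_direct = dot(dilbert, vader) / (norm(dilbert) * norm(vader))

# Right-hand form: normalize each vector first, then take the dot product.
cos_prenormalized = dot(normalize(dilbert), normalize(vader))
```

For 0/1 view vectors the dot product is the co-view tally and each squared norm is the file's popularity, so both forms equal 3 / sqrt(5 × 3) = sqrt(3/5), the score in Example 2b.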
  • 51. Summary Hadoop Cloud Data Hadoop + Force.com = Recommendation algorithms
  • 52. @forcedotcom / #forcewebinar Developer Force Group facebook.com/forcedotcom Developer Force – Force.com Community
  • 53. Upcoming Events
      - June 26 – Mobile CodeTalk
          - http://bit.ly/mct-wr
      - June 27 – Painless Mobile App Development
          - http://bit.ly/mobileapp-hp
      http://bit.ly/mdc-hp
  • 54. Q&A http://bit.ly/hadoopsurvey
      Narayan Bharadwaj (@nadubharadwaj), Jed Crosby (@JedCrosby), Prashant Kommireddi (@pRaShAnT1784), Santosh Rau (@santoshrau), @SalesforceEng