How Salesforce.com uses Hadoop


  Narayan Bharadwaj
  Data Science
      @nadubharadwaj

  Jed Crosby
  Data Science
      @JedCrosby

  #forcewebinar
                   Follow us @forcedotcom
Safe Harbor
  Safe harbor statement under the Private Securities Litigation Reform Act of 1995:

  This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such
  uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ
  materially from the results expressed or implied by the forward-looking statements we make. All statements other than
  statements of historical fact could be deemed forward-looking, including any projections of product or service availability,
  subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of
  management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or
  technology developments and customer contracts or use of our services.

  The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and
  delivering new functionality for our service, new products and services, our new business model, our past operating losses,
  possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our
  security measures, the outcome of any litigation, risks associated with completed and any possible mergers and
  acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain,
  and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our
  limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further
  information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report
  on Form 10-K for the most recent fiscal year ended January 31, 2011 and in our quarterly report on Form 10-Q for the most
  recent fiscal quarter ended October 31, 2011. These documents and others containing important disclosures are available
  on the SEC Filings section of the Investor Information section of our Web site.

  Any unreleased services or features referenced in this or other presentations, press releases or public statements are not
  currently available and may not be delivered on time or at all. Customers who purchase our services should make the
  purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does
  not intend to update these forward-looking statements.




Agenda

 §  Hadoop use cases
 §  Use case 1 - Product Metrics*
 §  Technology
 §  Use case 2 - Collaborative Filtering*
 §  Q&A




             *Every time you see the elephant, we will attempt to
             explain a Hadoop-related concept.


Got “Cloud Data”?




              130k customers      780 million transactions/day
              Millions of users   Terabytes/day




Hadoop Overview

 §  Started by Doug Cutting and Mike Cafarella; developed further at Yahoo!
 §  Based on two Google papers
     –  Google File System (GFS): http://research.google.com/archive/gfs.html
     –  Google MapReduce: http://research.google.com/archive/mapreduce.html


 §  Hadoop is an open source Apache project
     –  Hadoop Distributed File System (HDFS)
     –  Distributed Processing Framework (MapReduce)


 §  Several related projects
     –  HBase, Hive, Pig, Flume, ZooKeeper, Mahout, Oozie, HCatalog




Hadoop use cases


 §  Product Metrics
 §  User behavior analysis
 §  Capacity planning
 §  Monitoring intelligence
 §  Performance analysis
 §  Security
 §  Ad-hoc log searches
 §  Collaborative Filtering
 §  Search Relevancy



Product Metrics
Product Metrics – Problem Statement



 §  Track feature usage/adoption across 130k+ customers
    –  E.g., Accounts, Contacts, Visualforce, Apex,…


 §  Track standard metrics across all features
    –  E.g., #Requests, #UniqueOrgs, #UniqueUsers,
       AvgResponseTime,…


 §  Track features and metrics across all channels
    –  API, UI, Mobile


 §  Primary audience: Executives, Product Managers

Data Pipeline

 [Diagram] Stages shown:
  §  Feature (What?)
  §  Feature Metadata (Instrumentation)
  §  Storage & Processing
  §  Crunch it (How?)
  §  Daily Summary (Output)
  §  Fancy UI (Visualize)
  §  Collaborate & Iterate




Product Metrics Pipeline

 [Diagram] Components shown:
  §  User Input (Page Layout)
  §  Workflow
  §  Feature Metrics (Custom Object)
  §  Collaboration (Chatter)
  §  Reports, Dashboards
  §  Formula Fields
  §  Trend Metrics (Custom Object)
  §  API (shown twice)
  §  Client Machine: Java Program, Pig script generator
  §  Workflow
  §  Hadoop
  §  Log Pull
  §  Log Files




Feature Metrics (Custom Object)


Id      Feature Name     PM      Instrumentation     Metric1      Metric2     Metric3      Metric4   Status

F0001   Accounts         John    /001                #requests    #UniqOrgs   #UniqUsers   AvgRT     Dev
F0002   Contacts         Nancy   /003                #requests    #UniqOrgs   #UniqUsers   AvgRT     Review
F0003   API              Eric    A                   #requests    #UniqOrgs   #UniqUsers   AvgRT     Deployed
F0004   Visualforce      Roger   V                   #requests    #UniqOrgs   #UniqUsers   AvgRT     Decom
F0005   Apex             Kim     axapx               #requests    #UniqOrgs   #UniqUsers   AvgRT     Deployed
F0006   Custom Objects   Chun    /aXX                #requests    #UniqOrgs   #UniqUsers   AvgRT     Deployed
F0008   Chatter          Jed     chcmd               #requests    #UniqOrgs   #UniqUsers   AvgRT     Deployed
F0009   Reports          Steve   R                   #requests    #UniqOrgs   #UniqUsers   AvgRT     Deployed




User Input (Page Layout)
 [Screenshot callouts: Formula Field, Workflow Rule]




User Input (Child Custom Object)




 [Screenshot callout: Child Objects]




Apache Pig
Basic Pig script construct

  -- Define UDFs
  DEFINE GFV GetFieldValue('/path/to/udf/file');

  -- Load data
  A = LOAD '/path/to/cloud/data/log/files' USING PigStorage();

  -- Filter data
  B = FILTER A BY GFV(*, 'logRecordType') == 'U';

  -- Extract fields (named so they can be referenced after grouping)
  C = FOREACH B GENERATE GFV(*, 'orgId') AS orgId, GFV(*, 'userId') AS userId ……..

  -- Group
  G = GROUP C BY ……

  -- Compute output metrics
  O = FOREACH G {
          orgs = C.orgId;
          uniqueOrgs = DISTINCT orgs;
          -- a nested FOREACH must end with GENERATE
          GENERATE group, COUNT(uniqueOrgs);
      }

  -- Store or dump results
  STORE O INTO '/path/to/user/output';
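
For readers who do not use Pig, the same filter, group, and count-distinct flow can be sketched in plain Python. This is only an illustration: the record layout, the `feature` and `responseTimeMs` fields, and the sample data are assumptions, not the actual log format or the generated script.

```python
from collections import defaultdict

# Hypothetical, already-parsed log records; the real log format is not shown in the slides.
SAMPLE_LOGS = [
    {"logRecordType": "U", "feature": "Accounts", "orgId": "org1", "userId": "u1", "responseTimeMs": 120},
    {"logRecordType": "U", "feature": "Accounts", "orgId": "org1", "userId": "u2", "responseTimeMs": 80},
    {"logRecordType": "U", "feature": "Accounts", "orgId": "org2", "userId": "u3", "responseTimeMs": 100},
    {"logRecordType": "A", "feature": "API", "orgId": "org1", "userId": "u1", "responseTimeMs": 40},
]


def feature_metrics(records, record_type):
    """Filter by record type, group by feature, and compute the four standard metrics."""
    groups = defaultdict(list)
    for rec in records:
        if rec["logRecordType"] == record_type:   # FILTER
            groups[rec["feature"]].append(rec)    # GROUP
    summary = {}
    for feature, recs in groups.items():
        summary[feature] = {
            "requests": len(recs),
            "unique_orgs": len({r["orgId"] for r in recs}),    # DISTINCT
            "unique_users": len({r["userId"] for r in recs}),
            "avg_response_ms": sum(r["responseTimeMs"] for r in recs) / len(recs),
        }
    return summary


METRICS = feature_metrics(SAMPLE_LOGS, "U")
print(METRICS)
```

Each entry of `METRICS` corresponds to one row of the daily summary: one feature with its request count, unique orgs, unique users, and average response time.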



Java Pig Script Generator (Client)




Trend Metrics (Custom Object)



Id      Date         #Requests   #Unique Orgs   #Unique Users   Avg ResponseTime

F0001   06/01/2012   <big>       <big>          <big>           <little>
F0002   06/01/2012   <big>       <big>          <big>           <little>
F0003   06/01/2012   <big>       <big>          <big>           <little>
F0001   06/02/2012   <big>       <big>          <big>           <little>
F0002   06/02/2012   <big>       <big>          <big>           <little>
F0003   06/03/2012   <big>       <big>          <big>           <little>




Upload to Trend Metrics (Custom Object)




Visualization (Reports & Dashboards)




Collaborate, Iterate (Chatter)




Recap

 [Diagram: the Product Metrics Pipeline from earlier in the deck, shown again]




Technology
Hadoop ecosystem




      Apache Hadoop
      Version=0.20.2




Contributions

     @pRaShAnT1784 : Prashant Kommireddi




    Lars Hofhansl                         @thefutureian : Ian Varley




Data Science tools ecosystem




       Apache Pig
       Version=0.9.1




Collaborative Filtering
Collaborative Filtering – Problem Statement




 §  Show similar files within an organization
    –  Content-based approach
    –  Community-based approach




Popular File




Related File




We found this relationship using item-to-item collaborative
filtering




 §  Amazon published this algorithm in 2003.
    –  Amazon.com Recommendations: Item-to-Item Collaborative Filtering,
       by Gregory Linden, Brent Smith, and Jeremy York. IEEE Internet
       Computing, January-February 2003.

 §  At Salesforce, we adapted this algorithm for Hadoop,
     and we use it to recommend files to view and users to
     follow.




Example: CF on 5 files

 [Figure: five files]
  §  Annual Report
  §  Vision Statement
  §  Dilbert Comic
  §  Darth Vader Cartoon
  §  Disk Usage Report




View History Table




                  Annual   Vision      Dilbert   Darth Vader   Disk Usage
                  Report   Statement   Cartoon   Cartoon       Report
 Miranda (CEO)    1        1           1         0             0
 Bob (CFO)        1        1           1         0             0
 Susan (Sales)    0        1           1         1             0
 Chun (Sales)     0        0           1         1             0
 Alice (IT)       0        0           1         1             1




Relationships between the files




 [Graph nodes: Annual Report, Vision Statement, Dilbert Cartoon, Darth Vader Cartoon, Disk Usage Report]



Relationships between the files



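 [Graph] Edge labels (number of users who viewed both files):

   Annual Report       - Vision Statement       2
   Annual Report       - Dilbert Cartoon        2
   Annual Report       - Darth Vader Cartoon    0
   Annual Report       - Disk Usage Report      0
   Vision Statement    - Dilbert Cartoon        3
   Vision Statement    - Darth Vader Cartoon    1
   Vision Statement    - Disk Usage Report      0
   Dilbert Cartoon     - Darth Vader Cartoon    3
   Dilbert Cartoon     - Disk Usage Report      1
   Darth Vader Cartoon - Disk Usage Report      1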



Sorted relationships for each file




Annual Report      Vision Statement   Dilbert Cartoon    Darth Vader Cartoon   Disk Usage Report
Dilbert (2)        Dilbert (3)        Vision Stmt. (3)   Dilbert (3)           Dilbert (1)
Vision Stmt. (2)   Annual Rpt. (2)    Darth Vader (3)    Vision Stmt. (1)      Darth Vader (1)
                   Darth Vader (1)    Annual Rpt. (2)    Disk Usage (1)
                                      Disk Usage (1)



              The popularity problem: notice that Dilbert appears first in every list.
              This is probably not what we want.


              The solution: divide each relationship tally by the square root of the
              product of the two files' popularities (the cosine formula in the Appendix).



Normalized relationships between the files



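 [Graph] Edge labels (normalized similarity):

   Annual Report       - Vision Statement       .82
   Annual Report       - Dilbert Cartoon        .63
   Annual Report       - Darth Vader Cartoon    0
   Annual Report       - Disk Usage Report      0
   Vision Statement    - Dilbert Cartoon        .77
   Vision Statement    - Darth Vader Cartoon    .33
   Vision Statement    - Disk Usage Report      0
   Dilbert Cartoon     - Darth Vader Cartoon    .77
   Dilbert Cartoon     - Disk Usage Report      .45
   Darth Vader Cartoon - Disk Usage Report      .58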



Sorted relationships for each file, normalized by file popularities




Annual Report        Vision Statement      Dilbert Cartoon       Darth Vader Cartoon   Disk Usage Report
Vision Stmt. (.82)   Annual Report (.82)   Darth Vader (.77)     Dilbert (.77)         Darth Vader (.58)
Dilbert (.63)        Dilbert (.77)         Vision Stmt. (.77)    Disk Usage (.58)      Dilbert (.45)
                     Darth Vader (.33)     Annual Report (.63)   Vision Stmt. (.33)
                                           Disk Usage (.45)




          High relationship tallies AND similar popularity values now drive closeness.



The item-to-item CF algorithm




 1)  Compute file popularities
 2)  Compute relationship tallies and divide by file
     popularities
 3)  Sort and store the results




MapReduce Overview
    Map                        Shuffle                       Reduce




      (adapted from http://code.google.com/p/mapreduce-framework/wiki/MapReduce)
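
The three phases can be imitated in memory with a few lines of Python. This is a teaching sketch rather than Hadoop's API: `run_mapreduce`, `word_mapper`, and `count_reducer` are hypothetical names, and everything runs in a single process.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Run map, shuffle, and reduce in memory over an iterable of records."""
    # Map: each record yields zero or more (key, value) pairs.
    mapped = [pair for record in records for pair in mapper(record)]
    # Shuffle: group all values that share a key.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)
    # Reduce: each key and its list of values yields zero or more output pairs.
    output = []
    for key in sorted(grouped):
        output.extend(reducer(key, grouped[key]))
    return output


def word_mapper(line):
    # Emit (word, 1) for every word in the line.
    for word in line.split():
        yield word, 1


def count_reducer(word, ones):
    # Sum the 1s collected for this word.
    yield word, sum(ones)


WORD_COUNTS = run_mapreduce(["pig hive pig", "hive pig"], word_mapper, count_reducer)
print(WORD_COUNTS)
```

Word count is used here because the later tally step is described as "just a word count".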
1. Compute File Popularities



                                       <user, file>


                                                     Inverse identity map



                                    <file, List<user>>


                                                      Reduce



                                    <file, (user count)>


 Result is a table of (file, popularity) pairs that you store in the Hadoop distributed cache.


Example: File popularity for Dilbert




  (Miranda, Dilbert), (Bob, Dilbert), (Susan, Dilbert), (Chun, Dilbert), (Alice, Dilbert)



                                                   Inverse identity map



                     <Dilbert, {Miranda, Bob, Susan, Chun, Alice}>



                                                   Reduce



                                         (Dilbert, 5)
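
A minimal Python sketch of this step, run on the view history table from the example. File names are shortened as on the slides, and collecting users in a set (so repeat views by one user count once) is an assumption; the slides do not say how repeat views are handled.

```python
from collections import defaultdict

# View history from the example table, as (user, file) pairs.
VIEWS = [
    ("Miranda", "Annual Report"), ("Miranda", "Vision Statement"), ("Miranda", "Dilbert"),
    ("Bob", "Annual Report"), ("Bob", "Vision Statement"), ("Bob", "Dilbert"),
    ("Susan", "Vision Statement"), ("Susan", "Dilbert"), ("Susan", "Darth Vader"),
    ("Chun", "Dilbert"), ("Chun", "Darth Vader"),
    ("Alice", "Dilbert"), ("Alice", "Darth Vader"), ("Alice", "Disk Usage"),
]


def file_popularities(views):
    """Return {file: number of distinct users who viewed it}."""
    users_by_file = defaultdict(set)
    for user, file in views:
        # Inverse identity map: swap the pair so the file becomes the key.
        users_by_file[file].add(user)
    # Reduce: count the users collected for each file.
    return {file: len(users) for file, users in users_by_file.items()}


POPULARITY = file_popularities(VIEWS)
print(POPULARITY["Dilbert"])
```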




2a. Compute relationship tallies - find all relationships in view
history table



                                <user, file>

                                             Identity map


                             <user, List<file>>

                                             Reduce


                         <(file1, file2), Integer(1)>,
                         <(file1, file3), Integer(1)>,
                         …
                         <(file(n-1), file(n)), Integer(1)>


           Relationships have their file IDs in alphabetical order
           to avoid double counting.
Example 2a: Miranda’s (CEO) file relationship votes




     (Miranda, Annual Report), (Miranda, Vision Statement), (Miranda, Dilbert)


                                                Identity map


              <Miranda, {Annual Report, Vision Statement, Dilbert}>

                                                 Reduce


                      <(Annual Report, Dilbert), Integer(1)>,
                      <(Annual Report, Vision Statement), Integer(1)>,
                      <(Dilbert, Vision Statement), Integer(1)>
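
The vote generation can be sketched in Python as follows. The function name `relationship_votes` is hypothetical; sorting each user's files before pairing is one way to get the alphabetical ordering the slide asks for.

```python
from collections import defaultdict
from itertools import combinations


def relationship_votes(views):
    """Emit ((file1, file2), 1) for every pair of files viewed by the same user."""
    files_by_user = defaultdict(set)
    for user, file in views:
        # Identity map plus shuffle: collect each user's files.
        files_by_user[user].add(file)
    votes = []
    for user in sorted(files_by_user):
        # Sorting first means every pair comes out with its names in
        # alphabetical order, so (A, B) and (B, A) are never both emitted.
        for pair in combinations(sorted(files_by_user[user]), 2):
            votes.append((pair, 1))
    return votes


MIRANDA_VOTES = relationship_votes([
    ("Miranda", "Annual Report"),
    ("Miranda", "Vision Statement"),
    ("Miranda", "Dilbert"),
])
print(MIRANDA_VOTES)
```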




2b. Tally the relationship votes - just a word count, where each
relationship occurrence is a word




                              <(file1, file2), Integer(1)>


                                                   Identity map


                            <(file1, file2), List<Integer(1)>>



                                                   Reduce: count and
                                                   divide by popularities


          <file1, (file2, similarity score)>, <file2, (file1, similarity score)>


  Note that we emit each result twice, once for each file that belongs to the
  relationship.
Example 2b: the Dilbert/Darth Vader relationship




                           <(Dilbert, Vader), Integer(1)>,
                           <(Dilbert, Vader), Integer(1)>,
                           <(Dilbert, Vader), Integer(1)>


                                                Identity map


                           <(Dilbert, Vader), {1, 1, 1}>



                                                Reduce: count and
                                                divide by popularities


            <Dilbert, (Vader, sqrt(3/5))>, <Vader, (Dilbert, sqrt(3/5))>
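
A Python sketch of the tally-and-normalize reduce. It reads "divide by popularities" as dividing by the square root of the product of the two popularities, which is an interpretation that reproduces the sqrt(3/5) in this example; the in-memory `popularity` dictionary stands in for the distributed cache lookup.

```python
from collections import defaultdict
from math import sqrt


def similarity_scores(votes, popularity):
    """Tally votes per file pair, normalize, and emit each result for both files."""
    tallies = defaultdict(int)
    for pair, one in votes:
        # Word count over relationships: sum the 1s for each pair.
        tallies[pair] += one
    scored = []
    for (file1, file2), tally in sorted(tallies.items()):
        score = tally / sqrt(popularity[file1] * popularity[file2])
        # Emit twice so each file later receives its own list of neighbors.
        scored.append((file1, (file2, score)))
        scored.append((file2, (file1, score)))
    return scored


VOTES = [(("Dilbert", "Vader"), 1)] * 3
POPULARITY = {"Dilbert": 5, "Vader": 3}
SCORED = similarity_scores(VOTES, POPULARITY)
print(SCORED)
```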




3. Sort and store results



                        <file1, (file2, similarity score)>


                                                Identity map



                     <file1, List<(file2, similarity score)>>


                                                Reduce


                          <file1, {top n similar files}>




                  Store the results in your location of choice


Example 3: Sorting the results for Dilbert


                               <Dilbert, (Annual Report, .63)>,
                               <Dilbert, (Vision Statement, .77)>,
                               <Dilbert, (Disk Usage, .45)>,
                               <Dilbert, (Darth Vader, .77)>


                                                      Identity map


<Dilbert, {(Annual Report, .63), (Vision Statement, .77), (Disk Usage, .45), (Darth Vader, .77)}>


                                                      Reduce


                  <Dilbert, {Darth Vader, Vision Statement}> (Top 2 files)
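
The sort step can be sketched in Python like this. The function name `top_n_similar` is hypothetical, and breaking ties alphabetically is an assumption; the slides do not say how the two .77 scores are ordered.

```python
from collections import defaultdict


def top_n_similar(scored, n):
    """Group (file, (neighbor, score)) pairs by file and keep the n best neighbors."""
    by_file = defaultdict(list)
    for file1, (file2, score) in scored:
        by_file[file1].append((file2, score))
    result = {}
    for file1, neighbors in by_file.items():
        # Highest score first; ties broken alphabetically (an assumption).
        ranked = sorted(neighbors, key=lambda item: (-item[1], item[0]))
        result[file1] = [name for name, _ in ranked[:n]]
    return result


DILBERT_SCORES = [
    ("Dilbert", ("Annual Report", 0.63)),
    ("Dilbert", ("Vision Statement", 0.77)),
    ("Dilbert", ("Disk Usage", 0.45)),
    ("Dilbert", ("Darth Vader", 0.77)),
]
TOP = top_n_similar(DILBERT_SCORES, 2)
print(TOP)
```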




                                        Store results
Appendix




§  Cosine formula and normalization trick to avoid the
    distributed cache

       \cos\theta_{AB} = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{A}{\|A\|} \cdot \frac{B}{\|B\|}

§  Mahout has CF
§  Asymptotic order of the algorithm is O(M*N^2) in worst
    case, but is helped by sparsity.
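
As a worked check of the cosine formula on the Dilbert/Darth Vader example: treat A and B as the 0/1 view-history columns for the two files (an interpretation consistent with the example numbers). Then A · B is the relationship tally, 3, and each norm is the square root of that file's popularity, 5 and 3.

```latex
\cos\theta_{\mathrm{Dilbert},\,\mathrm{Vader}}
  = \frac{A \cdot B}{\|A\| \, \|B\|}
  = \frac{3}{\sqrt{5}\,\sqrt{3}}
  = \sqrt{\tfrac{3}{5}}
  \approx 0.77
```

This matches the .77 edge between the two files in the normalized relationship graph.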




Summary




          Hadoop                                       Cloud Data




    Hadoop + Force.com =                        Recommendation algorithms




@forcedotcom / #forcewebinar


Developer Force Group


facebook.com/forcedotcom


Developer Force – Force.com
Community

Upcoming Events

§  June 26 – Mobile CodeTalk
   –  http://bit.ly/mct-wr


§  June 27 – Painless Mobile App
    Development
   –  http://bit.ly/mobileapp-hp




                             http://bit.ly/mdc-hp
Q&A
                     http://bit.ly/hadoopsurvey

Narayan Bharadwaj    Jed Crosby            Prashant Kommireddi   Santosh Rau
@nadubharadwaj       @JedCrosby            @pRaShAnT1784         @santoshrau

                              @SalesforceEng

More Related Content

What's hot

Snowflake SnowPro Certification Exam Cheat Sheet
Snowflake SnowPro Certification Exam Cheat SheetSnowflake SnowPro Certification Exam Cheat Sheet
Snowflake SnowPro Certification Exam Cheat SheetJeno Yamma
 
How Insurance Companies Use MongoDB
How Insurance Companies Use MongoDB How Insurance Companies Use MongoDB
How Insurance Companies Use MongoDB MongoDB
 
Mastering SharePoint Migration Planning
Mastering SharePoint Migration PlanningMastering SharePoint Migration Planning
Mastering SharePoint Migration PlanningChristian Buckley
 
Web hdfs and httpfs
Web hdfs and httpfsWeb hdfs and httpfs
Web hdfs and httpfswchevreuil
 
Azure data factory
Azure data factoryAzure data factory
Azure data factoryDavid Giard
 
Great Expectations Presentation
Great Expectations PresentationGreat Expectations Presentation
Great Expectations PresentationAdam Doyle
 
Introduction to extracting data from sap s 4 hana with abap cds views
Introduction to extracting data from sap s 4 hana with abap cds viewsIntroduction to extracting data from sap s 4 hana with abap cds views
Introduction to extracting data from sap s 4 hana with abap cds viewsLuc Vanrobays
 
Step by step procedure for loading of data from the flat file to the master d...
Step by step procedure for loading of data from the flat file to the master d...Step by step procedure for loading of data from the flat file to the master d...
Step by step procedure for loading of data from the flat file to the master d...Prashant Tyagi
 
Extending Complex Event Processing to Graph-structured Information
Extending Complex Event Processing to Graph-structured InformationExtending Complex Event Processing to Graph-structured Information
Extending Complex Event Processing to Graph-structured InformationAntonio Vallecillo
 
Best practices for fusion hcm cloud implementation
Best practices for fusion hcm cloud implementationBest practices for fusion hcm cloud implementation
Best practices for fusion hcm cloud implementationFeras Ahmad
 
Oracle Cloud Reference Architecture
Oracle Cloud Reference ArchitectureOracle Cloud Reference Architecture
Oracle Cloud Reference ArchitectureBob Rhubart
 
Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMatillion
 
Azure Data Factory
Azure Data FactoryAzure Data Factory
Azure Data FactoryHARIHARAN R
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Snowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglySnowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglyTyler Wishnoff
 

What's hot (20)

Hcm enterprise and_workforce_structures
Hcm enterprise and_workforce_structuresHcm enterprise and_workforce_structures
Hcm enterprise and_workforce_structures
 
Snowflake SnowPro Certification Exam Cheat Sheet
Snowflake SnowPro Certification Exam Cheat SheetSnowflake SnowPro Certification Exam Cheat Sheet
Snowflake SnowPro Certification Exam Cheat Sheet
 
How Insurance Companies Use MongoDB
How Insurance Companies Use MongoDB How Insurance Companies Use MongoDB
How Insurance Companies Use MongoDB
 
Mastering SharePoint Migration Planning
Mastering SharePoint Migration PlanningMastering SharePoint Migration Planning
Mastering SharePoint Migration Planning
 
Azure datafactory
Azure datafactoryAzure datafactory
Azure datafactory
 
Web hdfs and httpfs
Web hdfs and httpfsWeb hdfs and httpfs
Web hdfs and httpfs
 
Azure data factory
Azure data factoryAzure data factory
Azure data factory
 
Great Expectations Presentation
Great Expectations PresentationGreat Expectations Presentation
Great Expectations Presentation
 
Introduction to extracting data from sap s 4 hana with abap cds views
Introduction to extracting data from sap s 4 hana with abap cds viewsIntroduction to extracting data from sap s 4 hana with abap cds views
Introduction to extracting data from sap s 4 hana with abap cds views
 
Step by step procedure for loading of data from the flat file to the master d...
Step by step procedure for loading of data from the flat file to the master d...Step by step procedure for loading of data from the flat file to the master d...
Step by step procedure for loading of data from the flat file to the master d...
 
Extending Complex Event Processing to Graph-structured Information
Extending Complex Event Processing to Graph-structured InformationExtending Complex Event Processing to Graph-structured Information
Extending Complex Event Processing to Graph-structured Information
 
Best practices for fusion hcm cloud implementation
Best practices for fusion hcm cloud implementationBest practices for fusion hcm cloud implementation
Best practices for fusion hcm cloud implementation
 
Oracle Cloud Reference Architecture
Oracle Cloud Reference ArchitectureOracle Cloud Reference Architecture
Oracle Cloud Reference Architecture
 
Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - Snowflake
 
Snowflake Datawarehouse Architecturing
Snowflake Datawarehouse ArchitecturingSnowflake Datawarehouse Architecturing
Snowflake Datawarehouse Architecturing
 
Azure Data Factory
Azure Data FactoryAzure Data Factory
Azure Data Factory
 
Introduction to ETL and Data Integration
Introduction to ETL and Data IntegrationIntroduction to ETL and Data Integration
Introduction to ETL and Data Integration
 
Sap Analytics Cloud
Sap Analytics CloudSap Analytics Cloud
Sap Analytics Cloud
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Snowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglySnowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the Ugly
 

How Salesforce.com uses Hadoop

  • 1. How Salesforce.com uses Hadoop. Narayan Bharadwaj, Data Science (@nadubharadwaj); Jed Crosby, Data Science (@JedCrosby). #forcewebinar. Follow us @forcedotcom
  • 2. Safe Harbor. Safe harbor statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services. The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with completed and any possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year ended January 31, 2011 and in our quarterly report on Form 10-Q for the most recent fiscal quarter ended October 31, 2011. These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site. Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
  • 3. Agenda
      §  Hadoop use cases
      §  Use case 1 - Product Metrics*
      §  Technology
      §  Use case 2 - Collaborative Filtering*
      §  Q&A
      *Every time you see the elephant, we will attempt to explain a Hadoop-related concept.
  • 4. Got “Cloud Data”? 130k customers; 780 million transactions/day; millions of users; terabytes/day.
  • 5. Hadoop Overview
      §  Started by Doug Cutting at Yahoo!
      §  Based on two Google papers
          –  Google File System (GFS): http://research.google.com/archive/gfs.html
          –  Google MapReduce: http://research.google.com/archive/mapreduce.html
      §  Hadoop is an open source Apache project
          –  Hadoop Distributed File System (HDFS)
          –  Distributed Processing Framework (MapReduce)
      §  Several related projects
          –  HBase, Hive, Pig, Flume, ZooKeeper, Mahout, Oozie, HCatalog
  • 6. Hadoop use cases: User behavior analysis; Product Metrics; Capacity planning; Monitoring intelligence; Performance analysis; Security; Ad-hoc log searches; Collaborative Filtering; Search Relevancy.
  • 8. Product Metrics – Problem Statement
      §  Track feature usage/adoption across 130k+ customers
          –  E.g., Accounts, Contacts, Visualforce, Apex, …
      §  Track standard metrics across all features
          –  E.g., #Requests, #UniqueOrgs, #UniqueUsers, AvgResponseTime, …
      §  Track features and metrics across all channels
          –  API, UI, Mobile
      §  Primary audience: Executives, Product Managers
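The four standard metrics named on this slide can be illustrated with a small Python sketch. The record layout (`feature`, `org`, `user`, `response_ms`) and the sample values are assumptions made for illustration; the deck does not show the real log format, and the production pipeline computes these figures in Pig on Hadoop rather than in Python.

```python
from collections import defaultdict

# Hypothetical, already-parsed log records; field names are illustrative only.
records = [
    {"feature": "Accounts", "org": "org1", "user": "u1", "response_ms": 120},
    {"feature": "Accounts", "org": "org1", "user": "u2", "response_ms": 80},
    {"feature": "Accounts", "org": "org2", "user": "u3", "response_ms": 100},
    {"feature": "Contacts", "org": "org1", "user": "u1", "response_ms": 50},
]


def feature_metrics(rows):
    """Group records by feature and compute the four standard metrics."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["feature"]].append(row)

    result = {}
    for feature, rs in grouped.items():
        result[feature] = {
            "requests": len(rs),                          # #Requests
            "unique_orgs": len({r["org"] for r in rs}),   # #UniqueOrgs
            "unique_users": len({r["user"] for r in rs}), # #UniqueUsers
            "avg_response_ms": sum(r["response_ms"] for r in rs) / len(rs),
        }
    return result


metrics = feature_metrics(records)
print(metrics["Accounts"])
```

Grouping first and aggregating second mirrors the GROUP / FOREACH shape of the Pig skeleton shown later in the deck.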
  • 9. Data Pipeline (diagram). Stages: Feature (What?); Feature Metadata (Instrumentation); Crunch it (How?); Storage & Processing; Daily Summary (Output); Fancy UI (Visualize); Collaborate & Iterate.
  • 10. Product Metrics Pipeline (diagram). Components: User Input (Page Layout); Collaboration (Chatter); Reports, Dashboards; Formula Fields; Workflow; Feature Metrics (Custom Object); Trend Metrics (Custom Object); API (two connections); Client Machine; Java Program; Pig script generator; Workflow; Log Pull; Hadoop; Log Files.
  • 11. Feature Metrics (Custom Object)
      Id     Feature Name    PM     Instrumentation  Metric1    Metric2    Metric3     Metric4  Status
      F0001  Accounts        John   /001             #requests  #UniqOrgs  #UniqUsers  AvgRT    Dev
      F0002  Contacts        Nancy  /003             #requests  #UniqOrgs  #UniqUsers  AvgRT    Review
      F0003  API             Eric   A                #requests  #UniqOrgs  #UniqUsers  AvgRT    Deployed
      F0004  Visualforce     Roger  V                #requests  #UniqOrgs  #UniqUsers  AvgRT    Decom
      F0005  Apex            Kim    axapx            #requests  #UniqOrgs  #UniqUsers  AvgRT    Deployed
      F0006  Custom Objects  Chun   /aXX             #requests  #UniqOrgs  #UniqUsers  AvgRT    Deployed
      F0008  Chatter         Jed    chcmd            #requests  #UniqOrgs  #UniqUsers  AvgRT    Deployed
      F0009  Reports         Steve  R                #requests  #UniqOrgs  #UniqUsers  AvgRT    Deployed
  • 12. Feature Metrics (Custom Object)
  • 13. User Input (Page Layout): Formula Field, Workflow Rule
  • 14. User Input (Child Custom Object): Child Objects
  • 16. Basic Pig script construct
      -- Define UDFs
      DEFINE GFV GetFieldValue('/path/to/udf/file');
      -- Load data
      A = LOAD '/path/to/cloud/data/log/files' USING PigStorage();
      -- Filter data
      B = FILTER A BY GFV(*, 'logRecordType') == 'U';
      -- Extract Fields
      C = FOREACH B GENERATE GFV(*, 'orgId'), GFV(*, 'userId') ……..
      -- Group
      G = GROUP C BY ……
      -- Compute output metrics
      O = FOREACH G { orgs = C.orgId; uniqueOrgs = DISTINCT orgs; }
      -- Store or Dump results
      STORE O INTO '/path/to/user/output';
  • 17. Java Pig Script Generator (Client)
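The idea of generating a Pig script from a Feature Metrics record can be sketched as a simple template function. The deck describes the real generator as a Java program and does not show its output, so everything here is an assumption: the function name `generate_pig_script`, the mapping of the Instrumentation value onto the `logRecordType` filter, and the exact shape of the emitted script are all hypothetical. Python is used only to keep the sketch short.

```python
def generate_pig_script(instrumentation, input_path, output_path):
    """Render an illustrative Pig script for one feature.

    `instrumentation` stands in for the Instrumentation column of the
    Feature Metrics object; treating it as a log record type is an assumption.
    """
    lines = [
        "-- Generated script (illustrative template only)",
        "DEFINE GFV GetFieldValue('/path/to/udf/file');",
        f"A = LOAD '{input_path}' USING PigStorage();",
        f"B = FILTER A BY GFV(*, 'logRecordType') == '{instrumentation}';",
        "C = FOREACH B GENERATE GFV(*, 'orgId') AS orgId, GFV(*, 'userId') AS userId;",
        "G = GROUP C ALL;",
        "O = FOREACH G {",
        "    uniqueOrgs = DISTINCT C.orgId;",
        "    uniqueUsers = DISTINCT C.userId;",
        "    GENERATE COUNT(C), COUNT(uniqueOrgs), COUNT(uniqueUsers);",
        "}",
        f"STORE O INTO '{output_path}';",
    ]
    return "\n".join(lines)


script = generate_pig_script("A", "/logs/in", "/out/api")
print(script)
```

Keeping the script as a list of template lines makes it easy to add or drop a metric per feature, which is roughly what a metadata-driven generator needs.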
  • 18. Trend Metrics (Custom Object)
      Id     Date        #Requests  #Unique Orgs  #Unique Users  Avg ResponseTime
      F0001  06/01/2012  <big>      <big>         <big>          <little>
      F0002  06/01/2012  <big>      <big>         <big>          <little>
      F0003  06/01/2012  <big>      <big>         <big>          <little>
      F0001  06/02/2012  <big>      <big>         <big>          <little>
      F0002  06/02/2012  <big>      <big>         <big>          <little>
      F0003  06/03/2012  <big>      <big>         <big>          <little>
  • 19. Upload to Trend Metrics (Custom Object)
  • 20. Visualization (Reports & Dashboards)
  • 21. Visualization (Reports & Dashboards)
  • 22. Collaborate, Iterate (Chatter)
  • 23. Recap (pipeline diagram). Components: User Input (Page Layout); Collaboration (Chatter); Reports, Dashboards; Formula Fields; Workflow; Feature Metrics (Custom Object); Trend Metrics (Custom Object); API (two connections); Client Machine; Java Program; Pig script generator; Workflow; Log Pull; Hadoop; Log Files.
  • 25. Hadoop ecosystem. Apache Hadoop, version 0.20.2.
  • 26. Contributions: Prashant Kommireddi (@pRaShAnT1784); Lars Hofhansl; Ian Varley (@thefutureian).
  • 27. Data Science tools ecosystem. Apache Pig, version 0.9.1.
  • 29. Collaborative Filtering – Problem Statement
      §  Show similar files within an organization
          –  Content-based approach
          –  Community-based approach
  • 30. Popular File
  • 31. Related File
  • 32. We found this relationship using item-to-item collaborative filtering
      §  Amazon published this algorithm in 2003.
          –  Amazon.com Recommendations: Item-to-Item Collaborative Filtering, by Gregory Linden, Brent Smith, and Jeremy York. IEEE Internet Computing, January-February 2003.
      §  At Salesforce, we adapted this algorithm for Hadoop, and we use it to recommend files to view and users to follow.
  • 33. Example: CF on 5 files: Vision Statement; Annual Report; Dilbert Comic; Darth Vader Cartoon; Disk Usage Report.
  • 34. View History Table
                       Annual  Vision     Dilbert  Darth Vader  Disk Usage
                       Report  Statement  Cartoon  Cartoon      Report
      Miranda (CEO)    1       1          1        0            0
      Bob (CFO)        1       1          1        0            0
      Susan (Sales)    0       1          1        1            0
      Chun (Sales)     0       0          1        1            0
      Alice (IT)       0       0          1        1            1
  • 35. Relationships between the files [Graph nodes: Annual Report, Vision Statement, Darth Vader Cartoon, Dilbert Cartoon, Disk Usage Report]
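  • 36. Relationships between the files [Graph with edges weighted by the number of users who viewed both files]
      Annual Report / Vision Statement: 2
      Annual Report / Dilbert Cartoon: 2
      Vision Statement / Dilbert Cartoon: 3
      Vision Statement / Darth Vader Cartoon: 1
      Dilbert Cartoon / Darth Vader Cartoon: 3
      Dilbert Cartoon / Disk Usage Report: 1
      Darth Vader Cartoon / Disk Usage Report: 1
      Annual Report / Darth Vader Cartoon, Annual Report / Disk Usage Report, Vision Statement / Disk Usage Report: 0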
  • 37. Sorted relationships for each file
      Annual Report:        Dilbert (2), Vision Stmt. (2)
      Vision Statement:     Dilbert (3), Annual Rpt. (2), Darth Vader (1)
      Dilbert Cartoon:      Vision Stmt. (3), Darth Vader (3), Annual Rpt. (2), Disk Usage (1)
      Darth Vader Cartoon:  Dilbert (3), Vision Stmt. (1), Disk Usage (1)
      Disk Usage Report:    Dilbert (1), Darth Vader (1)
      The popularity problem: notice that Dilbert appears first in every list. This is probably not what we want.
      The solution: divide the relationship tallies by file popularities.
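  • 38. Normalized relationships between the files [Graph with edges weighted by normalized relationship scores]
      Annual Report / Vision Statement: .82
      Annual Report / Dilbert Cartoon: .63
      Vision Statement / Dilbert Cartoon: .77
      Vision Statement / Darth Vader Cartoon: .33
      Dilbert Cartoon / Darth Vader Cartoon: .77
      Dilbert Cartoon / Disk Usage Report: .45
      Darth Vader Cartoon / Disk Usage Report: .58
      Annual Report / Darth Vader Cartoon, Annual Report / Disk Usage Report, Vision Statement / Disk Usage Report: 0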
  • 39. Sorted relationships for each file, normalized by file popularities
      Annual Report:        Vision Stmt. (.82), Dilbert (.63)
      Vision Statement:     Annual Report (.82), Dilbert (.77), Darth Vader (.33)
      Dilbert Cartoon:      Darth Vader (.77), Vision Stmt. (.77), Annual Report (.63), Disk Usage (.45)
      Darth Vader Cartoon:  Dilbert (.77), Disk Usage (.58), Vision Stmt. (.33)
      Disk Usage Report:    Darth Vader (.58), Dilbert (.45)
      High relationship tallies AND similar popularity values now drive closeness.
  • 40. The item-to-item CF algorithm
      1) Compute file popularities
      2) Compute relationship tallies and divide by file popularities
      3) Sort and store the results
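The three steps can be sketched in memory, without Hadoop, on the five-file example. This is a minimal sketch rather than the presenters' implementation: the function name `item_to_item`, the `top_n` parameter, and the alphabetical tie-break are assumptions, and the normalization divides each tally by the square root of the product of the two popularities, which is what the example scores (.82, .77, ...) are consistent with.

```python
import math
from collections import Counter
from itertools import combinations

# View history from the example: user -> set of files viewed.
views = {
    "Miranda": {"Annual Report", "Vision Statement", "Dilbert"},
    "Bob": {"Annual Report", "Vision Statement", "Dilbert"},
    "Susan": {"Vision Statement", "Dilbert", "Darth Vader"},
    "Chun": {"Dilbert", "Darth Vader"},
    "Alice": {"Dilbert", "Darth Vader", "Disk Usage"},
}

def item_to_item(views, top_n=2):
    # 1) File popularity: number of users who viewed each file.
    popularity = Counter(f for files in views.values() for f in files)
    # 2) Relationship tallies: one vote per pair of files seen by the same user.
    tallies = Counter(
        pair for files in views.values()
        for pair in combinations(sorted(files), 2)
    )
    similar = {f: [] for f in popularity}
    for (a, b), tally in tallies.items():
        # Assumed normalization: tally / sqrt(pop(a) * pop(b)).
        score = tally / math.sqrt(popularity[a] * popularity[b])
        similar[a].append((b, score))
        similar[b].append((a, score))
    # 3) Sort by descending score (ties broken by name) and keep the top n.
    return {
        f: sorted(neighbors, key=lambda fs: (-fs[1], fs[0]))[:top_n]
        for f, neighbors in similar.items()
    }

recs = item_to_item(views)
```

With normalization in place, `recs["Annual Report"]` leads with Vision Statement rather than Dilbert, which is the effect the popularity correction is meant to have.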
  • 41. MapReduce Overview: Map → Shuffle → Reduce (adapted from http://code.google.com/p/mapreduce-framework/wiki/MapReduce)
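To make the three phases concrete, here is a toy single-process simulation of map, shuffle, and reduce applied to a word count. It is only an illustration of the data flow; `run_mapreduce`, `word_mapper`, and `count_reducer` are hypothetical names and none of this uses the Hadoop API.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    groups = defaultdict(list)
    # Map: each record yields zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle: collect all values that share a key.
            groups[key].append(value)
    # Reduce: each key and its list of values yields output pairs.
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output

def word_mapper(line):
    for word in line.split():
        yield word.lower(), 1

def count_reducer(word, ones):
    yield word, sum(ones)

counts = dict(run_mapreduce(["map shuffle reduce", "map reduce"],
                            word_mapper, count_reducer))
```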
  • 42. 1. Compute File Popularities
      <user, file> → [Inverse identity map] → <file, List<user>> → [Reduce] → <file, (user count)>
      Result is a table of (file, popularity) pairs that you store in the Hadoop distributed cache.
  • 43. Example: File popularity for Dilbert
      (Miranda, Dilbert), (Bob, Dilbert), (Susan, Dilbert), (Chun, Dilbert), (Alice, Dilbert) → [Inverse identity map] → <Dilbert, {Miranda, Bob, Susan, Chun, Alice}> → [Reduce] → (Dilbert, 5)
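A small Python sketch of step 1, assuming the "inverse identity map" simply swaps key and value so that the shuffle groups users by file. The function names are hypothetical, and counting distinct users (rather than raw views) is an assumption about how repeat views are handled.

```python
from collections import defaultdict

def inverse_identity_map(user, file):
    # Swap key and value so records are grouped by file.
    yield file, user

def popularity_reduce(file, users):
    # Popularity = number of distinct users who viewed the file (assumed).
    return file, len(set(users))

view_log = [
    ("Miranda", "Dilbert"), ("Bob", "Dilbert"), ("Susan", "Dilbert"),
    ("Chun", "Dilbert"), ("Alice", "Dilbert"),
    ("Miranda", "Annual Report"), ("Bob", "Annual Report"),
]

# Simulated shuffle: group mapped values by key.
grouped = defaultdict(list)
for user, file in view_log:
    for key, value in inverse_identity_map(user, file):
        grouped[key].append(value)

popularity = dict(popularity_reduce(f, users) for f, users in grouped.items())
```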
  • 44. 2a. Compute relationship tallies - find all relationships in view history table
      <user, file> → [Identity map] → <user, List<file>> → [Reduce] → <(file1, file2), Integer(1)>, <(file1, file3), Integer(1)>, … <(file(n-1), file(n)), Integer(1)>
      Relationships have their file IDs in alphabetical order to avoid double counting.
  • 45. Example 2a: Miranda’s (CEO) file relationship votes
      (Miranda, Annual Report), (Miranda, Vision Statement), (Miranda, Dilbert) → [Identity map] → <Miranda, {Annual Report, Vision Statement, Dilbert}> → [Reduce] → <(Annual Report, Dilbert), Integer(1)>, <(Annual Report, Vision Statement), Integer(1)>, <(Dilbert, Vision Statement), Integer(1)>
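The step 2a reducer can be sketched as a generator that emits one vote per unordered pair of a user's files. Sorting the file names first gives the alphabetical ordering the slides describe; `relationship_votes` is a hypothetical name.

```python
from itertools import combinations

def relationship_votes(user, files):
    # Sort so each pair has a single canonical (alphabetical) form,
    # which avoids counting (A, B) and (B, A) separately.
    for pair in combinations(sorted(set(files)), 2):
        yield pair, 1

votes = list(relationship_votes(
    "Miranda", ["Annual Report", "Vision Statement", "Dilbert"]))
```

A user who viewed n files produces n(n-1)/2 votes, which is where the quadratic worst case of the algorithm comes from.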
  • 46. 2b. Tally the relationship votes - just a word count, where each relationship occurrence is a word
      <(file1, file2), Integer(1)> → [Identity map] → <(file1, file2), List<Integer(1)>> → [Reduce: count and divide by popularities] → <file1, (file2, similarity score)>, <file2, (file1, similarity score)>
      Note that we emit each result twice, once for each file that belongs to a relationship.
  • 47. Example 2b: the Dilbert/Darth Vader relationship
      <(Dilbert, Vader), Integer(1)>, <(Dilbert, Vader), Integer(1)>, <(Dilbert, Vader), Integer(1)> → [Identity map] → <(Dilbert, Vader), {1, 1, 1}> → [Reduce: count and divide by popularities] → <Dilbert, (Vader, sqrt(3/5))>, <Vader, (Dilbert, sqrt(3/5))>
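A sketch of the step 2b reducer for this example. The `popularity` dictionary stands in for the table that the slides load from the distributed cache, and the division by the square root of the product of popularities is inferred from the sqrt(3/5) result (Dilbert has 5 viewers, Darth Vader 3); `tally_reduce` is a hypothetical name.

```python
import math

# Stand-in for the (file, popularity) table produced by step 1.
popularity = {"Dilbert": 5, "Vader": 3}

def tally_reduce(pair, votes):
    file1, file2 = pair
    tally = sum(votes)  # the "word count" part
    score = tally / math.sqrt(popularity[file1] * popularity[file2])
    # Emit once per file so step 3 can group by either side of the pair.
    yield file1, (file2, score)
    yield file2, (file1, score)

results = list(tally_reduce(("Dilbert", "Vader"), [1, 1, 1]))
```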
  • 48. 3. Sort and store results
      <file1, (file2, similarity score)> → [Identity map] → <file1, List<(file2, similarity score)>> → [Reduce] → <file1, {top n similar files}>
      Store the results in your location of choice.
  • 49. Example 3: Sorting the results for Dilbert
      <Dilbert, (Annual Report, .63)>, <Dilbert, (Vision Statement, .77)>, <Dilbert, (Disk Usage, .45)>, <Dilbert, (Darth Vader, .77)> → [Identity map] → <Dilbert, {(Annual Report, .63), (Vision Statement, .77), (Disk Usage, .45), (Darth Vader, .77)}> → [Reduce] → <Dilbert, {Darth Vader, Vision Statement}> (Top 2 files) → Store results
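The step 3 reducer amounts to a top-n selection per file. In this sketch, `top_n_reduce` is a hypothetical name and the alphabetical tie-break between equal scores is an assumption; the slides do not say how ties are ordered.

```python
def top_n_reduce(file, scored, n=2):
    # Highest score first; equal scores fall back to file name order (assumed).
    ranked = sorted(scored, key=lambda fs: (-fs[1], fs[0]))
    return file, [name for name, _ in ranked[:n]]

dilbert = top_n_reduce("Dilbert", [
    ("Annual Report", 0.63),
    ("Vision Statement", 0.77),
    ("Disk Usage", 0.45),
    ("Darth Vader", 0.77),
])
```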
  • 50. Appendix
      • Cosine formula and normalization trick to avoid the distributed cache: $\cos\theta_{AB} = \frac{A \cdot B}{\lVert A \rVert\,\lVert B \rVert} = \frac{A}{\lVert A \rVert} \cdot \frac{B}{\lVert B \rVert}$
      • Mahout has CF
      • Asymptotic order of the algorithm is $O(M \cdot N^2)$ in the worst case, but is helped by sparsity.
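One way to read the cosine formula against the worked example, assuming each file is represented as a binary vector over users (1 if the user viewed it). The slides do not state this representation, but the example scores are consistent with it:

```latex
\begin{aligned}
A \cdot B &= \text{number of users who viewed both files (the relationship tally)} \\
\lVert A \rVert &= \sqrt{\text{popularity}(A)} \\
\cos\theta_{\text{Dilbert},\,\text{Vader}}
  &= \frac{3}{\sqrt{5}\,\sqrt{3}} = \sqrt{\tfrac{3}{5}} \approx 0.77
\end{aligned}
```

Under the same assumption, the second form of the formula suggests scaling each view by $1/\sqrt{\text{popularity}}$ before pairing, so that the pair reducer only has to sum products and needs no popularity lookup. That is a plausible reading of the "normalization trick", not something the slides spell out.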
  • 51. Summary Hadoop Cloud Data Hadoop + Force.com = Recommendation algorithms
  • 52. @forcedotcom / #forcewebinar Developer Force Group facebook.com/forcedotcom Developer Force – Force.com Community
  • 53. Upcoming Events
      • June 26 – Mobile CodeTalk
          – http://bit.ly/mct-wr
      • June 27 – Painless Mobile App Development
          – http://bit.ly/mobileapp-hp http://bit.ly/mdc-hp
  • 54. Q&A http://bit.ly/hadoopsurvey
      Narayan Bharadwaj (@nadubharadwaj), Jed Crosby (@JedCrosby), Prashant Kommireddi (@pRaShAnT1784), Santosh Rau (@santoshrau)
      @SalesforceEng