Hadoop + Forcedotcom = Like



Hadoop is the technology of choice for processing large data sets. Force.com provides a metadata layer (Custom Objects) for defining Hadoop jobs and storing their output, and a visualization layer (Reports & Dashboards) for charting and trending that output. In this session, we explore a real-life use case that combines these technologies into a compelling big data processing framework.
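The idea that Force.com acts as a metadata layer for defining Hadoop jobs (Step 3 on the slides, the Java Pig generator) can be illustrated with a minimal Java sketch. It assumes a hypothetical metric definition (name, filter field, filter value, group-by field) of the kind a Custom Object could hold, and a hypothetical tab-separated log schema (`orgId`, `userId`, `uri`). None of these names come from the talk. The generator turns one definition into a Pig Latin script that loads the logs, filters them, groups and counts them, and stores the result.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: turn one metric definition (as might be stored in a
// Force.com Custom Object) into a Pig Latin script. All names are illustrative.
public class PigGenerator {

    // Hypothetical metadata record; the field names are assumptions.
    record MetricDefinition(String name, String filterField, String filterValue, String groupByField) {}

    // Assumed log schema: tab-separated chararray fields in this order.
    static final List<String> LOG_FIELDS = List.of("orgId", "userId", "uri");

    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

    // Fail fast instead of escaping, so user input cannot break the generated script.
    private static void requireIdentifier(String s) {
        if (!IDENTIFIER.matcher(s).matches()) {
            throw new IllegalArgumentException("Not a valid Pig identifier: " + s);
        }
    }

    private static void requireNoQuote(String s) {
        if (s.contains("'")) {
            throw new IllegalArgumentException("Single quotes are not allowed: " + s);
        }
    }

    static String generate(MetricDefinition m, String inputPath, String outputPath) {
        requireIdentifier(m.name());
        requireIdentifier(m.filterField());
        requireIdentifier(m.groupByField());
        if (!LOG_FIELDS.contains(m.filterField()) || !LOG_FIELDS.contains(m.groupByField())) {
            throw new IllegalArgumentException("Unknown log field in metric " + m.name());
        }
        requireNoQuote(m.filterValue());
        requireNoQuote(inputPath);
        requireNoQuote(outputPath);

        // Build "orgId:chararray, userId:chararray, uri:chararray".
        StringBuilder schema = new StringBuilder();
        for (String field : LOG_FIELDS) {
            if (schema.length() > 0) {
                schema.append(", ");
            }
            schema.append(field).append(":chararray");
        }

        // "\\t" in Java source emits the two characters \t, which Pig reads as a tab.
        return String.join("\n",
            "logs = LOAD '" + inputPath + "' USING PigStorage('\\t') AS (" + schema + ");",
            "filtered = FILTER logs BY " + m.filterField() + " == '" + m.filterValue() + "';",
            "grouped = GROUP filtered BY " + m.groupByField() + ";",
            "counts = FOREACH grouped GENERATE group AS " + m.groupByField()
                + ", COUNT(filtered) AS " + m.name() + ";",
            "STORE counts INTO '" + outputPath + "' USING PigStorage('\\t');");
    }

    public static void main(String[] args) {
        MetricDefinition feedViews =
            new MetricDefinition("feedViews", "uri", "/chatter/feed", "orgId");
        System.out.println(generate(feedViews, "/logs/2012-01-01", "/summaries/feedViews/2012-01-01"));
    }
}
```

Running `main` prints a five-statement script (LOAD, FILTER, GROUP, FOREACH with COUNT, STORE). The sketch validates identifiers and rejects quotes rather than escaping them. A generator driven by user input from a page layout would need equivalent checks.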

Published in: Technology, Education


  1. Title slide: "Force.com + Hadoop =" by Narayan Bharadwaj, Data Science, Monitoring & Management (@nadubharadwaj)
  2. Safe Harbor: Safe harbor statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services. The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with completed and any possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year ended January 31, 2011 and in our quarterly report on Form 10-Q for the most recent fiscal quarter ended October 31, 2011. These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site. Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
  3. A Big Data Pipeline (diagram): Requirements (What?) → Metadata (Translation) → Storage & Processing (How? "Just do it") → End Result (Output) → Fancy UI (Visualize) → Collaborate & Iterate
  4. Product Metrics Pipeline (diagram). Force.com: User Input, Metadata (Custom Object), Workflow, Formula Fields, Daily Summaries (Custom Object), Reports, Dashboards, Chatter. Connected via the API to a Client Machine: Java Program (Pig generator), Hadoop Workflow. Hadoop: Log Pull of Log Files into the HDFS Cluster.
  5. Step 1 - Metadata (Custom Object)
  6. Step 2 - User Input (Page Layout)
  7. Step 3 - Java Pig Generator (Client)
  8. Step 3 - Java Pig Generator (Client)
  9. Step 4 - Daily Jobs (Hadoop Ecosystem): Apache Hadoop version 0.20.2, Apache Pig version 0.9.1
  10. Hadoop at Salesforce. Internal use cases: Product Metrics; User Behavior Analysis; Product Features (Chatter collaborative filtering, @jed_crosby; Search Relevancy). Pig open source contributions (@pRaShAnT1784).
  11. Step 5 - Daily Summary Input (API → Custom Object)
  12. Step 5 - Daily Summaries (Custom Object)
  13. Step 6 - Visualization (Reports & Dashboards)
  14. Step 6 - Visualization (Reports & Dashboards)
  15. Step 7 - Collaborate, Iterate (Chatter)
  16. Recap: the Product Metrics Pipeline diagram from slide 4, repeated.
  17. Please stop by the registration kiosks on Level 1 to complete a session survey for a chance to win a $500 American Express Gift Card. Thank you! @nadubharadwaj @SalesforceEng
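Step 5 on the slides (Daily Summary Input, API → Custom Object) loads each day's Pig output into a Custom Object. Below is a minimal Java sketch of that step, built on several assumptions. The Pig job is assumed to have stored `orgId<TAB>count` rows with `PigStorage`'s default tab delimiter. The target is a hypothetical `Daily_Summary__c` object with hypothetical fields `Org_Id__c`, `Metric__c`, `Count__c`, and `Summary_Date__c`. The talk does not say which Force.com API it used. This sketch uses the REST API's create-record call (`POST /services/data/<version>/sobjects/<object>/`) through `java.net.http`, and it only builds the requests without sending them. The API version, instance URL, and token are placeholders.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: parse tab-separated Pig output and build Force.com REST API
// create-record requests for an assumed Daily_Summary__c custom object.
public class DailySummaryUploader {

    // Placeholder API version and assumed object name; adjust to the real org.
    static final String API_VERSION = "v24.0";
    static final String OBJECT_NAME = "Daily_Summary__c";

    record Summary(String orgId, String metric, long count, LocalDate day) {}

    // Each non-blank line is "orgId<TAB>count" (PigStorage's default delimiter is tab).
    static List<Summary> parse(List<String> lines, String metric, LocalDate day) {
        List<Summary> rows = new ArrayList<>();
        for (String line : lines) {
            if (line.isBlank()) {
                continue;
            }
            String[] cols = line.split("\t");
            if (cols.length != 2) {
                throw new IllegalArgumentException("Expected 2 tab-separated columns: " + line);
            }
            rows.add(new Summary(cols[0], metric, Long.parseLong(cols[1].trim()), day));
        }
        return rows;
    }

    // Minimal JSON escaping (quotes and backslashes only), enough for these assumed values.
    private static String quote(String v) {
        return "\"" + v.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Field API names are hypothetical; LocalDate.toString() yields YYYY-MM-DD.
    static String toJson(Summary s) {
        return "{\"Org_Id__c\":" + quote(s.orgId())
            + ",\"Metric__c\":" + quote(s.metric())
            + ",\"Count__c\":" + s.count()
            + ",\"Summary_Date__c\":" + quote(s.day().toString()) + "}";
    }

    // Builds (but does not send) one create-record request.
    static HttpRequest buildInsert(String instanceUrl, String accessToken, Summary s) {
        URI uri = URI.create(instanceUrl + "/services/data/" + API_VERSION
            + "/sobjects/" + OBJECT_NAME + "/");
        return HttpRequest.newBuilder(uri)
            .header("Authorization", "Bearer " + accessToken)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(toJson(s)))
            .build();
    }

    public static void main(String[] args) {
        List<Summary> rows = parse(List.of("orgA\t42", "orgB\t7"), "feedViews", LocalDate.of(2012, 1, 1));
        for (Summary s : rows) {
            HttpRequest req = buildInsert("https://example.my.salesforce.com", "ACCESS_TOKEN", s);
            System.out.println(req.method() + " " + req.uri() + " " + toJson(s));
        }
    }
}
```

To actually send a request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. The sketch issues one request per row to stay short. A real daily load covering many orgs would likely batch records instead.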