LEGO: Data Driven Growth Hacking Powered by Big Data

  1. LEGO: Data Driven Growth Hacking Powered by Big Data (Salesforce Confidential). June 2016. Kamal Duggireddy and Prashant Gokhale.
  2. About Us. Kamal Duggireddy currently leads the Data Engineering / Product Data Science team at Salesforce.com. Prior to this, he served as Director, Big Data Architecture at American Express. Combining deep technical skills with business knowledge and strong execution experience, Kamal developed reference architectures and new enterprise-level capabilities on the Hadoop stack. Prashant Gokhale is currently solving big data problems at Salesforce.com using Hadoop and its ecosystem components. Prior to this, he held several critical engineering positions at Yahoo, Cloudera and Lookout.
  3. The Use Case | Overview: Executives, Analysts, Product Managers
  4. The Use Case | Flow: Instrumentation (150+ log lines), Hadoop Data Processing, Traditional Data Warehouses, Dimensions, Data Engineering & Curation, Advanced Analysis, Ad-Hoc Requests, Predictive Data Apps, Smart Data Dashboards (Salesforce Wave)
  5. The Journey | How it all started
  6. Milestones | Along the way: Reusability, Declarative, Data Lake, Data Dictionary, Self Service, Automation, Security, Visualization, Governance
  7. The Framework | Finally! Log Sources, Cloud Metrics, Kafka, Splunk, Files, Warehouse Objects, Log Processing, Metadata, Flow Engine, Data Lake, Hadoop, Datasets (various grain), Cubes (custom grain), Data Profiler, WebApp, Self Service, Data Science
  8. Goals
     • Scalable: process hundreds of billions of log lines.
     • Flexible: handle thousands of log schemas; support variable grain and transformations using custom code.
     • Data Quality: automated data profiling, monitoring and alerting.
     • Self Service: enable ad-hoc analysis.
  9. Key Building Blocks
     • Log Processing Engine: declaratively define features and flows; normalize data across multiple log lines; inject custom code for data transformation.
     • Data Profiler: profile data at scale to detect anomalies.
     • Web App: interface to manage features and flows.
     • Job Automation Engine: end-to-end automation from features/flows to curated datasets in Wave.
  10. Log Processing Engine: usage log files and feature definitions pass through data normalization, data cleansing and data transformation to produce Hive tables. Example feature definitions:
       logType=='X' and event=='Create Event' and page=='Home Landing',"Feat 1","eval_code(event.toUpperCase())",page,.....
       logType=='ABC' and event=='Create Event' and page=='Home',"Feat 2","eval_code(event.substring(5))",event,.....
  11. Data Profiler. Flow: datasets across the platform, HCatalog, MapReduce, dataset profile, monitoring & alerting. An example dataset profile:
       Dataset    Field    Type  Total  Min  Max  Avg  # Nulls  # Distinct  Median  99th %tile  Top N
       lego_feat  browser  STR   2.3B   7    63   25   1M       50          34      38          [.....]
       lego_feat  url      STR   2.3B   20   223  50   0        5M          70      90          [.....]
       ...
  12. Everything put together (the same framework diagram as slide 7).
  13. Data Volumetrics
       Metric                                            Total
       Avg. volume of app logs processed (compressed)    100s of TB/month
       Avg. number of jobs                               6,000+/month
       Avg. log size volume growth rate                  A lot!
       Number of log record types                        1,000s
       Number of fields                                  10s of 1,000s
       Highlights: 200+ B events/day; 500+ features.
  14. Thank you! We are hiring!! www.salesforce.com/company/careers
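Slide 8 lists "support variable grain" as a goal, and slide 7 shows datasets at various grain and cubes at custom grain. The sketch below illustrates what rolling the same events up at different grains can look like. It is a minimal illustration, not LEGO's implementation: the grain names, the `GRAINS` table, the `rollup` helper and the event field names are all hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical grain table: strftime pattern that truncates a timestamp to its bucket.
GRAINS = {
    "hour": "%Y-%m-%d %H:00",
    "day": "%Y-%m-%d",
    "month": "%Y-%m",
}


def rollup(events, grain, dims):
    """Count events per (time bucket, *dims) at the requested grain."""
    fmt = GRAINS[grain]
    counts = Counter()
    for event in events:
        # Bucket key = truncated timestamp followed by the grouping dimensions.
        key = (event["ts"].strftime(fmt),) + tuple(event[d] for d in dims)
        counts[key] += 1
    return dict(counts)


events = [
    {"ts": datetime(2016, 6, 1, 9, 15), "feature": "Feat 1"},
    {"ts": datetime(2016, 6, 1, 9, 45), "feature": "Feat 1"},
    {"ts": datetime(2016, 6, 1, 14, 5), "feature": "Feat 2"},
]

print(rollup(events, "day", ("feature",)))
# {('2016-06-01', 'Feat 1'): 2, ('2016-06-01', 'Feat 2'): 1}
print(rollup(events, "hour", ("feature",)))
# {('2016-06-01 09:00', 'Feat 1'): 2, ('2016-06-01 14:00', 'Feat 2'): 1}
```

Keeping the grain as data (a lookup table) rather than code is one way to let new grains be added declaratively.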
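Slide 9 describes a Job Automation Engine that runs end to end, from features/flows to curated datasets in Wave. One common way to build such an engine is to model a flow as a dependency graph and run steps in topological order. The sketch below assumes that design; the step names, the `FLOW` dictionary and `run_flow` are hypothetical, and it uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical flow: each step maps to the set of steps it depends on.
FLOW = {
    "process_logs": set(),
    "profile_datasets": {"process_logs"},
    "build_cubes": {"process_logs"},
    "publish_to_wave": {"profile_datasets", "build_cubes"},
}


def run_flow(flow, actions):
    """Run each step only after all of its dependencies; return the order used."""
    order = list(TopologicalSorter(flow).static_order())
    for step in order:
        # Steps without a registered action are treated as no-ops.
        actions.get(step, lambda: None)()
    return order


executed = []
order = run_flow(FLOW, {
    "process_logs": lambda: executed.append("process_logs"),
    "publish_to_wave": lambda: executed.append("publish_to_wave"),
})
print(order)
```

The relative order of independent steps (here `profile_datasets` and `build_cubes`) is not fixed, which is what would let a real engine run them in parallel.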
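Slide 10's feature definitions read as rows of the form condition, feature name, transform, extra field. The sketch below models two such rows as Python data instead of parsing the strings, assuming that reading of the columns. `FeatureDef`, `extract_features` and the `carry` field are hypothetical names; `str.upper()` and slicing `[5:]` stand in for the slide's `toUpperCase()` and `substring(5)`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FeatureDef:
    """One declarative rule, modeled on a row of the slide's feature definitions."""
    conditions: Dict[str, str]         # every field must equal its value
    feature: str                       # feature name, e.g. "Feat 1"
    transform: Callable[[dict], str]   # stands in for eval_code(...)
    carry: str                         # extra field copied to the output (our reading)


FEATURES = [
    FeatureDef({"logType": "X", "event": "Create Event", "page": "Home Landing"},
               "Feat 1", lambda r: r["event"].upper(), "page"),
    FeatureDef({"logType": "ABC", "event": "Create Event", "page": "Home"},
               "Feat 2", lambda r: r["event"][5:], "event"),  # like Java substring(5)
]


def extract_features(record: dict) -> List[dict]:
    """Return one normalized row per feature definition the log record matches."""
    rows = []
    for f in FEATURES:
        if all(record.get(k) == v for k, v in f.conditions.items()):
            rows.append({"feature": f.feature,
                         "value": f.transform(record),
                         f.carry: record.get(f.carry)})
    return rows


logs = [
    {"logType": "X", "event": "Create Event", "page": "Home Landing"},
    {"logType": "ABC", "event": "Create Event", "page": "Home"},
    {"logType": "X", "event": "Delete Event", "page": "Home"},  # matches nothing
]
rows = [row for rec in logs for row in extract_features(rec)]
print(rows)
# [{'feature': 'Feat 1', 'value': 'CREATE EVENT', 'page': 'Home Landing'},
#  {'feature': 'Feat 2', 'value': 'e Event', 'event': 'Create Event'}]
```

Representing conditions as field/value pairs (rather than evaluating arbitrary expression strings) keeps this sketch safe to run; it only covers equality conditions joined by `and`, as in the two examples.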
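Slide 11's profile table reports Total, Min, Max, Avg, # Nulls, # Distinct, Median, 99th %tile and Top N per field, feeding monitoring & alerting. The single-machine sketch below computes those columns for one list of values and flags large shifts against a baseline. It is an illustration only: treating string lengths as the measured quantity, the nearest-rank percentile, the 50% tolerance and the helper names are all assumptions, and the real profiler ran at scale on MapReduce.

```python
import math
import statistics
from collections import Counter


def profile_field(values, top_n=3):
    """Compute the slide's per-field profile columns for one list of values.

    Assumption: for strings, min/max/avg/median/99th use value length.
    """
    non_null = [v for v in values if v is not None]
    if not non_null:
        raise ValueError("need at least one non-null value")
    sizes = sorted(len(v) if isinstance(v, str) else v for v in non_null)
    p99_index = math.ceil(0.99 * len(sizes)) - 1  # nearest-rank percentile
    return {
        "total": len(values),
        "min": sizes[0],
        "max": sizes[-1],
        "avg": sum(sizes) / len(sizes),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "median": statistics.median(sizes),
        "p99": sizes[p99_index],
        "top_n": Counter(non_null).most_common(top_n),
    }


def check_alerts(profile, baseline, tolerance=0.5):
    """Return metrics whose relative change from the baseline exceeds `tolerance`."""
    alerts = []
    for metric in ("total", "nulls", "distinct", "avg"):
        base, now = baseline[metric], profile[metric]
        # A zero baseline cannot be divided by, so any non-zero value counts as a change.
        changed = now != 0 if base == 0 else abs(now - base) / abs(base) > tolerance
        if changed:
            alerts.append(metric)
    return alerts


baseline = profile_field(["Chrome", "Firefox", None, "Chrome", "Safari"])
today = profile_field(["Chrome", None, None, None, "Safari"])
print(baseline["top_n"])              # [('Chrome', 2), ('Firefox', 1), ('Safari', 1)]
print(check_alerts(today, baseline))  # ['nulls']
```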
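Slide 13's daily figures are easier to compare with system limits as rates. The arithmetic below converts the 200+ B events/day figure into an average per-second rate, and the 6,000+ jobs/month figure into jobs per day assuming a 30-day month. Because both slide figures are lower bounds and averages, peak rates will be at least this high.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def avg_events_per_second(events_per_day: float) -> float:
    """Average ingest rate implied by a daily event count."""
    return events_per_day / SECONDS_PER_DAY


rate = avg_events_per_second(200e9)  # slide: 200+ B events/day
print(f"{rate:,.0f} events/s")       # 2,314,815 events/s

jobs_per_day = 6000 / 30             # slide: 6000+ jobs/month; 30-day month assumed
print(jobs_per_day)                  # 200.0
```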
