Data Pipelines - Big Data meets Salesforce

Session run during Dreamforce'15 with @carolenlanube

Published in: Engineering

  1. 1. Data Pipeline: Big Data meets Salesforce Carolina Ruiz Medina Principal Developer on Product Innovation cruiz@financialforce.com @carolenlanube Agustina García Peralta Principal Developer on Platform Strategy agarcia@financialforce.com @agarciaodeian
  2. 2. Carolina Ruiz Medina Principal Developer on Product Innovation FinancialForce.com , MVP @CarolEnLaNube @CodeCoffeeCloud
  3. 3. Agustina García Peralta Principal Developer, Platform Strategy FinancialForce.com @agarciaodeian
  4. 4. About GREAT ALONE. BETTER TOGETHER. • Native to Salesforce App Cloud since 2009 • Investors include Salesforce Ventures • Customers in 27 countries • 650+ employees, San Francisco based • Dreamforce.FinancialForce.com
  5. 5. Agenda • Data Pipeline - Overview • Pipeline Use Cases • How Pipeline works – Demos • Big Data • Take away • Q&A
  6. 6. Common scenario – Large amount of data • Asynchronous Apex • @future • Queueable • Batch Apex • Flex Queue (since Summer ’15)
  7. 7. Common scenario – Large amount of data • Any other option? • Data Pipeline: new feature that integrates Apache Pig into Salesforce
  8. 8. Apache Pig Background • What does it do? • Processes massive amounts of data in parallel. • Key elements • MapReduce → programming model for writing programs that process large amounts of data in parallel • Hadoop cluster → cluster for storing and analyzing large amounts of data • Enables developers to create executions for analyzing LARGE AMOUNTS of data in PARALLEL
  9. 9. Apache Pig Background • How does it work? • It uses Pig Latin • Data-flow language • Sits between SQL and Java • We can create our own UDFs (user-defined functions)
  10. 10. Apache Pig Background • Why is it relevant? • Technology associated with Hadoop, but it can be used by other frameworks → Salesforce • Is there anything unique to Apache Pig running in Salesforce? • It runs in a multitenant environment
  11. 11. What is Data Pipeline? • Under Pilot program → GA by Summer ‘16 (Safe Harbor) • How does Data Pipeline work? • Runs Pig scripts written in the Pig Latin language • [Diagram: Data Pipeline, Pig Script, Apex?]
  12. 12. What is Data Pipeline? • Execution feature • Runs asynchronously • In parallel • From where? • Developer Console • During deploy • Tooling API → 33.0 onwards
  13. 13. What is Data Pipeline? • Anything else? • It is an ETL (Extract, Transform, Load) tool • Pig scripts can be included in a package
  14. 14. What is Data Pipeline?
  15. 15. Data Pipeline – Advantages vs other processes 1. Performance 2. Ability to execute scripts in parallel 3. Does not hit governor limits 4. Decouples Online Transaction Processing and Online Analytical Processing 5. Allows you to think in terms of data flow
  16. 16. How can Pipeline help us? Prepare data for Currency Revaluation (SObject to SObject) • We have a large volume of Financial Transactions … • … and we need to process them now! • … so our users can use them: report, print, or let another quick process finish the revaluation
  17. 17. How can Pipeline help us? Extracting information from a large amount of data (SObject to File) • We have a large volume of Financial Transactions … • … and we need to process them now! • … so our manager can look at the progress, export data quickly...
  18. 18. To build the solution, let's look at Pig Script first • What is Pig Script? • Operators: JOIN, GROUP, DISTINCT, ORDER, …
  19. 19. Solution – SObject to SObject
  20. 20. Solution – SObject to File • File created
  21. 21. Demo
  22. 22. Use Case – LBX (SObject to File) • Source rows: LBX 7/7/2015 $150.00 I-00000; Other 7/7/2015 $250.00 I-00001; LBX 7/7/2015 $150.00 I-00002; LBX 12/7/2015 $350.00 I-00003; Other 15/7/2015 $550.00 I-00004 • Result rows: 7/7/2015 LBX $300.00; 7/7/2015 Other $250.00; 12/7/2015 Other $250.00; 15/7/2015 Other $550.00
  23. 23. Use Case – SObject to File
  24. 24. Use Case – SObject to File • No header!!
  25. 25. Demo
  26. 26. Use Case – SObject to File
  27. 27. Use Case – SObject to File
  28. 28. Data Pipeline – 2 more options Join 2 objects
  29. 29. Data Pipeline – 2 more options Read and Process a JSON file
  30. 30. But that is not all!! • Thousands of invoices • Keep them somewhere for audit processes • Not all the information is needed, just some field values
  31. 31. Big Data • #Big Data • #Big Objects
  32. 32. Big Data – Big Objects • Creation: Custom Object = Manual & Metadata; Big Object = Metadata • Under Pilot program → GA by Summer ‘16 (Safe Harbor)
  33. 33. Big Data – Big Objects
  34. 34. Big Data – Big Objects
  35. 35. Big Data – Big Objects • Creation: Custom Object = Manual & Metadata; Big Object = Metadata • API name: Custom Object = myObject__c; Big Object = myObject__b • Enable Reports, Track Activities, Track Field History, etc.: Custom Object = options available; Big Object = options not available • Field types: Custom Object = all; Big Object = Text; Date/Time; Lookup; Numbers!!!
  36. 36. Big Data – Big Objects • Able to edit / delete fields? Custom Object = Yes; Big Object = No • Triggers, Field Sets, etc.: Custom Object = options available; Big Object = options not available
  37. 37. Big Data – Big Objects • How to populate records: Custom Object = all options; Big Object = Bulk API, SOAP API, Data Pipeline • Can I amend a record? Custom Object = Yes; Big Object = No → only clone is available • Can I see data by creating a Tab? Custom Object = Yes; Big Object = No → only via SOQL • For free? Custom Object = Yes; Big Object = No → talk with Salesforce about it • Storage? Custom Object = it counts against the storage limit; Big Object = it DOES NOT count against the storage limit. Yes!!
  38. 38. Big Data – Big Objects & Pipeline
  39. 39. Data Pipeline - Limitations • Script size and complexity → 20 operators, 20 loads and 10 stores per script • Run up to 30 scripts a day • Bulk API: STORE calls it, so its limits apply • Does not support some operators, such as COUNT • Can’t break the rules of the Salesforce Platform → triggers, validations, required fields, etc. • Once you run the process there is no way back
  40. 40. Data Pipeline – Take away 1. New feature is in Pilot 2. Run scripts via: Developer Console, Deploy, Tooling API (since API 33.0) 3. Run scripts asynchronously and in parallel 4. Better performance 5. Easy to use!!
  41. 41. Q&A ISV Scale: Big Data for ISV – 4pm Park Central Hotel, Franciscan Ballroom
  42. 42. Links and more • https://pig.apache.org/ • http://goo.gl/h5N7Sa • https://goo.gl/KXQSKC • Carolina Ruíz Medina: cruiz@financialforce.com, @CarolEnLaNube, @CodeCoffeeCloud, www.codeandvoge.com, http://www.meetup.com/es/South-Spain-Salesforce-Developer-Group/ • Agustina García Peralta: agarcia@financialforce.com, @agarciaodeian, www.agarciaodeian.com, http://www.meetup.com/es/Spain-Salesforce-Developer-User-Group/
  43. 43. Thank you
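
Slide 18 names the Pig operators JOIN, GROUP, DISTINCT and ORDER without showing what they do. The following is a rough Python sketch of their semantics on in-memory rows, not Pig Latin and not anything Data Pipeline runs. The sample records, field names and helper functions are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows; the field names are illustrative assumptions.
invoices = [
    {"id": "I-00000", "account": "A1", "amount": 150.0},
    {"id": "I-00001", "account": "A2", "amount": 250.0},
    {"id": "I-00002", "account": "A1", "amount": 150.0},
]
accounts = [
    {"account": "A1", "name": "Acme"},
    {"account": "A2", "name": "Globex"},
]


def join(left, right, key):
    """Inner join of two row lists on a shared key (the idea behind JOIN)."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index[l[key]]]


def group(rows, key):
    """Collect rows into buckets that share a key value (the idea behind GROUP)."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[key]].append(row)
    return dict(buckets)


def distinct(values):
    """Drop duplicate values, keeping first occurrences (the idea behind DISTINCT)."""
    return list(dict.fromkeys(values))


def order(rows, key, descending=False):
    """Sort rows by one field (the idea behind ORDER)."""
    return sorted(rows, key=lambda row: row[key], reverse=descending)


if __name__ == "__main__":
    print(join(invoices, accounts, "account"))
    print(group(invoices, "account"))
    print(distinct(row["amount"] for row in invoices))
    print(order(invoices, "amount", descending=True))
```

In Pig each of these is a single statement over a relation, which is what lets a script be read as a data flow rather than as loops.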
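
The LBX use case on slide 22 appears to group invoice rows by date and type, total the amounts, and write the result to a file with no header row (slide 24). Here is a minimal Python sketch of that flow under those assumptions. The column meanings (type, date, amount, invoice number) are inferred from the slide rather than stated on it, and the function names are hypothetical. The totals are computed from the five source rows on the slide, so they may differ from the figures typed into the slide's result panel.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# The five source rows from slide 22; column meanings are an assumption:
# (type, date, amount, invoice number).
rows = [
    ("LBX", "7/7/2015", "150.00", "I-00000"),
    ("Other", "7/7/2015", "250.00", "I-00001"),
    ("LBX", "7/7/2015", "150.00", "I-00002"),
    ("LBX", "12/7/2015", "350.00", "I-00003"),
    ("Other", "15/7/2015", "550.00", "I-00004"),
]


def summarize(rows):
    """Total the amounts per (date, type) pair, in first-seen order."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for kind, date, amount, _invoice in rows:
        totals[(date, kind)] += Decimal(amount)
    return dict(totals)


def to_csv(totals):
    """Render the totals as CSV text without a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for (date, kind), total in totals.items():
        writer.writerow([date, kind, f"{total:.2f}"])
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(summarize(rows)), end="")
```

Decimal is used instead of float so that currency totals do not pick up binary rounding noise.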
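
Slide 39 lists per-script and per-day limits: 20 loads and 10 stores per script, and up to 30 scripts a day. A pre-flight check along the following lines could catch an oversized script before it is submitted. This is only a sketch: the helper names are hypothetical, the keyword count is deliberately naive (it also counts LOAD or STORE inside comments and string literals), and the 20-operator limit is left out because the slide does not say what counts as an operator.

```python
import re

# Limits as quoted on slide 39 (pilot-era figures; treat them as assumptions).
MAX_LOADS = 20
MAX_STORES = 10
MAX_SCRIPTS_PER_DAY = 30


def count_statements(script, keyword):
    """Naively count whole-word, case-insensitive occurrences of a keyword."""
    pattern = re.compile(rf"\b{keyword}\b", re.IGNORECASE)
    return len(pattern.findall(script))


def check_limits(script, scripts_run_today=0):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    loads = count_statements(script, "LOAD")
    stores = count_statements(script, "STORE")
    if loads > MAX_LOADS:
        problems.append(f"{loads} LOAD statements (max {MAX_LOADS})")
    if stores > MAX_STORES:
        problems.append(f"{stores} STORE statements (max {MAX_STORES})")
    if scripts_run_today >= MAX_SCRIPTS_PER_DAY:
        problems.append(f"daily quota of {MAX_SCRIPTS_PER_DAY} scripts used up")
    return problems


if __name__ == "__main__":
    sample = (
        "a = LOAD 'invoices' AS (type:chararray, amount:double);\n"
        "b = GROUP a BY type;\n"
        "STORE b INTO 'out';\n"
    )
    print(check_limits(sample))
```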
