Data Pipelines - Big Data Meets Salesforce
1. Data Pipeline: Big Data meets Salesforce
Carolina Ruiz Medina
Principal Developer on Product Innovation
cruiz@financialforce.com
@carolenlanube
Agustina García Peralta
Principal Developer on Platform Strategy
agarcia@financialforce.com
@agarciaodeian
4. About
GREAT ALONE. BETTER TOGETHER.
Native to Salesforce1™ Platform
since 2009
Investors include Salesforce Ventures
650+ employees, San Francisco based
5. Agenda
• Data Pipeline - Overview
• Pipeline Use Cases
• How Pipeline works – Demos
• Big Data
• Take away
• Q&A
6. Asynchronous Apex
• @future
• Queueable
• Batch Apex
• Flex Queue (since Summer ’15)
Common scenario – Large amount of data
7. • Any other option?
• Data Pipeline: New feature to integrate Apache Pig into Salesforce
Common scenario – Large amount of data
8. • What does it do?
• Process massive amounts of data in parallel.
• Key elements
• MapReduce: software for writing programs that process large amounts of data in parallel
• Hadoop cluster: for storing and analyzing large amounts of data
Apache Pig Background
Enables developers to create executions for analyzing LARGE AMOUNTS of data in PARALLEL
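The MapReduce idea on this slide can be illustrated with a minimal sketch in plain Python. It is not Pig, Hadoop, or Data Pipeline code: the sample transactions, the chunk size, and every function name are assumptions made up for illustration. Each chunk is mapped to partial totals independently (so the chunks could run in parallel), and a reduce step merges the partial results.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: (currency, amount) pairs standing in for transactions.
TRANSACTIONS = [
    ("USD", 100.0), ("EUR", 50.0), ("USD", 25.0),
    ("GBP", 10.0), ("EUR", 5.0), ("USD", 75.0),
]


def split_into_chunks(records, chunk_size):
    """Split the input so each chunk can be processed independently."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Map step: compute partial totals per currency for one chunk."""
    partial = defaultdict(float)
    for currency, amount in chunk:
        partial[currency] += amount
    return dict(partial)


def reduce_partials(partials):
    """Reduce step: merge the partial totals produced for every chunk."""
    totals = defaultdict(float)
    for partial in partials:
        for currency, amount in partial.items():
            totals[currency] += amount
    return dict(totals)


def total_by_currency(records, chunk_size=2, workers=3):
    """Run the map step on all chunks concurrently, then reduce the results."""
    chunks = split_into_chunks(records, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_partials(partials)


if __name__ == "__main__":
    print(total_by_currency(TRANSACTIONS))
```

A real cluster distributes the chunks across machines rather than threads, but the split, map, and reduce stages keep the same shape.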
9. • How does it work?
• It uses Pig Latin
• Data-flow language
• Between SQL and Java
• We can create our own UDFs (user-defined functions)
Apache Pig Background
10. • Why is it relevant?
• Technology associated with Hadoop, but it can be used by other frameworks, such as Salesforce
• Is there anything unique to Apache Pig running in Salesforce?
• Running in multitenant environment
Apache Pig Background
11. • Under Pilot program; GA planned by Summer ‘16 (Safe Harbor)
• How does Data Pipeline work?
• Run Pig Scripts written in Pig Latin language
What is Data Pipeline?
Data Pipeline Pig Script
Apex?
12. • Execution feature
• Run asynchronously
• In Parallel
• From where?
• Developer Console
• During deploy
• Tooling API 33.0 onwards
What is Data Pipeline?
13. • Anything else?
• It is an ETL (Extract – Transform – Load)
• Pig Scripts can be included in a package
What is Data Pipeline?
15. Data Pipeline – Advantages vs other processes
1. Performance
2. Ability to execute scripts in parallel
3. Does not hit governor limits
4. Decouples On-line Transaction Processing from On-line Analytical Processing
5. Allows you to think in terms of data flow
16. How can Pipeline help us?
We have a large volume of Financial Transactions
…. and we need to process them now!
…. for our users to be able to use them: report, print, or let another quick process finish (revaluate)
Prepare data for Currency Revaluation
SObject to SObject
17. How can Pipeline help us?
We have a large volume of Financial Transactions
…. and we need to process them now!
…. for our manager to look at the progress, to export data quickly...
Extracting information from a large amount of data
SObject to File
18. To build the solution, let's see Pig Script first
What is Pig Script?
Operators
JOIN
GROUP
DISTINCT
ORDER
…
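To make the operators listed above more concrete, here is a rough sketch of what DISTINCT, JOIN, GROUP and ORDER do to rows of data. It is written in plain Python, not Pig Latin, and it does not reproduce Pig's actual syntax or output shapes. The sample relations and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical relations: transactions (id, account_id, amount)
# and accounts (account_id, name).
transactions = [
    (1, "A1", 300.0),
    (2, "A2", 120.0),
    (3, "A1", 80.0),
    (3, "A1", 80.0),  # duplicate row, included on purpose
]
accounts = [("A1", "Acme"), ("A2", "Globex")]


def distinct(rows):
    """DISTINCT: drop duplicate rows, keeping first-seen order."""
    return list(dict.fromkeys(rows))


def join(left, left_key, right, right_key):
    """JOIN: inner join that concatenates each pair of matching rows."""
    index = defaultdict(list)
    for row in right:
        index[row[right_key]].append(row)
    return [l + r for l in left for r in index.get(l[left_key], [])]


def group(rows, key):
    """GROUP: collect rows into one bag per distinct key value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def order(rows, key, descending=False):
    """ORDER: sort rows by one field."""
    return sorted(rows, key=lambda row: row[key], reverse=descending)


# A small data flow: de-duplicate, join, group, aggregate, sort.
unique = distinct(transactions)
joined = join(unique, 1, accounts, 0)   # (id, account_id, amount, account_id, name)
by_account = group(joined, 4)           # grouped by account name
totals = [(name, sum(r[2] for r in rows)) for name, rows in by_account.items()]
ranked = order(totals, 1, descending=True)

if __name__ == "__main__":
    print(ranked)
```

Each step takes a relation and returns a new one, which is the "data flow" way of thinking the deck attributes to Pig Latin.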
35. Big Data – Big Objects
Feature | Custom Object | Big Object
Creation | Manual & Metadata | Metadata
API name | myObject__c | myObject__b
Enable Reports, Track Activities, Track Field History, etc. | Options available | Options not available
Field Types | All | Text; Date/Time; Lookup
36. Big Data – Big Objects
Feature | Custom Object | Big Object
Able to edit / delete fields? | Yes | No
Triggers; Field Sets; etc. | Options available | Options not available
37. Big Data – Big Objects
Feature | Custom Object | Big Object
How to populate records | All options | Bulk API; SOAP API; Data Pipeline
Can I amend a record? | Yes | No. Only clone is available
Can I see data by creating a Tab? | Yes | No. Only via SOQL
For free? | Yes | No. Talk with Salesforce about it
Storage? | It counts against the storage limit | It DOES NOT count against the storage limit
39. • Script complexity: up to 20 operators, 20 loads and 10 stores per script
• Run up to 30 scripts a day
• Bulk API
• Store calls it, so its limits apply
• Does not support some operators, such as Count
• Can’t break the Salesforce Platform rules: triggers, validations, required fields, etc.
• Once you run the process there is no way back
Data Pipeline - Limitations
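The numeric limits above can be checked before a script is submitted. The following is only a sketch: the limit values are copied from this slide (pilot-era figures that may have changed), and the `ScriptStats` class and `limit_violations` function are hypothetical helpers, not part of any Salesforce API. The counts are assumed to be gathered by hand or by some external tool.

```python
from dataclasses import dataclass

# Limits as quoted on the slide (pilot-era values; treat them as assumptions).
MAX_OPERATORS = 20
MAX_LOADS = 20
MAX_STORES = 10
MAX_SCRIPTS_PER_DAY = 30


@dataclass
class ScriptStats:
    """Hypothetical summary of one Pig script."""
    operators: int
    loads: int
    stores: int


def limit_violations(stats, scripts_run_today=0):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if stats.operators > MAX_OPERATORS:
        problems.append(f"{stats.operators} operators (max {MAX_OPERATORS})")
    if stats.loads > MAX_LOADS:
        problems.append(f"{stats.loads} loads (max {MAX_LOADS})")
    if stats.stores > MAX_STORES:
        problems.append(f"{stats.stores} stores (max {MAX_STORES})")
    if scripts_run_today >= MAX_SCRIPTS_PER_DAY:
        problems.append(f"daily quota of {MAX_SCRIPTS_PER_DAY} scripts already used")
    return problems


if __name__ == "__main__":
    stats = ScriptStats(operators=21, loads=3, stores=11)
    for problem in limit_violations(stats, scripts_run_today=30):
        print(problem)
```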
40. Data Pipeline – Take away
1. New Feature is in Pilot
2. Run Scripts via:
Developer Console
Deploy
Tooling API (since API 33.0)
3. Run Scripts Asynchronously and in Parallel
4. Better performance
5. Easy to use!!
41. Q&A
ISV Scale: Big Data for ISVs
Session Date: 9/17/2015
Session Time: 4:00 p.m. - 4:40 p.m. PST
Location: Franciscan Ballroom, Park Central Hotel
42. • https://pig.apache.org/
• http://goo.gl/h5N7Sa
• https://goo.gl/KXQSKC
Links and more
Carolina Ruíz Medina
cruiz@financialforce.com
@CarolEnLaNube
@CodeCoffeeCloud
www.codeandvoge.com
http://www.meetup.com/es/South-Spain-Salesforce-Developer-Group/
Agustina García Peralta
agarcia@financialforce.com
@agarciaodeian
www.agarciaodeian.com
http://www.meetup.com/es/Spain-Salesforce-Developer-User-Group/
The new feature integrates Apache Pig into Salesforce
Complex process to run at the end of the month that consumes lots of resources
In general terms, revaluation of a currency is a calculated adjustment to a country's official exchange rate relative to a chosen baseline. The baseline can be anything from wage rates to the price of gold to a foreign currency.
There are two situations in which you might want to perform a currency revaluation.
At period end. You might want to revalue your income statement to eliminate the effect of exchange rate fluctuations.
At year end. You might want to revalue the company’s balance sheet so that it values the assets and liabilities of the company at the exchange rate applicable on the balance sheet date.
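The arithmetic behind a period-end or year-end revaluation can be sketched as follows. This is a simplified illustration, not FinancialForce's implementation: the `revalue` function, the sample amount and both exchange rates are made up. Rates are assumed to be quoted as home-currency units per one unit of foreign currency.

```python
from decimal import Decimal


def revalue(foreign_amount, booked_rate, closing_rate):
    """Return (booked_value, revalued_value, difference) in home currency.

    booked_rate is the rate used when the transaction was recorded;
    closing_rate is the rate applicable on the balance sheet date.
    """
    booked_value = foreign_amount * booked_rate
    revalued_value = foreign_amount * closing_rate
    return booked_value, revalued_value, revalued_value - booked_value


# Hypothetical example: 1,000 units of foreign currency booked at 1.10,
# revalued at a closing rate of 1.25.
booked, revalued, difference = revalue(
    Decimal("1000"), Decimal("1.10"), Decimal("1.25")
)

if __name__ == "__main__":
    print(f"booked={booked} revalued={revalued} unrealized gain/loss={difference}")
```

`Decimal` is used instead of `float` so that monetary amounts are not affected by binary rounding. Running this calculation over every open transaction is what makes the month-end process expensive.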
Get all the info from our weekly extract of large volumes of transactions
Pig is a high level scripting language that is used with Apache Hadoop. Pig enables data workers to write complex data transformations without knowing Java. Pig's simple SQL-like scripting language is called Pig Latin, and appeals to developers already familiar with scripting languages and SQL.