Forcelandia 2016 PK Chunking

Forcelandia 2016 PK Chunking Presentation

  1. PK Chunking: Divide and conquer massive objects in Salesforce. Daniel Peter, Lead Applications Engineer, Kenandy (@danieljpeter). Bay Area Salesforce Developer User Group.
  2. Takeaways: how to avoid these errors.
     • Query not “selective” enough: "Non-selective query against large object type (more than 100000 rows)."
     • Query takes too long: "No response from the server", "Time limit exceeded", "Your request exceeded the time limit for processing".
     • Too much data returned in query: "Too many query rows: 50001", "Remoting response size exceeded maximum of 15 MB."
  3. GET THE DATA
  4. Sounds great. How? Not so fast… first we need some prerequisite knowledge! • Database indexes • Salesforce Ids
  5. Database indexes (prereq): “Allow us to quickly locate rows without having to scan every row in the database” (paraphrased from Wikipedia)
  6. Database indexes (prereq)
  7. Database indexes (prereq): Location, location, location
  8. Salesforce Ids (prereq)
     • Composite key containing multiple pieces of data.
     • Uses base 62 numbering instead of the more common base 10.
     • Fastest way to find a database row (the Id is the record's primary key).
  9. Salesforce Ids (prereq)
  10. Base 10 vs. Base 62: how many distinct values a given number of digits can represent.

| Digits | Base 10 | Base 62 |
|---|---|---|
| 1 | 10 | 62 |
| 2 | 100 | 3,844 |
| 3 | 1,000 | 238,328 |
| 4 | 10,000 | 14,776,336 (million) |
| 5 | 100,000 | 916,132,832 (million) |
| 6 | 1,000,000 (million) | 56,800,235,584 (billion) |
| 7 | 10,000,000 (million) | 3,521,614,606,208 (trillion) |
| 8 | 100,000,000 (million) | 218,340,105,584,896 (trillion) |
| 9 | 1,000,000,000 (billion) | 13,537,086,546,263,552 (quadrillion) |

Digits used: base 10 uses 0123456789; base 62 uses 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.
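Each row of the table is the base raised to the number of digits. The minimal JavaScript sketch below reproduces both columns. The `capacity` name is illustrative and does not come from the talk. The sketch uses BigInt because the larger base-62 values exceed `Number.MAX_SAFE_INTEGER` (2^53 - 1). Beyond that point, ordinary JavaScript numbers cannot represent every integer exactly.

```javascript
// Distinct values representable with `digits` digits in `base`: base^digits.
// BigInt keeps the result exact even past Number.MAX_SAFE_INTEGER.
function capacity(base, digits) {
  return BigInt(base) ** BigInt(digits);
}

// Print the table: digits, base-10 capacity, base-62 capacity.
for (let d = 1; d <= 9; d++) {
  console.log(d, capacity(10, d).toString(), capacity(62, d).toString());
}
// Last line printed: 9 1000000000 13537086546263552
```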
  11. Salesforce Ids (prereq): MO’ NUMBERS. Base 62
  12. Prerequisites complete!
  13. How does PK Chunking work? Analogy: fetching people in a city.
  14. Fetching people in a city: problems (non-selective). Request: “get me all the people who are female.” Response: “yer trippin’!”
  15. Fetching people in a city: problems (timeout). Request: “find me a 7-foot-tall person in a pink tuxedo in Beijing.” Response: (after searching all day) “I can’t find any! I give up!”
  16. Finding people in a city: problems (too many people found). Request: “find me all the men in San Francisco with beards.” Response: (after searching for 10 mins) “The bus is full!”
  17. PK Chunking addresses all those problems: divide and conquer! Parallelism!
  18. Fetching people in a city: solutions (non-selective). Request: “get me all the people who are female, in your small search area.” Response: “¡Con mucho gusto!” (“With pleasure!”)
  19. Fetching people in a city: solutions (timeout). Request: “find me a 7-foot-tall person in a pink tuxedo in Beijing, in your small search area.” Responses (SP = search party): SP1: “Didn’t find any, sorry!” SP2: “Didn’t find any, sorry!” SP3: “Found one!” SP4: “Didn’t find any, sorry!”
  20. Finding people in a city: solutions (too many people found). Request: “find me all the men in San Francisco with beards, in your small search area.” Responses: SP1: 30 people in our bus. SP2: didn’t find any. SP3: 50 people in our bus.
  21. Technical details
  22. Two different implementations: QLPK (Query Locator PK Chunking) and Base62PK (Base62 PK Chunking).
  23. QLPK: uses the Salesforce SOAP or REST API; the AJAX Toolkit works great. Creates and leverages a server-side cursor, similar to an Apex query locator (Batch Apex). Analogy: print me a phone book of everyone in the city so I can flip through it.
  24. QLPK – AJAX Toolkit Request
  25. QLPK – AJAX Toolkit Response: chunk the database, in a size of your choice, by offsetting the queryLocator: 01gJ000000KnRpDIAV-50000, 01gJ000000KnRpDIAV-100000, …, 01gJ000000KnRpDIAV-39950000, 01gJ000000KnRpDIAV-40000000
  26. QLPK – The Chunks: 800 chunks × 50,000 records = 40,000,000 total records. Analogy: we have exact addresses for clusters of 50k people to give to 800 different search parties.
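The chunk arithmetic behind the locator strings on slide 25 can be sketched as follows. The sketch assumes the `<locatorId>-<offset>` format shown on that slide. It uses a hypothetical `qlpkLocators` helper that only builds the strings. Fetching each chunk's boundary Ids through the AJAX Toolkit is left out.

```javascript
// Hypothetical helper: build one "<locatorId>-<offset>" string per chunk,
// stepping through the result set in chunkSize increments. A final partial
// chunk ends at totalRecords.
function qlpkLocators(locatorId, totalRecords, chunkSize) {
  const locators = [];
  for (let start = 0; start < totalRecords; start += chunkSize) {
    const end = Math.min(start + chunkSize, totalRecords);
    locators.push(`${locatorId}-${end}`);
  }
  return locators;
}

const locs = qlpkLocators('01gJ000000KnRpDIAV', 40000000, 50000);
console.log(locs.length);            // 800
console.log(locs[0]);                // 01gJ000000KnRpDIAV-50000
console.log(locs[locs.length - 1]);  // 01gJ000000KnRpDIAV-40000000
```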
  27. QLPK – How to use in a query? Perform 800 queries, each with one chunk's Id range in the WHERE clause:
     SELECT Id, Autonumber__c, Some_Number__c FROM Large_Object__c WHERE Some_Number__c > 10 AND Some_Number__c < 20 AND Id >= 'a00J000000BWNYk' AND Id <= 'a00J000000BWO4z'
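Wrapping an existing filter with a chunk's Id range is mechanical. Below is a sketch with a hypothetical `chunkSoql` helper. It parenthesizes the original filter so that an `OR` inside it cannot escape the Id range. It also rejects anything that does not look like a 15- or 18-character Id, so the values cannot break out of the quotes.

```javascript
// 15- or 18-character alphanumeric Salesforce Id.
const ID_PATTERN = /^[a-zA-Z0-9]{15}(?:[a-zA-Z0-9]{3})?$/;

// Hypothetical helper: add one chunk's inclusive Id range to an existing filter.
function chunkSoql(fields, objectName, baseWhere, firstId, lastId) {
  // Only plain Ids are allowed, which also rules out quote injection.
  for (const id of [firstId, lastId]) {
    if (!ID_PATTERN.test(id)) {
      throw new Error(`Not a Salesforce Id: ${id}`);
    }
  }
  // Parenthesize the original filter so an OR inside it stays inside the range.
  return `SELECT ${fields.join(', ')} FROM ${objectName} ` +
    `WHERE (${baseWhere}) AND Id >= '${firstId}' AND Id <= '${lastId}'`;
}
```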
  28. THAT SPLIT CRAY: database so hard, take 800 queries to find me
  29. QLPK – Parallelism: yes, it's 800 queries, but they all went out at once, and they might all come back at once. Analogy: we hired 800 search parties and unleashed them on the city at the same time.
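Here is the fan-out pattern in plain JavaScript. `mockQueryChunk` is a hypothetical stand-in for a real remote call, such as the Visualforce remoting call later in the deck. The chunks use numeric ranges instead of real Ids to keep the sketch self-contained.

```javascript
// Hypothetical stand-in for one remote chunk query: resolves after a random
// delay with the number of "records" in the numeric range [first, last].
function mockQueryChunk(chunk) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(chunk.last - chunk.first + 1), Math.random() * 50);
  });
}

// Every request is started (by map) before any is awaited, so all are in
// flight at once; Promise.all resolves when the slowest one finishes.
async function queryAllChunks(chunks, queryFn) {
  const rowCounts = await Promise.all(chunks.map(queryFn));
  return rowCounts.reduce((sum, n) => sum + n, 0);
}

// Eight chunks of 100 "records" each.
const demoChunks = Array.from({ length: 8 }, (_, i) => ({ first: i * 100, last: i * 100 + 99 }));
queryAllChunks(demoChunks, mockQueryChunk).then((total) => console.log(total)); // 800
```

Results come back in whatever order the calls finish. `Promise.all` still returns them in chunk order, which is convenient when chunks must be reassembled.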
  30. Shift gears: from QLPK to Base62PK
  31. Base62PK: get the first and last Id of the object and extrapolate the ranges in between. Analogy: give me the highest and lowest address of everyone in the city and I will make a phone book with every possible address in it. Then we will break that into chunks.
  32. Base62PK – first and last Id
     • Get the first Id: SELECT Id FROM Large_Object__c ORDER BY Id ASC LIMIT 1
     • Get the last Id: SELECT Id FROM Large_Object__c ORDER BY Id DESC LIMIT 1
     Even on H-U-G-E databases these return F-A-S-T, because Id is indexed and each query only reads one end of the index. No problem.
  33. Base62PK – extrapolate
     1. Split each 15-digit first/last Id into its first 6 characters and its last 9 digits (decompose).
     2. Convert the 9-digit base 62 numbers into Long integers.
     3. Add the chunk size to the first number repeatedly until you reach or exceed the last number.
     4. The last chunk may be smaller.
     5. Convert those Long integers back to base 62 and recompose the 15-digit Ids.
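The five steps above can be sketched in JavaScript. This is an illustration, not the author's implementation, which lives in the GitHub repo listed under More Info. The function names are hypothetical. BigInt plays the role of the Long integer because 9-digit base-62 values can exceed `Number.MAX_SAFE_INTEGER`. The chunks are inclusive `{first, last}` pairs, to match the `Id >= ... AND Id <= ...` queries in this deck. The sketch refuses Ids whose first 6 characters differ, which is the pod Id limitation described on slide 35.

```javascript
// Base-62 digit alphabet in the order shown on slide 10.
const ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

// Step 2: base-62 string -> BigInt (JavaScript's stand-in for a Long).
function base62ToBigInt(s) {
  let n = 0n;
  for (const ch of s) {
    const digit = ALPHABET.indexOf(ch);
    if (digit < 0) throw new Error(`Invalid base-62 digit: ${ch}`);
    n = n * 62n + BigInt(digit);
  }
  return n;
}

// Step 5 (first half): BigInt -> base-62 string, left-padded to `width` digits.
function bigIntToBase62(n, width) {
  let s = '';
  while (n > 0n) {
    s = ALPHABET[Number(n % 62n)] + s;
    n /= 62n; // BigInt division truncates
  }
  return s.padStart(width, '0');
}

// Steps 1-5: split the Ids, walk the numeric range in chunkSize steps,
// and rebuild inclusive {first, last} Id pairs.
function base62Chunks(firstId, lastId, chunkSize) {
  if (firstId.length !== 15 || lastId.length !== 15) {
    throw new Error('Expected 15-character Ids');
  }
  // Step 1: the first 6 characters (key prefix, pod identifier, ...) are kept as-is.
  const prefix = firstId.slice(0, 6);
  if (lastId.slice(0, 6) !== prefix) {
    throw new Error('Ids have different 6-character prefixes (e.g. different pods)');
  }
  const first = base62ToBigInt(firstId.slice(6));
  const last = base62ToBigInt(lastId.slice(6));
  const size = BigInt(chunkSize);
  const chunks = [];
  // Step 3: advance by chunkSize until the last number is reached or passed.
  for (let start = first; start <= last; start += size) {
    // Step 4: the final chunk is truncated at the last Id.
    const end = start + size - 1n > last ? last : start + size - 1n;
    chunks.push({
      first: prefix + bigIntToBase62(start, 9),
      last: prefix + bigIntToBase62(end, 9),
    });
  }
  return chunks;
}
```

For example, `base62Chunks(firstId, lastId, 50000)` takes the two Ids returned by the slide-32 queries and returns the ranges to plug into the per-chunk queries. No query is spent on the boundaries themselves.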
  34. Base62PK – benefits
     • High performance! Calculates the Ids instead of querying for them.
  35. Base62PK – issues
     • Digits 4 and 5 of the Salesforce Id are the pod identifier. If the Ids in your org have different pod Ids, this technique will break unless enhanced.
     • Fragmented Ids lead to sparsely populated ranges: you will search entire ranges of Ids that contain no records.
  36. So which do I pick? QLPK or Base62PK
  37. So which do I pick? Columns: Heterogeneous Pod Ids | Homogeneous Pod Ids | Low Id Fragmentation (<1.5x) | Medium Id Fragmentation (1.5x - 3x) | High Id Fragmentation (>3x). Rows: QLPK: X X X; Base62PK: X X.
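To choose between the two, you need to know whether the pod Ids are homogeneous and how fragmented the Ids are. The sketch below makes that check under an assumption the deck does not spell out. It takes fragmentation to be the size of the Id range (last minus first, plus one) divided by the actual record count, bucketed with the thresholds from the table. It also assumes Ids sort in the same base-62 order that the range queries rely on. Under that assumption, if the lowest and highest Ids share their first 6 characters, every Id between them does too. `assessOrg` is a hypothetical name.

```javascript
const ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

// Base-62 string -> BigInt.
function base62ToBigInt(s) {
  let n = 0n;
  for (const ch of s) {
    const digit = ALPHABET.indexOf(ch);
    if (digit < 0) throw new Error(`Invalid base-62 digit: ${ch}`);
    n = n * 62n + BigInt(digit);
  }
  return n;
}

// Assumption: fragmentation = (size of the Id range) / (number of records).
function assessOrg(firstId, lastId, recordCount) {
  // Same 6-character prefix on the lowest and highest Id means every Id in
  // between shares it (given base-62 sort order), so the pod Ids are homogeneous.
  const homogeneousPods = firstId.slice(0, 6) === lastId.slice(0, 6);
  if (!homogeneousPods) {
    return { homogeneousPods, ratio: null, fragmentation: null };
  }
  const span = base62ToBigInt(lastId.slice(6, 15)) - base62ToBigInt(firstId.slice(6, 15)) + 1n;
  const ratio = Number(span) / recordCount;
  // Buckets from the slide: low < 1.5x, medium 1.5x - 3x, high > 3x.
  const fragmentation = ratio < 1.5 ? 'low' : ratio <= 3 ? 'medium' : 'high';
  return { homogeneousPods, ratio, fragmentation };
}
```

The inputs are the two Ids from the slide-32 queries plus a record count, for example from a `SELECT COUNT() FROM Large_Object__c` query. That count query may itself need chunking on very large objects.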
  38. How do I implement?
     • Needs to be orchestrated by a language like JavaScript in your page, or from another platform (e.g. Heroku).
     • Doesn't work in the Lightning Component Framework (yet): there is no support for truly parallel controller actions, because calls are boxcarred (batched into one request).
     • Has to be Visualforce or a Lightning / Visualforce hybrid.
  39. How do I implement?
     • Use RemoteActions to get the chunk query results back into your page.
     • Chunk queries can be granular or aggregate queries!
     • Process each chunk's results appropriately when they come back, e.g. update totals on a master object or push them into a master array.
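When the chunk queries are aggregates, each chunk returns a partial result that has to be folded into a running total. The sketch below uses a hypothetical row shape in which `cnt`, `total` and `maxVal` are aliases for `COUNT(Id)`, `SUM(Some_Number__c)` and `MAX(Some_Number__c)`. An empty chunk may return a null SUM or MAX, and the sketch treats that as no contribution.

```javascript
// Fold one chunk's aggregate row into the running totals.
function mergeAggregate(acc, row) {
  return {
    cnt: acc.cnt + row.cnt,
    total: acc.total + (row.total ?? 0),
    maxVal: row.maxVal == null ? acc.maxVal : Math.max(acc.maxVal, row.maxVal),
  };
}

// Starting value before any chunk has come back.
const EMPTY = { cnt: 0, total: 0, maxVal: -Infinity };

// Compute the average once, from the merged SUM and COUNT.
function average(acc) {
  return acc.cnt === 0 ? null : acc.total / acc.cnt;
}
```

COUNT and SUM merge by addition and MAX by taking the larger value, but an average does not merge that way. Averaging per-chunk averages weights small chunks too heavily, so the sketch keeps SUM and COUNT and divides once at the end.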
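  40. Page-side JavaScript (Visualforce remoting):

```javascript
// Populated elsewhere on the page before queryChunks() runs.
var chunkList = [];      // [{first: Id, last: Id}, ...], one entry per chunk
var objectAnums = [];    // master array of Autonumber__c values from all chunks
var queryChunkCount = 0; // chunk callbacks received so far (success or failure)

// Start every chunk request immediately; with buffer: false each call is
// sent as its own request instead of being batched with the others.
function queryChunks() {
  for (var i = 0; i < chunkList.length; i++) {
    queryChunk(i);
  }
}

function queryChunk(chunkIndex) {
  var chunk = chunkList[chunkIndex];
  Visualforce.remoting.Manager.invokeAction(
    '{!$RemoteAction.Base62PKext.queryChunk}',
    chunk.first,
    chunk.last,
    function (result, event) {
      if (event.status) {
        // Merge this chunk's rows into the master array.
        for (var i = 0; i < result.length; i++) {
          objectAnums.push(result[i].Autonumber__c);
        }
      } else {
        // A candidate for a retry (see Landmines).
        console.error('Chunk ' + chunkIndex + ' failed: ' + event.message);
      }
      queryChunkCount++;
      if (queryChunkCount === chunkList.length) {
        allQueryChunksComplete(); // defined elsewhere on the page
      }
    },
    {escape: false, buffer: false}
  );
}
```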
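  41. Controller extension (Apex):

```apex
// Invoked once per chunk from the page via Visualforce remoting.
@RemoteAction
public static List<Large_Object__c> queryChunk(String firstId, String lastId) {
    // The Id bounds must be quoted; escaping also guards against SOQL injection.
    String soql = 'SELECT Id, Autonumber__c, Some_Number__c '
        + 'FROM Large_Object__c '
        + 'WHERE Some_Number__c > 10 AND Some_Number__c < 20 '
        + 'AND Id >= \'' + String.escapeSingleQuotes(firstId) + '\' '
        + 'AND Id <= \'' + String.escapeSingleQuotes(lastId) + '\'';
    return Database.query(soql);
}
```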
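  42. Landmines
     • Timeouts – retries: because of cache warming, a chunk query that fails on the first try may succeed on a retry, so try and try again!
     • Concurrency: beware "ConcurrentPerOrgApexLimit exceeded". Keep your individual chunk queries lean (< 5 secs), since at the time that limit counted concurrent requests running longer than 5 seconds.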
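Both landmines can be handled on the client: cap how many chunk queries are in flight at once, and retry the ones that fail. Below is a minimal sketch. `runChunks`, `limit` and `maxRetries` are hypothetical names. The right limit for a given org is something to find by testing, not a documented value.

```javascript
// Run chunk tasks (functions returning promises) with at most `limit` in
// flight at once, retrying each failed task up to `maxRetries` more times.
async function runChunks(tasks, limit, maxRetries) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next task to start; safe because JS is single-threaded

  async function attempt(task) {
    for (let tryNo = 0; ; tryNo++) {
      try {
        return await task();
      } catch (err) {
        if (tryNo >= maxRetries) throw err; // give up after the last retry
      }
    }
  }

  // Each worker keeps pulling the next unstarted task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await attempt(tasks[i]);
    }
  }

  const workers = Array.from({ length: Math.min(limit, tasks.length) }, () => worker());
  await Promise.all(workers);
  return results; // in task order, regardless of completion order
}
```

A typical call is `runChunks(tasks, 5, 2)`, where each task wraps one remote chunk call in a promise. Adding a short delay before each retry (backoff) would be a natural extension.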
  43. Demos. Backup video: https://www.youtube.com/watch?v=KqHOStka0eg
  44. How did you figure this out? We had to meet requirements for Kenandy’s largest customer, a $2.5B/year manufacturer. High-visibility project. Necessity is the mother of invention!
  45. How did you figure this out? Query Plan Tool
  46. How did you figure this out? Debug logs from real execution
  47. Why doesn’t Salesforce do this? They do! (kinda) The Bulk API uses a similar technique, but it is asynchronous and wraps the work in a job, with one batch per chunk, so you can track progress.
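With the Bulk API, PK chunking is switched on by a request header when the job is created. The sketch below only builds a Bulk API 1.0 create-job request and does not send it. The instance URL, API version and session Id are placeholders. The header format and chunk-size limits (default 100,000, maximum 250,000) follow the Bulk API documentation listed under More Info as of this writing, so verify them against the current docs. `buildPkChunkingJobRequest` is a hypothetical name.

```javascript
// Build (not send) a Bulk API 1.0 "create job" request with PK chunking enabled.
function buildPkChunkingJobRequest({ instanceUrl, apiVersion, sessionId, objectName, chunkSize }) {
  // Documented maximum chunk size (default is 100,000 when only TRUE is sent).
  if (!Number.isInteger(chunkSize) || chunkSize < 1 || chunkSize > 250000) {
    throw new RangeError('chunkSize must be an integer between 1 and 250000');
  }
  return {
    method: 'POST',
    url: `${instanceUrl}/services/async/${apiVersion}/job`,
    headers: {
      'X-SFDC-Session': sessionId,
      'Content-Type': 'application/xml; charset=UTF-8',
      'Sforce-Enable-PKChunking': `chunkSize=${chunkSize}`,
    },
    body:
      '<?xml version="1.0" encoding="UTF-8"?>' +
      '<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">' +
      '<operation>query</operation>' +
      `<object>${objectName}</object>` +
      '<contentType>CSV</contentType>' +
      '</jobInfo>',
  };
}
```

Once the job exists, the SOQL query is added to it as a batch. Salesforce then splits that query into one batch per chunk, and you can monitor those batches to track progress.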
  48. More Info
     • Article on the Salesforce Developers Blog: https://developer.salesforce.com/blogs/developer-relations/2015/11/pk-chunking-techniques-massive-orgs.html
     • GitHub repo: https://github.com/danieljpeter/pkChunking
     • Bulk API documentation: https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/async_api_headers_enable_pk_chunking.htm
  49. Q&A
  50. Thank you!
