Data Done Right


You probably already know that managing data in Salesforce can be a formidable task. But you might not know that it doesn't have to be! In this session, we'll focus on strategies to help you with key data tasks such as data migration, managing large data volumes, org merges, and data consolidations.

  • Brian: We also offer more than 200 FREE app templates, built by employees, called Labs apps. These can be used out of the box, or you can customize them to match your specific business process. Many people use them as a starting point for their custom app development projects.
  • Both can be Custom or Standard Objects. Use case for a “standard” archive object: the customer pre-loads all accounts into the Lead object; a “lead” gets promoted to an active account only when they call; millions of passive/archived accounts can exist without being “in the way”.
  • REST-based, asynchronous API optimized for loading large sets of data. Enables high-volume integration with Salesforce (volume). Enables integration that has to finish in a certain window of time (speed).
  • Data Done Right

    1. 1. Data Done Right<br />Administrators<br />Brian Wiebe: Technical Engagement Manager<br />Ezra Kenigsberg: Data Architect<br />
    2. 2. Safe Harbor<br />Safe harbor statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services.<br />The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of intellectual property and other litigation, risks associated with possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of, inc. is included in our annual report on Form 10-K for the most recent fiscal year ended January 31, 2010. These documents and others are available on the SEC Filings section of the Investor Information section of our Web site.
<br />Any unreleased services or features referenced in this or other press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available., inc. assumes no obligation and does not intend to update these forward-looking statements.<br /><br />
    3. 3. Purpose<br /> To walk through three big data issues; handling them well can make you an even better administrator.<br />
    4. 4. This Session… Other Sessions…<br /><ul><li>Practical demos—things you can do TODAY
    5. 5. Required
    6. 6. Data Loader
    7. 7. Microsoft Excel
    8. 8. A decent text editor (I use Notepad++)
    9. 9. Optional
    10. 10. Cloud Converter
    11. 11. Synchronizer (requires Microsoft Access)</li></ul>Bigger-picture data strategy<br />Professional third-party tools<br />
    12. 12. Overview<br />Introduction 5 min<br />Moving Data 15 min<br />Cleaning Data 15 min<br />Working with Large Data Volumes 15 min<br />Q&A until they kick us out<br />Prior to making any major changes to your org:<br />BACK UP!<br />
    13. 13. Ezra Kenigsberg<br /><br />
    14. 14. Moving Data<br />
    15. 15. The Scenario<br />The scenario we’re walking through:<br />Gotta import new records by tomorrow<br />We’re creating a repeatable, documented process<br />“Just load it” fails the hit-by-a-bus test…<br />…is difficult to audit after the fact<br />…may not be reversible if I’ve made a mistake<br />
    16. 16. Links and Tools<br />Useful links:<br /><ul><li>
    17. 17.</li></ul>Our tools:<br /><ul><li>Required</li></ul>Data Loader<br />Microsoft Excel<br />A decent text editor (I use Notepad++)<br /><ul><li>Optional</li></ul>Cloud Converter<br />Synchronizer (requires Microsoft Access)<br />
    18. 18. Useful Links:<br />Dedicated pages for<br /><ul><li>Data Migration
    19. 19. Large Data Volumes
    20. 20. many others</li></li></ul><li>Useful Links:<br />Dedicated sections for<br /><ul><li>Handy Tools
    21. 21. Reference Links
    22. 22. Presentations
    23. 23. Requests for Salesforce & Data Loader Improvements</li></li></ul><li>Three Key Steps<br />HOW do I map?<br />HOW do I generate?<br />HOW do I load?<br />Three steps:<br />How should I map my data?<br />How can I automate the generating of CSVs?<br />How can I load data in an auditable way?<br />
    24. 24. Moving Data #1: How Do I Map?<br />Create a mapping file:<br /> Create list of source fields in source file/s<br /> Create list of API field names (not the UI labels!)<br />Get them with Data Loader or Cloud Converter<br />Match source fields to API field names<br />
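The mapping step can be sketched in a few lines of Python. This is only an illustration, not the presenters' tooling: the source column names, the `Legacy_Id__c` custom field, and both helper functions are hypothetical. Each mapping row holds an index, a source field, an arrow, and an API field name; a blank API name means the column is deliberately left unmapped.

```python
import csv
import io

# Hypothetical mapping file rows: index, source field, arrow, API field name.
# An empty API name means the source column is intentionally not mapped.
MAPPING_ROWS = [
    (1, "Company", "->", "Name"),
    (2, "Phone Number", "->", "Phone"),
    (3, "Legacy Notes", "->", ""),
    (4, "Legacy Key", "->", "Legacy_Id__c"),
]


def build_field_map(rows):
    """Return {source field: API field name} for the rows that actually map."""
    return {source: api for _, source, _, api in rows if api}


def remap_csv(source_text, field_map):
    """Rewrite a CSV so its headers are API field names; unmapped columns are dropped."""
    reader = csv.DictReader(io.StringIO(source_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(field_map.values()), lineterminator="\n")
    writer.writeheader()
    for record in reader:
        writer.writerow({api: record[source] for source, api in field_map.items()})
    return out.getvalue()
```

Because the output headers are already API field names, a file produced this way should line up with the target fields when it is loaded.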
    25. 25. Grab the Source Field Names<br />Transposed in Excel<br />Excel 2003: Edit | Paste Special | Transpose<br />Excel 2007 or 2010: Home | Paste | Paste Special | Transpose<br />
    26. 26. Grab the API Field Names (slide 1 of 2)<br />Model Metrics’ free utility “Cloud Converter” is a straightforward way to export metadata<br />
    27. 27. Grab the API Field Names (slide 2 of 2)<br />“Cloud Converter” exports a tab for every object...<br />...and a row for every field<br />
    28. 28. Build Out Mapping File<br />
    29. 29. Index Column<br />An index column facilitates later sorting<br />
    30. 30. A Column of Arrows<br />Arrows remind people of the DIRECTION of data<br />
    31. 31. Not Every Field Has to Map<br />Not every column has to map<br />
    32. 32. Keep it Simple!<br />The most common cause of bad data maps?<br />Too much stuff!<br />Put in only the columns you’ll UPDATE and USE<br />
    33. 33. Moving Data #2: How Do I Generate CSVs?<br />Import legacy data into a tool that enables REUSE<br />Raw data comes in, ready-to-load data comes out <br />Minimize manual steps. Tools I’m not so thrilled with:<br />Microsoft Excel<br />Import Wizard<br />Each subsequent load becomes a straightforward process<br />Test-then-Production<br />Follow-up loads<br />Which tool should I use?<br />Good question! Can you wait eight slides?<br />
    34. 34. Moving Data #3: How Do I Load?<br />Loading Best Practices:<br />Group work by folder<br /> Keep all files together<br /> Auto-Match<br />Upsert!<br />
    35. 35. Folder Naming<br />YYYY-MM-DD#NN forces folders to sort in chronological order<br />
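A small sketch of the YYYY-MM-DD#NN naming scheme, assuming NN is a zero-padded sequence number for loads done on the same day. The function name and the sample dates are made up for illustration.

```python
from datetime import date


def load_folder_name(day, sequence):
    """Build a YYYY-MM-DD#NN folder name; zero-padding keeps plain text sorting chronological."""
    return f"{day.isoformat()}#{sequence:02d}"


names = [
    load_folder_name(date(2010, 12, 7), 2),
    load_folder_name(date(2010, 12, 7), 10),
    load_folder_name(date(2010, 11, 30), 1),
]
# Plain alphabetical sorting now matches chronological order.
print(sorted(names))
```

Without the zero-padding, `#10` would sort ahead of `#2`, which is the failure this naming pattern avoids.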
    36. 36. Keep All Files Together<br />Loading Best Practices:<br /> Group work by folder<br /> Keep all files together<br /> Auto-Match<br />Upsert!<br />
    37. 37. Folder Contents (1 of 3)<br />Save source file, success files, and error files all in same directory<br />
    38. 38. Folder Contents (2 of 3)<br />When a load produces errors:<br /><ul><li>Copy error file and
    39. 39. Use as source for next load</li></ul>
    40. 40. Folder Contents (3 of 3)<br />
    41. 41. Dummy-Proof the Auto-Match<br />Loading Best Practices:<br /> Group work by folder<br /> Keep all files together<br /> Auto-Match<br />Upsert!<br />
    42. 42. Perfect Auto-Match<br />With a file created this way, Auto-Match gets every field mapped<br />
    43. 43. The Smarter Way to Load Data: Upsert!<br />Loading Best Practices:<br /> Group work by folder<br /> Keep all files together<br /> Auto-Match<br />Upsert!<br />
    44. 44. Why UPSERT and EXTERNAL IDs are Great<br />[Diagram comparing a load “With Upsert & External IDs” against a load “Without Upsert & External IDs”]<br />
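One way to see why upsert with an external ID makes a load repeatable is the in-memory model below. It is a sketch of the general upsert idea only and does not call Salesforce or Data Loader; the `upsert` function and the `Legacy_Id__c` field name are hypothetical.

```python
def upsert(existing, incoming, external_id_field):
    """Update records whose external ID already exists; insert the rest.

    `existing` maps external ID -> record dict and is modified in place.
    Returns (inserted_count, updated_count).
    """
    inserted = updated = 0
    for record in incoming:
        key = record[external_id_field]
        if key in existing:
            existing[key].update(record)  # same legacy key: update in place
            updated += 1
        else:
            existing[key] = dict(record)  # new legacy key: insert
            inserted += 1
    return inserted, updated
```

Running the same batch twice inserts the records the first time and only updates them the second time, so a repeated load does not create duplicates.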
    45. 45. How Synchronizer Automates All This<br />HOW do I Synchronize?<br />Synchronizer has functions that facilitate all these tasks:<br />Mapping<br />Generating CSVs<br />Loading data<br /><ul><li>Synchronizer is an OPEN SOURCE application that is NOT SUPPORTED by Salesforce</li></ul>
    46. 46. Synchronizer Walkthrough<br />In this walkthrough, we’ll:<br /> Import legacy data into Synchronizer<br /> Map the data<br /> Document the data map<br /> Migrate the data into Salesforce<br /> Review the files created<br />
    47. 47. Synchronizer Review: Import and Map<br />One-button CSV importer (not perfect, but fast and simple):<br />Grab API fields, then tie legacy fields to API fields:<br />
    48. 48. Synchronizer Review: Create Data Map<br />One-click data map generator:<br />
    49. 49. Synchronizer Review: Migrate<br />One-screen UI for loading data<br />
    50. 50. Synchronizer Review: Organize Files<br />Folder and file discipline:<br />Reimporting of success and error files for use in future loads<br />
    51. 51. Synchronizer Review: Other Goodies<br />Some other Synchronizer functions not covered:<br />Ability to run multiple steps in sequence<br />Scheduling<br />Mass-create tasks<br />Creating Users<br />Assigning Users to Groups<br />Custom reports<br />Storage usage by User, by object<br />
    52. 52. Best Practices<br />Build a mapping file<br />Leverage a tool to generate CSVs<br />Use loading Best Practices<br />Get Synchronizer and help make it better!<br />
    53. 53. Brian Wiebe<br /><br />
    54. 54. Cleaning Data<br />
    55. 55. Defining a Broad Topic<br />What is Data Quality?<br /> Combination of Processes, Policies and Tools<br /> Involves Governance, Enforcement, Prevention<br /> Goal is not perfection<br />What are the typical Issues?<br /> Duplicates (Account, Contact), Incomplete information, Stale or Untouched data, Inconsistent values, Incorrect linkages<br />What are the typical causes?<br /> Not part of Budget, Unmeasurable problem<br /> No Action Plan, No Ownership, Lack of Training, Non-optimized<br />Key Data Quality Concepts<br />
    56. 56. The Full Data Quality Lifecycle<br />Our 3-step, iterative process quickly identifies problems, fixes them and helps you maintain high data quality over time<br />Data Quality<br />Assessment<br /><ul><li>Profile data
    57. 57. Analyze results
    58. 58. Identify problems and next steps</li></ul>Assess<br />Cleanse<br />Protect<br />Data Cleanse<br /><ul><li>Standardize & Cleanse
    59. 59. Supplement & Enrich
    60. 60. Test & Load</li></ul>Data Protect<br /><ul><li>Train users
    61. 61. Enforce processes
    62. 62. Monitor on-going quality</li></li></ul><li>Data Quality Assessment<br />
    63. 63. Data Quality Assessment<br />
    64. 64. Project Planning<br />Strong Sponsorship<br />Committed Involvement & Availability (The DQ Assessment helps justify this)<br />Appreciation, Awareness & Understanding of Data complexities<br />Limiting Scope / Phased Approach<br />ACHIEVABLE Goals<br />Define critical quick-win items for Phase 1 (focus on biggest issues for end-users)<br />Test, Test, Test<br />Leverage your Sandbox Environment<br />Data Quality cleansing is a “destructive” process<br />Plan for End-user involvement<br />Data Quality is an iterative process – and MUST involve end-user buy-in and input<br />If the foundation is off.. <br />
    65. 65. Begin Governance & Stewardship<br />Involve IT and Business users<br />Monitoring Data Quality Dashboards – report back monthly<br />Use Salesforce features (e.g., Data Validation Rules, Conditional Workflow field updates, Analytic Snapshots for trending)<br />Archive unused Data<br />Data must be USEFUL to the Business and must be justifiable<br />Candidates for archiving: last updated > 1 year ago, no child records, missing core required fields<br />Correct existing data<br />Users who have left the company and STILL own records, Find/Replace picklist values, Apply Naming Standards<br />Data Quality Solution Considerations<br />
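The archiving criteria on this slide could be checked with something like the sketch below. The record layout (`last_modified`, `child_count`) and the choice of `Name` and `Phone` as “core required” fields are assumptions for illustration. The slide does not say whether a record must meet one criterion or all of them, so the function simply reports which ones apply.

```python
from datetime import date, timedelta

REQUIRED_FIELDS = ("Name", "Phone")  # hypothetical "core required" fields


def archive_reasons(record, today):
    """Return the list of archiving criteria from the slide that a record meets."""
    reasons = []
    if today - record["last_modified"] > timedelta(days=365):
        reasons.append("last updated > 1 year ago")
    if record["child_count"] == 0:
        reasons.append("no child records")
    if any(not record.get(field) for field in REQUIRED_FIELDS):
        reasons.append("missing core required fields")
    return reasons
```

Returning the reasons instead of a yes/no flag leaves the final archive decision to the business users who own the data.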
    66. 66. Identify and remove Dupes<br />Low hanging fruit: <br />Simple dupes: e.g., match on a unique key like email address<br />Flag dupes for merging in Salesforce<br />Leverage available de-dupe tools<br />Complex definition of dupe: e.g., fuzzy matching on name+address<br />Define your rules (matching rules, merging rules)<br />Enrich & Append<br />Enrich your existing data<br />Add NEW data for known companies<br />3rd party data vendors – helpful in creating Account hierarchies, helpful for accurate Contact Info – especially at various levels in the Company<br />Data Quality Solution Considerations<br />
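The “simple dupes” case, matching on a unique key like email address, can be sketched as below. The `Id` and `Email` keys and the function name are assumptions for illustration. Fuzzy matching on name+address needs a dedicated de-dupe tool and is not attempted here.

```python
from collections import defaultdict


def find_simple_dupes(contacts):
    """Group contacts by normalized email and return only groups with more than one record."""
    groups = defaultdict(list)
    for contact in contacts:
        email = (contact.get("Email") or "").strip().lower()
        if email:  # records without an email cannot be matched on this key
            groups[email].append(contact["Id"])
    return {email: ids for email, ids in groups.items() if len(ids) > 1}
```

The output only flags candidate groups; merging them is still a separate, reviewed step.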
    67. 67. Limit points of Entry<br />List Imports restricted to certain profiles<br />Control data being entered – without overwhelming users<br />Leverage Sales Intelligence Tools, Dupe Prevention, Address Validation<br />Automation / Integration<br />Integrations, Master Data Management<br />Nightly Batch Updates<br />Data Quality Solution Considerations<br />
    68. 68. Cleansing Environment<br />Staging<br /><ul><li> Transform & Re-model
    69. 69. Cleanse & Standardize
    70. 70. Enrich & De-dupe
    71. 71. Iterate
    72. 72. Validate with Business Users</li></ul>Production<br />Staging<br />
    73. 73. Cleansing Process<br />[Process diagram. Numbered steps 1 to 5: Enrich (Optional), Standardize, De-dupe, Cleanse, Validate; plus Validate & Modify]<br />Labels: Company Name & Address, Names, Identify, Match & Score, Find & Replace, Load to Sandbox, Hierarchy Data, Addresses, Merge, Naming Conventions, Demographics, Postal Standards, Re-parent Child Records, Data Transformation, Load to Production, Archiving & Filtering<br />Examples: acme incorp → Acme Inc; Hot → High, Cold → Low; J. Smith, John Smith → 80%; Acme Inc HQ, Acme UK; J. Smith, John Smith → John Smith; US, U.S., U.S.A → USA; Acme-Widgets-453; Account: Division, Opportunity, Contact; Mergers, acquisitions, spin-offs<br />
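A minimal Find & Replace sketch for the Standardize and Cleanse steps. It assumes the slide's example values (US, U.S., U.S.A and USA; Hot and High; Cold and Low) are before-and-after pairs. The table names and the `standardize` helper are hypothetical.

```python
# Find & Replace tables read from the slide's examples; extend them for your own data.
COUNTRY_VALUES = {"us": "USA", "u.s.": "USA", "u.s.a": "USA", "usa": "USA"}
RATING_VALUES = {"hot": "High", "cold": "Low"}


def standardize(value, table):
    """Return the standard form of a value, or the trimmed original if no rule matches."""
    cleaned = value.strip()
    return table.get(cleaned.lower(), cleaned)
```

Values with no rule pass through unchanged, so a run in the Sandbox shows which variants the tables still miss.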
    74. 74. Protect Your Data<br />Safeguard your cleansed data and prevent future deterioration.<br />Train<br />Enforce<br />Monitor<br /><ul><li>User Training
    75. 75. Naming Conventions
    76. 76. Address Conventions
    77. 77. Dupe Prevention Process
    78. 78. Data Importing Policies
    79. 79. Required Fields
    80. 80. Default Values
    81. 81. Data Validation Rules
    82. 82. Workflow Field Updates
    83. 83. Web-to-Lead Restrictions
    84. 84. Data Quality Dashboards
    85. 85. Data Quality Reassessment
    86. 86. AppExchange Tools</li></li></ul><li>What tools do I use?<br />
    87. 87. The AppExchange: The Trusted Cloud Computing Marketplace<br />1000+ Pre-Integrated Apps<br />300+ Services<br />4000+ Customer Reviews<br />
    88. 88. 200+ Free Apps to Get You Started<br /><ul><li>Reports & dashboards to end-to-end templates
    89. 89. Fully customizable</li></li></ul><li>AppExchange Tools Worth Checking Out!<br />Cloud Converter (Free)<br />Synchronizer (Free)<br />Jigsaw for Salesforce (Paid)<br />CRM Fusion (Paid)<br />Data Quality Dashboard (EE edition)<br />
    90. 90. Data Quality Analysis Dashboard (EE Edition)<br />All reports pull from just TWO formula fields.<br />
    91. 91. The Formula Field<br />You can EXPAND these formulas to include YOUR custom fields.<br />
    92. 92. Brian Wiebe<br />Ezra Kenigsberg<br /><br />
    93. 93. Managing Large Data Volumes (LDV)<br />
    94. 94. What Do We Mean by Large Data Volumes?<br />You know you’ve got scale when …<br />1,000s of Users<br />1,000,000s of records for a single Object<br />Role or Territory hierarchy > dozens of levels<br />Public Groups nested > 5 levels deep<br />A single User, Queue, Role, Public Group, or Territory:<br />Owning 10,000s of records<br />Seeing 10,000s of records as a result of sharing<br />10,000s of Public Groups<br />1,000s of Territories<br />These are NOT hard limits—only useful guides to proceed carefully!<br />Talk to your Account Executive<br />
    95. 95. Where Would We Proceed Carefully?<br />User Interface<br />Reports, Dashboards, List Views<br />Searches<br />API<br />Queries<br />Integration<br />Synchronizing with end-user (Outlook, Mobile) apps<br />
    96. 96. What Options are Available?<br />Segment<br />Optimize sharing<br />Leverage indexes & skinny tables<br />Move data asynchronously<br />
    97. 97. 1) Segmenting with Divisions<br />Think millions<br />Use data access patterns<br />Does everyone really look at everything?<br />Acts like DB partitions<br />Breaks up big objects<br />Aim for <1M / division<br />Used for performance of search, reports, dashboards, and list views<br />Not a security measure!<br />By Geography<br />By Responsibility<br />Example: Financial Services Customer<br />13M Clients / ~500 Branches = 26,000 records per Division<br />
    98. 98. 1) Segmenting with Tiered Data<br />Think tens of millions<br />Focus users on active data<br />Open cases<br />Warm leads<br />Recent history<br />Segregate inactive data<br />Can be used with Divisions<br />The Archive Data table...<br />...doesn’t have to be custom<br />...doesn’t have to be Salesforce<br />Can use Analytic Snapshots<br />Active Data<br />Standard or Custom Object<br />Standard functionality<br />Archive<br />Batch Apex<br />Scheduled Apex<br />Archive Data<br />Standard or Custom Object<br />Subset of columns<br />Report focused<br />Example: Financial Services Customer<br />145M Activities - 90M Legacy = 55M<br />55M / ~500 Branches = 110,000<br />
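The arithmetic in the two Financial Services examples (Divisions and Tiered Data) can be checked with a short sketch. It treats “~500 Branches” as exactly 500 and assumes records are spread evenly across divisions. The function name is made up for illustration.

```python
def records_per_division(total_records, divisions):
    """Average number of records each division holds if data is spread evenly."""
    return total_records // divisions


# Divisions example: 13M clients across ~500 branches.
clients_per_division = records_per_division(13_000_000, 500)

# Tiered Data example: archive the 90M legacy activities, then split what remains.
active_activities = 145_000_000 - 90_000_000
activities_per_division = records_per_division(active_activities, 500)

print(clients_per_division, active_activities, activities_per_division)
```

Both per-division results sit well under the “aim for <1M / division” guide from the Divisions slide.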
    99. 99. 2) Optimize Sharing<br />Use private sharing strategically<br />Enforce ownership to prevent data concentration<br />“Super-owner” individuals<br />Streamline hierarchies<br />Limit depth of nested groups <br />Roles, Groups, Territories, etc.<br />Leverage all capabilities<br />Apex Managed Sharing for custom objects<br />
    100. 100. 3) Leverage Custom Indexes<br />Standard Indexes: Created Date, Last Modified Date, Division, Record Type<br />Administrators can index fields by designating them as External IDs<br />Custom Indexes are available through Support<br />Multi-column custom indexes also supported<br />Can be applied based on use case, impact and priority<br /> working on automatically detecting the need<br />Example: Large Japanese Insurance Co.<br />Custom object with 10M records queried regularly by 50,000+ users<br />80x boost in query performance due to multi-column Custom Index<br />
    101. 101. 3) A Word About Skinny Tables<br />Faster Reports<br />(more rows fit in memory)<br />Database Innovation<br />No work required<br />Managed by<br />2-10x performance for some analytics<br />Also available through Support<br />Base Table: fewer rows per fetch<br />Skinny Table: more rows per fetch<br />
    102. 102. Data streamed to temporary storage<br />Job updated in Admin Setup<br />Dataset processed in parallel <br />Job<br />Client<br />Processing Servers<br />Processing Thread<br />Processing Thread<br />Data Batches<br />Dequeue batch<br />Send all data<br />Insert/<br />update<br />Check Status<br />Results<br />Retrieve Results<br />Save results<br />4) Bulk Up—Asynchronously<br />
    103. 103. Bulk API<br />The “go-to” option for tens of thousands of records and up<br />Up to 10,000 records in a batch file<br />Asynchronous loading, tracked in Salesforce’s<br />Walkthrough time!<br />Upsert legacy data into Salesforce FAST<br />Example: American Insurance Co.<br />230 million records processed in 33 hours,<br />14 hours ahead of schedule<br />
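A sketch of how a large record set might be cut into batch files that respect the 10,000-record limit mentioned on this slide. It only does the chunking and makes no calls to the Bulk API; the function name is hypothetical.

```python
def split_into_batches(records, batch_size=10_000):
    """Yield consecutive slices of `records`, each at most `batch_size` long."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# 25,000 placeholder records become two full batches and one partial batch.
batches = list(split_into_batches(list(range(25_000))))
print([len(batch) for batch in batches])
```

Each batch can then be written to its own CSV in the load folder, which keeps the success and error files easy to trace back to their source.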
    104. 104. Q&A<br />
    105. 105. Q&A<br />Links:<br /><ul><li>
    106. 106.</li></ul>Post-Session Questions?<br /><ul><li>Brian Wiebe, Technical Engagement Manager (West)
    107. 107. Ezra Kenigsberg, Data Architect (Midwest)</li></li></ul><li>DISCOVER the products, services and resources that help you achieve SUCCESS<br />Visit Customer Success Team at Campground<br />Learn about how to win prizes including 10 iPads & more!<br />Discover Training Learning Paths<br />Find us at the Customer Success Team area of Campground at Moscone North<br />Meet Success Experts<br />Learn about Customer Resources<br />Experience Product Demos<br />
    108. 108. How Could Dreamforce Be Better? Tell Us!<br />Log in to the Dreamforce app to submit surveys for the sessions you attended<br />OR<br />Use the Dreamforce Mobile app to submit surveys<br />Every session survey you submit is a chance to win an iPod nano!<br />