Data Done Right

You probably already know that managing data in Salesforce can be a formidable task. But you might not know that it doesn't have to be! In this session, we'll focus on strategies to help you with key data tasks such as data migration, managing large data volumes, org merges, and data consolidations.

  • Brian: We also offer 200+ FREE app templates built by Salesforce.com employees, called Force.com Labs apps. These can be used out of the box, or you can customize them to match your specific business process. Many people use them as a basis to kick off their custom app development projects.
  • Both can be a Custom or Standard Object. Use case for a “standard” archive object: the customer pre-loads all accounts into the Lead object; a “lead” gets promoted to an active account only when they call; millions of passive/archived accounts can exist without being “in the way”.
  • REST-based, asynchronous API optimized for loading large sets of data. Enables high-volume integration with Salesforce (volume). Enables integration that has to finish in a certain window of time (speed).

Transcript

  • 1. Data Done Right
    Administrators
    Brian Wiebe: Technical Engagement Manager, salesforce.com
    Ezra Kenigsberg: Data Architect, salesforce.com
  • 2. Safe Harbor
    Safe harbor statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services.
    The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of intellectual property and other litigation, risks associated with possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year ended January 31, 2010. This document and others are available on the SEC Filings section of the Investor Information section of our Web site.
    Any unreleased services or features referenced in this or other press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
    TinyURL.com/SalesforceSafeHarbor
  • 3. Purpose
    To walk through three big data issues that can help make you an even-better administrator.
  • 4. This Session… Other Sessions…
    This Session:
    • Practical demos: things you can do TODAY
    • Required
    Data Loader
    Microsoft Excel
    A decent text editor (I use Notepad++)
    • Optional
    Cloud Converter
    Synchronizer (requires Microsoft Access)
    Other Sessions:
    • Bigger-picture data strategy
    • Professional third-party tools
  • 12. Overview
    Introduction 5 min
    Moving Data 15 min
    Cleaning Data 15 min
    Working with Large Data Volumes 15 min
    Q&A until they kick us out
    Prior to making any major changes to your org:
    BACK UP!
  • 13. Ezra Kenigsberg
    salesforce.com
  • 14. Moving Data
  • 15. The Scenario
    The scenario we’re walking through:
    Gotta import new records by tomorrow
    We’re creating a repeatable, documented process
    “Just load it” fails the hit-by-a-bus test…
    …is difficult to audit after the fact
    …may not be reversible if I’ve made a mistake
  • 16. Links and Tools
    Useful links:
    • developer.force.com/consultants
    • EzraKenigsberg.com
    Our tools:
    • Required
    Data Loader
    Microsoft Excel
    A decent text editor (I use Notepad++)
    • Optional
    Cloud Converter
    Synchronizer (requires Microsoft Access)
  • 18. Useful Links: developer.force.com/consultants
    Dedicated pages for
    • Data Migration
    • Large Data Volumes
    • many others
  • Useful Links: EzraKenigsberg.com
    Dedicated sections for
    • Handy Tools
    • Reference Links
    • Presentations
    • Requests for Salesforce & Data Loader Improvements
  • Three Key Steps
    HOW do I map? HOW do I generate? HOW do I load?
    Three steps:
    How should I map my data?
    How can I automate the generating of CSVs?
    How can I load data in an auditable way?
  • 24. Moving Data #1: How Do I Map?
    Create a mapping file:
    Create list of source fields in source file/s
    Create list of API field names (not the UI labels!)
    Get them with Data Loader or Cloud Converter
    Match source fields to API field names
  • 25. Grab the Source Field Names
    Source field names, transposed in Excel:
    Excel 2003: Edit | Paste Special | Transpose
    Excel 2007 or 2010: Home | Paste | Paste Special | Transpose
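
The same transposition can be done outside Excel. A minimal sketch, assuming the source file is a CSV whose first row holds the field names; the sample column names are hypothetical and stand in for a real legacy export:

```python
import csv
import io


def header_as_column(csv_text: str) -> list[str]:
    """Return the first row of a CSV as a list, one source field name per entry."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader)


# Hypothetical legacy export; only the header row matters here.
sample = "Cust Name,Cust Phone,Region\nAcme Inc,555-0100,West\n"

source_fields = header_as_column(sample)
for name in source_fields:
    print(name)  # one field name per line, ready to paste into a mapping file
```
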
  • 26. Grab the API Field Names (slide 1 of 2)
    Model Metrics’ free utility “Cloud Converter” is a straightforward way to export metadata
  • 27. Grab the API Field Names (slide 2 of 2)
    “Cloud Converter” exports a tab for every object and a row for every field
  • 28. Build Out Mapping File
  • 29. Index Column
    An index column facilitates later sorting
  • 30. A Column of Arrows
    Arrows remind people of the DIRECTION of data
  • 31. Not Every Field Has to Map
    Not every column has to map
  • 32. Keep it Simple!
    The most common cause of bad data maps?
    Too much stuff!
    Put in only the columns you’ll UPDATE and USE
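
The mapping-file slides (index column, column of arrows, unmapped fields) can be sketched as a small generator. This is an illustration only: the column headings, the source field names, and the `->` arrow text are assumptions, not the layout the presenters used.

```python
import csv
import io

# Hypothetical source columns paired with Salesforce API field names.
# An empty API name means the source field is deliberately left unmapped.
MAPPING = [
    ("Cust Name", "Name"),
    ("Cust Phone", "Phone"),
    ("Legacy Notes", ""),
]


def build_mapping_csv(pairs):
    """Write a mapping file with an index column and a direction-arrow column."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Index", "Source Field", "Direction", "API Field Name"])
    for index, (source, api_name) in enumerate(pairs, start=1):
        arrow = "->" if api_name else ""  # arrow only where a mapping exists
        writer.writerow([index, source, arrow, api_name])
    return out.getvalue()


mapping_csv = build_mapping_csv(MAPPING)
print(mapping_csv)
```

Sorting the file by any other column and then re-sorting by `Index` restores the original field order, which is the point of the index column.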
  • 33. Moving Data #2: How Do I Generate CSVs?
    Import legacy data into a tool that enables REUSE
    Raw data comes in, ready-to-load data comes out
    Minimize manual steps. Tools I’m not so thrilled with:
    Microsoft Excel
    Import Wizard
    Each subsequent load becomes a straightforward process
    Test-then-Production
    Follow-up loads
    Which tool should I use?
    Good question! Can you wait eight slides?
  • 34. Moving Data #3: How Do I Load?
    Loading Best Practices:
    Group work by folder
    Keep all files together
    Auto-Match
    Upsert!
  • 35. Folder Naming
    YYYY-MM-DD#NN forces folders to sort in chronological order
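
A short sketch of why this naming pattern sorts chronologically: zero-padded year, month, day, and sequence number make plain alphabetical order match date order. The function name is hypothetical.

```python
from datetime import date


def load_folder_name(day: date, sequence: int) -> str:
    """Build a YYYY-MM-DD#NN folder name for the NN-th load of a given day."""
    return f"{day:%Y-%m-%d}#{sequence:02d}"


names = [
    load_folder_name(date(2010, 12, 7), 1),
    load_folder_name(date(2010, 2, 15), 2),
    load_folder_name(date(2010, 2, 15), 1),
]

# Alphabetical sorting is all a file browser does, and it is enough here.
sorted_names = sorted(names)
print(sorted_names)
```
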
  • 36. Keep All Files Together
    Loading Best Practices:
    Group work by folder
    Keep all files together
    Auto-Match
    Upsert!
  • 37. Folder Contents (1 of 3)
    Save source file, success files, and error files all in same directory
  • 38. Folder Contents (2 of 3)
    When a load produces errors:
    • Copy the error file and
    • Use the copy as the source for the next load
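
Reusing an error file as the next source can be sketched as below. This assumes the error file repeats the source columns and appends one column holding the error message; the column name `ERROR` and the sample rows are assumptions to check against your own error files.

```python
import csv
import io


def next_load_source(error_csv_text: str, error_column: str = "ERROR") -> str:
    """Copy an error file, dropping the error-message column, for re-loading."""
    reader = csv.DictReader(io.StringIO(error_csv_text))
    kept = [name for name in reader.fieldnames if name != error_column]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # the error column is ignored on write
    return out.getvalue()


# Hypothetical error file with one failed row.
error_file = "Name,Phone,ERROR\nAcme Inc,555-0100,Required field missing\n"
retry_source = next_load_source(error_file)
print(retry_source)
```

The rows still need their underlying problem fixed before the next load; the sketch only prepares the file.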
  • 40. Folder Contents (3 of 3)
  • 41. Dummy-Proof the Auto-Match
    Loading Best Practices:
    Group work by folder
    Keep all files together
    Auto-Match
    Upsert!
  • 42. Perfect Auto-Match
    With a file created this way, Auto-Match gets every field mapped
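
One way to get such a file is to rename the source headers to API field names before loading and to drop the columns that do not map. A minimal sketch, with a hypothetical field map and sample data:

```python
import csv
import io

# Hypothetical mapping from legacy column names to Salesforce API field names.
FIELD_MAP = {"Cust Name": "Name", "Cust Phone": "Phone"}


def to_api_headers(csv_text: str, field_map: dict[str, str]) -> str:
    """Rewrite a CSV so its header row uses API names; drop unmapped columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    mapped = [name for name in reader.fieldnames if name in field_map]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([field_map[name] for name in mapped])
    for row in reader:
        writer.writerow([row[name] for name in mapped])
    return out.getvalue()


legacy = "Cust Name,Legacy Notes,Cust Phone\nAcme Inc,old,555-0100\n"
ready_to_load = to_api_headers(legacy, FIELD_MAP)
print(ready_to_load)
```
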
  • 43. The Smarter Way to Load Data: Upsert!
    Loading Best Practices:
    Group work by folder
    Keep all files together
    Auto-Match
    Upsert!
  • 44. Why UPSERT and EXTERNAL IDs are Great
    [Diagram: records loaded with Upsert & External IDs, compared with records loaded without Upsert & External IDs]
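
The difference can be illustrated with an in-memory sketch. It does not call Data Loader or any Salesforce API; it only models the behavior of matching on an external ID. The field name `Legacy_Id__c` and the sample records are hypothetical.

```python
def insert_only(table, records):
    """Plain insert: every loaded record becomes a new row."""
    for record in records:
        table.append(dict(record))


def upsert(table, records, external_id):
    """Update the row whose external ID matches; insert only when none does."""
    by_key = {row[external_id]: row for row in table}
    for record in records:
        existing = by_key.get(record[external_id])
        if existing is None:
            new_row = dict(record)
            table.append(new_row)
            by_key[record[external_id]] = new_row
        else:
            existing.update(record)


batch = [
    {"Legacy_Id__c": "A-1", "Name": "Acme Inc"},
    {"Legacy_Id__c": "C-1", "Name": "Cyberdyne"},
]

without_upsert, with_upsert = [], []
for _ in range(2):  # the same file gets loaded twice, e.g. after a correction
    insert_only(without_upsert, batch)
    upsert(with_upsert, batch, "Legacy_Id__c")

print(len(without_upsert), len(with_upsert))
```

Loading the same file twice duplicates every row with a plain insert, while the upsert leaves one row per external ID, so a repeated load is safe.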
  • 45. How Synchronizer Automates All This
    HOW do I Synchronize?
    Synchronizer has functions that facilitate all these tasks:
    Mapping
    Generating CSVs
    Loading data
    • Synchronizer is an OPEN SOURCE application that is NOT SUPPORTED by Salesforce
  • 46. Synchronizer Walkthrough
    In this walkthrough, we’ll:
    Import legacy data into Synchronizer
    Map the data
    Document the data map
    Migrate the data into Salesforce
    Review the files created
  • 47. Synchronizer Review: Import and Map
    One-button CSV importer (not perfect, but fast and simple):
    Grab API fields, then tie legacy fields to API fields:
  • 48. Synchronizer Review: Create Data Map
    One-click data map generator:
  • 49. Synchronizer Review: Migrate
    One-screen UI for loading data
  • 50. Synchronizer Review: Organize Files
    Folder and file discipline:
    Reimporting of success and error files for use in future loads
  • 51. Synchronizer Review: Other Goodies
    Some other Synchronizer functions not covered:
    Ability to run multiple steps in sequence
    Scheduling
    Mass-create tasks
    Creating Users
    Assigning Users to Groups
    Custom reports
    Storage usage by User, by object
  • 52. Best Practices
    Build a mapping file
    Leverage a tool to generate CSVs
    Use loading Best Practices
    Get Synchronizer and help make it better!
  • 53. Brian Wiebe
    salesforce.com
  • 54. Cleaning Data
  • 55. Defining a Broad Topic
    What is Data Quality?
    Combination of Processes, Policies and Tools
    Involves Governance, Enforcement, Prevention
    Goal is not perfection
    What are the typical issues?
    Duplicates (Account, Contact), Incomplete information, Stale or Untouched data, Inconsistent values, Incorrect linkages
    What are the typical causes?
    Not part of Budget, Unmeasurable problem, No Action Plan, No Ownership, Lack of Training, Non-optimized salesforce.com
    Key Data Quality Concepts
  • 56. The Full Data Quality Lifecycle
    Our 3-step, iterative process quickly identifies problems, fixes them and helps you maintain high data quality over time
    Assess, Cleanse, Protect
    Data Quality Assessment
    • Profile data
    • Analyze results
    • Identify problems and next steps
    Data Cleanse
    • Standardize & Cleanse
    • Supplement & Enrich
    • Test & Load
    Data Protect
    • Train users
    • Enforce processes
    • Monitor on-going quality
  • 63. Data Quality Assessment
  • 64. Project Planning
    Strong Sponsorship
    Committed Involvement & Availability (The DQ Assessment helps justify this)
    Appreciation, Awareness & Understanding of Data complexities
    Limiting Scope / Phased Approach
    ACHIEVABLE Goals
    Define critical quick-win items for Phase 1 (focus on biggest issues for end-users)
    Test, Test, Test
    Leverage your Sandbox Environment
    Data Quality cleansing is a “destructive” process
    Plan for End-user involvement
    Data Quality is an iterative process – and MUST involve end-user buy-in and input
    If the foundation is off..
  • 65. Begin Governance & Stewardship
    Involve IT and Business users
    Monitoring Data Quality Dashboards – report back monthly
    Use Salesforce features (e.g., Data Validation Rules, Conditional Workflow field updates, Analytic Snapshots for trending)
    Archive un-used Data
    Data must be USEFUL to the Business and must be justifiable
    Candidates for archiving: last updated > 1 year ago, no child records, missing core required fields
    Correct existing data
    Users who have left company and STILL own records, Find/Replace picklist values, Apply Naming Standards
    Data Quality Solution Considerations
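
The archiving criteria on this slide can be sketched as a per-record check. The field names, the list of core required fields, and the 365-day cutoff are assumptions for illustration; whether one criterion or all three must hold is a policy decision the slide leaves open, so the sketch only reports which criteria a record meets.

```python
from datetime import date, timedelta

CORE_REQUIRED = ("Name", "Phone")  # hypothetical core required fields


def archive_reasons(record: dict, today: date, child_count: int) -> list[str]:
    """List which of the slide's archiving criteria a record meets."""
    reasons = []
    if today - record["LastModifiedDate"] > timedelta(days=365):
        reasons.append("last updated > 1 year ago")
    if child_count == 0:
        reasons.append("no child records")
    if any(not record.get(field) for field in CORE_REQUIRED):
        reasons.append("missing core required fields")
    return reasons


today = date(2010, 12, 7)
stale = {"Name": "Acme Inc", "Phone": "", "LastModifiedDate": date(2009, 1, 5)}
fresh = {"Name": "Globex", "Phone": "555-0101", "LastModifiedDate": date(2010, 11, 1)}

stale_reasons = archive_reasons(stale, today, child_count=0)
fresh_reasons = archive_reasons(fresh, today, child_count=3)
print(stale_reasons, fresh_reasons)
```
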
  • 66. Identify and remove Dupes
    Low hanging fruit:
    Simple dupes: e.g., match on a unique key like email address
    Flag dupes for merging in Salesforce
    Leverage available de-dupe tools
    Complex definition of dupe: e.g., fuzzy matching on name+address
    Define your rules (matching rules, merging rules)
    Enrich & Append
    Enrich your existing data
    Add NEW data for known companies
    3rd party data vendors – helpful in creating Account hierarchies, helpful for accurate Contact Info – especially at various levels in the Company
    Data Quality Solution Considerations
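
The two kinds of dupe matching can be sketched with the Python standard library. This is not how any particular de-dupe tool works: the exact-key match groups contacts by normalized email, and the fuzzy score uses `difflib.SequenceMatcher` as a stand-in for a real matching rule. The field names and sample data are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def simple_dupes(contacts):
    """Group contact names by normalized email; keep groups with >1 member."""
    groups = defaultdict(list)
    for contact in contacts:
        groups[contact["Email"].strip().lower()].append(contact["Name"])
    return {email: names for email, names in groups.items() if len(names) > 1}


def fuzzy_score(a: str, b: str) -> float:
    """Similarity between 0.0 and 1.0 for two name+address strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


contacts = [
    {"Name": "J. Smith", "Email": "jsmith@example.com"},
    {"Name": "John Smith", "Email": "JSmith@example.com "},
    {"Name": "Ann Lee", "Email": "alee@example.com"},
]
dupes = simple_dupes(contacts)

close = fuzzy_score("Acme Inc 1 Main St", "Acme Incorporated 1 Main Street")
far = fuzzy_score("Acme Inc 1 Main St", "Globex 9 Elm Rd")
print(dupes, close > far)
```

A matching rule then becomes a threshold on the score, and a merging rule decides which record's values survive.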
  • 67. Limit points of Entry
    List Imports restricted to certain profiles
    Control data being entered – without overwhelming users
    Leverage Sales Intelligence Tools, Dupe Prevention, Address Validation
    Automation / Integration
    Integrations, Master Data Management
    Nightly Batch Updates
    Data Quality Solution Considerations
  • 68. Cleansing Environment
    Staging
    • Transform & Re-model
    • Cleanse & Standardize
    • Enrich & De-dupe
    • Iterate
    • Validate with Business Users
    Production
    Staging
  • 73. Cleansing Process
    [Process diagram with five numbered steps: Cleanse, Standardize, De-dupe, Enrich (Optional), Validate]
    Diagram labels: Find & Replace; Data Transformation; Archiving & Filtering; Naming Conventions; Names; Addresses; Postal Standards; Identify, Match & Score; Merge; Re-parent Child Records; Company Name & Address; Hierarchy Data; Demographics; Load to Sandbox; Validate & Modify; Load to Production
    Examples shown: “acme incorp” becomes “Acme Inc”; Hot becomes High, Cold becomes Low; “J. Smith” and “John Smith” match at 80% and merge into “John Smith”; US, U.S., U.S.A become USA; Acme Inc HQ, Acme UK; Acme-Widgets-453
    Also noted: Account: Division, Opportunity, Contact; Mergers, acquisitions, spin-offs
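
The find-and-replace and standardization examples from this slide (Hot to High, Cold to Low, the US variants to USA) can be sketched as small lookup functions. The function names and the choice to leave unknown values unchanged are assumptions.

```python
import re

RATING_MAP = {"Hot": "High", "Cold": "Low"}  # picklist find & replace
COUNTRY_VARIANTS = {"US", "USA"}  # variants once periods and spaces are removed


def standardize_rating(value: str) -> str:
    """Replace a legacy rating with its new picklist value, if one is defined."""
    cleaned = value.strip()
    return RATING_MAP.get(cleaned, cleaned)


def standardize_country(value: str) -> str:
    """Collapse US, U.S., and U.S.A to USA; leave other countries untouched."""
    key = re.sub(r"[.\s]", "", value).upper()
    return "USA" if key in COUNTRY_VARIANTS else value.strip()


ratings = [standardize_rating(v) for v in ("Hot", "Cold", "Warm")]
countries = [standardize_country(v) for v in ("US", "U.S.", "U.S.A", "Canada")]
print(ratings, countries)
```
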
  • 74. Protect Your Data
    Safeguard your cleansed data and prevent future deterioration.
    Train
    • User Training
    • Naming Conventions
    • Address Conventions
    • Dupe Prevention Process
    • Data Importing Policies
    Enforce
    • Required Fields
    • Default Values
    • Data Validation Rules
    • Workflow Field Updates
    • Web-to-Lead Restrictions
    Monitor
    • Data Quality Dashboards
    • Data Quality Reassessment
    • AppExchange Tools
  • What tools do I use?
  • 87. The AppExchange: The Trusted Cloud Computing Marketplace
    1000+ Pre-Integrated Apps
    300+ Services
    4000+ Customer Reviews
  • 88. 200+ Free Apps to Get You Started
    • Reports & dashboards to end-to-end templates
    • Fully customizable
  • AppExchange Tools Worth Checking Out!
    Cloud Converter (Free)
    Synchronizer (Free)
    Jigsaw for Salesforce (Paid)
    CRM Fusion (Paid)
    Data Quality Dashboard (EE edition)
  • 90. Data Quality Analysis Dashboard (EE Edition)
    All reports pull from just TWO formula fields.
  • 91. The Formula Field
    You can EXPAND these formulas to include YOUR custom fields.
  • 92. Brian Wiebe, Ezra Kenigsberg
    salesforce.com
  • 93. Managing Large Data Volumes (LDV)
  • 94. What Do We Mean by Large Data Volumes?
    You know you’ve got scale when …
    1,000s of Users
    1,000,000s of records for a single Object
    Role or Territory hierarchy > dozens of levels
    Public Groups nested > 5 levels deep
    A single User, Queue, Role, Public Group, or Territory:
    Owning 10,000s of records
    Seeing 10,000s of records as a result of sharing
    10,000s of Public Groups
    1,000s of Territories
    These are NOT hard limits—only useful guides to proceed carefully!
    Talk to your Account Executive
  • 95. Where Would We Proceed Carefully?
    User Interface
    Reports, Dashboards, List Views
    Searches
    API
    Queries
    Integration
    Synchronizing with end-user (Outlook, Mobile) apps
  • 96. What Options are Available?
    Segment
    Optimize sharing
    Leverage indexes & skinny tables
    Move data asynchronously
  • 97. 1) Segmenting with Divisions
    Think millions
    Use data access patterns
    Does everyone really look at everything?
    Acts like DB partitions
    Breaks up big objects
    Aim for <1M / division
    Used for performance of search, reports, dashboards, and list views
    Not a security measure!
    By Geography
    By Responsibility
    Example: Financial Services Customer
    13M Clients / ~500 Branches = 26,000 records per Division
  • 98. 1) Segmenting with Tiered Data
    Think tens of millions
    Focus users on active data
    Open cases
    Warm leads
    Recent history
    Segregate inactive data
    Can be used with Divisions
    The Archive Data table...
    ...doesn’t have to be custom
    ...doesn’t have to be Salesforce
    Can use Analytic Snapshots
    Active Data
    Standard or Custom Object
    Standard functionality
    Archive
    Batch Apex
    Scheduled Apex
    Archive Data
    Standard or Custom Object
    Subset of columns
    Report focused
    Example: Financial Services Customer
    145M Activities - 90M Legacy = 55M
    55M / ~500 Branches = 110,000
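
The sizing arithmetic in the two Financial Services examples can be checked with a few lines. This treats "~500 Branches" as exactly 500 and compares each result with the "<1M / division" guideline from the Divisions slide.

```python
TARGET_PER_DIVISION = 1_000_000  # the "<1M / division" guideline


def records_per_division(total_records: int, divisions: int) -> int:
    """Average number of records in each division, assuming an even spread."""
    return total_records // divisions


# Divisions example: 13M clients across ~500 branches.
clients_per_division = records_per_division(13_000_000, 500)

# Tiered-data example: 145M activities, of which 90M are legacy and archived.
active_activities = 145_000_000 - 90_000_000
activities_per_division = records_per_division(active_activities, 500)

print(clients_per_division, active_activities, activities_per_division)
```

Both averages land well under the guideline, though a real org would need the per-branch distribution, since an even spread is only an assumption.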
  • 99. 2) Optimize Sharing
    Use private sharing strategically
    Enforce ownership to prevent data concentration
    “Super-owner” individuals
    Streamline hierarchies
    Limit depth of nested groups
    Roles, Groups, Territories, etc.
    Leverage all capabilities
    Apex Managed Sharing for custom objects
  • 100. 3) Leverage Custom Indexes
    Standard Indexes: Created Date, Last Modified Date, Division, Record Type
    Administrators can index fields by designating them as External IDs
    Custom Indexes are available through Support
    Multi-column custom indexes also supported
    Can be applied based on use case, impact and priority
    Salesforce.com working on automatically detecting the need
    Example: Large Japanese Insurance Co.
    Custom object with 10M records queried regularly by 50,000+ users
    80x boost in query perf. due to multi-column Custom Index
  • 101. 3) A Word About Skinny Tables
    Faster Reports
    (more rows fit in memory)
    Database Innovation
    No work required
    Managed by salesforce.com
    2-10x performance for some analytics
    Also available through Support
    Base Table: fewer rows per fetch
    Skinny Table: more rows per fetch
  • 102. 4) Bulk Up: Asynchronously
    Data streamed to temporary storage
    Job updated in Admin Setup
    Dataset processed in parallel
    [Diagram: the Client sends all data to a Job as Data Batches; Processing Threads on the Processing Servers dequeue each batch, insert/update the records, and save results; the Client checks status and retrieves results]
  • 103. Bulk API
    The “go-to” option for tens of thousands of records and up
    Up to 10,000 records in a batch file
    Asynchronous loading, tracked in Salesforce’s
    Walkthrough time!
    Upsert legacy data into Salesforce, FAST
    Example: American Insurance Co.
    230 million records processed in 33 hours,
    14 hours ahead of schedule
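
Preparing data for batch loading can be sketched as a simple chunking step. This only splits records into groups that respect the 10,000-records-per-batch figure from the slide; it does not create a job or call the Bulk API, and the record layout is hypothetical.

```python
from typing import Iterator, Sequence

MAX_BATCH_SIZE = 10_000  # "Up to 10,000 records in a batch file"


def split_into_batches(
    records: Sequence[dict], batch_size: int = MAX_BATCH_SIZE
) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# Hypothetical legacy rows keyed by an external ID field.
rows = [{"Legacy_Id__c": f"L-{n}"} for n in range(25_000)]
batches = list(split_into_batches(rows))
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)
```

Each slice would then be written to its own batch file and submitted to the job, with results retrieved once processing finishes.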
  • 104. Q&A
  • 105. Q&A
    Links:
    • developer.force.com/consultants
    • ezrakenigsberg.com
    Post-Session Questions?
    • Brian Wiebe, Technical Engagement Manager (West), bwiebe@salesforce.com
    • Ezra Kenigsberg, Data Architect (Midwest), ekenigsberg@salesforce.com
  • DISCOVER the products, services and resources that help you achieve SUCCESS
    Visit Customer Success Team at Campground
    Learn about how to win prizes including 10 iPads & more!
    Discover Training Learning Paths
    Find us at the Customer Success Team area of salesforce.com Campground at Moscone North
    Meet Success Experts
    Learn about Customer Resources
    Experience Product Demos
  • 108. How Could Dreamforce Be Better? Tell Us!
    Log in to the Dreamforce app to submit surveys for the sessions you attended
    OR
    Use the Dreamforce Mobile app to submit surveys
    Every session survey you submit is a chance to win an iPod nano!