Aen007 Kenigsberg 091807


Full session information and video available at Successforce.com.



Transcript

  • 1. Applied Data Quality (Admin III: Expanding into new areas). Ezra Kenigsberg, Data Architect, salesforce.com
  • 2. Safe Harbor Statement
    • “Safe harbor” statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements including but not limited to statements concerning the potential market for our existing service offerings and future offerings. All of our forward-looking statements involve risks, uncertainties and assumptions. If any such risks or uncertainties materialize or if any of the assumptions proves incorrect, our results could differ materially from the results expressed or implied by the forward-looking statements we make.
    • The risks and uncertainties referred to above include - but are not limited to - risks associated with possible fluctuations in our operating results and cash flows, rate of growth and anticipated revenue run rate, errors, interruptions or delays in our service or our Web hosting, our new business model, our history of operating losses, the possibility that we will not remain profitable, breach of our security measures, the emerging market in which we operate, our relatively limited operating history, our ability to hire, retain and motivate our employees and manage our growth, competition, our ability to continue to release and gain customer acceptance of new and improved versions of our service, customer and partner acceptance of the AppExchange, successful customer deployment and utilization of our services, unanticipated changes in our effective tax rate, fluctuations in the number of shares outstanding, the price of such shares, foreign currency exchange rates and interest rates.
    • Further information on these and other factors that could affect our financial results is included in the reports on Forms 10-K, 10-Q and 8-K and in other filings we make with the Securities and Exchange Commission from time to time. These documents are available on the SEC Filings section of the Investor Information section of our website at www.salesforce.com/investor. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements, except as required by law.
  • 3. Purpose
    • To give you data techniques that can help make you a better administrator
  • 4. This session… Other sessions…
    • This session:
      • Practical demos: things you can do TODAY
      • Tactical ideas and technical detail
      • Data Loader, Excel, Excel Connector, Access
    • Other sessions:
      • Bigger-picture data strategy
      • Third-party tools
  • 5. Outside this session: Please use these resources
    • Successforce.com
    • Salesforce.com’s Data Services group
      • Brian Wiebe, Program Director
      • Regional Data Architects
    • Product demos downstairs
    • Developer Network: salesforce.com/developer
    • This Year’s Sessions:
      • Data Data Everywhere!
      • Data Quality: Who Doesn’t Want It?
    • Last Year’s Sessions:
      • xrl.us/DataDataData
      • xrl.us/ExcelConnectorDemo
      • xrl.us/SystemOverload
  • 6. Overview
    • De-duplicating existing data 20 min
    • Managing data loads better 20 min
    • Keeping data clean 5 min
    • Q&A 10 min
  • 7. De-duplicating
    • I’ve gotta de-dupe the worst records by tomorrow
    • Four questions:
      • What qualifies as a duplicate?
      • What does my data look like?
      • How will I determine matches?
      • How will I merge records?
    WHAT’S a dupe? · WHAT’S my data? · HOW do I match? · HOW do I merge?
  • 8. De-duplicating 1: What’s a dupe?
    • Typical questions:
      • Should we track individual physical customer locations?
      • Should we merge all Accounts in a single country/region?
      • Is it okay to merge clearly different records?
      • Do we need different definitions of “dupe” for different business units/regions?
    • . . . answers to these questions determine
      • Hierarchy Depth
      • Records per Customer
    • A dupe’s definition determines the dataset!
    WHAT’S a dupe?
  • 9. De-duplicating 1: What’s a dupe? (cont’d)
    • PROBLEM:
    • Too many records in org
    • Successive de-dupe runs were not eliminating as many records as expected
    • SOLUTION:
    • Make records more generalized by
      • merging obvious duplicates
      • AND
      • merging different physical-location records within US into unified records
    • Case Study: Fortune 500 company, 8,000+ employees, operates in 30+ countries
    WHAT’S a dupe?
  • 10. De-duplicating 1: What’s a dupe? (cont’d)
    • Some hierarchy possibilities (in order of decreasing ease of maintenance):
      • One-and-only-one Account record
      • Single parent Account and many children (no grandparents or grandchildren)
      • Multi-tier
    • Multi-tier example: D&B implemented within Salesforce
    • Global Ultimate: one across the entire world
      • Domestic Ultimate: one per country
        • Branch: multiple physical locations
          • Subsidiary: one per physical location
    WHAT’S a dupe?
  • 11. De-duplicating 1: What’s a dupe? (cont’d)
    • For this example:
      • We’ll de-dupe to the physical-location level
      • We’ll allow hierarchy to be multi-tier
      • We’ll use Excel Connector
        • Following steps apply regardless of de-dupe tool
    WHAT’S a dupe?
  • 12. And now. . . a tool
    • Excel Connector
      • Work with Salesforce data directly in Excel: Insert, Update, Extract, Delete
      • Too slow for large data volumes (typically anything beyond a few hundred records)
      • No longer being updated, not supported by salesforce.com (alas)
      • xrl.us/ExcelConnector
    • Dreamforce session “Excel Connector: Your Golden Ticket to Clean Data”
      • xrl.us/ExcelConnectorDemo
    • Apex tools (including Excel Connector)
      • xrl.us/ApexTools
    WHAT’S my data?
  • 13. De-duplicating 2: What’s my data?
    • First stop: Names
    • Examine the toughest cases to get a feel for all data
      • How do apparent dupes’ names seem to match?
        • Perfectly?
        • Systematically?
        • Randomly?
      • Sort
      • Filter
    WHAT’S my data?
  • 14. Initial data set
    • Look for a tough case, then sort and filter to get…
    WHAT’S my data?
  • 15. (1) Sorted and (2) filtered on name
    • … a set of records to work with
    WHAT’S my data?
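
The session does the sort-and-filter step on slides 14 and 15 in Excel. As a rough sketch of the same idea outside a spreadsheet, the snippet below keeps the account names that contain a chosen keyword and sorts them so likely dupes sit next to each other. The sample names and the `filter_by_name` helper are invented for illustration.

```python
def filter_by_name(names, keyword):
    """Return the names containing keyword (case-insensitive), sorted alphabetically."""
    needle = keyword.casefold()
    matches = [name for name in names if needle in name.casefold()]
    # Sort case-insensitively so "ACME" and "acme" end up side by side.
    return sorted(matches, key=str.casefold)


# Hypothetical account names; not taken from the session's data set.
accounts = ["Acme Corp", "Zenith LLC", "ACME Corporation", "acme inc.", "Bolt Ltd"]
tough_case = filter_by_name(accounts, "acme")
print(tough_case)
```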
  • 16. De-duplicating 3: How do I match?
    • Look to other fields after Name
      • Account numbers/IDs
      • Phone numbers
        • Scrub to raw numbers
      • Addresses
        • Select address fields that look promising
        • Concatenate as needed; Name & ZIP, Street & ZIP
    HOW do I match?
  • 17. Numbers/IDs … IDs/Numbers don’t help here HOW do I match?
  • 18. De-duplicating 3: How do I match? (cont’d)
    • Look to other fields after Name
      • Account numbers/IDs
      • Phone numbers
        • Scrub to raw numbers
      • Addresses
        • Select address fields that look promising
        • Concatenate as needed; Name & ZIP, Street & ZIP
    HOW do I match?
  • 19. Scrubbing phone numbers HOW do I match?
  • 20. Scrubbing phone numbers (cont’d) Easier to line up matches HOW do I match?
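
"Scrub to raw numbers" can be sketched in a few lines: remove every character that is not a digit, so differently punctuated phone numbers become directly comparable. The `scrub_phone` name and the sample values are assumptions for illustration; the sketch does not try to handle extensions or country codes.

```python
import re


def scrub_phone(raw):
    """Strip everything except digits from a phone number string."""
    return re.sub(r"\D", "", raw or "")


# Hypothetical phone values showing three common punctuation styles.
phones = ["(312) 555-0100", "312.555.0100", "312 555 0100"]
scrubbed = [scrub_phone(p) for p in phones]
print(scrubbed)
```

After scrubbing, all three values are the same string, which makes matches easier to line up when the column is sorted.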
  • 21. De-duplicating 3: How do I match?
    • Look to other fields after Name
      • Account numbers/IDs
      • Phone numbers
        • Scrub to raw numbers
      • Addresses
        • Select address fields that look promising
        • Concatenate as needed; Street & ZIP, Name & ZIP
    HOW do I match?
  • 22. Concatenating address parts
    • Better when we have…
      • … consistent addresses
      • … consistent names
    HOW do I match?
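
One way to picture the concatenation step: build a single key from two fields (Street & ZIP, or Name & ZIP) and group records that share a key. This is a minimal sketch; the record layout, the `match_key` and `group_by_key` helpers, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def match_key(*parts):
    """Join field values into one comparison key, ignoring case and extra spaces."""
    return "|".join(" ".join(str(p).split()).casefold() for p in parts)


def group_by_key(records, fields):
    """Return {key: [record Ids]} for keys shared by more than one record."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(*(rec[f] for f in fields))].append(rec["Id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical records for illustration only.
records = [
    {"Id": "A1", "Name": "Acme Corp", "Street": "1 Main St", "Zip": "60601"},
    {"Id": "A2", "Name": "ACME Corp.", "Street": "1 Main St", "Zip": "60601"},
    {"Id": "A3", "Name": "Acme Corp", "Street": "9 Oak Ave", "Zip": "60601"},
]
street_zip = group_by_key(records, ["Street", "Zip"])
name_zip = group_by_key(records, ["Name", "Zip"])
print(street_zip)
print(name_zip)
```

The two keys surface different candidate pairs, which is why the cleaner the addresses and names are, the better this works.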
  • 23. De-duplicating 3: How do I match?
    • Use these techniques
      • Scrubbing: Native & Third-Party
      • Concatenating: Native & Third-Party
      • Address-standardizing: Third-Party
      • Address lookup: Third-Party
      • Fuzzy matching: Third-Party
      • Automated scoring: Third-Party
    • Widespread merging requires a third-party tool
      • CRMFusion’s DemandTools
      • Trillium’s Diamond Data
      • Informatica’s Data Quality
      • Microsoft’s SQL Server 2005, while not specifically adapted to Salesforce, provides fuzzy matching
      • Many other solutions downstairs!
    HOW do I match?
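
The slide lists fuzzy matching as a third-party capability and does not describe how any of those products score matches. Purely to illustrate the idea, here is a sketch using the `difflib` module from Python's standard library; the `similarity` helper and the sample names are assumptions, and real tools are likely to use more sophisticated scoring.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0.0 to 1.0 similarity ratio between two names, ignoring case."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


# Hypothetical names: a likely dupe pair and an unrelated pair.
likely = similarity("Acme Corp", "Acme Corporation")
unrelated = similarity("Acme Corp", "Zenith LLC")
print(round(likely, 2), round(unrelated, 2))
```

A score like this only ranks candidate pairs; someone still has to decide where the cutoff for "same customer" sits.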
  • 24. De-duplicating 4: How do I merge?
    • Designate a Winner record with all proper data
    • Modify records:
      • Third-Party: create “ID” field and “Winner/Loser” field
      • Salesforce: modify Name field
        • Winner: “merge” & generic Account Name & “winner”
        • Loser: “merge” & generic Account Name
    HOW do I merge?
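
The renaming convention above (“merge” & generic Account Name, plus “winner” for the surviving record) can be sketched as a small helper. The space separator and the `merge_name` function are assumptions; the slide only gives the pieces being concatenated.

```python
def merge_name(generic_name, is_winner):
    """Build the temporary Account Name used to line records up for merging."""
    parts = ["merge", generic_name]
    if is_winner:
        parts.append("winner")
    # Separator is an assumption; the slide does not specify one.
    return " ".join(parts)


print(merge_name("Acme", True))
print(merge_name("Acme", False))
```

Because every record in the group now starts with the same text, the group sorts together and the Winner is easy to pick out.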
  • 25. Modifying names in Salesforce HOW do I merge?
  • 26. Merging in the Salesforce UI HOW do I merge?
  • 27. Merging in the Salesforce UI (cont’d) HOW do I merge?
  • 28. De-duplicating: Key lessons
    • 1. Establish what qualifies as a duplicate before starting
    • 2. Use a tough case to get a feel for the data
    • 3a. Determine which other fields can be used
      • Account numbers/IDs
      • Phone numbers
      • Addresses
    • 3b. Determine which methods can be used
      • Scrubbing: Native & Third-Party
      • Concatenating: Native & Third-Party
      • Address-standardizing: Third-Party
      • Address lookup: Third-Party
      • Fuzzy matching: Third-Party
      • Automated scoring: Third-Party
    • 4. Designate a Winner record and merge Losers into it
    WHAT’S a dupe? · WHAT’S my data? · HOW do I match? · HOW do I merge?
  • 29. Managing data loads better
    • Gotta import new records by tomorrow
    • We’re creating a repeatable, documented process
      • “Just load it”…
        • … is difficult to audit after the fact
        • … may not be reversible if I’ve made a mistake
      • Best practices for loading
  • 30. Managing data loads better (cont’d)
    • Three steps:
      • How should I map my data?
      • How can I automate the generating of CSVs?
      • How can I load data in an auditable way?
    HOW do I map? · HOW do I generate? · HOW do I load?
  • 31. And now. . . a tool
    • Data Loader
      • Insert, Update, UPSERT, Extract, Delete via CSVs
      • Supported by salesforce.com
      • xrl.us/DataLoader
    • Apex tools (including Data Loader)
      • xrl.us/ApexTools
    HOW do I map?
  • 32. And now. . . a tool (cont’d)
    • Relational Database:
    • Data stored in tables
    • Capable of importing CSVs
    • Capable of creating joins between tables
    HOW do I map?
  • 33. And now. . . a tool (cont’d)
    • Query functionality:
    • Queries/views stored separately from tables
    • Queries/views use SQL (Structured Query Language)
    • Bonus: toggling between SQL and graphical interfaces
    HOW do I map?
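
The session uses Access as its relational database. As a stand-in that runs anywhere, the sketch below uses Python's built-in `sqlite3` module to show the same three capabilities: importing CSV data into tables, joining tables, and storing the query as a view separate from the data. The table names, column names, and sample rows are invented for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical legacy extracts, inlined so the example is self-contained.
accounts_csv = "LegacyId,Name\n1,Acme Corp\n2,Zenith LLC\n"
contacts_csv = "LegacyAccountId,LastName\n1,Lopez\n1,Chen\n2,Okafor\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (LegacyId TEXT, Name TEXT)")
conn.execute("CREATE TABLE contacts (LegacyAccountId TEXT, LastName TEXT)")


def load_csv(table, text):
    """Insert every data row of a two-column CSV string into the named table."""
    rows = list(csv.reader(io.StringIO(text)))[1:]  # skip the header row
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)


load_csv("accounts", accounts_csv)
load_csv("contacts", contacts_csv)

# The view is stored separately from the tables and joins them on the legacy key.
conn.execute("""
    CREATE VIEW contact_load AS
    SELECT c.LastName, a.Name AS AccountName
    FROM contacts AS c
    JOIN accounts AS a ON a.LegacyId = c.LegacyAccountId
""")
joined = conn.execute(
    "SELECT LastName, AccountName FROM contact_load ORDER BY LastName"
).fetchall()
print(joined)
```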
  • 34. Managing data loads better 1: How do I map?
    • Create a mapping file
      • Create list of fields in legacy file/s
      • Create list of API field names (not the UI field labels!)
        • Get them with Data Loader
    • Tie Force.com API field names to legacy fields
    HOW do I map?
  • 35. Grab the source field names (steps 1 and 2)
    • Transposed using Excel’s “Edit | Paste Special | Transpose” command
    HOW do I map?
  • 36. Grab the Force.com API field names (steps 1 and 2)
    • Transposed using Excel’s “Edit | Paste Special | Transpose” command
    HOW do I map?
  • 37. Build out mapping file
    • An index column facilitates later sorting
    • Not every column has to map
    • Comment column is useful for concatenates, lookups
    HOW do I map?
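
A mapping file with the columns described on slide 37 (index, legacy field, API field name, comment) might look like the output of this sketch. The legacy field names and the `build_mapping_rows` helper are hypothetical; `Name`, `Phone`, and `Description` are used here as example target fields.

```python
import csv
import io

# Hypothetical legacy columns, in the order they appear in the source file.
legacy_fields = ["CUST_NAME", "CUST_PHONE", "NOTES_1", "NOTES_2"]

# Legacy field -> (API field name, comment). Unlisted fields stay unmapped.
mapping = {
    "CUST_NAME": ("Name", ""),
    "CUST_PHONE": ("Phone", "scrub to digits"),
    "NOTES_1": ("Description", "concatenate NOTES_1 and NOTES_2"),
}


def build_mapping_rows(fields, field_map):
    """Return one row per legacy field: index, legacy name, API name, comment."""
    rows = []
    for index, legacy in enumerate(fields, start=1):
        api_name, comment = field_map.get(legacy, ("", "not mapped"))
        rows.append([index, legacy, api_name, comment])
    return rows


rows = build_mapping_rows(legacy_fields, mapping)
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["Index", "LegacyField", "ApiField", "Comment"])
writer.writerows(rows)
mapping_csv = out.getvalue()
print(mapping_csv)
```

The index column lets the file be re-sorted into source order after it has been sorted some other way.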
  • 38. Managing data loads better 2: How do I generate CSVs?
    • Import legacy data into Access
      • Reusable tool – Great for subsequent loads
        • Test-then-production
        • Pilot-then-phase 1
    • Change all fields to Text
    • Concatenate strings to create query
    HOW do I generate?
  • 39. Importing into Access 1 2 HOW do I generate?
  • 40. Managing data loads better 2: How do I generate CSVs?
    • Import legacy data into Access
      • Reusable tool – Great for subsequent loads
        • Test-then-production
        • Pilot-then-phase 1
    • Change all fields to Text
    • Concatenate strings to create query
    HOW do I generate?
  • 41. Access table design After import, change field types to Text HOW do I generate?
  • 42. Managing data loads better 2: How do I generate CSVs?
    • Import legacy data into Access. Why?
      • Reusable tool – Great for subsequent loads
        • Test-then-production
        • Pilot-then-phase 1
    • Change all fields to Text
    • Concatenate strings to create query
    HOW do I generate?
  • 43. Concatenate strings to create query
    • Concatenation formula produces clauses suitable for pasting in SQL SELECT query
    HOW do I generate?
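
In the session the clauses are produced with a concatenation formula in the mapping spreadsheet. The sketch below shows the same string-building idea in Python: each mapped pair becomes a `[legacy] AS [ApiName]` clause, and the clauses are joined into a SELECT statement. The table name, field names, and helper functions are hypothetical.

```python
def select_clause(legacy, api_name):
    """Build one column clause that renames a legacy field to its API field name."""
    return f"[{legacy}] AS [{api_name}]"


def build_select(table, pairs):
    """Join the column clauses into a complete SELECT statement."""
    clauses = ",\n    ".join(select_clause(legacy, api) for legacy, api in pairs)
    return f"SELECT\n    {clauses}\nFROM [{table}];"


# Hypothetical (legacy field, API field name) pairs taken from a mapping file.
pairs = [("CUST_NAME", "Name"), ("CUST_PHONE", "Phone")]
sql = build_select("LegacyAccounts", pairs)
print(sql)
```

Because the output columns are already named after the API fields, the exported CSV carries headers the loader can recognize.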
  • 44. Customize SQL where needed
    • Customized, concatenated DESCRIPTION field (as specified in mapping document)
    • Lookups to other tables would also require customization
    HOW do I generate?
  • 45. Managing data loads better 3: How do I load?
    • Save query as CSV
    • Folder naming:
      • Use Date & Task format: [YYYY]-[MM]-[DD]#[NN] [Object] [Task] [Desc]
    • Save source, success, error files in appropriate folder
    • Note how Auto-Match works with CSV
    HOW do I load?
  • 46. Folder naming
    • YYYY-MM-DD#NN forces folders to sort in chronological order
    HOW do I load?
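
A short sketch of the [YYYY]-[MM]-[DD]#[NN] [Object] [Task] [Desc] convention, showing why it sorts chronologically: the zero-padded date and sequence number come first, so a plain alphabetical sort is also a date sort. The `load_folder_name` helper and the sample dates, objects, and descriptions are invented for illustration.

```python
from datetime import date


def load_folder_name(day, seq, obj, task, desc):
    """Format a folder name as YYYY-MM-DD#NN Object Task Desc."""
    return f"{day:%Y-%m-%d}#{seq:02d} {obj} {task} {desc}"


# Hypothetical loads, deliberately listed out of order.
names = [
    load_folder_name(date(2007, 9, 18), 2, "Contact", "Insert", "pilot users"),
    load_folder_name(date(2007, 9, 17), 1, "Account", "Insert", "pilot users"),
    load_folder_name(date(2007, 9, 18), 1, "Account", "Update", "fix phones"),
]
for name in sorted(names):
    print(name)
```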
  • 47. Managing data loads better 3: How do I load?
    • Save query as CSV
    • Folder naming:
      • Use Date & Task format: [YYYY]-[MM]-[DD]#[NN] [Object] [Task] [Desc]
    • Save source, success, error files in appropriate folder
    • Note how Auto-Match works with CSV
    HOW do I load?
  • 48. Folder contents
    • Save source file, success files, and error files all in the same directory
    HOW do I load?
  • 49. Folder contents (cont’d)
    • When a load produces errors:
      • 1. Copy the error file
      • 2. Use the copy as the source for the next load
    HOW do I load?
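
Reusing an error file as the next source can be sketched as follows. This assumes the error file repeats the source columns and adds one extra column holding the error message, here called `ERROR`; check the actual file your loader writes, since that column name is an assumption. The `next_source_from_errors` helper and the sample row are hypothetical.

```python
import csv
import io


def next_source_from_errors(error_csv_text, error_column="ERROR"):
    """Copy an error file's rows, dropping the error-message column."""
    reader = csv.DictReader(io.StringIO(error_csv_text))
    keep = [name for name in reader.fieldnames if name != error_column]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=keep, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # the error column is ignored on write
    return out.getvalue()


# Hypothetical error file content with one failed row.
error_file = "Name,Phone,ERROR\nAcme Corp,3125550100,Required field missing\n"
next_source = next_source_from_errors(error_file)
print(next_source)
```

The failed rows still need their underlying problem fixed before the reload; the copy just guarantees that the next attempt contains only the rows that did not go in.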
  • 50. Managing data loads 3: How do I load?
    • Save query as CSV
    • Folder naming:
      • Use Date & Task format: [YYYY]-[MM]-[DD]#[NN] [Object] [Task] [Desc]
    • Save source, success, error files in appropriate folder
    • Note how Auto-Match works with CSV
    HOW do I load?
  • 51. Perfect auto-match
    • With a file created this way, Auto-Match gets every field mapped
    HOW do I load?
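
Auto-Match can map every column when the CSV headers are the API field names. A quick pre-flight check along those lines is sketched below. It compares headers by exact string equality, which is the strictest assumption about how the loader matches; the `unmatched_headers` helper and the sample lists are hypothetical.

```python
def unmatched_headers(csv_headers, api_field_names):
    """Return the CSV headers that are not exact API field names."""
    known = set(api_field_names)
    return [header for header in csv_headers if header not in known]


# Hypothetical header row and API field list for illustration.
headers = ["Name", "Phone", "Account Name"]
api_fields = ["Name", "Phone", "Description"]
leftover = unmatched_headers(headers, api_fields)
print(leftover)
```

An empty result suggests the file was generated from the mapping correctly; anything left over is usually a UI label that slipped in where an API name belongs.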
  • 52. Managing data loads: Key lessons
    • Build a mapping file to document how data gets transformed
    • Leverage a tool (in this example, Access) to generate CSVs
      • Concatenating strings can help save time mapping
    • Use a folder & file name discipline to make your work self-documenting and easy to follow
      • Use Date & Task format: [YYYY]-[MM]-[DD]#[NN] [Object] [Task] [Desc]
    HOW do I map? · HOW do I generate? · HOW do I load?
  • 53. Keeping data clean
    • Record DeDup 1.4 xrl.us/RecordDeDupV14
    • Data Quality Dashboards xrl.us/DataQualityDashboardsV1
    • Custom Reports (go crazy!)
  • 54. Session Feedback Let us know how we’re doing!
    • Please score the session from 5 to 1 (5=excellent,1=needs improvement) in the following categories:
      • Overall rating of the session
      • Quality of content
      • Strength of presentation delivery
      • Relevance of the session to your organization
    We strive to improve; thank you for filling out our survey.
    • Additionally, please score each individual speaker on:
      • Overall delivery of session
  • 55. Q&A
    • Post-Session Questions? Ezra Kenigsberg Data Architect (Midwest) [email_address]
    • Links:
      • Successforce.com
      • salesforce.com/developer
      • Previous Dreamforce presentations:
        • xrl.us/DataDataData
        • xrl.us/ExcelConnectorDemo
        • xrl.us/SystemOverload
      • Data tools:
        • xrl.us/ApexTools
        • xrl.us/DataLoader
        • xrl.us/ExcelConnector
      • Keeping data clean:
        • xrl.us/RecordDeDupV14
        • xrl.us/DataQualityDashboardsV1
  • 56. Extras: Ezra’s Toughest Data Projects
    • De-duplication
    • Data normalization/denormalization
      • Mapping a table of addresses or phone numbers to specific fields according to rules
    • Data flow-through
      • “If value is not in table A, use value in table B”
    • Forcing free text entries to behave like external keys
      • Dealing with typos, whitespace
    • Attributing record ownership
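
Two of the items above, data flow-through and treating free text as a key, can be sketched together. The snippet normalizes whitespace and case before lookup, then applies the “if value is not in table A, use value in table B” rule. The tables, values, and helper names are invented for illustration, and the sketch does nothing about genuine typos, which need fuzzy matching or manual review.

```python
def normalize_key(text):
    """Collapse runs of whitespace, trim the ends, and ignore case."""
    return " ".join(text.split()).casefold()


def flow_through(key, table_a, table_b, default=None):
    """Look the key up in table A first, then fall back to table B."""
    k = normalize_key(key)
    if k in table_a:
        return table_a[k]
    return table_b.get(k, default)


# Hypothetical lookup tables keyed by normalized account name.
table_a = {normalize_key("Acme Corp"): "Midwest"}
table_b = {
    normalize_key("Acme Corp"): "Unknown",
    normalize_key("Zenith LLC"): "West",
}
print(flow_through("  ACME   corp ", table_a, table_b))
print(flow_through("zenith llc", table_a, table_b))
```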
  • 57. Extras: Manage your time! Make shortcuts!
    • Get good with keyboard shortcuts
      • We all know Alt+Tab…
      • Alt+Spacebar and its subsequent accelerators
      • Use accelerator keys…
        • … in naming app & internet shortcuts
        • … in navigating menus
      • Learn individual apps’ keystrokes
    • Shortcuts for apps and sites
      • Populate your Start Menu or Taskbar
        • Sort your Programs menu often
      • Ctrl+Alt+(letter)
    • Use your Bookmarks/Favorites
      • Shortcuts for Salesforce tabs
      • Shortcuts for Salesforce setup links
      • Users want to go to a specific page?
        • Help them put it on their Start Menus, Taskbars, or Desktops
    • Be curious!
    [Screenshots: before… and after…]
  • 58. Extras: Ezra’s favorite keystrokes
    • User-defined app shortcuts: Ctrl+Alt+(letter)
    • Windows app menu: Alt+Spacebar
    • Show desktop toggle: Windows+D
    • Menu navigation: Alt, then letters
    • Excel select row: Shift+Spacebar
    • Excel select column: Ctrl+Spacebar
    • Excel insert: Ctrl+Shift+Plus
    • Excel delete: Ctrl+Minus
    • Excel numformats: Ctrl+Shift+(number)
    • Excel hide (unhide) row: Ctrl+(Shift)+9
    • Excel hide (unhide) column: Ctrl+(Shift)+0
    • Excel edit: F2
  • 59. Extras: Other cool data tricks
    • Use FixID() to convert 15-char IDs to 18-char IDs
      • Function available in Excel Connector
    • Use Pivot Tables to quickly list and count unique values
      • Skip unneeded Wizard steps
    • Fill in blank cells in a table with a killer combo:
      • Go To… Special | Blanks
      • Ctrl+Enter
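
For readers without the Excel Connector, the 15-to-18-character conversion that FixID() performs can be sketched from the commonly documented suffix scheme: split the ID into three 5-character chunks, record which characters in each chunk are uppercase letters, and turn each 5-bit pattern into one suffix character. This is not the Excel Connector's own code; `to_18_char_id` and the sample IDs are illustrative assumptions.

```python
SUFFIX_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ012345"


def to_18_char_id(id15):
    """Append the 3-character case-checksum suffix to a 15-character ID."""
    if len(id15) != 15:
        raise ValueError("expected a 15-character ID")
    suffix = ""
    for start in (0, 5, 10):
        chunk = id15[start:start + 5]
        # Bit i is set when the i-th character of the chunk is an uppercase letter.
        bits = sum(1 << i for i, ch in enumerate(chunk) if "A" <= ch <= "Z")
        suffix += SUFFIX_ALPHABET[bits]
    return id15 + suffix


# Made-up IDs chosen to exercise the uppercase handling.
print(to_18_char_id("0013000000abcde"))
print(to_18_char_id("AAAAA00000aB000"))
```

The suffix makes the ID safe to compare in case-insensitive tools such as Excel lookups, which is why the conversion matters for data work.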