Tuning up Your Process: Delta Record Processing

Every day, the analytic requirements of organizations grow at a rate never before seen. These new requirements put immense pressure on organizations to add processes, and ultimately data, to their ecosystems, oftentimes without a plan to tie this information back to a single point of reference. Sound familiar? Master Data Management is a process for utilizing existing relationships and building logical bridges where direct relationships don't exist to tie separated information together. In this session, learn how to identify logical gaps and use Alteryx to fill those gaps to alleviate some of the pressure of memory-intensive processes.

Treyson Marks - Senior Business Systems Analyst, Alteryx ACE

Published in: Data & Analytics

Tuning up Your Process: Delta Record Processing

  1. #ALTERYX19 TUNING UP YOUR PROCESS: DELTA RECORD PROCESSING
     Presented by Treyson Marks, Treyson.Marks@gmail.com
  2. ATTENTION: COMPLETE SESSION SURVEYS
     You were handed a survey as you entered the room. It should take less than 2 minutes to complete. Please return your completed surveys BEFORE YOU LEAVE the room. Surveys are anonymous, and we rely on your opinion for improvement.
  3. TREYSON MARKS, ALTERYX USER SINCE 2016
     With Alteryx, I can do a wide range of otherwise specialized functions.
     When I use Alteryx, I feel the need to make everything faster!
  4. TODAY’S AGENDA
     1. Processing to the Delta: How do we avoid processing records that have been processed before?
     2. Hashing: Using string fields to create a key when one doesn’t exist
     3. Laughing: We have a lot to cover, so hopefully we can keep it interesting
     4. CASS/Geocoding: Why do we use these tools and how do we make them better?
     5. Fuzzy Matching: How do we match without exact values matching up?
     6. Importance of Quickness: In a limited environment, we all fight for space.
  5. WHAT WE MEAN WHEN WE SAY “MASTER DATA MANAGEMENT”
  6. THE PROBLEM
     • Marketing has 6 different sources of lead information
     • Each source asks different questions
     • Each source has some similarities
     • Jonathan filled out each one differently

     | Source | First    | Last  | Street          | Unit    | City        | State | Zip   | Email                     | Phone          |
     |--------|----------|-------|-----------------|---------|-------------|-------|-------|---------------------------|----------------|
     | 1      | Jonathan | Smith | 123 Main St.    | APPT. 2 | Kansas City | MO    | 12345 | NoEmail                   |                |
     | 2      | Jonathan | Smith | 123 Main Street | #2      | Kansas City | MO    | 12345 | Jon@Yahoo.com             |                |
     | 3      | John     | Smith |                 |         |             |       |       | Jon@Yahoo.com             | 123-4567       |
     | 4      | Jon      | Smith | 123 Main Str.   | 2       | Kansas City | KS    | 12345 | BigBadJonBoy@Gmail.com    |                |
     | 5      |          |       |                 |         |             |       |       |                           |                |
     | 6      | Jon      | Smith | 123 Main        |         | KC          | MO    |       | Jon@Yahoo.com             | (650) 123-4567 |
     | 6      | Tammy    | Smith | 123 Main        |         | KC          |       |       |                           |                |
     | 1      | Jon      | Smith | 456 W 1st ST    |         | Baltimore   | MD    | 54862 | NotThatJonSmith@Yahoo.com |                |
  7. THE SOLUTION
     • A single source of truth
     • The best data from each resource

     | Person_ID | HouseHold_ID | First    | Last  | Street          | Unit | City        | State | Zip   | Email                     | Phone          |
     |-----------|--------------|----------|-------|-----------------|------|-------------|-------|-------|---------------------------|----------------|
     | 1         | 1            | Jonathan | Smith | 123 Main Street | #2   | Kansas City | MO    | 12345 | Jon@Yahoo.com             | (650) 123-4567 |
     | 2         | 1            | Tammy    | Smith | 123 Main Street | #2   | Kansas City | MO    | 12345 |                           | (650) 123-4567 |
     | 3         | 2            | Jon      | Smith | 456 W 1st ST    |      | Baltimore   | MD    | 54862 | NotThatJonSmith@Yahoo.com |                |
  8. IDENTIFYING THE SIMILARITIES BETWEEN DISPARATE DATA SETS AND CLEANING THEM SO THAT WE CAN MAKE COMPARISONS.
     WE AREN’T THAT DIFFERENT AFTER ALL
  9. WHAT BRINGS RECORDS TOGETHER?
     BUILDING_ID
     • Creating an ID for the street level addresses
     • Should cover most addresses in the United States
     ADDRESS_ID
     • Creating an address to the suite
     • Important in office buildings and apartment complexes
     HOUSEHOLD_ID
     • Within an address, how are we grouping people
     • Multiple family members
     • Different date ranges
     INDIVIDUAL_ID
     • If we are referring to one person, we want to refer to the correct person
  10. CREATING HASH VALUES
  11. WHAT IS HASHING?
      “Hashing is an important Data Structure which is designed to use a special function called the Hash function which is used to map a given value with a particular key for faster access of elements.”
      • https://www.geeksforgeeks.org/hashing-data-structure/
      We are going to assign a key value for strings that sometimes exist across multiple cells.
  12. HOW DO WE DO THIS IN ALTERYX?
      • MD5_ASCII: The string is expected to be only ASCII characters. Unicode® characters are turned into ? before calculating the MD5 hash.
      • MD5_Unicode: Calculates the MD5 hash of the string.
      • MD5: Bad for cryptography, good for validating data.
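The idea of keying a record on an MD5 hash of its string fields can be sketched outside Alteryx. The Python below is only an illustration: the function name `md5_key`, the "|" separator, and the UTF-8 encoding are assumptions of this sketch, so its digests are not guaranteed to equal what MD5_ASCII or MD5_Unicode return inside Alteryx.

```python
import hashlib


def md5_key(values, ascii_only=False):
    """Build one MD5 key from several string fields (hypothetical helper).

    The "|" separator is an assumption of this sketch; it keeps
    ("ab", "c") and ("a", "bc") from collapsing into the same key.
    """
    text = "|".join(values)
    if ascii_only:
        # Mirrors the MD5_ASCII description: non-ASCII characters become "?".
        data = text.encode("ascii", errors="replace")
    else:
        # UTF-8 is an assumed encoding, not necessarily what Alteryx uses.
        data = text.encode("utf-8")
    return hashlib.md5(data).hexdigest()


# The same strings always produce the same 32-character key.
key = md5_key(["Jonathan", "Smith", "123 Main St.", "Kansas City", "MO"])
print(key)
```

Because the key depends only on the input strings, it can stand in for a missing primary key when the same raw row shows up again in a later load.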
  13. UTILIZING CASS AND DOING YOUR OWN LEGWORK.
      CLEANING YOUR ADDRESSES
  14. SCRUBBING YOUR DATA: CLEANING BEFORE WE CASS
      • Hash before cleaning!
      • Creating your own Key/Value pair database
      • Create your own scrubbing tools (MACROS!)
      • Become great at REGEX

      | BadName | GoodName |
      |---------|----------|
      | St      | Street   |
      | St.     | Street   |
      | Streat  | Street   |
      | Ave     | Avenue   |
      | Av      | Avenue   |
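A key/value scrub like the BadName/GoodName table can be sketched with a lookup dictionary and one regular expression. This is a minimal Python illustration, not the Alteryx macro from the session: `GOOD_NAME` and `scrub_street` are hypothetical names, and the sketch assumes only the trailing street-type word should be rewritten, so that a leading "St." as in "St. Louis" is left alone.

```python
import re

# BadName -> GoodName pairs from the slide, stored lowercase without periods.
GOOD_NAME = {"st": "Street", "streat": "Street", "ave": "Avenue", "av": "Avenue"}

# Match a known bad name only as the last word, with an optional period.
_SUFFIX = re.compile(
    r"\b(" + "|".join(map(re.escape, GOOD_NAME)) + r")\b\.?\s*$",
    re.IGNORECASE,
)


def scrub_street(street):
    """Replace a trailing bad street type with its good name."""
    return _SUFFIX.sub(lambda m: GOOD_NAME[m.group(1).lower()], street.strip())


for raw in ["123 Main St.", "123 Main Streat", "5 Oak Av", "1 St. Louis Ave"]:
    print(raw, "->", scrub_street(raw))
```

Keeping the pairs in a table rather than in the pattern itself means new misspellings can be added as data, which is the point of maintaining your own key/value database.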
  15. COMMON ADDRESS PROBLEMS
      STREET NAMES
      • Street, Ave, Place etc.
      • Street, St, Str, Streat…
      • Avenue, Ave, Av…
      COLUMN NUMBERS
      • Data Set 1 has everything in a single column
      • Data Set 2 has everything broken out
      BAD CITY NAMES
      • Spell Biloxi
      • St. Louis vs. Saint Louis
      BAD/MISSING INFO
      • Zip Code/City boundaries
      • Missing zips
  16. CASS V STREET GEOCODER
      CASS
      • USPS Data (US and Canada only)
      • Bi-monthly updates (small files)
      • Provides congressional districts, post areas, etc.
      • All or nothing matches (error codes returned)
      STREET GEOCODER
      • Global support
      • Quarterly updates (massive files)
      • Provides latitude and longitude
      • Hierarchy of matches (Street, city, zip)
  17. CATCHING AS MUCH AS WE CAN
      • When a match fails, send it through again with different variables
      • Waterfall Method
      • Null Zips and Cities
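One way to read the waterfall method is as an ordered list of retries, each with a field nulled out. The Python sketch below assumes a `geocode` callable that returns a result or `None`; that callable, the attempt order, and the field names are all hypothetical stand-ins for whatever CASS or geocoding step is actually used.

```python
def waterfall_geocode(address, geocode):
    """Try progressively looser versions of an address until one matches.

    `geocode` is a hypothetical callable: it takes an address dict and
    returns a match, or None when the match fails.
    """
    attempts = [
        ("full address", dict(address)),
        ("null zip", {**address, "zip": None}),    # the zip may be wrong
        ("null city", {**address, "city": None}),  # the city may be wrong
    ]
    for label, candidate in attempts:
        result = geocode(candidate)
        if result is not None:
            return label, result
    # Every attempt failed; the record needs manual review.
    return None, None
```

Returning the label of the attempt that succeeded keeps a record of how loose the match was, which can be useful when deciding how much to trust it.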
  18. EMAILS, AND CELL PHONES, AND HOME PHONES, OH MY!
      CLEANING CONTACT INFORMATION
  19. COMMON CONTACT INFO PROBLEMS
      What’s in a Name?
      • Prefixes/Suffixes
        • Junior v Jr.
      • Nicknames
        • Oswald v Ozzie
      • Invalid Email
        • Treyson@Idontyouremails.com
      • Invalid Phone Numbers and Extensions
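Two of these problems, invalid emails and phone numbers with extensions, can be approximated with simple rules. The Python sketch below is an assumption-heavy illustration: `clean_email` checks only the shape of the address (it cannot tell whether a mailbox exists) and lowercases it, and `clean_phone` assumes 10-digit North American numbers with extensions written as "ext" or "x".

```python
import re

# Shape check only: something@something.something, with no spaces.
_EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
# Assumed extension styles: "ext 12", "ext. 12", "x12" at the end of the value.
_EXTENSION = re.compile(r"(?:ext\.?|x)\s*(\d+)\s*$", re.IGNORECASE)


def clean_email(value):
    """Return a lowercased email, or None if it is not shaped like one."""
    value = (value or "").strip().lower()
    return value if _EMAIL.match(value) else None


def clean_phone(value):
    """Return (ten_digit_number, extension); the number is None if invalid."""
    value = value or ""
    match = _EXTENSION.search(value)
    extension = match.group(1) if match else None
    main = value[: match.start()] if match else value
    digits = re.sub(r"\D", "", main)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop an assumed leading country code
    return (digits if len(digits) == 10 else None, extension)
```

With these rules, a placeholder such as "NoEmail" is rejected and a seven-digit number such as "123-4567" is flagged as incomplete rather than silently kept.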
  20. USING FUZZY MATCHING
  21. CHOOSING MATCH LEVELS
      LEVEL 1, LEVEL 2, LEVEL 3
      • Last Name, 85%
      • First Name, 85%
      • Last Name, 95%
      • Last Name, 60%
      • Physical Address, 100%
      • Last Name, 100%
      • First Name, 100%
      USUALLY YOU WILL WANT TO CREATE A MATCH HIERARCHY
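A match hierarchy can be sketched as an ordered list of threshold sets, tried strictest first. In the Python below, the grouping of thresholds into three levels is an assumption of this sketch (the transcript does not say which threshold belongs to which level), and `difflib.SequenceMatcher` is a stand-in similarity score, not the algorithm the Alteryx Fuzzy Match tool uses.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity between 0.0 and 1.0 (stand-in metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical hierarchy: every field in a level must meet its threshold.
LEVELS = [
    ("level 1", {"last": 1.00, "first": 1.00}),
    ("level 2", {"last": 0.85, "first": 0.85}),
    ("level 3", {"last": 0.60, "address": 1.00}),
]


def match_level(rec_a, rec_b):
    """Return the first (strictest) level both records satisfy, else None."""
    for name, thresholds in LEVELS:
        if all(similarity(rec_a[f], rec_b[f]) >= t for f, t in thresholds.items()):
            return name
    return None


a = {"first": "Jonathan", "last": "Smith", "address": "123 Main Street"}
b = {"first": "Jonathon", "last": "Smith", "address": "123 Main Street"}
print(match_level(a, b))
```

Trying the strictest level first means each pair is labeled with the strongest evidence available, so looser levels only catch what the exact comparisons miss.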
  22. JARO AND LEVENSHTEIN DISTANCE: MATH FOR WORDS
      • Jaro: function of how many letters match and the distance between letters that don’t match
      • Levenshtein: How many adjustments need to be made to make Treyson = Trayson
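The Levenshtein count can be worked through in a few lines. The Python below is a standard dynamic-programming sketch (the function name `levenshtein` is chosen here for illustration); it counts single-character insertions, deletions, and substitutions.

```python
def levenshtein(a, b):
    """Number of single-character edits needed to turn string a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute, or keep if equal
                )
            )
        previous = current
    return previous[-1]


print(levenshtein("Treyson", "Trayson"))  # 1: substitute "e" with "a"
```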
  23. USING THE GROUPING TOOL
      IF A = B AND B = C THEN A = C
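The "if A = B and B = C then A = C" rule is a transitive grouping of matched pairs. The Python below sketches it with a union-find structure; this is one common way to compute such groups and is not claimed to be how the Alteryx grouping tool works internally. The name `group_matches` is hypothetical.

```python
def group_matches(pairs):
    """Merge matched pairs into groups so that linked records share a group."""
    parent = {}

    def find(item):
        # Follow parent links to the group's representative, shortening the path.
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left  # join the two groups

    groups = {}
    for item in parent:
        groups.setdefault(find(item), set()).add(item)
    return list(groups.values())


# A matches B and B matches C, so A, B, and C land in one group.
print(group_matches([("A", "B"), ("B", "C"), ("D", "E")]))
```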
  24. ALTERYX EXAMPLE
  25. REMEMBER HASHING?
      RUNNING IT OVER AGAIN
  26. AVOID THE REPROCESSING OF RECORDS
      • Hashing creates the same key from the same string
      • You can apply the information from previous hash exercises to the new data
      • Skip geocoding or fuzzy matching
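Delta processing with hash keys can be sketched as a lookup against results saved from earlier runs. In the Python below, `record_key`, `split_delta`, the field list, and the shape of the `processed` store are all assumptions made for illustration; the raw, uncleaned strings are hashed so that the same incoming row always yields the same key.

```python
import hashlib


def record_key(record, fields):
    """MD5 key over the raw field values (hashed before any cleaning)."""
    raw = "|".join(str(record.get(field, "")) for field in fields)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def split_delta(records, fields, processed):
    """Separate records seen before from records that still need work.

    `processed` maps a key to the results stored for it on a previous run.
    """
    reuse, todo = [], []
    for record in records:
        key = record_key(record, fields)
        if key in processed:
            # Seen before: attach the stored results, skip the expensive steps.
            reuse.append({**record, **processed[key]})
        else:
            todo.append((key, record))
    return reuse, todo


FIELDS = ("street", "city", "state", "zip")
seen = {"street": "123 Main St.", "city": "Kansas City", "state": "MO", "zip": "12345"}
new = {"street": "456 W 1st ST", "city": "Baltimore", "state": "MD", "zip": "54862"}
processed = {record_key(seen, FIELDS): {"household_id": 1}}

reuse, todo = split_delta([seen, new], FIELDS, processed)
print(len(reuse), "reused;", len(todo), "to geocode and fuzzy match")
```

Only the records in `todo` go on to geocoding and fuzzy matching, and their keys and results are then added to the store for the next run.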
  27. WHY IS ANY OF WHAT WAS SAID TODAY IMPORTANT?
      FIGHTING FOR SPACE
  28. LIMITED RESOURCES: RUNNING ON THE SERVER
      • Two jobs at a time
      • Memory killers
      • MDM initial load: 38 hours
      • MDM delta loads: 45 mins
  29. ATTENTION: BEFORE YOU LEAVE
      Before you leave… Please take a moment to complete your evaluation survey. Hand it to the room monitors on your way out.
  30. THANK YOU
      Treyson Marks, Treyson.Marks@gmail.com, (210) 632-9346
