This presentation discusses the challenges of managing data and keeping it clean. Providing clean, consistent data to customers is important, yet despite best efforts some invalid or unexpected data will inevitably exist in systems, much as some contamination exists in water systems. It outlines strategies for addressing data quality issues.
3. 5 Questions I Can’t Answer about me
What Is Your Phone Number?
Where Do You Live?
Who are you?
What is your Date of Birth?
What Is Your Name?
4. About this session
Irreverent, but serious
Feel free to chat/ask questions as we go
Let’s have fun
Let’s learn a few things
Twitter is a great way to spread the word
@datachick
13. "The basic commandment of the Water Bureau is to provide clean, cold and constant water to its customers"…
22. If you want your data model to be simple... go out and make the world simple, then come back to me.
23. Missing Data
“Soloman noticed that when nothing hit the detector, a negative reading was being recorded. But you cannot get negative energy. Thus, he contacted scientists at the US space agency. It turned out that Soloman had noticed something no-one else had, including the NASA experts.”
-1 as a Faux NULL
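A minimal sketch of why a faux NULL is risky, assuming readings arrive as plain numbers and that -1 is the sentinel for "nothing hit the detector". The function names and sample values are made up for illustration; they are not from the NASA data set.

```python
from statistics import mean
from typing import Optional

FAUX_NULL = -1  # assumed sentinel meaning "no reading was taken"


def clean_reading(raw: float) -> Optional[float]:
    """Map the sentinel to None so it cannot be mistaken for a measurement."""
    return None if raw == FAUX_NULL else raw


def mean_energy(raw_readings: list[float]) -> Optional[float]:
    """Average only real measurements; return None when there are none."""
    real = [r for r in map(clean_reading, raw_readings) if r is not None]
    return mean(real) if real else None


readings = [4.0, -1, 6.0, -1]   # illustrative values only
naive = mean(readings)          # 2.0: the sentinel drags the average down
honest = mean_energy(readings)  # 5.0: only the two real readings count
```

The naive average treats "no data" as a measured value of -1. Converting the sentinel to a real missing value first keeps it out of the arithmetic.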
33. Karen Says: Names
1. One of the more difficult Data Modeling problems
2. Format is different than content
3. Not all name parts are the same
4. Not all name rules are the same
5. More myths about names than names themselves
http://tinyurl.com/namemyths
44. Love Your Data… Lake, DBs, Spreadsheets, etc.
Data Urinalysis™ is a good data strategy
Data Hygiene keeps your data contamination from spreading
Set expectations about data quality
Profile your data, often. It’s like a clinic check that you are still healthy
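One way to picture a routine profiling check, as a rough sketch only. It assumes a column is available as a simple list of hashable values. The function name `profile_column` and the default list of suspect values are hypothetical choices for this example.

```python
from collections import Counter
from typing import Any, Iterable


def profile_column(values: Iterable[Any],
                   suspects: tuple = (-1, "", "N/A")) -> dict:
    """Summarise one column: size, missing values, and suspicious placeholders."""
    values = list(values)
    counts = Counter(values)  # assumes every value is hashable
    return {
        "rows": len(values),
        "nulls": counts[None],
        "distinct": len(counts),  # None counts as one distinct value here
        # Report only the suspect placeholders that actually occur.
        "suspect_values": {s: counts[s] for s in suspects if counts[s]},
        "most_common": counts.most_common(3),
    }


ages = [34, -1, None, 51, -1, 34]  # illustrative data only
report = profile_column(ages)
```

Running a check like this on a schedule, and comparing reports over time, is what turns profiling into the regular health check the slide describes.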
50. Karen Says: Phone Data
1. There is no longer a concept of Home / Work phone number
2. We can’t use categories to tell us when to call
3. We should ask about times to contact separately
4. Not all phones are phones
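The phone points above could be modeled along these lines. This is a sketch under assumptions, not a recommended schema: the class names, the `can_voice` and `can_text` flags, and the sample numbers are all hypothetical. Contact times are stored separately from the number, there is no Home / Work category, and a device may not accept voice calls at all.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import Optional


@dataclass
class ContactWindow:
    """A time range the person told us is acceptable for contact."""
    start: time
    end: time

    def includes(self, t: time) -> bool:
        return self.start <= t <= self.end


@dataclass
class PhoneNumber:
    number: str               # stored as entered; no format is assumed
    can_voice: bool = True    # not all "phones" take voice calls
    can_text: bool = False
    windows: list[ContactWindow] = field(default_factory=list)

    def ok_to_call(self, t: time) -> Optional[bool]:
        """True/False when known; None when nobody has told us the times."""
        if not self.can_voice:
            return False
        if not self.windows:
            return None
        return any(w.includes(t) for w in self.windows)


# Made-up numbers for illustration.
mobile = PhoneNumber("+1 555 0101",
                     windows=[ContactWindow(time(9), time(17))])
tablet = PhoneNumber("+1 555 0100", can_voice=False, can_text=True)
```

Returning `None` when no contact window was given keeps "we never asked" distinct from "do not call".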
51. What You Should Do:
Anticipate International Data
Don’t Assume Anything Based on Other Data
Offer Data Correction Processes
Learn about the Outliers
… What else?
52. Don’t Ask Me:
• What My Name Is
• What My Address Is
• What I Do
• Where I Live
• What My Phone Number Is
…I don’t know until you tell me.