Alex Meadows
TDWI Carolinas Meetup – October 26 2018
Ethics In a Data Driven World
2
● General warning – The examples shared can be very
unsettling and uncomfortable.
● Trying to keep bias out of these conversations can be difficult,
especially when discussing actual examples.
● Let’s do our best
● Also, while an answer may be obvious to you, it may be
slightly or completely different for others. Please respect that.
3
4
● Target (and most companies) assigns every
customer a Guest ID
- loyalty/debit/credit card
- Email
- Name
● Tracks every purchase ever made
● Attaches any demographic information known
● New parents are a gold mine
- Don’t care as much about brand loyalty
- Frazzled and overwhelmed
● Birth records are public
- Everyone markets to parents upon birth
● How can companies market to expectant
parents in first/second trimester?
5
“[Pole] ran test after test, analyzing the data, and before
long some useful patterns emerged. Lotions, for
example. Lots of people buy lotion, but one of Pole’s
colleagues noticed that women on the baby registry
were buying larger quantities of unscented lotion
around the beginning of their second trimester.
Another analyst noted that sometime in the first 20
weeks, pregnant women loaded up on supplements
like calcium, magnesium and zinc. Many shoppers
purchase soap and cotton balls, but when someone
suddenly starts buying lots of scent-free soap and
extra-big bags of cotton balls, in addition to hand
sanitizers and washcloths, it signals they could be
getting close to their delivery date.”
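The kind of purchase-pattern signal described in the quote can be illustrated with a toy score. This is a minimal sketch, not Target's actual model: the item names come from the quote, while the point weights, the threshold, and the function names `pregnancy_signal_score` and `is_flagged` are hypothetical and chosen only for illustration.

```python
# Toy illustration only. The weights and threshold below are made-up
# assumptions, not figures from Target or the New York Times article.
SIGNAL_POINTS = {
    "unscented lotion": 30,
    "calcium": 15,
    "magnesium": 15,
    "zinc": 15,
    "scent-free soap": 10,
    "cotton balls": 10,
    "hand sanitizer": 5,
    "washcloths": 5,
}


def pregnancy_signal_score(purchases):
    """Sum the hypothetical points for each signal item found in a basket."""
    seen = set(purchases)  # count each distinct item only once
    return sum(points for item, points in SIGNAL_POINTS.items() if item in seen)


def is_flagged(purchases, threshold=50):
    """Flag a basket when its score reaches the (assumed) threshold."""
    return pregnancy_signal_score(purchases) >= threshold


basket = ["unscented lotion", "calcium", "magnesium", "bread"]
print(pregnancy_signal_score(basket), is_flagged(basket))
```

Even a score this simple infers something sensitive that the customer never disclosed, which is the ethical question the example raises.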
6
7
8
9
10
"We’re appalled and genuinely sorry that this
happened. We are taking immediate action to
prevent this type of result from appearing. There
is still clearly a lot of work to do with automatic
image labeling, and we’re looking at how we can
prevent these types of mistakes from happening
in the future." – Google rep to Ars Technica
11
But computer algorithms aren't perfect, and when
they identify images incorrectly, the results can
be disastrous.
Some concentration camp photos received
inappropriate tags, including "sport" and "jungle
gym."
Flickr had also been tagging some images of
people as "ape" and "animal," including a photo
of a black man named William taken by
photographer Corey Deshon, according to the
Guardian.
The photo service had also labeled a white
woman wearing face paint as "ape" and "animal,"
so Flickr's algorithm does not appear to be taking
a person's skin color into consideration when
auto-tagging them.
12
13
● GEDMatch – free/open
genealogy genetics database
● Takes data from all major DNA
testing companies
● Allows cross-analysis
● Also can catch criminals
14
● DNA from over 100 crime scenes uploaded
● Police create fake profiles, upload suspect data and triangulate
● Forced GEDMatch to change their Terms of Service to allow for criminal investigation
● Users have mixed reactions
- A few deleted their data
- Others were thankful
• [GEDMatch’s co-creator] says a woman wrote that her father was a serial killer, and she
wanted her data out there to give the families of his victims closure.
● Police also have CODIS – the government criminal DNA database
- Only looks at small snippets of DNA – 20 locations in the human genome
- Consumer DNA tests/GEDMatch cover far more – about 600,000 locations in the human genome
15
● Problems
- Is it legal? Yes, and admissible.
- Accuracy?
• Officers could treat this evidence as a ‘smoking gun’
• Need to strengthen the case with other evidence
- How about those hidden family secrets?
• Children given up for adoption
• Cases where a parent is not the biological parent
• Do ‘lost’ family members want to be found?
- Who owns the data after the donor dies?
16
17
● Many insurance companies are starting to offer fitness tracking discounts
● Subscribers provide fitness tracking data
● Insurance company provides discounts based on activity
● Same model used by car insurance companies
● Problem:
- If health issue found, what are the ramifications?
• Cut policy?
• Raise rates?
- If pre-existing condition?
- What if subscriber cheats?
18
19
● Mylan bought the rights to the EpiPen in 2007 (originally invented in the 1970s).
● Pre-Mylan, EpiPens cost $57
● Post-Mylan, EpiPens cost $600 - this was a data-driven decision
- Also created a generic for $300 – this was a public relations decision
● Problem:
- If market demand will support the higher cost, is that okay?
- What about folks who can’t afford even the generic?
- Could this impact other medicine costs?
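To put the price change in perspective, the increase can be worked out from the figures on this slide. This is a minimal sketch: the three prices are the ones quoted above, and the helper name `percent_increase` is just an illustrative choice.

```python
def percent_increase(old_price, new_price):
    """Return the percentage increase from old_price to new_price."""
    return (new_price - old_price) / old_price * 100


# Prices as quoted on the slide (US dollars).
PRE_MYLAN = 57
POST_MYLAN = 600
GENERIC = 300

brand_increase = percent_increase(PRE_MYLAN, POST_MYLAN)
generic_increase = percent_increase(PRE_MYLAN, GENERIC)

print(f"Brand: {brand_increase:.0f}% increase")      # Brand: 953% increase
print(f"Generic: {generic_increase:.0f}% increase")  # Generic: 426% increase
```

On these numbers, even the generic costs more than five times the pre-Mylan price.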
20
21
One of the ads, called "Supermarket," also
fools with a stereotype: It shows a father at
a market with his daughter getting the
ingredients for a traditional recipe, three-milk
cake (generally made with condensed, fresh
and evaporated milk). Latino men are
typically not depicted in such a place. So
why show a man in a market?
"Because Hispanic women love it," said Jeff
Manning, the executive director of the milk
board who worked for 25 years in the
advertising industry before taking the post in
1993.
"They love the idea. It is aspirational. They
look at it and say, 'That makes me feel
good,' " said Manning.


Editor's Notes

  • #4 Source: https://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/#127b50f96668 Target (and other companies) can often tell when you are expecting a child. In 2012, Forbes reported on a family in Minneapolis who received a baby coupon book addressed to their teenage daughter. The father, quite angry, went to the Target store and demanded an apology. A few days later, the manager called him to apologize again, but this time the father apologized instead: it turned out that his daughter was expecting. We in the data community know, of course, how they were able to do this.
  • #5 Source: https://www.nytimes.com/2012/02/19/magazine/shopping-habits.html?pagewanted=1&_r=1&hp
  • #9 Source: http://theconversation.com/data-ethics-is-more-than-just-what-we-do-with-data-its-also-about-whos-doing-it-98010
  • #12 Source: https://money.cnn.com/2015/05/21/technology/flickr-racist-tags/
  • #13 So, have you done a DNA test? This is about as personal as information can get. Do you know what the company you took your test with is allowed to do with your data?
  • #14 Source: https://www.theatlantic.com/science/archive/2018/06/gedmatch-police-genealogy-database/561695/
  • #20 Source: https://www.cbsnews.com/news/generic-epipen-epipen-jr-teva-cleared-by-fda-challenges-mylan/
  • #21 Sources: https://www.cbsnews.com/news/got-milk-ad-campaign-ends-after-20-years/ ; various image sites; https://www.sfgate.com/business/article/Lost-in-the-translation-Milk-board-does-without-2884230.php