The Natural
History of Gmail
Data Mining
Gmail isn’t really about
email!
...it’s a gigantic profiling
machine
1.
Court case
A court case reveals a trove of documents
about Gmail’s inner workings
✖In late 2010 ...
✖Ads depend on emails
✖The most serious legal challenge
✖Illegal data mining
2.
How does Google make money?
Google is the world’s largest advertising
company.
3.
Gmail’s early history
Gmail’s early history
✖Launched in 2004.
✖Yahoo and Microsoft’s Hotmail had been
around since the 90s.
✖Vast amount of storage space per
user.
✖It would be free to users and earn
revenue through advertising.
4.
Gmail’s limitless data
mining ambitions
Gmail data mining
✖The first version of ad serving in
Gmail exploited only concepts directly
extracted from message texts and did
little or no user profiling.
Gmail’s original patented data mining scheme (2013)
✖“Internal” and “external” message
attributes can be used in any combination
to extract the meaning of an email and
select the best ads to match it.
Internal email information:
✖Info from the subject line.
✖Info from the body text.
✖The sender’s name and/or email address.
✖One or more recipient names and/or
email addresses.
✖Recipient type (e.g., direct recipient,
cc, bcc).
✖Text extracted from an email address.
✖Embedded information (e.g., a business
card file, an image).
✖Linked info (e.g., info from a web
page linked to from the email).
✖Attached info (e.g., word processor
files, images, spreadsheets, etc.).
External email information:
✖Info extracted or derived from search
results returned in response to a search
query composed of extracted email info.
✖Info about the sender, for example
derived from previous interactions with
the recipient.
✖Info from other emails sent by the sender
and/or received by the recipient.
✖Info from common directory to
embedded info (Word file).
✖The geographic location of the sender
and the recipient.
✖The time the email was sent (e.g., lunchtime).
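The internal and external attributes listed above could be modelled roughly as follows. This is only an illustrative sketch: the class names, field names and the `select_attributes` helper are hypothetical and are not taken from the patent or from any Google system. It simply shows how attributes might be picked "in any combination".

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, List, Optional


@dataclass
class InternalInfo:
    """Attributes found inside the message itself (hypothetical fields)."""
    subject: str = ""
    body: str = ""
    sender: str = ""
    # recipient address -> recipient type ("to", "cc" or "bcc")
    recipients: Dict[str, str] = field(default_factory=dict)
    linked_urls: List[str] = field(default_factory=list)
    attachment_names: List[str] = field(default_factory=list)


@dataclass
class ExternalInfo:
    """Attributes derived from outside the message (hypothetical fields)."""
    search_snippets: List[str] = field(default_factory=list)
    sender_history: List[str] = field(default_factory=list)
    sender_location: Optional[str] = None
    recipient_location: Optional[str] = None
    sent_hour: Optional[int] = None  # 0-23, e.g. 12 for lunchtime


@dataclass
class MessageAttributes:
    internal: InternalInfo = field(default_factory=InternalInfo)
    external: ExternalInfo = field(default_factory=ExternalInfo)


def select_attributes(msg: MessageAttributes, names: List[str]) -> dict:
    """Return any requested combination of internal and external attributes."""
    merged = {**asdict(msg.internal), **asdict(msg.external)}
    return {name: merged[name] for name in names}


msg = MessageAttributes(
    internal=InternalInfo(subject="Ski trip", sender="alice@example.com"),
    external=ExternalInfo(sent_hour=12),
)
print(select_attributes(msg, ["subject", "sent_hour"]))
```

Keeping the two groups in separate classes mirrors the internal/external split on the slides, while the helper treats them as one pool of features.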
5.
Gmail doesn’t make much
money from ads
PHIL algorithm
✖When Gmail was finally released to
the public in April 2004, its ad serving
system used a sophisticated data
mining algorithm known as PHIL.
✖PHIL had already been implemented the
previous year in Google’s AdSense
program, which serves ads to web sites.
✖PHIL stands for Probabilistic
Hierarchical Inferential Learner.
✖PHIL identifies clusters of concepts.
✖Concepts are more or less likely to occur
in a given email or web page.
✖E.g., PHIL can learn to distinguish the
entirely different meanings of two
concepts such as “ski resort” and “lender of
last resort”.
PHIL algorithm in AdSense
✖In AdSense, PHIL matched concepts
derived from sets of keywords provided
by advertisers with concepts extracted
from the web pages where publishers
wanted Google to place ads.
✖The idea was that the better the
match, the more likely a visitor to the
publisher’s site would be to click on the
ad, which was the revenue generating
event for Google.
✖AdSense quickly grew to become
Google’s second largest business after
search itself, reaching more than $1
million a day by 2004 and $13 billion a
year by 2013.
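The idea of matching advertiser concepts against page concepts can be illustrated with a toy example. The sketch below is not PHIL: it swaps in a plain Jaccard set overlap as a stand-in score, and the ad names and concept sets are invented for illustration. Concepts are treated as whole phrases, so “ski resort” and “lender of last resort” never match each other.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two concept sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_ad(page_concepts: set, ads: dict) -> str:
    """Pick the ad whose advertiser concepts best overlap the page concepts."""
    return max(ads, key=lambda name: jaccard(page_concepts, ads[name]))


# Hypothetical advertisers and their keyword-derived concepts.
ads = {
    "alpine_lodge": {"ski resort", "snow", "lift pass"},
    "central_bank_report": {"lender of last resort", "interest rates", "liquidity"},
}

# Hypothetical concepts extracted from a publisher's page.
page = {"ski resort", "snow", "weather"}

print(best_ad(page, ads))  # alpine_lodge
```

A real system would score probabilistically and weigh expected click value, but the better-match-wins selection step has the same shape.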
PHIL algorithm in Gmail
✖Using PHIL for monetization in Gmail must
have seemed like a no-brainer to
Google’s managers.
✖But things did not work out as hoped:
Gmail revenues were not good.
✖Gmail revenues for 2014 were barely
$400 million, or less than 1% of
Google’s total revenue.
✖Gmail was estimated to have over
500 million users.
✖So the average Gmail user produces
less than $1 in revenue per year.
✖The cost of storage alone is 31 cents
per year per gigabyte.
✖If the average Gmail user consumes
only 20% of their nominally allotted 15
gigabytes (3 gigabytes), Google’s retail
price for this amount of storage would be
93 cents, more than the revenue it gets
from one Gmail user.
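The per-user arithmetic above can be checked in a few lines. The figures are the ones quoted on the slides; the function name and the use of integer cents (to avoid floating-point rounding) are choices made for this sketch.

```python
def per_user_economics(revenue_usd: int, users: int,
                       price_cents_per_gb_year: int,
                       quota_gb: int, used_percent: int) -> dict:
    """Compare yearly revenue per user with the retail price of the storage used."""
    revenue_cents_per_user = revenue_usd * 100 // users
    used_gb = quota_gb * used_percent // 100
    storage_cents = used_gb * price_cents_per_gb_year
    return {
        "revenue_cents_per_user": revenue_cents_per_user,
        "used_gb": used_gb,
        "storage_cents_per_user": storage_cents,
    }


result = per_user_economics(
    revenue_usd=400_000_000,     # "barely $400 million" for 2014
    users=500_000_000,           # "over 500 million users"
    price_cents_per_gb_year=31,  # 31 cents per gigabyte per year
    quota_gb=15,
    used_percent=20,
)
print(result)
# {'revenue_cents_per_user': 80, 'used_gb': 3, 'storage_cents_per_user': 93}
```

With these inputs, revenue is about 80 cents per user per year against 93 cents for 3 gigabytes of storage. Since the user count is a lower bound ("over 500 million"), the revenue figure is if anything an overestimate.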
Why is revenue generation
in Gmail so much weaker
than for search or AdSense?
6.
From ads to user profiles
Google online profiling
✖Google builds user profiles using PHIL.
✖The most comprehensive kind of profile
consists of the concept or category
clusters extracted by the PHIL
algorithm from documents the user has
viewed (web pages, inbound emails) or
created (outbound emails, social media
posts).
Google online profiling
✖Assuming conservatively that the
average Gmail user receives just 10
non-spam emails per day, the annual
flux of inbound Gmail probably
approaches and may well surpass two
trillion messages per year.
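As a quick check on the message-volume estimate above, the sketch below multiplies it out. It assumes the figure of 500 million Gmail users quoted earlier in the deck; the function name is hypothetical.

```python
def annual_inbound_messages(users: int, emails_per_user_per_day: int,
                            days_per_year: int = 365) -> int:
    """Total inbound messages per year across all users."""
    return users * emails_per_user_per_day * days_per_year


total = annual_inbound_messages(users=500_000_000, emails_per_user_per_day=10)
print(f"{total:,} messages per year")  # 1,825,000,000,000 messages per year
```

That is about 1.8 trillion, which is consistent with "approaches and may well surpass two trillion" once the user count exceeds 500 million.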
Google online profiling
✖Google builds and continually updates
a vast database of individual user
profiles.
✖Example: one particular user who enters the
word “blackberry” into her browser ...
Google page ranking
✖computes an aggregate statistical view
of each web page’s.
✖Bad way ..
One Box to rule them all
✖Purely ad-based business model
✖Ads and user profiling
COB (Content OneBox)
✖the PHIL-based extraction of
message concepts
✖updating the “user model” that
Google maintains of each user
✖attaching “smart labels” to
messages that indicate their type
COB and CAT2 Mixer
How does CAT2 Mixer operate?
CAT2 Mixer did not trigger, and consequently
neither did COB, in the case of government and
business customers who pay real money for the service
The solution for COB
Sequence of events in the life of a Gmail message
In 2014
$60 billion
That’s a lot of money
70%-80% of users
And a lot of users
ad1 ad2 ad3
user1 $1
user2 $1.50
user3 $2
hundreds of thousands of advertisers
hundreds of millions of users
Sparsity is a problem for Google
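To see why sparsity matters, consider how few cells of a user-by-ad value table are actually observed. The sketch below is a toy illustration only. Which ad each dollar value belongs to is an assumption (the slide does not say), and the full-scale counts of 500 million users and 200,000 advertisers are placeholder numbers standing in for "hundreds of millions" and "hundreds of thousands".

```python
def density(observations: dict, n_users: int, n_ads: int) -> float:
    """Fraction of user/ad cells for which a value has been observed."""
    observed = sum(len(per_ad) for per_ad in observations.values())
    return observed / (n_users * n_ads)


# Toy table from the slide: one known value per user (ad assignment assumed).
toy = {
    "user1": {"ad1": 1.00},
    "user2": {"ad2": 1.50},
    "user3": {"ad3": 2.00},
}
toy_density = density(toy, n_users=3, n_ads=3)
print(f"toy density: {toy_density:.2f}")

# Placeholder scale: number of user/ad cells if every pair had to be scored.
full_scale_cells = 500_000_000 * 200_000
print(f"cells at full scale: {full_scale_cells:,}")
```

Even in the 3-by-3 toy table only a third of the cells are filled. At full scale the table has on the order of 10^14 cells, so almost every user/ad pair has no observed value, which is what motivates grouping users into clusters.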
Clustering using data brokers
✖Acxiom, Datalogix, Epsilon
✖Too expensive for 0.5 billion users
Clustering using the query stream (plus IRS and Zillow)
✖technology
✖health
Google’s (partial) clarification of
data mining in Gmail
✖In April 2014, Google promised on its web site that “Google
Apps for Education services do not collect or use student
data for advertising purposes or create advertising
profiles”.
✖The carefully worded promise to stop using student data
to create “advertising profiles” does not rule out the
possibility that it will continue creating profiles that help
it to optimize search results or identify valuable clusters of
users.
✖Google was forced to admit that, contrary to its
promises to educators, it had in fact been mining student
emails in Google Apps for Education (GAFE) for years.
We cannot know for certain what Google is doing with the
output of its vast and highly sophisticated email data mining
machinery
Gmail users have “no
legitimate expectation
of privacy”
✖It is not the profiling itself that is objectionable; it becomes objectionable
when the “voluntary” part drops out of the formula.
✖Google argued that implicit user consent to data mining was
sufficient: users “…impliedly consent to Google’s practices by virtue of the fact
that all users of email must necessarily expect that their emails will be
subject to automated processing.”
Google’s lawyers make the preposterous claim that once
users turn their email over to a third party service
provider they no longer have any “legitimate expectation
of privacy”.
The future of Gmail data mining and
the need for transparency
✖Google is calling on governments around the world to
disclose and limit their surveillance practices.
✖It is time for Google to embrace the same transparency
about data mining it wishes to see in others.
Resources
✖https://medium.com/@jeffgould/the-natural-history-of-gmail-data-mining-be115d196b10
Thanks!
Any questions?
