Phil Pearce - Blackhat analytics
Don't miss the next year of Marketing Festival Brno - http://www.marketingfestival.cz


You can also buy a video of this presentation at marketingfestival.cz

Usage Rights: © All Rights Reserved

  • Welcome :)
  • Fun fact
  • Definition: A headless browser is a web browser WITHOUT a user interface. Headless browsers are frequently used for quality control or to extract data from pages, but they can also be used for other purposes. They are able to parse JavaScript, click on links and even cope with downloads.

Presentation Transcript

  • [photo of General Zorg] BlackHat Analytics 2: Privacy Wars in the Future
  • Welcome: Phil Pearce, PPC, Privacy and Analytics Expert; Freelancer; @philpearce; phil@precisionppc.me; www.linkedin.com/in/philpearce; Web Analytics Exchange mentor; Tracking Protection Group (DNT); 750 GA questions answered. #BlackhatAnalytics #MktFest
  • Summary: 1. Background 2. Definition 3. Example techniques 4. Classifications 5. Penalties 6. Industry issues 7. Why the imbalance? 8. Class action wars 9. Privacy apocalypse & Big Data 10. Look at the future… 11. Questions
  • A long time ago... …in a Google universe far, far away...
  • Define: Blackhat Analytics
  • Define: Blackhat Analytics: “0” results
  • If you do this search now... Define: Blackhat Analytics
  • It turns out... I know more than Google ;)
  • Hypothesis: At some point in the future “BlackHat Analytics” or “Faking Conversions” might become more widespread, because... 1. WA (web analytics) is becoming more important for business decision making. 2. Automatic performance-based PPC bid management systems are becoming more widely used. 3. Online competition is increasing and more revenue is at stake.
  • Definition Intentional act of distorting, deleting, unethically using, or hijacking WA data using technical or legal loopholes; with the goal of making financial gains, or obtaining a competitive advantage. Phil Pearce 2009
  • Example techniques: Ad behavioural targeting (interest-based stalking); Remarketing ads (return-visitor stalking), e.g. the Star Wars stalker; Safari 3rd-party POST cookie (preference bypassing); Evil tracking from pre-2010; Referral backlink log spam (deprecated SEO technique); GA log spam (spider visits loading JS); NEW: “headless browser” spam; Visited-links CSS hack (history sniffing); Flash cookie respawn (zombie cookies); EverCookie (all of the above+)
  • Super evil: EverCookie
  • The EverCookie was so difficult to delete that even the NSA considered using it! But they decided they didn't need it ;) Source: http://www.slideshare.net/jonbonachon/tor-stinks
  • Examples from USA
  • Classification: Intent: accidental vs malicious; Target: own website vs competitor's website; Purpose of data collection: same purpose vs different purpose; Scale: niche vs mass effect; Impact: data unaffected vs GA account deletion
  • Classifications (chart with two axes: Good/Accurate Measure Data vs Bad/Unreliable Measure Data, and Malintent vs Unintentional or Accidental). Items plotted: EverCookie; Flash cookie respawn; Cashback cookies (e.g. Quidco); Phone call logs; App error logs; Flash cookie; CSS history sniffing; Referral log spam; Speed-checking robots; Hostname spam; Fake conversions; Google Wi-Fi incident; Google (not provided)
  • Updates (same chart: Good/Accurate vs Bad/Unreliable Measure Data, Unintentional or Accidental vs Malintent): Speed-checking robots; Hostname spam; Google Wi-Fi incident. More good/reliable measurement data, as there are fewer accidental data mistakes.
  • If nasty tracking code is installed, who is liable?
  • Liability for Privacy & Security. Is the agency liable? No. Local laws say the website owner is responsible (not the agency or vendor), BUT the agency is responsible for: upholding professional standards (e.g. GACP status); a proactive client relationship.
  • Why do people still do this bad stuff?
  • The Lure of the Dark side is too strong!
  • It's all about the money! £££ Affiliate networks looking to increase CPA and attract new affiliates; online news websites looking to retain users & sell stories (e.g. NYT); banner networks looking to improve CPM, reduce cookie deletion rates and overcome keywords “(not provided)”; sustained CPC bidding wars; Big Data.
  • But there is a disturbance in the task force...
  • Meet the new Matt Cutts... Matt Cutts: hired as a webspam fighter to force quality improvements in 2000. http://www.mattcutts.com/blog/about-me/ “Red team” leader: a Google Privacy “Red” team is soon to be hired in 2013, following the FTC settlement. Mission: discovering and prioritizing subtle, unusual, and emergent privacy & security flaws. https://www.google.com/about/jobs/locations/mountainview/engineering/systems/data-privacy-engineerprivacy-red-team-mountain-view.html
  • “Internal” Imperial Bureau of Security: new Google Product Manager of Privacy & Information Security
  • F@#K - GA account deleted! “You will not collect any data that personally identifies an individual, such as a: full name; email address; billing information; or other data which can be reasonably linked to such information by Google. You must post a Privacy Policy which provides notice that your use of cookies is to collect traffic data. You must not circumvent any privacy features (e.g., an opt-out) that are part of GA.” www.google.com/analytics/terms/us.html
  • Why can't GA just remove the bad PII data? Free WA packages are unable to remove PII without deleting whole GA accounts! Raw logs are only stored for ~30 days. The right to be forgotten was introduced after GA was designed (although this might be possible with Universal Analytics, which is user-centric, not visitor-centric).
  • “Sensitive” data is also an issue: http://en.wikipedia.org/wiki/Personal_identifier#Examples_of_PID
  • Example 1: Accidental PII. Search: www.yoursite.com site:competitor.com inurl:"utm_content * gmail.com"
    http://www.google.com/#q=inurl:cz+inurl:utm_content+*+gmail&pws=0&num=100&filter=0&as_qdr=all
    e.g. www.fashiondays.cz/campaigns/?cod=xxxx&mail=NAME.REMOVED@gmail.com&utm_medium=email&utm_source=new_registration_pagE&utm_campaign=new_registration_page&utm_content=Layout_B
    privacy@google.com
    https://support.google.com/adwords/answer/8206?contact=1&rd=1
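The same leak can be checked on your own side before a search engine finds it. A minimal sketch, assuming you have a list of landing-page URLs (for example, exported from server logs): it flags query parameters whose values look like email addresses. The helper name `findEmailParams` and the deliberately simple email regex are illustrative assumptions, not part of GA or any Google tool.

```javascript
// Hypothetical helper: flags query parameters whose values look like email addresses.
// The regex is deliberately simple and will miss some valid addresses.
const EMAIL_LIKE = /[^@\s&=?]+@[^@\s&=?]+\.[a-z]{2,}/i;

function findEmailParams(urlString) {
  const url = new URL(urlString);
  const hits = [];
  // URLSearchParams decodes "%40" back to "@", so encoded addresses are caught too
  for (const [key, value] of url.searchParams) {
    if (EMAIL_LIKE.test(value)) {
      hits.push(key);
    }
  }
  return hits;
}

const sample =
  "http://www.example.cz/campaigns/?cod=xxxx&mail=NAME.REMOVED@gmail.com" +
  "&utm_medium=email&utm_source=new_registration_page";
console.log(findEmailParams(sample)); // [ 'mail' ]
```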
  • Solution/Counter-measure for Accidental PII: add exclude parameters in GWT (Google Webmaster Tools): mail, email, utm_source, utm_medium, utm_campaign, utm_content, utm_keyword. Or use a temporary robots.txt fix:
    User-agent: *
    Disallow: /*utm_medium=email
    Disallow: /*gmail.com
    Noarchive: /*utm_medium=email
    Noarchive: /*gmail.com
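To sanity-check which URLs the temporary robots.txt rules above would cover, the sketch below converts a Disallow pattern into a regular expression using the wildcard semantics Google documents for robots.txt: `*` matches any sequence of characters, a trailing `$` anchors the end, and matching starts at the beginning of the path plus query string. Only Disallow lines are modelled; the function names are hypothetical, and this is not a full robots.txt parser.

```javascript
// Converts a robots.txt path pattern (Google-style wildcards) into a RegExp.
// "*" matches any sequence of characters; a trailing "$" anchors the end.
function robotsPatternToRegExp(pattern) {
  const anchored = pattern.endsWith("$");
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const escaped = body
    .split("*")
    // Escape regex metacharacters in each literal piece
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  // Rules match from the start of the path (prefix match unless anchored)
  return new RegExp("^" + escaped + (anchored ? "$" : ""));
}

function isDisallowed(pathAndQuery, disallowPatterns) {
  return disallowPatterns.some((p) => robotsPatternToRegExp(p).test(pathAndQuery));
}

const rules = ["/*utm_medium=email", "/*gmail.com"];
console.log(isDisallowed("/campaigns/?mail=a@gmail.com&utm_medium=email", rules)); // true
console.log(isDisallowed("/campaigns/?utm_medium=cpc", rules)); // false
```

Note that blocking crawling this way stops new fetches; it is a temporary patch, while the GWT parameter settings deal with how those URLs are handled.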
  • Disclaimer. Legal disclaimer: The purpose of this example is to demonstrate a hole in all analytics platforms, and how to patch this hole. It is used for TESTING purposes ONLY. By reading this example you agree NOT to use this on a live website, and agree that I (Phil Pearce) am NOT liable for any damage that a website owner may suffer arising out of this example & tool. If you are in any doubt, please seek the advice of the Google legal team (www.google.com/contact/) or your local legal counsel BEFORE testing. Note: This issue was raised on the GACP private discussion forum 6 months prior to this event.
  • Example 2: Do you recognise this number? It is a quintillion, or “Big Integer”.
  • Intentional data damage. WARNING: Don't try this at home!
    javascript:_gaq.push(['_setAccount','UA-xxxxxx-1'],['_addTrans','8148350','affiliation','-9223372036854775807','-9223372036854775807','0.00','-','-','-'],['_addItem','SKU 00001','8148350','BIG refund','-','-9223372036854775807','1'],['_trackTrans']);
    http://www.google-analytics.com/__utm.gif?utmwv=5.4.6&utms=44&utmn=393079074&utmhn=domain.com&utmt=tran&utmtid=8148350&utmtst=affiliation&utmtto=-9223372036854775807&utmttx=-9223372036854775807&utmtsp=0.00&utmtci=-&utmtrg=-&utmtco=-&utmcs=UTF-8&utmsr=1366x768&utmvp=1366x550&utmsc=24-bit&utmul=en-us&utmje=1&utmfl=11.9 r900&utmdt=TITLE&utmhid=509485053&utmr=-&utmp=/&utmht=1385061484294&utmac=UA-XXXXX-1&utmcc=__utma=251194116.2116214072.1385060410.1385060410.1385060410.1; __utmz=251194116.1385060410.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none);&utmu=qjAL~
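Why this particular number? -9223372036854775807 is -(2^63 - 1), the negative of the largest signed 64-bit integer, so a single fake “refund” of this size swamps any realistic revenue total it is summed with. A short sketch of the arithmetic (it says nothing about how GA stores the value internally):

```javascript
// The injected value is -(2^63 - 1): the mirror image of the largest
// signed 64-bit integer.
const INT64_MAX = 2n ** 63n - 1n;
const injected = BigInt("-9223372036854775807");

console.log(injected === -INT64_MAX); // true
console.log(INT64_MAX.toString());    // 9223372036854775807

// A plain JavaScript Number (an IEEE-754 double) cannot hold it exactly,
// so any tool that parses it as a double silently rounds it.
console.log(Number.isSafeInteger(Number(injected))); // false
console.log(Number(injected));        // -9223372036854776000
```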
  • Solution/Counter-measure for intentional data damage: a tool to fix it manually… bit.ly/bigintegerfix
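Because the injected hit is pushed straight from the browser console, code on your own pages cannot stop it; a plausibility check is more useful when auditing exported transaction data or validating your own ecommerce tags. A minimal sketch, assuming amounts arrive as decimal strings as they do in the `_gaq.push` example above; `MAX_PLAUSIBLE_ORDER`, `isPlausibleAmount` and `validateTransaction` are hypothetical names.

```javascript
// Hypothetical guard for auditing transaction data.
// MAX_PLAUSIBLE_ORDER is an assumed business limit, not a GA setting.
const MAX_PLAUSIBLE_ORDER = 100000;

function isPlausibleAmount(raw) {
  // Accept only plain decimal strings such as "49.99" (no sign, no exponent)
  if (typeof raw !== "string" || !/^\d{1,9}(\.\d{1,2})?$/.test(raw)) {
    return false;
  }
  const value = Number(raw);
  return Number.isFinite(value) && value <= MAX_PLAUSIBLE_ORDER;
}

// Returns the names of the monetary fields that fail the check
function validateTransaction(tx) {
  const problems = [];
  for (const field of ["total", "tax", "shipping"]) {
    if (!isPlausibleAmount(tx[field])) {
      problems.push(field);
    }
  }
  return problems;
}

const suspicious = {
  id: "8148350",
  total: "-9223372036854775807",
  tax: "-9223372036854775807",
  shipping: "0.00",
};
console.log(validateTransaction(suspicious)); // [ 'total', 'tax' ]
```

Rejecting negative amounts outright is a design choice: genuine refunds would need their own, separately reviewed path.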
  • Fine calculator: Fine = (No. of users affected × Scale of badness × Size of brand) less (Website risk assessment + Vendor privacy self-certification)
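The heuristic can be written as a function. Everything about the inputs is an assumption: the slide gives no units or scales, so the numbers below are made-up placeholders, and clamping the result at zero is an added assumption rather than part of the formula.

```javascript
// Illustrative only: the slide's tongue-in-cheek heuristic, not a legal method.
// All inputs are unitless placeholders.
function estimateFine({ usersAffected, scaleBadness, brandSize,
                        riskAssessmentCredit, selfCertificationCredit }) {
  const exposure = usersAffected * scaleBadness * brandSize;
  const mitigation = riskAssessmentCredit + selfCertificationCredit;
  // Assumption: a fine cannot go below zero
  return Math.max(0, exposure - mitigation);
}

console.log(estimateFine({
  usersAffected: 1000, scaleBadness: 2, brandSize: 3,
  riskAssessmentCredit: 1000, selfCertificationCredit: 500,
})); // 4500
```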
  • Office for Personal Data Protection (uoou.cz): the Czech maximum fine is CZK 10,000,000 / £300K (typically the fine is CZK 10,000 / £0.3K, with the highest fine so far £70K, to the Institute for Drug Control for unlawful data collection & processing). Fine example: CZK 10,000 to Solus Association last month for misuse of financial data: http://aktualne.centrum.cz/finance/penize/clanek.phtml?id=792359. No cookie fines so far.
  • But there is still an imbalance in the Force: Consumers vs Advertisers
  • Because… of maturity in the advertising sector: user data allows better ad targeting = £; MORE data, better targeting = £££
  • Data is power We do'na the data capt
  • Rise of the Big Data Empire
  • Dark motivations: Data Greed & Fear of losing existing user data
  • Triggered…
  • Group/Class Action Wars. Note: a “class” is a collective of users (e.g. “South Bohemian Mothers group” vs Temelín nuclear power plant)
  • Define: Class Action Prosecutor. They represent the users. Like affiliates (i.e. revenue-motivated), but with larger resources and cleverer. For example…
  • US Class Action Prosecutor: Like bounty hunters, but more … sophisticated!
  • BIG class-action fines in US
  • Question… Do class action lawsuits exist in Europe or are they only in US?
  • Class Action Prosecutors: also now active in the UK! e.g. Google UK vs Olswang class action (Safari 3rd-party cookie bypassing on iOS)
  • First ever UK “group action” vs Google UK, in Feb 2013, claiming 10m Safari users affected: www.googlelawsuit.co.uk and www.facebook.com/SafariUsersAgainstGooglesSecretTracking. A UK test case that could set a precedent for EU class-action cases!
  • BUT… the group action picked the “wrong Google” when the case was submitted, doh! They accidentally selected “Google Inc”, not “Google UK”. http://www.infosecurity-magazine.com/view/34033/google-responds-to-british-lawsuit-uk-privacy-laws-dont-apply/
  • Successful class action raids in the US… £6.5 million; £10 per user; £13 million hit. Settlement funds split 50:50 between users and class action lawyers. Previous settlements were 70:30, thus a smaller % cut for class action lawyers, but a huge volume claim.
  • W3C Republic: A New Hope for a truce. The DNT user signal must be UNSET by default
  • Browsers ignore the W3C consensus. Firefox: talks about a blockade of 3rd-party cookies. MS: Windows 8 IE10 rolls out DNT=1 ON by default, when it should be UNSET by default!
  • Firefox lost the battle: too many false positives. Firefox says its Han's are tied for a few months on 3rd-party cookies. Dark Side too powerful ;)
  • MS IE10 DNT=1 browser signal ON by default… …Both Apache & Yahoo threaten to ignore DNT=1 from IE10… …IE10 DNT signal grounded http://www.ypolicyblog.com/policyblog/2012/10/26/dnt/ http://www.admonsters.com/article/apache-ignores-ie10-dnt-signal
  • Alternative: Cookie Clearinghouse proposed (like the StopBadware malware list): allow “good” cookies, block “bad” cookies
  • W3C Republic, 2-year reign: disunity rules! Peter Swire (chair) resigns; Jonathan Mayer (Firefox) resigns; Thomas Roessler joins Google. http://lists.w3.org/Archives/Public/public-tracking/2013Jul/0550.html
  • Advertising Principles (AdChoices) proposed as an alternative to W3C's DNT
  • Privacy in the Universe restored! Users have choice & freedom within the Global Galactic Empire
  • But… The secret arms race
  • The Dark Star BIG Data Centre …with the ability to process: device signatures; UserID respawn; custom remarketing. Affiliate networks also start building device-signature conversion tracking tools: “We (tradedoubler.com) are looking at options such as device recognition, using non-personally identifiable information that is freely available from a user’s device. Using advanced matching algorithms a single device can be recognized at the point of impression/click and conversion without the use of cookies.” http://www.tradedoubler.com/uk-en/blog/firefox-22-cookies/
  • Plans for device signatures leaked: Belgian advanced scanner study (by KU Leuven University)
  • War for Anonymity (aka War of Shadows)
  • Browsers (excluding Chrome) secretly move to anonymise device signatures, so that all customised devices and extensions look the same! Thus… destroying any shadow tracking
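Device signatures work by combining attributes a browser reveals anyway into a stable identifier, which is also why anonymising those attributes defeats them. A minimal sketch with hypothetical attribute names and the FNV-1a hash (real fingerprinting scripts use many more signals and other hashes):

```javascript
// 32-bit FNV-1a hash of a string, returned as 8 hex digits
function fnv1a32(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply modulo 2^32
  }
  return hash.toString(16).padStart(8, "0");
}

// Hypothetical "device signature": hash of sorted key=value attribute pairs
function deviceSignature(attrs) {
  // Sort keys so attribute order does not change the signature
  const canonical = Object.keys(attrs).sort()
    .map((k) => `${k}=${attrs[k]}`).join(";");
  return fnv1a32(canonical);
}

const alice = { ua: "Firefox/22.0", screen: "1366x768", fonts: "Arial,Calibri,Comic Sans", plugins: "Flash 11.9" };
const bob = { ua: "Firefox/22.0", screen: "1366x768", fonts: "Arial,Helvetica", plugins: "Java 7" };
// Distinct inputs, so normally distinct signatures
console.log(deviceSignature(alice), deviceSignature(bob));

// If the browser reports the same generic values for everyone ("anonymised"),
// the signatures collapse and shadow tracking loses its signal:
const anonymise = (a) => ({ ...a, fonts: "default", plugins: "none" });
console.log(deviceSignature(anonymise(alice)) === deviceSignature(anonymise(bob))); // true
```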
  • Facebook (Borg) & Google (Empire) force browsers to push DNT=0 on user login
  • PRISM tracker: the unexpected “Snow den monster” Ed. Enforcers/regulators get a boost of user support
  • Headless browser robotic crawlers causing havoc in GA data! Definition: A headless browser is a web browser WITHOUT a user interface. Examples of headless browsers: Zombie.js; Phantom.js; HtmlUnit. Impossible to differentiate from a real user! www.webmasterworld.com/search_engine_spiders/4619880.htm nodejsmodules.org/new/tags/spider
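The claim holds for any crawler that disguises itself; at best you can catch headless browsers still running with their default identity. A naive sketch, assuming the default User-Agent strings of PhantomJS and Zombie.js contain the tokens `PhantomJS` and `Zombie.js` (as in commonly seen versions, but trivially changed by whoever runs the crawler); HtmlUnit imitates ordinary browser User-Agents and would pass unnoticed. `looksLikeDefaultHeadless` is a hypothetical name.

```javascript
// Naive check that only catches headless browsers using their default
// User-Agent. Anything that spoofs the User-Agent passes straight through.
const DEFAULT_HEADLESS_UA = /PhantomJS|Zombie\.js/i;

function looksLikeDefaultHeadless(userAgent) {
  return DEFAULT_HEADLESS_UA.test(userAgent || "");
}

console.log(looksLikeDefaultHeadless(
  "Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.1.1 Safari/538.1"
)); // true
```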
  • Authenticated/logged-in user tracking might be the only method of prevention
  • Jedi Strike 2014: invasion of privacy officers, with the Force of fines of 2% of global revenue. University research divisions expand the use of Taint Droids. Note: anti-taint droid link: http://gsbabil.github.io/AntiTaintDroid/
  • Polarisation: the Dark get darker (e.g. the IE favicon 3rd-party cookie bypassing hole); the White get whiter (e.g. duckduckgo.com & ixquick.com)
  • Balance of Power (chart, from High Chance of Blackhat Detection to Low Chance of Blackhat Detection): Class Action Prosecutors; Google Data Empire; Browsers (in the middle); Jedi Enforcers; Facebook Borg; Affiliates. Other labels: Ad Revenue; Fines/Lawsuits
  • Because… LITTLE MISS INFORMATION …HAS CAUSED USER CONFUSION & A MUDDLE
  • Data Dealer video http://www.youtube.com/watch?v=x2eCAgQ1DTo&list=PL45AABD8BB96D3785&index=7
  • THIS HAS CAUSED USER CONFUSION & A MUDDLE
  • So… Are we the bad guys?
  • In the eyes of the user… YES!!
  • …How do WE prevent big corporations (and niche bad players) misusing user data/power?
  • With Great Data comes Great responsibility
  • Look to the future… The industry needs to govern & enforce itself!
  • That means YOU need to agree not to break the analytics code of honour (Good vs Bad). Report anything that looks a bit “grey” AND make sure no one else abuses the system!
  • Standards & Self-regulation: Vendor built-in privacy & misuse protection; AdWords & AdSense ToS levels; Affiliate network guidelines; WAA Code of Conduct; GA Qualified Individual; GAP certified partner; WAA Certified Ethical Analyst; Risk assessment / compliance audit; Third-party reviews & automated compliance monitoring
  • Please look out for UIO: User Intent Override
  • Is this a User Intent Override? UIO?
  • Need for industry standards and honeypot/seed tests; forced training & accreditation (e.g. Certified Analyst or MOWA member); a Google AdWords privacy CPC tax and a Google organic SERP ranking bonus
  • Fixes (GA profile filters):
    Hostname include filter: (^|\.)yourdomain\.com$
    ISP location exclude filter (e.g. Ask.com bot): ^(inktomi corporation|iac search and media europe ltd|iac search media inc|yahoo! inc\.|facebook inc\.|stumbleupon inc\.|dub6 ec2|site confidence test agent servers|site ?confidence|apache ltd\.|nielsen netratings|affinity internet inc|microsoft corp)$
    Top Content report, "Contains" box: (email|add|postcode|zipcode|tel) or [?&](.+)=(.*)gmail\.com
    Weekly scheduled report to check for the above.
    Check data stored in utm_content, User-defined, Custom Fields & Event fields.
    Check all GA profiles, including the Raw Data profile, for PII, and add exclude parameters where necessary.
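The filter patterns can be tried out before they go into GA. A sketch using JavaScript regular expressions with the literal dots escaped (an unescaped `.` matches any character), assuming case-insensitive matching via the `i` flag; `yourdomain.com` and the sample paths are placeholders.

```javascript
// Hostname include filter: the domain itself or any subdomain, nothing else
const hostnameInclude = /(^|\.)yourdomain\.com$/i;
// PII check for page paths, combining both "Contains" patterns
const piiInPagePath = /(email|add|postcode|zipcode|tel)|[?&](.+)=(.*)gmail\.com/i;

const hosts = ["yourdomain.com", "www.yourdomain.com", "notyourdomain.com", "yourdomain.com.spam.ru"];
console.log(hosts.filter((h) => hostnameInclude.test(h)));
// [ 'yourdomain.com', 'www.yourdomain.com' ]

const paths = ["/checkout/thanks?mail=bob@gmail.com", "/products/shoes", "/my-address-book"];
console.log(paths.filter((p) => piiInPagePath.test(p)));
// [ '/checkout/thanks?mail=bob@gmail.com', '/my-address-book' ]
```

The broad `add` term also matches harmless paths such as `/my-address-book`, so matches from this check are prompts for review, not proof of PII.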
  • Fixes (process changes): Account protection; training for developers and marketers; check that scheduled reports are not being sent to unknown users; limit the number of admin users; enable 2-stage authentication if possible; look for unusual variances and data spikes in GA (especially new visits to the homepage); CPA audits (GA vs affiliate report).
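The spike check and the CPA audit can be automated on exported daily figures. A minimal sketch, assuming plain arrays of daily counts; the trailing-window size, the 3-standard-deviation threshold and the floor of 1 on the deviation are arbitrary tuning choices, not GA features, and the function names are hypothetical.

```javascript
// Flags indices whose value is far above the trailing average,
// e.g. daily new visits to the homepage.
function findSpikes(series, window = 7, threshold = 3) {
  const spikes = [];
  for (let i = window; i < series.length; i++) {
    const past = series.slice(i - window, i);
    const mean = past.reduce((a, b) => a + b, 0) / window;
    const variance = past.reduce((a, b) => a + (b - mean) ** 2, 0) / window;
    // Floor the deviation at 1 so a perfectly flat history does not flag every blip
    const sd = Math.max(Math.sqrt(variance), 1);
    if (series[i] > mean + threshold * sd) {
      spikes.push(i);
    }
  }
  return spikes;
}

// CPA audit: relative gap between conversions the affiliate network reports
// and conversions GA recorded for the same period (assumes gaConversions > 0).
function conversionGap(gaConversions, affiliateConversions) {
  return (affiliateConversions - gaConversions) / gaConversions;
}

const newHomepageVisits = [120, 118, 125, 130, 122, 119, 127, 124, 980, 126];
console.log(findSpikes(newHomepageVisits)); // [ 8 ]
console.log(conversionGap(200, 260)); // 0.3
```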
  • Back to the present day…
  • …California DNT tracking law (Sept 2013) expected soon. Yikes… are they disabling tracking??
  • California just asks for DNT visibility (i.e. does your server see the DNT signal?). I'll be track-ed (still)
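The question “does your server see the DNT signal?” can be answered by reading the `DNT` request header. A minimal sketch with Node's built-in `http` module (Node exposes header names in lower case); `readDnt` and the returned labels are hypothetical names, and what you then do with the preference is up to your own policy.

```javascript
const http = require("http");

// Maps the raw DNT header value to a readable label
function readDnt(headers) {
  const value = headers["dnt"];
  if (value === "1") return "opt-out"; // user asked not to be tracked
  if (value === "0") return "consent"; // user explicitly allowed tracking
  return "unset";                      // header absent or unrecognised
}

const server = http.createServer((req, res) => {
  const preference = readDnt(req.headers);
  // Here we simply echo the preference back; a real site would act on it
  res.setHeader("Content-Type", "text/plain");
  res.end(`DNT preference seen by server: ${preference}\n`);
});

// server.listen(8080); // uncomment to run locally
```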
  • Prevention: Use a tag management system configured with digitalData layer privacy features enabled (see appendix); try to use POST requests rather than GET requests where possible, or a form action=/thankyoupage.html; keep PDF reader, Flash & Java updated; lock down FTP to a fixed set of static IPs, and use 2-stage authentication for GTM write access.
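The POST advice matters because a form submitted with GET puts every field into the URL, and the URL is what analytics tags record as the page path. Where a GET can't be avoided, one option is to scrub the path before any tag sees it. A minimal sketch; the deny-list of parameter names, the "contains @" test and the name `redactPagePath` are assumptions to adapt to your own forms.

```javascript
// Hypothetical deny-list of parameter names that usually carry personal data
const PII_KEYS = new Set(["email", "mail", "name", "tel", "phone", "postcode", "zipcode"]);

// Returns path + query with suspicious parameters removed
function redactPagePath(urlString) {
  const url = new URL(urlString);
  // Copy the keys first, because deleting while iterating would skip entries
  for (const key of [...url.searchParams.keys()]) {
    const value = url.searchParams.get(key);
    if (PII_KEYS.has(key.toLowerCase()) || value.includes("@")) {
      url.searchParams.delete(key);
    }
  }
  return url.pathname + url.search;
}

console.log(redactPagePath(
  "https://www.yourdomain.com/thankyoupage.html?mail=bob@gmail.com&utm_medium=email"
)); // /thankyoupage.html?utm_medium=email
```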
  • This is how things should be… Closing remarks: users should not need to rely on despicable class action lawyers; Google acts even more responsibly; enforcers are just watchers, not needing to intervene; Facebook introduces a more human privacy interface.
  • Party tonight: 19:30 NVMERI; 20:10 MyCool King + DJ Trush; 21:00 Charlie Straight; 22:15 midi lidi. May the Marketing Force be with you!
  • Questions
  • Appendix…
  • Small favour: http://www.digitalanalyticsassociation.org/?page=codeofethics (Google for “DAA code of ethics”). Please sign! Also see the UK Institute of Analysts code of conduct.
  • DISCLAIMER: I'm not a lawyer.
    GA terms of service: http://www.google.com/analytics/terms/us.html and http://www.google.com/analytics/learn/privacy.html
    Privacy troubleshooter: http://support.google.com/bin/static.py?hl=en&ts=1291807&page=ts.cs
    Report a privacy concern: http://www.google.com/contact/
    Contact Google Analytics: http://support.google.com/analytics/bin/request.py?hlrm=en&contact_type=contact_policy and https://support.google.com/adwords/answer/8206?contact=1&rd=1
    Report a security concern: security@google.com and http://www.google.com/security.html
  • Discussion questions: How much is your data worth? Can you afford to drive traffic in the dark with no insight? Is PII, sensitive data or URLs being accidentally tracked? Can competitors detect that PII data is being sent into GA? Are you in a very competitive industry? When was the last time you audited your WA installation? Are you capturing data that easily allows an individual to be “linked” or “re-identified” by Google (e.g. the detailed demographic data example, or the Netflix.com + IMDB.com example1 or example2)?
  • Related presentations & resources:
    CookieTAB virus screenshots: https://www.dropbox.com/s/w0gprycb23ajguw/2011_03_18%20CookieTAB%20virus%20screenshots%20.pptx
    Effect of EU Cookie law on US businesses: https://www.dropbox.com/s/ces1m53mm7o4gmm/2012-1004%20GAUGE%20Boston%20%20Effect%20of%20EU%20Cookie%20law%20on%20US%20organisations.pptx
    Recipe for a Cookie Law: https://www.dropbox.com/s/l9n3gchusdv57bm/2011_03_18%20Recipe%20for%20a%20Cookie%20Law%20by%20Phil%20Pearce%20.pptx
    Cookie law implementation examples: https://www.dropbox.com/s/7q8qfxesk44tpkc/Implimentation%20Examples%20by%20Phil%20Pearce%202012_03_18.pptx
    Cookie compliance audit (example .docx): https://www.dropbox.com/s/idyrql6c1aniaw6/01%20UK%20Cookie%20compliance%20Audit%20-%20Example.docx
    CookieLaw research in 90mb Dropbox: https://www.dropbox.com/s/uapu90d7rc2uxl1/2012_Cookie_Law_Resources_Folder_40mb_Download.zip
  • Appendix: external privacy feedback mechanisms:
    safeharbor.export.gov/companyinfo.aspx?id=16626
    feedback-form.truste.com/watchdog/request?url=www.google.com
    www.bbb.org/sanjose/business-reviews/internet-services/google-in-mountain-view-ca214105/file-a-complaint
    www.networkadvertising.org/contact-support/report-problem/i-would-report-violation-of-naicode-nai-member-company-2
    www.snapsurveys.com/swh/surveylogin.asp?k=133707671186 [ICO.gov.uk form]
    addons.mozilla.org/en-US/firefox/addon/privacy-dashboard/ [W3C feedback mechanism]
    www.google.com/trends/explore?hl=en#cat=0-14-54-1281&geo=US&date=today%203m&cmpt=q [user web searches in category of “privacy” per country]
    Security & privacy prize of up to £13K offered by Google for detecting holes: www.google.com/about/appsecurity/reward-program/ and blog.chromium.org/2012/08/announcing-pwnium-2.html
    Example XSS hole in GA found in 2008: derkeiler.com/Mailing-Lists/Full-Disclosure/200812/msg00200.html
    Open-source feedback techniques: fourthparty.info/data and appanalysis.org/download.html
    Free-to-check cookie databases: www.cookielaw.org/cookie-search.aspx?domain=http://www.facebook.com, www.cookiecert.com/cookies-for-facebook.com, privacyscore.com/score_details/2a03b4fe8d9d4eb8b4fb0ccf356cbaaa/showcase
  • Privacy by Design: Client-side values. Sourced from the customer experience digitalData Privacy sub-group, on JavaScript objects with privacy metadata:
    var digitalData = window.digitalData || {};
    digitalData.privacy = digitalData.privacy || {};
    digitalData.privacy.mapping = {
      "page": "public", // describes the page itself
      "product": "public", // further describes the product associated with the page
      "cart": "identifiable",
      "transaction": "sensitive",
      "transaction total": "private @analytics", // defined as private, but we make an exception for our analytics tools
      "event": "public",
      "privacy": "public", // everyone should be able to see the general privacy definitions and policy
      "privacy mapping": "@system", // in this example, we decide we don't want to expose the exact policy mapping
      "user": "identifiable",
      "user segment": "public"
    };
    // PP: Proposed digitalData layer privacy object for VISITOR
    // (assumes the geoPlugin script has already defined the geoplugin_* globals)
    digitalData.visitor = {
      "returningStatus": "new", // new or returning visitor: used to only trigger the consent message for new visitors
      "preferenceForDNT": window.navigator.doNotTrack || "not specified", // browser-reported DNT value (typically "1", "0" or "unspecified"); MUST default to "not specified"
      "anonymizeIp": false, // GA IP anonymization sets the last octet of the IPv4 address to zero. Defaults to off/false.
      "geoplugin_status": geoplugin_status, // 200 = look-up OK, 403 = error
      "geoIPcountryCode": geoplugin_countryCode, // geoPlugin JS variable
      "geoIPcontinentCode": geoplugin_continentCode // geoPlugin JS variable
    };
    // Server-side USER values on login or registration
    digitalData.user = {
      "profile": {
        "auth_isSignedIn": true,
        "auth_isNewRegistration": true, // used to only trigger the consent message on first registration
        "auth_userIDtoSessionIDoveride": false,
        "profileID": 12345
      }
    };
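A tag manager could consume the proposed privacy mapping by forwarding only the data each tool is entitled to. A minimal sketch, assuming a subset of the classification strings from the proposal and a default-deny rule for keys missing from the mapping; `allowedFor` and `filterForTool` are hypothetical helpers, not part of any digitalData specification.

```javascript
// Subset of the proposed mapping (values copied from the proposal)
const mapping = {
  page: "public",
  cart: "identifiable",
  transaction: "sensitive",
  "transaction total": "private @analytics",
  user: "identifiable",
  "user segment": "public",
};

// "public" goes everywhere; "private @tool" goes only to the named tool
function allowedFor(tool, classification) {
  if (classification === "public") return true;
  return classification.split(/\s+/).includes("@" + tool);
}

// Keeps only the keys the given tool may receive; unmapped keys are dropped
function filterForTool(data, tool) {
  const out = {};
  for (const [key, value] of Object.entries(data)) {
    const classification = mapping[key];
    if (classification && allowedFor(tool, classification)) {
      out[key] = value;
    }
  }
  return out;
}

const pageData = {
  page: "/checkout/thanks",
  "transaction total": 49.99,
  user: "bob@example.com",
  "user segment": "returning",
};
console.log(Object.keys(filterForTool(pageData, "analytics")));
// [ 'page', 'transaction total', 'user segment' ]
```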
  • Privacy by Design: Server Response. Effectively this is saying…
    serverResponseForDNT = obeyDNT | ignoresDNT | inprogressDNT | "not specified"
    From the DNT Preference Expression spec: http://www.w3.org/TR/tracking-dnt/#status-representation and http://www.w3.org/2011/tracking-protection/drafts/tracking-dnt.html#status-representation
    {
      "targeting": "yes", // IsOnlineBehaviouralTargeting for publishers OR onsite remarketing for advertisers enabled?
      "tracking": "yes", // Is AudienceMeasurementTracking enabled?
      "qualifiers": "afc", // external "A"udit + "F"raud prevention + ad-frequency "C"apping
      "controller": "http://www.yourdomain.com/privacy.html",
      "same-party": ["google-analytics.com", "stats.g.doubleclick.net", "api.youtube.com"],
      "third-party": ["googleadservices.com", "ads.doubleclick.net"],
      "audit": ["http://policy.cookiereports.com/caf4f823-en-gb.html"], // e.g. w3.org/P3P/validator.html
      "policy": "/privacy.html#cookies",
      "edit": "http://www.yourdomain.com/user-dashboard/edit-your-data"
    }