Big Data & Privacy
Assoc. Prof. Abzetdin ADAMOV
CeDAWI - Center for Data Analytics and Web Insights
Qafqaz University
aadamov@qu.edu.az
http://ce.qu.edu.az/~aadamov
IDC’s Security Roadshow 07 June 2016
Digital Universe Volume
• 2003 – 5 exabytes created since the beginning of civilization
• 2005 – 130 exabytes
• 2008 – 480 000 petabytes (PB)
• 2009 – 800 000 PB
• 2010 – 1 200 000 PB, or 1.2 zettabytes (ZB)
• 2011 – 1.8 ZB
• 2012 – 2.7 ZB
• 2014 ~ 6.2 ZB
• 2015 ~ 10 ZB
• Expected to reach 44 ZB by 2020
Every day now we create as much information as we
did from the dawn of civilization up until 2003
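The list above mixes exabytes, petabytes and zettabytes. As a rough sketch, assuming decimal (SI) units (1 PB = 10^15 bytes, 1 EB = 10^18 bytes, 1 ZB = 10^21 bytes), the figures can be put on one scale as follows. The helper name to_zettabytes is illustrative only and does not come from the slides.

```python
# Decimal (SI) storage units, expressed in bytes (assumption: SI, not binary, units).
PB = 10**15  # petabyte
EB = 10**18  # exabyte
ZB = 10**21  # zettabyte


def to_zettabytes(amount, unit):
    """Convert `amount` of `unit` (given as bytes per unit) to zettabytes."""
    return amount * unit / ZB


# Figures copied from the slide: (year, amount, unit).
volumes = [
    (2003, 5, EB),
    (2005, 130, EB),
    (2008, 480_000, PB),
    (2009, 800_000, PB),
    (2010, 1_200_000, PB),
    (2011, 1.8, ZB),
    (2012, 2.7, ZB),
    (2014, 6.2, ZB),
    (2015, 10, ZB),
]

for year, amount, unit in volumes:
    print(f"{year}: {to_zettabytes(amount, unit):.3f} ZB")
```

On this scale the 2010 entry confirms the slide's own equivalence of 1 200 000 PB and 1.2 ZB.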
CeDAWI Research Center, IDC’s Security
Roadshow 07 June 2016
Each Second Online
• 25 terabytes transferred across the Internet
• 9 websites created (172 000 per day)
• 1 800 000 SPAM emails sent
• 4 100 Photos posted on Facebook (355 mln per day)
• 5 000 Instagram photos uploaded
• 1 500 Skype calls made
• 4 000 Tweets tweeted (340 mln per day)
• 10 000 Dropbox files uploaded
• 45 000 Google searches made (3.5 bln per day)
• 92 000 YouTube videos viewed
• 55 000 Facebook likes
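A per-second rate converts to a per-day rate by multiplying by 86 400 seconds. The sketch below applies that to some of the rates listed above. It is a simple check under the assumption of a constant rate over the whole day, so the results only approximate the per-day figures quoted in parentheses, and not every quoted figure matches a straight multiplication. The function name per_day is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86 400 seconds


def per_day(per_second):
    """Scale a constant per-second rate to a per-day total."""
    return per_second * SECONDS_PER_DAY


# Per-second rates copied from the slide.
per_second_rates = {
    "Websites created": 9,
    "Photos posted on Facebook": 4_100,
    "Tweets": 4_000,
    "Google searches": 45_000,
}

for name, rate in per_second_rates.items():
    print(f"{name}: {per_day(rate):,} per day")
```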
5Vs of Big Data
Big Data refers to extremely large data sets that are analyzed computationally to reveal patterns,
trends and associations, especially those relating to human behavior and interactions.
How Deep is Your Digital Footprint?
Two types of consumers: those who are highly concerned about privacy, and those who feel
okay with openness in exchange for good and cheap services...
Services Powered by Data
• Search, mapping, advertisement, shopping
• Wireless-location, navigation, traffic
• All devices talk to each other (IoT)
• CCTV cameras, video recording in public transport
• Dashboard cameras (video recorders) in cars
• Biometric passports – remote scanning
• Cellphones – location tracking
• GPS-embedded devices
• Internet connectivity
• Phone call metadata
All services we use today are powered by data and generate
huge amounts of data…
What is Big Data?
The answer depends on who you ask…
Businesses
• Increased need to leverage personal and sensitive information
for competitive advantage
• Investment in data sources and data analytics
Criminals
• New opportunities for identity theft
• Sophisticated technology and tools for finding vulnerabilities
Consumers
• Increased awareness of and concern about the collection, use and
disclosure of their personal information
Legislators
• Taking measures to restrict access to and use of personal
information
• Applying significant restrictions on business operations
Total Surveillance
Data analytics is so effective that it can reveal far more than
most people anticipated when the data was stored or shared.
DATA ANALYTICS
Big Questions of Big Data Privacy
• What do companies do with the personal data they collect?
• How do we know that they are doing what they say?
• When exactly are our rights violated?
• Why should it matter to us?
• How can we make Big Data privacy-friendly?
Privacy Viewed from Different Angles
• What personal information (PI) do users agree to provide in exchange for free services?
• Under which conditions do service providers agree to provide
free services?
• Peer-to-peer negotiation on privacy, or a binary choice?
• Free services – are they really free?
Why Does Privacy Matter?
• 898,590,196 records compromised in 4,850 data breaches
since 2005
• TVs have cameras and microphones, and track 100% of use
• Sharing PI with an insurance company in exchange for a lower price
• EULA (End User License Agreement) of Facebook
Things Happening Today
• Devices on utility poles determine which radio stations passing drivers are listening to
• Automatic license-plate readers (visual and RFID) – insurance
• Target inferred from shopping behavior that a teenage customer was pregnant
• Experts at MIT and the Cambridge Police Department have used a machine-learning (ML) algorithm
• Differential pricing for flight tickets, college costs, etc.
• By tracking cell phones, retailers recognize returning customers, just as cookies do online
• Social media and public data make it easy to infer a person's network of friends
• Colleges use predictive modeling to identify students who are at risk of dropping out
• LendUp, a California-based startup, uses social media as a scoring source when providing credit
• Sensors on cell phones can reveal when drivers make dangerous maneuvers
• Heart rates can be inferred from the changes in facial coloration that occur with each beat
Types of Protected Information
Personally Identifiable Information (PII)
• Name
• Postal Address
• Email Address
• Tel/Mobile Number
• Social Security Number
• Bank Account
• Credit/Debit Card Number
• ZIP Code
Sensitive Information
• Race/Ethnicity
• Political Opinion
• Religious Beliefs
• Health/Medical Records
• Marital Status
• Age
• Gender
• Criminal Records
Non-identifiable Information
• Cookie ID
• Static IP Address
• Computer Literacy
• Preferences
• Mouse-clicks, taps, swipes
• Phone call metadata
• IoT Devices
EU-US Safe Harbor Principles
• Notice (Transparency): inform individuals about the purpose of collecting information
• Choice: let individuals decide which information to share, or opt out
• Consent: disclose information to third parties only under the Notice and
Choice principles
• Security: protect PI from loss, misuse, unauthorized access, disclosure and
alteration
• Data Integrity: ensure that PI is reliable, accurate and up to date
• Access: provide individuals with access to their own PI
• Accountability: the operator must be accountable for following the principles
Legislation: Privacy or Innovation
• The “Right to be Forgotten” concept has been discussed in the EU since 2006
• European Data Protection Regulation and Directive 95/46/EC, EU, 2012 (as of May
2014, Google has removed 1,390,838 URLs)
• Russian Personal Data Law, 01 September 2015 – new rules obliging all companies
offering Internet services to store Russian citizens' personal data inside the country
• Senator Jay Rockefeller, great-grandson of oil tycoon John Rockefeller, is as involved in the
"New Oil" as his great-grandfather was in oil (Do Not Track Bill – "do not track" database, new HTTP
request headers for Mozilla Firefox: X-Behavioral-Ad-Opt-Out and X-Do-Not-Track)
• Federal Trade Commission – Google, Facebook, Twitter and Apple are to take care of the privacy
of the data they collect
The regulatory approach has to be balanced so that it does not interfere with
or stop innovation and technology
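The "do not track" headers mentioned above are ordinary HTTP request headers that a browser adds to each request. The following is a minimal sketch of what such a request could look like on the wire. The header value "1", the extra DNT header (the later standardised form, not named on the slide), the host example.com and both function names are assumptions made for illustration. A server is free to ignore these headers.

```python
def opt_out_headers():
    """Return request headers that signal a tracking opt-out."""
    return {
        # The two headers named on the slide; the value "1" is an assumption.
        "X-Do-Not-Track": "1",
        "X-Behavioral-Ad-Opt-Out": "1",
        # Later standardised form; not on the slide, added for comparison.
        "DNT": "1",
    }


def build_get_request(host, path="/"):
    """Build the raw text of an HTTP/1.1 GET request carrying the opt-out headers."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in opt_out_headers().items()]
    # A blank line terminates the header section of an HTTP request.
    return "\r\n".join(lines) + "\r\n\r\n"


print(build_get_request("example.com"))
```

The request is only built as text here and is never sent. The headers express a preference, so honouring them depends on the receiving service and on legislation such as the bill mentioned above.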
Many Thanks
aadamov@qu.edu.az
www.cedawi.org
http://idcitsecurity.com/baku
