In the Internet of Things (IoT), 4.2 billion people create approximately 2.5 quintillion bytes of data each day. Businesses, governments, law enforcement, and hackers mine this data to create digital profiles. Digital profiling has become a common practice for predicting an individual's influencers and behaviors from multiple public and private data sources. In this session, we will discuss how Idiographic Digital Profiling works, how digital footprints are used to construct your digital profile, the impact of digital profiling, and options for taking control of your internet data footprint.
Securing Your Digital Footprint: Idiographic Digital Profiling and the Losing Battle of Controlling Your Data
1. Michael Torres
NVCC Professor, IT & Cybersecurity
Mtorres@nvcc.edu
LinkedIn: http://www.linkedin.com/in/michaelto
Slideshare Presentation Repository: https://www.slideshare.net/MichaelTorres139
NVCC Student Research Volunteers
• Elsie Darko: Profiling Challenges
• Dorian Araneda: Profiling Tools and Services
• Stephen Bilson-Ogoe: Profiling Examples
NOVA CyCON – Cybersecurity Conference
(Cytalks.com)
March 2019
2. CIA's 'Facebook' Program Dramatically Cut Agency's Costs
https://www.youtube.com/watch?v=8BZxLM3BpOk
3. Agenda
• Understanding Digital Profiling & the Role of Data, Information and Knowledge
• Data Anonymization & Re-Identification
• Digital Profiling Framework: A Law Enforcement Perspective
• Digital Profiling Challenges
• How do you take control of your digital footprint?
5. What is Digital Profiling?
• An idiographic approach to technical and intelligence behavioral analysis
• For examining and attributing a particular subject's aggregate direct and indirect digital memories (a.k.a. digital footprint) from electronic media, internet activities, and social interactions
• To 1) construct a digital profile; 2) understand modus operandi; and 3) extrapolate behavioral, ideological, and social influencers, patterns, and predictors
Sources
• Colombini & Colella (2011) Digital profiling: A computer forensics approach
• Colombini & Colella (2012) Digital scene of crime: Technique of profiling users
• Steel, C. (2014) Idiographic Digital Profiling: Behavioral Analysis Based on Digital Footprints
• Grigaliunas & Toldinas (2016) Digital Evidence Investigation Using Habits Attribution
6. Digital Profiling Deconstructed
• Level of Analysis: Particular Subject
• Purpose: Construct, Understand and Extrapolate
• Practice: Idiographic approach is the discovery of “own” or “private” facts and processes
• Process: Examine and Attribute
• Product (Output):
  • Digital Profile
  • Modus Operandi
  • Influencers, patterns, and predictors
7. Idiographic Digital Profiling Goal
Organize digital intelligence regarding a subject into a timely, actionable profile
• Cross-site Tracking
• Attribution (Re-Identify) & Disambiguation
• Map the Subject’s Digital Presence
• Enumerating Associates
• Providing Subject Insights
Source: Steel (2014) Idiographic Digital Profiling: Behavioral Analysis Based on Digital Footprints
8. What are the outcomes of Digital Profiling?
• Understand flow, intention, purpose, timeline, & contextual variables
• Subject Attribution & Disambiguation
• Extrapolate influencers, patterns, and predictors of:
  • Behavior
  • Ideology
  • Decision Making
  • Social & Familial interaction and association
  • Other Domain-Specific Characteristics (e.g., Criminal, Medical, Performance, Purchasing)
• What's your reason?
9. Digital Profiling Technique
• Identify the goal
• Collect and assess targeted data from mass memory
• Select relevant information and extract indicators
• Match the information in the data (indicators)
• Collect the previously compared information and develop a "digital profile"
• Interpret the result in comparison to the initial goal
Source: Colombini, C., & Colella, A. (2011) Digital profiling: A computer forensics approach
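The six steps above can be sketched on toy data. This is a minimal Python illustration only, not the method from Colombini & Colella: the record fields (`source`, `kind`, `value`) and the helpers `extract_indicators`, `match_indicators`, and `build_profile` are hypothetical names, and every record is synthetic.

```python
from collections import defaultdict

# Synthetic, invented records standing in for data collected from mass memory.
RECORDS = [
    {"source": "laptop", "kind": "email", "value": "user@example.com"},
    {"source": "phone", "kind": "email", "value": "user@example.com"},
    {"source": "phone", "kind": "wifi", "value": "CoffeeShopGuest"},
    {"source": "laptop", "kind": "wifi", "value": "CoffeeShopGuest"},
    {"source": "laptop", "kind": "nickname", "value": "night_owl"},
]


def extract_indicators(records, relevant_kinds):
    """Select relevant records and reduce each one to an indicator tuple."""
    return [(r["source"], r["kind"], r["value"])
            for r in records if r["kind"] in relevant_kinds]


def match_indicators(indicators):
    """Keep only indicators that recur across more than one source."""
    sources_by_indicator = defaultdict(set)
    for source, kind, value in indicators:
        sources_by_indicator[(kind, value)].add(source)
    return {ind: srcs for ind, srcs in sources_by_indicator.items()
            if len(srcs) > 1}


def build_profile(goal, records, relevant_kinds):
    """Chain the steps: collect, select, match, assemble, interpret."""
    indicators = extract_indicators(records, relevant_kinds)
    matched = match_indicators(indicators)
    profile = {kind: value for (kind, value) in sorted(matched)}
    # Interpretation is reduced here to one question: did any indicator
    # link the sources named in the goal?
    return {"goal": goal, "profile": profile, "goal_met": bool(matched)}


result = build_profile("link laptop and phone to one user", RECORDS,
                       relevant_kinds={"email", "wifi", "nickname"})
print(result)
```

The nickname appears on only one device, so it is dropped at the matching step; only the indicators shared by both devices end up in the profile.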
10. What is the Difference and Relationship Between Data vs Information vs Knowledge?
11. What is data?
Structure of Data
• Structured: Organized and searchable data
• Unstructured: Lacks an internal structure that complies with standard or proprietary data models or schemas
Data Categories
• Data: Description of things, events, activities, and transactions that are recorded, classified, and stored but are not organized
• Metadata: Data about data
State of Data
• In-Transit: “Active” data that is being processed or transferred
• At-Rest: “Inactive” data stored in its intended storage location
Types of Data
• Volatile: Data that is discarded automatically
• Temporary: User data the system manages without user action
• Persistent: User-managed data
12. Role of Data in Digital Profiling
• Any type of data (structured or unstructured) and metadata, in any state (‘In-Transit’ or ‘At-Rest’), that can directly or indirectly be associated with a subject for the purpose of extrapolating influencers, patterns, and predictors.
• Potential data sources include, but are not limited to:
  • Any Device Type (e.g., Computers, Wearable Devices, GPS)
  • Open Data Sources (e.g., Internet)
  • Social Data Sources (e.g., Instagram)
  • Private Data Sources (e.g., Corporate Systems)
  • Protected Data Sources (e.g., PHI & PII)
  • Semi-Private Data Repositories (e.g., Credit Reporting Agencies & Research Firms)
  • Public, Semi-Public, & Private Data Exchanges (e.g., Libraries, Medical, Legal)
14. Role of Knowledge in Digital Profiling
Construct Information and Acquire Knowledge
• Tacit Knowledge (Subjective/Experiential)
• Explicit Knowledge (Objective/Technical)
Knowledge that comprises part of an organization’s or individual’s ‘memory’, ideology, or culture, and that may be used to understand, influence, or manipulate.
Graphic Source: Rainer & Cegielski (2012) Introduction to information systems
15. Goals of Data Analysis
Data = What you have
Information = What you Understand
Knowledge = What you Retain
16. Here’s Why Big Data Could Be Dangerous | Fortune
https://www.youtube.com/watch?v=lJ6AOpzk7VA
18. Common Types of Personally Identifiable/Attributable Data
• Personally Identifiable Information (PII)
• Protected Health Information (PHI)
• Personally Identifiable Financial Information (PIFI)
19. 5 Levels of Identifiability
• Individual Direct Identifier: Name, Address, SSN
• Individual Indirect Identifier: DoB, Zip Code, Medical Record Number, IP address, Geolocation
• Linked to multiple individuals: Movie & Retail Preferences
• Not linkable to an individual: Aggregated Census data, survey results
• Not related to individuals: Weather
Source: https://georgetownlawtechreview.org/re-identification-of-anonymized-data/GLTR-04-2017/
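As a rough illustration, the five levels can be modeled as an ordered scale so that a dataset is rated by its most identifying field. This is a sketch under assumptions: the `Identifiability` enum, the `FIELD_LEVELS` mapping, and the field names are hypothetical and only mirror the examples listed on the slide.

```python
from enum import IntEnum


class Identifiability(IntEnum):
    """Lower value means more identifying, mirroring the five levels."""
    DIRECT_IDENTIFIER = 1
    INDIRECT_IDENTIFIER = 2
    LINKED_TO_MULTIPLE = 3
    NOT_LINKABLE = 4
    NOT_RELATED = 5


# Hypothetical field names mapped to the slide's example categories.
FIELD_LEVELS = {
    "name": Identifiability.DIRECT_IDENTIFIER,
    "address": Identifiability.DIRECT_IDENTIFIER,
    "ssn": Identifiability.DIRECT_IDENTIFIER,
    "date_of_birth": Identifiability.INDIRECT_IDENTIFIER,
    "zip_code": Identifiability.INDIRECT_IDENTIFIER,
    "medical_record_number": Identifiability.INDIRECT_IDENTIFIER,
    "ip_address": Identifiability.INDIRECT_IDENTIFIER,
    "geolocation": Identifiability.INDIRECT_IDENTIFIER,
    "movie_preferences": Identifiability.LINKED_TO_MULTIPLE,
    "retail_preferences": Identifiability.LINKED_TO_MULTIPLE,
    "census_aggregate": Identifiability.NOT_LINKABLE,
    "survey_results": Identifiability.NOT_LINKABLE,
    "weather": Identifiability.NOT_RELATED,
}


def most_identifying(fields):
    """Return the most identifying level among known fields, or None."""
    levels = [FIELD_LEVELS[f] for f in fields if f in FIELD_LEVELS]
    return min(levels) if levels else None


print(most_identifying(["weather", "zip_code", "movie_preferences"]).name)
```

A dataset containing only weather and ZIP code columns would still rate as an indirect identifier, because a single indirect field is enough to raise the risk of the whole record.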
20. Identifiable vs De-identify vs Re-Identify
Source: http://caristix.com/blog/2010/12/de-identifying-patient-data-part-2/
21. The Four Types of Data Scrubbing (De-Identification)
• Deletion or Redaction: Entirely remove or redact any data that directly identifies a person
• Pseudonymization: Replace data with pseudonyms that are either randomly generated or determined by an algorithm
• Statistical Noise: Uses ‘static’ to obscure the identity of the individuals involved. Techniques include generalization, perturbation, and swapping
• Aggregation: The dataset is aggregated and only a summary statistic or subset is released
Source: https://georgetownlawtechreview.org/re-identification-of-anonymized-data/GLTR-04-2017/
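The four scrubbing types can be illustrated on a toy dataset. The sketch below is an assumption-laden example, not a compliance-grade implementation: the records are synthetic, the helper names are hypothetical, pseudonymization is shown with a keyed hash (one possible algorithm among many), and statistical noise is represented only by generalization; perturbation and swapping are omitted.

```python
import hashlib
import hmac
from collections import Counter

# Synthetic records; every value is invented.
PATIENTS = [
    {"name": "Alice Example", "zip": "20110", "age": 34, "diagnosis": "flu"},
    {"name": "Bob Sample", "zip": "20111", "age": 37, "diagnosis": "flu"},
    {"name": "Cara Test", "zip": "22030", "age": 52, "diagnosis": "asthma"},
]

# Assumption: the key is stored separately from the released data.
SECRET_KEY = b"demo-key-not-for-production"


def redact(record, fields=("name",)):
    """Deletion or redaction: drop directly identifying fields."""
    return {k: v for k, v in record.items() if k not in fields}


def pseudonymize(record, field="name"):
    """Pseudonymization: replace a value with a keyed-hash pseudonym."""
    digest = hmac.new(SECRET_KEY, record[field].encode(), hashlib.sha256)
    return {**record, field: digest.hexdigest()[:12]}


def generalize(record):
    """Generalization: coarsen ZIP to 3 digits and age to a 10-year band."""
    low = record["age"] // 10 * 10
    return {**record,
            "zip": record["zip"][:3] + "**",
            "age": f"{low}-{low + 9}"}


def aggregate(records, field="diagnosis"):
    """Aggregation: release only summary counts, not individual rows."""
    return dict(Counter(r[field] for r in records))


print(redact(PATIENTS[0]))
print(pseudonymize(PATIENTS[0]))
print(generalize(PATIENTS[0]))
print(aggregate(PATIENTS))
```

Note that redaction alone leaves ZIP code and age in place, and those are exactly the kind of indirect identifiers that make re-identification possible.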
23. Sweeney 2002 Study on Re-Identification
• Anonymized public medical dataset from the Massachusetts Group Insurance Commission
• Massachusetts voter registration list purchased for $20
Combining two or more datasets can facilitate re-identification
Source: Sweeney (2002) k-Anonymity: A Model for Protecting Privacy
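Combining datasets works because fields left in an "anonymized" release can act as a join key against a named dataset. The following is a minimal Python sketch on synthetic data; the choice of linking fields (`zip`, `birth_date`, `sex`) and the helper name `link` are assumptions made for illustration and are not stated on the slide.

```python
# Synthetic "anonymized" medical rows: names removed, other fields kept.
MEDICAL = [
    {"zip": "02138", "birth_date": "1950-01-15", "sex": "F",
     "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1962-07-04", "sex": "M",
     "diagnosis": "asthma"},
]

# Synthetic voter list: names alongside the same fields.
VOTERS = [
    {"name": "Pat Example", "zip": "02138", "birth_date": "1950-01-15", "sex": "F"},
    {"name": "Sam Sample", "zip": "02139", "birth_date": "1962-07-04", "sex": "M"},
    {"name": "Lee Test", "zip": "02139", "birth_date": "1980-03-22", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")  # assumed linking fields


def link(medical, voters, keys=QUASI_IDENTIFIERS):
    """Join the two datasets on the shared fields; keep unique matches."""
    index = {}
    for voter in voters:
        index.setdefault(tuple(voter[k] for k in keys), []).append(voter["name"])
    matches = []
    for row in medical:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # exactly one candidate re-identifies the row
            matches.append((names[0], row["diagnosis"]))
    return matches


print(link(MEDICAL, VOTERS))
```

In this toy data both medical rows match exactly one voter, so both diagnoses are tied back to a name even though the medical dataset never contained one.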
25. Profiling Framework: Analysis Domains
General framework for the analysis and the acquisition of digital evidence
Source: Grigaliunas & Toldinas (2016) Digital Evidence Investigation Using Habits Attribution
26. Profiling Framework: Attribution Domain
Cause of Behavior
• Internal attribution subdomain (Personality Traits): The process of assigning the cause of behavior to some internal characteristic (e.g., attributing a person's behavior to their personality, motives, or beliefs)
• External attribution subdomain (Situation or Event): The process of assigning the cause of behavior to some situation or event outside a person’s control (e.g., situational or environmental features)
Source: Grigaliunas & Toldinas (2016) Digital Evidence Investigation Using Habits Attribution
27. Profiling Framework: Profile Domain
Profiling Methodologies
• Inductive (Analysis Goal: Generalize the behavior)
  • FBI Model, Three Stages: Input, Decision Process, Criminal Profile
  • Investigative Psychology, Five Profile Factors: Interpersonal Coherence, Time/Place Significance, Criminal Career, Criminal Characteristics, Forensic Awareness
• Deductive (Analysis Goal: Specific Behavior)
  • Behavioral Evidence Analysis, Two Phases: Investigative, Trial
  • Digital Behavioral Analysis, Approach: Idiographic Digital Profiling (IDP)
IDP Benefits
• Aggregate Digital Footprint
• Developed iteratively
• Subject disambiguation
Source: Grigaliunas & Toldinas (2016) Digital Evidence Investigation Using Habits Attribution
28. Profiling Framework: Habit Domain
Profiling Approach
Objective
• Subject disambiguation
• Subject identification
• Lead generation
Goals
• Exposure Context: habit representation & form habitual response
• Habit Activation: Determine what causes the habit
Source: Grigaliunas & Toldinas (2016) Digital Evidence Investigation Using Habits Attribution
29. What typically constructs a Digital Profile?
Digital Biography Information
• Identifiers
• Passwords
• Sites Visited
• IP Addresses
• Locations
• Associates
Affinity/Competency Axes
• Technical Ability
• Countermeasures
• Sociability
• Domain Knowledge
• Computer Scientist
• Associates
Source: Steel (2014) Idiographic Digital Profiling: Behavioral Analysis Based on Digital Footprints
31. The Legal Challenge
Is digital profiling legal? YES
Currently, there are no laws explicitly preventing or regulating the practice of digital profiling.
However, the following factors may make a particular use illegal and should be considered:
• How data is obtained
• From whom the data is obtained
• The actions taken with the data
32. The Legal Challenge
What is covered?
Year | Law | Record Type | Info Type | Access | Disclosure | Unauthorized Disclosure | Notification
1996 | Health Insurance Portability and Accountability Act (HIPAA) of 1996 | Medical | PHI | X X X
2003 | HIPAA Privacy Rule of 2003 | Medical | PHI | X X X
2005 | HIPAA Security Rule of 2005 | Medical | PHI |
2013 | Health Information Technology for Economic and Clinical Health (HITECH) Act | Medical | PHI |
1999 | Gramm-Leach-Bliley Act (GLBA) | Financial | PIFI | X X X
1974, 2012 | Family Educational Rights and Privacy Act (FERPA) | Education | PII | X X X
1974 | Privacy Act (PA) | General | PII | X X X
Year | Law | Record Type | Info Type | Use | Data Exchange | Reasonable Security Protections
1996 | Health Insurance Portability and Accountability Act (HIPAA) of 1996 | Medical | PHI | X
2003 | HIPAA Privacy Rule of 2003 | Medical | PHI | X
2005 | HIPAA Security Rule of 2005 | Medical | PHI | X
2013 | Health Information Technology for Economic and Clinical Health (HITECH) Act | Medical | PHI | X X
1999 | Gramm-Leach-Bliley Act (GLBA) | Financial | PIFI | X X
1974, 2012 | Family Educational Rights and Privacy Act (FERPA) | Education | PII | X | X | X
1974 | Privacy Act (PA) | General | PII | X
33. The Legal Challenge
Year | Law | Record Type | Info Type | De-Identification | Re-Identification | Self-Disclosure
1996 | Health Insurance Portability and Accountability Act (HIPAA) of 1996 | Medical | PHI | | |
2003 | HIPAA Privacy Rule of 2003 | Medical | PHI | X | |
2005 | HIPAA Security Rule of 2005 | Medical | PHI | | |
2013 | Health Information Technology for Economic and Clinical Health (HITECH) Act | Medical | PHI | | |
1999 | Gramm-Leach-Bliley Act (GLBA) | Financial | PIFI | | |
1974, 2012 | Family Educational Rights and Privacy Act (FERPA) | Education | PII | | |
1974 | Privacy Act (PA) | General | PII | | |
De-identification is treated as an assumption in legal interpretation
Anonymization and Re-Identification is Assumed
34. The Legal Challenge
Current Laws
• Access
• Disclosure
• Use
• Reasonable Security Protections
Some Laws
• De-Identification or Anonymization
No Laws (Legal Loophole)
• Re-Identification or De-Anonymization
• Self-Disclosure
35. The Corporate Challenge
HR professionals, recruiters, and managers may use digital profiling for:
• Recruitment: Find and assess potential employees
• Workplace Performance: Continuously evaluate the character of their employees
Recruitment studies show
• 65% of recruiters Google a candidate during the hiring process.
• 50% of recruiters have turned down a candidate based on what they found on the candidate's Facebook, Instagram, Twitter, or LinkedIn.
• LinkedIn largely influences recruiters’ decisions and is widely popular among employers.
Sources
• Frith (2016) Candidates' social media profiles influence hiring decisions
• Sameen & Cornelius (2015) Social Networking Sites and Hiring: How Social Media Profiles Influence Hiring Decisions
36. How New Yorkers Really Feel About “Surveillance Capitalism”
https://www.youtube.com/watch?v=xl6WHHlpkZc (4.5 min)
37. Data Broker Challenge
• Data brokers are collecting thousands of personal attributes for each individual by
  • Buying or licensing data
  • Mining public records, social networks, and other online sources
  • Electronic Data Interchanges
  • Just asking you for the information
  • FREE if you sign up
  • Not reading user license agreements (ULAs)
Your Data is Being Harvested and Purchased Every Day Through Data Brokers
38. Data Broker Challenge
Who is using this data?
• Credit reporting companies to assess financial risk
• Health insurance companies to assess the state of your health (purchasing DNA data)
• Advertisers and marketers to influence your purchasing through targeted marketing
• Political consultants attempting to influence your vote
• Government agencies pursuing non-violent criminal suspects
• Anyone with a credit card and a people-search website
Your Data is Being Harvested and Purchased Every Day Through Data Brokers
43. The Social Media Challenge
Every minute:
• Snapchat users share 527,760 photos
• More than 120 professionals join LinkedIn
• Users watch 4,146,600 YouTube videos
• 456,000 tweets are sent on Twitter
• Instagram users post 46,740 photos
Source: https://www.bernardmarr.com/default.asp?contentID=1438
44. The Social Media Challenge
• Internet users have an average of 7.6 social media accounts
• A 2011 study by AOL/Nielsen showed that 27 million pieces of content were shared every day; today, 3.2 billion images are shared each day
• Google: Processes an average of 40,000 search queries every second
• Facebook: 79% of all online US adults use Facebook
• Twitter: 500 million Tweets are sent each day, or roughly 6,000 Tweets every second
• YouTube: 300 hours of video are uploaded to YouTube every minute
• Instagram: Over 95 million photos are uploaded and 4.2 billion Likes are given each day
• LinkedIn: Over 3 million companies have LinkedIn accounts
Source: https://www.brandwatch.com/blog/amazing-social-media-statistics-and-facts/
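The per-day and per-second figures above can be cross-checked with simple arithmetic. This is a short Python sketch; the helper names `per_second` and `per_day` are hypothetical, and the inputs are the figures quoted on the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(daily_total):
    """Convert a per-day total into an average per-second rate."""
    return daily_total / SECONDS_PER_DAY


def per_day(rate_per_second):
    """Convert an average per-second rate into a per-day total."""
    return rate_per_second * SECONDS_PER_DAY


tweets_per_second = per_second(500_000_000)
searches_per_day = per_day(40_000)

print(round(tweets_per_second))  # 5787, which the slide rounds to 6,000
print(searches_per_day)          # 3456000000, about 3.5 billion searches
```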
45. The Social Media Challenge
• 37% of consumers find purchasing inspiration from social networks
Source: https://sproutsocial.com/insights/social-media-statistics/
46. The biggest threat to controlling your digital footprint is You
You’re the Problem
48. How do you take control of your digital footprint?
49. Best Practices
• Don’t sign up for unnecessary stuff, even if it's free
• Question every request for information
• Just because someone asks you for information doesn’t mean you have to give them anything
• Make a habit of profiling yourself and going through the arduous process of deleting your data
• Go into your email and social media accounts and set your privacy settings
• All smart digital assistants record and create vocal profiles for an “improved experience”; question whether you need one
• Yearly ‘Friends’ purge
51. Search Engine Tools
Search engine content removal
• Google Remove outdated content Instructions
• Google Search Console - My Removal Requests
• Bing Content Removal Tool - Bing Webmaster Tools
Information search tools
• Image search yourself on Google and Bing
• Internet Archive: Search Engine
• Internet Archive: Wayback Machine
• Foreverdata.org
• Just Delete Me | A directory of direct links to delete your account from web services
• Have I Been Pwned: Check if your email has been compromised in a data breach
52. Email and Social Media Privacy Settings
• Apps with access to your Google account: https://myaccount.google.com/permissions?pli=1
• Google Account Settings: https://myaccount.google.com/?pli=1
• Facebook Apps and Websites Management: https://www.facebook.com/settings?tab=applications
• Twitter 3rd party links: https://twitter.com/settings/applications
53. Online Services for a Fee
Online services to minimize or eliminate your online presence
• DMAchoice, a mail preference service offered by the DMA: https://dmachoice.thedma.org/static/about_dma.php
• DeleteMe - Privacy Protection Services: https://abine.com/deleteme/
• Deseat.me: https://www.deseat.me/