Rediscovering the hidden Facebook semantic
search engine and reimagining open-source
intelligence
unchained Graph Search
akos.bardoczi.ch@ieee.org
Some important things about OSINT
• In most cases, the most efficient technique is
little known and hidden, but not (too) difficult to
use!
• The misunderstood deep web: the tale about
the size of the deep web is based on 16-year-old
research…
• The misunderstood deep web #2: it usually
cannot provide up-to-date, relevant information
About Facebook Graph Search
• Announced by FB in January 2013
• FB almost immediately killed the Advanced
search box due to privacy concerns – but the
search options remain available
• The decline of semantic search: machine
learning fell short, partly because users
submitted too few complex search queries
• After 4 years, still available only in US English
Example
• “Budapest University of Technology and
Economics students who are Budapest, Hungary
residents and like Shakira” (sic!) – typed into the
search box, this generates a simple keyword
search without relevant results
• The URL
https://www.facebook.com/search/106146106082559/students/106502519386806/residents/5027904559/likers/intersect
generates a smart, semantic search query
The universal scheme of GS’s URI structure
• https://facebook.com/(search)/(str)[n]/string_term[n]/entity_id[m]/keyword[m]/(intersect)
• The “search” segment is optional in some
cases
• The “str” segment [optional] marks simple
keyword terms and must be placed before other
terms
• The “intersect” segment is mandatory in
complex queries
More about the scheme of GS’s URI
• The “entity_id” is the identifier of an entity,
e.g. a name, a place, a religious view, or a
spoken language (see below); you can find it
in the client-side code
• There is no limit on query complexity or
length
• The “keyword” indicates the role of the entity,
which matters: a university, for example, can be
a physical location, a school, or a workplace
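
The URI scheme described above can be sketched as a small builder. This is only an illustration under stated assumptions: the function name `build_graph_search_url` and its parameters are hypothetical, the host is taken from the example slide, and "complex query" is assumed to mean any query that combines more than one term. It does not check whether Facebook still resolves such URLs.

```python
from urllib.parse import quote

BASE_URL = "https://www.facebook.com"  # host taken from the example slide


def build_graph_search_url(entity_terms, str_terms=()):
    """Assemble a Graph Search style URL following the slide's scheme.

    entity_terms: sequence of (entity_id, keyword) pairs,
                  e.g. ("5027904559", "likers")
    str_terms:    sequence of plain keyword strings; each one is emitted as
                  "str/<term>" and placed before the entity terms.
    """
    segments = ["search"]
    for term in str_terms:
        # Percent-encode so that spaces and non-ASCII text survive in a path.
        segments += ["str", quote(term, safe="")]
    for entity_id, keyword in entity_terms:
        segments += [str(entity_id), keyword]
    # Assumption: a query is "complex" once it combines more than one term.
    if len(str_terms) + len(entity_terms) > 1:
        segments.append("intersect")
    return BASE_URL + "/" + "/".join(segments)


# Rebuild the URL from the Example slide.
url = build_graph_search_url([
    ("106146106082559", "students"),
    ("106502519386806", "residents"),
    ("5027904559", "likers"),
])
print(url)
```

Keeping each (entity_id, keyword) pair together makes the point of the “keyword” visible: the same entity ID paired with a different keyword yields a different query.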
Some important things
• The queries work only with Facebook set to
US English, but:
• the URI may contain any characters after the
“str” part, e.g. Москва or مطار دبي الدولي
• The order of the terms (the “word order of
the sentence”) matters in most cases
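
When such a URI is built by a script rather than typed into the address bar, non-ASCII terms are normally percent-encoded as UTF-8. A minimal sketch using only the Python standard library follows; whether Graph Search treats the raw and the encoded form identically is not stated in the slides.

```python
from urllib.parse import quote, unquote

term = "Москва"

# Percent-encode the term as UTF-8 for use in a URL path segment.
encoded = quote(term, safe="")
print(encoded)  # %D0%9C%D0%BE%D1%81%D0%BA%D0%B2%D0%B0

# Decoding restores the original text.
decoded = unquote(encoded)
print(decoded == term)  # True
```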
What can you search with Graph Search?
• Almost anything!
• In theory, you can find any content, and any
relation between entities and content, that
your permissions allow you to view
Most frequently used Graph Search keywords
…for now, without explanation
pages-liked, photos, photos-by, photos-liked, photos-of,
photos-tagged, photos-commented, videos, videos-by, videos-of,
videos-liked, videos-commented, apps-used, stories-by,
stories-commented, stories-tagged, friends, events, events-joined,
events-interested, places-visited, places-liked, groups, users-named,
home-residents, residents (/present, /past), likers, users-age,
users-born, users-political-view, visitors,
employees (/present, /past), speakers, users-checked-in, photos-in,
videos-in, stories-keyword, date (YYYY), date-2 (MM/YYYY),
date-3 (DD/MM/YYYY), react, studied (/present, /past)
The scope of search in practice
• As mentioned, any content: text in status
updates, comments, image descriptions, images,
geotags, likes, and other reactions on public
pages, on event pages, and on users’ timelines
(even items hidden from the timeline!)
• The full content of open groups, and of closed
and secret groups if you are a group member
• Basically anything except items specifically
deleted by the user
The scope of search in practice 0x200.
• Keep in mind the audience selectors – and
bypass them
• Your scope grows exponentially as you add
friends and join more groups. Note: the average
distance between two randomly chosen users is
3.5 and users have 300 contacts on average, but
the limit is 5000 friends, and you can join 5000
different groups
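
The “exponential” growth can be made concrete with a back-of-the-envelope calculation. The sketch below assumes a deliberately naive model that is not part of the slides: every user has exactly the average 300 contacts and no two friend circles overlap, so the figures are loose upper bounds only.

```python
AVG_CONTACTS = 300  # average number of contacts per user, from the slide


def naive_reach(avg_contacts, hops):
    """Upper bound on profiles within `hops` steps, ignoring any overlap."""
    return avg_contacts ** hops


for hops in range(1, 5):
    print(hops, naive_reach(AVG_CONTACTS, hops))
# 1 300
# 2 90000
# 3 27000000
# 4 8100000000
```

Even under this crude model, the bound passes from tens of millions at three hops to billions at four, which is consistent with an average distance of about 3.5 between two users.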
The scope of search in practice 0x300.
• Depending on your research interest, you will
need a professionally crafted, realistic character
for your Facebook user
• A professionally crafted character [actor] is not
a simple fake profile – and I think this is the
most difficult part – see also OPSEC
OPSEC considerations @ sophisticated
research
• In practice, you cannot run a fully clean
search: the order of results depends on
everything, including your previous searches
• Don’t use widely known anonymizer
techniques such as Tor – FB will detect it!
• The best practices are similar to those of a
forensics lab and of HUMINT
OPSEC considerations @ sophisticated
research 0x200.
• You will need a spare, non-virtual SIM card that has never
been used before
• Depending on the sensitivity of research, you may need a
photoshopped government-issued ID – don’t worry,
nowadays researchers can generate realistic faces [difficult
&& not my business]
• FB reserves the right to deactivate or temporarily
suspend accounts; minimize this risk
OPSEC considerations @ sophisticated
research 0x300.
• It is recommended to use a virtual machine with
default browser settings – see also: browser
fingerprinting
• Once again – do not use Tor! Instead, use a reliable
VPN provider, and keep in mind that your IP address
is associated with an approximate location, which
affects the order of the search results you receive
• Facebook tracks user behavior, e.g. statistics on
typing speed (including what you deleted from a
text field) and the distribution of different
operations: in short, your every click
• Of course, never mix your actor’s behavior and your
own – e.g. don’t send a friend request to someone
you know personally
OPSEC considerations @ sophisticated
research 0x400.
Tailor your actor’s character and behavior to
the specific research field
• This is more complicated than you think
• An ideal actor resembles a secret agent who is
familiar with the language, the culture, and the
language culture (!!) of the given context, e.g.
counterterrorism, social psychology research, or
cyber-threat intelligence
• In some cases you simply don’t need an actor;
you can search via your own account
