SlideShare a Scribd company logo
1 of 30
Download to read offline
I Know What You Saw Last Minute -
The Chrome Browser Case
Ran Dubin1, Amit Dvir2 , Ofir Pele2,3 , Ofer Hadar1
1. Department of Communication System Engineering, Ben-Gurion University of the Negev, Israel.
2. Center for Cyber Technologies, Department of Computer Science, Ariel University, Israel.
3. Department of Electrical and Electronics Engineering, Ariel University, Israel.
About Me
• Ph.D. candidate at Ben-Gurion University, Israel
• Optimization of HTTP adaptive streaming
• Encrypted network traffic classification problems
• Senior data scientist at Seculert
• Seculert develop an automated breach analytics
platform in the cloud.
• Supervised machine learning for detection of
malicious activity within the enterprise network
Agenda
• Motivation
• The scenario
• Our goal
• How can “I know what you saw”?
• Related works
• Proposed algorithm
• Results
Motivation
• Google encourages network privacy:
• “77 percent of Google online traffic is encrypted"1
• “Google started giving HTTPS pages a ranking boost”
• HTTPS keeps your data anonymous:
• “No one will be able to snoop on the traffic — such as your ISP"2
• Let’s try to break it!
[1] http://gadgets.ndtv.com/internet/news/google-reveals-77-percent-of-its-online-traffic-is-encrypted-814191
[2] http://www.makeuseof.com/tag/can-you-really-be-anonymous-online/
The Scenario
•Passive Sniffing:
• Traffic control and optimization
• Open Source Intelligence Techniques (OSINT) vecto𝑟𝑟3
• Web searches, visited sites ..
•YouTube is the world’s leading social
network video platform
• YouTube is used also large protests and propaganda!
• Protecting user privacy and viewing habits is important!
[3] ISPs Sell Your Data to Advertisers, But FCC has a Plan to Protect Privacy, http://thehackernews.com/2016/03/isp-sells-data-to-advertisers.html
Our Goal
•To show that HTTPS2.0 is not enough in
order to protect your viewing habits
• Contribution:
• Dataset
• Data crawler - based on selenium
• New encrypted traffic feature and classification algorithm
Brief Partial Overview of SSL/TLS
• Step (0): browse to:
https://www.youtube.com/watch?v=_b
P6aVG6L1w
• Step (1): use Service Name Indicator
• Step (5): content and header are fully
encrypted
• HTTPS request (URL) is not visible
in the encrypted traffic
• All HTTP headers are encrypted
How Can “I Know What You Saw”?
1. How are YouTube videos encoded?
2. How is the video downloaded?
3. What is the video download behavior in the network?
4. How to tie everything together for a classification?
Introduction To HTTP Adaptive Streaming (HAS)
HAS Diagram HAS Download Diagram
Multi variable bit-rate streaming
* How YouTube Works: https://www.youtube.com/watch?v=UklDSMG9ffU
YouTube Encrypted Network Traffic
Chrome Automatic Mode Chrome Fixed Mode
YouTube Flow Patterns –
The Web Proxy Perspective
• Mixture of audio/video
in a single flow
• HTTP2 - multiplexed
application layer
protocol
• Multi-Bit-Rate Video
EncodingFirst 10 seconds of downloading only video
First 10 seconds of downloading audio + video
YouTube HTTP Byte Range
Fiddler (Video) Stream Request Vs Byte RangeByteRange[Bps]
Request Index
Related Works
1. Most discuss application type classification and not content classification
2. HTTPS classification was found to achieve low accuracy
3. Wright et al. exploit the VBR codec characteristics of encrypted Voice Over
Internet Protocol (VOIP) for language identification
4. Liu et al. and Saponas et al. presented methods for video title classification of
RTP/UDP and TCP internet traffic (not MBR)
5. Changes in video traffic over the Internet:
• HTTP byte range selection over HTTP
• MBR adaptive streaming
• HTTP version 2
Proposed Machine Learning Solution
1. Traffic Analysis
2. Traffic Features
3. Traffic Preprocessing
4. Machine Learning Algorithms
Feature Extraction
1. Many features:
Number of packets in a session, payload size, information bit rate,
Round-Trip Time (RTT), packet time differences
2. Bit Per Peak (BPP): Sum of bytes in each peak after TCP ACK mechanism
3. Why BPP?
• Represent the traffic On/Off behavior
• Real time classification constraints
• Compact feature representation
• Robust to packet loss and delays
BPP Index Vs Download Copy
BPP Index
#Download
Pre-Processing
• With/without audio removal
• <400 Kbytes BPPs are considered as audio
Proposed algorithms
1. Support Vector Machines (SVM) with
Radial Basis Function (RBF)
• With a BPP feature vector
2. Nearest Neighbor Algorithm – NN
• With a set of BPP features
SVM with Radial Basis Function (RBF) Kernel
• SVM RBF maps data to high
dimensional space. The classifier:
• Ongoing work uses SVM with
intersection similarities as features
BPP Set Feature
• 𝑆𝑆𝑖𝑖𝑖𝑖 is a set of Bit-Per-Peak (BPP) features (no duplicates)
• i - video title index
• j - stream index
• Note that each BPP-set may have different cardinality
NN Algorithm
• Similarity score between two BPP-sets is the cardinality of the intersection set:
sim(𝑆𝑆, 𝑆𝑆′) = |𝑆𝑆 ∩ 𝑆𝑆′|
• At test time, each video stream BPP-set, 𝑆𝑆𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡, is classified as the video title 𝑖𝑖
(class) that matches the maximum similarity score to class index. 𝑚𝑚𝑖𝑖 is the
number of streams per title 𝑖𝑖:
Dataset
Train/ test: 30 different titles, each with 100 streams copies (Train- 90, Test -10)
Videos outside of the dataset: 200 additional different video titles (titles not in the
regular dataset used only in testing)
Added delay evaluation: 4 subsets with added delay of 100/300/600/900 ms. (10
titles with 10 different downloads)
Added packet loss evaluation: 4 subsets with added packet loss of 1/3/6/9 % (10
titles with 10 different downloads)
Classification Accuracy
Identification[%]
Training Dataset Size
SVM+RBF
Ours
Confusion Matrices
SVM+RBF (accuracy 72%) Ours (accuracy 98%)
TrueLabel
TrueLabel
Predicted Label Predicted Label
Classification of Unknown Videos:
100% accuracy
Ongoing Results
Accuracy: 93.6%
Predicted Label
TrueLabel
Conclusions
• Created an OSINT vector from YouTube video traffic
• We demonstrated that HTTP2.0 is not protecting your viewing
habits.
• NN algorithm - 98% accuracy
• BPP feature - is robust to high network delays and packet loss
• Ongoing research – 10000 streams of 100 titles, similar results
• Contribution – crawler, dataset and algorithms
Different Network Conditions
SVM+RBF
Ours
SVM+RBF
Ours
Identification[%]
Identification[%]
Additional Packet Loss [%] Additional Delay [ms]

More Related Content

Viewers also liked

Viewers also liked (13)

Fork Lift Cert
Fork Lift CertFork Lift Cert
Fork Lift Cert
 
LAS WEBQUEST
LAS WEBQUESTLAS WEBQUEST
LAS WEBQUEST
 
Vgu vkghu ig
Vgu vkghu igVgu vkghu ig
Vgu vkghu ig
 
Mohammed Fiazuddin - Certificate
Mohammed Fiazuddin - CertificateMohammed Fiazuddin - Certificate
Mohammed Fiazuddin - Certificate
 
Megae Payment
Megae PaymentMegae Payment
Megae Payment
 
LejeA0002
LejeA0002LejeA0002
LejeA0002
 
Didáctica general una página del programa
Didáctica general una página del programaDidáctica general una página del programa
Didáctica general una página del programa
 
Tasks suitable for programming on the web
Tasks suitable for programming on the webTasks suitable for programming on the web
Tasks suitable for programming on the web
 
2016 Auction
2016 Auction2016 Auction
2016 Auction
 
Using flashcards
Using flashcardsUsing flashcards
Using flashcards
 
Moodle.
Moodle.Moodle.
Moodle.
 
WWII 20 Wall Family Papers_Finding Aid
WWII 20 Wall Family Papers_Finding AidWWII 20 Wall Family Papers_Finding Aid
WWII 20 Wall Family Papers_Finding Aid
 
Sitios turistico de norte america
Sitios turistico de norte americaSitios turistico de norte america
Sitios turistico de norte america
 

Recently uploaded

Introduction to Artificial Intelligence and History of AI
Introduction to Artificial Intelligence and History of AIIntroduction to Artificial Intelligence and History of AI
Introduction to Artificial Intelligence and History of AISheetal Jain
 
Dynamo Scripts for Task IDs and Space Naming.pptx
Dynamo Scripts for Task IDs and Space Naming.pptxDynamo Scripts for Task IDs and Space Naming.pptx
Dynamo Scripts for Task IDs and Space Naming.pptxMustafa Ahmed
 
Introduction to Arduino Programming: Features of Arduino
Introduction to Arduino Programming: Features of ArduinoIntroduction to Arduino Programming: Features of Arduino
Introduction to Arduino Programming: Features of ArduinoAbhimanyu Sangale
 
Operating System chapter 9 (Virtual Memory)
Operating System chapter 9 (Virtual Memory)Operating System chapter 9 (Virtual Memory)
Operating System chapter 9 (Virtual Memory)NareenAsad
 
Autodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptxAutodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptxMustafa Ahmed
 
"United Nations Park" Site Visit Report.
"United Nations Park" Site  Visit Report."United Nations Park" Site  Visit Report.
"United Nations Park" Site Visit Report.MdManikurRahman
 
SLIDESHARE PPT-DECISION MAKING METHODS.pptx
SLIDESHARE PPT-DECISION MAKING METHODS.pptxSLIDESHARE PPT-DECISION MAKING METHODS.pptx
SLIDESHARE PPT-DECISION MAKING METHODS.pptxCHAIRMAN M
 
The Entity-Relationship Model(ER Diagram).pptx
The Entity-Relationship Model(ER Diagram).pptxThe Entity-Relationship Model(ER Diagram).pptx
The Entity-Relationship Model(ER Diagram).pptxMANASINANDKISHORDEOR
 
Multivibrator and its types defination and usges.pptx
Multivibrator and its types defination and usges.pptxMultivibrator and its types defination and usges.pptx
Multivibrator and its types defination and usges.pptxalijaker017
 
The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...
The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...
The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...Roi Lipman
 
analog-vs-digital-communication (concept of analog and digital).pptx
analog-vs-digital-communication (concept of analog and digital).pptxanalog-vs-digital-communication (concept of analog and digital).pptx
analog-vs-digital-communication (concept of analog and digital).pptxKarpagam Institute of Teechnology
 
handbook on reinforce concrete and detailing
handbook on reinforce concrete and detailinghandbook on reinforce concrete and detailing
handbook on reinforce concrete and detailingAshishSingh1301
 
21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docx21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docxrahulmanepalli02
 
Electrical shop management system project report.pdf
Electrical shop management system project report.pdfElectrical shop management system project report.pdf
Electrical shop management system project report.pdfKamal Acharya
 
Research Methodolgy & Intellectual Property Rights Series 1
Research Methodolgy & Intellectual Property Rights Series 1Research Methodolgy & Intellectual Property Rights Series 1
Research Methodolgy & Intellectual Property Rights Series 1T.D. Shashikala
 
Filters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility ApplicationsFilters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility ApplicationsMathias Magdowski
 
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdflitvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdfAlexander Litvinenko
 
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdf
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdfInstruct Nirmaana 24-Smart and Lean Construction Through Technology.pdf
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdfEr.Sonali Nasikkar
 
Augmented Reality (AR) with Augin Software.pptx
Augmented Reality (AR) with Augin Software.pptxAugmented Reality (AR) with Augin Software.pptx
Augmented Reality (AR) with Augin Software.pptxMustafa Ahmed
 
Seizure stage detection of epileptic seizure using convolutional neural networks
Seizure stage detection of epileptic seizure using convolutional neural networksSeizure stage detection of epileptic seizure using convolutional neural networks
Seizure stage detection of epileptic seizure using convolutional neural networksIJECEIAES
 

Recently uploaded (20)

Introduction to Artificial Intelligence and History of AI
Introduction to Artificial Intelligence and History of AIIntroduction to Artificial Intelligence and History of AI
Introduction to Artificial Intelligence and History of AI
 
Dynamo Scripts for Task IDs and Space Naming.pptx
Dynamo Scripts for Task IDs and Space Naming.pptxDynamo Scripts for Task IDs and Space Naming.pptx
Dynamo Scripts for Task IDs and Space Naming.pptx
 
Introduction to Arduino Programming: Features of Arduino
Introduction to Arduino Programming: Features of ArduinoIntroduction to Arduino Programming: Features of Arduino
Introduction to Arduino Programming: Features of Arduino
 
Operating System chapter 9 (Virtual Memory)
Operating System chapter 9 (Virtual Memory)Operating System chapter 9 (Virtual Memory)
Operating System chapter 9 (Virtual Memory)
 
Autodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptxAutodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptx
 
"United Nations Park" Site Visit Report.
"United Nations Park" Site  Visit Report."United Nations Park" Site  Visit Report.
"United Nations Park" Site Visit Report.
 
SLIDESHARE PPT-DECISION MAKING METHODS.pptx
SLIDESHARE PPT-DECISION MAKING METHODS.pptxSLIDESHARE PPT-DECISION MAKING METHODS.pptx
SLIDESHARE PPT-DECISION MAKING METHODS.pptx
 
The Entity-Relationship Model(ER Diagram).pptx
The Entity-Relationship Model(ER Diagram).pptxThe Entity-Relationship Model(ER Diagram).pptx
The Entity-Relationship Model(ER Diagram).pptx
 
Multivibrator and its types defination and usges.pptx
Multivibrator and its types defination and usges.pptxMultivibrator and its types defination and usges.pptx
Multivibrator and its types defination and usges.pptx
 
The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...
The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...
The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...
 
analog-vs-digital-communication (concept of analog and digital).pptx
analog-vs-digital-communication (concept of analog and digital).pptxanalog-vs-digital-communication (concept of analog and digital).pptx
analog-vs-digital-communication (concept of analog and digital).pptx
 
handbook on reinforce concrete and detailing
handbook on reinforce concrete and detailinghandbook on reinforce concrete and detailing
handbook on reinforce concrete and detailing
 
21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docx21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docx
 
Electrical shop management system project report.pdf
Electrical shop management system project report.pdfElectrical shop management system project report.pdf
Electrical shop management system project report.pdf
 
Research Methodolgy & Intellectual Property Rights Series 1
Research Methodolgy & Intellectual Property Rights Series 1Research Methodolgy & Intellectual Property Rights Series 1
Research Methodolgy & Intellectual Property Rights Series 1
 
Filters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility ApplicationsFilters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility Applications
 
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdflitvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
 
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdf
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdfInstruct Nirmaana 24-Smart and Lean Construction Through Technology.pdf
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdf
 
Augmented Reality (AR) with Augin Software.pptx
Augmented Reality (AR) with Augin Software.pptxAugmented Reality (AR) with Augin Software.pptx
Augmented Reality (AR) with Augin Software.pptx
 
Seizure stage detection of epileptic seizure using convolutional neural networks
Seizure stage detection of epileptic seizure using convolutional neural networksSeizure stage detection of epileptic seizure using convolutional neural networks
Seizure stage detection of epileptic seizure using convolutional neural networks
 

Eu 16-dubin-i-know-what-you-saw-last-minute-the-chrome-browser-case 5

  • 1. I Know What You Saw Last Minute - The Chrome Browser Case Ran Dubin1, Amit Dvir2 , Ofir Pele2,3 , Ofer Hadar1 1. Department of Communication System Engineering, Ben-Gurion University of the Negev, Israel. 2. Center for Cyber Technologies, Department of Computer Science, Ariel University, Israel. 3. Department of Electrical and Electronics Engineering, Ariel University, Israel.
  • 2. About Me • Ph.D. candidate at Ben-Gurion University, Israel • Optimization of HTTP adaptive streaming • Encrypted network traffic classification problems • Senior data scientist at Seculert • Seculert develop an automated breach analytics platform in the cloud. • Supervised machine learning for detection of malicious activity within the enterprise network
  • 3. Agenda • Motivation • The scenario • Our goal • How can “I know what you saw”? • Related works • Proposed algorithm • Results
  • 4. Motivation • Google encourages network privacy: • “77 percent of Google online traffic is encrypted"1 • “Google started giving HTTPS pages a ranking boost” • HTTPS keeps your data anonymous: • “No one will be able to snoop on the traffic — such as your ISP"2 • Let’s try to break it! [1] http://gadgets.ndtv.com/internet/news/google-reveals-77-percent-of-its-online-traffic-is-encrypted-814191 [2] http://www.makeuseof.com/tag/can-you-really-be-anonymous-online/
  • 5. The Scenario •Passive Sniffing: • Traffic control and optimization • Open Source Intelligence Techniques (OSINT) vecto𝑟𝑟3 • Web searches, visited sites .. •YouTube is the world’s leading social network video platform • YouTube is used also large protests and propaganda! • Protecting user privacy and viewing habits is important! [3] ISPs Sell Your Data to Advertisers, But FCC has a Plan to Protect Privacy, http://thehackernews.com/2016/03/isp-sells-data-to-advertisers.html
  • 6. Our Goal •To show that HTTPS2.0 is not enough in order to protect your viewing habits • Contribution: • Dataset • Data crawler - based on selenium • New encrypted traffic feature and classification algorithm
  • 7. Brief Partial Overview of SSL/TLS • Step (0): browse to: https://www.youtube.com/watch?v=_b P6aVG6L1w • Step (1): use Service Name Indicator • Step (5): content and header are fully encrypted • HTTPS request (URL) is not visible in the encrypted traffic • All HTTP headers are encrypted
  • 8. How Can “I Know What You Saw”? 1. How are YouTube videos encoded? 2. How is the video downloaded? 3. What is the video download behavior in the network? 4. How to tie everything together for a classification?
  • 9. Introduction To HTTP Adaptive Streaming (HAS) HAS Diagram HAS Download Diagram Multi variable bit-rate streaming * How YouTube Works: https://www.youtube.com/watch?v=UklDSMG9ffU
  • 10. YouTube Encrypted Network Traffic Chrome Automatic Mode Chrome Fixed Mode
  • 11. YouTube Flow Patterns – The Web Proxy Perspective • Mixture of audio/video in a single flow • HTTP2 - multiplexed application layer protocol • Multi-Bit-Rate Video EncodingFirst 10 seconds of downloading only video First 10 seconds of downloading audio + video
  • 12. YouTube HTTP Byte Range Fiddler (Video) Stream Request Vs Byte RangeByteRange[Bps] Request Index
  • 13. Related Works 1. Most discuss application type classification and not content classification 2. HTTPS classification was found to achieve low accuracy 3. Wright et al. exploit the VBR codec characteristics of encrypted Voice Over Internet Protocol (VOIP) for language identification 4. Liu et al. and Saponas et al. presented methods for video title classification of RTP/UDP and TCP internet traffic (not MBR) 5. Changes in video traffic over the Internet: • HTTP byte range selection over HTTP • MBR adaptive streaming • HTTP version 2
  • 14. Proposed Machine Learning Solution 1. Traffic Analysis 2. Traffic Features 3. Traffic Preprocessing 4. Machine Learning Algorithms
  • 15. Feature Extraction 1. Many features: Number of packets in a session, payload size, information bit rate, Round-Trip Time (RTT), packet time differences 2. Bit Per Peak (BPP): Sum of bytes in each peak after TCP ACK mechanism 3. Why BPP? • Represent the traffic On/Off behavior • Real time classification constraints • Compact feature representation • Robust to packet loss and delays
  • 16. BPP Index Vs Download Copy BPP Index #Download
  • 17. Pre-Processing • With/without audio removal • <400 Kbytes BPPs are considered as audio
  • 18. Proposed algorithms 1. Support Vector Machines (SVM) with Radial Basis Function (RBF) • With a BPP feature vector 2. Nearest Neighbor Algorithm – NN • With a set of BPP features
  • 19. SVM with Radial Basis Function (RBF) Kernel • SVM RBF maps data to high dimensional space. The classifier: • Ongoing work uses SVM with intersection similarities as features
  • 20. BPP Set Feature • 𝑆𝑆𝑖𝑖𝑖𝑖 is a set of Bit-Per-Peak (BPP) features (no duplicates) • i - video title index • j - stream index • Note that each BPP-set may have different cardinality
  • 21. NN Algorithm • Similarity score between two BPP-sets is the cardinality of the intersection set: sim(𝑆𝑆, 𝑆𝑆′) = |𝑆𝑆 ∩ 𝑆𝑆′| • At test time, each video stream BPP-set, 𝑆𝑆𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡, is classified as the video title 𝑖𝑖 (class) that matches the maximum similarity score to class index. 𝑚𝑚𝑖𝑖 is the number of streams per title 𝑖𝑖:
  • 22. Dataset Train/ test: 30 different titles, each with 100 streams copies (Train- 90, Test -10) Videos outside of the dataset: 200 additional different video titles (titles not in the regular dataset used only in testing) Added delay evaluation: 4 subsets with added delay of 100/300/600/900 ms. (10 titles with 10 different downloads) Added packet loss evaluation: 4 subsets with added packet loss of 1/3/6/9 % (10 titles with 10 different downloads)
  • 24. Confusion Matrices SVM+RBF (accuracy 72%) Ours (accuracy 98%) TrueLabel TrueLabel Predicted Label Predicted Label
  • 25. Classification of Unknown Videos: 100% accuracy
  • 27. Conclusions • Created an OSINT vector from YouTube video traffic • We demonstrated that HTTP2.0 is not protecting your viewing habits. • NN algorithm - 98% accuracy • BPP feature - is robust to high network delays and packet loss • Ongoing research – 10000 streams of 100 titles, similar results • Contribution – crawler, dataset and algorithms
  • 28.
  • 29.