Early Recognition of Encrypted
Applications
Laurent Bernaille
with Renata Teixeira
Laboratoire LIP6 – CNRS
Université Pierre et Marie Curie – Paris 6
1
Can we find the application inside
an SSL connection?
InternetAlice Bob
Network administrator: profiling, QoS, policies
Alice and Bob: privacy issue
SSL Connection
Network administrator
2
Possible identification methods
for unencrypted traffic
• Port-based classification
– Map standard IANA ports to applications (Ex: 443/HTTPS)
– Unfortunately, this method is inaccurate
• Content-based approaches
– Search for signatures that identify the application
– Unfortunately, not possible with encrypted traffic
• Behavior-based methods
– Model applications with connection statistics
– Promising for encrypted traffic (not using port or content)
3
Early Application Identification
Offline Training Online Classifier
Sizes of the
first packets
Training traces
Behavior identification
Application
Models Labeling
application
• Identify applications using the sizes of the first
application packets in a TCP connection
Can this work with encrypted connections?
4
Can we find the sizes of the first
application packets in SSL?
• SSL mechanisms
– SSL connections begin with a handshake
– After handshake
• SSL payload = encrypted application packet
• Challenges for Early Application Identification
– Can we identify application packets?
– Can we infer the unencrypted sizes of these packets?
5
SSLv2 handshake
• SSLv2 negotiation
– 4 or 6 packets
– Identification through inspection of SSL headers
6
SSLv3 handshake
First application packet in P6 trace
• SSLv3 Negotiation
– Variable number of packets (implementation)
– Identification through inspection of SSL headers
7
Influence of ciphers on packet size
8
Applicability of Early Application
Identification to SSL traffic
• We can identify the first application packet
– Trough the analysis of SSL headers
– Need to start inspection at the third packet
• We can infer unencrypted sizes
• Proposed method
1. Identify SSL using the sizes of first 3 packets
2. For SSL traffic, find the packet with application data
3. Identify the application in SSL using the inferred sizes
of the first application packets
9
Classification mechanism
Sizes of first 3 packets
in the connection
Early Application
identification
Application
SSL? Application
No
Early Application
identification
Encrypted Application
Inferred sizes of
Packets N, N+1, N+2
Detection of packet
with encrypted payload: N
Yes
Packets 4, 5, …N
10
Detecting SSL connections
Sizes of first 3 packets
in the connection
Early Application
identification
Application
SSL? Application
No
Early Application
identification
Encrypted Application
Inferred sizes of
Packets N, N+1, N+2
Detection of packet
with encrypted payload: N
Yes
Packets 4, 5, …N
11
Detecting SSL connections:
Evaluation methodology
• Data sets: packet traces
– UMass campus, Paris 6 network
• Ground truth
– Non-SSL traffic: content-based classification
– SSL traffic: identification based on analysis of SSL
headers
• Parameters for Early Application Identification
– Using the payload size of the first 3 packets
– Training set: 5500 connections from 11 applications
12
Accuracy of SSL detection
• Test set
– 50k connections from Paris 6 network
– more than 2000 connections for each application
• Results
– SSL traffic: > 85% labeled SSL
– Other applications: accuracy >95%
13
Identification of encrypted
applicationsSizes of first 3 packets
in the connection
Early Application
identification
Application
SSL? Application
No
Early Application
identification
Encrypted Application
Inferred sizes of
Packets N, N+1, N+2
Detection of packet
with encrypted payload: N
Yes
Packets 4, 5, …N
14
Method to find ground truth for
encrypted traffic
• From packet traces collected at Paris 6
– Filtered traffic to well-known HTTPS and
POP3S servers (IP addresses and ports)
• Manual encryption of traffic
– Replay connections over an SSL tunnel
– Applications: bittorent, edonkey, FTP
15
Accuracy of identification of
applications in SSL connections
• Paris 6 traces
98.5%POP3
99.9%HTTP
Accuracy
• Manually Encrypted Traffic
96.5%Edonkey
86.5%Bittorent
92.5%FTP
Accuracy
16
Conclusion and Perspectives
• We can identify the application encrypted with SSL
– Using only the sizes of the first packets
– With a high accuracy
• Future work: IPsec and SSH
– Challenge: Finding the start of TCP connections
• Implementation
– Available at http://rp.lip6.fr/~bernaill/earlyclassif.html
17
Description of SSL Traffic
52%48%0.0%1.2%1700kUMass
46.6%53.2%0.2%8.6%1000kP6 2006
18.4%81%0.6%4.6%500kP6 2004
TLSSSLv3.0SSLv2SSLConnectionsTrace
1.5%5.0%UMass
4.2%1.1%P6 2006
1.1%1.9%P6 2004
SSL on non-SSL portSSL port not SSLTrace
18
Ciphers
7.0%9.7%RC4_128_SHA
<1%<2%Other
24.0%6.9%AES
68.1%81.7%RC4_xx_MD5
Proportion (2006)Proportion (2004)Cipher

Early recognition of encryted applications

  • 1.
    Early Recognition ofEncrypted Applications Laurent Bernaille with Renata Teixeira Laboratoire LIP6 – CNRS Université Pierre et Marie Curie – Paris 6
  • 2.
    1 Can we findthe application inside an SSL connection? InternetAlice Bob Network administrator: profiling, QoS, policies Alice and Bob: privacy issue SSL Connection Network administrator
  • 3.
    2 Possible identification methods forunencrypted traffic • Port-based classification – Map standard IANA ports to applications (Ex: 443/HTTPS) – Unfortunately, this method is inaccurate • Content-based approaches – Search for signatures that identify the application – Unfortunately, not possible with encrypted traffic • Behavior-based methods – Model applications with connection statistics – Promising for encrypted traffic (not using port or content)
  • 4.
    3 Early Application Identification OfflineTraining Online Classifier Sizes of the first packets Training traces Behavior identification Application Models Labeling application • Identify applications using the sizes of the first application packets in a TCP connection Can this work with encrypted connections?
  • 5.
    4 Can we findthe sizes of the first application packets in SSL? • SSL mechanisms – SSL connections begin with a handshake – After handshake • SSL payload = encrypted application packet • Challenges for Early Application Identification – Can we identify application packets? – Can we infer the unencrypted sizes of these packets?
  • 6.
    5 SSLv2 handshake • SSLv2negotiation – 4 or 6 packets – Identification through inspection of SSL headers
  • 7.
    6 SSLv3 handshake First applicationpacket in P6 trace • SSLv3 Negotiation – Variable number of packets (implementation) – Identification through inspection of SSL headers
  • 8.
    7 Influence of cipherson packet size
  • 9.
    8 Applicability of EarlyApplication Identification to SSL traffic • We can identify the first application packet – Trough the analysis of SSL headers – Need to start inspection at the third packet • We can infer unencrypted sizes • Proposed method 1. Identify SSL using the sizes of first 3 packets 2. For SSL traffic, find the packet with application data 3. Identify the application in SSL using the inferred sizes of the first application packets
  • 10.
    9 Classification mechanism Sizes offirst 3 packets in the connection Early Application identification Application SSL? Application No Early Application identification Encrypted Application Inferred sizes of Packets N, N+1, N+2 Detection of packet with encrypted payload: N Yes Packets 4, 5, …N
  • 11.
    10 Detecting SSL connections Sizesof first 3 packets in the connection Early Application identification Application SSL? Application No Early Application identification Encrypted Application Inferred sizes of Packets N, N+1, N+2 Detection of packet with encrypted payload: N Yes Packets 4, 5, …N
  • 12.
    11 Detecting SSL connections: Evaluationmethodology • Data sets: packet traces – UMass campus, Paris 6 network • Ground truth – Non-SSL traffic: content-based classification – SSL traffic: identification based on analysis of SSL headers • Parameters for Early Application Identification – Using the payload size of the first 3 packets – Training set: 5500 connections from 11 applications
  • 13.
    12 Accuracy of SSLdetection • Test set – 50k connections from Paris 6 network – more than 2000 connections for each application • Results – SSL traffic: > 85% labeled SSL – Other applications: accuracy >95%
  • 14.
    13 Identification of encrypted applicationsSizesof first 3 packets in the connection Early Application identification Application SSL? Application No Early Application identification Encrypted Application Inferred sizes of Packets N, N+1, N+2 Detection of packet with encrypted payload: N Yes Packets 4, 5, …N
  • 15.
    14 Method to findground truth for encrypted traffic • From packet traces collected at Paris 6 – Filtered traffic to well-known HTTPS and POP3S servers (IP addresses and ports) • Manual encryption of traffic – Replay connections over an SSL tunnel – Applications: bittorent, edonkey, FTP
  • 16.
    15 Accuracy of identificationof applications in SSL connections • Paris 6 traces 98.5%POP3 99.9%HTTP Accuracy • Manually Encrypted Traffic 96.5%Edonkey 86.5%Bittorent 92.5%FTP Accuracy
  • 17.
    16 Conclusion and Perspectives •We can identify the application encrypted with SSL – Using only the sizes of the first packets – With a high accuracy • Future work: IPsec and SSH – Challenge: Finding the start of TCP connections • Implementation – Available at http://rp.lip6.fr/~bernaill/earlyclassif.html
  • 18.
    17 Description of SSLTraffic 52%48%0.0%1.2%1700kUMass 46.6%53.2%0.2%8.6%1000kP6 2006 18.4%81%0.6%4.6%500kP6 2004 TLSSSLv3.0SSLv2SSLConnectionsTrace 1.5%5.0%UMass 4.2%1.1%P6 2006 1.1%1.9%P6 2004 SSL on non-SSL portSSL port not SSLTrace
  • 19.