The document surveys whitelisting and blacklisting approaches to security, reviewing three of the five papers in its reference list:
1. Reference [1] proposes augmenting SYN-cookie defenses against SYN flooding attacks with a whitelist of trusted client IP addresses that can still connect during an attack, allowing more legitimate connections to succeed.
2. Reference [3] proposes a log-based recovery scheme for executing untrusted programs: isolate the untrusted program, monitor its behavior through logs, and roll back its changes if problems are detected.
3. Reference [5] introduces the "highly predictive blacklist," which pre-emptively blacklists IP addresses likely to attack a given network, ranking attack sources by their relevance to that network and by attack severity.
2. References
1. Tae-Hyung Kim, Young-Sik Choi, Jong Kim, Sung Je Hong, "Annulling SYN Flooding Attacks with Whitelist", 22nd International Conference on Advanced Information Networking and Applications - Workshops (AINAW 2008), 25-28 March 2008, pp. 371-376.
2. Peizhou He, Xiangming Wen, Wei Zheng, "A Novel Method for Filtering Group Sending Short Message Spam", International Conference on Convergence and Hybrid Information Technology (ICHIT '08), 28-30 Aug. 2008, pp. 60-65.
3. Hui-Jun Lu, Shu-Zhen Leng, "Log-Based Recovery Scheme for Executing Untrusted Programs", 2007 International Conference on Machine Learning and Cybernetics, Vol. 4, 19-22 Aug. 2007, pp. 2136-2139.
4. C. Phua, R. Gayler, K. Smith-Miles, V. Lee, "Communal Detection of Implicit Personal Identity Streams", Sixth IEEE International Conference on Data Mining - Workshops (ICDM Workshops 2006), Dec. 2006, pp. 620-625.
5. Jian Zhang, Phillip Porras, Johannes Ullrich, "Highly Predictive Blacklisting", USENIX Security Symposium, August 2008.
3. Whitelists and Blacklists
• A whitelist contains sources and software that are deemed acceptable.
• A blacklist contains sources and software that are deemed harmful.
4. Whitelist Applications
• IP address classification
• SPAM reduction: approved sender list
• SMS
• Software execution
5. Review of [1]
Tae-Hyung Kim, Young-Sik Choi, Jong Kim, Sung Je Hong,
"Annulling SYN Flooding Attacks with Whitelist",
22nd International Conference on Advanced Information
Networking and Applications - Workshops (AINAW 2008),
25-28 March 2008, pp. 371-376.
6. SYN Flooding Attack
• The attacker sends many SYN (synchronize) requests to a target system.
• Exploits the 3-way handshake used to set up a TCP connection: the client sends SYN, the server replies SYN-ACK, and the client completes with ACK.
• Results in denial of service.
• What if the ACK is never issued?
  – A malicious client, or a spoofed source IP address, leaves the connection incomplete.
  – The server may also be waiting for a network-delayed ACK.
• Half-open connections build up into large queues at the server, so legitimate clients cannot connect.
[1]
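A minimal sketch of why the attack works: the server allocates a backlog entry per half-open connection, and spoofed SYNs that never complete the handshake exhaust that capacity. The queue size and addresses below are illustrative assumptions, not values from the paper.

```python
# Toy model of a fixed-size backlog queue under a SYN flood.
BACKLOG_SIZE = 4  # illustrative server backlog capacity

def syn_flood_demo():
    backlog = []  # half-open connections awaiting the final ACK
    # Attacker sends SYNs from spoofed addresses; the ACKs never arrive,
    # so entries are never removed from the queue.
    for i in range(10):
        spoofed_ip = f"10.0.0.{i}"
        if len(backlog) < BACKLOG_SIZE:
            backlog.append(spoofed_ip)  # server allocates state on SYN
    # A legitimate client now tries to connect.
    legit_accepted = len(backlog) < BACKLOG_SIZE
    return len(backlog), legit_accepted

filled, accepted = syn_flood_demo()
print(filled, accepted)  # backlog full, legitimate SYN is dropped
```

In a real TCP stack the entries would eventually time out, but an attacker who sends SYNs faster than the timeout keeps the queue full.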
7. Potential Defenses
• Bigger backlog queues: merely postpones the inevitable.
• SYN Cache
  – Typically a separate buffer is kept for each port
  – The SYN cache shares one buffer across several ports
  – Fails under an aggressive SYN flooding attack
• Random Drop
  – Randomly replace an element in the buffer with a new request
  – Increases the probability of a successful connection
  – Disrupts pending connections
[1]
8. Potential Defenses - 2
• SYN Cookies
  – Encode the source IP and port number in the packet sent back to the client
  – The client's ACK must carry that information back
  – No buffer is needed at the server
  – In normal operation a backlog queue is maintained; cookies are activated only when the buffer is full
  – If the ACK is lost or delayed in the network, the connection information is lost
• The preferred solution - the question is how to improve it.
[1]
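The stateless idea behind SYN cookies can be sketched as follows: the server derives its initial sequence number from a keyed hash of the connection parameters, so the returning ACK proves the handshake without any server-side state. This is an illustrative simplification, not the exact Linux scheme (which also packs a timestamp and MSS index into specific bit fields); `SECRET` and the hash layout are assumptions.

```python
import hashlib

SECRET = b"server-secret"  # assumed server-side key, rotated in practice

def make_cookie(client_ip, client_port, server_port, t):
    """Derive a 32-bit initial sequence number from connection parameters."""
    msg = f"{client_ip}:{client_port}:{server_port}:{t}".encode()
    digest = hashlib.sha256(SECRET + msg).digest()
    return int.from_bytes(digest[:4], "big")

def verify_cookie(cookie, client_ip, client_port, server_port, t):
    """Recompute the cookie on ACK arrival; no per-connection state needed.
    (Real TCP echoes cookie+1; the raw value is checked here for brevity.)"""
    return cookie == make_cookie(client_ip, client_port, server_port, t)

c = make_cookie("192.0.2.7", 51234, 80, t=42)
print(verify_cookie(c, "192.0.2.7", 51234, 80, t=42))   # True
print(verify_cookie(c, "192.0.2.8", 51234, 80, t=42))   # False: wrong client
```

The downside the slide notes follows directly: since the server stores nothing, a lost SYN-ACK or ACK cannot be retransmitted from the server side.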
9. SYN Handshakes
• Under a SYN flooding attack:
  – If a SYN is lost, the client retransmits it.
  – If the ACK is lost, the server cannot retransmit when SYN cookies are in use, because it keeps no state.
• A PSH/ACK packet carries data:
  (a) The server can extract the ACK information from the PSH/ACK.
  (b) It still cannot respond to packet loss.
[1]
10. Features of a Defense
• Service continuity
  – Service should not be disrupted during a SYN flooding attack
• Service separation
  – Distinguish legitimate connections from unknown connection requests
• Service differentiation
  – Provide robust connections to legitimate clients
[1]
11. Whitelisting Defense
• The whitelist maintains the IP addresses of trusted clients.
• These IPs can connect successfully in spite of a SYN flooding attack.
• Lookups are made fast by using a hash function.
[1]
12. Proposed Approach
• Normal state
  – Conventional approach: use the backlog queue buffer.
• Attack response state
  – An attack is detected when the buffer holds many half-open connections.
  – Separate requests into legitimate (whitelist-consistent) and unknown:
    • Legitimate requests use the backlog queue
    • Unknown requests are handled with SYN cookies
[1]
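The two-state dispatch above can be sketched as a single decision function. The attack-detection threshold and names are illustrative assumptions; the paper's actual detection criterion is the buffer filling with half-open connections.

```python
ATTACK_THRESHOLD = 0.8  # assumed fraction of the backlog that signals attack

def handle_syn(src_ip, whitelist, backlog, backlog_size):
    """Route an incoming SYN per the whitelist scheme (illustrative sketch)."""
    under_attack = len(backlog) >= ATTACK_THRESHOLD * backlog_size
    if not under_attack:
        backlog.append(src_ip)      # normal state: conventional backlog queue
        return "backlog"
    if src_ip in whitelist:         # attack state, trusted client
        backlog.append(src_ip)
        return "backlog"
    return "syn-cookie"             # attack state, unknown client

wl = {"198.51.100.5"}
backlog = ["a", "b", "c", "d"]      # mostly full: attack detected
print(handle_syn("198.51.100.5", wl, backlog, 5))  # backlog
print(handle_syn("203.0.113.9", wl, backlog, 5))   # syn-cookie
```

This preserves the paper's three goals: continuity (cookies keep unknown clients served), separation (whitelist check), and differentiation (trusted clients keep stateful, retransmission-capable connections).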
13. Managing Whitelists
• Initialization
  – The system administrator collects trusted clients from logs of services such as SMTP and SSH; trusted subnets may be included.
• Additions
  – Trusted clients admitted by policy
  – Clients that successfully complete a SYN-cookie connection under a SYN flooding attack
• Removals
  – IPs with too many half-open connections
[1]
14. Experimental Results
• The connection success rate increases from 64% (SYN cookies) to 90% (whitelist approach).
  – Under attack, client-to-server ACKs and other messages are lost.
    • Fatal for SYN cookies: no recovery is possible.
    • With the whitelist approach, retransmission is possible.
• The whitelist approach requires less time to establish a connection.
• Backlog queue usage is lower for the whitelist approach.
[1]
15. Hui-Jun Lu, Shu-Zhen Leng, "Log-Based Recovery
Scheme for Executing Untrusted Programs",
2007 International Conference on Machine Learning and
Cybernetics, Vol. 4, 19-22 Aug. 2007, pp. 2136-2139.
16. Programs
• Whitelisted (trusted) programs
• Blacklisted programs
• Uncertified programs
  – Not every program can be white- or blacklisted
  – Safe execution of uncertified (untrusted) programs is often required
[3]
17. Uncertified Programs
• Detection
  – Virus scanning
  – Signature verification
• Protection
  – Confine execution to a sandbox or isolated environment
  – The more realistic the environment, the higher the performance penalty
• Recovery
  – Should not interfere with program execution
  – Monitor and record so the system can return to a known good state
[3]
18. Detection and Verification
• Virus checking: run anti-virus software
  – Detects only known viruses
• Digital signatures and hash functions
  – Require access to a remote trusted site
• For new software
  – A safety policy that guarantees safe behavior
[3]
19. Prevention and Isolation
• Untrusted programs can access limited
resources
– Predetermined security policy
• Realistic environment requires replication
of entire file system
• Virtual machines can isolate the untrusted
program
[3]
20. Log-Based Recovery
• Checkpoints
  – Save the state at regular time intervals
  – On a "fault", roll back to a checkpoint
• Logs are maintained
  – Allow rollback as close to the event as possible
• Effective recovery improves dependability
  – It does not avoid the failure (fault)
  – In effect, a powerful UNDO
[3]
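The checkpoint-and-rollback idea can be sketched on a toy key-value state. Real log-based recovery as in [3] intercepts and backs up file-system modifications; the class name and dict-based state here are illustrative assumptions.

```python
import copy

class RecoverableStore:
    """Toy checkpoint/rollback ('powerful UNDO') over a dict of state."""

    def __init__(self):
        self.state = {}
        self.checkpoint_state = {}
        self.log = []  # operations recorded since the last checkpoint

    def checkpoint(self):
        # Snapshot the state; subsequent operations can be undone.
        self.checkpoint_state = copy.deepcopy(self.state)
        self.log.clear()

    def write(self, key, value):
        self.log.append((key, value))  # record the operation, then apply it
        self.state[key] = value

    def rollback(self):
        # A fault was detected: discard everything since the checkpoint.
        self.state = copy.deepcopy(self.checkpoint_state)
        self.log.clear()

s = RecoverableStore()
s.write("config", "safe")
s.checkpoint()
s.write("config", "corrupted-by-untrusted-program")
s.rollback()
print(s.state["config"])  # safe
```

Note the application-transparency property from the next slide: the untrusted program is not modified; only its effects on state are logged and reverted.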
21. System Integrity
• Ensure file-system integrity
• Untrusted operations that would change other system state should be prevented
• Log-based recovery
  – Monitors the process
  – Requires no change to the program or its context
  – Backs up file modifications
[3]
22. Approach
• Check whether the program is in the whitelist or the blacklist
  – Label all other programs as suspicious
• Log operations and back up the system
  – Roll back to the checkpoint on failure
[3]
23. System Requirements
• Application transparency
  – No changes to the untrusted program or its context
  – No restrictions on file-system access
• Easy recovery
  – Roll back to an initial state
  – Restore the file system
• Ease of use
  – The system provides a summary to help detect a failure
[3]
26. Network Address Blacklists
• Addresses deemed undesirable
  – Previous illicit activity
• Members of the volunteer DShield organization identify potential blacklist entries
• Two kinds of blacklist:
  – Global Worst Offender List (GWOL): broad-based contributions
  – Local Worst Offender List (LWOL): historical patterns of the local network
[5]
27. Global/Local Worst Offender Lists
• GWOL
  – Captures prolific attack sources
  – Can grow too large for a firewall to handle
  – Misses targeted attackers, who have a low global profile but may be more dangerous
• LWOL
  – Reflects local behavior and the local defensive reaction
  – Not useful for broader dissemination
  – An offender must cross a threshold number of attacks before inclusion
[5]
28. High-Quality Blacklist Requirements
• Entries need to be ready for insertion in firewalls early - before an attack
  – Lists should be updated in a timely fashion
  – High accuracy is required
  – Typically the number of attacks must pass a threshold before an entry is inserted
• Problems
  – Contributors cover only a small part of the internet
  – Directed attacks may not have enough global visibility
[5]
29. Highly Predictive Blacklist
• Pre-filter to remove unreliable alerts
• Relevance-based attack-source ranking
• Severity analysis: modulate the ranking to reflect malware propagation patterns
• Leads to individualized lists per consumer
[5]
31. Prefiltering
• Reduce errors (noise) in the data set
  – The data may include log entries produced by non-hostile (benign) activity
• Prefiltering steps:
  – Remove logs from invalid or non-routable source IPs, e.g. private ranges such as 192.168.x.x or 10.x.x.x
  – Apply a whitelist of known addresses of web crawlers, measurement services, and common software-update sources
  – Remove logs from source ports TCP 53 (DNS), 80 (HTTP), 25 (SMTP), and 443 (HTTPS), and from destination ports TCP 53 and 25
[5]
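The prefiltering steps can be sketched as a predicate over alert records. The benign-source set and the exact address checks are illustrative assumptions; the port exclusions follow the slide.

```python
import ipaddress

BENIGN_SOURCES = {"11.0.0.1"}            # e.g. a known web crawler (assumed)
EXCLUDED_SRC_PORTS = {25, 53, 80, 443}   # SMTP, DNS, HTTP, HTTPS
EXCLUDED_DST_PORTS = {25, 53}

def keep_alert(src_ip, src_port, dst_port):
    """Return True if the alert survives prefiltering."""
    ip = ipaddress.ip_address(src_ip)
    if ip.is_private or ip.is_reserved or ip.is_unspecified:
        return False                     # private/invalid source address
    if src_ip in BENIGN_SOURCES:
        return False                     # known-benign source whitelist
    if src_port in EXCLUDED_SRC_PORTS or dst_port in EXCLUDED_DST_PORTS:
        return False                     # excluded port combinations
    return True

print(keep_alert("192.168.1.5", 4444, 22))   # False: private source
print(keep_alert("11.22.33.44", 80, 22))     # False: excluded source port
print(keep_alert("11.22.33.44", 4444, 22))   # True: alert is kept
```

Using `ipaddress.is_private` conveniently covers the RFC 1918 ranges the slide mentions (10.x.x.x, 192.168.x.x) in one check.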
32. Relevance Ranking
• Helps specialize the blacklist to a specific consumer.
• Assesses the closeness of an attacker to the consumer: a measure of the likelihood that the attacker will target this consumer.
• Does not assess the severity of the attack or attacker.
• Pairs of consumers often share several attackers, i.e. they experience attacks from a common source IP.
  – This overlap is not random but a long-term phenomenon.
[5]
33. Relevance Ranking - 2
• Intuitive underpinnings of relevance, illustrated on a correlation graph of victims with weighted edges (figure: Correlation Graph [5]).
• Relevance with respect to v1:
  – s5 is more relevant than s6.
  – s5 is more relevant than s7.
  – s4 is more relevant than s5, s6, and s7.
34. Relevance Ranking - 3
• mi = number of attackers observed at victim vi
• mj = number of attackers observed at victim vj
• mij = number of attackers shared by vi and vj
• Wij = strength of the connection between vi and vj = mij / mi
(figure: Correlation Graph [5])
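A tiny sketch of computing a connection strength from observed attacker sets. Here the strength is taken as the shared fraction m_ij / m_i, so more shared attackers means a stronger link; the victim sets below are made-up illustrations.

```python
def connection_strength(attackers_i, attackers_j):
    """W_ij as the fraction of vi's attackers also seen at vj."""
    m_i = len(attackers_i)                   # attackers observed at vi
    m_ij = len(attackers_i & attackers_j)    # attackers shared by vi and vj
    return m_ij / m_i if m_i else 0.0

v1 = {"s1", "s2", "s3", "s4"}
v2 = {"s3", "s4", "s5"}
print(connection_strength(v1, v2))  # 0.5: 2 of v1's 4 attackers are shared
```

Note the measure is asymmetric (normalized by m_i, not m_j), which is why W is a matrix rather than a symmetric similarity.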
35. Relevance Ranking - 4
• Source relevance to the victims: r_s = W b_s, where b_s marks the victims that reported source s.
• The calculation is based only on direct observations.
• The sample is a very small fraction of the internet.
• Need to add a "look-ahead" capability.
[5]
36. Relevance Ranking - 5
• An attack (star) is observed at victim 2.
• How relevant is this attack to victim 1?
• Traverse the relevance paths, multiplying and summing the link weights:
  relevance = 0.5 × 0.2 + 0.3 × 0.2 = 0.16
• This is relevance propagation.
[5]
37. Relevance Ranking - 6
• Which attack is more relevant to victim 1?
• Propagate the relevance along the graph.
• A completely connected sub-graph offers more propagation paths, and hence higher relevance.
[5]
38. Relevance Ranking - 7
• Relevance vector: W b_s
• After one more hop: W (W b_s) = W² b_s
• Total relevance value so far: W b_s + W² b_s
• With a damping factor α, the relevance vector eventually becomes
  Σ_{i=1}^{∞} (αW)^i b_s
• Similar in spirit to PageRank.
[5]
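The damped propagation sum Σ (αW)^i b_s can be sketched as a truncated power series. The 3-victim correlation matrix, attack vector, α, and hop count below are illustrative assumptions; any α with spectral radius of αW below 1 makes the series converge.

```python
def mat_vec(W, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(W[i][j] * v[j] for j in range(len(v))) for i in range(len(W))]

def relevance(W, b_s, alpha=0.5, hops=20):
    """Approximate sum_{i=1}^{hops} (alpha*W)^i b_s by iteration."""
    total = [0.0] * len(b_s)
    term = b_s[:]                   # start at i=0; loop produces i=1, 2, ...
    for _ in range(hops):
        term = [alpha * x for x in mat_vec(W, term)]  # next (alpha*W)^i b_s
        total = [t + x for t, x in zip(total, term)]
    return total

# Victims 1 and 2 strongly correlated; victim 3 only weakly connected.
W = [[0.0, 0.8, 0.1],
     [0.8, 0.0, 0.1],
     [0.1, 0.1, 0.0]]
b_s = [0.0, 1.0, 0.0]               # attacker s was observed only at victim 2
r = relevance(W, b_s)
print(r[0] > r[2])                  # True: s is more relevant to victim 1
```

As in PageRank, the damping factor discounts long paths, so nearby (strongly correlated) victims dominate the score.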
39. Attack Severity
• Model based on 3 components:
  – Malicious behavior, number of IPs targeted, and a geographic metric
• Model of malicious behavior
  – Identify typical scan-and-infect malware, which conducts IP sweeps of small sets of ports
  – Let MP be the set of malware-associated ports
[5]
40. Attack Severity - 2
• Compute a malware port score PS for attacker s:
  PS(s) = (wu × cu + wm × cm) / cu
• cm = number of malware-associated ports connected by s
• cu = total number of ports connected by s
• wu and wm are the respective weights, with wm > wu; the authors use wm = 4 wu
[5]
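A direct transcription of the slide's malware-port score, using the stated weighting wm = 4·wu. The input values in the usage lines are illustrative.

```python
def port_score(cu, cm, wu=1.0):
    """PS(s) = (wu*cu + wm*cm) / cu, as written on the slide.

    cu: total number of ports connected by attacker s
    cm: number of malware-associated ports connected by s
    """
    wm = 4 * wu                     # the authors use wm = 4 * wu
    return (wu * cu + wm * cm) / cu

print(port_score(cu=10, cm=0))   # 1.0: no malware-associated ports
print(port_score(cu=10, cm=10))  # 5.0: every port is malware-associated
```

The score thus ranges from wu (no malware ports touched) up to wu + wm (every port touched is malware-associated), rewarding attackers whose activity concentrates on the ports in MP.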
41. Attack Severity - 3
• Second measure
  – TC(s): the number of unique IPs connected by s
  – TC(s) is typically the prioritization metric used by GWOL
• Third measure
  – IR(s): the ratio of national to international IPs targeted by attacker s
• Overall measure:
  PS(s) + log(TC(s)) + δ log(IR(s))
• The logarithms reduce the impact of the last two terms.
[5]
42. Final Blacklist
• For each attacker, both the relevance ranking and the severity score are used.
• Assume the target is a list of length L.
• First use the relevance ranking to cut the candidates down to a list of size c·L (for some c > 1).
• Then use the severity score to prune that list down to L.
[5]
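The two-stage construction can be sketched as two sorted cuts: first by relevance down to c·L candidates, then by severity down to L. The value of c and the scores below are illustrative assumptions.

```python
def final_blacklist(attackers, relevance, severity, L, c=2):
    """Relevance cut to c*L candidates, then severity cut to the final L."""
    by_rel = sorted(attackers, key=lambda a: relevance[a], reverse=True)
    candidates = by_rel[:c * L]                  # stage 1: relevance ranking
    by_sev = sorted(candidates, key=lambda a: severity[a], reverse=True)
    return by_sev[:L]                            # stage 2: severity pruning

rel = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
sev = {"a": 1.0, "b": 5.0, "c": 3.0, "d": 9.0}
print(final_blacklist(["a", "b", "c", "d"], rel, sev, L=1))  # ['b']
# "d" has the highest severity but is cut at the relevance stage,
# so it never reaches the severity comparison.
```

This ordering of the stages is the point of the scheme: an attacker irrelevant to this consumer is excluded no matter how severe it is globally.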
43. Final Blacklist - 2
• A final score is computed for each attacker s, with k being the relevance rank of s.
[5]