Using Sequence Statistics to Fight Advanced Persistent Threats

© 2016 MapR Technologies

Contact Information
• Ted Dunning
  – Chief Applications Architect at MapR Technologies
  – Committer and PMC member for Apache Drill, ZooKeeper and others
  – VP of Incubator at the Apache Software Foundation
• Email: tdunning@apache.org, tdunning@maprtech.com
• Twitter: @ted_dunning
• Hashtags today: #hs16dublin #mapr

Agenda
• What’s this persistent threat stuff?
  – What attackers do
  – How they do it
• Examples
• Sequence statistics
  – Really geeking with gas now!
• Detection techniques
• Specifics
• Summary

Agenda of All Security Talks
• Terror
• Faint hope
• More terror
• Practical suggestions
• Summary

Operation Ababil – Brobots on Parade
• Dork attack to find unpatched, default Joomla sites
  – Especially web servers with high-bandwidth connections
  – Basically just Google searches for default strings
  – Compromised Joomla sites are turned into attack Brobots
• The C&C network checks in occasionally
  – Note that C&C arrives as incoming requests that look like normal web requests
• Later, on command, multiple Brobots direct 50-75 Gb/s of attack traffic
  – Attacks come from whitelisted sites

Attack Sequence (diagram, built up over four slides)
• Labels: Source, First-level C&C, Second-level C&C
• Added in later builds: Google, then three Brobots, then the Target

Outline of an Advanced Persistent Threat
• Advanced
  – Common use of zero-days for preliminary attacks
  – Often attributed to state-level actors
  – Modern privateers blur the line
• Persistent
  – The result of the first attack is heavily muffled; there is no immediate exploit
  – A remote access toolset (RAT) is installed
• Threat
  – On command, data is exfiltrated covertly or en masse
  – Or the compromised host is used for other nefarious purposes

APT in Summary
• Attack, penetrate, pivot, exfiltrate or exploit
• If you are a high-value target, an attack is likely and will be stealthy
  – High-value = telecom, banks, utilities, retail targets, web100
  – … and all their vendors
  – Conventional multi-factor auth is easily breached
• Penetration and pivot are critical countermeasure opportunities
  – In 2010, a RAT would contact its command and control (C&C)
  – In 2016, C&C looks like normal traffic
• Once exfiltration or exploit starts, you may no longer have a business

So are we totally screwed?
Not entirely!

Event Sequences Provide Clues
• Event sequences appear in many places
• Headers
  – Header types, ordering in requests
• IP address accesses
  – Source and destination, sequences of either
• TLS options
  – Which options, which values, which algorithms
• Incoming component request ordering and timing
  – Body first, then CSS, scripts and images
  – But which components are cached, and what is the round-trip time?

Sequences and Cooccurrences
• All of these characteristics form symbolic sequences
• Current systems use hand-crafted rules about particular states
  – But hand-crafting depends on human knowledge
• We can do much, much better by considering the cooccurrence and ordering of symbols in these sequences
• The log-likelihood ratio test (jargon alert) is a key tool

A core technique
• Many of these easy problems reduce to finding interesting coincidences
• This can be summarized as a 2 x 2 table
• Actually, many of these tables

             A      Other
  B          k11    k12
  Other      k21    k22

How do you do that?
• This is well handled using the G-test
  – See Wikipedia
  – See http://bit.ly/surprise-and-coincidence
• Original application was in linguistics; now cited more than 2000 times
• Available in Elasticsearch, in Solr, in Mahout
• Available in R, C, Java, Python

Which one is the anomalous co-occurrence?

  Table 1 (score 0.90)   A       not A
  B                      13      1000
  not B                  1000    100,000

  Table 2 (score 4.52)   A       not A
  B                      1       0
  not B                  0       10,000

  Table 3 (score 14.3)   A       not A
  B                      10      0
  not B                  0       100,000

  Table 4 (score 1.95)   A       not A
  B                      1       0
  not B                  0       2

The scores revealed on the slide (0.90, 1.95, 4.52, 14.3) equal the square root of each table's LLR statistic; Table 3 scores highest.

Dunning, Ted. Accurate Methods for the Statistics of Surprise and Coincidence. Computational Linguistics, vol. 19, no. 1 (1993).

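A minimal Python sketch of the G-test (LLR) for one 2 x 2 table, written in the entropy form that Apache Mahout's LogLikelihood class also uses. The helper names `x_log_x`, `entropy`, `llr` and `root_llr` are illustrative, not any library's API. Applied to the four tables above, the square root of the statistic reproduces the slide's scores.

```python
import math


def x_log_x(x):
    """Return x * ln(x), using the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy: N ln N - sum(k ln k), where N = sum(counts)."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """G statistic (log-likelihood ratio) for the table [[k11, k12], [k21, k22]]."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values that come from floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def root_llr(k11, k12, k21, k22):
    """Square root of the G statistic; easier to read as a score."""
    return math.sqrt(llr(k11, k12, k21, k22))


# The four tables from the slide, as (k11, k12, k21, k22).
tables = [
    (13, 1000, 1000, 100_000),
    (1, 0, 0, 10_000),
    (10, 0, 0, 100_000),
    (1, 0, 0, 2),
]
for cells in tables:
    print(cells, f"{root_llr(*cells):.2f}")
```

This prints 0.90, 4.52, 14.29 and 1.95 in table order. Note that the first table has the most co-occurrences (13) yet scores lowest, because 13 is close to what independence would predict given its margins.
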
How to Count (header-like documents)
  For each “document”:
    For each “word” A:
      For each “word” B after that (within window):
        count[A,B]++
        left[A]++
        right[B]++
        total++

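The counting loop above can be sketched in Python as follows, assuming each “document” is a list of tokens (for example, header names in request order) and that `window` is a hypothetical parameter for how far ahead B may appear. In this sketch every counter is incremented once per (A, B) pair, so `left`, `right` and `total` all describe the same population of pairs and the derived 2 x 2 tables cannot contain negative cells.

```python
from collections import Counter


def count_pairs(documents, window=3):
    """Count ordered pairs (A, B) where B follows A by at most `window` positions.

    Returns (count, left, right, total). Every counter is bumped once per
    pair, so left, right and total all describe the same set of pairs.
    """
    count = Counter()  # count[(A, B)]: times B followed A within the window
    left = Counter()   # left[A]: pairs whose first element is A
    right = Counter()  # right[B]: pairs whose second element is B
    total = 0          # total number of pairs
    for doc in documents:
        for i, a in enumerate(doc):
            for b in doc[i + 1 : i + 1 + window]:
                count[(a, b)] += 1
                left[a] += 1
                right[b] += 1
                total += 1
    return count, left, right, total


docs = [
    ["Host", "User-Agent", "Accept", "Connection"],
    ["Host", "User-Agent", "Pragma"],
]
count, left, right, total = count_pairs(docs, window=2)
print(total, count[("Host", "User-Agent")], left["Host"])
```

With a window of 2 this prints `8 2 4`: eight pairs in total, "Host" followed by "User-Agent" twice, and "Host" as the first element of four pairs.
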
• We want this 2 x 2 table for each A, B

             A      Other
  B          k11    k12
  Other      k21    k22

• But we only counted k11 directly
• We did, however, count the margins:
  – k*1 = k11 + k21 (how many A’s we saw)
  – k1* = k11 + k12 (how many B’s we saw)
  – k** = k11 + k12 + k21 + k22 (how many pairs in total)
• The remaining cells follow by subtraction: k12 = k1* - k11, k21 = k*1 - k11, k22 = k** - k11 - k12 - k21

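How to Count (continued)
  Map<PriorityQueue> queue                  // one priority queue per A
  for each pair (A,B):
    k11 = count[A,B]
    k1x = right[B]                          // row total: how many B's
    kx1 = left[A]                           // column total: how many A's
    kxx = total
    k12 = k1x - k11
    k21 = kx1 - k11
    k22 = kxx - k11 - k12 - k21
    queue[A].add((LLR(k11,k12,k21,k22), B))
  // LLR is unchanged if the table is transposed, so swapping the roles of A and B gives the same score
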
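A Python sketch of this scoring step, using `heapq` min-heaps as the per-A priority queues and keeping only the top `k` partners (the cutoff `k` is a hypothetical parameter). One assumption goes beyond the slide: plain LLR is unsigned, so a pair that co-occurs *less* often than chance would score just as high as a strongly associated one. The sketch therefore negates the score in that case so that such pairs rank last.

```python
import heapq
import math
from collections import Counter, defaultdict


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def signed_llr(k11, k12, k21, k22):
    """LLR, negated when A and B co-occur less often than independence predicts."""
    n = k11 + k12 + k21 + k22
    expected_k11 = (k11 + k12) * (k11 + k21) / n
    score = llr(k11, k12, k21, k22)
    return score if k11 >= expected_k11 else -score


def top_associates(count, left, right, total, k=10):
    """Keep, for each A, the k partners B with the highest signed LLR.

    Table layout follows the slides: rows are B / not B, columns are A / not A.
    """
    queues = defaultdict(list)  # A -> min-heap of (score, B) with at most k items
    for (a, b), k11 in count.items():
        k1x = right[b]  # row total: pairs with B second
        kx1 = left[a]   # column total: pairs with A first
        k12 = k1x - k11
        k21 = kx1 - k11
        k22 = total - k11 - k12 - k21
        item = (signed_llr(k11, k12, k21, k22), b)
        heap = queues[a]
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)  # drop the weakest partner kept so far
    return {a: sorted(heap, reverse=True) for a, heap in queues.items()}


count = Counter({("x", "y"): 10, ("x", "z"): 1, ("w", "z"): 9, ("w", "y"): 1})
left = Counter({"x": 11, "w": 10})
right = Counter({"y": 11, "z": 10})
best = top_associates(count, left, right, total=21, k=1)
print(best["x"][0][1], best["w"][0][1])
```

In this toy input the tables for (x, y) and (x, z) are mirror images with identical unsigned LLR, so without the sign the choice between y and z would come down to a tie-break. With the sign, x pairs with y and w pairs with z, as expected.
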
How to Count (cooccurrence)
  for each (C,B) = (“context”, “word”):
    if (!filter(C) && !filter(B)):
      for each A in history(C):
        count[A,B]++
        left[A]++
        right[B]++
        total++
      history(C) += B

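A Python sketch of the context-history variant, assuming `events` is a time-ordered iterable of (context, symbol) tuples and that a hypothetical `excluded` predicate plays the role of the slide's `filter` (returning True means skip). The example treats cards as contexts and merchants or a fraud marker as symbols; those names are invented for illustration. History is kept forever here. A production system would likely cap or age it out.

```python
from collections import Counter, defaultdict


def count_cooccurrence(events, excluded=lambda symbol: False):
    """Count (A, B) pairs where A appeared earlier than B in the same context.

    `events` is an iterable of (context, symbol) tuples in time order.
    `excluded` plays the role of the slide's filter(): True means skip.
    """
    count = Counter()
    left = Counter()
    right = Counter()
    total = 0
    history = defaultdict(list)  # context -> symbols seen so far, in order
    for context, b in events:
        if excluded(context) or excluded(b):
            continue
        for a in history[context]:
            count[(a, b)] += 1
            left[a] += 1
            right[b] += 1
            total += 1
        history[context].append(b)
    return count, left, right, total


# Hypothetical example: contexts are cards, symbols are merchants or a fraud marker.
events = [
    ("card1", "m1"), ("card2", "m1"), ("card1", "m2"),
    ("card1", "FRAUD"), ("card2", "m3"),
]
count, left, right, total = count_cooccurrence(events)
print(total, count[("m1", "FRAUD")], left["m1"])
```

This prints `4 1 3`. The four pairs are (m1, m2), (m1, FRAUD) and (m2, FRAUD) for card1, plus (m1, m3) for card2. The resulting counters can be fed to the same LLR scoring as the header-document case.
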
Seriously...
It really can be that simple

Basic techniques
• Counting – often the hardest part
• LLR – the basic tool
• Order models
  – Ordered cooccurrences
  – Transition probabilities
  – Recurrent neural networks
• Ploughing a quiet field
  – Reimage servers often
  – Force attackers to pivot repeatedly

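For the "transition probabilities" order model, a minimal sketch is a first-order Markov model over symbol sequences. This is one plausible reading, not necessarily the model used in the talk. The add-alpha smoothing, the value `alpha=0.1` and the "surprise" score (mean negative log-probability per transition) are all assumptions. The training data below is invented: fifty copies of one browser-like header order.

```python
import math
from collections import Counter, defaultdict


class TransitionModel:
    """First-order Markov model over symbol sequences, with add-alpha smoothing."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.counts = defaultdict(Counter)  # previous symbol -> counts of next symbol
        self.vocab = set()

    def fit(self, sequences):
        for seq in sequences:
            self.vocab.update(seq)
            for prev, nxt in zip(seq, seq[1:]):
                self.counts[prev][nxt] += 1
        return self

    def prob(self, prev, nxt):
        row = self.counts.get(prev, Counter())
        slots = len(self.vocab) + 1  # known symbols plus one slot for anything unseen
        return (row[nxt] + self.alpha) / (sum(row.values()) + self.alpha * slots)

    def surprise(self, seq):
        """Mean negative log-probability per transition; higher means stranger."""
        steps = list(zip(seq, seq[1:]))
        if not steps:
            return 0.0
        return -sum(math.log(self.prob(p, n)) for p, n in steps) / len(steps)


normal = ["host", "user-agent", "accept", "accept-language", "accept-encoding", "connection"]
odd = ["host", "user-agent", "accept-encoding", "accept-charset", "accept-language",
       "cache-control", "pragma", "connection"]
model = TransitionModel().fit([normal] * 50)
print(round(model.surprise(normal), 3), round(model.surprise(odd), 3))
```

The familiar ordering gets a surprise close to zero, while the unusual ordering scores much higher because most of its transitions were never observed.
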
Example 1 - Ababil (diagram)
• Labels: Source, First-level C&C, Second-level C&C, three Brobots, Target
• Annotation: "Defense has to happen here"

Spot the Important Difference?

Attacker request:
  GET /personal/comparison-table.jsp?iODg2OQ=51a90 HTTP/1.1
  Host: www.sometarget.com
  User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)
  Accept-Encoding: deflate
  Accept-Charset: UTF-8
  Accept-Language: fr
  Cache-Control: no-cache
  Pragma: no-cache
  Connection: Keep-Alive

Real request:
  GET /photo.jpg HTTP/1.1
  Host: lh4.googleusercontent.com
  User-Agent: Mozilla/5.0 (Macint
  Accept: image/png,image/*;q=0.8
  Accept-Language: en-US,en;q=0.5
  Accept-Encoding: gzip, deflate,
  Referer: https://www.google.com
  Connection: keep-alive
  If-None-Match: "v9"
  Cache-Control: max-age=0

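To treat requests like these as symbolic sequences, one simple option (an illustrative choice, not necessarily the author's) is to keep only the header names in order and then look at which names, and which ordered pairs of names, appear. The helpers `header_sequence` and `ordered_pairs` are hypothetical. The truncated User-Agent value is left as it appears on the slide, since only header names matter here.

```python
def header_sequence(raw_request):
    """Return the header names of a raw HTTP/1.1 request, lower-cased, in order."""
    names = []
    for line in raw_request.strip().splitlines()[1:]:  # skip the request line
        if not line.strip():
            break  # a blank line ends the header block
        name, sep, _ = line.partition(":")
        if sep:
            names.append(name.strip().lower())
    return names


def ordered_pairs(seq):
    """All (A, B) with A appearing before B: the ordering signal in one request."""
    return {(a, b) for i, a in enumerate(seq) for b in seq[i + 1:]}


attacker = """GET /personal/comparison-table.jsp?iODg2OQ=51a90 HTTP/1.1
Host: www.sometarget.com
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)
Accept-Encoding: deflate
Accept-Charset: UTF-8
Accept-Language: fr
Cache-Control: no-cache
Pragma: no-cache
Connection: Keep-Alive
"""

real = """GET /photo.jpg HTTP/1.1
Host: lh4.googleusercontent.com
User-Agent: Mozilla/5.0 (Macint
Accept: image/png,image/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate,
Referer: https://www.google.com
Connection: keep-alive
If-None-Match: "v9"
Cache-Control: max-age=0
"""

a, r = header_sequence(attacker), header_sequence(real)
print(sorted(set(r) - set(a)), sorted(set(a) - set(r)))
```

This prints `['accept', 'if-none-match', 'referer'] ['accept-charset', 'pragma']`. The ordering differs too: the attacker sends Accept-Encoding before Accept-Language, while the real browser does the opposite. Pairs like these are the tokens that the counting and LLR steps operate on across a large corpus.
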
This could only be found at scale

Overall Outline Again (diagram)
• Labels: Source, First-level C&C, Second-level C&C, three Brobots, Target
• Annotation: "Tradecraft error!"

Large-corpus analysis of source IPs wins big

Example 2 - Common Point of Compromise
• Scenario:
  – Merchant 0 is compromised and leaks account data during the compromise
  – Fraud is committed elsewhere during the exploit
  – High background level of fraud
  – Limited detection rate for exploits
• Goal:
  – Find merchant 0
• Meta-goal:
  – Screen algorithms for this task without leaking sensitive data

Example 2 - Common Point of Compromise (diagram)
• Labels: skim, exploit, Merchant 0, Skimmed data, Merchant n
• Card data is stolen from Merchant 0
• That data is used in frauds at other merchants

Simulation Setup
[Figure: count (0 to 500) vs. day (0 to 100) for compromises and frauds, with the compromise period and the exploit period marked]

Simulation Strategy
• For each consumer
  – Pick consumer parameters such as transaction rate and preferences
  – Generate transactions until the end of simulated time
    • If at merchant 0 during the compromise period, possibly mark the card as compromised
    • For all transactions, possibly mark as fraud; the probability depends on history
    • Merchants are selected using a hierarchical Pitman-Yor process
• Restate data
  – Flatten transaction streams
  – Sort by time
• Tunables
  – Compromise probability, transaction rates, background fraud, detection probability

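A much-simplified Python sketch of this simulation, followed by a per-merchant LLR ranking. Every parameter value below is invented for illustration. Merchants are drawn uniformly as a stand-in for the hierarchical Pitman-Yor choice, and each consumer's history is reduced to "set of merchants visited" plus "any detected fraud". For each merchant, the 2 x 2 table is visited / not visited against fraud / no fraud, and only merchants with more fraud than independence predicts get a nonzero score.

```python
import math
import random


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def simulate(seed=1, n_consumers=5000, n_merchants=200, days=100.0,
             compromise=(20.0, 40.0), exploit_start=60.0, p_compromise=0.8,
             p_exploit=0.3, p_background=0.002, p_detect=0.5):
    """Return one (set of merchants visited, fraud detected?) record per consumer.

    Merchant 0 is the compromised merchant. Uniform merchant choice is a
    simplification standing in for hierarchical Pitman-Yor selection.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n_consumers):
        n_tx = rng.randint(10, 30)  # stand-in for a per-consumer transaction rate
        txs = sorted((rng.uniform(0.0, days), rng.randrange(n_merchants))
                     for _ in range(n_tx))
        compromised = fraud_seen = False
        visited = set()
        for day, merchant in txs:
            visited.add(merchant)
            if merchant == 0 and compromise[0] <= day < compromise[1]:
                compromised = compromised or rng.random() < p_compromise
            p_fraud = p_exploit if compromised and day >= exploit_start else p_background
            if rng.random() < p_fraud * p_detect:  # fraud occurs and is detected
                fraud_seen = True
        records.append((visited, fraud_seen))
    return records


def rank_merchants(records, n_merchants):
    """Score each merchant by positive association between visits and fraud."""
    n = len(records)
    n_fraud = sum(1 for _, fraud in records if fraud)
    visits = [0] * n_merchants
    fraud_visits = [0] * n_merchants
    for visited, fraud in records:
        for m in visited:
            visits[m] += 1
            fraud_visits[m] += fraud
    scores = []
    for m in range(n_merchants):
        k11 = fraud_visits[m]                 # visited m, fraud
        k12 = visits[m] - k11                 # visited m, no fraud
        k21 = n_fraud - k11                   # fraud, did not visit m
        k22 = n - k11 - k12 - k21             # neither
        over = k11 * n > visits[m] * n_fraud  # more fraud than independence predicts
        scores.append((llr(k11, k12, k21, k22) if over else 0.0, m))
    return sorted(scores, reverse=True)


ranking = rank_merchants(simulate(), 200)
print(ranking[:3])
```

With these invented settings, merchant 0 should come out on top by a wide margin. Raising `p_background` or lowering `p_detect` or `p_compromise` makes the task harder. This kind of knob-turning is one way to screen algorithms on synthetic data without touching sensitive records.
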
[Figure: LLR score for real data. Axes: Breach Score (LLR), 0 to 80, and Number of Merchants, log scale from 10^0 to 10^6. Annotations: "Real truly bad guys", "Really truly bad guys"]

Historical cooccurrence gives a high signal-to-noise ratio (S/N)

Summary
• The world can be seen as sequences of symbols
• We can find patterns
• Those patterns can nail opponents
• Many patterns only appear at scale
• You can do this

Short Books by Ted Dunning & Ellen Friedman
• Published by O’Reilly in 2014 and 2015
• For sale from Amazon or O’Reilly
• Free e-books currently available courtesy of MapR
  – http://bit.ly/ebook-real-world-hadoop
  – http://bit.ly/mapr-tsdb-ebook
  – http://bit.ly/ebook-anomaly
  – http://bit.ly/recommendation-ebook

Streaming Architecture
by Ted Dunning and Ellen Friedman © 2016 (published by O’Reilly)
• Free copies at book signing today (oops… that was earlier)
• http://bit.ly/mapr-ebook-streams

Thank You!

Q&A
@mapr  maprtech
tdunning@maprtech.com
Engage with us! MapR  maprtech  mapr-technologies