Finding the Needle in the IP Stack

1,854 views

Published on

RSA Conference 2010

Published in: Technology

  1. Finding the Needle in the IP Stack. Dr. Sven Krasser, McAfee, Inc. Session ID: RR-403. Session Classification: Intermediate.
  2. Agenda: Data Mining – A Human Approach; English Words; Bad Behavior; What’s in a File; Conclusions.
  3. Data Mining: A Human Approach
  4. Anthropometric Data. Source: http://mreed.umtri.umich.edu/mreed/downloads.html#anthro
  5. Measurements. Source: http://mreed.umtri.umich.edu/mreed/downloads.html#anthro
  6. Measurements (continued). Source: http://mreed.umtri.umich.edu/mreed/downloads.html#anthro
  7. Height Versus Weight (plot: weight in pounds, 100 to 250, against height in inches, 60 to 80)
  8. Height Versus Weight (continued) (plot: same axes, with women and men marked separately)
  9. Putting Weight and Height Into Perspective
  10. Best Guess for Gender (plot: best guess over height in inches and weight in pounds, ranging from 100% male / 0% female through 50% male / 50% female to 0% male / 100% female)
  11. One Dimension Only (plot: height in inches, 55 to 75, with a vertical axis from 0.00 to 0.15)
  12. Better Features (plot: weight in pounds, 100 to 200, against buttock circumference, 800 to 1200). Buttock Circumference: “The circumference of the body measured at the level of the maximum posterior protuberance of the buttocks.”
  13. Best Guess for Revised Features (plot: best guess over buttock circumference and weight in pounds)
  14. Further Improving the Separation. Signal to noise: features with very different distribution per class. Correlation: features with low correlation. Dimensionality: consider more features at the same time.
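The first two criteria for improving the separation (a feature whose distribution differs strongly between classes, and features that are weakly correlated with each other) can be made concrete with two small scores. The following is a minimal sketch, not the method used in the talk: the Fisher-style score and the toy numbers are assumptions chosen for illustration, and none of the values come from the anthropometric data set.

```python
from statistics import mean, pvariance


def fisher_score(class_a, class_b):
    """Squared gap between the class means divided by the summed class variances.

    A larger value means the feature separates the two classes better.
    """
    spread = pvariance(class_a) + pvariance(class_b)
    return (mean(class_a) - mean(class_b)) ** 2 / spread


def pearson(xs, ys):
    """Pearson correlation between two equally long feature columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy values: one feature with well separated classes,
# one feature whose classes overlap heavily.
wide_gap = fisher_score([1.0, 2.0, 3.0], [11.0, 12.0, 13.0])
narrow_gap = fisher_score([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

A feature with a high score is a good candidate on its own; a second feature adds the most when its correlation with the first is low.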
  15. Email Data in Three Dimensions
  16. Sparse Data (the slide shows a large matrix of counts in which nearly every entry is 0; the few nonzero entries are mostly 1 to 5, with isolated larger counts such as 40, 25, 16 and 14)
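Data that is almost entirely zeros is usually stored by position and value rather than as a full row. The following is a minimal sketch of that idea; the example row is made up to resemble the slide (a count at the start, one further nonzero count, zeros elsewhere) and is not copied from it.

```python
def to_sparse(row):
    """Keep only the nonzero entries as {column_index: value}."""
    return {i: v for i, v in enumerate(row) if v != 0}


def to_dense(sparse, width):
    """Rebuild the full row, filling every missing column with 0."""
    return [sparse.get(i, 0) for i in range(width)]


# Hypothetical row of 90 columns with two nonzero counts.
row = [25] + [0] * 84 + [2] + [0] * 4
sparse_row = to_sparse(row)
```

Here `sparse_row` holds two entries instead of 90, and `to_dense(sparse_row, 90)` restores the original row.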
  17. Final Verdict. Classification Algorithms: Decision Trees, Decision Forests, Support Vector Machines, Neural Networks.
  18. English Words: And why do they look English?
  19. Some English Words: militate, caterwaul, deracinate, arrant, concinnity, imprecation, vertiginous, profuse.
  20. Some English Explanations. militate: to have force or influence. caterwaul: to make a harsh cry or screech. deracinate: to uproot. arrant: outright; thoroughgoing. concinnity: elegance – used chiefly of literary style. imprecation: a curse. vertiginous: causing dizziness; also, giddy; dizzy. profuse: plentiful; copious. Source: http://dictionary.reference.com/
  21. Transition Probabilities
  22. Active .com Domains: 82 million active .com domains.
  23. Markov Chains. Analysis of recent domain registrations, using second-order Markov chains to detect potentially malicious domain names. Verdicts: bnkofpunjab is not legitimate; ferrylines.com is legitimate; ebay.com is not determinable.
      Transition probabilities shown in the diagram: .0073, .0641, .0213, .0912, .0912, .0732, .0014, .2175, .0143, .2626, .0301, .0939, .0322, .2419, .3598, .1457, .0633, .1064, .0588, .1733, .0872, .2738, .0431, .1534, .0932, .0714, .2936, .0437, .1860, .0196, .0371, .0291, .1932, .1120, .1269, .0411, .4759, .2979
      Character pairs labeled in the diagram: ab, bn, nk, ko, of, fp, pu, nj, ja, fe, er, rr, ry, yl, li, ne, es, eb, ba, ay, un, in
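The idea behind the Markov chain slide can be sketched as a character-transition model: learn how often one character follows another in known names, then score a new name by how probable its transitions are. The code below is an illustrative sketch only. It conditions each character on the single character before it, which matches the two-letter labels in the diagram; whether that is what the deck means by "second order" is an assumption. The training names, the add-one smoothing, and the alphabet size of 37 (a to z, 0 to 9, hyphen) are likewise assumptions.

```python
import math
from collections import defaultdict


def train(names):
    """Count how often each character is followed by each other character."""
    counts = defaultdict(lambda: defaultdict(int))
    for name in names:
        for a, b in zip(name, name[1:]):
            counts[a][b] += 1
    return counts


def avg_log_prob(name, counts, alphabet_size=37, smoothing=1.0):
    """Mean log-probability per transition, with additive smoothing.

    Values closer to 0 mean the name looks more like the training names.
    """
    total = 0.0
    steps = 0
    for a, b in zip(name, name[1:]):
        row = counts.get(a, {})
        row_total = sum(row.values())
        p = (row.get(b, 0) + smoothing) / (row_total + smoothing * alphabet_size)
        total += math.log(p)
        steps += 1
    return total / steps if steps else 0.0


# Hypothetical training names, far smaller than any real corpus.
model = train(["ferrylines", "ferry", "airlines", "online", "liner"])
familiar = avg_log_prob("ferrylines", model)
random_looking = avg_log_prob("xqzvkjw", model)
```

With this toy model `familiar` is higher than `random_looking`, because every transition in the second name is unseen. A threshold on the score would then separate names into likely and unlikely, with a band in between left undecided.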
  24. Limitations of the Markov model. Useful to detect malicious domain names; very effective for randomly generated names. Detects some legitimate domain names as malicious domains. Malicious names similar to legitimate ones (e.g. ebay.com phishing sites). International domain names and punycode. Solution: add DNS-related features to the classification process.
  25. DNS Features: the number of nameservers that hosted or are hosting this domain; the average time one nameserver hosted this domain; the maximum time one nameserver hosted this domain; the minimum time one nameserver hosted this domain; the number of non-activated nameservers that hosted this domain before; whether the domain is an international one.
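The DNS features listed above can be derived from a per-domain hosting history. The following sketch assumes a hypothetical record layout of (nameserver, first day seen, last day seen or `None` if still hosting); it reads "non-activated" as "no longer hosting the domain" and treats a label starting with `xn--` (punycode) as the marker of an international domain. All three are assumptions made for illustration, and the history must not be empty.

```python
def dns_features(domain, history, today):
    """Compute hosting-history features for one domain.

    history: non-empty list of (nameserver, first_seen_day, last_seen_day),
             where last_seen_day is None while the nameserver still hosts it.
    """
    durations = [(end if end is not None else today) - start
                 for _, start, end in history]
    return {
        "nameserver_count": len({ns for ns, _, _ in history}),
        "avg_days": sum(durations) / len(durations),
        "max_days": max(durations),
        "min_days": min(durations),
        "inactive_count": sum(1 for _, _, end in history if end is not None),
        "is_international": any(label.startswith("xn--")
                                for label in domain.lower().split(".")),
    }


# Hypothetical history: one former and one current nameserver.
example = dns_features(
    "example.com",
    [("ns1.example.net", 0, 30), ("ns2.example.net", 30, None)],
    today=100,
)
```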
  26. Example Feature (density plot: time of domain on name server in days, 0 to 600, with density from 0.00 to 0.15)
  27. Results Analysis (plot: true positive rate against false positive rate)
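A plot of true positive rate against false positive rate is obtained by sweeping a decision threshold over the classifier scores. The sketch below shows the computation on made-up scores and labels (1 for malicious, 0 for legitimate); the numbers are assumptions and do not reproduce the results in the talk.

```python
def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate) for each threshold.

    Thresholds are the distinct scores, visited from highest to lowest;
    an item is flagged when its score is at or above the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical classifier outputs.
scores = [0.9, 0.8, 0.4, 0.2]
labels = [1, 0, 1, 0]
curve = roc_points(scores, labels)
```

Lowering the threshold moves along the curve toward the upper right: more malicious items are caught, at the cost of more false alarms.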
  28. Bad Behavior: Email and Spam
  29. IP Blacklist Lookup. Mail server looks up sender IP over DNS. Simple classifier modeled on IP blacklist query logs. Narrow data set: queried IP, source IP, timestamp. Deep data set: billions of query records monthly. More complex data can be included.
  30. IP Lookups (diagram labels: Sender, Receiver, DNS, Reputation server; queries “Q?” answered with “Q=x”; logged tuple <Q, S, T>; IP=S, IP=Q)
  31. Feature Extraction.
      Breadth features: number of messages; number of recipients; burstiness (data transmitted in short, uneven spurts); sending sessions to individual recipients; global sending sessions to any recipient.
      Spectral features: periodicity over 24-hour window; average and standard deviation of low-frequency discrete Fourier transform (DFT) coefficients; average and standard deviation of high-frequency DFT coefficients.
      Distribution features: source IPs (thousands).
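Some of the breadth and spectral features can be sketched directly from the (queried IP, source IP, timestamp) records of the lookup log. The code below is an assumption-laden illustration, not the feature extractor from the talk: it treats each lookup as a stand-in for one message, bins timestamps (in seconds) into 24 hourly buckets, and splits the DFT magnitudes into a lower and an upper half. The IP addresses come from documentation ranges.

```python
import cmath
from statistics import mean, pstdev


def hourly_counts(timestamps):
    """Histogram of lookups per hour of day (timestamps in seconds)."""
    counts = [0] * 24
    for t in timestamps:
        counts[int(t // 3600) % 24] += 1
    return counts


def dft_magnitudes(series):
    """Magnitudes of DFT coefficients 1..n/2 (the constant term is skipped)."""
    n = len(series)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(series)))
            for k in range(1, n // 2 + 1)]


def features(records, queried_ip):
    """Breadth counts and spectral summaries for one queried IP."""
    rows = [(s, t) for q, s, t in records if q == queried_ip]
    mags = dft_magnitudes(hourly_counts([t for _, t in rows]))
    low, high = mags[:len(mags) // 2], mags[len(mags) // 2:]
    return {
        "lookups": len(rows),
        "distinct_sources": len({s for s, _ in rows}),
        "low_mean": mean(low), "low_std": pstdev(low),
        "high_mean": mean(high), "high_std": pstdev(high),
    }


# Hypothetical log records: (queried IP, source IP, timestamp).
records = [
    ("192.0.2.1", "198.51.100.7", 0),
    ("192.0.2.1", "198.51.100.7", 3600),
    ("192.0.2.1", "198.51.100.9", 7200),
    ("192.0.2.2", "198.51.100.7", 0),
]
sender_features = features(records, "192.0.2.1")
```

A sender that is active evenly around the clock produces near-zero magnitudes, while a sender with a daily rhythm concentrates energy in the low-frequency coefficients.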
  32. Selection of Advanced Features.
      Geographic features and static features: location of sender and receiver; distance; local time at sender and receiver; host name features; dial-up IPs; reputation of neighboring IPs.
      Content features and sparse distribution features: ratio of good and bad messages; number of “from” domains handled; persistent sender/receiver address pairs; message size distribution; source devices (thousands); extended HELO (EHLO) strings (millions); “from” domains (billions); “to” addresses (billions).
  33. Breadth Features (plot: normalized number of messages per receiver against normalized number of receivers, with spam and ham marked)
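Among the advanced features, the distance between sender and receiver is straightforward to compute once both IPs have been mapped to coordinates. The sketch below uses the haversine great-circle formula with a mean Earth radius of 6371 km; the choice of formula is an assumption, since the slides do not say how distance was measured, and the IP-to-location lookup is left out.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# Two points on opposite sides of the equator are half a circumference apart.
half_way_round = haversine_km(0.0, 0.0, 0.0, 180.0)
```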
  34. What’s in a File: A Look at Image Spam and Malware
  35. Image Spam: OCR Evasion
  36. Image Spam: Composition
  37. Close-Up of Gradient
  38. Close-Up of Gradient (continued)
  39. Gradient Field of Photo
  40. Gradient Directions
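One way to turn a gradient field into numeric features is a histogram of gradient directions. The following is a minimal sketch under stated assumptions: the image is a plain 2D list of grayscale values, gradients are central differences on interior pixels, and directions fall into 8 equal bins. The slides do not specify how the gradient features were computed, so this is an illustration of the general idea only.

```python
import math


def gradient_direction_histogram(image, bins=8):
    """Count interior pixels by the direction of their intensity gradient.

    image: 2D list of grayscale values. Pixels with a zero gradient
    (flat regions) are skipped because they have no direction.
    """
    hist = [0] * bins
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            if gx == 0 and gy == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += 1
    return hist


# Hypothetical 4x4 image that brightens steadily from left to right.
ramp = [[0, 1, 2, 3] for _ in range(4)]
ramp_hist = gradient_direction_histogram(ramp)
```

For the ramp, all four interior pixels point in the same direction, so the whole count lands in a single bin.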
  41. Image Feature Analysis (five example feature vectors in sparse index:value form):
      1:0 2:266 3:285 4:0.933333 5:9678 6:7.83323 7:1 8:0 9:0.038768 10:0.0286506 11:0.0242844 12:12.9656 13:0.688315 14:0.688289 15:0.688927 16:0.688345 17:1.47216 18:1.48728 19:1.45537 20:1.4721 21:0.998652 22:0.998907 23:0.998662 24:1 25:1 26:1 27:1 28:1 29:1 30:1 31:1 32:1 33:1 34:1 35:1 36:1 37:1 38:1 39:1 40:1 41:1 42:1 43:1 44:1 45:1 46:1 47:1 48:1 49:1 50:1 51:1 52:1 53:1 54:1 55:1 56:1 57:1 58:1 59:1 60:62895.6 61:62894.4 62:62923.5 63:62897 64:11.9708 65:0.439338 66:0.0768368 67:0.0533835 68:0.694764 69:285 70:97 71:106 72:99 73:97 74:69979 75:69484 76:68665 77:69365 78:1 79:0 80:0 81:0.0342435 82:0.0281361 83:0.025709 84:1327.37 85:35.0028 86:28.6605 87:0.818808 88:1 89:2.98484e+07 90:4.16282e+06 91:8.01424e+06 92:1.49028e+07 93:3.56203e+09 94:7.21651e+06 95:4.73602e+06 96:3.10232e+07 97:0.0083796 98:0.576846 99:1.69219 100:0.480375 101:3.61226e+09 102:3.74413e+07 103:1.22301e+07 104:1.17737e+07 105:3.6044e+07 106:3.47745e+09
      1:0 2:403 3:328 4:1.22866 5:14076 6:9.39074 7:1 8:0 9:0.0107123 10:0.00245869 11:0.00118774 12:8.11821 13:0.437548 14:0.43765 15:0.437561 16:0.437535 17:1.50918 18:1.49392 19:1.50991 20:1.50827 21:0.487349 22:3.32315e-05 23:9.95995e-05 24:2 25:1 26:4 27:2 28:1 29:4 30:2 31:1 32:4 33:2 34:1 35:4 36:2 37:1 38:4 39:2 40:1 41:4 42:2 43:1 44:4 45:2 46:1 47:4 48:2 49:1 50:4 51:2 52:1 53:4 54:2 55:1 56:4 57:2 58:1 59:4 60:87436.3 61:87446.5 62:87437.6 63:87435 64:21.4308 65:0.770517 66:0.0444456 67:0.0244281 68:0.549617 69:328 70:98 71:98 72:103 73:90 74:105800 75:99639 76:109102 77:104674 78:1 79:0 80:0 81:0.00520487 82:0.00256461 83:0.00166435 84:771.479 85:20.5683 86:47.573 87:2.31293 88:1 89:1.2547e+07 90:1.11096e+06 91:3.35713e+06 92:4.41541e+06 93:2.70918e+09 94:2.06067e+06 95:2.66906e+06 96:1.28006e+07 97:0.0046313 98:0.539126 99:1.2578 100:0.344938 101:2.72749e+09 102:1.02016e+07 103:1.04445e+07 104:1.03338e+07 105:1.00934e+07 106:2.69858e+09
      1:0 2:418 3:320 4:1.30625 5:18652 6:7.17135 7:1 8:0 9:0.0106459 10:0.00264653 11:0.000994318 12:14.1862 13:0.243456 14:0.243497 15:0.243457 16:0.243446 17:2.41721 18:2.4152 19:2.41193 20:2.41671 21:7.91675e-05 22:8.63708e-05 23:0.339384 24:4 25:1 26:8 27:3 28:1 29:8 30:2 31:1 32:8 33:4 34:1 35:8 36:3 37:1 38:8 39:2 40:1 41:8 42:4 43:1 44:8 45:3 46:1 47:8 48:2 49:1 50:8 51:4 52:1 53:8 54:3 55:1 56:8 57:2 58:1 59:8 60:65998.9 61:66004.4 62:65999 63:65997.5 64:10.224 65:0.127104 66:0.0635766 67:0.056407 68:0.88723 69:320 70:53 71:48 72:57 73:57 74:111983 75:115960 76:114435 77:113875 78:1 79:0 80:0 81:0.006407 82:0.00189145 83:0.000485945 84:964.421 85:33.207 86:64.7237 87:1.9491 88:1 89:1.76351e+07 90:2.50429e+06 91:6.24028e+06 92:1.09962e+07 93:3.00335e+09 94:3.5386e+06 95:5.21808e+06 96:1.85759e+07 97:0.00587181 98:0.707707 99:1.1959 100:0.591959 101:3.02005e+09 102:2.17951e+07 103:2.6213e+07 104:2.59369e+07 105:2.15655e+07 106:2.98824e+09
      1:0 2:425 3:213 4:1.99531 5:0 6:inf 7:1 8:0 9:0.0204143 10:0.0121072 11:0.00813035 12:14.5448 13:0.574197 14:0.562077 15:0.0938837 16:0.106849 17:2.52864 18:2.29707 19:5.7086 20:5.11698 21:0.0739991 22:0.95797 23:0.951505 24:1 25:1 26:1 27:1 28:1 29:1 30:1 31:1 32:1 33:2 34:1 35:2 36:1 37:1 38:2 39:1 40:1 41:2 42:2 43:5 44:1 45:1 46:1 47:1 48:1 49:1 50:1 51:3 52:3.66667 53:5 54:1 55:1 56:5 57:1 58:1 59:3 60:68596 61:67868.2 62:27737.3 63:29590.7 64:11.4527 65:1.08368 66:0.077625 67:0.0372273 68:0.479579 69:213 70:256 71:256 72:256 73:255 74:83329 75:78194 76:72107 77:77795 78:0 79:1 80:0 81:0.0200608 82:0.0118089 83:0.00857222 84:1814.96 85:43.1429 86:37.0588 87:0.858977 88:1 89:3.50206e+07 90:3.97185e+06 91:7.57905e+06 92:1.92885e+07 93:2.92089e+09 94:5.71381e+06 95:5.4605e+06 96:3.81577e+07 97:0.0119897 98:0.695132 99:1.38798 100:0.505495 101:2.99697e+09 102:2.84841e+07 103:1.06295e+07 104:1.04169e+07 105:2.79142e+07 106:2.93701e+09
      1:0 2:345 3:328 4:1.05183 5:12654 6:8.94263 7:1 8:0 9:0.197119 10:0.144919 11:0.130974 12:16.5426 13:0.213558 14:0.213561 15:0.213558 16:0.213541 17:2.58033 18:2.58009 19:2.58045 20:2.57963 21:0.00235566 22:8.63563e-05 23:8.24988e-05 24:5 25:1 26:10 27:4 28:1 29:10 30:2 31:1 32:10 33:5 34:1 35:10 36:4 37:1 38:10 39:2 40:1 41:10 42:4 43:1.25 44:9 45:3 46:1.33333 47:8 48:2 49:1 50:8 51:5 52:1 53:10 54:4 55:1 56:10 57:2 58:1 59:10 60:52293.8 61:52294.2 62:52293.9 63:52291.8 64:10.1244 65:0.115826 66:0.0834305 67:0.0747702 68:0.896197 69:328 70:171 71:154 72:169 73:129 74:16728 75:14297 76:14292 77:15012 78:1 79:0 80:0 81:0.167754 82:0.150486 83:0.14035 84:1517.35 85:38.3333 86:65.9516 87:1.72048 88:1 89:2.79228e+07 90:3.07939e+06 91:6.79947e+06 92:1.53908e+07 93:1.22e+08 94:5.30236e+06 95:5.54061e+06 96:2.88332e+07 97:0.228875 98:0.580758 99:1.22721 100:0.533785 101:1.49517e+08 102:3.08441e+07 103:3.45075e+07 104:2.88255e+07 105:2.57652e+07 106:2.98824e+09
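Each feature vector on this slide is a run of `index:value` tokens separated by spaces. A short parser makes the layout explicit. This is a sketch that assumes exactly that token shape; the meaning of the individual feature indices is not given in the deck and is not guessed at here.

```python
def parse_sparse_line(line):
    """Parse 'index:value' tokens into {index: value}.

    Values are read as floats, so tokens such as '6:inf' and
    '89:2.98484e+07' are accepted as written.
    """
    features = {}
    for token in line.split():
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return features


# The first six tokens of the first vector on the slide.
sample = parse_sparse_line("1:0 2:266 3:285 4:0.933333 5:9678 6:7.83323")
```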
  42. Image Feature Analysis: view of two-dimensional subspace (plot with ham and spam marked)
  43. Malware Feature Analysis
  44. Conclusions
  45. Conclusions: heuristics are limited; mathematical descriptions; dimensionality; intuition.
  46. Research Publications: TrustedSource Data Mining Technologies, http://www.trustedsource.org/en/resources/publications. March 4, 2010.
