Performing Network & Security Analytics with Hadoop


Published in: Technology

  1. Performing Network & Security Analytics with Hadoop
     Travis Dawson, Director of Product Management, Narus, Inc.
     Hadoop Summit 2012
  2. Agenda
     - Who am I, what do I do
     - What is Network & Security Analytics
     - Using Hadoop in Network & Security Analytics
     - What becomes possible with Big Data Analytics
     - Putting it all together
     - Lessons learned
  3. Who am I: What do I do
     - Geek
     - Director of Product Management, Narus Inc
       - Narus Inc, a wholly owned subsidiary of Boeing
       - Build high-performance network intelligence systems
       - I herd cats and make PowerPoints all day
       - Occasionally think about product requirements
     - Principal Member Technical Staff, Sprint
       - Sprint Advanced Technology Labs
       - Wireline/wireless network architecture, design, security
       - I broke stuff
  4. What is Network & Security Analytics: A type of voodoo, but with computers
     The (black) art of finding malicious or problematic sessions in a mountain of network traffic.
     - Multiple approaches
       - Signatures/blacklists
       - Behavior
       - Algorithmic
       - Ouija board, live chicken, full moon, etc.
     - Single goal
       - Identify malicious or problematic traffic before it causes substantial harm to your network or your assets.
  5. Network & Security Analytics: What's working against you
     - The enemy is ever-changing and infinitely intelligent
     - New attack vectors are more difficult to detect than ever
       - Polymorphic, randomized
       - APTs are real
       - Zero-days: protocol, application, OS
     - Traditional methods are ineffective
       - Payloads are ever-changing
       - Simply too many new and existing
     - Higher link speeds make deeper analysis harder
       - A 10 Gb/s link maxes out at ~15M packets per second
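The ~15M packets per second figure can be reproduced with a little arithmetic. The sketch below is not from the slides; it assumes standard Ethernet framing, where a minimum-size 64-byte frame also costs 8 bytes of preamble and 12 bytes of inter-frame gap on the wire. The function names are illustrative only.

```python
# Worked arithmetic for the packet rate of a saturated link.
# Assumptions (not stated in the slides): standard Ethernet framing, with
# 8 bytes of preamble/start delimiter and a 12-byte inter-frame gap added
# to every frame on the wire.

LINK_BPS = 10_000_000_000      # 10 Gb/s link
PREAMBLE_BYTES = 8
INTERFRAME_GAP_BYTES = 12


def max_packets_per_second(frame_bytes: int, link_bps: int = LINK_BPS) -> float:
    """Upper bound on frames per second for a given frame size."""
    wire_bits = (frame_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


def per_packet_budget_ns(frame_bytes: int, link_bps: int = LINK_BPS) -> float:
    """Time available to process one frame before the next one arrives."""
    return 1e9 / max_packets_per_second(frame_bytes, link_bps)


if __name__ == "__main__":
    for size in (64, 512, 1518):
        pps = max_packets_per_second(size)
        budget = per_packet_budget_ns(size)
        print(f"{size:>5}-byte frames: {pps / 1e6:6.2f} Mpps, {budget:7.1f} ns per packet")
```

With minimum-size frames this works out to roughly 14.88 million packets per second, which leaves about 67 ns per packet for any inline analysis. That is one way to see why deeper inspection gets harder as link speeds rise.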
  6. What is Network & Security Analytics: Finding the needle in a stack of needles
     - Where to look
       - Which stack of needles do I need to look at?
     - What are you looking for
       - Do you know?
       - Are you guessing?
       - Do you know what you are NOT looking for?
     - How to find something that is not 'right'
       - What is 'right', what is 'not-right', what is 'wrong'?
       - What is the difference?
       - What is 'normal' vs. what is 'right'?
       - How much data do you need?
  7. What is Network & Security Analytics: Solving the Network & Security Analytics problem
     Multiple methods, multiple algorithms, multiple passes per analytic.
     - You need a lot of data to determine what is 'not-right'
       - More data == more accurate results
     - You need to run sophisticated algorithms across the data
       - Use new algorithms to find something 'not-right'
       - Not always easy
     - You need multiple passes on the data
       - One algorithm feeds the next algorithm
       - Focus on the workflow: how an analyst would work
  8. Breaking out of the SQL Prison: A quick rant
     - SQL has been around since the 1970s
       - So have I!
       - Great for solving 'known' problems
     - Unable to perform the deep analytics required
       - At times, no combination of SELECT, JOIN, and UDF will get you what you need
       - Unstructured data is a nightmare, and it is now more common
     - However, using one tool does not mean you can't use another tool as well
       - SQL and Hadoop can live very happily together
       - The right tool for the right job, or more precisely: the right tool for the right PART of the job
  9. Network & Security Analytic: Using Hadoop to solve the hard problems
     - Amount of data
       - 1 week to 1 month+ of data: hundreds of billions of sessions, hundreds of TB of data, ingesting dozens of data types and millions of sessions per hour
     - Algorithms
       - Looking for sessions that look something like this thing, or maybe unlike this other thing. You can do that, right???
     - Unstructured
       - We have no idea what we are going to get in terms of information
     - Price per analytic hour
       - How much does it cost to run this analytic in a set amount of time?
  10. Network & Security Analytic: A simple workflow example
      Goal: find a polymorphic botnet/worm infection vector.
      - Find the suspected infected hosts
        - Clustering/behavior/signatures to find possible bots and worms
      - Find the Command & Control (C&C)
        - From the list of suspects, who are the most popular 'servers'?
      - Find ALL of the possible infections
        - From the C&C servers, what hosts were communicated with?
        - Cluster and group similar hosts to find even more
      - Find the infection vector
        - From all the suspect hosts, cluster hosts by common application 'features' and traffic patterns
      - You need a LOT of data, and it's non-deterministic
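The middle two steps of this workflow (rank the 'servers' most popular among suspects, then fan back out to every host that talked to them) can be sketched in a few lines. This is a minimal single-machine illustration, not the Narus implementation: the record layout `(source_host, destination_host)`, the function names, and the sample addresses are all hypothetical, and at the data volumes described in the talk each step would run as a distributed job rather than in memory.

```python
from collections import Counter

# Hypothetical session record: (source_host, destination_host).


def find_candidate_c2(sessions, suspects, top_n=1):
    """Rank destinations by how many distinct suspect hosts contacted them."""
    # De-duplicate (suspect, destination) pairs so one chatty host
    # cannot inflate a destination's popularity on its own.
    contacts = {(src, dst) for src, dst in sessions if src in suspects}
    popularity = Counter(dst for _, dst in contacts)
    return [dst for dst, _ in popularity.most_common(top_n)]


def expand_infections(sessions, c2_servers):
    """Return every host that communicated with a candidate C&C server."""
    return {src for src, dst in sessions if dst in c2_servers}


if __name__ == "__main__":
    sessions = [
        ("10.0.0.1", "203.0.113.9"),
        ("10.0.0.2", "203.0.113.9"),
        ("10.0.0.1", "198.51.100.7"),
        ("10.0.0.3", "203.0.113.9"),
        ("10.0.0.4", "198.51.100.20"),
    ]
    suspects = {"10.0.0.1", "10.0.0.2"}

    c2 = find_candidate_c2(sessions, suspects)
    infected = expand_infections(sessions, set(c2))
    print("candidate C&C:", c2)
    print("newly implicated hosts:", sorted(infected - suspects))
```

Popularity alone will also surface legitimate shared services such as DNS resolvers or update servers, so in practice the ranked list would need filtering before the expansion step. The clustering passes on either side of this step are not shown.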
  11. Network & Security Analytic: Workflow details
      What makes this work:
      - Hadoop tools/methods used
        - Entropy, FFT, behavior jobs
        - Mahout (clustering and machine learning)
        - Custom clustering (Hourglass Co-Clustering)
        - Custom correlation
      - Other tools used
        - Streaming classification/statistics engine
        - RDBMS
        - Visualization front end
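As one illustration of what an entropy job might compute, the sketch below scores each host by the Shannon entropy of the destination ports it contacts. The slides do not say which features the entropy jobs run over, so the choice of destination ports, the record layout `(host, dst_port)`, and the function names are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def port_entropy_by_host(sessions):
    """Map each host to the entropy of the destination ports it used.

    sessions is an iterable of hypothetical (host, dst_port) records.
    """
    ports = defaultdict(list)
    for host, dst_port in sessions:
        ports[host].append(dst_port)
    return {host: shannon_entropy(p) for host, p in ports.items()}


if __name__ == "__main__":
    sessions = [("client", 443)] * 8 + [("scanner", p) for p in range(20, 28)]
    for host, h in sorted(port_entropy_by_host(sessions).items()):
        print(f"{host}: {h:.2f} bits")
```

A host that always talks to the same port scores 0 bits, while one that spreads evenly over eight ports scores 3 bits. A per-host score like this is the kind of feature that could feed a later clustering pass. The same function applies equally to payload bytes or destination addresses.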
  12. Network & Security Analytic: In real life
      Many tools enabling each other.
      - I need to capture the traffic
      - I know what I am looking for
      - I don't know what I am looking for
      - I need to organize the findings
      - I need to view the findings logically
      [Diagram relating these needs to data and tools. Labels: Packets, Metadata, Datasets, Summary, Views; Capture, Streaming, Hadoop, RDBMS; Shallow Analysis, Deep Analysis]
  13. Lessons learned: How we learned to make it all work
      - Don't use a hammer when you need a scalpel
        - It just doesn't work; don't force it
        - If there is a better way of doing it, use that way
      - Hadoop does a lot of things really well
        - Complicated algorithms over vast amounts of data
        - Unstructured data
      - Hadoop does some things really poorly
        - Low-latency results for visualization
        - Simple statistics and some groupings
      - Use Hadoop in conjunction with other tools
        - Use the best tool for the job
        - Break the job into pieces and evaluate the tools for each piece
  14. Conclusion: Hadoop as a platform for Network Security Analytics
      - Hadoop has allowed us to solve problems for our customers that were previously unsolvable in a reasonable amount of time
      - New algorithms and analytics were made possible by Hadoop
      - By using Hadoop in conjunction with our streaming engine and an RDBMS, we were able to create a system that performed better than just the sum of its parts
      - We are now able to scale to larger datasets and extract even better insights than before
      - No longer confined by any tool, we leverage the power of Hadoop to solve many of our problems
  15. Q&A