Hadoop World 2011: Sherpasurfing - Wayne Wheeles

Consider this: each day, billions of packets, both benign and malicious, flow in and out of networks. The ability to survive the sheer volume of data, bring the NETFLOW data to rest, enrich it, correlate it, and perform analysis is an essential task of the modern Defensive Cyber Security Organization. SHERPASURFING is an open source platform built on the proven Cloudera stack, enabling organizations to perform the Cyber Security mission at scale at an affordable price point. This session will include an overview of the solution, a presentation of components, and a demonstration of analytics.



  • On an opening note, please hold all questions until the end. Two days ago I was practicing this presentation with my daughter. I opened with “This presentation has been 20 years in the making,” to which my daughter replied, “It took you twenty years to put together a PowerPoint?” OK, got that out of the way. SHERPASURFING takes all of the knowledge that has been captured over those 22 years and puts it to work. I love quotes, and this one is a personal favorite that explains my view of SHERPASURFING. An open source Cyber Security platform is representative of my personal Everest, given that I am in no shape to scale the actual Everest.
  • My name is Wayne Wheeles. I am a Cyber Security Defensive Analytic Developer (Active Defense) for Novii Design, based in scenic Fulton, Maryland, a beautiful place between Washington and Baltimore. Over the last twenty years, I have worked as a software developer, Cyber Security analytic developer, data scientist, architect, and database engineer for a variety of customers in the public and private sectors. I have worked with high-speed data feeds, quick-response analytics, and big data over 2.5 PB. I have over 18 analytics in production and 12 different forms of data enrichment.
  • Equally important to “What is SHERPASURFING?” is answering “Why did you do this?” I attended a Cyber Security conference earlier this year in Nashville, where I was presenting on a very large Cyber Security solution I was working on. Over three days I was approached repeatedly by customers from small to medium-sized organizations asking about the potential for a cost-effective solution. SHERPASURFING, or SHERPA for short, is an open source solution providing a framework, a foundation set of services, data sources, cookbooks, and patterns for development. The purpose of all of these components is the design, development, and deployment of analytics, with an ultimate objective of exchanging tradecraft and analytics. It is based on the proven Apache Hadoop stack. The goal is a cost-effective solution that medium-sized organizations can afford and that will handle billions of records and 10 to 50 terabytes of data each day.
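To put the stated goal in perspective, the arithmetic below converts 10 to 50 terabytes per day into an average ingest rate. It is a back-of-the-envelope sketch, not a figure from the talk: it assumes decimal terabytes (10**12 bytes) and an evenly spread load, and the function name is illustrative. Real traffic is bursty, so peak rates would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def sustained_mb_per_second(terabytes_per_day: float) -> float:
    """Average ingest rate in MB/s, assuming 1 TB = 10**12 bytes and 1 MB = 10**6 bytes."""
    bytes_per_day = terabytes_per_day * 10**12
    return bytes_per_day / SECONDS_PER_DAY / 10**6


# The two ends of the range quoted in the notes.
for tb in (10, 50):
    print(f"{tb} TB/day is about {sustained_mb_per_second(tb):.0f} MB/s sustained")
```

Under these assumptions the range works out to roughly 116 MB/s at the low end and 579 MB/s at the high end, around the clock.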
  • This afternoon we are going to discuss the problem with a focus on doing something about it; this is not a problem appreciation exercise. We will investigate the potential provided by SHERPA and projects like it to explore the question “Is there a better way?” We will circle back with conclusions. Then I will happily entertain questions and discussion.
  • I view Cyber Security as one of the central challenges facing public and private organizations, as well as individuals, over the next decade. I wanted to open this afternoon with a brief introduction to the problem space posed by Cyber Security, something to provide an initial context.
  • In order to complete this story, I wanted to level set on the forces and economics driving Cyber Security. Let's open by discussing hackers and hacking.
    Hackers: The commonly held opinion is that hackers are either script kiddies or Jedi Knights.
     - They have created their own underground economy: well financed, well trained, well organized, and dedicated (over 1 trillion dollars annually, estimated 2009)
     - The cost of hacking has declined significantly over the last five years (tools, tradecraft, techniques)
     - They are driven by a pervasive perception that they cannot be caught
     - The hackers play outside the box
    Potential victims:
     - The cost to “defend yourself” has exploded several hundred percent; the Cyber Security defense industry was a reported 1.4T in 2009
     - An effective solution to this threat must be comprehensive and include all of the staff, not just Cyber Security professionals
     - Companies' bottom lines are being squeezed, so they do not have the resources to face the threat and the risk
    Defenders: I met with a group of defenders who likened it to “putting on a hamburger necklace and jumping in a bear cage.”
     - Because the threat is not static, defenders who are well versed in it are few and far between
     - Tools are available at a cost that is generally unaffordable, or they address only one element of the threat
     - The tools do not bring the data together to show what is going on across the enterprise
     - Defenders are too often not trained to harden the environment
     - They often opt to use out-of-the-box solutions
  • I would like to share a brief story that illustrates how companies and organizations learn about Cyber Security. Aftershock Widget Corporation is representative of so many companies and organizations across the country and the globe. They are a small company that develops software for a variety of platforms. Their education on Cyber Security begins with a series of bizarre credit card charges. (Anyone familiar with this? Show of hands.) Next, Aftershock's next-generation app, still on the drawing board, is released six months earlier by a competitor. We hear so much about web site defacement and so very little about economic espionage or theft of intellectual property. Finally, who has spent some time in a backup lately? Excellent, now you understand what a Denial of Service is: a resource exhaustion attack vector. And what is the result? In one recent study, it cost 6.3M for each 24-hour period of outage. So what did Aftershock do?
  • Aftershock called for help and, like so many companies and organizations, followed a familiar design pattern. We called in some smart people, who brought in some more smart people, who brought in even more smart people, who brought in an off-the-shelf solution resulting in a clear and compelling roadmap for the company. The solution was butts in seats, big iron, elongated timelines and delivery schedules, and huge integration costs. Any time a solution is measured in pallets of money, you are in trouble. This is marketecture.
  • If I learned one thing from being with Six3, it is Practical Cyber….. I want to share this with you today, and the SHERPA solution is built on its tenets. Really, the realization that I share with you is that there has to be a better way. SHERPA is a simple attempt to provide an alternative that investigates, explores, and challenges this very question.
  • So, based on over a decade of experience as a developer and countless interviews with customers, I arrived at a set of driving tenets for the solution:
     - I have to provide a solution that I can afford to scale and grow from gigabytes of data to petabytes of data without starting over again each year
     - It has been my experience that you must bring the data together for real discovery to occur; I must be able to correlate what I am seeing from one data source with data from other sources
     - My solution must support structured, unstructured, high-volume, low-volume, schema, and non-schema data
     - My solution must be question agnostic because the questions are ever-changing (today malware, tomorrow DDoS, next week exfiltration)
     - My solution must offer the potential for sharing analytics and tradecraft across the organization and potentially across the Cyber Security community (stretch goal)
  • Take a look at the environment to determine what sources are available and how much data they produce. What format is the data in? Is it ASCII, binary, files, streams? What do I need to derive from the data? What do I do with the information? Who do I send it to? Say I found out that someone is exfiltrating information. Who would I inform? Corporate? Law enforcement? What do we do then? When you turn over a rock, there is no telling what you will find.
  • Six3 Systems was founded on the principle of “STOP TALKING AND START DOING.” So SHERPA is a Larry the Cable Guy approach to Cyber Security. SHERPA provides a specification for the hardware, a deployment plan, and a tuning guide. This is the simple checklist that was derived from building our development environment. We took some commodity hardware (5 commodity boxes and a switch, 15,000 in hardware), provisioned it with Red Hat 6.1, installed the JDK, and installed all of our components based on the SHERPASURFING deployment plan. My environment is now up and running in a day or two and is ready for data!
  • As we mentioned earlier, I identify the sources of data that need to be piped into the system.
    I inventory the things I need to defend, that is, where sensitive data is located on my network: user data and personally identifying information, intellectual property, and protected data.
    I inventory the sources available to defend my network: What are the sources? Where are they? What do they tell me?
    We started at our connection to the internet, which is Aftershock Corporate Internet Gateway 1. This gateway provides the customer user domain with access to applications, websites, email, resources, WebEx, and increasingly VoIP. As we covered earlier, it also provides potential adversaries with access to my intellectual property, user sensitive data, trade secrets, and protected data.
    Traditionally, moving data from the outer defenses to the point of analysis has been a major task (hand scripting, scp, cron jobs). FLUME, which is a log and data collector framework, is awesome. It took what was a several-week function down to a day or two. All I need to deliver data to my analysis platform is:
     1.) Install FLUME on the boxes to be collected from
     2.) Use built-in functionality and test by dumping the data
     3.) Configure the “SINK” or destination for each of the Agents
  • Reference Architecture with services inlaid
  • On this slide we demonstrate an analytic on top of the stack. We are finalizing the analytic to be incorporated.
  • On this slide we demonstrate an analytic on top of the stack. We are finalizing the analytic to be incorporated.

Hadoop World 2011: Sherpasurfing - Wayne Wheeles Presentation Transcript

  • "Everest for me, and I believe for the world, is the physical and symbolic manifestation of overcoming odds to achieve a dream." ~Tom Whittaker
    Sherpasurfing
    Open Source Cyber Security Solution
    Wayne Wheeles, Six3 Systems
    Active Defensive Analytic Developer
    Hadoop World 2011 cloudera Six3 Systems
  • About me
    • Employed by Six3 Systems, based in Fulton MD
    • Defensive Analytic Developer (CND-OPS)
    • Decade of solutions in Analytics & Big data
    • 18 analytics in production, 12 forms of enrichment
  • What is SHERPASURFING
    Open source Cyber Security Solution, providing a framework, base set of proven services, data sources, how-to guides and patterns for analytic development, built on top of the Apache Hadoop stack
  • Topics
    • The Problem
    • Is there a better way?
    • Conclusions
    • Questions
  • The Problem
  • Forces and Economics driving Cyber Security
    Hackers… Potential Victims… Defenders…
  • A Story: Aftershock Widget Corporation
    PROFILE
    • Software development firm that develops applications for a variety of different platforms
    Events (with the statistic shown beside each on the slide)
    • A series of bizarre fraudulent charges appeared on Aftershock Widgets credit cards (2009: Credit card theft victimized 11.1M Americans, costing 54B)
    • The next generation application that Aftershock has designed is stolen and hits market six months before release by a competitor (2009: Intellectual Property cost the economy 1.2T dollars annually)
    • During peak ordering season web traffic grinds to a halt (In a recent poll 94% of respondents stated that a DDOS was a major concern)
  • Aftershock Widget Corporation calls for HELP!
    • We are going to bring in some smart people
    • Who brought in more smart people
    • Who brought even more smart people!
    • Who brought in some off the shelf technology solution, resulting in a clear and compelling roadmap for the organization
    MARKETECTURE
    Their solution was butts in seats, big iron and huge integration costs
  • Is there a better way?
  • Driving tenets of SHERPA:
    • Cost effective scaling for handling BIGDATA
    • Brings all data together
    • Must support all forms of data
    • Must be question agnostic
    • Foster sharing and exchanging of analytics/tradecraft
  • Assess the environment:
    • What are my data sources, how much data, how fast?
    • What are the data formats?
    • What do I need to know from the collected data?
    • What do I do with the information?
    • Who needs the results and what do we do with results?
  • Enough Talk! Let's get Started
    TODO List
     1.) Need some commodity hardware x 5, configure network
         • 32GB RAM
         • 4x300GB 6G SAS 15k HDD
         • 8-Core AMD Opteron Processor Model 6128 (2.0GHz, 80W)
     2.) Provision them with RHEL X86_64 Server 6.1
     3.) Install JDK 1.6u26
     4.) Install the Cloudera CDH3U1, Enterprise 3.5.2
     5.) Configure HDFS, HBASE, ZOOKEEPER, FLUME
    Service layout
    • HDFS: 4 Data Nodes/Task Trackers, Name Node
    • HBASE: 1 Master, 4 Region Servers
    • Zookeeper: 4 Zookeeper servers
    • HUE: 1 HUE, 4 HUE Agents
    • FLUME: 1 Master, 2 Nodes
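The hardware list on this slide is enough for a rough storage estimate of the development cluster. The sketch below is an illustration rather than part of the SHERPA deployment plan. It assumes that all four 300GB disks on each of the four data nodes are given to HDFS (ignoring space for the operating system and temporary files) and that HDFS runs with its default replication factor of 3. The function name is hypothetical.

```python
from typing import Tuple


def hdfs_capacity_tb(data_nodes: int, disks_per_node: int, disk_gb: int,
                     replication: int = 3) -> Tuple[float, float]:
    """Return (raw, usable) capacity in decimal TB.

    Usable capacity divides raw capacity by the replication factor,
    because HDFS stores that many copies of every block.
    """
    raw_tb = data_nodes * disks_per_node * disk_gb / 1000
    return raw_tb, raw_tb / replication


# Figures from the slide: 4 data nodes, each with 4 x 300GB disks.
raw, usable = hdfs_capacity_tb(data_nodes=4, disks_per_node=4, disk_gb=300)
print(f"raw: {raw:.1f} TB, usable at 3x replication: {usable:.1f} TB")
```

Under these assumptions the cluster holds 4.8 TB raw and about 1.6 TB of replicated data, a development-sized footprint consistent with the notes describing this as the development environment.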
  • Sources, Sinks and Agents
    TODO List
     1.) Identify data sources of potential value
     2.) Install FLUME on each source from Cloudera
     3.) Configure FLUME Agent to tail a file or directory for each source
     4.) Test each data source to ensure data is being collected correctly using command line (dump)
         $ flume dump text("/cp/10/21/0800/current.log")
     5.) Configure the sink (destination) for each FLUME Agent
    [Diagram: Aftershock Corporate Internet Gateway 1 protecting Users, Intellectual Property, and Protected Data. Each source feeds a sink: User Logging; Corporate Enterprise App Server(s) (log data); Intrusion Detection System(s) (IPS/IDS signature hits); Corporate Firewall (firewall logs); Flow Capture (flow data); Packet Capture.]
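The agent idea on this slide (tail a log file, hand each new line to a sink) can be illustrated in a few lines of Python. This is a conceptual sketch only: it does not use or imitate FLUME's API, the function name `collect_new_lines` is hypothetical, and the sink is assumed to be any callable that accepts one line of text. The sample log lines are invented for the demonstration.

```python
import os
import tempfile
from typing import Callable


def collect_new_lines(path: str, offset: int, sink: Callable[[str], None]) -> int:
    """Send every complete line written to `path` after byte `offset` to `sink`.

    Returns the new offset so that the next call resumes where this one stopped.
    A trailing line without a newline is left for a later call, since the
    writer may not have finished it yet.
    """
    with open(path, "rb") as log:
        log.seek(offset)
        while True:
            line = log.readline()
            if not line.endswith(b"\n"):  # end of file, or a partial line
                break
            sink(line.decode("utf-8", errors="replace").rstrip("\r\n"))
            offset = log.tell()
    return offset


# Demonstration with a temporary file standing in for a firewall log.
received = []
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "current.log")
    with open(log_path, "w", encoding="utf-8") as f:
        f.write("deny tcp 10.0.0.5 -> 192.0.2.7:445\n")
    position = collect_new_lines(log_path, 0, received.append)
    with open(log_path, "a", encoding="utf-8") as f:
        f.write("allow tcp 10.0.0.9 -> 192.0.2.7:443\n")
    position = collect_new_lines(log_path, position, received.append)
print(received)
```

Tracking the byte offset between calls is what keeps lines from being delivered twice. A real collector also has to handle log rotation and delivery failures, which this sketch leaves out.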
  • The Pieces come Together
    SHERPA – Analytic Framework
    SHERPA Components: HUE, HBASE, PIG, HIVE, SQOOP, HDFS, ZOOKEEPER
    Enrichment
    • GEO Enrichment
    • Port Enrichment
    • Protocol Enrichment
    Data Sets (STATS)
    • 30 days: 61,764,205 netflows
    • 30 days: 1,065,977 SNORT
    • 30 days: 4,065,977 Firewall
    Data at rest: Packet Data, Firewall Logs, Netflow Data, Application Server Logging, User Logging, IDS/IPS Logs
    [Diagram: six sinks feed the platform, one each for User Logging, App Server Logging, Netflow, Packet Capture, Firewall Logs, and IDS/IPS Logs.]
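Port and protocol enrichment, two of the enrichment forms named on this slide, amount to attaching human-readable labels to the numeric fields of a flow record. The sketch below shows the idea under stated assumptions: the record layout (`protocol`, `dst_port`) and the function name are hypothetical and are not SHERPA's actual schema, and the lookup tables contain only a handful of standard IANA assignments. GEO enrichment is omitted because it needs an external IP-to-location database.

```python
# A few standard IANA assignments; a real table would be far larger.
PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}
WELL_KNOWN_PORTS = {22: "ssh", 25: "smtp", 53: "dns", 80: "http", 443: "https"}


def enrich(flow: dict) -> dict:
    """Return a copy of `flow` with protocol and service labels added.

    Unknown values get a placeholder label instead of raising, so one odd
    record does not stop a batch.
    """
    enriched = dict(flow)  # leave the caller's record untouched
    enriched["protocol_name"] = PROTOCOL_NAMES.get(flow["protocol"], "OTHER")
    enriched["service"] = WELL_KNOWN_PORTS.get(flow["dst_port"], "unknown")
    return enriched


# Hypothetical flow record for illustration.
flow = {"src_ip": "10.0.0.5", "dst_ip": "192.0.2.7", "protocol": 6, "dst_port": 443}
print(enrich(flow))
```

Storing the labels alongside the raw numbers lets later queries filter on `service` directly without repeating the lookup.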
  • Develop and Deploy Analytics
    [Diagram: an analytic running on the SHERPA – Analytic Framework. Analytic step labels: Correlate, Perform Enrichment, Flow Characterization, Risk Potential Index, Report Analytic Results. Framework labels: Analytic Runtime Environment, Health & Status, Analytic Registry, CORE Services, Data Services, Job Control, SHERPA SDK.]
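The step labels on this slide (correlate, enrich, characterize flows, index risk potential, report results) suggest a pipeline shape for an analytic. The sketch below illustrates three of those steps on toy data. Everything in it is an assumption made for illustration: the record fields, the scoring formula, the threshold, and the function names are hypothetical and do not come from the SHERPA SDK.

```python
from collections import Counter


def correlate(flows, alerts):
    """Attach to each flow the number of IDS alerts seen for its source IP."""
    alerts_by_ip = Counter(alert["src_ip"] for alert in alerts)
    return [dict(flow, alert_count=alerts_by_ip[flow["src_ip"]]) for flow in flows]


def risk_index(flow):
    """Toy score: 10 points per correlated alert plus 1 per outbound megabyte, capped at 100."""
    score = 10 * flow["alert_count"] + flow["bytes_out"] // 1_000_000
    return min(score, 100)


def report(flows, alerts, threshold=20):
    """Return flows scoring at or above `threshold`, highest risk first."""
    scored = [dict(f, risk=risk_index(f)) for f in correlate(flows, alerts)]
    flagged = [f for f in scored if f["risk"] >= threshold]
    return sorted(flagged, key=lambda f: f["risk"], reverse=True)


# Hypothetical inputs: two flows and two alerts for the same source.
flows = [
    {"src_ip": "10.0.0.5", "bytes_out": 25_000_000},
    {"src_ip": "10.0.0.9", "bytes_out": 2_000},
]
alerts = [{"src_ip": "10.0.0.5"}, {"src_ip": "10.0.0.5"}]
for row in report(flows, alerts):
    print(row["src_ip"], row["risk"])
```

The point of the sketch is the correlation step: a flow that looks unremarkable by itself becomes interesting once alerts from a second source are joined to it, which is the "bring the data together" tenet in miniature.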
  • SHERPASURFING Toolkit
    • FLUME Sinks, Decorators
    • HBASE Object Definitions
    • Multiple forms of Enrichment
    • SHERPA Developers Guide and Cookbook
    • Two Sample Analytics
    • Enterprise Analytic Framework
  • Conclusions
  • The Wrap-up
    • The threat is very real, well funded and determined
    • The problem has an incredible, often hidden impact
    • Apache Hadoop stack provides an effective foundation
    • SHERPA solution builds on that stack
    • Provides a framework for Cyber Security Analytics
  • QUESTIONS?
    sherpasurfing@gmail.com
    “Imagination is more important than knowledge. For knowledge is limited to all we now know and understand, while imagination embraces the entire world, and all there ever will be to know and understand.” ~Albert Einstein