November 2013 HUG: Cyber Security with Hadoop


Published in: Technology, Education
  • Narus is a wholly-owned subsidiary of The Boeing Company. We are a software company focused on cybersecurity and research and development, based in Silicon Valley. In the past few years, we’ve all seen how the web has changed from being static (what we call 1.0), serving up pages, to becoming the semantic web, where data has meaning and context. We are seeing a parallel evolution in approaches to cybersecurity. Narus is pioneering this new approach, termed Cyber 3.0, to address cybersecurity and risk management. We will discuss Cyber 3.0 in more detail later. One of the key elements of Cyber 3.0 is machine learning, which speeds up problem solving and lowers its cost. Narus’ solution incorporates big data analytics to handle huge volumes of diverse information collected from a wide range of devices, and incorporates sentiment analysis to truly understand context: what is going on in your network and what the users are doing.
  • While there are lots of notes on this slide, the goal is to provide examples for the customer: in the consumer world, you started by reading content (for example, manuals posted on web sites); now, based on one choice you have made, other recommendations are given to you (shopping recommendations on Amazon, for example), and with more context (your age, your birthday, etc.) those choices can be refined further. That is what constitutes the semantic web and Intelligent Cyber. Let’s start the journey with Web 1.0, the Static Web. In the early days, we had read-only content and static HTML websites. Information was served to us. In cyber, where we conduct business over the Internet, we start with Cyber 1.0, which we call “Siloed Cyber”. During this phase, we had large volumes of data, though not as much as we see today, and the information tended to be homogeneous (e.g. corporate data with credit card info, customer info). The data tended to be siloed: collected and managed by individual organizations (e.g. sales vs. marketing vs. support) and analyzed on demand. We had a limited number of applications and protocols: PowerPoint, email, Excel, Word. Resources resided mostly with IT and not with the organizations trying to accomplish specific missions. We relied heavily on human analysts to analyze data and bring meaning and context to it. We then entered the phase of Web 2.0, the Social Web, with the introduction of Facebook, Twitter, LinkedIn and Instagram, where users and communities generate content. We have the read-write web. Posting content on the web is no longer limited to a few authorized personnel and is not controlled by IT. This led to Cyber 2.0, “Integrated Cyber”. There are now higher volumes of data to be dealt with, on faster networks. With the proliferation of devices, we are also seeing huge growth in applications and protocols.
With Facebook, Twitter and the other social networking applications, people are now connecting with each other and sharing content. Despite the increased complexity, we still rely mostly on highly skilled but scarce analysts to look at the data, extract information and bring context to it. With the increased variety of interactions (applications, devices, people), control is looser, and cyber criminals are taking advantage of the situation to unleash new cyber threats. We are now entering the era of the “Semantic Web”: we are adding “context” to data and Internet traffic based on a superior understanding of “relationships” within the data, such as who created the information, who received it, when it was created, where it came from and where it went. With the changes brought on by “Web 3.0”, conducting business in the cyber world has become even more complex. This leads to the challenges we face today: “Cyber 3.0”, or “Intelligent Cyber”. Besides a high volume of data traveling at high velocity, we also have to deal with the variety of data available (not just the Microsoft suite, but also data generated from social networking and mobile applications). People are now hyperconnected and hyper-interactive. Just think about how many devices you use today and how often you interact with your applications. This has led to the need for organizations to automate the alignment of resources and missions, placing resources outside of IT where they are needed. With the added complexity, automated processes and reliance on machine learning to gain more intelligence and context about communications are becoming prevalent. Let’s look at some of these concepts in more detail.
  • In addition to the changing dynamic of how content is generated and consumed, other dynamics have changed the very nature of networks and how we address cyber security. In 2012, daily traffic volume was 10+ petabytes per day. By 2015, it’s going to double to 20.33 petabytes per day. Now, on average, each person uses 2.5 devices, and by 2016 we will be generating 19 billion connections. On the velocity front (the rate at which information flows through the network, for example how quickly you can download a movie from Netflix), we are seeing fixed line speeds growing from 1 Gb to 100 Gb. The variety of devices, applications, operating systems and network topologies also has a significant impact on cyber security. Did you know that there are more than 1.5 million applications in the Android and Apple stores? The types of data we have to discover, analyze and track have grown as well. In the past, enterprises mostly dealt with corporate data, mostly text and numbers stored in relational databases in nice rows and columns; we now have to add voice and video data, more protocols and more network types (local, cloud, virtualized and hybrid).
  • Today we are addressing new problems using old approaches and tools. Visibility: Organizations are having a very difficult time keeping pace with the unprecedented and unpredictable emergence of new applications, protocols, hosts and users on their networks. It’s like operating in the dark. Context: Investigating the impact and root cause of cyber threats and security breaches requires understanding the “context” of the communications. We traditionally relied on highly skilled, scarce and highly paid analysts to digest the overwhelming amount of data. Besides being costly, this also took a lot of time, while a cyber attack could be causing havoc in your network. Control: In recent headlines, we read about theft of intellectual property, attacks on networks and financial theft over the Internet. Organizations are losing the security battle despite major investments in network security and information security. The static approach is not sufficient to solve today’s dynamic problems. We all have lots of tools; however, we still can’t enforce the more granular policies that match the mission and allow tighter control over networks, ensuring that resources are aligned with your mission or business goals.
  • In order to analyze the data that is streaming through the network, Narus has developed a methodology to disaggregate unique dimensions so they can be processed and managed independently. The value of deriving these dimensions is in the analytics, where we fuse these dimensions together to paint a complete picture and provide the complete visibility that security organizations need. The unique dimensions are categorized into 3 different "planes." The network plane consists of information about devices (brand, type and operating system) and hosts (client, server, applications, protocols and services). The semantic plane consists of content, topics, trends, communities and locations. The user plane consists of presence, profiles, identities, associations and relationships related to users. Narus solutions automate the understanding of each of these planes, identify the context of the interactions, and aggregate data across the planes to deliver incisive intelligence.
  • Better visibility and control over your network, allowing you to proactively defend against new threats with machine-based intelligence for certainty in a complex world
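
The three planes described in the notes above (network, semantic, user) can be pictured as three record types that are joined to give one view per interaction. The Python sketch below is purely illustrative: the class names, the chosen fields and the use of a session id as the join key are all assumptions made for this example, not Narus's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NetworkRecord:
    """Network plane: device and host attributes (hypothetical fields)."""
    device_type: str
    operating_system: str
    protocol: str


@dataclass
class SemanticRecord:
    """Semantic plane: content topics and location (hypothetical fields)."""
    topics: List[str] = field(default_factory=list)
    location: str = ""


@dataclass
class UserRecord:
    """User plane: identity and associations (hypothetical fields)."""
    identity: str = ""
    associations: List[str] = field(default_factory=list)


@dataclass
class FusedView:
    """One record per session, combining all three planes."""
    session_id: str
    network: NetworkRecord
    semantic: SemanticRecord
    user: UserRecord


def fuse(network: Dict[str, NetworkRecord],
         semantic: Dict[str, SemanticRecord],
         user: Dict[str, UserRecord]) -> List[FusedView]:
    """Join the planes on session id (an assumed key); missing planes get empty defaults."""
    return [
        FusedView(
            session_id=sid,
            network=network[sid],
            semantic=semantic.get(sid, SemanticRecord()),
            user=user.get(sid, UserRecord()),
        )
        for sid in sorted(network)
    ]
```

Keeping the planes as separate types mirrors the idea of processing each dimension independently and fusing them only at analysis time.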

    1. 1. Cyber Security Analytics & Big Data Padmanabh Dabke, PhD VP, Analytics & Visualization Narus Inc.
    2. 2. Agenda • Company overview • Narus Technology • Key challenges & solutions • Summary Narus Confidential, © 2013 Narus, Inc. 2
    3. 3. Company Overview • Wholly-owned subsidiary of The Boeing Company – Cybersecurity & R&D software company based in Silicon Valley (Sunnyvale, CA) • Innovative technology protected by a broad IP portfolio – Focused on fusing semantic and data planes, applying it to cybersecurity and risk management – Making sense of physical, content, and social networks • Established customer and partner base
    4. 4. Journey to Cyber 3.0: Semantic Web & Cyber Intersect • Web 1.0, Static Web: Primarily read-only content and static HTML websites • Cyber 1.0, Siloed Cyber – Voluminous, homogenous information – Siloed, on demand non-interactive content – Limited number of applications and protocols – Resources and missions not fully aligned – Manual contextualization of data • Web 2.0, Social Web: User/Community-generated content and the read-write web • Cyber 2.0, Integrated Cyber – Voluminous high velocity data – Growth of applications and protocols – People connecting with each other & content – Human approaches to extract & contextualize – Variety of interactions driving looser control (new threats) • Web 3.0, Semantic Web: Adds “Context” to data/Internet traffic based on a superior understanding of relationships within data • Cyber 3.0, Intelligent Cyber – High volume, velocity and variety of data – Explosion in applications and protocols – Hyper connected people & content, interactivity between people-machines-machines-people – Automated alignment of resources & missions – Machine learning for intelligence & context
    5. 5. Changing Landscape: Volume, Velocity & Variety • ~1.5 million applications in Android & Apple Store • Types of data: growing media (data, voice, video), protocols, network types (local, cloud, virtualized, hybrid) • Fixed line speeds growing from 10 Gbps (2012) to 100 Gbps (2015) • Daily traffic volume: 10+ PB/day (2012), 20.33 PB/day (2015) • 2.5 devices / person & 19 billion connections (2016) • [Chart: Network Bandwidth in Gb; axis values 1, 10, 100]
    6. 6. Visibility, Context & Control: Key to Enhance Cybersecurity & Protect Assets • Visibility – Need for continuous visibility into every dimension (hosts, users, etc.) • Context – Impact & root-cause analysis is manual, requires highly-skilled & paid analysts to digest overwhelming amounts of data • Control – More efficient spending, faster resolution, dynamic approach to solve a dynamic problem – Lots of tools, but policies not aligned with mission to allow tighter control
    7. 7. Narus’ Innovative Technology An Integrated perspective 7
    8. 8. Narus nSystem: Comprehensive & Adaptive Analytics To Enhance Cybersecurity and Protect Critical Assets with Machine Learning • nAnalytics – Single UI with interactive dashboards offering multidimensional views of cyber activity (Network, Semantic & User Analytics; Targeted Session Captures) – Advanced analytics for automated data fusion with machine learning • nProcessing – Centralized scalable data processing & storage framework – Automated ability to deal with petabytes of data – Support for streaming, query-based and big-data analytics – Machine learning applied to large volumes of data • nCapture – Architected for distribution at multiple sites & links (100% of packets examined, metadata with necessary session fidelity) – Plugins to assimilate data from heterogeneous sources – Precision targeted full packet capture – Support for 20G (duplex 10G) per-link, path to 100G
    9. 9. Narus Analytics Framework • Latency tiers: Real Time (< 5 sec latency), Close Enough (5 min latency), On Demand • Components: Protocol Vector Creator, Real Time Visualization, In-Memory Analytics, Hadoop Map Reduce Jobs, HBase, RDBMS, ETL • Analytics stages: Real Time Analytics, Ad-Hoc/Sliding Window Analytics, Data-At-Rest Analytics • Analytics listed: Volumetric & Topical Trends, Anomaly Detection, Classification, Clustering, Summarization • Data-At-Rest Analytics: Long term trends, Opportunistic Correlations, Model Training
    10. 10. Machine Learning for Cyber Security • Automated Signature Generation – Protocol Identification – Parser Generation – Mobile App Detection – App Categorization • Text Analytics – Topic Detection – Sentiment Analysis • Anomaly Detection – Baseline profile generation – Alerts & workflow – Malicious Application Detection 10
    11. 11. Key Challenges • Increasing network traffic – Line speeds from 20 Gbps to 600 Gbps and above – 210 TB to 6.3 PB Per Day • Diversity of deployments – Data rates, vertical application areas, SLA, price points: everything is a variable • Operational issues – Datacenter connectivity – Burstiness of network traffic • Data Security 11
    12. 12. Lessons Learned/Solutions • Extract and store all metadata and provide full packets as identified by the analyst – 90% reduction in data volume • Use domain knowledge for message compression – Short codes for enumerated values (mobile apps, protocols, etc.) – Session associations to eliminate referential fields • HBase over HDFS – Provides abstractions useful for modelling dynamic schema • Offload CPU work to special-purpose co-processors to accelerate performance
    13. 13. Lessons Learned/Solutions • Relational databases are not evil – Believe it or not, relational algebra is quite powerful – We use it for fast, in-memory computations in combination with Java code for processing rule sets • SQL interfaces on HDFS/HBase are catching up • [Chart: Analytics Data Store Performance; Avg. Query Processing Time vs. Database Size (axis values 10, 20, 50) for MySQL Cluster, Impala and Big SQL]
    14. 14. Business Considerations • Optimizing Total Cost of Ownership (TCO) – System acquisition – Data center costs – Administration and maintenance • Analytics development and skillset required • Global support 14
    15. 15. Data Warehousing Vs Hadoop Source: “Big Data: What does it really cost?” By Winter Corp 15
    16. 16. Summary • We blend network, semantic, and user-oriented views to create unique insights – Data Loss Prevention – Threat Detection – Network pattern mining • Real Time & At-Rest Analytics – Stateless analysis and short term trends and classification – At-Rest analysis for training models, opportunistic correlations, and mega-trends • Hybrid Approach – Hadoop/HBase for horizontal scaling and cost-effective storage and processing of massive data sets – Relational databases for creating efficient business intelligence views
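
As a rough sanity check on the traffic figures in slides 11 and 12, the Python sketch below converts a line rate into a daily data volume and then applies the quoted 90% reduction. It assumes a fully utilised link for 24 hours and decimal units (1 TB = 10^12 bytes); the function names are hypothetical and the slides do not state which assumptions were actually used.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_volume_tb(line_rate_gbps: float) -> float:
    """Terabytes per day carried by a link running at full rate all day."""
    bits_per_day = line_rate_gbps * 1e9 * SECONDS_PER_DAY
    bytes_per_day = bits_per_day / 8
    return bytes_per_day / 1e12


def after_reduction_tb(volume_tb: float, reduction: float = 0.90) -> float:
    """Volume left after discarding the given fraction of the raw data."""
    return volume_tb * (1.0 - reduction)


if __name__ == "__main__":
    for rate in (20, 600):
        raw = daily_volume_tb(rate)
        kept = after_reduction_tb(raw)
        print(f"{rate} Gbps: {raw:.0f} TB/day raw, {kept:.1f} TB/day after reduction")
```

Under these assumptions, 20 Gbps works out to about 216 TB per day and 600 Gbps to about 6.48 PB per day, close to the 210 TB and 6.3 PB quoted on slide 11. A 90% reduction would leave roughly 21.6 TB to 648 TB per day to store.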