Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark based Lambda Architecture

Published in: Technology

  1. Near real-time network anomaly detection and traffic analysis
     Pankaj Rastogi, Tech Manager
     Debasish Das, Data Scientist
     © Verizon 2016 All Rights Reserved. Information contained herein is provided AS IS and subject to change without notice. All trademarks used herein are property of their respective owners.
  2. Agenda
     • Network data overview
     • DDoS as network anomaly
     • Design challenges
     • Trapezium overview
     • Results
     • Q&A
  3. Network: Aggregated data overview
     • Simple Network Management Protocol (SNMP)
       - Network management console
       - Network devices (routers, bridges, intelligent hubs)
     • Data collection: Aggregated per router interface
     • Inbound and outbound traffic statistics sampled at regular intervals
       - Bits per second (bps)
       - Packets per second (pps)
       - CPU
       - Memory
     [Diagram: SNMP manager collecting SNMP statistics from routers over the SNMP protocol]
  4. Network: Flow data overview
     [Diagram: web browser 192.168.1.10 and web server 10.1.2.3 joined by a TCP connection; the request is flow #1 and the response is flow #2]
     • Flow #1 (request)
       - Source address 192.168.1.10
       - Destination address 10.1.2.3
       - Source port 1025
       - Destination port 80
       - Protocol TCP
     • Flow #2 (response)
       - Source address 10.1.2.3
       - Destination address 192.168.1.10
       - Source port 80
       - Destination port 1025
       - Protocol TCP
     • A single flow may consist of several packets and many bytes
     • A TCP connection consists of two flows
       - Each flow mirrors the other
       - TCP flags can be used to determine the client and the server
     • ICMP, UDP and other IP protocol streams may contain one or two flows
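The mirroring described on this slide can be sketched in a few lines of Python. This is only an illustration: the `Flow` class, its field names, and the `connection_key` helper are hypothetical, and the sketch assumes that the response flow swaps both the addresses and the ports of the request flow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """A unidirectional flow identified by its 5-tuple (hypothetical model)."""
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    protocol: str

    def mirrored(self) -> "Flow":
        # Swap source and destination to get the flow in the other direction.
        return Flow(self.dst_addr, self.src_addr,
                    self.dst_port, self.src_port, self.protocol)

    def connection_key(self):
        # Direction-independent key: both flows of one connection share it.
        a = (self.src_addr, self.src_port)
        b = (self.dst_addr, self.dst_port)
        return (min(a, b), max(a, b), self.protocol)


request = Flow("192.168.1.10", "10.1.2.3", 1025, 80, "TCP")
response = Flow("10.1.2.3", "192.168.1.10", 80, 1025, "TCP")

print(request.mirrored() == response)
print(request.connection_key() == response.connection_key())
```

A direction-independent key like this is one way to pair the two flows of a TCP connection before computing per-connection features.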
  5. DDoS as network anomaly
     [Diagram labels: Remote command & control; Attacker; Bots; Router; Customer; Attacker + Bots + Customer locations; Attacker + Bots + Customer IPs; NetFlow; SNMP; Customer + Volumetric attack magnitude]
  6. SNMP: Anomaly detection on time series
     Nonparametric models for SNMP DDoS detection
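The slides do not say which nonparametric model was used. As a minimal sketch of the general idea, the following applies Tukey's interquartile-range fences to a short series of bits-per-second samples; the function name, the sample values, and the 1.5 multiplier are assumptions chosen for illustration, not the presenters' method.

```python
from statistics import quantiles


def iqr_anomalies(values, k=1.5):
    """Return indexes of values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR].

    The rule uses only order statistics, so it makes no assumption about
    the distribution of the traffic samples.
    """
    q1, _q2, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical bps samples for one router interface, with one spike.
bps = [100, 102, 98, 101, 99, 103, 100, 950, 97, 101]
print(iqr_anomalies(bps))
```

In a streaming setting the same rule would be applied over a sliding window of recent samples rather than the full series.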
  7. SNMP: Network analysis on SNMP
     • Usage of each router/interface
     • Find routers that have high packet flow
  8. NetFlow: Anomaly detection on high-frequency data
     Parametric models for NetFlow DDoS detection
     • Generate customer-IP-focused features based on the DDoS definition
     [Figure: NetFlow flow count (y-axis, 0 to 300,000) over time (x-axis, 9/14/15 0:00 to 3:36)]
  9. NetFlow: Network analysis on NetFlow
     • Find customer with maximum upload bytes
     • Find customer with maximum download bytes
     • Find peak usage for given customer
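The three queries on this slide are plain group-and-aggregate operations. The sketch below shows them over a handful of in-memory records; the record layout `(minute, customer, direction, bytes)`, the customer names, and the helper functions are all hypothetical, and a production job would express the same aggregations in Spark.

```python
from collections import defaultdict

# Hypothetical flow summaries: (minute, customer, direction, bytes).
RECORDS = [
    ("00:00", "cust-a", "upload", 500),
    ("00:00", "cust-b", "upload", 200),
    ("00:01", "cust-a", "download", 900),
    ("00:01", "cust-b", "download", 4000),
    ("00:02", "cust-a", "upload", 700),
    ("00:02", "cust-a", "download", 300),
]


def top_customer(records, direction):
    """Return (customer, total_bytes) with the most bytes in one direction."""
    totals = defaultdict(int)
    for _minute, customer, rec_direction, nbytes in records:
        if rec_direction == direction:
            totals[customer] += nbytes
    return max(totals.items(), key=lambda item: item[1])


def peak_usage(records, customer):
    """Return (minute, total_bytes) of the busiest minute for one customer."""
    per_minute = defaultdict(int)
    for minute, rec_customer, _direction, nbytes in records:
        if rec_customer == customer:
            per_minute[minute] += nbytes
    return max(per_minute.items(), key=lambda item: item[1])


print(top_customer(RECORDS, "upload"))
print(top_customer(RECORDS, "download"))
print(peak_usage(RECORDS, "cust-a"))
```

Both helpers raise `ValueError` when no record matches, which a real job would need to handle.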
  10. Why we chose Apache Spark
      • Good support for machine learning algorithms
      • Spark’s micro-batching capabilities
        > Sufficient for our streaming requirements
      • Vibrant Spark community
      • Excellent talent availability within our group
  11. Lessons learned -- Spark
      • Coalesce partitions when writing to HDFS
      • A seemingly harmless action like take(1) can result in huge costs
      • Multiple actions on a DataFrame/DStream result in multiple jobs
      • Spark DStream checkpointing with RDD models
      • spark.sql.parquet.compression.codec – snappy
      • spark.sql.shuffle.partitions – 2000+ when partition block size crosses 2 GB
  12. Design challenges
      [Diagram label: NFS/GFS]
      • Data source?
      • Algorithms?
      • Persistence?
  13. Design challenges -- SNMP
      Near-real-time model updates needed
      Lambda architecture
      • Batch job MUST process data at a fixed interval (e.g., 15 min)
      • Stream job MUST
        > Handle hot starts (e.g., 90 days of data)
        > Analyze data and generate anomalies
        > Update the model every sampling interval
        > Start from the last model timestamp on restart
      Coordination between batch and stream processes NEEDED
      • Batch job updates a ZooKeeper node at a fixed interval (e.g., 15 min)
      • Stream job uses the same ZooKeeper node to load features
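The batch/stream handshake on this slide can be sketched as follows. `CoordinationNode` is an in-memory stand-in for a ZooKeeper node, not a real ZooKeeper client, and the state layout (`model_ts`, `features`), function names, and sample values are assumptions made for illustration.

```python
class CoordinationNode:
    """In-memory stand-in for a ZooKeeper node (not a real ZooKeeper client)."""

    def __init__(self):
        self._state = None

    def set(self, state):
        self._state = dict(state)

    def get(self):
        return None if self._state is None else dict(self._state)


def run_batch(node, batch_end_ts, features):
    # The batch job publishes the timestamp its model covers plus the features.
    node.set({"model_ts": batch_end_ts, "features": features})


def run_stream(node, samples, hot_start_ts):
    # On (re)start the stream job resumes after the last model timestamp,
    # or from the hot-start point if no batch run has published yet.
    state = node.get()
    resume_ts = state["model_ts"] if state else hot_start_ts
    features = state["features"] if state else {}
    pending = [(ts, value) for ts, value in samples if ts > resume_ts]
    return resume_ts, features, pending


node = CoordinationNode()
samples = [(0, 10.0), (300, 11.0), (600, 12.5), (900, 11.7), (1200, 40.2)]

cold = run_stream(node, samples, hot_start_ts=-1)   # nothing published yet
run_batch(node, batch_end_ts=900, features={"baseline_bps": 11.3})
warm = run_stream(node, samples, hot_start_ts=-1)   # resumes after ts=900
print(cold[2])
print(warm)
```

Keeping the resume point in one shared node means the stream job never has to guess how far the batch layer has progressed.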
  14. Design challenges -- NetFlow
      Seed the model with good parameter estimates
      • Batch job populates the initial model parameters
      • Stream job hot-starts with the model and detects anomalies
      • Stream job updates the model and persists it to Cassandra
      Model maintained in Cassandra
      • Stream job reads the model into Spark partitions from Cassandra
      • Spark partition updates the model
      • Spark partition generates anomalies
      • Models across partitions are combined using Spark
      • Anomalies are persisted to Cassandra
      Network analysis
      • Find peak usage for a given customer
      • Find customer with highest network usage
      • Find number of distinct source IPs connected to a destination IP
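The slides do not specify the parametric model. As one plausible sketch of "update per partition, then combine", the code below keeps a running mean and variance (Welford's update) for each partition and merges partitions with the standard pairwise formula. The `RunningStats` class, the 3-sigma threshold, and the sample values are assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Mergeable count, mean and sum of squared deviations (m2)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x):
        # Welford's single-pass update.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other):
        # Pairwise combination of two partial models.
        if other.n == 0:
            return RunningStats(self.n, self.mean, self.m2)
        if self.n == 0:
            return RunningStats(other.n, other.mean, other.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    def std(self):
        return math.sqrt(self.m2 / self.n) if self.n > 0 else 0.0

    def is_anomaly(self, x, k=3.0):
        s = self.std()
        return s > 0 and abs(x - self.mean) > k * s


def fit(values):
    stats = RunningStats()
    for v in values:
        stats.update(v)
    return stats


partition_a = fit([10, 12, 11])   # hypothetical per-partition feature values
partition_b = fit([9, 13, 11])
combined = partition_a.merge(partition_b)
print(combined.n, combined.mean, combined.std())
print(combined.is_anomaly(100), combined.is_anomaly(11.5))
```

Because `merge` is associative, the partial models can be combined with a reduce over partitions, and the result matches a single pass over all the data up to floating-point error.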
  15. Network anomaly flow design
  16. Design challenges – multiple applications
  17. Trapezium
  18. What is Trapezium?
  19. What is Trapezium?
      • Ability to read data
        > From multiple data sources, e.g., HDFS, NFS, Kafka
        > In batch and streaming modes to support lambda architecture
      • Ability to write data
        > To multiple data sources, e.g., HDFS, NFS, Kafka
      • Plug and play architecture
        > Evaluate multiple algorithms
        > Evaluate different features of the same algorithm
      • Break down a complex analytics problem into Transactions
      • Build a workflow pipeline combining different Transactions
      • Validation and filtering of input data
      • Embedded ZooKeeper, Kafka, C*, HBase, etc. available for unit tests
      • Enable real-time query processing capability
        > Akka HTTP server provides Spark as a Service
  20. Trapezium architecture
      [Diagram: Trapezium reads data sources D1, D2, D3, applies validation (V1), and runs various transactions that produce outputs O1, O2, O3]
  21. Workflow
        hdfsFileBatch = {
          batchTime = 5
          batchInfo = [{
            name = "hdfs_source"
            dataDirectory = {prod = "/prod/data/files"}
          }]
        }
        transactions = [{
          transactionName="com.verizon.bda.DataAggregator"
          inputData=[{
            name="hdfs_source"
          }]
          persistDataName="aggregatedOutput"
        },{
          transactionName="com.verizon.bda.DataAligner"
          inputData=[{
            name="aggregatedOutput"
          }]
          persistDataName="alignedOutput"
        },{
          transactionName="com.verizon.bda.AnomalyFinder"
          inputData=[{
            name="aggregatedOutput"
          }, {
            name="alignedOutput"
          }]
          persistDataName="anomalyOutput"
        }]
      • Workflow is a collection of transactions in batch or streaming mode
      • Each transaction can take multiple data sources as input
      • Output of one transaction can be input to another transaction
      • Output of each transaction could be persisted or kept only in memory
      • Single place to handle exceptions and raise failure events
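Because the output of one transaction can feed another, the workflow config implies a dependency graph. The sketch below derives a valid execution order from that graph; it does not describe how Trapezium itself schedules transactions (it may simply run them in listed order). The dictionary layout is a simplified, hypothetical rendering of the config in which `inputData` is flattened to a list of names.

```python
from graphlib import TopologicalSorter

# Simplified rendering of the workflow config, deliberately listed out of order.
TRANSACTIONS = [
    {"transactionName": "com.verizon.bda.AnomalyFinder",
     "inputData": ["aggregatedOutput", "alignedOutput"],
     "persistDataName": "anomalyOutput"},
    {"transactionName": "com.verizon.bda.DataAligner",
     "inputData": ["aggregatedOutput"],
     "persistDataName": "alignedOutput"},
    {"transactionName": "com.verizon.bda.DataAggregator",
     "inputData": ["hdfs_source"],
     "persistDataName": "aggregatedOutput"},
]


def execution_order(transactions):
    """Order transactions so each runs after the ones producing its inputs."""
    producer = {t["persistDataName"]: t["transactionName"] for t in transactions}
    sorter = TopologicalSorter()
    for t in transactions:
        # Inputs with no producer (e.g. hdfs_source) are external data sources.
        deps = [producer[name] for name in t["inputData"] if name in producer]
        sorter.add(t["transactionName"], *deps)
    return list(sorter.static_order())


for name in execution_order(TRANSACTIONS):
    print(name)
```

`TopologicalSorter` raises `CycleError` if two transactions depend on each other, which is a cheap sanity check for a config-driven pipeline.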
  22. Transaction Traits
  23. Transaction Traits
  24. Supported data sources
      • Trapezium can read data from HDFS, Kafka, NFS, GFS
      • Config entry for reading data from HDFS/NFS/GFS
          dataSource="HDFS"
          dataDirectory = {
            local="/local/data/files"
            dev= "/dev/data/files"
            prod= "/prod/data/files"
          }
      • Config entry for defining protocol (one of)
          fileSystemPrefix="hdfs://"
          fileSystemPrefix="file://"
          fileSystemPrefix="s3://"
      • Trapezium can read data in various formats including text, gzip, json, avro and parquet
      • Config entry for reading from Kafka topics
          kafkaTopicInfo = {
            consumerGroup = "KafkaStreamGroup"
            maxRatePerPartition = 970
            batchTime = "5"
            streamsInfo = [{
              name = "queries"
              topicName = "deviceanalyzer"
            }]
          }
      • Config entry for reading fileFormat (one of)
          fileFormat="avro"
          fileFormat="json"
          fileFormat="parquet"
  25. Run modes
      • Trapezium supports reading data in batch as well as streaming mode
      • Config entry for reading in batch mode
          runMode="BATCH"
          batchTime=5
      • Config entry for reading in stream mode
          runMode="STREAM"
          batchTime=5
      • Read data by timestamp
          offset=2
      • Process historical data in sequence of smaller data sets
          fileSplit=true
      • Process same data multiple times
          oneTime=true
  26. Data validation
      • Validates data at the source
      • Filters out all invalid rows
      • Validates schema of the input data
      • Config entry for data validation
          validation = {
            columns = ["name", "age", "birthday", "location"]
            datatypes = ["String", "Int", "Timestamp", "String"]
            dateFormat = "yyyy-MM-dd HH:mm:ss"
            delimiter = "|"
            minimumColumn = 4
            rules = {
              name=[maxLength(30),minLength(1)]
              age=[maxValue(100),minValue(1)]
            }
          }
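To make the effect of the validation config concrete, here is a small sketch that applies the same columns, datatypes, delimiter, minimum column count, and rules to raw text rows. It is not Trapezium's validator: the function names, the Python `strptime` pattern standing in for `yyyy-MM-dd HH:mm:ss`, and the sample rows are assumptions.

```python
from datetime import datetime

COLUMNS = ["name", "age", "birthday", "location"]
DATATYPES = ["String", "Int", "Timestamp", "String"]
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"   # Python equivalent of yyyy-MM-dd HH:mm:ss
DELIMITER = "|"
MINIMUM_COLUMN = 4
RULES = {
    "name": {"maxLength": 30, "minLength": 1},
    "age": {"maxValue": 100, "minValue": 1},
}


def parse_row(line):
    """Return a typed dict for a valid row, or None if the row is invalid."""
    fields = line.split(DELIMITER)
    if len(fields) < MINIMUM_COLUMN:
        return None
    row = {}
    try:
        for column, datatype, raw in zip(COLUMNS, DATATYPES, fields):
            if datatype == "Int":
                row[column] = int(raw)
            elif datatype == "Timestamp":
                row[column] = datetime.strptime(raw, DATE_FORMAT)
            else:
                row[column] = raw
    except ValueError:
        return None   # schema violation: value does not match its datatype
    name_rule, age_rule = RULES["name"], RULES["age"]
    if not name_rule["minLength"] <= len(row["name"]) <= name_rule["maxLength"]:
        return None
    if not age_rule["minValue"] <= row["age"] <= age_rule["maxValue"]:
        return None
    return row


def validate(lines):
    """Filter out all invalid rows, keeping the parsed valid ones."""
    return [row for row in map(parse_row, lines) if row is not None]


LINES = [
    "Alice|30|1986-04-12 08:30:00|Dallas",    # valid
    "Bob|thirty|1986-04-12 08:30:00|Austin",  # age is not an Int
    "|25|1991-01-01 00:00:00|Tampa",          # name shorter than minLength
    "Carol|150|1866-01-01 00:00:00|Boston",   # age above maxValue
    "Dave|40|1976-01-01",                     # fewer than minimumColumn fields
]
print([row["name"] for row in validate(LINES)])
```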
  27. Plug and play capability
      • Any transaction can be added/removed by modifying the workflow config file
      • Output from multiple algorithms can be compared in real time
      • Multiple features can be evaluated in different transactions
      • Data sources can be switched with a config change
      • Model training can be done on different time windows to achieve best results
  28. Trapezium – GitHub URL
      https://github.com/Verizon/trapezium
      Version: 1.0.0-SNAPSHOT
      Release: 14-Oct-2016
  29. Results
  30. SNMP: Spark runtime with Hive/C* read/write
      Data volume: 10 routers, 2.2 MB per 5 min, 650 MB per day
      Compute: 10 executors, 4 cores
      Memory: 16 GB per executor, 4 GB driver
      With sampling rate of 2 min:
      • 2 nodes with 20 cores each for 10 routers
      • 200 nodes for 1000 routers
      With sampling rate of 4 min:
      • 2 nodes can process 20 routers
      • 100 nodes for 1000 routers
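The node counts on this slide follow from a linear extrapolation of the measured baseline (2 nodes for 10 routers at a 2-minute sampling interval). The sketch below reproduces that arithmetic; the function and parameter names are hypothetical, and linear scaling in both router count and sampling interval is an assumption taken from the slide's own figures rather than a measured result.

```python
import math


def nodes_needed(routers, sampling_min,
                 base_routers=10, base_nodes=2, base_sampling_min=2):
    """Estimate nodes by scaling the baseline linearly.

    Doubling the sampling interval halves the work per unit time, so each
    node can serve proportionally more routers.
    """
    routers_per_node = (base_routers / base_nodes) * (sampling_min / base_sampling_min)
    return math.ceil(routers / routers_per_node)


for routers, interval in [(10, 2), (1000, 2), (20, 4), (1000, 4)]:
    print(routers, "routers @", interval, "min ->", nodes_needed(routers, interval), "nodes")
```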
  31. SNMP: Spark shuffle – read/write
      Data volume: 10 routers, 2.2 MB per 5 min, 650 MB per day
      Compute: 10 executors, 4 cores
      Memory: 16 GB per executor, 4 GB driver
  32. NetFlow: Spark + C* read/write runtime
      Data volume: 2 routers, 50 MB per min, 70 GB per day
      Compute: 10 executors, 4 cores
      Memory: 16 GB per executor, 4 GB driver
      • Due to the parametric model, run time is better than SNMP
      • NetFlow data is X times more than SNMP data
      Routers        2      4      8      16     32
      Runtime (s)    16     18     32     47     94.8
  33. NetFlow: Spark + C* shuffle write
      Shuffle (MB) by number of routers
      Routers        2      4      8      16     32
      Spark          71.2   150.5  275.7  612.1  1261.4
      Cassandra      30.2   64.4   115.6  263.7  545.1
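Two patterns can be read off the shuffle figures on this slide: how the volume grows each time the router count doubles, and how the Cassandra figure relates to the Spark figure. The sketch below only does arithmetic on the numbers reported above; the variable names are hypothetical and no new measurements are implied.

```python
ROUTERS = [2, 4, 8, 16, 32]
SPARK_MB = [71.2, 150.5, 275.7, 612.1, 1261.4]
CASSANDRA_MB = [30.2, 64.4, 115.6, 263.7, 545.1]

# Growth factor of Spark shuffle write for each doubling of router count.
spark_growth = [later / earlier for earlier, later in zip(SPARK_MB, SPARK_MB[1:])]

# Cassandra shuffle as a fraction of Spark shuffle at each router count.
cassandra_ratio = [c / s for c, s in zip(CASSANDRA_MB, SPARK_MB)]

for routers, ratio in zip(ROUTERS, cassandra_ratio):
    print(f"{routers:>2} routers: Cassandra/Spark = {ratio:.2f}")
print("Spark growth per doubling:", [round(g, 2) for g in spark_growth])
```

On these figures the shuffle volume roughly doubles with the router count, and the Cassandra value stays at a little over 40% of the Spark value throughout.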
  34. Summary
      • Reuse code across multiple applications
      • Improve developer efficiency
      • Encourage standard coding practices
      • Provide unit-test framework for better code coverage
      • Decouple ETL, analytics and algorithms in different Transactions
      • Distribute query processing using Spark as a service
      • Easy integration provided by configuration-driven architecture
  35. Thank you
