Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi


Published on

Cybersecurity requires an organization to collect data, analyze it, and alert on cyber anomalies in near real-time. This is a challenging endeavor when considering the variety of data sources which need to be collected and analyzed. Everything from application logs, network events, authentications systems, IOT devices, business events, cloud service logs, and more need to be taken into consideration. In addition, multiple data formats need to be transformed and conformed to be understood by both humans and ML/AI algorithms.

To solve this problem, the Aetna Global Security team developed the Unified Data Platform based on Apache NiFi, which allows them to remain agile and adapt to new security threats and the onboarding of new technologies in the Aetna environment. The platform currently has over 60 different data flows with 95% doing real-time ETL and handles over 20 billion events per day. In this session learn from Aetna’s experience building an edge to AI high-speed data pipeline with Apache NiFi.

Published in: Technology
  • Be the first to comment

Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi

  1. 1. Building the High Speed Cyber Security Data Pipeline Using Apache NiFi Praveen Kanumarlapudi
  2. 2. Cyber Security
  3. 3. 60% of Small Businesses Fold Within 6 Months of a Cyber Attack.
  4. 4. How to make it success ?
  5. 5. Global Security Key Stake Holders Security Operations Center Data Scientists Data Analysts Executives An information security operations center ("ISOC" or "SOC") is a facility where enterprise information systems (websites, applications, databases, data centers and servers, networks, desktops and other endpoints) are monitored, assessed, and defended. Technology : SIEM Security data scientists have the skills to understand complex algorithms and build advanced models for threat and anomaly detection and applying these concepts to real security data sets in single or clustered environments. Technology : Python, R, Big Data, Spark/Scala or MATLAB… Map and trace the data from system to system for solving a given business or incident problem. Design and create data reports using various reporting tools that help business executive to make better decisions. Implements new metrics for business (KPIs) Technology : SQL, SIEM, Big Data, Reporting tools CSO’s, CISO’s
  6. 6. Cyber Security ‘BIG data’ challenges • Speed , Volume and Variety  Data Ingestion  Cleansing  Transformation • data reliance  Executives – KPI Metrics  Data scientists  SOC  Data Analysts • Real-Time context
  7. 7. A couple of years Ago ! Network logs Web logs AD Logs Infrastructure logs Application Logs Threat Intel 3rd Party RG RDBMS unstructured(semi)structured Syslog servers SIEM APP Sqoop PySpark SIEM Tool Data Source Ingestion Integration Delivery Flume UBA Tools SOCDataScienceKPI/Reporting
  8. 8. Challenges • Complexity of Architecture • Debugging • Data Source Dependencies • Lack of Centralized logging • Multiple Data Copies • Stress on Network • Transformations with respect to destination
  9. 9. Solution Framework  Single Data entry point – avoids network traffic and duplicate data flowing around  Transformations according destination – reduces the reliance on source  Should be capable of handling different formats and different sources Ingest Clean/Route Transform for 1 Transform for 2 Route to 1 Route to 2 Archive
  10. 10. Deployment Models Data Sources
  11. 11. Challenges  Good architectural understanding of all systems  Good amount of coding effort  Long development hours  Maintenance overheads  Maintain the sync between the systems  Provenance
  12. 12. • Guaranteed delivery • Processors that supports multiple formats • Ease to develop the flows and deploy in minutes • Open Source and rich community
  13. 13. The Data Gateway Network logs Web logs AD Logs Infrastructure logs Application Logs Threat Intel 3rd Party RG RDBMS unstructured(semi)structured Data Source Data Gateway Delivery SOCDataScienceKPI/Reporting SOC
  14. 14. Sample Flow
  15. 15. To Azure
  16. 16. To Splunk
  17. 17. Grafana dashboard – Last 7 days
  18. 18. Yesterday
  19. 19. In the middle of the day
  20. 20. Metrics  100+ production flows  ~ 20 Billion events  1000+ Transformations
  21. 21. Next ?  MiNiFi  Stateless NiFi  Registry  SAM  Real-Time Model training  CI/CD, NiFi API’s