SlideShare a Scribd company logo
Building the High Speed Cyber Security
Data Pipeline Using Apache NiFi
Praveen Kanumarlapudi
Cyber Security
60% of Small
Businesses Fold Within
6 Months of a Cyber
Attack.
How to make it
success ?
Global Security Key Stake Holders
Security Operations Center Data Scientists Data Analysts Executives
An information security
operations center
("ISOC" or "SOC") is a
facility where
enterprise information
systems (websites,
applications, databases,
data centers and
servers, networks,
desktops and other
endpoints) are
monitored, assessed,
and defended.
Technology : SIEM
Security data scientists
have the skills to
understand complex
algorithms and build
advanced models for
threat and anomaly
detection and applying
these concepts to real
security data sets in
single or clustered
environments.
Technology : Python, R,
Big Data, Spark/Scala or
MATLAB…
Map and trace the data
from system to system
for solving a given
business or incident
problem.
Design and create data
reports using various
reporting tools that
help business executive
to make better
decisions.
Implements new
metrics for business
(KPIs)
Technology : SQL, SIEM,
Big Data, Reporting
tools
CSO’s,
CISO’s
Cyber Security ‘BIG data’ challenges
• Speed , Volume and Variety
 Data Ingestion
 Cleansing
 Transformation
• data reliance
 Executives – KPI Metrics
 Data scientists
 SOC
 Data Analysts
• Real-Time context
A couple of years Ago !
Network logs
Web logs
AD Logs
Infrastructure
logs
Application
Logs
Threat Intel
3rd Party RG
RDBMS
unstructured(semi)structured
Syslog
servers
SIEM APP
Sqoop
PySpark
SIEM Tool
Data Source Ingestion Integration Delivery
Flume
UBA Tools
SOCDataScienceKPI/Reporting
Challenges
• Complexity of Architecture
• Debugging
• Data Source Dependencies
• Lack of Centralized logging
• Multiple Data Copies
• Stress on Network
• Transformations with respect to destination
Solution Framework
 Single Data entry point – avoids network traffic and
duplicate data flowing around
 Transformations according destination – reduces the
reliance on source
 Should be capable of handling different formats and
different sources
Ingest Clean/Route
Transform for
1
Transform for
2
Route to 1
Route to 2
Archive
Deployment Models
Data Sources
Challenges
 Good architectural understanding of all
systems
 Good amount of coding effort
 Long development hours
 Maintenance overheads
 Maintain the sync between the systems
 Provenance
• Guaranteed delivery
• Processors that supports multiple
formats
• Ease to develop the flows and
deploy in minutes
• Open Source and rich community
The Data Gateway
Network logs
Web logs
AD Logs
Infrastructure
logs
Application
Logs
Threat Intel
3rd Party RG
RDBMS
unstructured(semi)structured
Data Source Data Gateway Delivery
SOCDataScienceKPI/Reporting
SOC
Sample Flow
To Azure
To Splunk
Grafana dashboard – Last 7 days
Yesterday
In the middle of the day
Metrics
 100+ production flows
 ~ 20 Billion events
 1000+ Transformations
Next ?
 MiNiFi
 Stateless NiFi
 Registry
 SAM
 Real-Time Model training
 CI/CD, NiFi API’s
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi

More Related Content

What's hot

Microsoft Defender and Azure Sentinel
Microsoft Defender and Azure SentinelMicrosoft Defender and Azure Sentinel
Microsoft Defender and Azure Sentinel
David J Rosenthal
 
Zero trust Architecture
Zero trust Architecture Zero trust Architecture
Zero trust Architecture
AddWeb Solution Pvt. Ltd.
 
Splunk Security Session - .conf Go Köln
Splunk Security Session - .conf Go KölnSplunk Security Session - .conf Go Köln
Splunk Security Session - .conf Go Köln
Splunk
 
Cloud-Enabled: The Future of Endpoint Security
Cloud-Enabled: The Future of Endpoint SecurityCloud-Enabled: The Future of Endpoint Security
Cloud-Enabled: The Future of Endpoint Security
CrowdStrike
 
Microsoft 365 Security and Compliance
Microsoft 365 Security and ComplianceMicrosoft 365 Security and Compliance
Microsoft 365 Security and Compliance
David J Rosenthal
 
Make Your SOC Work Smarter, Not Harder
Make Your SOC Work Smarter, Not HarderMake Your SOC Work Smarter, Not Harder
Make Your SOC Work Smarter, Not Harder
Splunk
 
Proactive Threat Hunting: Game-Changing Endpoint Protection Beyond Alerting
Proactive Threat Hunting: Game-Changing Endpoint Protection Beyond AlertingProactive Threat Hunting: Game-Changing Endpoint Protection Beyond Alerting
Proactive Threat Hunting: Game-Changing Endpoint Protection Beyond Alerting
CrowdStrike
 
introduction to Azure Sentinel
introduction to Azure Sentinelintroduction to Azure Sentinel
introduction to Azure Sentinel
Robert Crane
 
SOC and SIEM.pptx
SOC and SIEM.pptxSOC and SIEM.pptx
SOC and SIEM.pptx
SandeshUprety4
 
Tenable Solutions for Enterprise Cloud Security
Tenable Solutions for Enterprise Cloud SecurityTenable Solutions for Enterprise Cloud Security
Tenable Solutions for Enterprise Cloud Security
MarketingArrowECS_CZ
 
SOC Architecture - Building the NextGen SOC
SOC Architecture - Building the NextGen SOCSOC Architecture - Building the NextGen SOC
SOC Architecture - Building the NextGen SOC
Priyanka Aash
 
Cloud Architecture - Multi Cloud, Edge, On-Premise
Cloud Architecture - Multi Cloud, Edge, On-PremiseCloud Architecture - Multi Cloud, Edge, On-Premise
Cloud Architecture - Multi Cloud, Edge, On-Premise
Araf Karsh Hamid
 
Dragos S4x20: How to Build an OT Security Operations Center
Dragos S4x20: How to Build an OT Security Operations CenterDragos S4x20: How to Build an OT Security Operations Center
Dragos S4x20: How to Build an OT Security Operations Center
Dragos, Inc.
 
Next-Gen security operation center
Next-Gen security operation centerNext-Gen security operation center
Next-Gen security operation center
Muhammad Sahputra
 
Zero Trust
Zero TrustZero Trust
Zero Trust
Boaz Shunami
 
Modern Enterprise integration Strategies
Modern Enterprise integration StrategiesModern Enterprise integration Strategies
Modern Enterprise integration Strategies
Jesus Rodriguez
 
CLOUD NATIVE SECURITY
CLOUD NATIVE SECURITYCLOUD NATIVE SECURITY
CLOUD NATIVE SECURITY
Maganathin Veeraragaloo
 
Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)
Sqrrl
 
[Round table] zeroing in on zero trust architecture
[Round table] zeroing in on zero trust architecture[Round table] zeroing in on zero trust architecture
[Round table] zeroing in on zero trust architecture
Denise Bailey
 
Threat Hunting
Threat HuntingThreat Hunting
Threat Hunting
Splunk
 

What's hot (20)

Microsoft Defender and Azure Sentinel
Microsoft Defender and Azure SentinelMicrosoft Defender and Azure Sentinel
Microsoft Defender and Azure Sentinel
 
Zero trust Architecture
Zero trust Architecture Zero trust Architecture
Zero trust Architecture
 
Splunk Security Session - .conf Go Köln
Splunk Security Session - .conf Go KölnSplunk Security Session - .conf Go Köln
Splunk Security Session - .conf Go Köln
 
Cloud-Enabled: The Future of Endpoint Security
Cloud-Enabled: The Future of Endpoint SecurityCloud-Enabled: The Future of Endpoint Security
Cloud-Enabled: The Future of Endpoint Security
 
Microsoft 365 Security and Compliance
Microsoft 365 Security and ComplianceMicrosoft 365 Security and Compliance
Microsoft 365 Security and Compliance
 
Make Your SOC Work Smarter, Not Harder
Make Your SOC Work Smarter, Not HarderMake Your SOC Work Smarter, Not Harder
Make Your SOC Work Smarter, Not Harder
 
Proactive Threat Hunting: Game-Changing Endpoint Protection Beyond Alerting
Proactive Threat Hunting: Game-Changing Endpoint Protection Beyond AlertingProactive Threat Hunting: Game-Changing Endpoint Protection Beyond Alerting
Proactive Threat Hunting: Game-Changing Endpoint Protection Beyond Alerting
 
introduction to Azure Sentinel
introduction to Azure Sentinelintroduction to Azure Sentinel
introduction to Azure Sentinel
 
SOC and SIEM.pptx
SOC and SIEM.pptxSOC and SIEM.pptx
SOC and SIEM.pptx
 
Tenable Solutions for Enterprise Cloud Security
Tenable Solutions for Enterprise Cloud SecurityTenable Solutions for Enterprise Cloud Security
Tenable Solutions for Enterprise Cloud Security
 
SOC Architecture - Building the NextGen SOC
SOC Architecture - Building the NextGen SOCSOC Architecture - Building the NextGen SOC
SOC Architecture - Building the NextGen SOC
 
Cloud Architecture - Multi Cloud, Edge, On-Premise
Cloud Architecture - Multi Cloud, Edge, On-PremiseCloud Architecture - Multi Cloud, Edge, On-Premise
Cloud Architecture - Multi Cloud, Edge, On-Premise
 
Dragos S4x20: How to Build an OT Security Operations Center
Dragos S4x20: How to Build an OT Security Operations CenterDragos S4x20: How to Build an OT Security Operations Center
Dragos S4x20: How to Build an OT Security Operations Center
 
Next-Gen security operation center
Next-Gen security operation centerNext-Gen security operation center
Next-Gen security operation center
 
Zero Trust
Zero TrustZero Trust
Zero Trust
 
Modern Enterprise integration Strategies
Modern Enterprise integration StrategiesModern Enterprise integration Strategies
Modern Enterprise integration Strategies
 
CLOUD NATIVE SECURITY
CLOUD NATIVE SECURITYCLOUD NATIVE SECURITY
CLOUD NATIVE SECURITY
 
Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)
 
[Round table] zeroing in on zero trust architecture
[Round table] zeroing in on zero trust architecture[Round table] zeroing in on zero trust architecture
[Round table] zeroing in on zero trust architecture
 
Threat Hunting
Threat HuntingThreat Hunting
Threat Hunting
 

Similar to Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi

(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014
Amazon Web Services
 
On the Application of AI for Failure Management: Problems, Solutions and Algo...
On the Application of AI for Failure Management: Problems, Solutions and Algo...On the Application of AI for Failure Management: Problems, Solutions and Algo...
On the Application of AI for Failure Management: Problems, Solutions and Algo...
Jorge Cardoso
 
Python + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production FasterPython + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production Faster
Paige_Roberts
 
Distributed Data Processing for Real-time Applications
Distributed Data Processing for Real-time ApplicationsDistributed Data Processing for Real-time Applications
Distributed Data Processing for Real-time Applications
ScyllaDB
 
inmation Presentation_2017
inmation Presentation_2017inmation Presentation_2017
inmation Presentation_2017
inmation Software GmbH
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal Modernization
Denodo
 
Real-time Visibility at Scale with Sumo Logic
Real-time Visibility at Scale with Sumo LogicReal-time Visibility at Scale with Sumo Logic
Real-time Visibility at Scale with Sumo Logic
Amazon Web Services
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
Sri Ambati
 
Preparing for the Cybersecurity Renaissance
Preparing for the Cybersecurity RenaissancePreparing for the Cybersecurity Renaissance
Preparing for the Cybersecurity Renaissance
Cloudera, Inc.
 
Emerging IT Trends and Innovation Concepts.pptx
Emerging IT Trends and Innovation Concepts.pptxEmerging IT Trends and Innovation Concepts.pptx
Emerging IT Trends and Innovation Concepts.pptx
Roshni814224
 
Operating a secure big data platform in a multi-cloud environment
Operating a secure big data platform in a multi-cloud environmentOperating a secure big data platform in a multi-cloud environment
Operating a secure big data platform in a multi-cloud environment
DataWorks Summit
 
Security Delivery Platform: Best practices
Security Delivery Platform: Best practicesSecurity Delivery Platform: Best practices
Security Delivery Platform: Best practices
Mihajlo Prerad
 
inmation Presentation
inmation Presentationinmation Presentation
inmation Presentation
inmation Software GmbH
 
Get Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionGet Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber Solution
Cloudera, Inc.
 
Splunk in the Cisco Unified Computing System (UCS)
Splunk in the Cisco Unified Computing System (UCS) Splunk in the Cisco Unified Computing System (UCS)
Splunk in the Cisco Unified Computing System (UCS)
Splunk
 
Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)
Denodo
 
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive AdvantageFueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Precisely
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
DATAVERSITY
 
Old Dogs, New Tricks: Big Data from and for Mainframe IT
Old Dogs, New Tricks: Big Data from and for Mainframe ITOld Dogs, New Tricks: Big Data from and for Mainframe IT
Old Dogs, New Tricks: Big Data from and for Mainframe IT
Precisely
 
Webinar: Improve Splunk Analytics and Automate Processes with SnapLogic
Webinar: Improve Splunk Analytics and Automate Processes with SnapLogicWebinar: Improve Splunk Analytics and Automate Processes with SnapLogic
Webinar: Improve Splunk Analytics and Automate Processes with SnapLogic
SnapLogic
 

Similar to Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi (20)

(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014
 
On the Application of AI for Failure Management: Problems, Solutions and Algo...
On the Application of AI for Failure Management: Problems, Solutions and Algo...On the Application of AI for Failure Management: Problems, Solutions and Algo...
On the Application of AI for Failure Management: Problems, Solutions and Algo...
 
Python + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production FasterPython + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production Faster
 
Distributed Data Processing for Real-time Applications
Distributed Data Processing for Real-time ApplicationsDistributed Data Processing for Real-time Applications
Distributed Data Processing for Real-time Applications
 
inmation Presentation_2017
inmation Presentation_2017inmation Presentation_2017
inmation Presentation_2017
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal Modernization
 
Real-time Visibility at Scale with Sumo Logic
Real-time Visibility at Scale with Sumo LogicReal-time Visibility at Scale with Sumo Logic
Real-time Visibility at Scale with Sumo Logic
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
 
Preparing for the Cybersecurity Renaissance
Preparing for the Cybersecurity RenaissancePreparing for the Cybersecurity Renaissance
Preparing for the Cybersecurity Renaissance
 
Emerging IT Trends and Innovation Concepts.pptx
Emerging IT Trends and Innovation Concepts.pptxEmerging IT Trends and Innovation Concepts.pptx
Emerging IT Trends and Innovation Concepts.pptx
 
Operating a secure big data platform in a multi-cloud environment
Operating a secure big data platform in a multi-cloud environmentOperating a secure big data platform in a multi-cloud environment
Operating a secure big data platform in a multi-cloud environment
 
Security Delivery Platform: Best practices
Security Delivery Platform: Best practicesSecurity Delivery Platform: Best practices
Security Delivery Platform: Best practices
 
inmation Presentation
inmation Presentationinmation Presentation
inmation Presentation
 
Get Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionGet Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber Solution
 
Splunk in the Cisco Unified Computing System (UCS)
Splunk in the Cisco Unified Computing System (UCS) Splunk in the Cisco Unified Computing System (UCS)
Splunk in the Cisco Unified Computing System (UCS)
 
Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)
 
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive AdvantageFueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
 
Old Dogs, New Tricks: Big Data from and for Mainframe IT
Old Dogs, New Tricks: Big Data from and for Mainframe ITOld Dogs, New Tricks: Big Data from and for Mainframe IT
Old Dogs, New Tricks: Big Data from and for Mainframe IT
 
Webinar: Improve Splunk Analytics and Automate Processes with SnapLogic
Webinar: Improve Splunk Analytics and Automate Processes with SnapLogicWebinar: Improve Splunk Analytics and Automate Processes with SnapLogic
Webinar: Improve Splunk Analytics and Automate Processes with SnapLogic
 

More from DataWorks Summit

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
 

Recently uploaded

"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 

Recently uploaded (20)

"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 

Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi

  • 1. Building the High Speed Cyber Security Data Pipeline Using Apache NiFi Praveen Kanumarlapudi
  • 3. 60% of Small Businesses Fold Within 6 Months of a Cyber Attack.
  • 4. How to make it success ?
  • 5. Global Security Key Stake Holders Security Operations Center Data Scientists Data Analysts Executives An information security operations center ("ISOC" or "SOC") is a facility where enterprise information systems (websites, applications, databases, data centers and servers, networks, desktops and other endpoints) are monitored, assessed, and defended. Technology : SIEM Security data scientists have the skills to understand complex algorithms and build advanced models for threat and anomaly detection and applying these concepts to real security data sets in single or clustered environments. Technology : Python, R, Big Data, Spark/Scala or MATLAB… Map and trace the data from system to system for solving a given business or incident problem. Design and create data reports using various reporting tools that help business executive to make better decisions. Implements new metrics for business (KPIs) Technology : SQL, SIEM, Big Data, Reporting tools CSO’s, CISO’s
  • 6. Cyber Security ‘BIG data’ challenges • Speed , Volume and Variety  Data Ingestion  Cleansing  Transformation • data reliance  Executives – KPI Metrics  Data scientists  SOC  Data Analysts • Real-Time context
  • 7. A couple of years Ago ! Network logs Web logs AD Logs Infrastructure logs Application Logs Threat Intel 3rd Party RG RDBMS unstructured(semi)structured Syslog servers SIEM APP Sqoop PySpark SIEM Tool Data Source Ingestion Integration Delivery Flume UBA Tools SOCDataScienceKPI/Reporting
  • 8. Challenges • Complexity of Architecture • Debugging • Data Source Dependencies • Lack of Centralized logging • Multiple Data Copies • Stress on Network • Transformations with respect to destination
  • 9. Solution Framework  Single Data entry point – avoids network traffic and duplicate data flowing around  Transformations according destination – reduces the reliance on source  Should be capable of handling different formats and different sources Ingest Clean/Route Transform for 1 Transform for 2 Route to 1 Route to 2 Archive
  • 11. Challenges  Good architectural understanding of all systems  Good amount of coding effort  Long development hours  Maintenance overheads  Maintain the sync between the systems  Provenance
  • 12. • Guaranteed delivery • Processors that supports multiple formats • Ease to develop the flows and deploy in minutes • Open Source and rich community
  • 13. The Data Gateway Network logs Web logs AD Logs Infrastructure logs Application Logs Threat Intel 3rd Party RG RDBMS unstructured(semi)structured Data Source Data Gateway Delivery SOCDataScienceKPI/Reporting SOC
  • 17. Grafana dashboard – Last 7 days
  • 19. In the middle of the day
  • 20. Metrics  100+ production flows  ~ 20 Billion events  1000+ Transformations
  • 21. Next ?  MiNiFi  Stateless NiFi  Registry  SAM  Real-Time Model training  CI/CD, NiFi API’s

Editor's Notes

  1. 60 Percent of Small Businesses went out of business in just 6 months after cyber attacks