SlideShare a Scribd company logo
1 of 25
Hadoop Summit 2015
CloudFire Analytics: Transforming Security
Analyzing Symantec’s security data lake
Stephen Brodsky and Darrell Kienzle
Hadoop Summit 2015
Outline
CPE CloudFire: Analytics and Products1
Analytics Services and Data2
Analytics Administration and Monitoring3
Self-Service Analytics and Dynamic Clusters4
Symantec CloudFire Analytics 2
Hadoop Summit 2015
CPE CloudFire: Analytics and Products
• CPE – Cloud Platform Engineering
– Symantec’s Private Cloud for security products and analytics
– Spans 50+ data centers around the world
• CloudFire – CPE’s scalable cloud platform
– New data centers, new hardware, scalable build-out
– All open source
– OpenStack for virtualization
– Analytics for big data analysis
• Cloud for bringing together and integrating Symantec’s:
– Products
– Big Data, Analytic applications, and Services
– Compute, Network
Symantec CloudFire Analytics 3
Hadoop Summit 2015
Analytics for supporting Security Products
Goals of the Security Teams
1. Do we have the data?
2. Can we analyze the
data?
3. Can we provide timely
Insights?
CloudFire Analytics for supporting Security
1. Data available
• All frequently used Security Data available
• Data at scale: Data available for parallel analysis
(PB scale)
• Leveraging CPE Data Center Availability,
Compute, Net
2. Analysis engines
• Hadoop ecosystem engines (MR, Hive, Kafka,
Spark, HBase, Storm, Phoenix, ++)
• Analytics Pipeline
3. Analysis in near real-time
• Analytics timescale days/hours -> seconds
• Batch -> Streaming
4Symantec CloudFire Analytics
CPE Analytics Architecture Overview
Inbound Messaging
(Data import, Kafka)
Products
Distributed Storage (HDFS), Metal or Virtual (OpenStack) Servers
Analytic Applications, Workload Management (YARN)
Stream Processing
(Storm, Spark)
Real-time Results
(HBase, ElasticSearch)
Query
(Hive, Spark SQL)
Device
Agents
Telemetry, Data
Threats: Top Web-based
Attacks
192 web-based attacks recorded last month
36%
30%
14%
10%
7%3% Malvertisement
Exploit Kit
Suspicious
Download
Analytics Clusters
• All major open
source engines
• PB-scale Data Store
• Key Telemetry
• Multi-Data Center
• Administration
• Monitoring
• Prod, Dev/Test
• Application
Deployments
CloudFire Analytics
Data Transfer
5
Hadoop Summit 2015
A key security Product
Symantec CloudFire Analytics 6
• Live with hundreds of external customers
• Improved time to analyzed results from hours to seconds (5000x) at production
scale
• Leverage Streaming Analytics
• Move analytics to more powerful, high performance open source technologies
– Kafka for queuing
– Storm for analytics
• Improve analytics
– Server reputation
– URL reputation
• Moving forward
– Leverage Analytics Pipeline
– NoSQL DBs
– Graph DBs
HDFSMonitors
Hadoop Summit 2015
Analytics Services and Data
Symantec CloudFire Analytics 7
Hadoop Summit 2015
Product Journey with CloudFire Analytics
Symantec CloudFire Analytics 8
SDAP current
• Scale?
• Availability?
• Usage?
• Hours/Days
timeframe
Batch Analytics
•Data Scale: Multi-PB
HDFS
•Analytics scale: Hadoop
MR, Hive
•Availability – high
•Usage prioritization
•Minutes/Hours
timeframe
Streaming Analytics
•Analytics Scale: Kafka,
Storm, Spark,
ElasticSearch, HBase, ++
•Availability – high
•Usage prioritization
•Seconds timeframe
Application style
• Bring in data
• Analyze
• Architecture: Serial->Parallel
• Workflow oriented
Application style
• Bring in data in zip
files
• Unzip, load, SQL
• Architecture: Serial
• Application oriented
Application style
• Stream in data directly
• Analytics pipeline
• Architecture: Streaming
Parallel
• Streaming oriented
CloudFire Analytics Pipeline
Data Sources
Streaming
Analysis
Batch
Analysis
Actively
Available
Results
Online Access
Research
Telemetry Clients
World
Cloud
Data
Assemble Analytic Applications
Analytic Pipeline Message Queue
9
Storm, Kafka, Spark Hive, Spark
HBase, HDFS, Kafka, Cassandra
HBase, Index, Cache
Hadoop Summit 2015
CloudFire Analytics Services
Symantec CloudFire Analytics 10
Analytic Services and Data Transfer
• Hadoop, YARN - Compute
• HDFS/Knox/HttpFS – File Storage
• Hive, Pig – SQL Batch
• Oozie – Workflow and Scheduling
• Service Endpoints – Admin / BDSE
• Storm – Streaming
• Kafka - Messaging
• HBase – BigTable column store
• Spark (SQL,ML) – Stream analytics
• Research and admin CLI VMs
• MTS Parallel file Import / Export
• Falcon data management
• ElasticSearch indexing
• Cassandra, Graph, Drill, Solr
Web UIs
Admin Tools
• Ambari - Admin
• OpsView (Zabbix) – Uptime monitor
• YARN and HDFS UIs – Perf & Logs
• Ganglia & Nagios – Performance
• Storm & Kafka lag – Perf/Problems
• LMM – Problem determination & metrics
• Resource Manager, Name Node UIs
• Network data transfer monitoring
• Puma performance usage analysis
• Continuous Validation
• Ranger user management
Data Science Tools
• HUE – SQL, Jobs, Files, Workflow
• Ambari Views
Hadoop Summit 2015
Analytics Administration and Monitoring
Symantec CloudFire Analytics 11
Hadoop Summit 2015
Ambari Analytics Administration
12CPE CloudFire Analytics
Extensible administration platform for automated
deployment, monitoring and alerting, configuration
management, rolling upgrades, and rolling restarts.
Hadoop Summit 2015
Ambari + ElasticSearch – Ambari Stack extensibility
Symantec CloudFire Analytics 13
Ambari custom
service for
ElasticSearch, a new
service to enable
Ambari to manage.
Ambari deploy,
start/stop, config,
monitor
Uses the new
Ambari Views,
integrated via
iFrame
Elastic HQ admin UI
as our use case.
Keeps the
dashboard clean,
yet extensible.
On github to share…
Hadoop Summit 2015
Ambari Views with iFrame – Ambari extensibility
Symantec CloudFire Analytics 14
Hadoop Summit 2015
Cluster Usage Analysis using Analytics
Symantec CloudFire Analytics
15
Hadoop Summit 2015
Ambari Metrics
Rapid Metrics Dashboards Construction - Reliability
Symantec CloudFire Analytics 16
Prototype for
building out
flexible
dashboards of
most
interesting
application
and platform
metrics.
Alerting
available via
the LMM
service.
Hadoop Summit 2015
Ambari Metrics
Rapid Metrics Dashboards Construction - Availability
Symantec CloudFire Analytics 17
Hadoop Summit 2015
OpsView / Nagios monitor and alert service
Symantec CloudFire Analytics 18
Hadoop Summit 2015
Data Transfer Network Monitor
Symantec CloudFire Analytics 19
Hadoop Summit 2015
Data Transfer Rate Monitor
Symantec CloudFire Analytics 20
Hadoop Summit 2015
Self-Service Analytics and Dynamic Clusters
CloudBreak integration and Development In Progress
Symantec CloudFire Analytics 21
Hadoop Summit 2015
Self-Service Analytics and Dynamic Clusters
CloudBreak integration and Development In Progress
Symantec CloudFire Analytics 22
Purpose
● Create and connect Analytics Clusters:
● Authoritative Data Lake, Regional Hubs
● Development, Test, Integration, Staging, and Production
● Ability for CloudFire users to spin up new clusters to develop their applications.
Feature
● One-click spin up of Analytics clusters using OpenStack VMs
● VMs that developers spin up are part of their own OpenStack quota
● Developers will have admin access to the cluster they spin up, so they can
install more services if needed.
Automation
● We control the version of the software by using the same automation code
that we use for deploying our production clusters.
Hadoop Summit 2015
Creating and managing clusters
Symantec CloudFire Analytics 23
Cluster
Directory
Hadoop Summit 2015
Self-Service Analytics Architecture
Symantec CloudFire Analytics 24
Managed Dynamic Clusters
Cross-Cluster Capabilities
Cluster
Management
Cluster 1
Cluster
Directory
Continuous
Validation &
Monitoring
Cluster 2 Cluster ... Cluster N
Data
Catalog
Analytics
Tools
Each Cluster:
• Analytic Services
• Deployment
• Networking
• Monitoring integration
• OpenStack VMs +
Containers
• Deployed Applications
Assemble Analytic Applications
Analytic Pipeline Message Queue
Hadoop Summit 2015
Thank You!
Stephen_Brodsky@Symantec.com
Darrell_Kienzle@Symantec.com

More Related Content

What's hot

Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop WarehouseData Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop WarehouseDataWorks Summit
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Cécile Poyet
 
Big Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeNBig Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeNDataWorks Summit
 
Visualizing Big Data in Realtime
Visualizing Big Data in RealtimeVisualizing Big Data in Realtime
Visualizing Big Data in RealtimeDataWorks Summit
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...DataWorks Summit
 
Evolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data ApplicationsEvolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data ApplicationsDataWorks Summit
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesDataWorks Summit
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive DataWorks Summit
 
Druid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best PracticesDruid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best PracticesDataWorks Summit
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Hortonworks
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesDataWorks Summit
 
Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Hortonworks
 
End-to-End Security and Auditing in a Big Data as a Service Deployment
End-to-End Security and Auditing in a Big Data as a Service DeploymentEnd-to-End Security and Auditing in a Big Data as a Service Deployment
End-to-End Security and Auditing in a Big Data as a Service DeploymentDataWorks Summit/Hadoop Summit
 
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache HiveDiscover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache HiveHortonworks
 
Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Hortonworks
 
Provisioning Big Data Platform using Cloudbreak & Ambari
Provisioning Big Data Platform using Cloudbreak & AmbariProvisioning Big Data Platform using Cloudbreak & Ambari
Provisioning Big Data Platform using Cloudbreak & AmbariDataWorks Summit/Hadoop Summit
 
Securing data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache RangerSecuring data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache RangerDataWorks Summit
 
HAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoopHAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoopBigData Research
 

What's hot (20)

Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop WarehouseData Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Big Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeNBig Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeN
 
Visualizing Big Data in Realtime
Visualizing Big Data in RealtimeVisualizing Big Data in Realtime
Visualizing Big Data in Realtime
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
 
Evolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data ApplicationsEvolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data Applications
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management Challenges
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive
 
Druid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best PracticesDruid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best Practices
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management Challenges
 
Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014
 
End-to-End Security and Auditing in a Big Data as a Service Deployment
End-to-End Security and Auditing in a Big Data as a Service DeploymentEnd-to-End Security and Auditing in a Big Data as a Service Deployment
End-to-End Security and Auditing in a Big Data as a Service Deployment
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache HiveDiscover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
 
Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture
 
Apache Ranger Hive Metastore Security
Apache Ranger Hive Metastore Security Apache Ranger Hive Metastore Security
Apache Ranger Hive Metastore Security
 
Provisioning Big Data Platform using Cloudbreak & Ambari
Provisioning Big Data Platform using Cloudbreak & AmbariProvisioning Big Data Platform using Cloudbreak & Ambari
Provisioning Big Data Platform using Cloudbreak & Ambari
 
Securing data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache RangerSecuring data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache Ranger
 
HAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoopHAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoop
 

Viewers also liked

End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...DataWorks Summit/Hadoop Summit
 
The evolution of the big data platform @ Netflix (OSCON 2015)
The evolution of the big data platform @ Netflix (OSCON 2015)The evolution of the big data platform @ Netflix (OSCON 2015)
The evolution of the big data platform @ Netflix (OSCON 2015)Eva Tse
 
Realistic Synthetic Generation Allows Secure Development
Realistic Synthetic Generation Allows Secure DevelopmentRealistic Synthetic Generation Allows Secure Development
Realistic Synthetic Generation Allows Secure DevelopmentDataWorks Summit
 
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...DataWorks Summit
 
Running Spark and MapReduce together in Production
Running Spark and MapReduce together in ProductionRunning Spark and MapReduce together in Production
Running Spark and MapReduce together in ProductionDataWorks Summit
 
One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)DataWorks Summit
 
Carpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP HavenCarpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP HavenDataWorks Summit
 
HBase and Drill: How loosley typed SQL is ideal for NoSQL
HBase and Drill: How loosley typed SQL is ideal for NoSQLHBase and Drill: How loosley typed SQL is ideal for NoSQL
HBase and Drill: How loosley typed SQL is ideal for NoSQLDataWorks Summit
 
Practical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on HadoopPractical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on HadoopDataWorks Summit
 
Inspiring Travel at Airbnb [WIP]
Inspiring Travel at Airbnb [WIP]Inspiring Travel at Airbnb [WIP]
Inspiring Travel at Airbnb [WIP]DataWorks Summit
 
Coexistence and Migration of Vendor HPC based infrastructure to Hadoop Ecosys...
Coexistence and Migration of Vendor HPC based infrastructure to Hadoop Ecosys...Coexistence and Migration of Vendor HPC based infrastructure to Hadoop Ecosys...
Coexistence and Migration of Vendor HPC based infrastructure to Hadoop Ecosys...DataWorks Summit
 
The Most Valuable Customer on Earth-1298: Comic Book Analysis with Oracel's B...
The Most Valuable Customer on Earth-1298: Comic Book Analysis with Oracel's B...The Most Valuable Customer on Earth-1298: Comic Book Analysis with Oracel's B...
The Most Valuable Customer on Earth-1298: Comic Book Analysis with Oracel's B...DataWorks Summit
 
Karta an ETL Framework to process high volume datasets
Karta an ETL Framework to process high volume datasets Karta an ETL Framework to process high volume datasets
Karta an ETL Framework to process high volume datasets DataWorks Summit
 
Hadoop in Validated Environment - Data Governance Initiative
Hadoop in Validated Environment - Data Governance InitiativeHadoop in Validated Environment - Data Governance Initiative
Hadoop in Validated Environment - Data Governance InitiativeDataWorks Summit
 
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARNDeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARNDataWorks Summit
 
Spark Application Development Made Easy
Spark Application Development Made EasySpark Application Development Made Easy
Spark Application Development Made EasyDataWorks Summit
 
Open Source SQL for Hadoop: Where are we and Where are we Going?
Open Source SQL for Hadoop: Where are we and Where are we Going?Open Source SQL for Hadoop: Where are we and Where are we Going?
Open Source SQL for Hadoop: Where are we and Where are we Going?DataWorks Summit
 

Viewers also liked (20)

End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
The evolution of the big data platform @ Netflix (OSCON 2015)
The evolution of the big data platform @ Netflix (OSCON 2015)The evolution of the big data platform @ Netflix (OSCON 2015)
The evolution of the big data platform @ Netflix (OSCON 2015)
 
Realistic Synthetic Generation Allows Secure Development
Realistic Synthetic Generation Allows Secure DevelopmentRealistic Synthetic Generation Allows Secure Development
Realistic Synthetic Generation Allows Secure Development
 
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
 
Running Spark and MapReduce together in Production
Running Spark and MapReduce together in ProductionRunning Spark and MapReduce together in Production
Running Spark and MapReduce together in Production
 
One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)
 
Carpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP HavenCarpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP Haven
 
HBase and Drill: How loosley typed SQL is ideal for NoSQL
HBase and Drill: How loosley typed SQL is ideal for NoSQLHBase and Drill: How loosley typed SQL is ideal for NoSQL
HBase and Drill: How loosley typed SQL is ideal for NoSQL
 
Practical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on HadoopPractical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on Hadoop
 
Inspiring Travel at Airbnb [WIP]
Inspiring Travel at Airbnb [WIP]Inspiring Travel at Airbnb [WIP]
Inspiring Travel at Airbnb [WIP]
 
Coexistence and Migration of Vendor HPC based infrastructure to Hadoop Ecosys...
Coexistence and Migration of Vendor HPC based infrastructure to Hadoop Ecosys...Coexistence and Migration of Vendor HPC based infrastructure to Hadoop Ecosys...
Coexistence and Migration of Vendor HPC based infrastructure to Hadoop Ecosys...
 
Hadoop for Genomics__HadoopSummit2010
Hadoop for Genomics__HadoopSummit2010Hadoop for Genomics__HadoopSummit2010
Hadoop for Genomics__HadoopSummit2010
 
The Most Valuable Customer on Earth-1298: Comic Book Analysis with Oracel's B...
The Most Valuable Customer on Earth-1298: Comic Book Analysis with Oracel's B...The Most Valuable Customer on Earth-1298: Comic Book Analysis with Oracel's B...
The Most Valuable Customer on Earth-1298: Comic Book Analysis with Oracel's B...
 
50 Shades of SQL
50 Shades of SQL50 Shades of SQL
50 Shades of SQL
 
Karta an ETL Framework to process high volume datasets
Karta an ETL Framework to process high volume datasets Karta an ETL Framework to process high volume datasets
Karta an ETL Framework to process high volume datasets
 
Hadoop in Validated Environment - Data Governance Initiative
Hadoop in Validated Environment - Data Governance InitiativeHadoop in Validated Environment - Data Governance Initiative
Hadoop in Validated Environment - Data Governance Initiative
 
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARNDeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
 
Spark Application Development Made Easy
Spark Application Development Made EasySpark Application Development Made Easy
Spark Application Development Made Easy
 
Open Source SQL for Hadoop: Where are we and Where are we Going?
Open Source SQL for Hadoop: Where are we and Where are we Going?Open Source SQL for Hadoop: Where are we and Where are we Going?
Open Source SQL for Hadoop: Where are we and Where are we Going?
 
NoSQL Needs SomeSQL
NoSQL Needs SomeSQLNoSQL Needs SomeSQL
NoSQL Needs SomeSQL
 

Similar to Analyzing the World's Largest Security Data Lake!

What's New in IBM Streams V4.1
What's New in IBM Streams V4.1What's New in IBM Streams V4.1
What's New in IBM Streams V4.1lisanl
 
InfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceInfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceWilfried Hoge
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningProvectus
 
Look Before You Leap: Migrating On-Premises Hadoop to AWS
Look Before You Leap: Migrating On-Premises Hadoop to AWSLook Before You Leap: Migrating On-Premises Hadoop to AWS
Look Before You Leap: Migrating On-Premises Hadoop to AWSDevOps.com
 
API and Big Data Solution Patterns
API and Big Data Solution Patterns API and Big Data Solution Patterns
API and Big Data Solution Patterns WSO2
 
SQL Server 2019 hotlap - WARDY IT Solutions
SQL Server 2019 hotlap - WARDY IT SolutionsSQL Server 2019 hotlap - WARDY IT Solutions
SQL Server 2019 hotlap - WARDY IT SolutionsMichaela Murray
 
Get Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionGet Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionCloudera, Inc.
 
How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?OVHcloud
 
Multi cloud governance best practices - AWS, Azure, GCP
Multi cloud governance best practices - AWS, Azure, GCPMulti cloud governance best practices - AWS, Azure, GCP
Multi cloud governance best practices - AWS, Azure, GCPFaiza Mehar
 
MongoDB World 2019: Wipro Software Defined Everything Powered by MongoDB
MongoDB World 2019: Wipro Software Defined Everything Powered by MongoDBMongoDB World 2019: Wipro Software Defined Everything Powered by MongoDB
MongoDB World 2019: Wipro Software Defined Everything Powered by MongoDBMongoDB
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsCloudera, Inc.
 
Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...
Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...
Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...Lucas Jellema
 
Actian Analytics Platform - Hadoop SQL Edition
Actian Analytics Platform - Hadoop SQL EditionActian Analytics Platform - Hadoop SQL Edition
Actian Analytics Platform - Hadoop SQL EditionAlessandro Salvatico
 
Sabrina Kirstein @ RapidMiner
Sabrina Kirstein @ RapidMinerSabrina Kirstein @ RapidMiner
Sabrina Kirstein @ RapidMinerPAPIs.io
 
Informatica Cloud Summer 2016 Release Webinar Slides
Informatica Cloud Summer 2016 Release Webinar SlidesInformatica Cloud Summer 2016 Release Webinar Slides
Informatica Cloud Summer 2016 Release Webinar SlidesInformatica Cloud
 
Slides PAPIs.io'14 RapidMiner
Slides PAPIs.io'14 RapidMinerSlides PAPIs.io'14 RapidMiner
Slides PAPIs.io'14 RapidMinerSabrina Kirstein
 
HadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop OverviewHadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop OverviewYafang Chang
 
60 minutes in the cloud: Predictive analytics made easy
60 minutes in the cloud: Predictive analytics made easy60 minutes in the cloud: Predictive analytics made easy
60 minutes in the cloud: Predictive analytics made easyNicolas Morales
 

Similar to Analyzing the World's Largest Security Data Lake! (20)

What's New in IBM Streams V4.1
What's New in IBM Streams V4.1What's New in IBM Streams V4.1
What's New in IBM Streams V4.1
 
InfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceInfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experience
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine Learning
 
Look Before You Leap: Migrating On-Premises Hadoop to AWS
Look Before You Leap: Migrating On-Premises Hadoop to AWSLook Before You Leap: Migrating On-Premises Hadoop to AWS
Look Before You Leap: Migrating On-Premises Hadoop to AWS
 
System center seminar presentation
System center seminar presentationSystem center seminar presentation
System center seminar presentation
 
API and Big Data Solution Patterns
API and Big Data Solution Patterns API and Big Data Solution Patterns
API and Big Data Solution Patterns
 
SQL Server 2019 hotlap - WARDY IT Solutions
SQL Server 2019 hotlap - WARDY IT SolutionsSQL Server 2019 hotlap - WARDY IT Solutions
SQL Server 2019 hotlap - WARDY IT Solutions
 
Get Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionGet Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber Solution
 
How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?
 
Un-clouding the cloud
Un-clouding the cloudUn-clouding the cloud
Un-clouding the cloud
 
Multi cloud governance best practices - AWS, Azure, GCP
Multi cloud governance best practices - AWS, Azure, GCPMulti cloud governance best practices - AWS, Azure, GCP
Multi cloud governance best practices - AWS, Azure, GCP
 
MongoDB World 2019: Wipro Software Defined Everything Powered by MongoDB
MongoDB World 2019: Wipro Software Defined Everything Powered by MongoDBMongoDB World 2019: Wipro Software Defined Everything Powered by MongoDB
MongoDB World 2019: Wipro Software Defined Everything Powered by MongoDB
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice Hotels
 
Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...
Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...
Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...
 
Actian Analytics Platform - Hadoop SQL Edition
Actian Analytics Platform - Hadoop SQL EditionActian Analytics Platform - Hadoop SQL Edition
Actian Analytics Platform - Hadoop SQL Edition
 
Sabrina Kirstein @ RapidMiner
Sabrina Kirstein @ RapidMinerSabrina Kirstein @ RapidMiner
Sabrina Kirstein @ RapidMiner
 
Informatica Cloud Summer 2016 Release Webinar Slides
Informatica Cloud Summer 2016 Release Webinar SlidesInformatica Cloud Summer 2016 Release Webinar Slides
Informatica Cloud Summer 2016 Release Webinar Slides
 
Slides PAPIs.io'14 RapidMiner
Slides PAPIs.io'14 RapidMinerSlides PAPIs.io'14 RapidMiner
Slides PAPIs.io'14 RapidMiner
 
HadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop OverviewHadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop Overview
 
60 minutes in the cloud: Predictive analytics made easy
60 minutes in the cloud: Predictive analytics made easy60 minutes in the cloud: Predictive analytics made easy
60 minutes in the cloud: Predictive analytics made easy
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 

Recently uploaded (20)

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 

Analyzing the World's Largest Security Data Lake!

  • 1. Hadoop Summit 2015 CloudFire Analytics: Transforming Security Analyzing Symantec’s security data lake Stephen Brodsky and Darrell Kienzle
  • 2. Hadoop Summit 2015 Outline CPE CloudFire: Analytics and Products1 Analytics Services and Data2 Analytics Administration and Monitoring3 Self-Service Analytics and Dynamic Clusters4 Symantec CloudFire Analytics 2
  • 3. Hadoop Summit 2015 CPE CloudFire: Analytics and Products • CPE – Cloud Platform Engineering – Symantec’s Private Cloud for security products and analytics – Spans 50+ data centers around the world • CloudFire – CPE’s scalable cloud platform – New data centers, new hardware, scalable build-out – All open source – OpenStack for virtualization – Analytics for big data analysis • Cloud for bringing together and integrating Symantec’s: – Products – Big Data, Analytic applications, and Services – Compute, Network Symantec CloudFire Analytics 3
  • 4. Hadoop Summit 2015 Analytics for supporting Security Products Goals of the Security Teams 1. Do we have the data? 2. Can we analyze the data? 3. Can we provide timely Insights? CloudFire Analytics for supporting Security 1. Data available • All frequently used Security Data available • Data at scale: Data available for parallel analysis (PB scale) • Leveraging CPE Data Center Availability, Compute, Net 2. Analysis engines • Hadoop ecosystem engines (MR, Hive, Kafka, Spark, HBase, Storm, Phoenix, ++) • Analytics Pipeline 3. Analysis in near real-time • Analytics timescale days/hours -> seconds • Batch -> Streaming 4Symantec CloudFire Analytics
  • 5. CPE Analytics Architecture Overview Inbound Messaging (Data import, Kafka) Products Distributed Storage (HDFS), Metal or Virtual (OpenStack) Servers Analytic Applications, Workload Management (YARN) Stream Processing (Storm, Spark) Real-time Results (HBase, ElasticSearch) Query (Hive, Spark SQL) Device Agents Telemetry, Data Threats: Top Web-based Attacks 192 web-based attacks recorded last month 36% 30% 14% 10% 7%3% Malvertisement Exploit Kit Suspicious Download Analytics Clusters • All major open source engines • PB-scale Data Store • Key Telemetry • Multi-Data Center • Administration • Monitoring • Prod, Dev/Test • Application Deployments CloudFire Analytics Data Transfer 5
  • 6. Hadoop Summit 2015 A key security Product Symantec CloudFire Analytics 6 • Live with hundreds of external customers • Improved time to analyzed results from hours to seconds (5000x) at production scale • Leverage Streaming Analytics • Move analytics to more powerful, high performance open source technologies – Kafka for queuing – Storm for analytics • Improve analytics – Server reputation – URL reputation • Moving forward – Leverage Analytics Pipeline – NoSQL DBs – Graph DBs HDFSMonitors
  • 7. Hadoop Summit 2015 Analytics Services and Data Symantec CloudFire Analytics 7
  • 8. Hadoop Summit 2015 Product Journey with CloudFire Analytics Symantec CloudFire Analytics 8 SDAP current • Scale? • Availability? • Usage? • Hours/Days timeframe Batch Analytics •Data Scale: Multi-PB HDFS •Analytics scale: Hadoop MR, Hive •Availability – high •Usage prioritization •Minutes/Hours timeframe Streaming Analytics •Analytics Scale: Kafka, Storm, Spark, ElasticSearch, HBase, ++ •Availability – high •Usage prioritization •Seconds timeframe Application style • Bring in data • Analyze • Architecture: Serial->Parallel • Workflow oriented Application style • Bring in data in zip files • Unzip, load, SQL • Architecture: Serial • Application oriented Application style • Stream in data directly • Analytics pipeline • Architecture: Streaming Parallel • Streaming oriented
  • 9. CloudFire Analytics Pipeline Data Sources Streaming Analysis Batch Analysis Actively Available Results Online Access Research Telemetry Clients World Cloud Data Assemble Analytic Applications Analytic Pipeline Message Queue 9 Storm, Kafka, Spark Hive, Spark HBase, HDFS, Kafka, Cassandra HBase, Index, Cache
  • 10. Hadoop Summit 2015 CloudFire Analytics Services Symantec CloudFire Analytics 10 Analytic Services and Data Transfer • Hadoop, YARN - Compute • HDFS/Knox/HttpFS – File Storage • Hive, Pig – SQL Batch • Oozie – Workflow and Scheduling • Service Endpoints – Admin / BDSE • Storm – Streaming • Kafka - Messaging • HBase – BigTable column store • Spark (SQL,ML) – Stream analytics • Research and admin CLI VMs • MTS Parallel file Import / Export • Falcon data management • ElasticSearch indexing • Cassandra, Graph, Drill, Solr Web UIs Admin Tools • Ambari - Admin • OpsView (Zabbix) – Uptime monitor • YARN and HDFS UIs – Perf & Logs • Ganglia & Nagios – Performance • Storm & Kafka lag – Perf/Problems • LMM – Problem determination & metrics • Resource Manager, Name Node UIs • Network data transfer monitoring • Puma performance usage analysis • Continuous Validation • Ranger user management Data Science Tools • HUE – SQL, Jobs, Files, Workflow • Ambari Views
  • 11. Hadoop Summit 2015 Analytics Administration and Monitoring Symantec CloudFire Analytics 11
  • 12. Hadoop Summit 2015 Ambari Analytics Administration 12CPE CloudFire Analytics Extensible administration platform for automated deployment, monitoring and alerting, configuration management, rolling upgrades, and rolling restarts.
  • 13. Hadoop Summit 2015 Ambari + ElasticSearch – Ambari Stack extensibility Symantec CloudFire Analytics 13 Ambari custom service for ElasticSearch, a new service to enable Ambari to manage. Ambari deploy, start/stop, config, monitor Uses the new Ambari Views, integrated via iFrame Elastic HQ admin UI as our use case. Keeps the dashboard clean, yet extensible. On github to share…
  • 14. Hadoop Summit 2015 Ambari Views with iFrame – Ambari extensibility Symantec CloudFire Analytics 14
  • 15. Hadoop Summit 2015 Cluster Usage Analysis using Analytics Symantec CloudFire Analytics 15
  • 16. Hadoop Summit 2015 Ambari Metrics Rapid Metrics Dashboards Construction - Reliability Symantec CloudFire Analytics 16 Prototype for building out flexible dashboards of most interesting application and platform metrics. Alerting available via the LMM service.
  • 17. Hadoop Summit 2015 Ambari Metrics Rapid Metrics Dashboards Construction - Availability Symantec CloudFire Analytics 17
  • 18. Hadoop Summit 2015 OpsView / Nagios monitor and alert service Symantec CloudFire Analytics 18
  • 19. Hadoop Summit 2015 Data Transfer Network Monitor Symantec CloudFire Analytics 19
  • 20. Hadoop Summit 2015 Data Transfer Rate Monitor Symantec CloudFire Analytics 20
  • 21. Hadoop Summit 2015 Self-Service Analytics and Dynamic Clusters CloudBreak integration and Development In Progress Symantec CloudFire Analytics 21
  • 22. Hadoop Summit 2015 Self-Service Analytics and Dynamic Clusters CloudBreak integration and Development In Progress Symantec CloudFire Analytics 22 Purpose ● Create and connect Analytics Clusters: ● Authoritative Data Lake, Regional Hubs ● Development, Test, Integration, Staging, and Production ● Ability for CloudFire users to spin up new clusters to develop their applications. Feature ● One-click spin up of Analytics clusters using OpenStack VMs ● VMs that developers spin up are part of their own OpenStack quota ● Developers will have admin access to the cluster they spin up, so they can install more services if needed. Automation ● We control the version of the software by using the same automation code that we use for deploying our production clusters.
  • 23. Hadoop Summit 2015 Creating and managing clusters Symantec CloudFire Analytics 23 Cluster Directory
  • 24. Hadoop Summit 2015 Self-Service Analytics Architecture Symantec CloudFire Analytics 24 Managed Dynamic Clusters Cross-Cluster Capabilities Cluster Management Cluster 1 Cluster Directory Continuous Validation & Monitoring Cluster 2 Cluster ... Cluster N Data Catalog Analytics Tools Each Cluster: • Analytic Services • Deployment • Networking • Monitoring integration • OpenStack VMs + Containers • Deployed Applications Assemble Analytic Applications Analytic Pipeline Message Queue
  • 25. Hadoop Summit 2015 Thank You! Stephen_Brodsky@Symantec.com Darrell_Kienzle@Symantec.com