1 © Hortonworks Inc. 2011–2018. All rights reserved
Just the Sketch
Advanced Streaming Analytics
Casey Stella
Principal Software Engineer & VP Apache Metron
Apache Metron: A Cybersecurity Analytics Solution
• Metron provides a scalable, advanced security analytics framework that offers a centralized tool for security monitoring and analysis
• Ultimately, this means that we provide a solution to ingest and enrich disparate data sources and to detect anomalies in them
• Metron was initiated at Cisco in 2014 as OpenSOC
• Metron was submitted to the Apache Incubator in December 2015
• Metron graduated to a top-level project in April 2017
Apache Metron: The Stack
• We aggressively use the Hadoop stack for both batch and streaming processing of data.
• We use
  • Apache ZooKeeper for distributed configuration management
  • Apache HBase for random access key/value data
  • Apache Storm as a stream processing framework
  • HDFS for long-term storage of processed data
  • Apache Solr and Elasticsearch for low latency querying
• We’ve built a UI that displays alerts to security analysts and lets them manage those alerts.
Enrichment by any means necessary
• There are a couple of different ways to enrich, each with their own semantics
Changing           | Scope       | Retrieval Semantics | Ingestion Semantics | Metron Solution
Static/Slow-moving | Event       | Key/Value Lookup    | Batch               | HBase Enrichment
Static/Slow-moving | Event       | Complex             | Batch               | Summarizer
Dynamic            | Event       | Key/Value Lookup    | Streaming           | Streaming HBase Enrichment
Static/Slow-moving | Event       | Complex             | Batch               | Model as a Service
Dynamic            | Multi-event | Complex             | Streaming           | Profiler
The Profiler
The Profiler creates logical windows that span both time and data sources
• Trending across time
• Anomaly detection across time
• Stores summaries and sketches in HBase for efficient retrieval in-stream
• Provides a query language that allows seasonal adjustment
[Diagram: an approximate data sketch is stored for each time window (t = 1, t = 2, t = 3, ..., t = n); the sketches are combined into a baseline statistic]
The Profiler: Data Access Semantics
• Provide a mechanism to store user-defined summaries for every k minutes
• Provide a mechanism to query stored profile data
  • Fixed Lookback – Look back a fixed amount of time
  • Seasonally Adjusted Window – Look back for a time period applying seasonal adjustment
• Often the data we are interacting with behaves seasonally and queries should support seasonal adjustment
• Example window specifications:
  • from 1 hour ago
  • from 1 hour ago until 30 minutes ago
  • 1 hour window every 24 hours from 56 days ago including this day of the week excluding holidays:us, weekends
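As a rough illustration of how a seasonally adjusted window specification like the last one might resolve to concrete time ranges, here is a minimal Python sketch. The function `seasonal_windows` and its parameters are hypothetical and are not Metron's API; the holiday exclusion is left out because it would need a calendar source.

```python
from datetime import datetime, timedelta


def seasonal_windows(now, width, every, lookback,
                     same_weekday=True, skip_weekends=True):
    """Return the (start, end) ranges selected by a seasonal lookback.

    Hypothetical helper for illustration only; not part of Metron.
    """
    windows = []
    start = now - lookback
    while start < now:
        end = min(start + width, now)
        keep = True
        if same_weekday and start.weekday() != now.weekday():
            keep = False  # "including this day of the week"
        if skip_weekends and start.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            keep = False  # "excluding weekends"
        if keep:
            windows.append((start, end))
        start += every
    return windows


# "1 hour window every 24 hours from 56 days ago including this day of the week"
now = datetime(2018, 6, 20, 12, 0)  # a Wednesday
windows = seasonal_windows(
    now,
    width=timedelta(hours=1),
    every=timedelta(hours=24),
    lookback=timedelta(days=56),
)
for start, end in windows:
    print(start.isoformat(), "->", end.isoformat())
```

Under these assumptions the query touches only one hour from each of the previous eight Wednesdays, which is what makes the comparison seasonal rather than a plain lookback.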
Data Sketches
• Data sketches provide fast, approximate answers to common questions about data
  • Statistical questions (e.g. median, percentile, standard deviation)
  • Set operations (e.g. containment, existence, cardinality)
  • Sampling
• They are (generally) sub-linear in size because they are approximate
• They can be merged, and questions can be asked of the merged result
  • query(sketch(data1) + sketch(data2)) = query(sketch(data1 + data2))
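The merge property can be shown with a very small mergeable summary. The sketch below is an assumption-laden illustration in Python, not Metron's `OnlineStatisticsProvider`: it tracks only count, mean and variance (which happen to merge exactly), using Welford's update and the standard pairwise merge formula. The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Constant-size summary of a stream: count, mean, sum of squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, x):
        # Welford's online update.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return self

    def merge(self, other):
        # Combine two summaries without revisiting the raw data.
        if self.n == 0:
            return RunningStats(other.n, other.mean, other.m2)
        if other.n == 0:
            return RunningStats(self.n, self.mean, self.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    def sd(self):
        # Population standard deviation.
        return math.sqrt(self.m2 / self.n) if self.n else float("nan")


def summarize(values):
    stats = RunningStats()
    for v in values:
        stats.add(v)
    return stats


data1 = [1.0, 2.0, 3.0, 4.0]
data2 = [10.0, 20.0, 30.0]
merged = summarize(data1).merge(summarize(data2))
combined = summarize(data1 + data2)
print(merged.mean, merged.sd())
print(combined.mean, combined.sd())
```

Percentile and cardinality sketches follow the same pattern, except that their answers are approximate and the equality holds only up to the sketch's error bound.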
Data Sketches and the Profiler
• The Profiler can persist anything, not just numbers
• We can, for instance, persist a sketch per time period
  • Allows the Profiler to scale to large datasets
  • Allows questions to be asked of data across long time ranges
  • Allows time ranges to be specified at read-time rather than write-time
  • Allows users to ask different questions of the same data
Data Sketches - Example
⬢ A simple Profile that tracks URL length over time
{
  "profile": "http-length",
  "foreach": "'global'",
  "onlyif": "source.type == 'bro' and protocol == 'HTTP'",
  "update": { "sk": "STATS_ADD(sk, length)" },
  "result": "sk"
}
[Stellar]>>> stats := PROFILE_GET( "http-length", "global", PROFILE_FIXED(24, "HOURS"))
[Stellar]>>> stats
[org.apache.metron.common.math.stats.OnlineStatisticsProvider@79fe4ab9, ...]
⬢ These aren’t just numbers
⬢ Ask different queries of the same data
[Stellar]>>> STATS_MEAN( GET_FIRST( stats))
15979.0625
[Stellar]>>> STATS_PERCENTILE( GET_FIRST(stats), 90)
30310.958
⬢ Merge to change the time horizon
[Stellar]>>> merged := STATS_MERGE( stats)
[Stellar]>>> STATS_PERCENTILE(merged, 90)
29810.992
Where in the world is Carmen Sandiego?: Anatomy of a Solution
• One way to find anomalous behavior on a network is to monitor the locations from which a user logs in. If a user logs in from a vastly different place than usual, this could be circumstantial evidence of malicious behavior.
• The trick to any analytic is how to define “vastly different” and “than usual”
  • We can track the distance from the geographic center of the user’s previous login events over time
  • We can compare that to the distribution of the distances from the geographic center for all users to decide if it’s truly abnormal.
• In order to do this, we’ll need to
  • Use a Parser to ingest authentication data that associates users with login events
  • Track the location of the logins across users in a scalable way using the Profiler
  • Enrich login events by interrogating the user’s login history and determining if their login is sufficiently abnormal to bump their threat level
Ingest the auth data
{
  "parserClassName" : "org.apache.metron.parsers.csv.CSVParser",
  "sensorTopic" : "auth",
  "parserConfig" : {
    "columns" : { "user" : 0, "ip" : 1, "timestamp" : 2 }
  },
  "fieldTransformations" : [
    {
      "transformation" : "STELLAR",
      "output" : [ "hash" ],
      "config" : {
        "hash" : "GEOHASH_FROM_LOC(GEO_GET(ip))"
      }
    }
  ]
}
⬢ Auth data has 3 fields
⬢ We’ll add a new field called “hash” that is the geohash of the IP
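For readers unfamiliar with geohashes, the following Python sketch shows the standard public geohash encoding: longitude and latitude bits are interleaved by repeated bisection and emitted as base-32 characters. It assumes the latitude and longitude have already been resolved from the IP address (the GeoIP lookup is out of scope here), and `geohash_encode` is a hypothetical name, not Metron's implementation of `GEOHASH_FROM_LOC`.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair (degrees) as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    use_lon = True  # bits alternate, starting with longitude
    while len(bits) < precision * 5:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1)
                lon_lo = mid
            else:
                bits.append(0)
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1)
                lat_lo = mid
            else:
                bits.append(0)
                lat_hi = mid
        use_lon = not use_lon
    chars = []
    for i in range(0, len(bits), 5):
        value = 0
        for bit in bits[i:i + 5]:
            value = (value << 1) | bit
        chars.append(_BASE32[value])
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, precision=5))
```

Nearby locations share a common prefix, so a short string stands in for a small geographic cell, which keeps the per-user location data compact.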
Profiler: Track user login locations
{
  "profile": "locations_by_user",
  "foreach": "user",
  "onlyif": "exists(hash) && hash != null && LENGTH(hash) > 0",
  "init": {
    "s": "MULTISET_INIT()"
  },
  "update": {
    "s": "MULTISET_ADD(s, hash)"
  },
  "result": "s"
}
⬢ This profile tracks user login behavior
⬢ We will compute a profile for each user
⬢ We’ll use the multiset Stellar functions to track the geohash and the number of occurrences
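The multiset idea can be sketched in Python with `collections.Counter`: each profile period keeps a geohash-to-count map, and periods are merged by summing counts. The helper names below only mirror the Stellar function names for readability; they are hypothetical, and the geohash values are made-up sample data.

```python
from collections import Counter


def multiset_add(multiset, item):
    """Add one occurrence of item, creating the multiset if needed."""
    multiset = Counter() if multiset is None else multiset
    multiset[item] += 1
    return multiset


def multiset_merge(multisets):
    """Sum the per-item counts of several multisets."""
    merged = Counter()
    for multiset in multisets:
        merged.update(multiset)
    return merged


# One multiset per profile period for a single user (sample geohashes).
period_1 = None
for geohash in ["dr5reg", "dr5reg", "dr5ru7"]:
    period_1 = multiset_add(period_1, geohash)

period_2 = None
for geohash in ["dr5reg", "9q8yyk"]:
    period_2 = multiset_add(period_2, geohash)

merged = multiset_merge([period_1, period_2])
print(merged)
```

Keeping counts rather than a plain set lets a later step weight frequently seen locations more heavily than one-off logins.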
Enrich the auth data with more context
"enrichment": {
  "fieldMap": {
    "stellar" : {
      "config" : [
        "geo_locations := MULTISET_MERGE( PROFILE_GET( 'locations_by_user', user, PROFILE_FIXED( 4, 'HOURS')))",
        "geo_centroid := GEOHASH_CENTROID(geo_locations)",
        "geo_distance := TO_INTEGER(GEOHASH_DIST(geo_centroid, hash))",
        "geo_locations := null"
      ]
    }
  },
  "fieldToTypeMap": { }
}
⬢ Get the set of geohashes (and occurrences) for the user over the last 4 hours
⬢ Calculate the geographic center of the logins
⬢ Find the distance of the current login from the geographic center and create a new field “geo_distance” to hold that for every login event
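The two geometric steps, finding a geographic center and measuring the distance to it, can be sketched in Python as follows. This is one common approach and an assumption on my part, not necessarily how `GEOHASH_CENTROID` and `GEOHASH_DIST` are implemented: it works on decoded latitude/longitude pairs, averages count-weighted unit vectors for the center, and uses the haversine formula with a mean Earth radius of 6371 km. All names and coordinates are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def weighted_centroid(points):
    """points maps (lat, lon) in degrees to an occurrence count."""
    x = y = z = 0.0
    for (lat, lon), count in points.items():
        phi, lam = math.radians(lat), math.radians(lon)
        x += count * math.cos(phi) * math.cos(lam)
        y += count * math.cos(phi) * math.sin(lam)
        z += count * math.sin(phi)
    lat_c = math.atan2(z, math.hypot(x, y))
    lon_c = math.atan2(y, x)
    return math.degrees(lat_c), math.degrees(lon_c)


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Sample login history: mostly around New York, then a login from London.
history = {(40.71, -74.01): 9, (40.73, -73.99): 1}
centroid = weighted_centroid(history)
geo_distance = int(haversine_km(centroid, (51.51, -0.13)))
print(centroid, geo_distance)
```

A login near the user's usual locations yields a small `geo_distance`, while the transatlantic login here is thousands of kilometers from the center.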
Profiler: Baseline across all users
{
  "profile": "geo_distribution_from_centroid",
  "foreach": "'global'",
  "onlyif": "exists(geo_distance) && geo_distance != null",
  "init": {
    "s": "STATS_INIT()"
  },
  "update": {
    "s": "STATS_ADD(s, geo_distance)"
  },
  "result": "s"
}
⬢ Track the distribution of distances from the geographic center over the previous 4 hours
⬢ For all users
⬢ We’ll use the STATS Stellar functions to track the distribution of our newly enriched field, geo_distance
Compute the threat given global context and per-user context
"threatIntel": {
  "fieldMap": {
    "stellar" : {
      "config" : [
        "geo_distance_distr := STATS_MERGE( PROFILE_GET( 'geo_distribution_from_centroid', 'global', PROFILE_FIXED( 4, 'HOURS')))",
        "dist_median := STATS_PERCENTILE(geo_distance_distr, 50.0)",
        "dist_sd := STATS_SD(geo_distance_distr)",
        "geo_outlier := ABS(dist_median - geo_distance) >= 5*dist_sd",
        "is_alert := exists(is_alert) && is_alert",
        "is_alert := is_alert || (geo_outlier != null && geo_outlier == true)",
        "geo_distance_distr := null"
      ]
    }
  }
⬢ Get the statistical distribution of the ‘geo_distance’ field for all users
⬢ Decide if the geo_distance is an outlier by testing how many standard deviations it is from the median
⬢ Update “is_alert” accordingly. If this is true, then we will need to triage the alert level.
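The outlier rule itself is simple arithmetic, shown here as a Python sketch. It assumes the baseline is available as a plain list of distances and uses the population standard deviation; in the pipeline above the baseline is a merged statistical sketch, and which standard deviation `STATS_SD` computes is not stated in the slides. The function name and sample values are hypothetical.

```python
import statistics


def geo_outlier(geo_distance, baseline_distances, k=5.0):
    """True when geo_distance is at least k standard deviations from the median."""
    dist_median = statistics.median(baseline_distances)
    dist_sd = statistics.pstdev(baseline_distances)  # assumption: population SD
    return abs(dist_median - geo_distance) >= k * dist_sd


# Sample baseline of distances (km) from each user's own login centroid.
baseline = [10, 12, 11, 9, 13, 10, 11, 12]

is_alert = False  # mirrors "exists(is_alert) && is_alert" for a fresh message
is_alert = is_alert or geo_outlier(5000, baseline)
print(geo_outlier(12, baseline), geo_outlier(5000, baseline), is_alert)
```

Note that if every baseline distance is identical the standard deviation is zero and the comparison is true for any event, so a real rule may want a minimum-spread guard.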
Triage Threat
"triageConfig" : {
  "riskLevelRules" : [
    {
      "name" : "Geographic Outlier",
      "comment" : "Determine if the user's geographic distance from the centroid of the historic logins is an outlier as compared to all users.",
      "rule" : "geo_outlier != null && geo_outlier",
      "score" : 10,
      "reason" : "FORMAT('user %s has a distance (%d) from the centroid of their last login is 5 std deviations (%f) from the median (%f)', user, geo_distance, dist_sd, dist_median)"
    }
  ],
  "aggregator" : "MAX"
}
⬢ Because this is only a circumstantial indicator, we’ll only give this a threat score of 10
⬢ We’ll need to ensure the security analyst has enough context to make a decision here.
⬢ In a normal system, there would be many rules triaging the threat; the maximum score would be taken as the score for the message.
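How several triage rules combine under a MAX aggregator can be sketched in Python as below. Rules are modeled as plain predicates rather than Stellar expressions, the second rule and its `on_blocklist` field are invented purely to show aggregation, and `triage_score` is a hypothetical helper rather than Metron code.

```python
def triage_score(message, rules, aggregator=max):
    """Apply every rule to the message and aggregate the scores of those that fire."""
    scores = [rule["score"] for rule in rules if rule["rule"](message)]
    return aggregator(scores) if scores else 0


rules = [
    {
        "name": "Geographic Outlier",
        "rule": lambda m: bool(m.get("geo_outlier")),
        "score": 10,
    },
    {
        # Hypothetical second rule, present only to demonstrate aggregation.
        "name": "Hypothetical blocklist match",
        "rule": lambda m: bool(m.get("on_blocklist")),
        "score": 50,
    },
]

print(triage_score({"geo_outlier": True}, rules))
print(triage_score({"geo_outlier": True, "on_blocklist": True}, rules))
```

With MAX, a low-scoring circumstantial rule never dilutes a stronger indicator; it only sets a floor when nothing else fires.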
Questions?
Thank you
Come visit us at the Hortonworks Booth and attend the Cybersecurity Birds of a Feather on Wednesday!

More Related Content

What's hot

Data Privacy at Scale
Data Privacy at ScaleData Privacy at Scale
Data Privacy at Scale
DataWorks Summit
 
The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...
DataWorks Summit
 
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
DataWorks Summit
 
Make Streaming Analytics work for you: The Devil is in the Details
Make Streaming Analytics work for you: The Devil is in the DetailsMake Streaming Analytics work for you: The Devil is in the Details
Make Streaming Analytics work for you: The Devil is in the Details
DataWorks Summit/Hadoop Summit
 
Acquisition of Seismic, Hydroacoustic, and Infrasonic Data with Apache NiFi a...
Acquisition of Seismic, Hydroacoustic, and Infrasonic Data with Apache NiFi a...Acquisition of Seismic, Hydroacoustic, and Infrasonic Data with Apache NiFi a...
Acquisition of Seismic, Hydroacoustic, and Infrasonic Data with Apache NiFi a...
DataWorks Summit
 
How to use flash drives with Apache Hadoop 3.x: Real world use cases and proo...
How to use flash drives with Apache Hadoop 3.x: Real world use cases and proo...How to use flash drives with Apache Hadoop 3.x: Real world use cases and proo...
How to use flash drives with Apache Hadoop 3.x: Real world use cases and proo...
DataWorks Summit
 
Overcoming the AI hype — and what enterprises should really focus on
Overcoming the AI hype — and what enterprises should really focus onOvercoming the AI hype — and what enterprises should really focus on
Overcoming the AI hype — and what enterprises should really focus on
DataWorks Summit
 
Leveraging advanced technologies to support critical applications in a secure...
Leveraging advanced technologies to support critical applications in a secure...Leveraging advanced technologies to support critical applications in a secure...
Leveraging advanced technologies to support critical applications in a secure...
DataWorks Summit
 
Build Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsightBuild Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsight
DataWorks Summit/Hadoop Summit
 
BI on Big Data with instant response times at Verizon
BI on Big Data with instant response times at VerizonBI on Big Data with instant response times at Verizon
BI on Big Data with instant response times at Verizon
DataWorks Summit
 
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXTDriving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
DataWorks Summit
 
Hardening Hadoop for Healthcare with Project Rhino
Hardening Hadoop for Healthcare with Project RhinoHardening Hadoop for Healthcare with Project Rhino
Hardening Hadoop for Healthcare with Project Rhino
Amazon Web Services
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
DataWorks Summit/Hadoop Summit
 
Spark and Hadoop Perfect Togeher by Arun Murthy
Spark and Hadoop Perfect Togeher by Arun MurthySpark and Hadoop Perfect Togeher by Arun Murthy
Spark and Hadoop Perfect Togeher by Arun Murthy
Spark Summit
 
The Implacable advance of the data
The Implacable advance of the dataThe Implacable advance of the data
The Implacable advance of the data
DataWorks Summit
 
Security event logging and monitoring techniques
Security event logging and monitoring techniquesSecurity event logging and monitoring techniques
Security event logging and monitoring techniques
DataWorks Summit
 
Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015
Hortonworks
 
Shaping a Digital Vision
Shaping a Digital VisionShaping a Digital Vision
Shaping a Digital Vision
DataWorks Summit/Hadoop Summit
 
Moving Health Care Analytics to Hadoop to Build a Better Predictive Model
Moving Health Care Analytics to Hadoop to Build a Better Predictive ModelMoving Health Care Analytics to Hadoop to Build a Better Predictive Model
Moving Health Care Analytics to Hadoop to Build a Better Predictive Model
DataWorks Summit
 
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
DataWorks Summit/Hadoop Summit
 

What's hot (20)

Data Privacy at Scale
Data Privacy at ScaleData Privacy at Scale
Data Privacy at Scale
 
The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...
 
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
 
Make Streaming Analytics work for you: The Devil is in the Details
Make Streaming Analytics work for you: The Devil is in the DetailsMake Streaming Analytics work for you: The Devil is in the Details
Make Streaming Analytics work for you: The Devil is in the Details
 
Acquisition of Seismic, Hydroacoustic, and Infrasonic Data with Apache NiFi a...
Acquisition of Seismic, Hydroacoustic, and Infrasonic Data with Apache NiFi a...Acquisition of Seismic, Hydroacoustic, and Infrasonic Data with Apache NiFi a...
Acquisition of Seismic, Hydroacoustic, and Infrasonic Data with Apache NiFi a...
 
How to use flash drives with Apache Hadoop 3.x: Real world use cases and proo...
How to use flash drives with Apache Hadoop 3.x: Real world use cases and proo...How to use flash drives with Apache Hadoop 3.x: Real world use cases and proo...
How to use flash drives with Apache Hadoop 3.x: Real world use cases and proo...
 
Overcoming the AI hype — and what enterprises should really focus on
Overcoming the AI hype — and what enterprises should really focus onOvercoming the AI hype — and what enterprises should really focus on
Overcoming the AI hype — and what enterprises should really focus on
 
Leveraging advanced technologies to support critical applications in a secure...
Leveraging advanced technologies to support critical applications in a secure...Leveraging advanced technologies to support critical applications in a secure...
Leveraging advanced technologies to support critical applications in a secure...
 
Build Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsightBuild Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsight
 
BI on Big Data with instant response times at Verizon
BI on Big Data with instant response times at VerizonBI on Big Data with instant response times at Verizon
BI on Big Data with instant response times at Verizon
 
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXTDriving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
 
Hardening Hadoop for Healthcare with Project Rhino
Hardening Hadoop for Healthcare with Project RhinoHardening Hadoop for Healthcare with Project Rhino
Hardening Hadoop for Healthcare with Project Rhino
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
 
Spark and Hadoop Perfect Togeher by Arun Murthy
Spark and Hadoop Perfect Togeher by Arun MurthySpark and Hadoop Perfect Togeher by Arun Murthy
Spark and Hadoop Perfect Togeher by Arun Murthy
 
The Implacable advance of the data
The Implacable advance of the dataThe Implacable advance of the data
The Implacable advance of the data
 
Security event logging and monitoring techniques
Security event logging and monitoring techniquesSecurity event logging and monitoring techniques
Security event logging and monitoring techniques
 
Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015
 
Shaping a Digital Vision
Shaping a Digital VisionShaping a Digital Vision
Shaping a Digital Vision
 
Moving Health Care Analytics to Hadoop to Build a Better Predictive Model
Moving Health Care Analytics to Hadoop to Build a Better Predictive ModelMoving Health Care Analytics to Hadoop to Build a Better Predictive Model
Moving Health Care Analytics to Hadoop to Build a Better Predictive Model
 
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
 

Similar to Just the sketch: advanced streaming analytics in Apache Metron

Solving Cybersecurity at Scale
Solving Cybersecurity at ScaleSolving Cybersecurity at Scale
Solving Cybersecurity at Scale
DataWorks Summit
 
Apache Metron Profiler - Cyber Bootcamp 2017
Apache Metron Profiler - Cyber Bootcamp 2017Apache Metron Profiler - Cyber Bootcamp 2017
Apache Metron Profiler - Cyber Bootcamp 2017
Nick Allen
 
Interactive Analytics at Scale in Apache Hive Using Druid
Interactive Analytics at Scale in Apache Hive Using DruidInteractive Analytics at Scale in Apache Hive Using Druid
Interactive Analytics at Scale in Apache Hive Using Druid
DataWorks Summit
 
Apache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseApache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San Jose
Hao Chen
 
Apache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real TimeApache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real Time
DataWorks Summit/Hadoop Summit
 
Alan Pope, Sebastian Spaink [InfluxData] | Data Collection 101 | InfluxDays N...
Alan Pope, Sebastian Spaink [InfluxData] | Data Collection 101 | InfluxDays N...Alan Pope, Sebastian Spaink [InfluxData] | Data Collection 101 | InfluxDays N...
Alan Pope, Sebastian Spaink [InfluxData] | Data Collection 101 | InfluxDays N...
InfluxData
 
Interactive Analytics at Scale in Apache Hive Using Druid
Interactive Analytics at Scale in Apache Hive Using DruidInteractive Analytics at Scale in Apache Hive Using Druid
Interactive Analytics at Scale in Apache Hive Using Druid
DataWorks Summit/Hadoop Summit
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
Jürgen Ambrosi
 
XCube-overview-brochure-revB
XCube-overview-brochure-revBXCube-overview-brochure-revB
XCube-overview-brochure-revBRichard Jaenicke
 
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding OverviewSplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
Splunk
 
SplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding OverviewSplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding Overview
Splunk
 
A streaming architecture for Cyber Security - Apache Metron
A streaming architecture for Cyber Security - Apache MetronA streaming architecture for Cyber Security - Apache Metron
A streaming architecture for Cyber Security - Apache Metron
Simon Elliston Ball
 
Druid Scaling Realtime Analytics
Druid Scaling Realtime AnalyticsDruid Scaling Realtime Analytics
Druid Scaling Realtime Analytics
Aaron Brooks
 
Integration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsIntegration Patterns for Big Data Applications
Integration Patterns for Big Data Applications
Michael Häusler
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizon
Artem Ervits
 
ASHviz - Dats visualization research experiments using ASH data
ASHviz - Dats visualization research experiments using ASH dataASHviz - Dats visualization research experiments using ASH data
ASHviz - Dats visualization research experiments using ASH data
John Beresniewicz
 
Apache Eagle in Action
Apache Eagle in ActionApache Eagle in Action
Apache Eagle in Action
Hao Chen
 
Api Statistics- The Scalable Way
Api Statistics- The Scalable WayApi Statistics- The Scalable Way
Api Statistics- The Scalable Way
WSO2
 
Elks for analysing performance test results - Helsinki QA meetup
Elks for analysing performance test results - Helsinki QA meetupElks for analysing performance test results - Helsinki QA meetup
Elks for analysing performance test results - Helsinki QA meetup
Anoop Vijayan
 
WSO2 Analytics Platform - The one stop shop for all your data needs
WSO2 Analytics Platform - The one stop shop for all your data needsWSO2 Analytics Platform - The one stop shop for all your data needs
WSO2 Analytics Platform - The one stop shop for all your data needs
Sriskandarajah Suhothayan
 

Similar to Just the sketch: advanced streaming analytics in Apache Metron (20)

Solving Cybersecurity at Scale
Solving Cybersecurity at ScaleSolving Cybersecurity at Scale
Solving Cybersecurity at Scale
 
Apache Metron Profiler - Cyber Bootcamp 2017
Apache Metron Profiler - Cyber Bootcamp 2017Apache Metron Profiler - Cyber Bootcamp 2017
Apache Metron Profiler - Cyber Bootcamp 2017
 
Interactive Analytics at Scale in Apache Hive Using Druid
Interactive Analytics at Scale in Apache Hive Using DruidInteractive Analytics at Scale in Apache Hive Using Druid
Interactive Analytics at Scale in Apache Hive Using Druid
 
Apache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseApache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San Jose
 
Apache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real TimeApache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real Time
 
Alan Pope, Sebastian Spaink [InfluxData] | Data Collection 101 | InfluxDays N...
Alan Pope, Sebastian Spaink [InfluxData] | Data Collection 101 | InfluxDays N...Alan Pope, Sebastian Spaink [InfluxData] | Data Collection 101 | InfluxDays N...
Alan Pope, Sebastian Spaink [InfluxData] | Data Collection 101 | InfluxDays N...
 
Interactive Analytics at Scale in Apache Hive Using Druid
Interactive Analytics at Scale in Apache Hive Using DruidInteractive Analytics at Scale in Apache Hive Using Druid
Interactive Analytics at Scale in Apache Hive Using Druid
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
 
XCube-overview-brochure-revB
XCube-overview-brochure-revBXCube-overview-brochure-revB
XCube-overview-brochure-revB
 
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding OverviewSplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
 
SplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding OverviewSplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding Overview
 
A streaming architecture for Cyber Security - Apache Metron
A streaming architecture for Cyber Security - Apache MetronA streaming architecture for Cyber Security - Apache Metron
A streaming architecture for Cyber Security - Apache Metron
 
Druid Scaling Realtime Analytics
Druid Scaling Realtime AnalyticsDruid Scaling Realtime Analytics
Druid Scaling Realtime Analytics
 
Integration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsIntegration Patterns for Big Data Applications
Integration Patterns for Big Data Applications
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizon
 
ASHviz - Dats visualization research experiments using ASH data
ASHviz - Dats visualization research experiments using ASH dataASHviz - Dats visualization research experiments using ASH data
ASHviz - Dats visualization research experiments using ASH data
 
Apache Eagle in Action
Apache Eagle in ActionApache Eagle in Action
Apache Eagle in Action
 
Api Statistics- The Scalable Way
Api Statistics- The Scalable WayApi Statistics- The Scalable Way
Api Statistics- The Scalable Way
 
Elks for analysing performance test results - Helsinki QA meetup
Elks for analysing performance test results - Helsinki QA meetupElks for analysing performance test results - Helsinki QA meetup
Elks for analysing performance test results - Helsinki QA meetup
 
WSO2 Analytics Platform - The one stop shop for all your data needs
WSO2 Analytics Platform - The one stop shop for all your data needsWSO2 Analytics Platform - The one stop shop for all your data needs
WSO2 Analytics Platform - The one stop shop for all your data needs
 

More from DataWorks Summit

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Just the sketch: advanced streaming analytics in Apache Metron

  • 1. Just the Sketch: Advanced Streaming Analytics. Casey Stella, Principal Software Engineer & VP Apache Metron. © Hortonworks Inc. 2011–2018. All rights reserved
  • 2. Apache Metron: A Cybersecurity Analytics Solution
      • Metron provides a scalable, advanced security analytics framework that offers a centralized tool for security monitoring and analysis
      • Ultimately, this means that we provide a solution to ingest and enrich disparate data sources and to detect anomalies in them
      • Metron was initiated at Cisco in 2014 as OpenSOC
      • Metron was submitted to the Apache Incubator in December 2015
      • Metron graduated to a top-level project in April 2017
  • 3. Apache Metron: The Stack
      • We aggressively use the Hadoop stack for both batch and streaming processing of data.
      • We use
          • Apache Zookeeper for distributed configuration management
          • Apache HBase for random access key/value data
          • Apache Storm as a stream processing framework
          • HDFS for long-term storage of processed data
          • Apache Solr and Elasticsearch for low latency querying
      • We've built a UI that displays alerts to security analysts and lets them manage those alerts.
  • 7. Enrichment by any means necessary
      • There are several ways to enrich, each with its own semantics

      Changing           | Scope       | Retrieval Semantics | Ingestion Semantics | Metron Solution
      Static/Slow-moving | Event       | Key/Value Lookup    | Batch               | HBase Enrichment
      Static/Slow-moving | Event       | Complex             | Batch               | Summarizer
      Dynamic            | Event       | Key/Value Lookup    | Streaming           | Streaming HBase Enrichment
      Static/Slow-moving | Event       | Complex             | Batch               | Model as a Service
      Dynamic            | Multi-event | Complex             | Streaming           | Profiler
  • 8. The Profiler
      [Figure: one approximate data sketch per time window (t = 1, t = 2, t = 3, ..., t = n), combined into a baseline statistic]
      The Profiler creates logical windows that span both time and data sources
      • Trending across time
      • Anomaly detection across time
      • Stores summaries and sketches in HBase for efficient retrieval in-stream
      • Provides a query language that allows seasonal adjustment
  • 9. The Profiler: Data Access Semantics
      • Provide a mechanism to store user-defined summaries for every k minutes
      • Provide a mechanism to query stored profile data
          • Fixed Lookback – Look back a fixed amount of time
          • Seasonally Adjusted Window – Look back for a time period, applying seasonal adjustment
      • Often the data we are interacting with behaves seasonally, and queries should support seasonal adjustment
          • from 1 hour ago
          • from 1 hour ago until 30 minutes ago
          • 1 hour window every 24 hours from 56 days ago including this day of the week excluding holidays:us, weekends
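One way to read the window phrases above is as a recipe that expands into a list of time intervals. The following is a minimal Python sketch of that reading, not Metron's implementation: the function name `seasonal_windows` and its parameters are hypothetical, and the holiday and weekend exclusions are left out.

```python
from datetime import datetime, timedelta

def seasonal_windows(now, width, every, lookback, same_weekday=False):
    """Return (start, end) intervals of length `width`, one every `every`,
    beginning `lookback` before `now` and never extending past `now`."""
    windows = []
    start = now - lookback
    while start + width <= now:
        # Optionally keep only windows that fall on today's day of the week.
        if not same_weekday or start.weekday() == now.weekday():
            windows.append((start, start + width))
        start += every
    return windows

# "1 hour window every 24 hours from 56 days ago including this day of the week"
now = datetime(2018, 6, 20, 12, 0)
weekly = seasonal_windows(now, timedelta(hours=1), timedelta(hours=24),
                          timedelta(days=56), same_weekday=True)

# "from 1 hour ago" is the degenerate case: a single window ending now.
fixed = seasonal_windows(now, timedelta(hours=1), timedelta(hours=1),
                         timedelta(hours=1))
```

Under these assumptions the seasonal phrase expands to eight one-hour windows, one per week, and the fixed lookback to a single window.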
  • 10. Data Sketches
      • Data sketches provide fast, approximate answers to common questions about data
          • Statistical questions (e.g. median, percentile, standard deviation)
          • Set operations (e.g. containment, existence, cardinality)
          • Sampling
      • They are (generally) sub-linear in size because they are approximate
      • They can be merged, and questions can be asked of the merged result
          • query(sketch(data1) + sketch(data2)) = query(sketch(data1 + data2))
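The merge identity on this slide can be illustrated with a small mergeable summary. This is only a sketch under stated assumptions: the `RunningStats` class and `summarize` helper are hypothetical names, and the summary tracks exact count, mean, and variance rather than an approximate structure such as a t-digest, so it shows the merge property but not the approximation.

```python
import math
from dataclasses import dataclass

@dataclass
class RunningStats:
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    def add(self, x):
        # Welford's online update for mean and variance.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return self

    def merge(self, other):
        # Combine two summaries without revisiting the raw data.
        if other.n == 0:
            return RunningStats(self.n, self.mean, self.m2)
        if self.n == 0:
            return RunningStats(other.n, other.mean, other.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    def sd(self):
        # Population standard deviation of everything seen so far.
        return math.sqrt(self.m2 / self.n) if self.n else 0.0

def summarize(values):
    stats = RunningStats()
    for v in values:
        stats.add(v)
    return stats

data1, data2 = [1.0, 2.0, 3.0], [10.0, 20.0]
merged = summarize(data1).merge(summarize(data2))
whole = summarize(data1 + data2)
```

Here `merged` and `whole` agree on mean and standard deviation up to floating-point error, which is the property that lets per-period summaries be combined at read time.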
  • 11. Data Sketches and the Profiler
      • The Profiler can persist anything, not just numbers
      • We can, for instance, persist a sketch per time period
          • Allows the Profiler to scale to large datasets
          • Allows questions to be asked of data across long time ranges
          • Allows time ranges to be specified at read-time rather than write-time
          • Allows users to ask different questions of the same data
  • 12. Data Sketches - Example
      A simple Profile that tracks URL length over time:
        {
          "profile": "http-length",
          "foreach": "'global'",
          "onlyif": "source.type == 'bro' and protocol == 'HTTP'",
          "update": { "sk": "STATS_ADD(sk, length)" },
          "result": "sk"
        }
      These aren't just numbers:
        [Stellar]>>> stats := PROFILE_GET("http-length", "global", PROFILE_FIXED(24, "HOURS"))
        [Stellar]>>> stats
        [org.apache.metron.common.math.stats.OnlineStatisticsProvider@79fe4ab9, ...]
      Ask different queries of the same data:
        [Stellar]>>> STATS_MEAN(GET_FIRST(stats))
        15979.0625
        [Stellar]>>> STATS_PERCENTILE(GET_FIRST(stats), 90)
        30310.958
      Merge to change the time horizon:
        [Stellar]>>> merged := STATS_MERGE(stats)
        [Stellar]>>> STATS_PERCENTILE(merged, 90)
        29810.992
  • 13. Where in the world is Carmen Sandiego?: Anatomy of a Solution
      • One way to find anomalous behavior on a network is to monitor the locations from which a user logs in. If a user logs in from a vastly different place than usual, this could be circumstantial evidence of malicious behavior.
      • The trick to any analytic is how to define "vastly different" and "than usual"
          • We can track the distance from the geographic center of the user's previous login events over time
          • We can compare that to the distribution of the distances from the geographic center for all users to decide if it's truly abnormal.
      • In order to do this, we'll need to
          • Ingest authentication data that associates users with login events with a Parser
          • Track the location of the logins across users in a scalable way using the Profiler
          • Enrich login events by interrogating the user's login history and determining if their login is sufficiently abnormal to bump their threat level
  • 14. Ingest the auth data
      Auth data has 3 fields. We'll add a new field called "hash" that is the geohash of the ip:
        {
          "parserClassName" : "org.apache.metron.parsers.csv.CSVParser",
          "sensorTopic" : "auth",
          "parserConfig" : {
            "columns" : { "user" : 0, "ip" : 1, "timestamp" : 2 }
          },
          "fieldTransformations" : [
            {
              "transformation" : "STELLAR",
              "output" : [ "hash" ],
              "config" : { "hash" : "GEOHASH_FROM_LOC(GEO_GET(ip))" }
            }
          ]
        }
  • 15. Profiler: Track user login locations
      This profile tracks user login behavior. We will compute a profile for each user, and we'll use the multiset Stellar functions to track the geohash and the number of occurrences:
        {
          "profile": "locations_by_user",
          "foreach": "user",
          "onlyif": "exists(hash) && hash != null && LENGTH(hash) > 0",
          "init" : { "s": "MULTISET_INIT()" },
          "update": { "s": "MULTISET_ADD(s, hash)" },
          "result": "s"
        }
  • 16. Enrich the auth data with more context
        "enrichment": {
          "fieldMap": {
            "stellar" : {
              "config" : [
                "geo_locations := MULTISET_MERGE(PROFILE_GET('locations_by_user', user, PROFILE_FIXED(4, 'HOURS')))",
                "geo_centroid := GEOHASH_CENTROID(geo_locations)",
                "geo_distance := TO_INTEGER(GEOHASH_DIST(geo_centroid, hash))",
                "geo_locations := null"
              ]
            }
          },
          "fieldToTypeMap": { }
        }
      • Get the set of geohashes (and occurrences) for the user over the last 4 hours
      • Calculate the geographic center of the logins
      • Find the distance of the current login from the geographic center and create a new field "geo_distance" to hold that for every login event
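The centroid and distance steps above can be sketched in Python. This is an illustration under assumptions, not how Metron's GEOHASH_CENTROID or GEOHASH_DIST are implemented: the geohashes are assumed to be already decoded to (latitude, longitude) pairs, the centroid is taken as the occurrence-weighted average of unit vectors on a sphere, and the distance is the haversine great-circle distance. The names `centroid` and `haversine_km` are hypothetical.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed spherical

def centroid(locations):
    """locations: Counter mapping (lat, lon) in degrees to occurrence counts."""
    x = y = z = 0.0
    total = sum(locations.values())
    for (lat, lon), count in locations.items():
        la, lo = math.radians(lat), math.radians(lon)
        # Accumulate the weighted unit vector for this location.
        x += count * math.cos(la) * math.cos(lo)
        y += count * math.cos(la) * math.sin(lo)
        z += count * math.sin(la)
    x, y, z = x / total, y / total, z / total
    lon = math.atan2(y, x)
    lat = math.atan2(z, math.hypot(x, y))
    return math.degrees(lat), math.degrees(lon)

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

# A user who logs in mostly from one place and once from slightly north of it.
logins = Counter({(40.0, -74.0): 9, (41.0, -74.0): 1})
center = centroid(logins)
geo_distance = int(haversine_km(center, (48.85, 2.35)))  # a far-away login
```

Weighting by occurrence count keeps the center close to the user's usual location, so a single distant login produces a large `geo_distance`.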
  • 17. Profiler: Baseline across all users
        {
          "profile": "geo_distribution_from_centroid",
          "foreach": "'global'",
          "onlyif": "exists(geo_distance) && geo_distance != null",
          "init" : { "s": "STATS_INIT()" },
          "update": { "s": "STATS_ADD(s, geo_distance)" },
          "result": "s"
        }
      • Track the distribution of distances from the geographic center over the previous 4 hours
      • For all users
      • We'll use the STATS Stellar functions to track the distribution of our newly enriched field, geo_distance
  • 18. Compute the threat given global context and per-user context
        "threatIntel": {
          "fieldMap": {
            "stellar" : {
              "config" : [
                "geo_distance_distr := STATS_MERGE(PROFILE_GET('geo_distribution_from_centroid', 'global', PROFILE_FIXED(4, 'HOURS')))",
                "dist_median := STATS_PERCENTILE(geo_distance_distr, 50.0)",
                "dist_sd := STATS_SD(geo_distance_distr)",
                "geo_outlier := ABS(dist_median - geo_distance) >= 5*dist_sd",
                "is_alert := exists(is_alert) && is_alert",
                "is_alert := is_alert || (geo_outlier != null && geo_outlier == true)",
                "geo_distance_distr := null"
              ]
            }
          }
      • Get the statistical distribution of the "geo_distance" field for all users
      • Decide if the geo_distance is an outlier by testing how many standard deviations it is from the median
      • Update "is_alert" accordingly. If this is true, then we will need to triage the alert level.
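The outlier rule on this slide, flagging a login when its distance is at least five standard deviations from the median, can be sketched in Python. This is an assumption-laden illustration: it works on a plain list of baseline distances rather than a merged sketch, uses the population standard deviation from the `statistics` module, and the name `is_geo_outlier` is hypothetical.

```python
import statistics

def is_geo_outlier(baseline, distance, k=5.0):
    """Flag `distance` if it is at least k standard deviations from the median.

    `baseline` is a list of geo_distance values observed across all users.
    """
    if len(baseline) < 2:
        return False  # not enough history to judge
    median = statistics.median(baseline)
    sd = statistics.pstdev(baseline)
    # Same comparison as the slide: ABS(dist_median - geo_distance) >= 5*dist_sd
    return abs(median - distance) >= k * sd

baseline = [10, 12, 11, 9, 10, 11, 10, 12]
typical = is_geo_outlier(baseline, 11)
far_away = is_geo_outlier(baseline, 5000)
```

Note that when every baseline value is identical the standard deviation is zero and the comparison is always true, so a production rule would likely need a guard for that case.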
  • 19. Triage Threat
        "triageConfig" : {
          "riskLevelRules" : [
            {
              "name" : "Geographic Outlier",
              "comment" : "Determine if the user's geographic distance from the centroid of the historic logins is an outlier as compared to all users.",
              "rule" : "geo_outlier != null && geo_outlier",
              "score" : 10,
              "reason" : "FORMAT('user %s has a distance (%d) from the centroid of their previous logins that is at least 5 std deviations (%f) from the median (%f)', user, geo_distance, dist_sd, dist_median)"
            }
          ],
          "aggregator" : "MAX"
        }
      • Because this is only a circumstantial indicator, we'll only give this a threat score of 10
      • We'll need to ensure the security analyst has enough context to make a decision here.
      • In a normal system, many rules would triage the threat, and the maximum score would be taken as the score for the message.
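The MAX aggregator described above can be sketched as a few lines of Python. This is a hypothetical illustration of the idea, not Metron's triage engine: rules are modeled as plain predicates, the `triage` function is an invented name, and the second rule and its IP address are made up to show aggregation across more than one rule.

```python
def triage(message, rules, aggregator=max):
    """Score `message` with every rule whose predicate fires, then aggregate."""
    scores = [r["score"] for r in rules if r["rule"](message)]
    return aggregator(scores) if scores else 0

rules = [
    {"name": "Geographic Outlier",
     "rule": lambda m: m.get("geo_outlier") is True,
     "score": 10},
    # Hypothetical second rule, not from the slides.
    {"name": "Known bad IP (hypothetical)",
     "rule": lambda m: m.get("ip") in {"203.0.113.7"},
     "score": 50},
]

only_geo = triage({"geo_outlier": True, "ip": "198.51.100.1"}, rules)
both = triage({"geo_outlier": True, "ip": "203.0.113.7"}, rules)
neither = triage({}, rules)
```

With MAX as the aggregator, a weak circumstantial rule never lowers the score set by a stronger rule; it only matters when nothing stronger fires.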
  • 20. Questions?
  • 21. Thank you. Come visit us at the Hortonworks Booth and attend the Cybersecurity Birds of a Feather on Wednesday!

Editor's Notes

  1. TALK TRACK Hortonworks Powers the Future of Data: data-in-motion, data-at-rest, and Modern Data Applications. [NEXT SLIDE]
  2. Result Screenshot field cloud demo environment
  3. HLLP cardinality – how many servers connected
     T-digest statistics – average over periods
     Bloom – presence small needle
     Outliers – detecting unusual events in streams
     An individual profile result might be around 2.5k each.
     2 years, 50 average profiles every 5 mins on 10,000 users for a medium-size company would max out at about 70 TB, or 3-4 nodes