
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics: Spark Summit East talk by William Callaghan


Cybercrime is big business. Gartner reports worldwide security spending at $80B, with annual losses totalling more than $1.2T (in 2015). Small to medium-sized businesses now account for more than half of the attacks targeting enterprises today. The threat actors behind these attacks continually shift their techniques and toolkits to evade the security defenses that businesses commonly use. As attacks grow in frequency and complexity, identifying and mitigating security-related events has become increasingly difficult.

At eSentire, we use a combination of data and human analytics to identify, respond to and mitigate cyber threats in real time. We capture all network traffic on our customers’ networks and therefore ingest a large amount of time-series data. We process the data as it streams into our system to extract relevant threat insights and block attacks in real time. Furthermore, we enable our cybersecurity analysts to perform in-depth investigations to: i) confirm attacks and ii) identify threats that analytical models miss. Having security experts in the loop provides feedback to our analytics engine, thereby improving overall threat detection effectiveness.

So how exactly can you build an analytics pipeline to handle a large amount of time-series/event-driven data? How do you build the tools that allow people to query this data with the expectation of mission-critical response times?

In this presentation, William Callaghan will focus on the challenges faced and lessons learned in building a human-in-the-loop cyber threat analytics pipeline. The talk covers analytics in cybersecurity and highlights the use of technologies such as Spark Streaming/SQL, Cassandra, Kafka and Alluxio in creating an analytics architecture with mission-critical response times.

Published in: Data & Analytics


  1. Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics
     William Callaghan, eSentire Inc.
  2. About eSentire
     • Founded in 2001
     • Located in Cambridge, Ontario, Canada
     • Managed Detection and Response for Mid-Sized Enterprises
       – Real-Time Analytics
       – 24/7 Security Operations Centre
       – Threat Intelligence Team
  3. CYBERCRIME IS BIG BUSINESS
     • $70B spent on cybersecurity
     • $375-575B in estimated losses
     • THREAT ACTORS: HACKTIVIST | NATION STATE ACTOR | INSIDER | ORGANIZED CRIME | CRIMINAL | TERRORIST
     • MEANS | MOTIVE | OPPORTUNITY
       – Easy Access to Cyber Weaponry
       – No Negative Repercussions
       – Motivation is High
       – Minimal Cyber Skills Required
  4. KNOWN ATTACKS | KNOWN ATTACKS ON OTHERS | SYSTEMIC VULNERABILITIES | KNOWN SIGNATURES | THREAT INTELLIGENCE
     The majority of cyber defenses protect against KNOWN threats.
     LAYERS OF SECURITY: NAC | WEB PROXY | MALWARE SANDBOXING | ROUTER | FIREWALL | ANTI VIRUS | USER ID AND PASSWORD | UTM | IPS/IDS | AUTHENTICATION TOKENS | HIPS | DLP | WAF | SPAM FILTER
  5. KNOWN ATTACKS | KNOWN ATTACKS ON OTHERS | SYSTEMIC VULNERABILITIES | UNKNOWN ATTACKS | KNOWN SIGNATURES | THREAT INTELLIGENCE
     ANOMALOUS SIGNAL DETECTION BEHAVIOURAL ANALYTICS ZERO DAY ATTACK TARGETED RETROSPECTION EXPLOITS
  6. Network Scanner
     • A tool for exploring connection information from a given IP address.
     • Analysts can get a breakdown of traffic by:
       – IP, Port, Minute, or any permutation of the three.
     • A Cassandra table for each type of breakdown.
       – Data is modeled based on queries, not relations.
       – Therefore, there is data duplication.
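The breakdowns listed on this slide can be sketched as a single grouping function over raw connection records, which is the kind of flexibility a table-per-breakdown layout lacks. This is a minimal illustration using plain Scala collections; the `Connection` record, its field names, and the `Breakdown` object are assumptions made for the example, not eSentire's schema or code.

```scala
// Hypothetical connection record; field names are assumptions for illustration.
final case class Connection(ip: String, port: Int, minute: Long)

object Breakdown {
  // The three dimensions named on the slide.
  sealed trait Dim
  case object Ip extends Dim
  case object Port extends Dim
  case object Minute extends Dim

  // Render one dimension of a connection as a grouping-key component.
  private def value(c: Connection, d: Dim): String = d match {
    case Ip     => c.ip
    case Port   => c.port.toString
    case Minute => c.minute.toString
  }

  // Count connections grouped by the chosen dimensions, in the given order.
  def by(conns: Seq[Connection], dims: Seq[Dim]): Map[Seq[String], Int] =
    conns
      .groupBy(c => dims.map(d => value(c, d)))
      .map { case (key, group) => key -> group.size }
}
```

For example, `Breakdown.by(conns, Seq(Breakdown.Ip, Breakdown.Port))` counts connections per (IP, port) pair, and passing a different list of dimensions gives a different breakdown without a new table.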
  7. Network Scanner
     Important Event fields:
     • Company
     • Site
     • Host (IP)
     • Event bucket (minute of event)
     • Event time (when the event occurred on the sensor)
     • Received time (when the event was processed by the counter)
     Network Scan Counter:
     • Counts the number of connections for each (company, site, host) in every minute.
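As a rough sketch of what the Network Scan Counter computes, the snippet below groups events by (company, site, host) and minute bucket and counts each group. It uses plain Scala collections rather than a streaming job, and the `Event` class, millisecond timestamps, and epoch-minute bucketing are assumptions for illustration only.

```scala
// Hypothetical event shape based on the fields listed on the slide.
// Timestamps are assumed to be epoch milliseconds.
final case class Event(company: String, site: String, host: String,
                       eventTimeMs: Long, receivedTimeMs: Long) {
  // Event bucket: the minute in which the event occurred (epoch minutes).
  def bucket: Long = eventTimeMs / 60000L
}

object NetworkScanCounter {
  // (company, site, host, event bucket)
  type Key = (String, String, String, Long)

  // Count connections for each (company, site, host) in every minute.
  def countPerMinute(events: Seq[Event]): Map[Key, Int] =
    events
      .groupBy(e => (e.company, e.site, e.host, e.bucket))
      .map { case (key, group) => key -> group.size }
}
```

Bucketing on event time rather than received time keeps the count tied to when the traffic happened on the sensor, while the received time remains available for snapshot-style filtering.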
  8. Drawbacks
     With 1000+ sensors in the field, some sending 100,000+ events/hour:
     • Scalability of having a C* table for each method of traffic breakdown?
     • Limited to the use cases (tables created) at the time.
  9. Goals
     Reduce data duplication and load on C* while being able to:
     • Perform flexible queries over large datasets.
     • Have minimal latency.
  10. Solution
      When an alert is generated for a (Company, Site, Host) for a given hour window:
      • Load the partition(s) into Spark.
        – (Company, Event bucket)
        – Restrict by “Receive Time” to get a snapshot of the data that led to the alert being triggered.
      • Perform the query in-memory and return the results.
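The snapshot step above can be illustrated without a cluster. The sketch below uses plain Scala collections as a stand-in for the data loaded into Spark: it keeps events from one company's buckets in the alert's hour window, drops anything received after the alert fired, and then answers a query in memory. The `StoredEvent` class, the epoch-minute buckets, and the function names are all assumptions for this example.

```scala
// Hypothetical stored event; bucket is assumed to be in epoch minutes
// and receivedTimeMs in epoch milliseconds.
final case class StoredEvent(company: String, site: String, host: String,
                             bucket: Long, receivedTimeMs: Long)

object AlertSnapshot {
  // Select the (company, bucket) partitions covering one hour window and
  // restrict by received time, so only data that existed when the alert
  // was triggered is included.
  def snapshot(events: Seq[StoredEvent], company: String,
               hourStartBucket: Long, alertTimeMs: Long): Seq[StoredEvent] =
    events.filter { e =>
      e.company == company &&
      e.bucket >= hourStartBucket && e.bucket < hourStartBucket + 60 &&
      e.receivedTimeMs <= alertTimeMs
    }

  // Example in-memory query over the snapshot: connections for one host.
  def connectionsForHost(snap: Seq[StoredEvent], site: String, host: String): Int =
    snap.count(e => e.site == site && e.host == host)
}
```

Filtering on received time is what makes the result reproducible: events that arrive late do not change what the analyst sees for a past alert.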
  11. Overhead
      • Overhead in starting Spark applications
        – Creation of Spark Context
        – Connecting to Cassandra (or other sources)
        – Acquiring Executors
        – Distributing application code to workers
  12. Spark Job Server
      • Open-source project originally created by Ooyala.
      • Provides a persistent connection to Spark, relieving the overhead.
      • HTTP API to pass in arguments to and run a Spark application.
      • Can now run successive Spark SQL queries without the overhead.
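To make the HTTP API bullet concrete, here is a small sketch that builds the URI and request body for submitting a job to Spark Job Server. The `/jobs` endpoint with `appName`, `classPath`, `context`, and `sync` query parameters follows the project's documented REST interface; the helper object, the example names, and the choice to pass the query under a `sql` key are assumptions for illustration. The code only builds the request and does not send it.

```scala
import java.net.{URI, URLEncoder}
import java.nio.charset.StandardCharsets

// Hypothetical helper for building a Spark Job Server job submission.
object JobServerRequest {
  private def enc(s: String): String =
    URLEncoder.encode(s, StandardCharsets.UTF_8.name())

  // POST target: run the job class from an uploaded app in an existing
  // (persistent) context and wait synchronously for the result.
  def submitUri(base: String, appName: String, classPath: String, context: String): URI =
    URI.create(
      s"$base/jobs?appName=${enc(appName)}&classPath=${enc(classPath)}" +
        s"&context=${enc(context)}&sync=true")

  // Request body in Typesafe Config syntax. A job could read it with
  // data.getString("sql"); the key name "sql" is an assumption.
  def sqlBody(sql: String): String = {
    val escaped = sql.replace("\\", "\\\\").replace("\"", "\\\"")
    "sql = \"" + escaped + "\""
  }
}
```

Because the context named in the URI already exists on the server, repeated submissions reuse the same Spark context and skip the startup costs listed on the previous slide.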
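  13. Spark Job Server job skeleton:

          import com.typesafe.config.Config
          import org.apache.spark.SparkContext
          import org.scalactic._
          import spark.jobserver.api.{SparkJob => NewSparkJob, _}

          object MySparkJob extends NewSparkJob {
            type JobData = Config
            type JobOutput = Any

            def validate(sc: SparkContext, runtime: JobEnvironment, data: JobData): JobData Or Every[ValidationProblem] = {
              // Data validation steps go here; passing the config through unchanged is a placeholder.
              Good(data)
            }

            def runJob(sc: SparkContext, runtime: JobEnvironment, data: JobData): JobOutput = {
              // Extract the "sql" argument passed in through the HTTP API.
              val queryString = data.getString("sql")
            }
          }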
  14. Querying The Same Dataset
      What if we want to make different queries on the same dataset?
      • We shouldn’t have to load the data in each time.
  15. Caching in Spark Job Server
      • Can cache RDDs, DataFrames in memory.
      • They are then available to any Spark application running on the Job Server.
        – Methods: Get, Update, Forget.
      • Naming Scheme:
        – “COMPANY_START_END_RECVTIME”

          object MySparkJob extends NewSparkJob with NamedObjectSupport { …
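The naming scheme and the Get/Update/Forget operations can be sketched with a small in-memory stand-in. This is not the Spark Job Server `NamedObjectSupport` API; `DatasetNames`, `NamedCache`, and their method names are hypothetical, and the sketch only shows how a name built from company, window, and receive time lets later queries reuse a dataset that is already loaded.

```scala
import scala.collection.mutable

object DatasetNames {
  // Builds a name following the slide's "COMPANY_START_END_RECVTIME" scheme.
  def cacheName(company: String, start: Long, end: Long, recvTime: Long): String =
    s"${company}_${start}_${end}_${recvTime}"
}

// Stand-in for a named-object cache; NOT the Spark Job Server API.
final class NamedCache[A] {
  private val store = mutable.Map.empty[String, A]

  // Get: return the cached dataset, loading it only on first use.
  def getOrElseCreate(name: String)(load: => A): A =
    store.getOrElseUpdate(name, load)

  // Look up a dataset without loading it.
  def get(name: String): Option[A] = store.get(name)

  // Update: replace the dataset stored under this name.
  def update(name: String, value: A): Unit = store.update(name, value)

  // Forget: drop the dataset so its memory can be reclaimed.
  def forget(name: String): Unit = { store.remove(name); () }
}
```

Including the receive time in the name means two alerts over the same window but with different snapshots never share a cached dataset by accident.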
  16. Caching in Spark Job Server
      • Managing a large number of datasets using Spark Job Server can be tedious.
        – Datasets had to be managed within a Spark application.
  17. Alluxio: In-Memory Distributed Storage
      • An in-memory, distributed file system.
      • Filesystem API supports frameworks such as Spark and MapReduce.
      • Can query Parquet files from Alluxio.
      • Can set a TTL on datasets.
      • Fault-tolerance mode for HA.
      • Can promote datasets to HDFS.
  18. Spark Job Server job reading from Alluxio:

          import spark.jobserver.api.{SparkJob => NewSparkJob, _}
          import alluxio.client.file._
          import alluxio._
          …
          def runJob(sc: SparkContext, runtime: JobEnvironment, data: JobData): JobOutput = {
            val sqlContext = new SQLContext(sc)
            val fileSystemMaster = "alluxio-ft://<IP>:19998"
            val fileSystem = BaseFileSystem.get()
            // Dataset path follows the COMPANY_START_END_RECV naming scheme.
            val uri = new AlluxioURI(fileSystemMaster + "/data/COMPANY_START_END_RECV")
            val query = "SELECT COUNT(*) FROM <uri> WHERE ..."
            // Query only if the dataset exists and has been completely written.
            if (fileSystem.exists(uri) && fileSystem.getStatus(uri).isCompleted()) {
              val result = sqlContext.sql(query)
              // Store the result in Alluxio under the job's ID.
              result.write.parquet(fileSystemMaster + "/results/" + runtime.jobId)
            }
          }
  19. New Architecture (architecture diagram with steps numbered 1 to 9; the image is not included in this transcript)
  20. Conclusions
      • Expanded capabilities of our Network Scanner tool to support flexible queries with response times of ~1-2 seconds.
      • Analysts now have a more powerful tool for exploratory analysis of network activity.
  21. Thank You. Questions?
      William Callaghan
      william.callaghan@esentire.com
