FireEye & Scylla:
Intel Threat Analysis
using Graph Database
Rahul Gaikwad, Staff DevOps Engineer
&
Krishna Palati, Senior DevOps Manager
Presenters
Rahul Gaikwad, Staff DevOps Engineer
❖ Role
➢ Database Administrator - SQL / NoSQL / Graph DB / Big Data
➢ Infrastructure & Cloud Operations
➢ DevOps Automation Engineer
❖ Education
➢ Master of Computer Applications (MCA) & Executive MBA
➢ Pursuing PhD Research in AIOps
❖ Certifications
➢ Scylla | OCP | CCAH | HDPCA | RHCSA | AWS - SA | AWS - SysOps | Confluent Kafka
Krishna Palati, Senior DevOps Manager
❖ Role
➢ Senior DevOps Manager
➢ Cloud Infrastructure, DevOps Automation and Database Systems
❖ Education
➢ Bachelor's and Master's Degrees in Computer Science and Engineering
❖ Hobbies
➢ Running, Biking, Hiking and Playing tennis
Agenda
■ Background
■ Why ScyllaDB
■ ScyllaDB at FireEye
■ Conclusion
■ Q&A
Background
Introduction to FireEye
Solutions
■ Threat Intelligence
■ Helix Security Platform
■ Endpoint Security
■ Network Security and Forensics
■ Email Security
■ Managed Defense
Services
■ Breach Response
■ Security Assessment
■ Security Enhancement
■ Security Transformation
★ FireEye is an intelligence-led cybersecurity company
★ We offer solutions that blend security technologies, threat intelligence and consulting.
Forrester New Wave
Leading Threat Intel Services
FireEye Threat Intelligence
A portfolio of subscriptions and services designed to address all aspects of an
organization’s intelligence needs.
■ Intelligence Subscriptions
■ Intelligence Enablement
■ Intelligence Capability Development
■ Digital Threat Monitoring
■ Advanced Intelligence Access
Application Use Case
■ Homegrown custom graph database on Postgres
■ Centralizes, organizes and processes cyber threat intelligence data
■ Tracks threat groups by recording all of the analytic correlations
■ Provides analytic results by processing and analysing historical data
■ Data Objects - DNS data, RSS feeds, file md5s, FQDNs and URLs
■ Data Size: Nodes ~500M and Edges ~1.5B
Existing System as Graph DB
Structure of the Graph
■ Stores data as “nodes” or “edges”
■ Also allows storing tags
Nodes
■ Each node represents a single object, event or evidence
■ E.g. organizations, actors, hosts, files and FQDNs are represented as nodes in the graph
Edges
■ Edges represent the relationships between nodes.
■ E.g. an edge exists from a threat actor to their location
Existing System - Graph Example
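The node, edge and tag structure described above can be sketched in a few lines of Python. This is only an illustration, not the schema of the FireEye system: the class names, edge labels and tag strings are assumptions, and the four nodes follow the email example described in the speaker notes for this slide.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A single object, event or piece of evidence."""
    node_id: int
    kind: str                                # e.g. "email", "filemd5", "ipv4addr"
    tags: set = field(default_factory=set)   # analyst-defined tags


@dataclass
class Edge:
    """A directed relationship between two nodes."""
    source: int
    target: int
    label: str                               # hypothetical relationship name
    tags: set = field(default_factory=set)


# Illustrative nodes: sender, attachment hash, receiver, IP address.
nodes = {
    1: Node(1, "email", {"sender", "APT3"}),
    2: Node(2, "filemd5", {"phishing-campaign"}),
    3: Node(3, "email", {"receiver", "victim"}),
    4: Node(4, "ipv4addr"),
}

# Illustrative edges; the labels are assumptions, not names from the talk.
edges = [
    Edge(1, 2, "sent"),
    Edge(2, 3, "delivered_to"),
    Edge(2, 4, "associated_with"),
]


def neighbours(node_id):
    """Return the ids of nodes reachable over one outgoing edge."""
    return [edge.target for edge in edges if edge.source == node_id]


print(neighbours(2))  # [3, 4]
```

Keeping tags on both nodes and edges mirrors the slide's point that analysts can annotate objects and relationships alike.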
Challenges of Existing System
Limitations:
■ Slow performance
■ Not easily scalable
■ Not stable
■ Not highly available
■ Not distributed
Objectives:
■ Replace the current system with a new scalable, highly available,
distributed system.
Tech Evaluation for Graph DB
Evaluation Targets - Multiple Graph DBs
■ OrientDB
■ Synapse
■ AWS Neptune
■ JanusGraph
Evaluation Criteria - Based on MoSCoW Model
■ Functional
■ Non-Functional
■ Supportability
Why JanusGraph?
Opinionated Selection Criteria for JanusGraph:
■ Indexing capabilities that can be controlled by the user.
■ Free / Full Text search
■ Embedded as well as Server mode setup capability
■ Schema Management
■ Triggers
■ OLAP Capabilities - Distributed Graph Processing
Result:
■ Based on our requirements, tech evaluation and test results, we selected JanusGraph.
JanusGraph is...
■ Distributed
■ Open source
■ Massively scalable
■ Graph Database
also...
■ Supports pluggable Backend Storage
● ScyllaDB
● Cassandra
● HBase
● Berkeley DB
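To make the pluggable backend concrete, here is a small Python sketch that renders a JanusGraph properties file pointing at ScyllaDB. ScyllaDB speaks CQL, so JanusGraph can reach it through its CQL storage adapter. The property keys are standard JanusGraph options, but the host names and keyspace are hypothetical placeholders, and the Elasticsearch index backend is assumed from the architecture described in the speaker notes. The talk does not show the actual configuration.

```python
# Hypothetical hosts and keyspace; none of these values come from the talk.
settings = {
    "storage.backend": "cql",  # CQL adapter, usable with ScyllaDB or Cassandra
    "storage.hostname": "scylla-1.example.internal,scylla-2.example.internal",
    "storage.cql.keyspace": "threat_graph",
    "index.search.backend": "elasticsearch",
    "index.search.hostname": "es-1.example.internal",
}


def render_properties(settings):
    """Render a dict as key=value lines in Java properties style."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


print(render_properties(settings))
```

Swapping the storage layer is then a matter of changing the storage.* entries, which is what makes the backend pluggable.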
Motivation for ScyllaDB
Why ScyllaDB?
Based on tech evaluations and tests, we determined that ScyllaDB is the right
backend storage.
Features:
■ Easy Cluster setup
■ Self Tuning
■ Equal Load distribution
■ Easy to Manage On Cloud
■ Less Administration
■ No GC
■ Compression
ScyllaDB Usage
ScyllaDB Usage for Threat Analysis
■ Since data represents threat activity, we can get answers to questions about:
● Threat actors
● Malware
● Threat activity
● Victims
● Various other things.
■ Graph DB tells a story about data by connecting dots
Graph Traversing Examples
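The slide's query itself is not reproduced in this transcript. As a stand-in, the following plain-Python sketch shows the same idea the speaker notes describe: select a node by a property value, then walk the edges to collect every node connected to it. It is not Gremlin and not FireEye's query. The toy graph, property names and values are all hypothetical.

```python
from collections import deque

# Hypothetical toy graph: node id -> properties, plus undirected edges.
properties = {
    1: {"type": "email", "value": "sender@example.com"},
    2: {"type": "filemd5", "value": "d41d8cd98f00b204e9800998ecf8427e"},
    3: {"type": "email", "value": "receiver@example.com"},
    4: {"type": "ipv4addr", "value": "203.0.113.7"},
    5: {"type": "fqdn", "value": "unrelated.example.org"},
}
edges = [(1, 2), (2, 3), (2, 4)]


def build_adjacency(edges):
    """Index edges so each node maps to the set of its neighbours."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def connected_nodes(properties, edges, key, value):
    """Select nodes by property, then breadth-first collect all reachable nodes."""
    adjacency = build_adjacency(edges)
    start = [n for n, props in properties.items() if props.get(key) == value]
    seen = set(start)
    queue = deque(start)
    while queue:
        current = queue.popleft()
        for neighbour in adjacency.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return sorted(seen - set(start))


print(connected_nodes(properties, edges, "value", "sender@example.com"))  # [2, 3, 4]
```

Node 5 has no edges, so it never appears in the result. In a graph database the same selection and expansion run inside the storage and index layers rather than in application memory.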
Architecture
Graph DB with ScyllaDB
Environment
Configurations
■ Running on AWS Cloud
■ Single Region (Multi AZ) deployment
■ Using EC2 instances
■ AWS Instance - i3.8xlarge
■ Each Cluster has 7 nodes
■ Clusters - DEV, QA, STAGING, PROD.
H/W         Per Node   Per Cluster
CPU         32         224
RAM (GB)    244        1708
Disk (TB)   16         112
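The per-cluster figures are simply the per-node figures multiplied by the seven nodes in each cluster. A short check, using only the numbers from the slide:

```python
NODES_PER_CLUSTER = 7  # "Each Cluster has 7 nodes"

# Per-node resources as listed on the slide.
per_node = {"CPU": 32, "RAM (GB)": 244, "Disk (TB)": 16}

# Cluster totals are the per-node values scaled by the node count.
per_cluster = {name: value * NODES_PER_CLUSTER for name, value in per_node.items()}

for name, total in per_cluster.items():
    print(f"{name}: {per_node[name]} per node, {total} per cluster")
```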
Deployment
ScyllaDB - Infrastructure Management
Terraform is a tool for building, changing, and versioning infrastructure
safely and efficiently.
ScyllaDB - Configuration Management
Puppet is a configuration management tool that is used for deploying,
configuring and managing servers.
Comparison
Conclusion
FireEye Traversing with ScyllaDB
■ Very good experience and results observed so far
■ Cost Effective
■ Admin Friendly
■ Superfast
■ Looking at potential opportunities to use ScyllaDB in other projects
Thank You All ..!!
■ FireEye
● Architects
● Engineers: Developers, DevOps & QA
● Project and Program Managers
■ JanusGraph
■ ScyllaDB
● Scylla University
● Community
● Summit Organisers
Thank you
Stay in touch
Any questions?
Rahul Gaikwad
rahul.gaikwad@fireeye.com
Krishna Palati
krishna.palati@FireEye.com
linkedin.com/in/rahul-gaikwad-2712b02a
linkedin.com/in/krishnapalati


Editor's Notes

  • #2 KP: Hello everyone, hope you are enjoying the CA weather. As you heard in the introduction video, today we will talk about how we at FireEye used ScyllaDB to redesign an existing product and build a new solution for our Intel product portfolio.
  • #3 KP: I am Krishna Palati. I manage the DevOps team for Solutions Engineering, comprising Intel, Managed Defense and Incident Response for FireEye. We are responsible for core DevOps, cloud infrastructure operations and database systems. In this presentation we will talk about how we used Scylla to implement a solution that is critical for our Intelligence product portfolio. RG: Hello, I am Rahul Gaikwad. I am a Staff DevOps Engineer at FireEye. I am responsible for continuous integration and deployment, administration of several databases, and cloud operations. I came from India to talk at Scylla Summit about how we do Intel threat analysis using a graph database. We will talk about the challenges with the existing system and how ScyllaDB helps us solve some of them.
  • #4 KP
  • #5 KP
  • #6 KP: FireEye is a unique cybersecurity company in the sense that we bring our security appliances and intelligence capabilities together for our customers. Appliances can be physical or virtual and include a range of products such as Endpoint (HX), Network (NX) and Email (ETP). Solutions include Intel, Managed Defense and Incident Response.
  • #7 KP: As per the Forrester report, FireEye is the leader in cyber threat intelligence offerings, both for our current content and our strategy. We are focusing specifically on Intel because, for the rest of this presentation, we will discuss the problems we encountered with the current technology and the solutions we implemented to address them.
  • #8 KP: As is evident here, we are an industry-recognized thought leader in cyber intelligence and are often called upon to provide our analysis and thoughts on this topic.
  • #9 KP: Subscriptions: access to published intelligence reports. Enablement: includes onboarding and provisioning, API integration with your security systems, analyst access and workshops. Digital Threat Monitoring: tailored, proactive monitoring and analysis of threats to your brand and your VIPs. Advanced Intelligence Access: direct queries into global visibility, insights and intelligence from FireEye. Source: https://www.fireeye.com/content/dam/fireeye-www/products/pdfs/pf/intel/ds-fireeye-threat-intelligence.pdf
  • #10 KP: Now that we have gone through the business aspects of why and how we do threat intel, let's briefly talk about our current application and what it does at a very high level.
  • #11 RG: Our customized graph system stores data as “nodes” or “edges”. It also allows analysts to define and apply tags to nodes and edges; we can think of these as attributes or characteristics. Each node represents a single object, event or piece of evidence. For example, organizations, actors, hackers, host computers, files and FQDNs are all represented as nodes in the graph database. Edges represent the relationships between nodes. For example, an edge exists from a threat actor to their location.
  • #12 RG: In the diagram, blue circles indicate nodes, green arrows are edges, red labels are properties, and orange labels are aspects. Node 1 (email) is the sender's mail ID. Node 2 (filemd5) is the email message content or file attachment. Node 3 (email) is the receiver's mail ID. Node 4 (ipv4addr) is the IP address of the filemd5 node. The sender email ID (node 1) sent the filemd5 email to the receiver email ID. Each node has properties in our intel system. For example: the sender email ID is associated with the APT3 actor, a known hacking group; the filemd5 has been associated with an email phishing campaign; the receiver email ID is tagged as a victim; and the filemd5 is associated with the IP address from which such phishing campaigns have been executed in the past.
  • #13 RG: Over time, our intel system became very effective and popular. Its usage has grown from a handful of analysts to several hundred analysts spread across the globe. We became a victim of our own success as we started running into performance limitations.
  • #14 RG: Based on our objectives, we started evaluating graph database technologies such as OrientDB, Synapse, AWS Neptune and JanusGraph. Our evaluation criteria were: Functional: traversal speed, full-text search, concurrent users. Non-functional: pluggable storage backend, high availability and disaster recovery. Supportability: strong and active user community, already deployed in production, documentation.
  • #15 RG: Indexing capabilities: we can define the indexes per use case. Free / full-text search: the system lets users search for records that include one or more words within a free-text field. Embedded: we can embed JanusGraph in the application code layer. Schema management: lets us define and change the schema, and also validates incoming data (schema validation). Triggers: the system generates events when certain specific actions are performed on the underlying database store. OLAP (Online Analytical Processing): uses distributed graph processing.
  • #16 RG :
  • #17 RG
  • #18 RG: When we set up or scale the cluster, we just need to run the scylla_setup script, which sets up the configuration automatically. During data migration from the existing system to the new one, we got an 80% compression rate.
  • #19 RG
  • #20 RG
  • #21 RG: Here is an example of how those questions are asked. We are showing a Gremlin query that selects a node with a specific property, then traverses the graph and finds all the other nodes it is connected to via edges. As shown in the red highlighted box, the query traversed 15,000 nodes and returned results in 322 ms, about 10 times faster than our current system.
  • #22 KP-
  • #23 KP: This is a high-level overview of what we built in the cloud. It is an N-tier architecture. Its components are the App UI, the App API, JanusGraph, and ScyllaDB (primary storage) with Elasticsearch (search). The system is designed with redundancy for each of these components for scalability and HA. They are built across multiple Availability Zones, so we are protected against AZ failures. Everything is in a private VPC with restricted access. Access comes in via Nginx. Authentication and authorization are handled via an Nginx/OpenResty combination against our internal IDAM server. All the business logic is abstracted in the Application Tier.
  • #24 RG
  • #25 RG: As Krishna mentioned, we have set up all system components in the AWS cloud. We went through several iterations to arrive at the optimal cluster size and resources to meet our goals, such as functionality and data migration from the current system.
  • #26 RG
  • #27 RG
  • #28 RG: Using these automation tools, we can build the whole stack shown in the architecture diagram within minutes to an hour.
  • #29 RG: We ran a set of queries on the existing and new systems, and found that the new system based on Scylla is 10 times faster than the existing one.
  • #30 KP
  • #31 KP: Our experience with ScyllaDB has been very good. It's cost effective and performant. We are looking at opportunities to use ScyllaDB in other projects within FireEye.
  • #32 KP: Finally, a big thanks to our internal FireEye team of Architects, Developers, QA and DevOps. Architects and Devs worked closely with DevOps to iterate and improve this solution. Our teams are spread across Reston, VA, Amsterdam and Pune, India, and we work very closely to deliver world-class solutions. I would also like to extend our gratitude to JanusGraph and ScyllaDB for the excellent Scylla University resources, the community and the organizers of this Summit.
  • #33 KP