9/1/2015
Big Data Cyber Analytics
IKANOW
11921 Freedom Drive Suite 550
Reston, VA 20190
www.ikanow.com
Document Release: 1.0
Document Number: PN200
Sholeh Gregory
IKANOW
Information Security Analytics (ISA) Threat Intelligence Platform
System Architecture Guide v1.0
Continuous Cyber Security Optimization
IKANOW Information Security Analytics (ISA) Threat Intelligence Platform
2015
TABLE OF CONTENTS
Preface
  Who Should Use This Document
  Conventions Used in This Document
  Other IKANOW Documentation
  Contact Information
1 Cyber Security Threats Landscape
  Current Security Paradigm
  Challenges Around Information Security Analytics
2 Solution: IKANOW Next Generation Information Security Analytics Overview
  Open, Flexible, Scalable Threat Intelligence Platform
  Constant Calibration of ISA Security Posture
  How IKANOW ISA Works
  Data Ingestion, Curation, Enrichment
3 Information Security Analytics Core Features
  Three-Step Data Sources Ingestion
  Comprehensive and Collaborative Visualizations and Reports
  Third-Party Tools and Applications Integration
  Robust Sorting and Searching
4 IKANOW ISA Solution Architecture Overview
  Traditional vs. Next-Generation ISA Application Architecture Key Points
  Next-Generation ISA Architecture Requirements
  Front-End ISA Application Components
  Back-End ISA Architecture Core Components
  Middleware Data Analytics Services
5 Data Source Management
  Data Sources
  Data Source Documents
    Document Entities
    Document Associations
  Matching Document Types
    Top Documents
    Filtered Documents
    Aggregations Documents
6 Basic Data Elements
  Data Objects
  Data Services
    Object Schema
  Data Import
    Harvesting Data Import
    Enrichment Data Import
  Data Buckets
  Harvesting Configuration
  Harvesting Data Enrichment
    Data Enrichment Lists
  Data Analytics
  Data Security
  Plugin Libraries
7 Data Source Processing Pipeline
  Data Source Processing Types
    Input Sources Processing
    Custom Processing Sources
  Data Input Sources
IMPORTANT NOTICE
The information contained in the document is believed to be reliable, but IKANOW makes no
warranties as to its accuracy or completeness. IKANOW does not warrant or represent that any
license, either express or implied, is granted under any IKANOW patent right, copyright, or other
IKANOW intellectual property right relating to any combination or process in which IKANOW
products or services are used. Information published by IKANOW regarding third-party products
or services does not constitute a license from IKANOW to use such products or services or a
warranty or endorsement thereof. Use of such information may require a license from a third
party under the patents or other intellectual property of the third party, or a license from IKANOW
under the patents or other intellectual property of IKANOW.
IKANOW Threat Analytics Platform™ is a trademark of IKANOW, Inc.
Copyright © 2015, IKANOW Inc. All rights reserved.
PREFACE
The Information Security Analytics (ISA) Threat Intelligence Platform System Architecture Guide describes the product's core features, architecture, and design requirements, the components of the product, and the role each component plays in the IKANOW Threat Intelligence Platform solution.
WHO SHOULD USE THIS DOCUMENT
The intended audience for this guide includes:
• Cyber Security Analyst
• Special Operations Engineer
• Tier 1, 2, 3 Cyber Analyst
• Social Media Analyst
• IT Security Engineer
• Chief Information Security Officer (CISO)
• Sales Engineers
• System Architects and Designers
CONVENTIONS USED IN THIS DOCUMENT
Table 1 describes the typographic conventions used in this guide.
Table 1. Typographic Conventions
Convention | Meaning | Example
courier font | Names of commands, files, and on-screen computer output. | Edit your .login file. / Use ls -a to list all files. / machine_name% test.doc.
italics | Document titles, new terms, and words to be emphasized; variables that you replace with a real name or value. | Read Chapter 6 in User's Guide. / These are called class options. / You must be root to do this. / Type rm filename to delete a file.
boldface Consolas font | What you type. | machine_name% su / Password:
OTHER IKANOW DOCUMENTATION
• IKANOW Community Edition Documentation
• IKANOW Enterprise Edition Documentation
CONTACT INFORMATION
Your feedback is always welcome. Please feel free to submit questions, comments, and feedback
to info@IKANOW.com.
1 CYBER SECURITY THREATS LANDSCAPE
"Today's cyber security threats are dynamic and asymmetric," says Chris Morgan, President of IKANOW. "Organizations need to change their approach to tackle these new threats effectively."
Companies, governments, and non-governmental organizations (NGOs) alike need to understand and defend themselves against advanced persistent threats. Cyber attacks are in the headlines nearly every day, suggesting that virtually every enterprise has been breached. In July 2015 alone, major data breaches at Trump Hotel Collection, Ashley Madison, UCLA Health System, Service Systems Associates, and St. Francis Health all made headlines.
Figure 1. IKANOW Major Breach Index, July 2015
The impact of breaches is disastrous. According to the Mandiant M-Trends Report, it takes the median organization 229 days, or more than seven months, just to detect a data breach. According to the Ponemon Institute's 2014 Cost of Data Breach Study, an organization has a 22 percent chance of suffering a breach that compromises 10,000 or more records.
Furthermore, according to the same study, the average cost of a data breach for Fortune 1000 companies has risen 15 percent over the last year to $3.5 million (€3.15 million). For organizations that store personal health information, partner breaches that compromise client information could also result in regulatory fines.
Response organizations and Fortune 1000 companies alike face a similar big data problem in this era of growing vulnerability, in which attacks are increasingly dynamic and asymmetric. Attacks arrive constantly, wave after wave, while organizations become more vulnerable through Bring Your Own Device (BYOD) programs, increased use of cloud storage, and ubiquitous Internet use at work.
Information security professionals need to optimize their resources to meet the rising cyber security challenges they face today. Many organizations are turning to big data, as well as to elaborate networks of disparate security systems, to thwart these types of attacks. Current network security solutions collect huge amounts of data. In fact, standard security information and event management (SIEM) products collect so much data that companies struggle to operate them. According to the 2013 SIEM Survey from EiQ Networks, 52 percent of all companies require two or more full-time analysts to manage their unwieldy SIEM deployments. This does not account for the additional monetary and personnel resources needed to analyze the extensive amount of data
many organizations collect from external threat intelligence feeds, such as FireEye's DTI, Symantec DeepSight, and iSight Partners.
The problem, according to Mark Nicolett, a managing VP at Gartner, is not that organizations lack security data. "We are not suffering from a lack of data," Nicolett told Dark Reading. "We are suffering from a lack of intelligence in analyzing it." In other words, collecting more data will not help if you cannot find the story within it.
By pulling these disparate data sets out of their silos and then analyzing and visualizing them, organizations can act on the small data that counts and ultimately improve their security posture.
CURRENT SECURITY PARADIGM
The current security paradigm is based on a big data approach. An immense amount of data is
collected and stored from various sources such as log data, external threat intelligence feeds, and
open source intelligence (OSINT) data.
Since this data lives in separate places, even the best cyber analysts have no efficient way to bring the information together, find correlations and relevance, or act on it.
An advanced data analytics platform can discover the relevant small data and so conquer the big data problem. An effective threat analytics platform can ingest external threat intelligence and enterprise security data to map known and previously mitigated attacks, and correlate them with current security data to detect attacks already under way.
CHALLENGES AROUND INFORMATION SECURITY ANALYTICS
Bringing all of that big data together into one central place in a logical way, while automating critical security tasks, allows organizations to increase productivity and streamline the resources required to understand current threats, bolster defenses, and detect threats.
Analyzing and visually representing the full spectrum of internal, external, structured,
semi-structured, and unstructured data together allows organizations to find the small
data that is meaningful and actionable. Only then can IT professionals effectively deploy limited
resources and establish effective protocols for thwarting and addressing breaches.
When you understand the current threats and vulnerabilities your organization faces, you can deploy defensive resources effectively to protect your most valuable and most vulnerable assets. In some cases, attackers target internal systems that can be disabled by distributed denial of service (DDoS) attacks. In many cases, cyber attacks target proprietary or customer information. Some intrusions leave a backdoor that becomes a foothold for future attacks.
Big data alone is not enough to defend against the ever-changing and ever-increasing specters of
cyber threats. By rethinking the way security and threat intelligence data is collected, analyzed and
reported, security stakeholders can visualize the full threat landscape. This will enable them to find
the small data that really matters, allowing them to respond to threats and develop anticipatory
security strategies.
2 SOLUTION: IKANOW NEXT GENERATION INFORMATION SECURITY ANALYTICS OVERVIEW
Anticipating and responding to cyber threats requires a specialized set of tools, infrastructure, and support. IKANOW's revolutionary solution offers an open, high-throughput big data technology infrastructure. Within seconds, you can locate time-sensitive information across terabytes of data, raising the efficiency of the team members and analysts who monitor cyber risks around the clock. Yet tools and technologies are useless without support. From ingestion to customized visualizations to the development of strategic operational scorecards, you can leverage IKANOW's data science competencies to iteratively customize a cyber-risk-reduction program for your business, without ongoing cost.
IKANOW enables you to apply adaptable analytical techniques and measurement tools that automate data ingestion and analysis. These features provide visibility that can save weeks in detecting and defending against cyber security threats. The vast amount of security information now available requires a new approach to cyber security, one that leverages innovative big data management along with correlation and visualization technologies that enable information security professionals to protect their networks effectively.
OPEN, FLEXIBLE, SCALABLE THREAT INTELLIGENCE PLATFORM
As shown in Figure 2, the IKANOW ISA platform:
• Provides business intelligence to the CISO to drive change in an organization.
• Reduces the resources required to perform critical security tasks.
• Provides an additional layer of defense against advanced persistent threats (APTs).
Figure 2. IKANOW ISA Security-in-Layers Platform
CONSTANT CALIBRATION OF ISA SECURITY POSTURE
The IKANOW ISA platform accelerates decision throughput by recalibrating your security posture as part of your overall security plan. Security posture is the approach your business takes to security, from planning to implementation. It consists of the technical and non-technical policies, procedures, and controls that protect you from both internal and external threats. IKANOW ISA is:
• A platform that integrates threat intelligence with enterprise data, then ingests, enriches, analyzes, and visualizes the results to determine the risk level and security posture.
• A framework for assessing and improving the security posture of industrial control systems (ICS).
The platform pairs the threat feeds that are right for your organization with an analytics platform that enhances those feeds and can dramatically improve your organization's security posture.
Figure 3. ISA Security Posture Calibration
The IKANOW platform allows Fortune 1000 organizations and government agencies to quickly
ingest various data sources and types, enrich the data, and then search and visualize the data in an
easy-to-use interface.
HOW IKANOW ISA WORKS
The IKANOW ISA platform helps adjust the levers of your enterprise to cohesively align strategic and tactical functions. ISA first simplifies data ingestion by giving your analyst team tools for ingestion and curation without the need for custom development. The ingested external and internal information is then fused, and filters are applied to remove unnecessary data points. Ready-made visualizations are then applied to identify patterns and anomalies, and the results are shared with other teams so that they are also informed about any potential or actual security breaches. IKANOW's data science team then devises a plan for advanced analytics techniques, identifying the best methods for correlation and comparison. With customized visualizations and templates, you can baseline repeatable metrics and build cascading scorecards (dashboards) across functions that mechanize responses and help predict cyber risks. The ISA platform equips security teams by uniting insights and creating discipline in a way that accelerates decision throughput.
Figure 4. IKANOW Threat Analytics High-Level View
DATA INGESTION, CURATION, ENRICHMENT
IKANOW correlates data from multiple data feeds and social networks with corporate security information and event management (SIEM) data. The output can be as specific as identifying IP addresses that have been affected by malware. Results of the analytics are presented in reports and a dashboard that allow threats to be easily communicated, discussed, prioritized, and resolved. This is shown in Figure 5.
Figure 5. Data Integration and Enrichment
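To make the IP-level correlation described above concrete, here is a minimal Python sketch, not IKANOW's implementation, that matches IP indicators from an external threat feed against internal SIEM events and reports which internal hosts contacted a known-bad address. The event layout (`src_ip` and `dst_ip` keys) and the function name `correlate_iocs` are hypothetical; the addresses are taken from the IPv4 documentation ranges.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def correlate_iocs(
    feed_ips: Iterable[str],
    siem_events: Iterable[Mapping[str, str]],
) -> dict[str, set[str]]:
    """Map each internal source IP to the known-bad destinations it contacted.

    Hypothetical event layout: each SIEM event is a dict with 'src_ip' and 'dst_ip'.
    """
    bad = set(feed_ips)  # set membership keeps each lookup constant-time
    hits: dict[str, set[str]] = defaultdict(set)
    for event in siem_events:
        dst = event.get("dst_ip")
        if dst in bad:
            hits[event["src_ip"]].add(dst)
    return dict(hits)


feed = ["203.0.113.7", "198.51.100.23"]  # indicators from an external feed
events = [
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.7"},
    {"src_ip": "10.0.0.9", "dst_ip": "192.0.2.10"},
    {"src_ip": "10.0.0.5", "dst_ip": "198.51.100.23"},
]
print(correlate_iocs(feed, events))
```

Loading the feed into a set means a single pass over the events is enough, which matters when SIEM logs are large.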
ISA helps organizations constantly maintain an optimal security posture by aligning the strategic, tactical, and operational aspects of the business. It does this with a set of core features that make it easy to ingest, curate, and enrich data. Data sources are continuously ingested into the IKANOW Threat Analytics Platform and continually visualized and reported on using cascading scorecards, enabling each enterprise stakeholder to obtain timely results and drive needed changes in security posture.
3 INFORMATION SECURITY ANALYTICS CORE FEATURES
IKANOW ISA enables you to actively recalibrate your security posture by applying adaptable analytical techniques and measurement tools that automate analysis and decision-making processes. The key features of the IKANOW ISA platform are:
• Three-Step Data Sources Ingestion
• Comprehensive and Collaborative Visualizations and Reports
• Third-Party Tools and Applications Integration
• Robust Sorting and Searching
These capabilities provide users with the information required to effectively use the platform from
source creation and management through data visualization and reporting.
THREE-STEP DATA SOURCES INGESTION
ISA lets you control data sources throughout setup, testing, operation, and publishing, including the ability to add and suspend data sources. ISA supports Logstash, RSS, CSV, S3, and various APIs. In addition, the advanced source builder option allows you to add and edit JSON directly.
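The setup, testing, operation, and publishing stages, together with suspending a source, can be pictured as a small state machine. The Python below is a hedged sketch under assumed rules: the state names, the allowed transitions, the `DataSource` class, and the JSON fields `title` and `type` are all hypothetical and are not the actual ISA source format; only the list of source types comes from the text above.

```python
import json
from enum import Enum


class SourceState(Enum):
    SETUP = "setup"
    TESTING = "testing"
    OPERATING = "operating"
    PUBLISHED = "published"
    SUSPENDED = "suspended"


# Hypothetical transition rules: a source must pass testing before it runs,
# and a running or published source can be suspended and later resumed.
ALLOWED = {
    SourceState.SETUP: {SourceState.TESTING},
    SourceState.TESTING: {SourceState.SETUP, SourceState.OPERATING},
    SourceState.OPERATING: {SourceState.PUBLISHED, SourceState.SUSPENDED},
    SourceState.PUBLISHED: {SourceState.SUSPENDED},
    SourceState.SUSPENDED: {SourceState.OPERATING},
}

SUPPORTED_TYPES = {"logstash", "rss", "csv", "s3", "api"}  # types named in the text


class DataSource:
    def __init__(self, raw_json: str) -> None:
        spec = json.loads(raw_json)  # e.g. JSON edited by hand in a source builder
        if spec.get("type") not in SUPPORTED_TYPES:
            raise ValueError(f"unsupported source type: {spec.get('type')!r}")
        self.title = spec["title"]
        self.type = spec["type"]
        self.state = SourceState.SETUP  # every new source starts in setup

    def move_to(self, new_state: SourceState) -> None:
        """Advance the source, rejecting transitions the rules do not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state


src = DataSource('{"title": "Example RSS feed", "type": "rss"}')
for step in (SourceState.TESTING, SourceState.OPERATING,
             SourceState.PUBLISHED, SourceState.SUSPENDED):
    src.move_to(step)
print(src.title, src.state.value)  # Example RSS feed suspended
```

Encoding the lifecycle as an explicit transition table makes it impossible, in this sketch, to publish a source that was never tested.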
The first key feature is the source ingestion process, which easily adds new source data to the IKANOW platform, whether structured, unstructured, or semi-structured (think everything from SIEM data to OSINT data and social media). All of this is done in a new, clean, and light interface, as shown in Figure 6.
Figure 6. Three-Step Data Ingestion
This eliminates the need to wait for IT to add critical data sources to the platform. You can quickly analyze and search across a comprehensive set of threat data, allowing your organization to detect potential threats and prioritize its defenses in a timely manner.
COMPREHENSIVE AND COLLABORATIVE VISUALIZATIONS AND REPORTS
ISA includes a series of reporting tools that enable you to compare threats and vulnerabilities by assigning risk levels and tracking cost information, all to help you determine your optimal security strategy.
ISA also offers a threat feed tool to help your team determine the ongoing value of threat feeds over time. Additional visualizations help identify patterns across data and pinpoint the indicators of compromise most relevant to your team. This means you can create comprehensive visualizations across all of your InfoSec analytics data and share them with team members throughout the analytical process and across levels of your organization. Enterprises can then create the structures needed for self-learning and develop an accurate picture of results.
Figure 7. Visualizations and Reports
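One plausible way, offered here purely as an assumption, to quantify the ongoing value of a threat feed is its monthly hit rate: the fraction of a feed's distinct indicators that were actually observed in internal telemetry. The sketch below computes that metric in Python; the function name `feed_hit_rates`, the tuple layout, and the metric itself are illustrative and do not describe how ISA's threat feed tool works.

```python
from collections import defaultdict


def feed_hit_rates(indicators, observed):
    """Compute each feed's monthly hit rate.

    indicators: iterable of (feed, month, ioc) tuples (hypothetical layout).
    observed:   set of indicators seen in internal logs.
    Returns {feed: {month: fraction of distinct IOCs that were observed}}.
    """
    published = defaultdict(set)
    for feed, month, ioc in indicators:
        published[(feed, month)].add(ioc)  # de-duplicate repeated indicators

    rates = defaultdict(dict)
    for (feed, month), iocs in sorted(published.items()):
        rates[feed][month] = len(iocs & observed) / len(iocs)
    return dict(rates)


indicators = [
    ("feed_a", "2015-06", "203.0.113.7"),
    ("feed_a", "2015-06", "198.51.100.23"),
    ("feed_a", "2015-07", "192.0.2.44"),
    ("feed_b", "2015-07", "203.0.113.7"),
]
observed = {"203.0.113.7"}
print(feed_hit_rates(indicators, observed))
# {'feed_a': {'2015-06': 0.5, '2015-07': 0.0}, 'feed_b': {'2015-07': 1.0}}
```

Under this assumed metric, a feed whose hit rate stays near zero month after month might be a candidate for review, while a rising rate would suggest growing relevance to your environment.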
THIRD-PARTY TOOLS AND APPLICATIONS INTEGRATION
ISA supports the use of multiple third-party tools and applications. ISA facilitates data cleansing with many of these tools and applications so that data can be shared easily. ISA is also directly integrated with a growing number of third-party applications, including Kibana, which can be accessed from within ISA for direct comparison of log information.
ISA enables the use of Logstash to integrate Kibana and other third-party data analysis tools. This
allows users to read and process data through Logstash and analyze it through Kibana, or another
tool, at scale. This includes structured and unstructured threat intelligence data in a format
customized to match your SIEM log data or any other format.
Figure 8. Tools and Applications Integration
Adaptable analytical techniques and measurements that automate the analysis process, together with IKANOW's visualization and collaboration functionality, help you continuously optimize your security posture, stay ahead of threats, and reduce enterprise risk.
ROBUST SORTING AND SEARCHING
ISA provides data filtering and organization tools that enable you to quickly identify relevant data. Search options range from verb categories to a selection of entity options, and include the ability to tag and save past searches. Once a search query is executed, further filtering narrows the results using predefined options such as recent, oldest, and relevance.
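The entity filter and the recent, oldest, and relevance orderings described above can be sketched in a few lines of Python. This is an illustration under assumptions: the `SearchHit` fields (including a precomputed relevance `score`), the `SORT_OPTIONS` table, and the `search` function are hypothetical and do not reflect the ISA query API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SearchHit:
    title: str
    published: datetime
    score: float        # relevance score, assumed to come from the search engine
    entities: set[str]  # entities extracted from the document


# Predefined sort options: (sort key, descending?)
SORT_OPTIONS = {
    "recent": (lambda h: h.published, True),
    "oldest": (lambda h: h.published, False),
    "relevance": (lambda h: h.score, True),
}


def search(hits, entities=frozenset(), order="relevance"):
    """Keep hits that mention every selected entity, then sort by a predefined option."""
    if order not in SORT_OPTIONS:
        raise ValueError(f"unknown sort option: {order!r}")
    key, descending = SORT_OPTIONS[order]
    selected = [h for h in hits if set(entities) <= h.entities]
    return sorted(selected, key=key, reverse=descending)


hits = [
    SearchHit("Phishing campaign", datetime(2015, 7, 2), 0.4, {"phishing"}),
    SearchHit("Malware C2 beacon", datetime(2015, 7, 20), 0.9, {"malware", "10.0.0.5"}),
    SearchHit("Malware dropper", datetime(2015, 6, 30), 0.7, {"malware"}),
]
print([h.title for h in search(hits, {"malware"}, order="oldest")])
# ['Malware dropper', 'Malware C2 beacon']
```

Filtering before sorting keeps the sort cost proportional to the narrowed result set rather than the whole corpus.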
You can search a combined set of data from disparate sources and formats to help uncover
relationships between internal and external data, hastening the ability to see potential threats and
their impact across the network.
With this search capability, organizations no longer need to hire a developer or contact their vendors to perform these tasks; the InfoSec team can do them on its own schedule, quickly and easily.
4 IKANOW ISA SOLUTION ARCHITECTURE OVERVIEW
This chapter describes the next-generation system architecture for solutions, such as ISA, that use the developer-level APIs, and highlights the differences between ISA and the core components.
TRADITIONAL VS. NEXT-GENERATION ISA APPLICATION ARCHITECTURE KEY POINTS
In the traditional ISA design, applications interact with the platform through RESTful APIs, and the platform is extended with various Java plugins. The next-generation ISA instead supports application-specific API plugins, which are added as a standard operation.
NEXT-GENERATION ISA ARCHITECTURE REQUIREMENTS
IKANOW's next-generation ISA architecture had to meet the following requirements:
• Write and deploy external harvesters.
• Write stand-alone streaming enrichment engines.
• Develop records-based threads by using the application-specific API plugins.
• View each datum as an "object" with a set of attributes that defines where it is stored and how it can be processed, instead of categorizing it as a "document," "record," or "custom."
• Set the schema at import time and modify it later.
• Plug in different NoSQL technologies based on their capabilities (mapped to the schema), so that processing accesses the most appropriate storage layer.
• Store the original data in HDFS, thus enabling repeatability.
• Let users assign roles to each node in the cluster through a centralized management user interface powered by Salt.
• Decouple the user interface and applications more than in the original platform.
• Build an open source platform from the start, with a test infrastructure that enables partners and the community to contribute.
• Write in the modern JVM-based language Scala for increased concurrency and reliability.
• Keep the document-based threads.
• Keep the analytics-based threads.
• Provide an Elasticsearch data service with both read and write capabilities.
• Provide access context for Tomcat.
• Provide most of the management DB functionality, including bucket CRUD, library (plugin) CRUD, share replacement CRUD, and access to the data services.
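The requirements to treat each datum as an object whose attributes decide where it is stored, to set a schema at import time and change it later, and to keep the original data in HDFS can be illustrated with the following Python sketch. It is an assumption-laden toy: `DataObjectSchema`, `route`, and the storage labels are hypothetical names, and real ISA storage is configured through the platform rather than through code like this.

```python
from dataclasses import dataclass, field


@dataclass
class DataObjectSchema:
    """Hypothetical schema attached to a data object at import time."""
    fields: dict[str, str]  # field name -> type name
    # The raw record is always kept in HDFS so processing can be repeated.
    storage: set[str] = field(default_factory=lambda: {"hdfs"})

    def add_field(self, name: str, type_name: str) -> None:
        """Extend the schema after import."""
        self.fields[name] = type_name

    def enable_search(self) -> None:
        """Also index the schema fields in a search store."""
        self.storage.add("elasticsearch")


def route(record: dict, schema: DataObjectSchema) -> dict[str, dict]:
    """Decide what each configured store receives for one record."""
    projected = {k: v for k, v in record.items() if k in schema.fields}
    return {
        store: dict(record) if store == "hdfs" else projected
        for store in sorted(schema.storage)
    }


schema = DataObjectSchema(fields={"src_ip": "ip"})  # schema set at import time
schema.add_field("dst_ip", "ip")                    # ...and modified later
schema.enable_search()
routed = route({"src_ip": "10.0.0.5", "dst_ip": "203.0.113.7", "raw": "original log line"}, schema)
print(routed)
```

Here the HDFS copy keeps every field, including ones outside the schema, while the search store receives only the schema fields, which mirrors the split between repeatable raw storage and capability-specific stores.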
The next step is to map the ISA functional requirements onto the ISA application model
architecture as illustrated in Figure 9.
Figure 9. ISA Application Architecture Components
Figure 9 illustrates the front-end, back-end, and middleware components of the ISA application architecture, as described in Table 2.
Table 2. ISA Components
Color in Figure 9 | Meaning
Blue | The components of the ISA application layer, as described in Section Front-End ISA Application Components.
Light Blue | The core components of the ISA architecture, as explained in Section Back-End ISA Architecture Core Components.
Very Light Blue | The middleware services, as described in Section Middleware Data Analytics Services.
FRONT-END ISA APPLICATION COMPONENTS
Table 3. Front-End ISA Application Components
Front-End ISA Application Component | Description
Management and Virtualization User Interfaces | The ISA Manager is a role-based user management mechanism that enables the ingestion and management of data sources. It enables a single sheet of sanitized data to be leveraged for analysis, reporting, and visualization. The Management and Virtualization user interface helps users understand and express their information needs: it helps them formulate queries, select among available information sources, understand search results, and keep track of the progress of their searches. Some of the visualizations that ship with Information Security Analytics require additional data processing jobs to be executed from the platform; an IKANOW resource will be required to execute these MapReduce jobs before your data appears in the visualizations.
ISA Application-Specific API Plugins | The ISA design abstracts extraction, transformation, and loading (ETL), enrichment, and analytics into plugins. ETL tools are pieces of software responsible for extracting data from several sources, and for cleansing, customizing, reformatting, integrating, and inserting it into a data warehouse. Building the ETL process is potentially one of the biggest tasks in building a data warehouse.
Buckets and Sources | These are ISA-specific connectors to external data or for controlling analytics. The Buckets REST API creates, deletes, flushes, and retrieves information about buckets and bucket operations. ISA supports full access to CRUD stores.
BACK-END ISA ARCHITECTURE CORE COMPONENTS
Table 4. Back-End ISA Core Components
Back-End ISA Core Components | Description
Traditional API Plugins | Plugin interfaces cover data harvest, enrichment, and analytics, as well as access context, which grants access to the management DB. This enables direct access to CRUDs, including the bucket CRUD and the binary plugin CRUD. Additionally, plugin interfaces enable direct access to the data services at the different locations where data objects can be stored, including HDFS, Elasticsearch, MongoDB for documents, and Titan for entities and associations.
Although document-based threads are available through these additional APIs, analytics-based threads are accessible through both the traditional and the ISA application-specific APIs.
Supported analytic technologies:
- Hadoop with MongoDB
- HDFS
- Elasticsearch input/output
- Harvest Technologies
- Enrichments
Management Database Access Context | MongoDB; unstructured analytics using Elasticsearch and MongoDB.
Data Query Services | Elasticsearch.
Traditional Source APIs, Buckets APIs | External data ISA connectors: ISA-specific connectors to external data or controlling analytics, including harvest, enrichment, and analytics context.
MIDDLEWARE DATA ANALYTICS SERVICES
Table 5. Middleware ISA Core Components
Middleware ISA Core Components | Description
Data Enrichment: Enrichment Modules | A general term referring to processes used to enhance, refine, or otherwise improve raw data. Enrichment helps make data a valuable asset for almost any modern business or enterprise, and reflects the common imperative of proactively using this data in various ways.
Analytic Technologies: Analytic Modules | Java code to perform ISA-specific analytics (in many cases the plugin is generic in nature, with ISA-specific configuration). ISA enables any analytic engine to be controlled via a bucket (given a Java plugin). In addition to a simple Hadoop interface, ISA provides an analytic engine.
External Data Harvest Technologies | ISA enables Java plugins to control any harvester, while providing the document pipeline and Logstash.
Analytic Modules | Java code to perform ISA-specific analytics (in many cases the plugin is generic in nature, with ISA-specific configuration).
See the Data Objects section of Basic Data Elements for more information about data objects.
DATA SOURCE MANAGEMENT
Data sources, or connectors, extract data, and visualization widgets are then used to gain deeper knowledge and a clearer perception of the data gathered.
Source data is stored in the platform as JSON documents whose format contains elements such as metadata, entities, and associations. Sources are made up of documents that are harvested over time.
DATA SOURCES
Sources are data connectors that pull data from databases, RSS feeds, file shares such as directories, and single files such as PDF, comma-separated values (CSV), XML, or ZIP files. Each data source is assigned a Title (such as Fox News RSS), Tags (for example, News, Politics, Conservative, Republican, US), and a Type (such as News).
DATA SOURCE DOCUMENTS
Each record or piece of data ingested by a source becomes a JSON document, regardless of the format or size of the data. A document can be any of the following:
• Article from an RSS feed
• 140-character Tweet
• Row from a CSV file
• 40-page medical journal
Each JSON document contains:
• A series of metadata fields, including title, description, source ID, date, and time
• Entities, such as person or IP-internal
• Associations, for example hard (subject, verb, object) vs. soft
Document Entities
Document entities are the who, what, and where extracted from a document:
• Who: Person, Company, Organization
• What: Industry Term, Product, Facility
• Where: City, Province or State, Country
Document Associations
An association is an activity or relationship between entities. It can be expressed as a subject, verb, and object, at a location, or over time. These subjects and objects can be free text that points to entities within a document.
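The document structure described above (metadata fields, entities, and hard associations) can be pictured as a small Java data model. This is a hypothetical sketch in Java 17: the class and field names (`IsaDocument`, `Entity`, `Association`) are illustrative assumptions, not the platform's actual document classes, and soft associations are left out.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical model of an ISA JSON document; all names are illustrative assumptions. */
final class IsaDocumentModel {

    /** An extracted entity, for example ("Person", "Alice") or ("IP-internal", "10.0.0.5"). */
    record Entity(String type, String value) {}

    /** A hard association (subject, verb, object); soft associations are not modeled here. */
    record Association(String subject, String verb, String object) {}

    /** Metadata fields plus the entities and associations extracted from the content. */
    record IsaDocument(Map<String, String> metadata,
                       List<Entity> entities,
                       List<Association> associations) {

        /** Returns the values of all entities of one type, for example every "Person". */
        Set<String> entityValues(String type) {
            return entities.stream()
                    .filter(e -> e.type().equals(type))
                    .map(Entity::value)
                    .collect(Collectors.toSet());
        }

        /** True if the association's subject or object points to an entity in this document. */
        boolean isLinked(Association a) {
            Set<String> values = entities.stream()
                    .map(Entity::value)
                    .collect(Collectors.toSet());
            return values.contains(a.subject()) || values.contains(a.object());
        }
    }
}
```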
MATCHING DOCUMENT TYPES
When a query is issued, a large number of "matching" documents often satisfy the query criteria, particularly for a common query such as "Obama." In this example, the search yields 4.2 million results, which are not directly available to the widgets.
Top Documents
From all the matching documents, a ranked subset is selected according to a configurable scoring method and returned directly to the GUI for analysis. These top documents are an estimate of the most relevant documents. The default number of top documents is 100, meaning that the top 100 of the 4.2 million matching documents are presented in the widgets.
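Selecting the top documents from millions of matches is a classic top-k problem. The sketch below assumes each match already carries a score from the configurable scoring method and keeps a bounded min-heap, so memory grows with k (for example, the default of 100) rather than with the number of matches. The names are hypothetical, and this is not necessarily how ISA ranks documents internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class TopDocuments {
    /** A matching document and the relevance score assigned by the scoring method. */
    record ScoredDoc(String id, double score) {}

    /**
     * Returns the k highest-scoring documents, best first, in one pass over the matches.
     * A min-heap of size k holds the best candidates seen so far.
     */
    static List<ScoredDoc> topK(Iterable<ScoredDoc> matches, int k) {
        if (k <= 0) {
            return List.of();
        }
        PriorityQueue<ScoredDoc> heap =
                new PriorityQueue<>(Comparator.comparingDouble(ScoredDoc::score));
        for (ScoredDoc doc : matches) {
            heap.offer(doc);
            if (heap.size() > k) {
                heap.poll(); // evict the lowest-scoring candidate
            }
        }
        List<ScoredDoc> top = new ArrayList<>(heap);
        top.sort(Comparator.comparingDouble(ScoredDoc::score).reversed());
        return top;
    }
}
```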
Filtered Documents
The widget API allows for further filtering of the top documents within the GUI by selecting a
subset of documents that contain a specific set of entities. This subset is called the filtered
documents. In the above example, a filter for "Hillary Clinton" populates widgets with only those
documents that contain both "Obama" AND "Hillary Clinton" occurrences.
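The filter applies AND semantics over the top documents only, not over all matches. A minimal sketch, assuming each top document exposes the set of entity values it contains (`Doc` and `filter` are hypothetical names):

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;

final class FilteredDocuments {
    /** A top document reduced to its ID and the entity values it contains. */
    record Doc(String id, Set<String> entities) {}

    /** Keeps only the top documents that contain every entity in the filter (AND semantics). */
    static List<Doc> filter(Collection<Doc> topDocs, Set<String> requiredEntities) {
        return topDocs.stream()
                .filter(d -> d.entities().containsAll(requiredEntities))
                .toList();
    }
}
```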
Aggregations Documents
Although all matching documents contribute to the "knowledge" that a query can provide, the documents themselves are not the only objects returned from a query. Information relevant to the analysis is also summed, averaged, or otherwise aggregated across all matching documents; these results are referred to as aggregations.
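Unlike filtering, aggregations run over every matching document, not just the top documents. As an illustration only, the following sketch computes one simple aggregation, the number of matching documents that mention each entity; the input shape (one list of entity values per document) is an assumption made for brevity.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class Aggregations {
    /** Counts, across all matching documents, how many documents mention each entity. */
    static Map<String, Long> entityDocumentCounts(List<List<String>> entitiesPerDocument) {
        Map<String, Long> counts = new TreeMap<>();
        for (List<String> entities : entitiesPerDocument) {
            // distinct(): an entity mentioned twice in one document is counted once.
            entities.stream().distinct().forEach(e -> counts.merge(e, 1L, Long::sum));
        }
        return counts;
    }
}
```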
BASIC DATA ELEMENTS
DATA OBJECTS
Units of data in ISA are called data objects. They include a diverse range of objects, such as:
• Web pages, raw or annotated by natural language processing
• Video files
• Log records, individual or aggregated
• Objects generated by analytics on existing data
• KML overlays
• Aircraft tracks
• Business transactions
DATA SERVICES
Table 6 lists the data services: the logical ways in which data can be stored, indexed, and retrieved.
Table 6. Data Services Types
Data Service Type | Description
Document | An annotated document: a JSON object with a formatted sub-object describing entities, associations between entities, user comments, and so on.
Search index | A searchable object.
Columnar | A related set of columns.
Graph | A collection of nodes and edges.
Storage layer | A set of "opaque" objects within a file.
Temporal, geo-spatial | Enables time- and geo-specific processing.
Data warehouse | A relational view of the data, well-suited to traditional OLAP-type processing.
Object Schema
How ISA data services handle an object is defined by its schema (DataSchemaBean). The schema describes the properties specific to each service, for example, which columns should be stored in columnar fashion, how the graph should be constructed from the objects, and how long objects should be stored.
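To make per-service schema properties concrete, the following sketch models a schema as a Java record. It is a hypothetical stand-in for DataSchemaBean, not its real definition: the field names, the `DataService` enum, and the rule that drops the graph service when no edge field is configured are all illustrative assumptions.

```java
import java.time.Duration;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

final class SchemaSketch {
    /** The logical data services listed in Table 6. */
    enum DataService { DOCUMENT, SEARCH_INDEX, COLUMNAR, GRAPH, STORAGE, TEMPORAL_GEO, DATA_WAREHOUSE }

    /** Hypothetical stand-in for DataSchemaBean: per-service properties of an object. */
    record DataSchema(Set<DataService> services,
                      List<String> columnarFields, // fields to store in columnar fashion
                      String graphEdgeField,       // field used to build graph edges, or null
                      Duration retention) {        // how long objects are kept

        /** The services an object with this schema is routed to for storage and indexing. */
        Set<DataService> targets() {
            Set<DataService> out = services.isEmpty()
                    ? EnumSet.noneOf(DataService.class)
                    : EnumSet.copyOf(services);
            // Illustrative rule: a graph cannot be built without knowing how to form edges.
            if (graphEdgeField == null) {
                out.remove(DataService.GRAPH);
            }
            return out;
        }

        /** True if an object of the given age has outlived the retention period. */
        boolean expired(Duration age) {
            return age.compareTo(retention) > 0;
        }
    }
}
```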
DATA IMPORT
Data is imported into ISA by means of buckets (DataBucketBean), which have two import properties: harvesting and enrichment.
Harvesting Data Import
Harvesting takes data from any transport layer and returns a set of JSON objects.
Enrichment Data Import
Enrichment takes data from the harvest, filters out unwanted objects, formats or creates the desired fields, and applies internal or external functionality such as geo-location, natural language processing, lookups via other buckets, arbitrary business logic, and so on.
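The two import steps can be sketched as a harvest function followed by an ordered list of enrichment steps, where each enrichment step can modify an object or filter it out. This is a hypothetical in-memory illustration (the `ImportPipeline` and `Enricher` names are invented here, and JSON objects are modeled as maps); in ISA the steps are JVM plugins driven by the framework.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

final class ImportPipeline {
    /** An enrichment step: returns the (possibly modified) object, or empty to filter it out. */
    interface Enricher extends Function<Map<String, Object>, Optional<Map<String, Object>>> {}

    static List<Map<String, Object>> run(List<String> rawRecords,
                                         Function<String, Map<String, Object>> harvest,
                                         List<Enricher> enrichers) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (String raw : rawRecords) {
            // Harvesting: turn transport-layer data into a JSON-like object.
            Optional<Map<String, Object>> obj = Optional.of(harvest.apply(raw));
            // Enrichment: apply each step in order; once empty, later steps are skipped.
            for (Enricher step : enrichers) {
                obj = obj.flatMap(step);
            }
            obj.ifPresent(out::add);
        }
        return out;
    }
}
```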
DATA BUCKETS
Each data bucket defines the data schema to be applied to all objects in that bucket. Data buckets also have the standard metadata shown in Table 7.
Table 7. Data Bucket Metadata
Data Bucket Metadata | Description
Access Rights | A set of access rights, as described in Data Security below.
Metadata Grouping | Grouping metadata. Buckets can be grouped in a number of different ways, as follows:
Multi-Bucket | A specific multi-bucket that is a collection of other buckets. Multiple buckets can be referenced by parent folders.
Bucket File System | Each bucket has a file system hierarchy that physically maps onto where data is stored in the storage service or HDFS.
Multiple Buckets Alias | Each bucket can also be assigned a common alias name that can refer to multiple buckets.
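Aliases and multi-buckets both let one name stand for several buckets. A minimal sketch of how such names might be expanded into concrete buckets, assuming buckets are identified by their file system paths; `BucketResolver` and its methods are hypothetical, not part of the ISA API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class BucketResolver {
    private final Map<String, Set<String>> aliases = new HashMap<>();       // alias -> bucket names
    private final Map<String, List<String>> multiBuckets = new HashMap<>(); // multi-bucket -> members

    void addAlias(String alias, String bucket) {
        aliases.computeIfAbsent(alias, k -> new TreeSet<>()).add(bucket);
    }

    void defineMultiBucket(String name, List<String> members) {
        multiBuckets.put(name, List.copyOf(members));
    }

    /** Expands a bucket, alias, or multi-bucket name into the concrete buckets it refers to. */
    Set<String> resolve(String name) {
        Set<String> out = new TreeSet<>();
        expand(name, out, new HashSet<>());
        return out;
    }

    private void expand(String name, Set<String> out, Set<String> seen) {
        if (!seen.add(name)) {
            return; // already visited: guards against cycles and duplicates
        }
        if (aliases.containsKey(name)) {
            aliases.get(name).forEach(b -> expand(b, out, seen));
        } else if (multiBuckets.containsKey(name)) {
            multiBuckets.get(name).forEach(b -> expand(b, out, seen));
        } else {
            out.add(name); // a plain bucket
        }
    }
}
```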
HARVESTING CONFIGURATION
Harvesting configuration consists of three different parts:
Table 8. Harvesting Configuration Types
Harvesting Configuration Type | Description
Harvest Technology | A JVM JAR implementation (IHarvestTechnologyModule) whose callbacks are invoked whenever pre-defined actions occur on a bucket, for example when it is created. The harvester is then free to do processing, typically launching or re-configuring external processes such as Hadoop, Flume, Logstash, or a web crawler. This results in objects being ingested using either the HDFS file interface (for batch operations) or the functionality provided by an injected IHarvestContext object (for streaming operations).
Harvest Module | Optionally, a set of harvest module JVM JARs whose format is defined by the author of the Harvest Technology. ISA supports the upload, access permissions, discovery, and retrieval of Harvest Module libraries. These libraries typically de-frame the data from its transport layer and convert it to JSON.
Harvest Technology Configuration | A list of Harvest Technology-specific JSON configuration objects.
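The callback-plus-injected-context pattern described in Table 8 can be sketched as follows. The `HarvestTechnology` and `HarvestContext` interfaces below are simplified, hypothetical stand-ins for the Harvest Technology module and the injected harvest context; their method names and signatures are assumptions, not the real interfaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class HarvestCallbacks {
    /** Hypothetical injected context: lets a harvester emit objects for streaming ingest. */
    interface HarvestContext {
        void emit(Map<String, Object> jsonObject);
    }

    /** Hypothetical callback contract invoked by the framework on bucket actions. */
    interface HarvestTechnology {
        /** Called when a new bucket using this harvest technology is created. */
        void onNewBucket(String bucketPath, List<Map<String, Object>> config, HarvestContext context);
    }

    /** Toy harvester: a real one would launch or reconfigure Flume, Logstash, a crawler, etc. */
    static final class StaticHarvester implements HarvestTechnology {
        @Override
        public void onNewBucket(String bucketPath, List<Map<String, Object>> config, HarvestContext context) {
            // Forward each configuration object as a document tagged with its bucket.
            for (Map<String, Object> entry : config) {
                context.emit(Map.of("bucket", bucketPath, "payload", entry));
            }
        }
    }

    /** Framework side: creates a bucket and injects a context that collects emitted objects. */
    static List<Map<String, Object>> createBucket(String path, HarvestTechnology tech,
                                                  List<Map<String, Object>> config) {
        List<Map<String, Object>> ingested = new ArrayList<>();
        tech.onNewBucket(path, config, ingested::add);
        return ingested;
    }
}
```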
HARVESTING DATA ENRICHMENT
Harvested data can be enriched in one of two forms, streaming or batch, as shown in Table 9.
Table 9. Data Enrichment Forms
Data Enrichment Form | Description
Streaming Data Enrichment | Each object is processed as soon as it is received. Streaming enrichment uses the Storm framework together with Kafka for messaging.
Batch Enrichment | Enrichment is performed on sets of objects, which is more efficient but introduces latency, so it is not suitable for alerting purposes. Batch enrichment uses the Hadoop, YARN, or Spark framework.
Typically only one of the two supported enrichment forms is used, but both can be combined. For example, you can batch-process log records and store them efficiently, while performing a smaller set of enrichment processes in near-real-time that discard most objects and only broadcast "alerts" to listeners.
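The combined scenario can be sketched with in-memory stand-ins for the two paths: every record gets a cheap streaming check that broadcasts only alerts, while the same records are buffered and stored in batches. This is a hypothetical illustration; Storm, Kafka, Hadoop, and Spark are replaced by plain Java collections, and `DualEnrichment` is an invented name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical sketch: one record stream feeding a streaming alert path and a batch storage path. */
final class DualEnrichment {
    private final int batchSize;
    private final Predicate<String> isAlert;      // cheap near-real-time check
    private final Consumer<String> alertListener; // receives broadcast alerts
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> storedBatches = new ArrayList<>();

    DualEnrichment(int batchSize, Predicate<String> isAlert, Consumer<String> alertListener) {
        this.batchSize = batchSize;
        this.isAlert = isAlert;
        this.alertListener = alertListener;
    }

    /** Called once per incoming record. */
    void onRecord(String record) {
        // Streaming path: evaluated immediately; non-alert records are not broadcast.
        if (isAlert.test(record)) {
            alertListener.accept(record);
        }
        // Batch path: buffer and store in chunks, trading latency for efficiency.
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Stores any buffered records as one batch. */
    void flush() {
        if (!buffer.isEmpty()) {
            storedBatches.add(List.copyOf(buffer));
            buffer.clear();
        }
    }

    List<List<String>> storedBatches() {
        return List.copyOf(storedBatches);
    }
}
```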
Data Enrichment Lists
Enrichment consists of two lists, batch and streaming, each containing the items shown in Table 10.
Table 10. Data Enrichment Lists
Data Enrichment List Item | Contents
JVM JAR files | A list of JVM JAR files obtained from the Enrichment Module Library. One of the JAR files in this list must implement IEnrichmentBatchModule or IEnrichmentStreamingModule. The other JAR files can be in arbitrary format and can be used to provide functional libraries, for example the Stanford NLP set of JARs, internal utilities, and so on.
JSON configuration object | A JSON configuration object passed into the module at startup.
Dependencies | The dependencies between the modules, which can be used in batch processing to enrich the objects in parallel.
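The Dependencies entry implies that independent modules can run in parallel. One standard way to derive that parallelism is a layered topological sort (Kahn's algorithm), which groups modules into stages whose members depend only on earlier stages. This is a sketch of the technique, not ISA's scheduler; `ModuleScheduler` and the module names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class ModuleScheduler {
    /**
     * Groups modules into stages using Kahn's algorithm. Every module in a stage depends only on
     * modules in earlier stages, so the modules within one stage can run in parallel.
     *
     * @param deps map from a module name to the names of the modules it depends on
     * @throws IllegalArgumentException if the dependencies contain a cycle
     */
    static List<List<String>> stages(Map<String, Set<String>> deps) {
        Map<String, Integer> unmet = new TreeMap<>();           // module -> unmet dependency count
        Map<String, List<String>> dependents = new HashMap<>(); // module -> modules waiting on it
        for (Map.Entry<String, Set<String>> entry : deps.entrySet()) {
            String module = entry.getKey();
            unmet.putIfAbsent(module, 0);
            for (String dependency : entry.getValue()) {
                unmet.putIfAbsent(dependency, 0);
                unmet.merge(module, 1, Integer::sum);
                dependents.computeIfAbsent(dependency, k -> new ArrayList<>()).add(module);
            }
        }

        List<List<String>> stages = new ArrayList<>();
        List<String> ready = new ArrayList<>();
        for (Map.Entry<String, Integer> e : unmet.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        int scheduled = 0;
        while (!ready.isEmpty()) {
            Collections.sort(ready);
            stages.add(List.copyOf(ready));
            scheduled += ready.size();
            List<String> next = new ArrayList<>();
            for (String done : ready) {
                for (String waiting : dependents.getOrDefault(done, List.of())) {
                    // Its last unmet dependency just finished: the module joins the next stage.
                    if (unmet.merge(waiting, -1, Integer::sum) == 0) {
                        next.add(waiting);
                    }
                }
            }
            ready = next;
        }
        if (scheduled != unmet.size()) {
            throw new IllegalArgumentException("dependency cycle among modules");
        }
        return stages;
    }
}
```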
As with the harvester, enrichment modules have an IEnrichmentModuleContext object injected that enables interaction with the core framework to filter objects, log errors, and so on.
At the end of the enrichment stage, batch or streaming, the extracted, transformed, and enriched object is automatically passed to each of the data services defined in its schema for storage and indexing. It can also be broadcast across an object bus for analytics or API listeners to process, as described in Data Analytics below.
Note
A bucket can be created without any harvesting or enrichment. It can point to an existing collection in the database, or it can be an empty bucket that is then populated either manually or by analytic threads.
DATA ANALYTICS
An Analytic Thread (AnalyticThreadBean) takes data from one or more populated buckets and
then applies arbitrary further processing by using user-defined technologies such as Hadoop,
Spark, Storm, Mahout, and Gephi.
Each Analytic Thread is contained in the bucket corresponding to the output location of its results. Table 11 lists the components that make up an Analytic Thread.
Table 11. Data Analytics Components
Data Analytics Component | Contents
Analytic Technology | The name of an Analytic Technology: a JVM JAR file implementing IAnalyticsTechnologyModule whose callbacks are invoked whenever a pre-defined action occurs, such as a user interaction, the completion of another analytic thread, a bucket obtaining more data, or a regular schedule. The analytic technology is responsible for queuing the desired analytics as defined by the remaining items in this list.
Data Query Services | A set of inputs together with associated queries in the "language" of whichever data service is being used, for example "search term" queries, temporal queries, geo-spatial queries, or "graph" queries.
Analytic Modules | A list of Analytic Modules: JVM JARs managed by the ISA Library whose format is defined by the corresponding Analytic Technology.
A configuration object describing the details of the analytics input, output, and so on.
A set of dependencies within the analytic thread, such as run module1, then module2.
Analytic Thread | The Analytic Thread runs over the specified data and writes the output into one or more buckets with the appropriate data schemas.
The output can treat existing data in the output buckets in one of the following ways:
- Wipe and start again each time
- Add data incrementally
- Merge with existing data
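The three output behaviors can be illustrated on a simple keyed result set. In this hypothetical sketch, results are counts keyed by object ID; interpreting "incremental" as adding only new IDs and "merge" as summing values for shared IDs are assumptions chosen for illustration, not documented ISA semantics.

```java
import java.util.HashMap;
import java.util.Map;

final class AnalyticOutput {
    /** The three ways an analytic thread can treat existing data in its output bucket. */
    enum Mode { WIPE, INCREMENTAL, MERGE }

    /** Applies one run's results (keyed by object ID) to the output bucket's existing contents. */
    static Map<String, Long> apply(Mode mode, Map<String, Long> existing, Map<String, Long> results) {
        Map<String, Long> out = new HashMap<>();
        switch (mode) {
            case WIPE -> out.putAll(results); // discard old data, keep only this run
            case INCREMENTAL -> {             // keep existing entries, add only new IDs
                out.putAll(existing);
                results.forEach(out::putIfAbsent);
            }
            case MERGE -> {                   // combine entries that share an ID
                out.putAll(existing);
                results.forEach((id, value) -> out.merge(id, value, Long::sum));
            }
        }
        return out;
    }
}
```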
Instead of taking data At Rest from a bucket, objects can be streamed In Flight for real-time or near-real-time analytics and alerting. In this case, the analytic thread registers a bucket name and a pipeline stage: before enrichment, after enrichment, or in the middle of enrichment (after a named enrichment module).
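The In Flight registration can be pictured as a small publish/subscribe bus keyed by bucket and pipeline stage. The following is a hypothetical sketch; `InFlightBus`, `Tap`, and `Stage` are invented names, and objects are plain strings for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

final class InFlightBus {
    /** Where in the enrichment pipeline a listener taps the object stream. */
    enum Stage { BEFORE_ENRICHMENT, AFTER_MODULE, AFTER_ENRICHMENT }

    /** A registration point; module names the enrichment module for AFTER_MODULE, else null. */
    record Tap(String bucket, Stage stage, String module) {}

    private final Map<Tap, List<Consumer<String>>> listeners = new HashMap<>();

    /** An analytic thread registers interest in one bucket at one pipeline stage. */
    void register(Tap tap, Consumer<String> listener) {
        listeners.computeIfAbsent(tap, k -> new ArrayList<>()).add(listener);
    }

    private void publish(Tap point, String object) {
        listeners.getOrDefault(point, List.of()).forEach(l -> l.accept(object));
    }

    /** Runs one object through the named modules in order, publishing at every tap point. */
    void process(String bucket, String object, LinkedHashMap<String, UnaryOperator<String>> modules) {
        publish(new Tap(bucket, Stage.BEFORE_ENRICHMENT, null), object);
        String current = object;
        for (Map.Entry<String, UnaryOperator<String>> m : modules.entrySet()) {
            current = m.getValue().apply(current);
            publish(new Tap(bucket, Stage.AFTER_MODULE, m.getKey()), current);
        }
        publish(new Tap(bucket, Stage.AFTER_ENRICHMENT, null), current);
    }
}
```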
DATA SECURITY
Security in ISA is delegated to a separate service, typically invoking an existing security scheme such as Kerberos or IKANOW ISA.
More on this topic, including the security architecture, is described in the IKANOW Security Architecture Guide (TBD).
PLUGIN LIBRARIES
The following plugin functionalities can be configured from libraries (SharedLibraryBean):
• Harvest Technologies
• Harvest Modules
• Enrichment Modules
• Analytic Technologies
• Analytic Modules
• Access Modules
ISA provides library upload, storage, and retrieval services. Libraries are tagged for discovery and have access tokens assigned to them that determine who can use them. For example, different analytics or APIs can be restricted by "user group", such as commercial tiers for SaaS or divisions in a large organization.
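Tag-based discovery combined with token-based access can be sketched as follows. It assumes that a user may use a library when they hold at least one of its access tokens, and it models the administrator-only upload rule with a boolean flag; `LibraryAccess` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class LibraryAccess {
    /** A shared library with discovery tags and the access tokens that grant its use. */
    record Library(String name, Set<String> tags, Set<String> accessTokens) {}

    private final Map<String, Library> libraries = new HashMap<>();

    /** Uploads are restricted to administrators; the flag stands in for a real identity check. */
    void upload(Library library, boolean isAdmin) {
        if (!isAdmin) {
            throw new SecurityException("only administrators may upload libraries");
        }
        libraries.put(library.name(), library);
    }

    /** A user may use a library if they hold at least one of its access tokens. */
    boolean canUse(String libraryName, Set<String> userTokens) {
        Library library = libraries.get(libraryName);
        return library != null && library.accessTokens().stream().anyMatch(userTokens::contains);
    }

    /** Discovery by tag, limited to the libraries the user is allowed to use. */
    List<String> discover(String tag, Set<String> userTokens) {
        return libraries.values().stream()
                .filter(l -> l.tags().contains(tag) && canUse(l.name(), userTokens))
                .map(Library::name)
                .sorted()
                .toList();
    }
}
```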
Note
For security reasons, currently only the administrator can upload libraries; the administrator then sets the access tokens that decide who can use them.
DATA SOURCE PROCESSING PIPELINE
A pipeline in ISA is the flow of data through different components in Informatics. A mapping in Informatics may contain Sources, Transformations, and Targets that are connected together to make up a pipeline, and a single mapping can contain many such pipelines. When one pipeline is connected to another, they form a single pipeline.
DATA SOURCE PROCESSING TYPES
The IKANOW ISA platform supports two types of complex processing: input source processing and custom source processing.
Input Sources Processing
This type of source processing allows many different types of data to be input as documents or records. Documents are larger, more complex objects, typically generated from complex XML/JSON or from natural-language content such as websites and reports.
The ISA platform provides a powerful pipeline of templated operations to transform these data types into ISA's generic document model.
Records are smaller objects such as single-line log records, simple JSON objects, and SQL records. ISA places almost no restrictions on the format of the JSON or on how it is imported into the system, although it integrates particularly well with Logstash, the popular community-driven platform for collecting, enriching, and transporting data.
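As an illustration of turning a single-line log record into a flat JSON object, the sketch below parses `key=value` tokens. It is a hypothetical, hand-rolled parser for demonstration only; in practice this kind of parsing is usually delegated to Logstash filters, and the sketch does not handle quoted values or JSON control-character escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RecordParser {
    /**
     * Turns a single-line key=value log record into a flat map.
     * Tokens without '=' are joined into a "message" field.
     */
    static Map<String, String> parseKeyValue(String line) {
        Map<String, String> record = new LinkedHashMap<>();
        StringBuilder message = new StringBuilder();
        for (String token : line.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq > 0) {
                record.put(token.substring(0, eq), token.substring(eq + 1));
            } else if (!token.isEmpty()) {
                if (message.length() > 0) {
                    message.append(' ');
                }
                message.append(token);
            }
        }
        if (message.length() > 0) {
            record.put("message", message.toString());
        }
        return record;
    }

    /** Serializes the flat map as a JSON object; only quotes and backslashes are escaped. */
    static String toJson(Map<String, String> record) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : record.entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```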
Custom Processing Sources
Custom source processing applies custom logic to existing documents and records to enrich the system with new data and functionality, as shown in Table 12.
Table 12. Custom Processing Sources New Data and Functionalities
New Data and Functionality | Description
Reports | For example, spreadsheets or statistical data containing directly actionable information.
New records and documents | Typically alerts, or aggregate "events" made up of multiple documents and records.
Lookup tables | Tables that can be used to enrich new and existing documents, for example with local asset information, or to generate alerts for malicious domains.
IKANOW uses the popular Hadoop ecosystem to power its custom processing capabilities,
integrating its output, management, monitoring and security layers.
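A lookup-table enrichment of the kind listed in Table 12 might look like the following sketch, which attaches local asset information and flags records whose domain, or one of its parent domains, appears in a malicious-domain table. The field names `src`, `domain`, `asset_owner`, and `alert` are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class LookupEnrichment {
    /** True if the domain or any parent domain (e.g. "evil.example" for "cdn.evil.example") is listed. */
    static boolean isMalicious(String domain, Set<String> maliciousDomains) {
        if (domain == null) {
            return false;
        }
        String d = domain.toLowerCase(Locale.ROOT);
        while (true) {
            if (maliciousDomains.contains(d)) {
                return true;
            }
            int dot = d.indexOf('.');
            if (dot < 0) {
                return false;
            }
            d = d.substring(dot + 1); // strip the leftmost label
        }
    }

    /** Returns enriched copies of the records; the inputs are not modified. */
    static List<Map<String, String>> enrich(List<Map<String, String>> records,
                                            Set<String> maliciousDomains,
                                            Map<String, String> assetOwners) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> r : records) {
            Map<String, String> e = new LinkedHashMap<>(r);
            String src = r.get("src");
            String owner = src == null ? null : assetOwners.get(src);
            if (owner != null) {
                e.put("asset_owner", owner); // local asset information
            }
            if (isMalicious(r.get("domain"), maliciousDomains)) {
                e.put("alert", "malicious-domain");
            }
            out.add(e);
        }
        return out;
    }
}
```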
Figure 10 shows how these two activities, input and custom source processing, are related.
Figure 10. Input and Custom Source Processing Relationships
The same JSON-based configuration language, along with its associated user interface, can be used to build and maintain both types of pipelines. Typically, the elements do not mix: a pipeline consists entirely of elements from either the "standard" set or the "custom" set, although there are some exceptions, described below.
DATA INPUT SOURCES
The ISA architecture makes harvesting and enrichment a more logical process, based on the concept of applying a pipeline of processing elements to documents proceeding from a source. This capability is illustrated in Figure 11.
Figure 11. Pipeline Elements Processing
Pipeline elements can be in any order and have any cardinality.
For example, you could create metadata from raw HTML (using XPath), run an automated text extractor, pull more metadata using regex or JavaScript, return to the original raw text, and then run a different automated extractor before creating entities.
A very useful scenario involves running the data through several entity extractors, potentially
using the "criteria" field to choose which one to run based on the content and metadata
extracted.
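Because elements can appear in any order and any number of times, a pipeline can be modeled as an ordered list of elements, each gated by an optional criteria check. In this hypothetical sketch the criteria is a Java predicate over document fields, which is an assumption made for illustration; it shows two entity extractors, only one of which runs for a given document.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

final class ProcessingPipeline {
    /** One pipeline element: a criteria check plus a transformation of the document. */
    record Element(String name,
                   Predicate<Map<String, String>> criteria,
                   UnaryOperator<Map<String, String>> action) {

        /** An element with no criteria, which runs for every document. */
        static Element always(String name, UnaryOperator<Map<String, String>> action) {
            return new Element(name, doc -> true, action);
        }
    }

    /** Applies elements in order; an element whose criteria is false is skipped for this document. */
    static Map<String, String> run(Map<String, String> doc, List<Element> elements) {
        Map<String, String> current = new HashMap<>(doc);
        for (Element e : elements) {
            if (e.criteria().test(current)) {
                current = e.action().apply(current);
            }
        }
        return current;
    }
}
```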
As Figure 11 shows, the pipeline elements can be grouped approximately into the categories listed in Table 13.
Table 13. Pipeline Element Categories
Pipeline Element Category | Description
Extractors | Generates mostly empty ISA documents from external data sources.
Global | Generates JavaScript artifacts that can be used by subsequent pipeline elements.
Secondary extractors | Enables large numbers of new documents to be produced from existing metadata.
Text extraction | Manipulates the raw document content.
Metadata | Generates document metadata such as title, description, and date, as well as arbitrary content metadata, using XPath, regex, and JavaScript.
Entities and associations | Creates entities and associations from the text.
Storage and indexing | Decides which documents to keep, which fields to keep, and what to index as full text for searching via the GUI/API.

IKANOW System Architecture Guide

  • 1.
    9/1/2015 Big Data CyberAnalytics IKANOW 11921 Freedom Drive Suite 550 Reston, VA 20190 www.ikanow.com Document Release: 1.0 Document Number: PN200 Sholeh Gregory IKANOW Information Security Analytics (ISA) Threat Intelligence Platform System Architecture Guide v1.0 Continuous Cyber Security Optimization
  • 2.
    IKANOW Information SecurityAnalytics (ISA) Threat Intelligence Platform 2 20 15 TABLE OF CONTENTS Preface......................................................................................................................................................................5 Who Should use this Document...................................................................................................................................5 Conventions Used in this Document..........................................................................................................................5 Other IKANOW Documentation ..................................................................................................................................6 Contact Information.........................................................................................................................................................6 1 Cyber Security Threats Landscape ...................................................................................................................7 Current Security Paradigm .............................................................................................................................................8 Challenges Around Information Security Analytics................................................................................................8 2 Solution: IKANOW Next Generation Information Security Analytics Overview .....................................9 Open, Flexible, Scalable Threat Intelligence Platform ..........................................................................................9 Constant Calibration of ISA Security Posture........................................................................................................ 
10 How IKANOW ISA Works..............................................................................................................................................11 Data Ingestion, Curation, Enrichment.......................................................................................................................13 3 Information Security Analytics Core Features ................................................................................................14 Three-Step Data Sources Ingestion ..........................................................................................................................14 Comprehensive and Collaborative Visualizations and Reports........................................................................15 Third-Party Tools and Applications Integration.....................................................................................................15 Robust Sorting and Searching.....................................................................................................................................16 4 IKANOW ISA Solution Architecture Overview ............................................................................................ 17 Traditional vs. Next-Generation ISA Application Architecture Key Points.................................................... 17 Next-Generation ISA Architecture Requirements ................................................................................................ 17 Front-End ISA Application Components.................................................................................................................19 Back-End ISA Architecture Core Components .....................................................................................................20 Middleware Data Analytics Services.......................................................................................................................... 
21 5 Data Source Management.............................................................................................................................. 22 Data Sources.................................................................................................................................................................... 22 Data Source Documents.............................................................................................................................................. 22 Document Entities..................................................................................................................................................... 22 Document Associations........................................................................................................................................... 22 Matching Document Types......................................................................................................................................... 23 Top Documents.......................................................................................................................................................... 23
  • 3.
    IKANOW Information SecurityAnalytics (ISA) Threat Intelligence Platform 3 20 15 Filtered Documents .................................................................................................................................................. 23 Aggregations Documents........................................................................................................................................ 23 6 Basic Data Elements ....................................................................................................................................... 24 Data Objects .................................................................................................................................................................... 24 Data Services ................................................................................................................................................................... 24 Object Schema........................................................................................................................................................... 24 Data Import...................................................................................................................................................................... 25 Harvesting Data Import........................................................................................................................................... 25 Enrichment Data Import ......................................................................................................................................... 25 Data Buckets.................................................................................................................................................................... 25 Harvesting Configuration............................................................................................................................................. 
25 Harvesting Data Enrichment....................................................................................................................................... 26 Data Enrichment Lists ...............................................................................................................................................27 Data Analytics...................................................................................................................................................................27 Data Security ...................................................................................................................................................................28 Plugin Libraries ................................................................................................................................................................ 29 7 Data Source Processing Pipeline...................................................................................................................30 Data Source Processing Types...................................................................................................................................30 Input Sources Processing ........................................................................................................................................30 Custom Processing Sources ...................................................................................................................................30 Data Input Sources......................................................................................................................................................... 32
  • 4.
    IKANOW Information SecurityAnalytics (ISA) Threat Intelligence Platform 4 20 15 IMPORTANT NOTICE The information contained in the document is believed to be reliable, but IKANOW makes no warranties as to its accuracy or completeness. IKANOW does not warrant or represent that any license, either express or implied, is granted under any IKANOW patent right, copyright, or other IKANOW intellectual property right relating to any combination or process in which IKANOW products or services are used. Information published by IKANOW regarding third-party products or services does not constitute a license from IKANOW to use such products or services or a warranty or endorsement thereof. Use of such information may require a license from a third party under the patents or other intellectual property of the third party, or a license from IKANOW under the patents or other intellectual property of IKANOW. IKANOW Threat Analytics Platform TM is trademark of IKANOW, Inc. Copyright ยฉ 2015, IKANOW Inc. All rights reserved.
  • 5.
    IKANOW Information SecurityAnalytics (ISA) Threat Intelligence Platform 5 20 15 PREFACE The Information Security Analytics (ISA) Threat Intelligence Platform System Architecture Guide describes the productโ€™s core features, architecture, design requirements, different components of the product, and the role each component provides for the IKANOW Threat Intelligence Platform solution. WHO SHOULD USE THIS DOCUMENT The following are the intended audience for this guide: ๏‚ท Cyber Security Analyst ๏‚ท Special Operations Engineer ๏‚ท Tier 1, 2, 3 Cyber Analyst ๏‚ท Social Media Analyst ๏‚ท IT Security Engineer ๏‚ท Chief Information Security Officer (CISO) ๏‚ท Sales Engineers ๏‚ท System Architects and Designers CONVENTIONS USED IN THIS DOCUMENT Table 1 describes the typographic conventions used in this guide. Table 1. Typographic Conventions Convention Meaning Example courier font Names of commands, files, on-screen computer output. Edit your .login file. Use ls -a to list all files. machine_name% test.doc. italics Document titles, new terms, words to be emphasized. Variables that you replace with a real name or value . Read Chapter 6 in User's Guide. These are called class options. You must be root to do this. Type rm filename to delete a file. boldface Consolas font What you type. machine_name% su Password:
OTHER IKANOW DOCUMENTATION
• IKANOW Community Edition Documentation
• IKANOW Enterprise Edition Documentation

CONTACT INFORMATION
Your feedback is always welcome. Please feel free to submit questions, comments, and feedback to info@IKANOW.com.
CYBER SECURITY THREATS LANDSCAPE
"Today's cyber security threats are dynamic and asymmetric," says Chris Morgan, President of IKANOW. "Organizations need to change their approach to tackle these new threats effectively." Companies, governments, and non-governmental organizations (NGOs) alike need to understand and defend themselves against advanced persistent threats. Cyber attacks are in the headlines nearly every day, suggesting that virtually every enterprise has been breached. In July 2015 alone, major data breaches at Trump Hotel Collection, Ashley Madison, UCLA Health System, Service Systems Associates, and St. Francis Health made headlines.

Figure 1. IKANOW Major Breach Index, July 2015

The impact of breaches is disastrous. According to the Mandiant M-Trends Report, it takes an average organization 229 days, or more than seven months, just to detect a data breach. There is a 22 percent chance that today's data breaches will compromise 10,000 or more records, according to the Ponemon Institute's 2014 Cost of Data Breach Study. Furthermore, according to the same study, the average cost of a data breach for Fortune 1000 companies has risen 15 percent over the last year to $3.5 million (€3.15 million). For organizations that store personal health information, partner breaches that compromise client information could result in regulatory fines.

Response organizations and Fortune 1000 companies alike face a similar big data problem in this era of increasing vulnerabilities, in which attacks are increasingly dynamic and asymmetric in nature. Attacks arrive constantly, wave after wave, while organizations become more vulnerable through Bring Your Own Device (BYOD) programs, increased use of cloud storage, and ubiquitous use of the Internet at work.
Information security professionals need to optimize their resources to meet the rising cyber security challenges they face today. Many organizations are turning to big data, as well as to elaborate networks of disparate security systems, to thwart these types of attacks. Current network security solutions collect huge amounts of data. In fact, standard security information and event management (SIEM) products collect so much data that companies struggle to operate them. According to the 2013 SIEM Survey from EiQ Networks, 52 percent of all companies require two or more full-time analysts to manage their unwieldy SIEM deployments. This does not account for the additional monetary and personnel resources needed to analyze the extensive amount of data many organizations collect from external threat intelligence feeds, such as FireEye's DTI, Symantec DeepSight, and iSight Partners.

The problem, according to Mark Nicolett, a managing VP at Gartner, is not that organizations lack security data. "We are not suffering from a lack of data," Nicolett told Dark Reading. "We are suffering from a lack of intelligence in analyzing it." In other words, collecting more data will not help if you cannot find the story within the data. By taking these disparate sets of data out of their separate storage areas and then analyzing and visualizing them, organizations can act on the small data that counts and ultimately improve their security posture.

CURRENT SECURITY PARADIGM
The current security paradigm is based on a big data approach. An immense amount of data is collected and stored from various sources, such as log data, external threat intelligence feeds, and open source intelligence (OSINT) data. Because this data lives in separate places, even the best cyber analysts have no efficient way to bring it together, find relevant correlations, or act on it. An advanced data analytics platform can discover the relevant small data and so conquer the big data problem. An effective threat analytics platform can ingest external threat intelligence and enterprise security data to map known and previously mitigated attacks, and combine them with current security data to detect attacks already underway.

CHALLENGES AROUND INFORMATION SECURITY ANALYTICS
Bringing all of that big data together into one central place in a logical way, while automating critical security tasks, allows organizations to increase productivity and streamline the resources required to understand current threats, bolster defenses, and detect threats.
Analyzing and visually representing the full spectrum of internal, external, structured, semi-structured, and unstructured data together allows organizations to find the small data that is meaningful and actionable. Only then can IT professionals effectively deploy limited resources and establish effective protocols for thwarting and addressing breaches. When you understand the current threats and vulnerabilities your organization faces, you can deploy defensive resources to protect the most valuable and most vulnerable assets. In some cases, attacks target internal systems that can be disabled by distributed denial of service (DDoS) attacks. In many cases, cyber attacks target proprietary or customer information. Some intrusions leave a backdoor that becomes a foothold for future attacks. Big data alone is not enough to defend against the ever-changing and ever-increasing specter of cyber threats. By rethinking the way security and threat intelligence data is collected, analyzed, and reported, security stakeholders can visualize the full threat landscape. This enables them to find the small data that really matters, respond to threats, and develop anticipatory security strategies.
SOLUTION: IKANOW NEXT GENERATION INFORMATION SECURITY ANALYTICS OVERVIEW
Anticipating and responding to cyber threats requires a specialized set of tools, infrastructure, and support. IKANOW's revolutionary solution offers an open, high-throughput big data technology infrastructure. Within seconds, you can locate time-sensitive information across terabytes of data, thereby raising the efficiency of the team members and analysts who monitor cyber risks around the clock. Yet tools and technologies are useless without support. From ingestion to customized visualizations to the development of strategic operational scorecards, you can leverage IKANOW's data science competencies to iteratively customize a cyber-risk-reduction program for your business, without ongoing cost. IKANOW applies adaptable analytical techniques and measurement tools that automate data ingestion and analysis. These features offer visibility that can save weeks in detecting and defending against cyber security threats. The vast amount of security information now available requires a new approach to cyber security, one that leverages innovative big data management along with correlation and visualization technologies so that information security professionals can effectively protect their networks.

OPEN, FLEXIBLE, SCALABLE THREAT INTELLIGENCE PLATFORM
As shown in Figure 2, the IKANOW ISA platform:
• Provides business intelligence to the CISO to drive change in an organization.
• Reduces the resources required to perform critical security tasks.
• Provides an additional layer of defense against advanced persistent threats (APTs).
Figure 2. IKANOW ISA Security-in-Layers Platform

CONSTANT CALIBRATION OF ISA SECURITY POSTURE
The IKANOW ISA platform accelerates decision throughput by using information security analytics to continually recalibrate your security posture in line with your overall security plan. Security posture is the approach your business takes to security, from planning to implementation. It consists of the technical and non-technical policies, procedures, and controls that protect you from both internal and external threats. IKANOW ISA is:
• A platform to integrate threat intelligence with enterprise data, and then to ingest, enrich, analyze, and visualize the results and thereby determine the risk level and security posture.
• A framework for assessing and improving the security posture of industrial control systems (ICS).
The platform combines the threat feeds that are right for your organization and enhances them with an analytics platform that can dramatically improve an organization's security posture.
Figure 3. ISA Security Posture Calibration

The IKANOW platform allows Fortune 1000 organizations and government agencies to quickly ingest various data sources and types, enrich the data, and then search and visualize it in an easy-to-use interface.

HOW IKANOW ISA WORKS
The IKANOW ISA platform helps adjust the levers of your enterprise to cohesively align strategic and tactical functions. ISA first simplifies the data ingestion process by giving your analyst team tools for ingestion and curation without the need for custom development. Using this ingested data, external and internal information can be fused, and filters are then applied to remove unnecessary data points. Ready-made visualizations are then applied to identify patterns and anomalies, and the results are shared with other teams so that they are also informed about any potential or actual security breaches. IKANOW's data science team then devises a plan to build techniques for advanced analytics, finding the optimal means of correlation and comparison. With customized visualizations and templates, you are then equipped to baseline repeatable metrics and build cascading scorecards (dashboards) across functions, mechanizing the responses used to predict cyber risks. The ISA platform equips security teams by uniting insights and creating discipline in a way that achieves accelerated decision throughput.
Figure 4. IKANOW Threat Analytics High-Level View
DATA INGESTION, CURATION, ENRICHMENT
IKANOW correlates data from multiple data feeds and social networks with corporate security information and event management (SIEM) data. The output can be as specific as identifying the IP addresses that have been affected by malware. The results of the analytics are presented in reports and a dashboard that allow threats to be easily communicated, discussed, prioritized, and resolved, as shown in Figure 5.

Figure 5. Data Integration and Enrichment

ISA helps organizations constantly maintain an optimal security posture by aligning the strategic, tactical, and operational aspects of the business. It does this with a set of core features that make it easy to ingest, curate, and enrich data. Data sources are continuously ingested into the IKANOW Threat Analytics Platform and continually visualized and reported on using cascading scorecards, enabling each enterprise stakeholder to obtain timely results and drive changes in security posture accordingly.
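The correlation described above, in which external threat data is matched against internal SIEM data to identify affected IP addresses, can be sketched as follows. This is a minimal Python illustration, not ISA's implementation: the feed contents, the event field names (src_ip, dst_ip), and the function affected_hosts are hypothetical, and the addresses come from documentation-reserved ranges.

```python
from collections import defaultdict

# Hypothetical external threat feed: known-malicious IP -> associated malware family.
THREAT_FEED = {
    "203.0.113.7": "Dridex",
    "198.51.100.23": "Zeus",
}

# Hypothetical SIEM events: an internal host contacting an external address.
SIEM_EVENTS = [
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.7"},
    {"src_ip": "10.0.0.9", "dst_ip": "192.0.2.10"},
    {"src_ip": "10.0.0.12", "dst_ip": "198.51.100.23"},
    {"src_ip": "10.0.0.5", "dst_ip": "198.51.100.23"},
]


def affected_hosts(events, feed):
    """Map each internal IP to the sorted malware families it contacted."""
    hits = defaultdict(set)
    for event in events:
        family = feed.get(event["dst_ip"])
        if family is not None:  # destination is a known indicator of compromise
            hits[event["src_ip"]].add(family)
    return {ip: sorted(families) for ip, families in hits.items()}


if __name__ == "__main__":
    for ip, families in sorted(affected_hosts(SIEM_EVENTS, THREAT_FEED).items()):
        print(ip, families)
```

Keeping the feed in a dictionary makes each event check a constant-time lookup, which matters when the SIEM event stream is large.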
INFORMATION SECURITY ANALYTICS CORE FEATURES
IKANOW ISA enables you to actively recalibrate your security posture by applying adaptable analytical techniques and measurement tools that automate analysis and decision-making processes. The key features of the IKANOW ISA platform are:
• Three-Step Data Sources Ingestion
• Comprehensive and Collaborative Visualizations and Reports
• Third-Party Tools and Applications Integration
• Robust Sorting and Searching
These capabilities provide users with the information required to use the platform effectively, from source creation and management through data visualization and reporting.

THREE-STEP DATA SOURCES INGESTION
ISA offers the ability to control data sources throughout setup, testing, operation, and publishing, including the ability to add and suspend data sources. ISA provides support for Logstash, RSS, CSV, S3, and various APIs. In addition, the advanced source builder option allows you to add and edit JSON directly. The first key feature is the source ingestion process, which easily adds new source data to the IKANOW platform, whether structured, unstructured, or semi-structured in nature (think everything from SIEM data to OSINT data and social media). All of this is done in a clean, lightweight interface, as shown in Figure 6.
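The source lifecycle described above (setup, testing, operation and publishing, plus suspension) can be modeled as a small state machine. The Python sketch below is hypothetical: the state names, the allowed transitions, and the DataSource class are illustrative assumptions rather than ISA's documented behavior.

```python
from enum import Enum


class SourceState(Enum):
    SETUP = "setup"
    TESTING = "testing"
    PUBLISHED = "published"   # operating: harvesting on schedule
    SUSPENDED = "suspended"


# Hypothetical transition rules, not ISA's documented rules.
TRANSITIONS = {
    SourceState.SETUP: {SourceState.TESTING},
    SourceState.TESTING: {SourceState.SETUP, SourceState.PUBLISHED},
    SourceState.PUBLISHED: {SourceState.SUSPENDED},
    SourceState.SUSPENDED: {SourceState.PUBLISHED, SourceState.SETUP},
}


class DataSource:
    def __init__(self, title, source_type, config):
        self.title = title                # e.g. "Fox News RSS"
        self.source_type = source_type    # e.g. "rss", "csv", "logstash", "s3"
        self.config = config              # JSON-style settings, as edited in a source builder
        self.state = SourceState.SETUP

    def move_to(self, new_state):
        """Change state, rejecting transitions the rules above do not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot move from {self.state.value} to {new_state.value}")
        self.state = new_state

    @property
    def is_harvesting(self):
        return self.state is SourceState.PUBLISHED


if __name__ == "__main__":
    source = DataSource("Example RSS", "rss", {"url": "https://example.com/feed.xml"})
    source.move_to(SourceState.TESTING)
    source.move_to(SourceState.PUBLISHED)
    source.move_to(SourceState.SUSPENDED)
    print(source.state.value, source.is_harvesting)
```

Forcing every source through a testing state before it can be published is one way to keep untested connectors from feeding live analytics.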
Figure 6. Three-Step Data Ingestion

This eliminates having to wait for IT to add critical data sources to the platform. It allows you to quickly analyze and search across a comprehensive set of threat data, so that your organization can detect and prioritize its defense against potential threats in a timely manner.

COMPREHENSIVE AND COLLABORATIVE VISUALIZATIONS AND REPORTS
ISA includes a series of reporting tools that enable you to compare threats and vulnerabilities by assigning risk levels and tracking cost information, all to help you determine your optimal security strategy. ISA also offers a threat feed tool to help your team determine the ongoing value of threat feeds over time. Additional visualizations help identify patterns across data and the indicators of compromise most relevant to your team. This means you can create comprehensive visualizations across all of your InfoSec analytics data. These visualizations can be shared with team members throughout the analytical process and across levels of your organization. Enterprises can then create the structures needed for self-learning in order to develop accurate pictures of results.

Figure 7. Visualizations and Reports

THIRD-PARTY TOOLS AND APPLICATIONS INTEGRATION
ISA supports the use of multiple third-party tools and applications, and facilitates cleansing of data with many of them to make sharing easier. ISA is also directly integrated with a growing number of third-party applications, including Kibana, which can be accessed directly within ISA for direct comparison of log information. ISA enables the use of Logstash to integrate Kibana and other third-party data analysis tools. This allows users to read and process data through Logstash and analyze it through Kibana, or another tool, at scale.
This includes structured and unstructured threat intelligence data in a format customized to match your SIEM log data or any other format.
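To make external threat intelligence line up with SIEM logs before it reaches Logstash, records can be renamed into the SIEM's field vocabulary and written as newline-delimited JSON, a format Logstash can read with its json_lines codec. The sketch below is a hypothetical illustration; the field names in FIELD_MAP and the intel_extra catch-all are assumptions, not part of ISA.

```python
import json

# Hypothetical mapping from threat-feed field names to SIEM log field names.
FIELD_MAP = {
    "indicator": "dst_ip",
    "threat_type": "event_category",
    "first_seen": "@timestamp",
    "feed": "source_name",
}


def to_siem_record(intel):
    """Rename mapped fields; keep anything unmapped under 'intel_extra'."""
    record, extra = {}, {}
    for key, value in intel.items():
        if key in FIELD_MAP:
            record[FIELD_MAP[key]] = value
        else:
            extra[key] = value
    if extra:
        record["intel_extra"] = extra
    return record


def to_json_lines(items):
    """Serialize one JSON object per line (newline-delimited JSON)."""
    return "".join(json.dumps(to_siem_record(item), sort_keys=True) + "\n" for item in items)


if __name__ == "__main__":
    sample = [{
        "indicator": "203.0.113.7",
        "threat_type": "c2",
        "first_seen": "2015-07-01T00:00:00Z",
        "feed": "example-feed",
        "confidence": 80,
    }]
    print(to_json_lines(sample), end="")
```

Keeping unmapped fields under a single nested key preserves the original intelligence without cluttering the SIEM-style top-level fields.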
Figure 8. Tools and Applications Integration

Using adaptable analytical techniques and measurements that automate the analysis process, including IKANOW's visualization and collaboration functionality, can help you continually optimize your security posture by staying ahead of threats and reducing enterprise risk.

ROBUST SORTING AND SEARCHING
ISA provides data filtering and organization tools that enable you to quickly identify relevant data. Search options range from verb categories to a selection of entity options, and include the ability to tag and save past searches. Once a search query is executed, further filtering options offer additional focus through multiple predefined options, such as recent, oldest, and relevance. You can search a combined set of data from disparate sources and formats to help uncover relationships between internal and external data, hastening the ability to see potential threats and their impact across the network. With this powerful search capability, organizations no longer need to hire a developer or contact their vendors to perform these tasks: the InfoSec team can do this on its own schedule, quickly and easily.
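The search-then-sort flow described above (filter by entity or tag, then order results as recent, oldest, or by relevance) is sketched below in Python. The Doc record, the search function, the saved_searches dictionary, and the assumption that each document carries a numeric relevance score are illustrative, not ISA's API.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Doc:
    title: str
    published: datetime
    score: float                               # relevance score assigned by the query engine
    entities: set = field(default_factory=set)
    tags: set = field(default_factory=set)


# Each sort option maps to (key function, descending?).
SORT_OPTIONS = {
    "recent": (lambda d: d.published, True),
    "oldest": (lambda d: d.published, False),
    "relevance": (lambda d: d.score, True),
}


def search(docs, entity=None, tag=None, order="relevance"):
    """Keep docs matching the optional entity and tag filters, then sort them."""
    hits = [
        d for d in docs
        if (entity is None or entity in d.entities)
        and (tag is None or tag in d.tags)
    ]
    key, descending = SORT_OPTIONS[order]
    return sorted(hits, key=key, reverse=descending)


# Saved searches can be kept as named parameter sets and re-run later.
saved_searches = {"compromised-host": {"entity": "10.0.0.5", "order": "recent"}}
```

Storing a saved search as plain parameters means re-running it is just `search(docs, **saved_searches[name])`.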
IKANOW ISA SOLUTION ARCHITECTURE OVERVIEW
This chapter describes the next-generation system architecture for solutions like ISA that use the developer-level APIs, and highlights the distinct differences between ISA and the core components.

TRADITIONAL VS. NEXT-GENERATION ISA APPLICATION ARCHITECTURE KEY POINTS
In the traditional ISA design, applications interact with the platform through RESTful APIs, with various Java plugins added to the platform to support applications. The next-generation ISA supports application-specific API plugins that are added as a standard operation.

NEXT-GENERATION ISA ARCHITECTURE REQUIREMENTS
The following requirements had to be met to qualify IKANOW's next-generation ISA architecture:
• Write and deploy external harvesters.
• Write stand-alone streaming enrichment engines.
• Develop record-based threads by using the application-specific API plugins.
• View each datum as an "object" with a set of attributes that defines where it is stored and how it can be processed, instead of categorizing it as a "document," "record," or "custom."
• Set the schema at import time and modify it subsequently.
• Plug in different NoSQL technologies based on their capabilities (mapped to the schema), so that processing accesses the most sensible layer.
• Store the original data in HDFS, thus enabling repeatability.
• Allow users to assign roles to each node in the cluster through a centralized management user interface powered by Salt.
• Decouple the user interface and applications more than in the original platform.
• Build an open source platform from the start, with a test infrastructure that enables partners and the community to contribute.
• Write in Scala, a modern JVM-based language, for increased concurrency and reliability.
• Keep the document-based threads.
• Keep the analytics-based threads.
• Provide an Elasticsearch data service with both read and write capabilities.
• Provide an access context for Tomcat.
• Provide most of the management DB functionality, including bucket CRUD, library (plugin) CRUD, share replacement CRUD, and access to the data services.
The next step is to map the ISA functional requirements onto the ISA application model architecture, as illustrated in Figure 9.
Figure 9 illustrates the front-end, back-end, and middleware components of the ISA application architecture, as described in Table 2.

Figure 9. ISA Application Architecture Components

Table 2. ISA Components
Color in Figure 9 | Meaning
Blue | Components of the ISA application layer, as described in Front-End ISA Application Components.
Light Blue | Core components of the ISA architecture, as explained in Back-End ISA Architecture Core Components.
Very Light Blue | Middleware services, as described in Middleware Data Analytics Services.
FRONT-END ISA APPLICATION COMPONENTS
Table 3. Front-End ISA Application Components
Front-End ISA Application Component | Description
Management and Visualization User Interfaces | The ISA Manager is a role-based user management mechanism that enables the ingestion and management of data sources. The Manager for ingestion and management enables a single sheet of sanitized data to be leveraged for analysis, reporting, and visualization. The Management and Visualization user interface helps users understand and express the information they need. The interface helps users formulate their queries, select among available information sources, understand search results, and keep track of the progress of their search. Some of the visualizations that ship with Information Security Analytics require additional data processing jobs to be executed from the platform. An IKANOW resource is required to execute these MapReduce jobs before your data will appear in the visualizations.
ISA Application-Specific API Plugins | The ISA design abstracts extraction, transformation, and loading (ETL), enrichment, and analytics into plugins. ETL tools are pieces of software responsible for extracting data from several sources and then cleansing, customizing, reformatting, integrating, and inserting it into a data warehouse. Building the ETL process is potentially one of the biggest tasks in building a data warehouse.
Buckets and Sources | ISA-specific connectors to external data or for controlling analytics. The Buckets REST API creates, deletes, flushes, and retrieves information about buckets and bucket operations. ISA supports full access to CRUD stores.
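The plugin abstraction in Table 3, where extraction, transformation, and loading are separate interchangeable pieces, can be illustrated with a minimal Python sketch. The class names (ExtractPlugin, TransformPlugin, LoadPlugin) and the run_etl driver are hypothetical; ISA's actual plugins are JVM JARs with their own interfaces.

```python
import csv
import io
from abc import ABC, abstractmethod


class ExtractPlugin(ABC):
    @abstractmethod
    def extract(self):
        """Yield raw records from a source."""


class TransformPlugin(ABC):
    @abstractmethod
    def transform(self, record):
        """Return a cleaned record, or None to drop it."""


class LoadPlugin(ABC):
    @abstractmethod
    def load(self, record):
        """Write one record to its destination."""


class CsvExtract(ExtractPlugin):
    def __init__(self, text):
        self.text = text

    def extract(self):
        yield from csv.DictReader(io.StringIO(self.text))


class CleanseIp(TransformPlugin):
    """Trim whitespace and drop rows that have no IP address."""
    def transform(self, record):
        record = {k: (v or "").strip() for k, v in record.items()}
        return record if record.get("ip") else None


class ListLoad(LoadPlugin):
    def __init__(self):
        self.rows = []

    def load(self, record):
        self.rows.append(record)


def run_etl(extractor, transforms, loader):
    """Run every record through the transform chain; return how many were loaded."""
    loaded = 0
    for record in extractor.extract():
        for step in transforms:
            record = step.transform(record)
            if record is None:
                break
        else:  # no transform dropped the record
            loader.load(record)
            loaded += 1
    return loaded
```

Because the driver only sees the three abstract interfaces, a new source format or destination means writing one new class rather than changing the pipeline.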
BACK-END ISA ARCHITECTURE CORE COMPONENTS
Table 4. Back-End ISA Core Components
Back-End ISA Core Component | Description
Traditional API Plugins | Plugin interfaces cover data harvest, enrichment, and analytics, as well as an access context that grants access to the management DB. This enables direct access to the CRUD operations, including the bucket CRUD and the binary plugin CRUD. Plugin interfaces also enable direct access to the data services at the different locations where data objects can be stored, including HDFS, Elasticsearch, MongoDB for documents, and Titan for entities and associations. Document-based threads are available through these additional APIs, while analytics-based threads are accessible through both the traditional and the ISA application-specific APIs. Supported analytic technologies: Hadoop with MongoDB, HDFS, Elasticsearch input/output, harvest technologies, and enrichments.
Management Database Access Context | MongoDB; unstructured analytics using Elasticsearch and MongoDB.
Data Query Services | Elasticsearch; traditional Source APIs and Buckets APIs.
External Data ISA Connectors | ISA-specific connectors to external data or for controlling analytics, including harvest, enrichment, and analytics context.
MIDDLEWARE DATA ANALYTICS SERVICES
Table 5. Middleware ISA Core Components
Middleware ISA Core Component | Description
Data Enrichment (Enrichment Modules) | A general term referring to processes used to enhance, refine, or otherwise improve raw data. This idea and similar concepts help make data a valuable asset for almost any modern business or enterprise, and reflect the common imperative of using this data proactively in various ways.
Analytic Technologies | ISA enables any analytic engine to be controlled via a bucket (given a Java plugin). In addition to a simple Hadoop interface, ISA provides an analytic engine.
External Data Harvest Technologies | ISA enables Java plugins to control any harvester, while providing the document pipeline and Logstash.
Analytic Modules | Java code to perform ISA-specific analytics (in many cases the plugin is generic in nature, with ISA-specific configuration).
For more information about data objects, see Data Objects under Basic Data Elements.
DATA SOURCE MANAGEMENT
Data sources, or connectors, extract data that visualization widgets then use to provide deeper knowledge or a clearer perception of the data gathered. Source data in the platform is stored in JSON format as documents, where the document format contains elements such as metadata, entities, and associations. Sources are made up of documents that are harvested over time.

DATA SOURCES
Sources are data connectors that pull data from databases, RSS feeds, file shares such as directories, and single files such as PDF, comma-separated values (CSV), XML, or ZIP files. Each data source is assigned a Title (such as Fox News RSS), Tags (for example, News, Politics, Conservative, Republican, US), and a Type (such as News).

DATA SOURCE DOCUMENTS
Each record or piece of data ingested by a source becomes a JSON document, regardless of the format or size of the data. A document can be any of the following:
• Article from an RSS feed
• 140-character Tweet
• Row from a CSV file
• 40-page medical journal
Each JSON document contains:
• A series of metadata fields, including title, description, source ID, date, and time
• Entities, such as Person or IP-internal
• Associations, for example hard (subject, verb, object) vs. soft

Document Entities
Document entities are the who, what, and where extracted from a document:
• Who: Person, Company, Organization
• What: Industry Term, Product, Facility
• Where: City, Province or State, Country

Document Associations
An association is an activity or relationship between entities. It takes the form subject, verb, object, optionally at a location or over time. The subjects and objects can be free text or can point to entities within the document.
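A document with the three parts described above (metadata, entities, associations) might look like the following. This Python sketch is an illustration only: the field names (metadata, name, type, dimension, subject, verb, object, location) are assumptions chosen to mirror this section's vocabulary, not ISA's actual document schema, and placing the IP-internal entity under What is likewise an assumption.

```python
import json


def make_document(title, description, source_id, timestamp, entities, associations):
    """Assemble a document as a plain JSON-serializable dict."""
    return {
        "metadata": {
            "title": title,
            "description": description,
            "source_id": source_id,
            "datetime": timestamp,
        },
        "entities": entities,
        "associations": associations,
    }


def entities_by_dimension(doc):
    """Group entity names into the who / what / where dimensions."""
    grouped = {"Who": [], "What": [], "Where": []}
    for entity in doc["entities"]:
        grouped[entity["dimension"]].append(entity["name"])
    return grouped


doc = make_document(
    title="Blocked outbound connection",
    description="Firewall denied 10.0.0.5 -> 203.0.113.7",
    source_id="example.firewall.csv",
    timestamp="2015-07-14T09:30:00Z",
    entities=[
        {"name": "Acme Corp", "type": "Company", "dimension": "Who"},
        {"name": "10.0.0.5", "type": "IP-internal", "dimension": "What"},
        {"name": "Reston", "type": "City", "dimension": "Where"},
    ],
    associations=[
        # Subject and object point at entities in this document; location is optional.
        {"subject": "Acme Corp", "verb": "operates", "object": "10.0.0.5", "location": "Reston"},
    ],
)

serialized = json.dumps(doc, indent=2)  # the JSON text that would be stored
```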
MATCHING DOCUMENT TYPES
When a query is issued, a large number of "matching" documents often satisfy the query criteria, particularly for a common query like "Obama." In this example, the search yields 4.2 million results, which are not all directly available to the widgets.

Top Documents
From all the matching documents retrieved, a ranked subset is selected according to a configurable scoring method and returned directly to the GUI for analysis. These top documents are an estimate of the most relevant documents. The default number of top documents is 100, meaning that the top 100 of the 4.2 million documents are presented in the widgets.

Filtered Documents
The widget API allows further filtering of the top documents within the GUI by selecting the subset of documents that contain a specific set of entities. This subset is called the filtered documents. In the above example, a filter for "Hillary Clinton" populates the widgets with only those documents that contain both "Obama" AND "Hillary Clinton."

Aggregations
All matching documents contribute to the "knowledge" that a query can provide, but the documents themselves are not the only objects returned from a query. Information relevant to the analysis is also summed, averaged, or otherwise aggregated across all matching documents; these results are referred to as aggregations.
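The three result sets above can be summarized in a short Python sketch: a top-k selection by score (k defaults to 100, as in the text), an entity filter applied only to those top documents, and an aggregation computed over all matching documents. The dictionary layout of a matching document (id, score, entities) is an assumption for illustration.

```python
import heapq
from collections import Counter


def top_documents(matches, k=100):
    """Return the k highest-scoring matches (the 'top documents')."""
    return heapq.nlargest(k, matches, key=lambda d: d["score"])


def filtered_documents(top_docs, required_entities):
    """Keep only top documents that mention every required entity."""
    required = set(required_entities)
    return [d for d in top_docs if required <= set(d["entities"])]


def entity_counts(matches):
    """Aggregate over ALL matches: how many documents mention each entity."""
    counts = Counter()
    for d in matches:
        counts.update(set(d["entities"]))
    return counts


matches = [
    {"id": 1, "score": 0.9, "entities": ["Obama", "Hillary Clinton"]},
    {"id": 2, "score": 0.5, "entities": ["Obama"]},
    {"id": 3, "score": 0.7, "entities": ["Obama", "Hillary Clinton"]},
    {"id": 4, "score": 0.1, "entities": ["Obama", "Hillary Clinton"]},
]

top = top_documents(matches, k=3)  # a tiny k so the effect is visible
```

Document 4 falls outside the top documents, so no widget filter can reach it, yet it still counts toward the aggregations; that is the distinction this section draws.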
BASIC DATA ELEMENTS

DATA OBJECTS
Units of data in ISA are called data objects, which include the following diverse objects:
• Web pages, raw or annotated by natural language processing
• Video files
• Log records, individual or aggregated
• Objects generated by analytics on existing data
• KML overlays
• Aircraft tracks
• Business transactions

DATA SERVICES
Table 6 lists the data services: the logical ways in which data can be stored, indexed, and retrieved.

Table 6. Data Services Types
Data Service Type | Description
Document | An annotated document: a JSON object with a formatted sub-object describing entities, associations between entities, user comments, etc.
Search index | A searchable object.
Columnar | A related set of columns.
Graph | A collection of nodes and edges.
Storage layer | A set of "opaque" objects within a file.
Temporal, geo-spatial | Enables time- and geo-specific processing.
Data warehouse | A relational view of the data, well suited to traditional OLAP-type processing.

Object Schema
How an object is handled by the ISA data services is defined by its schema (DataSchemaBean). The schema describes the properties relevant to each service: for example, which columns should be stored in columnar fashion, how the graph should be constructed from the objects, how long objects should be stored, etc.
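To make the role of the schema concrete, the Python sketch below lets a schema decide which data services receive an object and whether the object has outlived its retention period. DataSchema here is a hypothetical stand-in with invented fields; it does not reproduce the real DataSchemaBean.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


@dataclass
class DataSchema:
    search_index: bool = True                             # index the whole object for search
    columnar_fields: list = field(default_factory=list)   # fields to store column-wise
    graph_edge: Optional[Tuple[str, str]] = None          # (source field, target field)
    retention: Optional[timedelta] = None                 # how long to keep objects


def plan_storage(obj, schema, now):
    """Return {service name: what that service would store} for one object."""
    if schema.retention is not None and now - obj["timestamp"] > schema.retention:
        return {}  # older than the retention period: store nowhere
    plan = {}
    if schema.search_index:
        plan["search_index"] = obj
    if schema.columnar_fields:
        plan["columnar"] = {f: obj[f] for f in schema.columnar_fields if f in obj}
    if schema.graph_edge is not None:
        src, dst = schema.graph_edge
        if src in obj and dst in obj:
            plan["graph"] = (obj[src], obj[dst])
    return plan


schema = DataSchema(
    columnar_fields=["src_ip", "bytes"],
    graph_edge=("src_ip", "dst_ip"),
    retention=timedelta(days=30),
)
```

Because the routing decision lives in the schema rather than in the code that produces objects, the same object stream can be redirected to different services by editing the schema alone.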
DATA IMPORT
Data is imported into ISA by means of buckets (DataBucketBean), which have two properties: harvesting and enrichment.
Harvesting Data Import: Takes data from any transport layer and returns a set of JSON objects.
Enrichment Data Import: Takes data from the harvest, filters unwanted objects, formats or creates the desired fields, and applies internal or external functionality such as geo-location, natural language processing, lookups via other buckets, arbitrary business logic, and so on.

DATA BUCKETS
Each data bucket defines the data schema to be applied to all objects in the bucket. Data buckets also have the standard metadata shown in Table 7.

Table 7. Data Bucket Metadata
Data Bucket Metadata | Description
Access Rights | A set of access rights, as described in Data Security below.
Metadata Grouping | Grouping metadata; buckets can be grouped in a number of different ways, as described in the following three rows.
Multi-Bucket | A specific multi-bucket that is a collection of other buckets. Multiple buckets can be referenced by parent folders.
Bucket File System | Each bucket has a file system hierarchy that physically maps onto where data is stored in the storage service or HDFS.
Multiple Buckets Alias | Each bucket can also be assigned a common alias name that can refer to multiple buckets.

HARVESTING CONFIGURATION
Harvesting configuration consists of three different parts, shown in Table 8.

Table 8. Harvesting Configuration Types
Harvesting Configuration Type | Description
Harvest Technology | A JVM JAR implementation (IHarvestTechologyModule) whose callbacks are invoked whenever predefined actions occur on a bucket, for example when it is created. The harvester is then free to do processing, typically launching or re-configuring external processes such as Hadoop, Flume, Logstash, or a web crawler. This process ingests objects using either the HDFS file interface (for batch operations) or functionality provided by an injected (IHarvestContext) object (for streaming operations).
Harvest Module | Optionally, a set of harvest module JVM JARs whose format is defined by the author of the Harvest Technology. ISA enables the upload, access permissions, discovery, and retrieval of Harvest Module Libraries. These libraries typically provide de-framing of the data from its transport layer and JSON-ification.
Harvest Configuration | A list of Harvest Technology-specific JSON configuration objects.

HARVESTING DATA ENRICHMENT
Enrichment of harvested data can take one of two forms, streaming or batch enrichment, as shown in Table 9.

Table 9. Data Enrichment Forms
Data Enrichment Form | Description
Streaming Data Enrichment | Each object is processed as soon as it is received. Streaming enrichment uses the Storm framework together with Kafka for messaging.
Batch Enrichment | Enrichment is performed on sets of objects. This is more efficient but introduces latency, so it is not suitable for alerting purposes. Batch enrichment uses the Hadoop, YARN, or Spark framework.

Typically only one of the two enrichment forms is used, but they can also be combined. For example, you can take log records, perform batch enrichment on them, and store them efficiently, while performing a smaller set of enrichment processes in near-real time that discard most objects and only broadcast "alerts" to listeners.
Data Enrichment Lists
Enrichment is configured as two lists, one for batch and one for streaming. Table 10 shows what each list contains.
Table 10. Data Enrichment Lists
Data Enrichment Lists | Contents
JVM JAR files | A list of JVM JAR files obtained from the Enrichment Module Library. One of the JAR files in this list must implement IEnrichmentBatchModule or IEnrichmentStreamingModule. The other JAR files can be in arbitrary format and can provide functional libraries, for example the Stanford NLP set of JARs, internal utilities, etc.
JSON configuration object | A JSON configuration object passed into the module at startup.
Dependencies | The dependencies between the modules, which batch processing can use to enrich the objects in parallel.
As with the harvester, enrichment modules have an IEnrichmentModuleContext object injected, which lets them interact with the core framework to filter objects, log errors, and so on.
At the end of the enrichment stage, batch or streaming, the extracted, transformed, and enriched object is automatically passed on to each of the data services defined in its schema for storage and indexing. It can also be broadcast across an object bus for analytics or API listeners to process, as described in Data Analytics below.
Note: A bucket can be created without any harvesting or enrichment. It can point to an existing collection in the database, or it can start as an empty bucket that is then populated either manually or by analytic threads.
DATA ANALYTICS
An Analytic Thread (AnalyticThreadBean) takes data from one or more populated buckets and applies arbitrary further processing using user-defined technologies such as Hadoop, Spark, Storm, Mahout, and Gephi. Each Analytic Thread is contained in the bucket that corresponds to the output location of its results. Table 11 shows the components of an Analytic Thread.
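The Dependencies entry in Table 10 suggests how batch enrichment can run modules in parallel. Modules that do not depend on each other can run side by side, and a module starts only after every module it depends on has finished. The following Java sketch groups modules into such stages using Kahn's topological sort. It is an assumption about how a scheduler could use the declared dependencies, not ISA's own scheduler, and the module names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/**
 * Hypothetical sketch: groups enrichment modules into stages from their declared
 * dependencies (Kahn's topological sort). Modules in one stage do not depend on
 * each other, so a batch framework could run them in parallel; each stage waits
 * for the previous one.
 */
public class EnrichmentStages {

    /** deps maps each module name to the names of the modules it depends on. */
    public static List<List<String>> stages(Map<String, List<String>> deps) {
        Map<String, Integer> unmet = new HashMap<>();            // module -> unmet dependency count
        Map<String, List<String>> dependents = new HashMap<>();  // module -> modules waiting on it
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            unmet.merge(e.getKey(), 0, Integer::sum);             // register the module itself
            for (String dep : e.getValue()) {
                unmet.merge(dep, 0, Integer::sum);                // register dependencies not listed as keys
                unmet.merge(e.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        // Modules with no unmet dependencies form the first stage; TreeSet keeps output order stable.
        TreeSet<String> ready = new TreeSet<>();
        unmet.forEach((name, count) -> {
            if (count == 0) ready.add(name);
        });
        List<List<String>> result = new ArrayList<>();
        int placed = 0;
        while (!ready.isEmpty()) {
            List<String> stage = new ArrayList<>(ready);
            result.add(stage);
            placed += stage.size();
            ready.clear();
            for (String finished : stage) {
                for (String next : dependents.getOrDefault(finished, List.of())) {
                    if (unmet.merge(next, -1, Integer::sum) == 0) {
                        ready.add(next);                          // all of its dependencies are done
                    }
                }
            }
        }
        if (placed != unmet.size()) {
            throw new IllegalArgumentException("Cyclic dependencies between enrichment modules");
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical modules: geo-location and NLP both need parsed fields; the alert rule needs both.
        Map<String, List<String>> deps = Map.of(
                "parse", List.of(),
                "geolocate", List.of("parse"),
                "nlp", List.of("parse"),
                "alertRule", List.of("geolocate", "nlp"));
        System.out.println(stages(deps)); // [[parse], [geolocate, nlp], [alertRule]]
    }
}
```

A cycle, such as two modules that each depend on the other, leaves some modules unplaced. The sketch reports this as an error rather than silently dropping them.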
Table 11. Data Analytics Components
Data Analytics Components | Contents
Analytic Technology | The name of an Analytic Technology: a JVM JAR file implementing IAnalyticsTechnologyModule, whose callbacks are invoked whenever a pre-defined action occurs, such as a user interaction, the completion of another analytic thread, a bucket obtaining more data, or a regular schedule. The analytic technology is responsible for queuing the desired analytics as defined by the remaining items in this table.
Data Query Services | A set of inputs together with associated queries in the "language" of whichever "Data Service" is being used. For example, these could be "search term" queries, temporal queries, geo-spatial queries, "graph" queries, etc.
Analytic Modules | A list of Analytic Modules (JVM JARs managed by the ISA Library) whose format is defined by the corresponding Analytic Technology; a configuration object describing the details of the analytics input, output, etc.; and a set of dependencies within the analytic thread, such as "run module1, then module2".
Analytic Thread | The Analytic Thread runs over the specified data and writes the output into one or more buckets with the appropriate data schemas. The output can treat existing data in the output buckets in one of the following ways:
- Wipe and start again each time
- Add data incrementally
- Merge with existing data
Instead of taking data At Rest from a bucket, objects can be streamed In Flight for real-time or near-real-time analytics and alerting. In this case, the analytic thread registers a bucket name and the pipeline stage from which to receive objects: before enrichment, after enrichment, or in the middle of enrichment (after the named enrichment module).
DATA SECURITY
Security in ISA is delegated to a separate service, typically one that invokes an existing security scheme such as Kerberos and IKANOW ISA.
More on this topic, including the security architecture, is described in the IKANOW Security Architecture Guide (TBD).
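Returning to the Analytic Thread row of Table 11, the following Java sketch contrasts the three ways an Analytic Thread can treat existing data in its output bucket. Everything in it is an assumption for illustration. The `OutputMode` names and the in-memory lists standing in for bucket storage are hypothetical. "Merge" is modeled by matching records on a hypothetical `_id` field, with newer field values overwriting older ones.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of how an analytic thread's results could be combined with
 * data already in its output bucket. Buckets are modeled as in-memory lists of
 * JSON-like maps; records are matched on an assumed "_id" field when merging.
 */
public class AnalyticOutputSketch {

    public enum OutputMode { WIPE, INCREMENTAL, MERGE }

    public static List<Map<String, Object>> write(List<Map<String, Object>> existing,
                                                  List<Map<String, Object>> results,
                                                  OutputMode mode) {
        switch (mode) {
            case WIPE:
                // Wipe and start again: only the new results remain.
                return new ArrayList<>(results);
            case INCREMENTAL: {
                // Add data incrementally: new results are appended after existing data.
                List<Map<String, Object>> out = new ArrayList<>(existing);
                out.addAll(results);
                return out;
            }
            case MERGE: {
                // Merge: a result with the same _id overwrites matching fields of the existing record.
                Map<Object, Map<String, Object>> byId = new LinkedHashMap<>();
                for (Map<String, Object> doc : existing) {
                    byId.put(doc.get("_id"), new LinkedHashMap<>(doc));
                }
                for (Map<String, Object> doc : results) {
                    byId.merge(doc.get("_id"), new LinkedHashMap<>(doc), (old, fresh) -> {
                        old.putAll(fresh);
                        return old;
                    });
                }
                return new ArrayList<>(byId.values());
            }
            default:
                throw new IllegalArgumentException("Unsupported output mode: " + mode);
        }
    }

    public static void main(String[] args) {
        List<Map<String, Object>> existing = List.of(Map.<String, Object>of("_id", "host-a", "alerts", 3));
        List<Map<String, Object>> results = List.of(
                Map.<String, Object>of("_id", "host-a", "alerts", 5),
                Map.<String, Object>of("_id", "host-b", "alerts", 1));
        for (OutputMode mode : OutputMode.values()) {
            System.out.println(mode + " -> " + write(existing, results, mode));
        }
    }
}
```

Wiping suits results that are fully recomputed on each run. Incremental and merge modes suit threads that only process newly arrived data.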
PLUGIN LIBRARIES
The following plugin functionality can be configured from libraries (SharedLibraryBean):
• Harvest Technologies
• Harvest Modules
• Enrichment Modules
• Analytic Technologies
• Analytic Modules
• Access Modules
ISA provides library upload, storage, and retrieval services. Libraries are tagged for discovery and have access tokens assigned to them that determine who can use them. For example, different analytics or APIs can be restricted by "user group," such as commercial tiers for SaaS or divisions in a large organization.
Note: Currently, for security reasons, only the administrator can upload libraries. The administrator then sets the access tokens that decide who can use them.
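The following Java sketch shows tag-based discovery combined with the access-token check described above. It is a hypothetical illustration. The `SharedLibrary` record only loosely mirrors SharedLibraryBean. The rule that a user may use a library when they hold at least one of its tokens is an assumption, not documented ISA behavior.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of tag-based discovery plus access-token checks for
 * shared plugin libraries. Not the ISA SharedLibraryBean API.
 */
public class LibraryAccessSketch {

    /** Hypothetical library entry: name, discovery tags, and the access tokens allowed to use it. */
    public record SharedLibrary(String name, Set<String> tags, Set<String> accessTokens) {}

    /** Assumed rule: a user may use a library if they hold at least one of its access tokens. */
    public static boolean canUse(Set<String> userTokens, SharedLibrary lib) {
        return lib.accessTokens().stream().anyMatch(userTokens::contains);
    }

    /** Discovery: names of libraries carrying the tag that the user may use, sorted by name. */
    public static List<String> discover(List<SharedLibrary> libs, String tag, Set<String> userTokens) {
        return libs.stream()
                .filter(lib -> lib.tags().contains(tag))
                .filter(lib -> canUse(userTokens, lib))
                .map(SharedLibrary::name)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical tokens modeling SaaS commercial tiers and an organizational division.
        List<SharedLibrary> libs = List.of(
                new SharedLibrary("geo-enrichment", Set.of("enrichment"), Set.of("tier:basic", "tier:premium")),
                new SharedLibrary("nlp-enrichment", Set.of("enrichment"), Set.of("tier:premium")),
                new SharedLibrary("graph-analytics", Set.of("analytics"), Set.of("division:research")));
        System.out.println(discover(libs, "enrichment", Set.of("tier:basic")));   // [geo-enrichment]
        System.out.println(discover(libs, "enrichment", Set.of("tier:premium"))); // [geo-enrichment, nlp-enrichment]
    }
}
```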
DATA SOURCE PROCESSING PIPELINE
A pipeline in ISA is the flow through different components in Informatics. A mapping in Informatics may contain Sources, Transformations, and Targets that are connected together to make up a pipeline, and a single mapping can contain many such pipelines. When one pipeline is connected to another, they form a single pipeline.
DATA SOURCE PROCESSING TYPES
The IKANOW ISA platform supports two types of complex processing: input source processing and custom source processing.
Input Source Processing
This type of source processing allows many different types of data to be input as documents or records. Documents are larger, more complex objects, typically generated from more complex XML/JSON as well as from natural-language sources such as websites and reports. The ISA platform provides a powerful pipeline of templated operations to transform these data types into ISA's generic document model. Records are smaller objects such as single-line log records, simple JSON objects, SQL records, and so on. ISA places almost no restrictions on the format of the JSON or on how it is imported into the system, although it integrates particularly well with logstash, the popular community-driven platform to collect, enrich, and transport data.
Custom Source Processing
Custom source processing applies custom logic to existing documents and records to enrich the system with new data and functionality, as shown in Table 12.
Table 12. Custom Processing Sources
New Data and Functionalities | Description
Reports | Such as spreadsheets or statistical data containing directly actionable information.
New records and documents | Typically alerts, or aggregate "events" made up of multiple documents and records.
Lookup tables | Tables that can be used to enrich new and existing documents, for example with local asset information, or to generate alerts for malicious domains.
IKANOW uses the popular Hadoop ecosystem to power its custom processing capabilities, integrating its output, management, monitoring, and security layers.
[Image: Overview of IKANOW Hadoop Data Access Security Product]
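The lookup-table case in Table 12 can be illustrated with a single-process Java sketch. It checks log records against two lookup tables, a list of known-malicious domains and a map of local asset owners, and emits an alert for each hit. In ISA this kind of job would run on the Hadoop ecosystem. The record types, the field names, and the lower-case normalization of domains are hypothetical choices for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of custom processing with lookup tables: existing log
 * records are checked against a malicious-domain table and enriched with local
 * asset information, producing new alert records.
 */
public class LookupTableSketch {

    /** Hypothetical imported record, e.g. from DNS or proxy logs. */
    public record LogRecord(String host, String domain) {}

    /** Hypothetical new record produced by the custom processing. */
    public record Alert(String host, String domain, String assetOwner) {}

    /**
     * @param maliciousDomains lookup table of known-bad domains, stored in lower case
     * @param assetOwners      lookup table of local asset information: host -> owner
     */
    public static List<Alert> alerts(List<LogRecord> records, Set<String> maliciousDomains,
                                     Map<String, String> assetOwners) {
        List<Alert> out = new ArrayList<>();
        for (LogRecord r : records) {
            // Normalize case so "Evil.Example" matches the table entry "evil.example".
            if (maliciousDomains.contains(r.domain().toLowerCase(Locale.ROOT))) {
                out.add(new Alert(r.host(), r.domain(), assetOwners.getOrDefault(r.host(), "unknown")));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<LogRecord> records = List.of(
                new LogRecord("10.0.0.5", "Evil.Example"),
                new LogRecord("10.0.0.6", "intranet.local"),
                new LogRecord("10.0.0.9", "evil.example"));
        Set<String> malicious = Set.of("evil.example");
        Map<String, String> owners = Map.of("10.0.0.5", "finance-laptop-17");
        alerts(records, malicious, owners).forEach(System.out::println);
    }
}
```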
Figure 10 shows how these two activities, input and custom source processing, are related.
Figure 10. Input and Custom Source Processing Relationships
The same JSON-based configuration language, along with its associated user interface, can be used to build and maintain both types of pipelines. Typically the elements do not mix: a pipeline consists entirely of elements from either the "standard" set or the "custom" set, although there are some exceptions described below.
DATA INPUT SOURCES
The ISA architecture makes harvesting and enrichment a more logical process, based on the concept of applying a pipeline of processing elements to documents coming from a source. This capability is illustrated in Figure 11 below:
Figure 11. Pipeline Elements Processing
Pipeline elements can be in any order and have any cardinality. For example, you could create metadata from raw HTML (using xpath), run an automated text extractor, pull more metadata using regex/javascript, return to the original raw text, and then run a different automated extractor before creating entities. A very useful scenario involves running the data through several entity extractors, potentially using the "criteria" field to choose which one to run based on the content and the metadata extracted. As Figure 11 above shows, the pipeline elements can be roughly grouped into the categories shown in Table 13.
Table 13. Pipeline Element Categories
Pipeline Element Categories | Descriptions
Extractors | Generates mostly empty ISA documents from external data sources.
Global | Generates javascript artifacts that can be used by subsequent pipeline elements.
Secondary extractors | Enables large numbers of new documents to be produced from existing metadata.
Text extraction | Manipulates the raw document content.
Metadata | Generates document metadata such as title, description, and date, as well as arbitrary content metadata using xpath, regex, and javascript.
Entities and associations | Creates entities and associations from the text.
Storage and indexing | Decides which documents to keep, which fields to keep, and which full text to index for searching through the GUI/API.
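The following Java sketch illustrates element ordering and the "criteria" field described above by running an ordered list of processing elements over a document modeled as a map of fields. It is hypothetical and does not reflect ISA's JSON pipeline configuration. The `Element` record, the regex-based title and entity extraction (standing in for xpath, javascript, and real entity extractors), and the `sourceType` field used as criteria are all assumptions for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of an ordered pipeline of processing elements applied to
 * one document. Each element has a "criteria" predicate evaluated against the
 * document as it stands at that step, so it can react to metadata produced by
 * earlier elements.
 */
public class PipelineSketch {

    public record Element(String name, Predicate<Map<String, String>> criteria,
                          UnaryOperator<Map<String, String>> op) {
        /** An element without criteria runs on every document. */
        public static Element always(String name, UnaryOperator<Map<String, String>> op) {
            return new Element(name, doc -> true, op);
        }
    }

    /** Runs the elements in order on a copy of doc; names of the elements that ran are added to trace. */
    public static Map<String, String> run(List<Element> pipeline, Map<String, String> doc, List<String> trace) {
        Map<String, String> current = new HashMap<>(doc);
        for (Element e : pipeline) {
            if (e.criteria().test(current)) {
                current = e.op().apply(current);
                trace.add(e.name());
            }
        }
        return current;
    }

    /** Returns all matches of regex in text, joined with commas (empty string if none). */
    static String findAll(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.group());
        }
        return String.join(",", hits);
    }

    /** Metadata, then text extraction, then one of two entity extractors selected by criteria. */
    public static List<Element> examplePipeline() {
        return List.of(
                Element.always("titleMetadata", d -> {
                    d.put("title", findAll("(?<=<title>).*?(?=</title>)", d.get("raw")));
                    return d;
                }),
                Element.always("textExtraction", d -> {
                    d.put("text", d.get("raw").replaceAll("<[^>]+>", " ").replaceAll("\\s+", " ").trim());
                    return d;
                }),
                new Element("ipEntities", d -> "log".equals(d.get("sourceType")), d -> {
                    d.put("entities", findAll("\\b\\d{1,3}(?:\\.\\d{1,3}){3}\\b", d.get("text")));
                    return d;
                }),
                new Element("cveEntities", d -> "report".equals(d.get("sourceType")), d -> {
                    d.put("entities", findAll("CVE-\\d{4}-\\d{4,}", d.get("text")));
                    return d;
                }));
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        Map<String, String> doc = run(examplePipeline(), Map.of(
                "raw", "<html><title>Advisory</title><p>Patch CVE-2014-0160 now</p></html>",
                "sourceType", "report"), trace);
        System.out.println(trace + " " + doc.get("entities"));
    }
}
```

Because criteria are evaluated per document at each step, the same pipeline sends log-like sources to one extractor and report-like sources to another. This mirrors the "several entity extractors" scenario above.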