Haystax Technology Labs presentation of a white paper on advanced threat analytics at the 9th International Conference on Semantic Technologies for Intelligence, Defense, and Security (STIDS)
Processing Events in Probabilistic Risk Assessment
1. Processing events in probabilistic risk assessment
9th International Conference on Semantic
Technologies for Intelligence, Defense, and
Security (STIDS). November 20, 2014
Annotated presentation—see Notes Page view.
2. Three event-informed person risk models
1. MC (“Carbon”):
Information disclosure risk
Belief that a (candidate) member P
will disclose an organization’s private
information
Life (“macro”) events
Education, employment
Crime, civil judgment
Bankruptcy, credit
…
2. MS (“Silicon”):
IT system insider exploitation risk
Belief that a user will access, disclose,
or destroy an organization’s computer
network-resident information
Computer network (“micro”) events
Log in after hours
Access “decoy” file
Copy file to…
External location
Thumb drive
3. MG = MC • MS
NOTE: Carbon and Silicon are names of Haystax Analytic Products
3. Theme
Issue: Apply event evidence to person attribute concept random variables
(RVs) in a risk assessment Bayesian network (BN), modeling events’ changing
relevance over time.
Given:
Person P
Events E, in P’s past or present
Generic person BN B
Risk-related person attribute concept RVs (Boolean)
Concept-relating probabilistic influences
A reference time t (in an ordered set T of such points)
Develop:
Person-specific BN BP reflecting E
Beliefs in P’s attribute concept at t, per BP
(P’s historical risk profile over T)
4. Elided B with ingested event categories (MC)
Trustworthy
Reliable
CommitsMisdemeanor
CommittedToSchool … CommittedToCareer
School events Employment events
Law
enforcement
events
…
…
5. Approaches to realizing BP
1. Event “ingestion”:
For each event e in E, …
Include a new event RV δ indicating
person attribute concept π in BP
Specify per-event half-life decay as a
new temporal relevance RV ρ
Enter hard evidence finding on δ
Appropriate when events of a
given type τ are individually salient
Feasible when |E| << |nodes(B)|
Ingestion
π ρ
concept relevance
δ
event
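The temporal relevance RV ρ carries a half-life decay. A minimal sketch of such a weight follows, in Python for illustration only; the function name, day units, and continuous form are our assumptions, not the Haystax implementation (which runs in Common Lisp against Netica).

```python
def temporal_relevance(age_days: float, half_life_days: float) -> float:
    """Weight for an event of the given age: halves every half_life_days.
    Illustrative form only; ingestion rules specify half-lives per event type."""
    return 0.5 ** (age_days / half_life_days)

# E.g., an event with a 2-year half-life, observed 1 year later:
weight = temporal_relevance(365, 730)  # ~0.71
```

This is the nominal specification only; as discussed later, belief propagation in BP can make the effective relevance depart from it.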
7. Three event-informed person risk models
1. MC (“Carbon”):
Information disclosure risk
100s of RVs
B extracted from official policy /
guidelines (under in situ test)
Life (“macro”) events
10s of types
10s of events / person
10s of years of data
Ingestion only (“hard” salience)
10s of rules
2. MS (“Silicon”):
IT system insider exploitation risk
10s of RVs
B eyeballed (preliminary proof of
concept)
Computer network (“micro”) events
10s of types
100Ks of events / person
1.5 years of data
Summarization, primarily (“soft”
salience)
1s of ingestion rules
3. MG = MC • MS
8. Three event-informed person risk models
2. MS (“Silicon”):
IT system insider exploitation risk
Belief that a user will access, disclose,
or destroy an organization’s computer
network-resident information
Computer network (“micro”) events
Log in after hours
Access “decoy” file
Copy file to…
External location
Thumb drive
3. MG = MC • MS
9. Approaches to realizing BP
2. Event “summarization”:
For each event type τ represented in
E, …
Include an event “summary” RV Δ
indicating π in B
Develop a likelihood summarizing the
impact of events τ collected into
temporal buckets
Enter likelihood finding on Δ
Appropriate when the salience of
events of type τ tends to depend on
trends w.r.t. an individual or a
population thereof
Useful when ¬(|E| << |nodes(B)|)
Summarization
concept relevance
π ρ
Δ
summary
events δ1 δ2 … δn
10. Summarization elements (per RV)
Summarize events over a practically unlimited duration, by using temporal
buckets of geometrically increasing size.
Infer salience from event volume variation w.r.t. a person’s own and the
population’s history.
Weight buckets per desired temporal relevance decay.
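The nested geometric buckets can be sketched as follows (an illustrative Python rendition; the base-2 growth factor and day granularity are assumptions, since the deck says only that bucket sizes increase geometrically).

```python
def bucket_counts(event_ages_days, num_buckets):
    """Nested temporal buckets of geometrically increasing span:
    the bucket labeled n counts only events from the most recent n days."""
    return {2 ** k: sum(1 for age in event_ages_days if age < 2 ** k)
            for k in range(num_buckets)}

# Events observed 0, 1, 3, and 10 days ago:
bucket_counts([0, 1, 3, 10], 5)  # {1: 1, 2: 2, 4: 3, 8: 3, 16: 4}
```

Because the buckets are nested, a fixed, small number of counters summarizes a practically unlimited event history.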
16. Approaches to realizing BP
2. Event “summarization”:
For each event type τ represented in
E, …
Include an event “summary” RV Δ
indicating π in B
Develop a likelihood summarizing the
impact of events τ collected into
temporal buckets
Enter likelihood finding on Δ
Appropriate when the salience of
events of type τ tends to depend on
trends w.r.t. an individual or a
population thereof
Useful when ¬(|E| << |nodes(B)|)
Summarization
concept relevance
π ρ
Δ
summary
events δ1 δ2 … δn
21. Ingestion issue: Interacting temporal relevance nodes
Temporal relevance nodes participate in belief propagation in BP—making
their beliefs (so, effective temporal relevance) subject to departure from
nominal specification.
Multiple temporal and/or semantically close events’ relevance nodes
reinforce each other—inducing temporal relevance beyond nominal
specification.
Five simultaneous events’ beliefs decay only 6% after a half-life interval,
where we might naively expect 50%.
Summarization largely insulates a temporal relevance node from surrounding
belief propagation.
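The reinforcement effect can be illustrated with a toy log-odds calculation (not the Carbon network itself: the prior and likelihood ratios below are invented, and shrinking a likelihood ratio only loosely stands in for the relevance-node mechanism). Independent evidence items multiply the odds, so five decayed items jointly retain far more belief than one decayed item does.

```python
def posterior(prior, likelihood_ratios):
    """Posterior belief after conditionally independent evidence items,
    by multiplying the prior odds by each item's likelihood ratio."""
    odds = prior / (1.0 - prior)
    for lam in likelihood_ratios:
        odds *= lam
    return odds / (1.0 + odds)

# One event, full strength vs. half-decayed (likelihood ratio 4 -> 2):
single_full, single_decayed = posterior(0.1, [4.0]), posterior(0.1, [2.0])
# Five simultaneous events under the same per-event decay:
five_full, five_decayed = posterior(0.1, [4.0] * 5), posterior(0.1, [2.0] * 5)
# The five-event belief drops proportionally far less than the single-event belief.
```

The exact 6%-vs-50% figures depend on the actual network; the toy shows only the direction of the effect.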
22. Supporting software “stack”
Allegro Common Lisp® (ACL)
AllegroGraph® Lisp direct client
Allegro Prolog macros (e.g., select)
Lisp macros (e.g., iterate-cursor)
ACL API to the Netica® API
Netica® API
27. Approaches to realizing BP
1. Event “ingestion”:
For each event e in E, …
Include a new event RV δ indicating
person attribute concept π in BP
Specify per-event half-life decay as a
new temporal relevance RV ρ
Enter hard evidence finding on δ
Appropriate when events of a
given type τ are individually salient
Feasible when |E| << |nodes(B)|
2. Event “summarization”:
For each event type τ represented in
E, …
Include an event “summary” RV Δ
indicating π in B
Develop a likelihood summarizing the
impact of events τ collected into
geometrically larger buckets
Enter likelihood finding on Δ
Appropriate when the salience of
events of type τ tends to depend on
trends w.r.t. an individual or a
population thereof
Needed when ¬(|E| << |nodes(B)|)
31. MS
Summarization metric: Count (CopyDecoyToExternal)
Event type instance count
32. MS
Summarization metric: Variation re self (CopyDecoyToExternal)
Event type historical variation re self
33. MS
Summarization metric: Variation re all (CopyDecoyToExternal)
Event type historical variation re all
34. MS
Summarization metric: Suspicion warrant (CopyDecoyToExternal)
Event type summary RV likelihood (suspicion warrant)
Editor's Notes
This work has been conducted at Haystax Technology’s headquarters in McLean, VA USA.
Our paper begins with some generic motivation for person risk assessment and then moves to actual models we’ve built. We’ll try to make it real right away in this talk.
Our model “Carbon” (because we are assessing risk for carbon-based life forms) assesses information disclosure risk based on a person’s life (or “macro”) events.
Our model “Silicon” (because computers are silicon-based) assesses IT system exploitation risk based on a person’s computer network (or “micro”) events.
Carbon is older and more mature (under deployment). Silicon is an exploratory proof of concept. MG (no brand name, yet) is an early exercise.
To formalize things (just) a little bit, we have this problem statement. We’ll jump right into a Carbon example with a historical risk profile.
Our risk assessment model core is a Bayesian network (BN) using binary random variables (top). The full Carbon BN includes too many nodes to show here. As our top-level proxy for information disclosure risk, we offer the person attribute concept Trustworthy.
Person attribute concepts serve both as hypotheses and as indicators. BN arcs point in the causal direction, from hypotheses to indicators. Indication strength is suggested by line thickness, polarity by other line format. BN is per qualitative specification, forthcoming, of which this is a graphical rendition.
Carbon “ingests” events (as suggested at bottom) to realize a person-specific BN.
A concept like CommitsMisdemeanor is indicated by an event like MisdemeanorAssault.
The (horizontal) time axis does double duty, for events (bars) and beliefs (lines). Hint: compare most rapidly changing beliefs to events (visual sensitivity analysis).
We have applied the model to develop beliefs at the plotted time points.
Note how beliefs in CommittedToSchool and CommittedToCareer tend to build while the related (HighSchoolAttendance and Employment) events are ongoing. Influence interactions in B cause belief in CommittedToCareer to grow even while P is still in high school. (We tend to believe that someone who does well in school will also do well in a career.) Belief in CommittedToSchool increases when P graduates but then becomes less relevant per the half-lives specified in ingestion rules for school-related events. The 2007 MisdemeanorAssault charge decreases belief in all the other, positive concept RVs. See also Lisp macro calls expressing associated event data, forthcoming.
Questions? (We will see this format again.)
Source: STIDS-2014-A
Our Carbon and Silicon risk models have been driven by qualitatively different event sets. While the event set for Carbon is sensitive, we can tell you that the event set for Silicon is a synthetic insider threat dataset that had been generated by US CERT for DARPA’s ADAMS program.
Because Silicon must address so many more events—and because these events have qualitatively different salience—we invented a different scheme to specialize the BN w.r.t. a given person’s events.
Remember what kinds of events Silicon is dealing with…
That is, we use a fixed BN with “summary” RVs for the different event types.
In summarization, we attain three objectives simultaneously.
Event volume compaction
Event temporal relevance decay
Event volume variation characterization
The paper’s figure 4 exhibits key metrics for a US CERT dataset person. Here, we have contrived data for a given event type (for a made-up person) to have a linear increase after 33 days (of 64 plotted). Each temporal bucket labeled n counts events in only the most recent n days.
The difference between adjacent buckets tells us what is “new” vs. “old” in the longer bucket. The ratio of this difference to the shorter bucket tells us this person’s variation w.r.t. this bucket size. We normalize this ratio to the range [0, 1] using a sigmoid function.
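That bucket-pair metric might be sketched as below (illustrative only: the talk does not give the sigmoid's parameters, and the neutral value returned for an empty short bucket is our assumption).

```python
import math

def variation_re_self(short_count, long_count):
    """Variation for one adjacent (nested) bucket pair: the difference
    (long - short) is the 'old' volume; its ratio to the 'new' (short)
    bucket is squashed into [0, 1] with a plain logistic sigmoid."""
    if short_count == 0:
        return 0.5  # no recent events: neutral score (illustrative choice)
    ratio = (long_count - short_count) / short_count
    return 1.0 / (1.0 + math.exp(-ratio))
```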
Note that different-sized buckets have different derivatives w.r.t. time.
Variation w.r.t. the population uses a simple comparison against a computed statistic (e.g., mean).
We average the foregoing variations on our way to informing Big Delta (Δ). (Alternatively, we could include separate RVs in B.)
Finally, we collapse variation means for all the buckets into a single “suspicion warrant” score for a given day and enter this as a likelihood finding on Big Delta (Δ).
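The collapse into a single score might look like this (a sketch: the geometric down-weighting of larger buckets is our reading of "weight buckets per desired temporal relevance decay", and the decay factor is invented).

```python
def suspicion_warrant(bucket_variations, decay=0.5):
    """Collapse per-bucket variation scores in [0, 1] (most recent bucket
    first) into one [0, 1] score, down-weighting larger, older buckets."""
    weights = [decay ** k for k in range(len(bucket_variations))]
    return sum(w * v for w, v in zip(weights, bucket_variations)) / sum(weights)
```

The resulting daily score would then be entered as the likelihood finding on Δ.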
This is just a reminder…
All that leads to this belief timeline plot for our made-up person.
From the appendix (topic of a future paper…)
Compare this to the following slide about MG, where we link Carbon and Silicon via the concepts Trustworthy and Untrustworthy. Keep your eye on the (red) belief line for Silicon’s top-level concept, ExploitsITSystemAsInsider. (It’s going to turn umber…)
…and get a lot steeper at the marked point—because we have given this person some life events that affect not only his trustworthiness but (in MG) also our belief that he exploits the IT system as an insider.
So, we have presented some results from each of our three models.
The paper discusses strengths and weaknesses in ingestion’s and summarization’s different treatments of temporal relevance. We’ve reduced that to one “issue” slide here—so that we can say more at this venue about our supporting semantic technology.
We exploit semantic technology for…
Event ontology definition and application
Ingestion rule definition.
AllegroGraph® is an RDF triple store management system that happens to be written in Allegro Common Lisp®. While Franz supports AllegroGraph® clients for a number of different languages, the direct (vs. remote) Lisp client benefits us in that it shares memory with AllegroGraph® itself. Allegro Prolog®, written in and included in Allegro Common Lisp®, is a logic programming facility that the Lisp direct client extends with Lisp macros and Prolog predicates affording access (alternatively to SPARQL) to AllegroGraph® triple stores.
Because Allegro Prolog® supports calls to Lisp functions from within logic programming rules, our ingestion rules can invoke the Allegro Common Lisp® API to the Netica® API to augment an existing generic person Bayesian network (BN) model B to add random variables (RVs) corresponding to a person P’s events E, resulting in a person-specific BN BP.
defIngestionRule is a macro wrapping Allegro Prolog® <-, registering the ingestion rule and performing static analysis to ensure well-formedness. We call +process-reportedEvent, the ingestion rule predicate, to launch the ingestion process for a given person as of a given time. When a reportedEvent matches, our create-EventConceptIndication is called with the bound logic variables.
Allegro Prolog® includes predicate-level functors supporting logical operations (e.g., and, or), backtracking control (varieties of if, cut), and Lisp calls evaluated at predicate level for their truth values (i.e., not just execution for side effect as here). Under AllegroGraph®’s direct Lisp client, user-defined Allegro Prolog® rules (so ingestion rules and their supporting predicates) may include any RDF resources (i.e., URIs) or literals in their heads or bodies. So, the language of ingestion rules is relatively expressive.
With its signature treatment of programs as data (both expressed as lists), Lisp has long been a favorite language for creating embedded knowledge representation languages and supporting utilities. We exploit this facility in designing our models’ ontologies for person-related events—using Lisp macros to express class, property, and individual (instance) definitions.
Macro calls here add triples to a specified graph in an active store. Store-resident triples may be serialized to a standard OWL file in (e.g.) RDF/XML format, then viewed in an available ontology browser (e.g., Protégé). For a specified class (e.g., Person), an object or datatype property (e.g., hasGender or startDate) is created per the type (e.g., Gender or Date) specified. OWL closed enumeration classes (e.g., Gender) are supported, as are OWL property types (e.g., Functional) and restrictions (e.g., cardinality). Validation machinery ensures a specified ontology’s global consistency with respect to effective cardinalities allowed.
The framework validates any loaded dataset with respect to declared subject and object classes, literal data types, and property types (e.g., Functional) and restrictions (e.g., cardinality).
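For instance, the functional-property check could be sketched as below (Python for illustration; the real machinery operates on AllegroGraph triples via the Lisp client, and the tuple representation here is a simplification).

```python
def functional_violations(triples, functional_props):
    """Return the (subject, property) pairs that bind a declared-functional
    property to more than one distinct object -- a cardinality violation."""
    seen = {}
    for s, p, o in triples:
        if p in functional_props:
            seen.setdefault((s, p), set()).add(o)
    return sorted(key for key, objs in seen.items() if len(objs) > 1)

# A person may have at most one gender if hasGender is declared functional:
functional_violations(
    [("p1", "hasGender", "Male"), ("p1", "hasGender", "Female"),
     ("p2", "hasGender", "Male"), ("p1", "startDate", "2001")],
    {"hasGender"})  # [("p1", "hasGender")]
```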