Jesse Shaw, LexisNexis Risk Solutions, presents at the 2015 HPCC Systems Summit Community Day.
Accelerate the exploratory analytics process to rapidly produce valuable insights when approaching new business problems or untested data sources by leveraging HPCC Systems’ Knowledge Engineering Language (KEL). KEL enables the creation, organization and extraction of data dimensions with a fraction of the ECL source code previously required. This presentation will explain how the graph analytics KEL features can be used to track the spread of Ebola throughout the US.
9. Everything Is Going To KEL
Digging Out - Exploring Massive Data
KEL
Data Ingest
Data Delivery
ECL So cute.
Data Aggregation
10. Discovery Analytics: Tracking Ebola Spread
Digging Out - Exploring Massive Data
KEL
2,000 (tedious) Lines
300,000 (I’m-not-working-here-anymore tedious) Lines
C++ / ECL
120 Lines
KEL
11. Everything Is Going To KEL
Digging Out - Exploring Massive Data
KEL
KEL allows us to control data STRUCTURE and INTERACTIONS through ENTITIES and ASSOCIATIONS.
Data knowledge becomes PERSPECTIVE driven.
12. Everything Is Going To KEL
Entities, Associations, and Perspective
en·ti·ty - a thing with distinct and independent existence.
The ENTITIES derived from a medical claim:
• A Person
    • Patient
    • Policy Holder
    • Referring Provider
    • Attending Provider
• A Facility
    • Billing
    • Provider
• The Claim
    • Details
13. Everything Is Going To KEL
Entities, Associations, and Perspective
as·so·ci·a·tion - a connection or cooperative link between entities.
ASSOCIATIONS bind ENTITIES together.
• Two providers found on the same claim
• Addresses found in the same geographic boundary
• People found on the same property deed
    • Buyers
    • Sellers
• People found at the same address
These connections create the GRAPH
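The way associations turn entities into a graph can be sketched outside KEL. The Python below is only an illustration, not KEL and not the presenter's code: the claim and provider identifiers are hypothetical, and `build_association_graph` is a name invented here. It links every pair of providers that appear on the same claim.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (claim_id, provider_id) rows; not data from the presentation.
claim_providers = [
    ("claim-1", "prov-A"),
    ("claim-1", "prov-B"),
    ("claim-2", "prov-B"),
    ("claim-2", "prov-C"),
    ("claim-3", "prov-A"),
]

def build_association_graph(rows):
    """Link every pair of entities that share the same grouping key."""
    members = defaultdict(set)
    for key, entity in rows:
        members[key].add(entity)

    graph = defaultdict(set)
    for entities in members.values():
        # Each pair found under one key becomes an undirected edge.
        for left, right in combinations(sorted(entities), 2):
            graph[left].add(right)
            graph[right].add(left)
    return dict(graph)

graph = build_association_graph(claim_providers)
```

The same function would cover the other associations listed (people at the same address, buyers and sellers on the same deed) by swapping the grouping key.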
14. Everything Is Going To KEL
Entities, Associations, and Perspective
per·spec·tive - a point of view or way of regarding something.
The GRAPH provides incredible data aggregation flexibility.
17. Discovery Analytics: Tracking Ebola Spread
Digging Out – Answer Important Questions!
Where are the Jesse Shaws?
Person := ENTITY(FLAT(UID=did, fname, lname, st, city_name, zip),MODEL(*));
USE header.File_Headers(FLAT,Person);
Person: => isJS := IF(fname = 'JESSE' AND lname = 'SHAW', TRUE, FALSE);
QUERY: StatesJS <= Person(isJS=TRUE){st, cnt := GROUP:Count};
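For readers unfamiliar with KEL, the logic of the query above (flag each person named Jesse Shaw, then count the flagged people per state) can be approximated in Python. This is a rough analogue with made-up sample rows, not a translation the presentation provides; `states_for_name` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical sample rows using the field names from the KEL ENTITY above.
people = [
    {"did": 1, "fname": "JESSE", "lname": "SHAW", "st": "FL"},
    {"did": 2, "fname": "JESSE", "lname": "SHAW", "st": "GA"},
    {"did": 3, "fname": "JESSE", "lname": "SHAW", "st": "FL"},
    {"did": 4, "fname": "JESSE", "lname": "JONES", "st": "FL"},
    {"did": 5, "fname": "MARIA", "lname": "SHAW", "st": "OH"},
]

def states_for_name(rows, fname, lname):
    """Count matching people per state, like Person(isJS=TRUE){st, cnt}."""
    return Counter(
        row["st"]
        for row in rows
        if row["fname"] == fname and row["lname"] == lname
    )

states_js = states_for_name(people, "JESSE", "SHAW")
```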
20. Discovery Analytics: Tracking Ebola Spread
Example Shell – BRCA Proof of Concept
In May 2013, Angelina Jolie publicly announced she was having a preventive double mastectomy.
• She tested positive for BRCA1 and BRCA2 mutations.
• Her mother died from breast cancer.
21. Discovery Analytics: Tracking Ebola Spread
Example Shell – BRCA Proof of Concept
• Expected October seasonality
• Search volume decreased and normalized
• Overall search volume decreased, suggesting:
    • Public knowledge appreciated
    • Public desensitization to the issue
• Peak search volume narrowed, suggesting:
    • Shorter public attention span
    • Desensitization to the issue
22. Discovery Analytics: Tracking Ebola Spread
Example Shell – BRCA Proof of Concept
The KEL data structure comprised just 4 ENTITIES:
• Person
• Facility
• Claim
• Claim Details
A time series was constructed based on:
• BRCA CPT CODES
• YEAR
• MONTH
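A time series keyed on procedure code, year and month amounts to a filtered group-and-count. The sketch below shows that shape in Python under stated assumptions: the CPT codes are placeholders (the presentation does not list the actual BRCA codes), the rows are invented, and `monthly_series` is a hypothetical name.

```python
from collections import Counter

# Placeholder codes standing in for the BRCA CPT codes; not real values.
BRCA_CODES = {"CODE_A", "CODE_B"}

# Hypothetical claim-detail rows with ISO service dates.
claim_details = [
    {"cpt": "CODE_A", "service_date": "2013-04-18"},
    {"cpt": "CODE_A", "service_date": "2013-05-20"},
    {"cpt": "CODE_B", "service_date": "2013-05-28"},
    {"cpt": "OTHER", "service_date": "2013-05-30"},
    {"cpt": "CODE_B", "service_date": "2013-06-02"},
]

def monthly_series(rows, codes):
    """Count rows whose code is in `codes`, grouped by (year, month)."""
    counts = Counter()
    for row in rows:
        if row["cpt"] in codes:
            year, month, _day = row["service_date"].split("-")
            counts[(int(year), int(month))] += 1
    # Sort chronologically so the result reads as a time series.
    return sorted(counts.items())

series = monthly_series(claim_details, BRCA_CODES)
```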
26. Discovery Analytics: Tracking Ebola Spread
FAU – Rapid Response Model
The Team:
Dr. Borko Furht
Dr. Dingding Wang
Dr. Ankur Agarwal
Dr. Hari Kalva
27. Discovery Analytics: Tracking Ebola Spread
FAU – Disease Propagation Proof of Concept
Cohesivity and Cluster Connectivity
High social network cohesion has been shown to correlate positively with the probability of idea dispersion. This raises the question of whether a highly cohesive physical network would likewise promote disease propagation.
28. Discovery Analytics: Tracking Ebola Spread
FAU – Disease Propagation Proof of Concept
Two initial goals:
• Create people clusters based on proximity
• Build a simple, weighted score approximating physical contact
Bonus!
• Create a mechanism to crawl clusters based on edge weights
29. Discovery Analytics: Tracking Ebola Spread
FAU – Disease Propagation Proof of Concept
Historic address creation
Use address history to find a person’s most recent address.
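Picking the most recent address per person is a keep-the-latest reduction over the history. A minimal Python sketch follows, assuming each history row carries a person id and a last-seen date; the field names, addresses and the `most_recent_address` helper are all hypothetical.

```python
from datetime import date

# Hypothetical address history rows; field names are assumptions.
address_history = [
    {"did": 1, "address": "12 Oak St", "last_seen": date(2009, 3, 1)},
    {"did": 1, "address": "98 Palm Ave", "last_seen": date(2014, 7, 15)},
    {"did": 2, "address": "5 Elm Rd", "last_seen": date(2012, 1, 9)},
    {"did": 1, "address": "40 Pine Ct", "last_seen": date(2011, 11, 30)},
]

def most_recent_address(rows):
    """Return {did: address}, keeping the row with the latest date per person."""
    latest = {}
    for row in rows:
        current = latest.get(row["did"])
        if current is None or row["last_seen"] > current["last_seen"]:
            latest[row["did"]] = row
    return {did: row["address"] for did, row in latest.items()}

current_addresses = most_recent_address(address_history)
```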
30. Discovery Analytics: Tracking Ebola Spread
FAU – Disease Propagation Proof of Concept
Cluster creation
Use the public record graph to calculate distances
between a person and their relatives.
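The presentation does not say how distance is measured. Assuming it means geographic distance between the most recent addresses, one common choice is the haversine great-circle formula. The sketch below uses invented coordinates and hypothetical relative ids; it is not the method confirmed by the source.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a standard approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical coordinates for a person and two relatives.
person = (26.37, -80.10)
relatives = {"rel-1": (26.37, -80.10), "rel-2": (27.37, -80.10)}

# Distance from the person to each relative; these become edge weights.
distances = {
    rel_id: haversine_km(*person, *coords)
    for rel_id, coords in relatives.items()
}
```

Thresholding these distances is one way the proximity-based clusters could be formed.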
33. Discovery Analytics: Tracking Ebola Spread
FAU – Disease Propagation Proof of Concept
Controlling Backtracking
KEL allows for the creation of node traversal rules through a pattern definition.
GLOBAL: Relations(#1,#2,#dist1) => D1(#1,#2);
GLOBAL: Relations(#1,#2,#dist1),
        Relations(#2,#3,#dist2),
        #1<>#3 => D2(#1,#2,#dist1,COUNT(#3));
GLOBAL: D2(#1,#2,#dist1,#cnt1),
        D2(#2,#3,#dist2,#cnt2),
        #1<>#3,
        NOT D2(#1,#3)
        => D2Paths(#1,#3,#dist1+#dist2,#cnt1+#cnt2,#2);
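One reading of the patterns above: for each edge, count the onward neighbours of its far end (excluding the node it came from), then join two edges into a two-hop path only when the end points differ and are not already directly linked. The Python below sketches that reading on an invented graph. It is an interpretation, not a verified translation of KEL semantics, and `two_hop_paths` is a hypothetical name.

```python
from collections import defaultdict

# Hypothetical undirected weighted edges: (node, node, distance).
edges = [("P", "Q", 2), ("Q", "S", 3), ("P", "R", 1), ("R", "S", 3), ("S", "T", 4)]

def two_hop_paths(edge_list):
    """Return (start, end, distance, onward_count, via) for two-hop paths."""
    dist = {}
    neighbors = defaultdict(set)
    for a, b, d in edge_list:
        dist[(a, b)] = d
        dist[(b, a)] = d
        neighbors[a].add(b)
        neighbors[b].add(a)

    # For a directed edge a->b: neighbours of b other than a (no backtracking).
    onward = {(a, b): len(neighbors[b] - {a}) for (a, b) in dist}

    paths = []
    for (a, b) in dist:
        for c in neighbors[b]:
            # Skip stepping back to the start and pairs already directly linked.
            if c == a or c in neighbors[a]:
                continue
            paths.append(
                (a, c, dist[(a, b)] + dist[(b, c)], onward[(a, b)] + onward[(b, c)], b)
            )
    return sorted(paths)

paths = two_hop_paths(edges)
```

The `c == a` check plays the role of `#1<>#3`, and the `c in neighbors[a]` check plays the role of `NOT D2(#1,#3)`.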
34. Discovery Analytics: Tracking Ebola Spread
FAU – Disease Propagation Proof of Concept
Controlling Backtracking
Resulting Set for D2Paths:
Node 1  Node 2  Node 3  Distance  Leaves
A       B       D       5         1
A       C       D       4         1
35. Discovery Analytics: Tracking Ebola Spread
FAU – Disease Propagation Proof of Concept
Root  Sink  Path                         Distance  D1 Count  Path Length  PCT of Graph
4     18    4, 5, 7, 9, 10, 13, 17, 18   7         10        8            100%
4     18    4, 8, 12, 16, 18             7         6         5            61%
4     18    4, 8, 15, 18                 15        6         4            56%
Most infectious, Shortest path…
Using distance edge weights, find the path with the most first-degree nodes.
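A small Python sketch of that search: enumerate the simple paths from root to sink, score each by total distance and by the number of distinct first-degree nodes it touches, then keep the path with the most first-degree nodes, breaking ties on shorter distance. The graph, the tie-break, and the definition of the count (nodes one hop from the path but not on it) are assumptions made here, not details given in the presentation.

```python
from collections import defaultdict

# Hypothetical undirected weighted edges: (node, node, distance).
edges = [(1, 2, 1), (2, 5, 1), (1, 3, 2), (3, 4, 2), (4, 5, 2),
         (2, 6, 1), (3, 7, 1), (3, 8, 1), (4, 9, 1)]

def simple_paths(neighbors, node, sink, path=None):
    """Yield every path from node to sink that never revisits a node."""
    path = (path or []) + [node]
    if node == sink:
        yield path
        return
    for nxt in sorted(neighbors[node]):
        if nxt not in path:
            yield from simple_paths(neighbors, nxt, sink, path)

def score_path(path, neighbors, dist):
    """Total edge distance plus count of off-path nodes one hop away."""
    total = sum(dist[frozenset(pair)] for pair in zip(path, path[1:]))
    first_degree = set().union(*(neighbors[n] for n in path)) - set(path)
    return {"path": path, "distance": total, "d1_count": len(first_degree)}

def most_infectious_path(edge_list, root, sink):
    neighbors = defaultdict(set)
    dist = {}
    for a, b, d in edge_list:
        neighbors[a].add(b)
        neighbors[b].add(a)
        dist[frozenset((a, b))] = d
    scored = [score_path(p, neighbors, dist)
              for p in simple_paths(neighbors, root, sink)]
    # Most first-degree nodes wins; shorter distance breaks ties.
    return max(scored, key=lambda s: (s["d1_count"], -s["distance"]))

best = most_infectious_path(edges, root=1, sink=5)
```

Exhaustive enumeration is only practical on small clusters; a larger graph would need pruning or a bounded search depth.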