Introduction to the Open Source HPCC Systems® Platform

Arjuna Chala
Sr. Director of Technology Development
  
A Use Case Introduction to HPCC Systems
  
Why HPCC Systems - Example 1: Insurance Collusion

THE CHALLENGE
•  Detecting insurance claim fraud
•  The insurance company's data found a connection between only two of the seven claims, and identified only one other claim as being weakly connected
  
Example 1: Insurance Collusion

THE SOLUTION
Customer claim data is linked with the LexisNexis® Risk Solutions data using the HPCC Systems platform.

THE RESULT
•  The results showed two family groups (Family 1 and Family 2) interconnected on all seven claims
•  The links were much stronger than the carrier data previously supported
  
Example 2: Bust Out Fraud

THE CHALLENGE
•  Individual (unconnected) accounts were defaulting
•  Some accounts were flagged as fraud once contact was lost with individuals
•  It was challenging for the financial institution to understand the depth and width of the fraud
  	
  
Example 2: Bust Out Fraud

THE SOLUTION
5 million accounts flagged with active, known fraud, charge-off and preemptively closed tags

THE RESULT
•  31 accounts associated with 3 fraud accounts (1 degree of separation)
•  212 accounts associated with 2 known charge-off accounts (2 degrees of separation)
•  Identified a ring leader with 8 first-degree associates and 72 second-degree associates
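The degree-of-separation counts above can be illustrated with a breadth-first search over account links. This is only a minimal sketch in Python, not the ECL that the HPCC Systems platform would run; the function name `degrees_of_separation`, the account identifiers, and the toy links are all hypothetical.

```python
from collections import deque

def degrees_of_separation(links, seeds, max_degree=2):
    """Return {account: degree} for accounts within max_degree hops of a seed."""
    # Build an undirected adjacency map from (account, account) link pairs.
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Known fraud accounts are the seeds, at degree 0.
    degree = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        if degree[current] == max_degree:
            continue  # do not expand beyond the requested depth
        for nxt in neighbors.get(current, ()):
            if nxt not in degree:
                degree[nxt] = degree[current] + 1
                queue.append(nxt)
    return degree

# Hypothetical toy data: F1 is a known fraud account.
links = [("F1", "A"), ("F1", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
result = degrees_of_separation(links, seeds=["F1"])
first_degree = sorted(a for a, d in result.items() if d == 1)
second_degree = sorted(a for a, d in result.items() if d == 2)
print(first_degree)   # ['A', 'B']
print(second_degree)  # ['C']
```

Account D is three hops from F1, so it is left out when `max_degree` is 2.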
  
The Core Workflow: Learn and Make Decisions
  
LEARN Workflow

1. Raw historic data (e.g. duplicate names, unclean phone numbers)
2. Profile, clean and normalize
3. Entity disambiguation and linking (aka MDM)
4. Social network graph creation
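The LEARN steps can be sketched end to end on a few toy records. This is a simplified Python illustration under loud assumptions: the field names, the exact-match linking rule, and the "shared address" edge rule are all hypothetical, and real entity linking is typically probabilistic rather than exact.

```python
import re
from collections import defaultdict
from itertools import combinations

def normalize(record):
    """Clean one raw record: collapse whitespace and case, keep phone digits."""
    name = " ".join(record["name"].lower().split())
    phone = re.sub(r"\D", "", record["phone"])[-10:]
    return {**record, "name": name, "phone": phone}

def link_entities(records):
    """Group records that agree on (name, phone) into one entity."""
    entities = defaultdict(list)
    for r in records:
        entities[(r["name"], r["phone"])].append(r)
    return entities

def build_graph(entities):
    """Connect two entities when any of their records share an address."""
    by_address = defaultdict(set)
    for key, recs in entities.items():
        for r in recs:
            by_address[r["address"]].add(key)
    edges = set()
    for keys in by_address.values():
        for a, b in combinations(sorted(keys), 2):
            edges.add((a, b))
    return edges

# Hypothetical raw historic data with a duplicate name and unclean phones.
raw = [
    {"name": "Jane  DOE", "phone": "(555) 010-1234", "address": "1 Main St"},
    {"name": "jane doe", "phone": "555.010.1234", "address": "1 Main St"},
    {"name": "John Roe", "phone": "555-010-9999", "address": "1 Main St"},
]
entities = link_entities([normalize(r) for r in raw])
edges = build_graph(entities)
print(len(entities))  # 2
print(edges)
```

The two Jane Doe rows collapse into one entity after cleaning, and the shared address produces a single edge between the two resulting entities.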
  
DECISION Workflow (Real-time or Batch)

1. Customer inquiry data
2. Social network graph analysis
3. Outcome
  
The Data Centric Approach

A single source of data is insufficient to overcome inaccuracies in the data. The holes are inaccuracies found in the data.

Our platform is built on the premise of absorbing data from multiple data sources and transforming it into highly intelligent social network graphs that can be manipulated to extract the non-obvious value. The holes in the core data have been eliminated.
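As a rough illustration of how a second source can fill the holes left by the first, the sketch below merges per-source records for one entity field by field. The source names, field names, and the simple priority rule are assumptions made for this example only.

```python
def fuse(records_by_source, priority):
    """Merge one entity's records, taking each field from the first
    source in priority order that has a non-empty value for it."""
    fused = {}
    for source in priority:
        for field, value in records_by_source.get(source, {}).items():
            if value not in (None, "") and field not in fused:
                fused[field] = value
    return fused

# Hypothetical records: each source has a different hole.
sources = {
    "carrier": {"name": "Jane Doe", "phone": None, "address": "1 Main St"},
    "public_records": {"name": "Jane Doe", "phone": "5550101234", "address": ""},
}
fused = fuse(sources, priority=["carrier", "public_records"])
print(fused)
```

Neither source alone has a complete record, but the fused record carries a name, an address, and a phone number.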
  
Our Solutions Are Powered by HPCC Systems at Their Core

Big Data: Structured Records, Unstructured Records, News Articles, Proprietary Data, Public Records

Unstructured and Structured Content
•  Over 4 petabytes of content
•  50 billion records
•  10,000 sources
•  7.5 billion unique name and address combinations

High Performance Computing Cluster Platform (HPCC)
•  Grid computing
•  Data-centric language (ECL)
•  Integrated delivery system that offers data plus analytics

Analysis Applications
•  Multi-bureau/multi-source models and bureau roll-over support
•  Extensive experience leveraging atomic level data, combining and leveraging disparate data
•  Approximately 400 models deployed (custom and flagship)

Key Capabilities
•  Data and analytics
•  Identity verification and authentication
•  Fraud detection and prevention
•  Investigation
•  Screening
•  Receivables management

[Diagram labels: Fusion, Linking, Refinery; Open Source Components; Complex Analysis, Clustering Analysis, Link Analysis, Entity Resolution; Financial Services, Government, Health Care, Insurance, Legal, Retail, Scientific Technical Medical, Exhibitions]
  
The Technology: Super Fast, Parallel, Graph Analytics
  
STRIKE Technology Overview

Data sources: SAP, Oracle ERP, RDBMS, Flat Files, IoT Terminals, JMS, Others

INTERLOK (Integration System)
Data Integration
•  Connect
•  Integrate
•  Schedule
•  Transform

Thor (Standardization & Aggregation System)
Standardization
•  Clean
•  Profile
•  Normalize
Aggregation
•  Master Data Creation
•  Relationship Analysis
•  Predictive Analysis
•  Business Intelligence

ROXIE (Query Delivery System)

DSP (Visualization System)
  
ECL: A Powerful Data Flow Language

[Figure: how you code vs. how the system executes it]
  
Graph Data Can Be Represented Using Native Support for Hierarchical Data and Index Pointers
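The idea of storing a graph as hierarchical records plus an index can be sketched in a few lines. This is a Python illustration rather than ECL, and the record layout, the `associates` child rows, and the `index` mapping are all hypothetical: each person row nests its own edges, and the index maps an id to the row's position so an edge can be followed without scanning.

```python
# Parent rows with nested child rows: each person carries its own edges.
people = [
    {"id": 1, "name": "Jane Doe",
     "associates": [{"id": 2, "via": "shared address"}]},
    {"id": 2, "name": "John Roe",
     "associates": [{"id": 1, "via": "shared address"},
                    {"id": 3, "via": "shared phone"}]},
    {"id": 3, "name": "Ann Poe",
     "associates": [{"id": 2, "via": "shared phone"}]},
]

# Index: id -> position of the row in `people` (the "pointer").
index = {p["id"]: pos for pos, p in enumerate(people)}

def associates_of(person_id):
    """Follow the nested edges of one person through the index."""
    person = people[index[person_id]]
    return [people[index[a["id"]]]["name"] for a in person["associates"]]

print(associates_of(2))  # ['Jane Doe', 'Ann Poe']
```

Keeping the edges nested inside the parent row means one lookup returns a node together with its neighbours, and each neighbour is then one index lookup away.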
  
STRIKE Technology Layer View

[Diagram. Layers: Data Science Portal, Analytics Tools, Common Programming Language, Data Connect. Components: Dashboard Creator, Workflow Builder, Cleaning, MDM, Profiling, Normalization, Predictive Analysis, Business Intelligence, Attribute Creation, Relationship Analysis, SALT, KEL, ECL, Thor, ROXIE, Interlok]
  
HPCC Systems has evolved to address IoT, Blockchain…
  
Industry Example: Smart Hat

THE CHALLENGE
4,000 workers die and millions are injured annually while working on the industrial floor.
Maintaining safety is very costly for businesses.
  
Industry Example: Smart Hat

THE SOLUTION
1. Factory readings (temp, pressure, CO, CO2)
2. Real-time alerts
3. Update monitoring station
4. Emergency updates

[Diagram: sensor-equipped Wi-Fi hardhats, Prediction Engine, Central Monitoring Station]

THE OUTCOME: Produced an industrial wearable that uses IoT and wireless communications systems to protect and empower industrial workers.
  	
  
Next Generation HPCC Systems Goal

•  Real-time data collection, analysis and alerting to enable IoT
•  Enable event-driven workflows like managing Blockchain ledgers and Driver Behavior

[Diagram. Settings: FACTORIES, HOME, OUTSIDE, OFFICES, CITIES, VEHICLES. Use cases: Autonomous Vehicles & Driver Behavior; Security & Energy; Public Health, Safety, Security & Transportation; Logistics & Navigation; Safety, Operations & Equipment Optimization; Automation & Security]
  
Blockchain (Napster for contracts) Enables Event Driven Contracts
Example: Healthcare

[Diagram: eight numbered steps around a Shared Ledger, with the labels below]
•  Doctor informs patient that they need to exercise
•  Patient agrees to exercise regimen
•  Smart Contract with an initial $ value is created
•  Contract details are updated to a shared ledger
•  Patient exercises
•  Patient wearable records exercise information
•  Patient wearable updates analytics engine periodically
•  Analytics Engine updates the ledger by increasing or decreasing the value of the contract
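The event-driven contract flow above can be sketched with a tiny hash-chained ledger. This is a single-process Python illustration, not a distributed blockchain (there is no network or consensus), and the class name `Ledger`, the 30-minute target, and the 5-unit step are all hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64

def _digest(prev_hash, payload):
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()

class Ledger:
    """Append-only list of blocks, each hashing its predecessor."""
    def __init__(self):
        self.blocks = []

    def append(self, payload):
        prev_hash = self.blocks[-1]["hash"] if self.blocks else GENESIS
        block = {"prev": prev_hash, "payload": payload,
                 "hash": _digest(prev_hash, payload)}
        self.blocks.append(block)
        return block

    def verify(self):
        """Recompute every hash; any edited block breaks the chain."""
        prev = GENESIS
        for b in self.blocks:
            if b["prev"] != prev or b["hash"] != _digest(b["prev"], b["payload"]):
                return False
            prev = b["hash"]
        return True

def contract_value(ledger):
    # The current value is the sum of all recorded changes.
    return sum(b["payload"]["delta"] for b in ledger.blocks)

def analytics_update(ledger, minutes, target=30, step=5):
    """Analytics engine: raise or lower the contract value per update."""
    delta = step if minutes >= target else -step
    ledger.append({"event": "exercise_update", "minutes": minutes, "delta": delta})

ledger = Ledger()
ledger.append({"event": "contract_created", "delta": 100})  # initial $ value
for minutes in (45, 40, 10):  # periodic wearable updates
    analytics_update(ledger, minutes)
print(contract_value(ledger))  # 105
print(ledger.verify())         # True
```

Because each block's hash covers the previous hash, changing an earlier update after the fact makes `verify()` return False.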
  
Q&A
  
Resources

•  Portal: hpccsystems.com
•  ECL Language Reference: https://hpccsystems.com/ecl-language-reference
•  SALT: https://hpccsystems.com/enterprise-services/purchase-required-modules/SALT
•  KEL: https://hpccsystems.com/download/free-modules/kel-lite
•  Machine Learning: http://hpccsystems.com/ml
•  Online Training: http://learn.lexisnexis.com/hpcc
•  HPCC Systems Blog: http://hpccsystems.com/blog
•  HPCC Systems Wiki & Red Book: https://wiki.hpccsystems.com
•  Our GitHub portal: https://github.com/hpcc-systems
•  Community Forums: http://hpccsystems.com/bb
•  Case Studies: https://hpccsystems.com/resources/case-studies
  
