SlideShare a Scribd company logo
1 of 26
Download to read offline
Overview of HPCC Systems and Case Studies 
Presenter: Brian Bounds, Director of Software Engineering 
November 2014
Context 
Business Services: 39% 
• 
About Brian 
• 
HPCC Systems in the Context of LexisNexis 
• 
Isn’t Big Data just Data?
WHT/082311 
http://hpccsystems.com 
Components of the Open Source HPCC Systems platform 
• 
The HPCC Systems platform includes: 
• 
Thor: batch oriented data manipulation, linking and analytics engine 
• 
Roxie: real-time data delivery and analytics engine 
• 
A high level declarative dataflow language: ECL 
• 
Implicitly parallel 
• 
No side effects 
• 
Code/data encapsulation 
• 
Extensible 
• 
Highly optimized 
• 
Builds graphical execution plans 
• 
Compiles into C++ and native machine code 
• 
Common to Thor and Roxie 
• 
An extensive library of ECL modules, including data profiling, linking, graph analytics, and Machine Learning
The LexisNexis Open Source HPCC Systems platform
WHT/082311 
Detailed HPCC Systems Platform Architecture
WHT/082311 
HPCC Systems Overview Video
WHT/082311 
• 
The Programming Language (ECL) 
• 
The Delivery Engine (ROXIE) 
• 
Enterprise Readiness 
• 
Big Data … becomes Data (that might be Big) 
Features to Care About
WHT/082311 
Case Study #1 (Enterprise) – LexisNexis Risk Solutions
WHT/082311 
Case Study #1 (Enterprise) – LexisNexis Risk Solutions 
We are among the largest providers of risk solutions in the market today 
LexisNexis 
Reed Elsevier is a world leading provider of information solutions. 
LexisNexis Risk Solutions Revenue: $M 
LexisNexis Risk Solutions has seen sustained revenue and profit growth 
Government 8% 
LexisNexis Risk Solutions is a leading provider in the U.S. across Business Services, Insurance and Government segments. 
Insurance: 53% 
Business Services: 
39% 
LexisNexis Risk Solutions Revenue by Segment
WHT/082311 
We are among the largest providers of risk solutions in the market today 
LexisNexis 
Our customers include: 
• 
99 of the top 100 US banks 
• 
90% of the Fortune 500 
• 
100% of US P&C insurance carriers 
• 
All 50 US states, 70% of local governments and 80% of US federal agencies 
• 
97 of Am Law 100 firms 
Government 8% 
LexisNexis Risk Solutions is a leading provider in the U.S. across Business Services, Insurance and Government segments. 
Insurance: 53% 
Business Services: 39% 
LexisNexis Risk Solutions Revenue by Segment 
Case Study #1 (Enterprise) – LexisNexis Risk Solutions
WHT/082311 
Case Study #1 (Enterprise) – LexisNexis Risk Solutions 
We have a unique set of capabilities: Data, Linking, Analytics, and Product Development 
Customer-Focused Solutions 
+ 
+ 
= 
Linking & Analytics 
+ 
Industry-Specific Expertise & Delivery 
Data Technology 
Vast Data Resources 
• 
Process 
• 
Sources 
• 
Coverage 
• 
Advanced linking & analytics 
• 
Accuracy & efficiency 
• 
Protect private information 
• 
Aligned with our customers’ industries 
• 
Deep industry expertise 
• 
Speed 
• 
Capacity 
• 
Cost savings 
• 
Predict, manage and assess risk across many industries.
WHT/082311 
Case Study #1 (Enterprise) – LexisNexis Risk Solutions 
Data Source 
# of records 
Data Source 
# of records 
Associates/Relatives 
1.8 Billion 
People at Work 
1.5 Billion 
Bankruptcy 
23 Million 
Private Phones 
172 Million 
Business BDID's 
283 Million 
Professional Licenses 
94 Million 
Business People Links 
959 Million 
Property 
2.5 Billion 
Canadian Phones 
62 Million 
Sex Offenders 
550,000 
Consumer Header 
10.8 Billion 
SSN's 
7.2 Billion 
Criminal 
216 Million 
Student Records 
38 Million 
Date of Birth 
5.2 Billion 
TIN 
2.9 Million 
Death 
98 Million 
Unique ADLs - active 
257 Million 
Drivers Licenses 
397 Million 
Utility 
645 Million 
EDA Phones 
124 Million 
Vehicle Titles 
635 Million 
FEINs 
10.4 Million 
Vehicle Registrations 
2.5 Billion 
Historical Phones 
800 Million 
White Pages 
116 Million 
Hunting and Fishing Licenses 
67 Million 
Wireless Phones 
101 Million 
Liens and Judgments 
244 Million 
Yellow Pages 
14 Million 
Access to more than 25 billion public record filings 
Break-down of record counts for the more popular data sets:
WHT/082311 
Case Study #1 (Enterprise) – LexisNexis Risk Solutions
WHT/082311 
Case Study #2 (Boil the Ocean) – Property Transaction Risk
WHT/082311 
Case Study #2 (Boil the Ocean) – Property Transaction Risk 
Flipping 
Profit 
Collusion 
Three core transaction variables measured 
• 
Velocity 
• 
Profit (or not) 
• 
Buyer to Seller Relationship Distance (Potential of Collusion)
WHT/082311 
Case Study #2 (Boil the Ocean) – Property Transaction Risk 
Large Scale Suspicious Cluster Ranking 
±700 mill Deeds 
Derived Public Data Relationships from +/- 50 terabyte database 
Collusion Graph Analytics 
Chronological Analysis of all property Sales 
Historical Property Sales 
Indicators and Counts 
Person / Network Level Indicators and Counts 
Data Factory Clean
WHT/082311 
Case Study #2 (Boil the Ocean) – Property Transaction Risk
WHT/082311 
Case Study #2 (Boil the Ocean) – Property Transaction Risk 
Large scale measurement of influencers strategically placed to potentially direct suspicious transactions. 
• 
All data on one supercomputer measuring over a decade of property transfers nationwide. 
• 
Data Products to turn other Data into compelling intelligence. 
• 
Large Scale Graph Analytics allow for identifying known unknowns. 
• 
Florida Proof of Concept 
– 
Highest ranked influencers 
 
Identified known ringleaders in flipping and equity stripping schemes. 
 
Typically not connected directly to suspicious transactions. 
– 
Known ringleaders not the Highest Ranking. 
• 
Clusters with high levels of potential collusion. 
• 
Clusters offloading property, generating defaults. 
• 
Agile Framework able to keep step with emerging schemes in real estate.
WHT/082311 
Case Study #3 (Serendipity) – Family Ties
WHT/082311 
Case Study #3 (Serendipity) – Family Ties between Claims 
Scenario This view of carrier data shows seven known fraud claims and an additional linked claim. The Insurance company data only finds a connection between two of the seven claims, and only identified one other claim as being weakly connected.
WHT/082311 
Case Study #3 (Serendipity) – Family Ties between Claims 
Task After adding the LexID to the carrier Data, LexisNexis HPCC technology then explored 2 additional degrees of relative separation Result The results showed two family groups interconnected on all of these seven claims. The links were much stronger than the carrier data previously supported. 
Family 1 
Family 2
WHT/082311 
Case Study #4 (Tactical) – 30 Hour Job
WHT/082311 
Case Study #4 (Tactical) – 30 Hour Job 
Objective: 
• 
Re-engineer long-running legacy process (proof-of-concept) 
• 
3m+ rows in … 500m rows out … 30+ hours 
• 
Use similar hardware to maximize comparability Pre-Coding Set Up: 
• 
Legacy developers … completed on-line ECL training 
• 
1 ECL developer … exposed to legacy process and data 
• 
Dump of input data and known result files 
• 
Create HPCC hardware environment comparable to legacy environment 
• 
Amazon AWS 4 x m1.xlarge total (3 x m1.xlarge Thor Slaves) 
• 
12-slave CPUs Coding: 
• 
Meet-up for 1 week coding session
WHT/082311 
Case Study #4 (Tactical) – 30 Hour Job 
Legacy Environment: 
• 
Oracle on Intel SMP 
• 
16 cores HPCC Environment: 
• 
Amazon AWS 
• 
1 x m1.xlarge (support node) 
• 
3 x m1.xlarge (Thor Slaves) 
• 
12-slave Cores Results: 
• 
3 Days of Coding 
• 
450-ish Lines of ECL 
• 
Legacy Run-Time: 30+ hours 
• 
HPCC Run-Time: 1.5 hours
WHT/082311 
Additional Infomation 
 
LexisNexis Open Source HPCC Systems Platform: http://hpccsystems.com 
 
Free Online Training: http://learn.lexisnexis.com/hpcc 
 
SALT: http://hpccsystems.com/products-and-services/products/modules/SALT 
 
Machine Learning portal: http://hpccsystems.com/ml 
 
The HPCC Systems blog: http://hpccsystems.com/blog 
 
Community Forums: http://hpccsystems.com/bb 
 
Our GitHub portal: https://github.com/hpcc-systems 
 
JIRA: https://track.hpccsystems.com
WHT/082311 
Thank you!

More Related Content

What's hot

What's hot (9)

Infochimps Cloudcon 2012
Infochimps Cloudcon 2012Infochimps Cloudcon 2012
Infochimps Cloudcon 2012
 
HPCC Systems - Using Big Data to Help Feed the World
HPCC Systems - Using Big Data to Help Feed the WorldHPCC Systems - Using Big Data to Help Feed the World
HPCC Systems - Using Big Data to Help Feed the World
 
Big Data & Cloud - Infinite Monkey Theorem
Big Data & Cloud - Infinite Monkey TheoremBig Data & Cloud - Infinite Monkey Theorem
Big Data & Cloud - Infinite Monkey Theorem
 
Big data security challenges and recommendations!
Big data security challenges and recommendations!Big data security challenges and recommendations!
Big data security challenges and recommendations!
 
Pouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryPouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy Industry
 
Risk listening: monitoring for profitable growth
Risk listening: monitoring for profitable growthRisk listening: monitoring for profitable growth
Risk listening: monitoring for profitable growth
 
Introduction to the Open Source HPCC Systems Platform by Arjuna Chala
Introduction to the Open Source HPCC Systems Platform by Arjuna ChalaIntroduction to the Open Source HPCC Systems Platform by Arjuna Chala
Introduction to the Open Source HPCC Systems Platform by Arjuna Chala
 
Connecting Home/Building, Life and Car..The Importance of Insurance Risk Moni...
Connecting Home/Building, Life and Car..The Importance of Insurance Risk Moni...Connecting Home/Building, Life and Car..The Importance of Insurance Risk Moni...
Connecting Home/Building, Life and Car..The Importance of Insurance Risk Moni...
 
BI on Big Data with instant response times at Verizon
BI on Big Data with instant response times at VerizonBI on Big Data with instant response times at Verizon
BI on Big Data with instant response times at Verizon
 

Viewers also liked

Youtube es una pagina en la que se pude ver videos de cualquier clase ya que ...
Youtube es una pagina en la que se pude ver videos de cualquier clase ya que ...Youtube es una pagina en la que se pude ver videos de cualquier clase ya que ...
Youtube es una pagina en la que se pude ver videos de cualquier clase ya que ...
federico23456
 
Hanson Dodge Creative - Active Lifestyle Portfolio
Hanson Dodge Creative - Active Lifestyle PortfolioHanson Dodge Creative - Active Lifestyle Portfolio
Hanson Dodge Creative - Active Lifestyle Portfolio
Bryan Rasch
 
Annual report of working children program 2010
Annual report of working children program 2010Annual report of working children program 2010
Annual report of working children program 2010
Asher Nazir Asher Nazir
 
El español en el mercado de la traducción en sudáfrica prospección
El español en el mercado de la traducción en sudáfrica prospecciónEl español en el mercado de la traducción en sudáfrica prospección
El español en el mercado de la traducción en sudáfrica prospección
hotsavannah
 
4home.cz - Katalog podzim 2011
4home.cz - Katalog podzim 20114home.cz - Katalog podzim 2011
4home.cz - Katalog podzim 2011
4home
 
Successful collaboration and team dynamics at the masters level (1) (1)
Successful collaboration and team dynamics at the masters level (1) (1)Successful collaboration and team dynamics at the masters level (1) (1)
Successful collaboration and team dynamics at the masters level (1) (1)
chandprice
 

Viewers also liked (20)

Data Analytics Governance and Ethics
Data Analytics Governance and EthicsData Analytics Governance and Ethics
Data Analytics Governance and Ethics
 
HPCC Presentation
HPCC PresentationHPCC Presentation
HPCC Presentation
 
Turnaround guide
Turnaround guideTurnaround guide
Turnaround guide
 
Procesos QED de bajo orden
Procesos QED de bajo ordenProcesos QED de bajo orden
Procesos QED de bajo orden
 
Youtube es una pagina en la que se pude ver videos de cualquier clase ya que ...
Youtube es una pagina en la que se pude ver videos de cualquier clase ya que ...Youtube es una pagina en la que se pude ver videos de cualquier clase ya que ...
Youtube es una pagina en la que se pude ver videos de cualquier clase ya que ...
 
Perfil
PerfilPerfil
Perfil
 
Akademija komercijalne komunikacije
Akademija komercijalne komunikacijeAkademija komercijalne komunikacije
Akademija komercijalne komunikacije
 
Mountaineer 2013 08-16
Mountaineer 2013 08-16Mountaineer 2013 08-16
Mountaineer 2013 08-16
 
Hanson Dodge Creative - Active Lifestyle Portfolio
Hanson Dodge Creative - Active Lifestyle PortfolioHanson Dodge Creative - Active Lifestyle Portfolio
Hanson Dodge Creative - Active Lifestyle Portfolio
 
The kase presentation
The kase presentationThe kase presentation
The kase presentation
 
Assessment of gender equality in ethiopia
Assessment of gender equality in ethiopiaAssessment of gender equality in ethiopia
Assessment of gender equality in ethiopia
 
Tourisme : comment Internet modifie-t-il
Tourisme : comment Internet modifie-t-il Tourisme : comment Internet modifie-t-il
Tourisme : comment Internet modifie-t-il
 
54550972 oswald-chambers-my-utmost-for-his-highest
54550972 oswald-chambers-my-utmost-for-his-highest54550972 oswald-chambers-my-utmost-for-his-highest
54550972 oswald-chambers-my-utmost-for-his-highest
 
Annual report of working children program 2010
Annual report of working children program 2010Annual report of working children program 2010
Annual report of working children program 2010
 
El español en el mercado de la traducción en sudáfrica prospección
El español en el mercado de la traducción en sudáfrica prospecciónEl español en el mercado de la traducción en sudáfrica prospección
El español en el mercado de la traducción en sudáfrica prospección
 
You are the media
You are the mediaYou are the media
You are the media
 
4home.cz - Katalog podzim 2011
4home.cz - Katalog podzim 20114home.cz - Katalog podzim 2011
4home.cz - Katalog podzim 2011
 
Webinar: Financial Management - Dashboard Overview
Webinar: Financial Management - Dashboard OverviewWebinar: Financial Management - Dashboard Overview
Webinar: Financial Management - Dashboard Overview
 
Successful collaboration and team dynamics at the masters level (1) (1)
Successful collaboration and team dynamics at the masters level (1) (1)Successful collaboration and team dynamics at the masters level (1) (1)
Successful collaboration and team dynamics at the masters level (1) (1)
 
Technology trends that will shape our future
Technology trends that will shape our futureTechnology trends that will shape our future
Technology trends that will shape our future
 

Similar to HPCC Systems Presentation to TDWI Chicago Chapter

Running Enterprise Workloads with an Open Source Hybrid Cloud Data Architecture
Running Enterprise Workloads with an Open Source Hybrid Cloud Data ArchitectureRunning Enterprise Workloads with an Open Source Hybrid Cloud Data Architecture
Running Enterprise Workloads with an Open Source Hybrid Cloud Data Architecture
DataWorks Summit
 
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
Denodo
 
Running Enterprise Workloads with an open source Hybrid Cloud Data Architectu...
Running Enterprise Workloads with an open source Hybrid Cloud Data Architectu...Running Enterprise Workloads with an open source Hybrid Cloud Data Architectu...
Running Enterprise Workloads with an open source Hybrid Cloud Data Architectu...
DataWorks Summit
 
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Denodo
 
Big Data Matching - How to Find Two Similar Needles in a Really Big Haystack
Big Data Matching - How to Find Two Similar Needles in a Really Big HaystackBig Data Matching - How to Find Two Similar Needles in a Really Big Haystack
Big Data Matching - How to Find Two Similar Needles in a Really Big Haystack
Precisely
 
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address RequirementsGov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
DataWorks Summit
 
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
Denodo
 

Similar to HPCC Systems Presentation to TDWI Chicago Chapter (20)

Big Data Bootcamp 2017 - Atlanta - Flavio Villanustre
Big Data Bootcamp 2017 - Atlanta - Flavio VillanustreBig Data Bootcamp 2017 - Atlanta - Flavio Villanustre
Big Data Bootcamp 2017 - Atlanta - Flavio Villanustre
 
Dell Digital Transformation Through AI and Data Analytics Webinar
Dell Digital Transformation Through AI and  Data Analytics WebinarDell Digital Transformation Through AI and  Data Analytics Webinar
Dell Digital Transformation Through AI and Data Analytics Webinar
 
Running Enterprise Workloads with an Open Source Hybrid Cloud Data Architecture
Running Enterprise Workloads with an Open Source Hybrid Cloud Data ArchitectureRunning Enterprise Workloads with an Open Source Hybrid Cloud Data Architecture
Running Enterprise Workloads with an Open Source Hybrid Cloud Data Architecture
 
Virtual Gov Day - Introduction & Keynote - Alan Webber, IDC Government Insights
Virtual Gov Day - Introduction & Keynote - Alan Webber, IDC Government InsightsVirtual Gov Day - Introduction & Keynote - Alan Webber, IDC Government Insights
Virtual Gov Day - Introduction & Keynote - Alan Webber, IDC Government Insights
 
A Successful Data Strategy for Insurers in Volatile Times (EMEA)
A Successful Data Strategy for Insurers in Volatile Times (EMEA)A Successful Data Strategy for Insurers in Volatile Times (EMEA)
A Successful Data Strategy for Insurers in Volatile Times (EMEA)
 
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
 
MatterPoint Overview
MatterPoint OverviewMatterPoint Overview
MatterPoint Overview
 
State of Florida Neo4J Graph Briefing - Keynote
State of Florida Neo4J Graph Briefing - KeynoteState of Florida Neo4J Graph Briefing - Keynote
State of Florida Neo4J Graph Briefing - Keynote
 
BDVe Webinar Series - Ocean Protocol – Why you need to care about how you sha...
BDVe Webinar Series - Ocean Protocol – Why you need to care about how you sha...BDVe Webinar Series - Ocean Protocol – Why you need to care about how you sha...
BDVe Webinar Series - Ocean Protocol – Why you need to care about how you sha...
 
What's New in Syncsort's Trillium Line of Data Quality Software - TSS Enterpr...
What's New in Syncsort's Trillium Line of Data Quality Software - TSS Enterpr...What's New in Syncsort's Trillium Line of Data Quality Software - TSS Enterpr...
What's New in Syncsort's Trillium Line of Data Quality Software - TSS Enterpr...
 
TIC-TOC: Disrupt the Threat Management Conversation with Dominique Singer and...
TIC-TOC: Disrupt the Threat Management Conversation with Dominique Singer and...TIC-TOC: Disrupt the Threat Management Conversation with Dominique Singer and...
TIC-TOC: Disrupt the Threat Management Conversation with Dominique Singer and...
 
SIEM, malware protection, deep data visibility — for free
SIEM, malware protection, deep data visibility — for freeSIEM, malware protection, deep data visibility — for free
SIEM, malware protection, deep data visibility — for free
 
How to Become an Analytics Ready Insurer - with Informatica and Hortonworks
How to Become an Analytics Ready Insurer - with Informatica and HortonworksHow to Become an Analytics Ready Insurer - with Informatica and Hortonworks
How to Become an Analytics Ready Insurer - with Informatica and Hortonworks
 
Running Enterprise Workloads with an open source Hybrid Cloud Data Architectu...
Running Enterprise Workloads with an open source Hybrid Cloud Data Architectu...Running Enterprise Workloads with an open source Hybrid Cloud Data Architectu...
Running Enterprise Workloads with an open source Hybrid Cloud Data Architectu...
 
Electronic Data Discovery
Electronic Data DiscoveryElectronic Data Discovery
Electronic Data Discovery
 
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
 
United Technologies, Hands On Reference Data Management For Corporate Finance...
United Technologies, Hands On Reference Data Management For Corporate Finance...United Technologies, Hands On Reference Data Management For Corporate Finance...
United Technologies, Hands On Reference Data Management For Corporate Finance...
 
Big Data Matching - How to Find Two Similar Needles in a Really Big Haystack
Big Data Matching - How to Find Two Similar Needles in a Really Big HaystackBig Data Matching - How to Find Two Similar Needles in a Really Big Haystack
Big Data Matching - How to Find Two Similar Needles in a Really Big Haystack
 
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address RequirementsGov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
 
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
 

More from HPCC Systems

Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC Systems
HPCC Systems
 

More from HPCC Systems (20)

Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
 
Towards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsTowards Trustable AI for Complex Systems
Towards Trustable AI for Complex Systems
 
Welcome
WelcomeWelcome
Welcome
 
Closing / Adjourn
Closing / Adjourn Closing / Adjourn
Closing / Adjourn
 
Community Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingCommunity Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon Cutting
 
Path to 8.0
Path to 8.0 Path to 8.0
Path to 8.0
 
Release Cycle Changes
Release Cycle ChangesRelease Cycle Changes
Release Cycle Changes
 
Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index
 
Advancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine LearningAdvancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine Learning
 
Docker Support
Docker Support Docker Support
Docker Support
 
Expanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesExpanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network Capabilities
 
Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC Systems
 
DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch
 
Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem
 
Work Unit Analysis Tool
Work Unit Analysis ToolWork Unit Analysis Tool
Work Unit Analysis Tool
 
Community Award Ceremony
Community Award Ceremony Community Award Ceremony
Community Award Ceremony
 
Dapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterDapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL Neater
 
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
 
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
 

Recently uploaded

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 

HPCC Systems Presentation to TDWI Chicago Chapter

  • 1. Overview of HPCC Systems and Case Studies Presenter: Brian Bounds, Director of Software Engineering November 2014
  • 2. Context Business Services: 39% • About Brian • HPCC Systems in the Context of LexisNexis • Isn’t Big Data just Data?
  • 3. WHT/082311 http://hpccsystems.com Components of the Open Source HPCC Systems platform • The HPCC Systems platform includes: • Thor: batch oriented data manipulation, linking and analytics engine • Roxie: real-time data delivery and analytics engine • A high level declarative dataflow language: ECL • Implicitly parallel • No side effects • Code/data encapsulation • Extensible • Highly optimized • Builds graphical execution plans • Compiles into C++ and native machine code • Common to Thor and Roxie • An extensive library of ECL modules, including data profiling, linking, graph analytics, and Machine Learning
  • 4. The LexisNexis Open Source HPCC Systems platform
  • 5. WHT/082311 Detailed HPCC Systems Platform Architecture
  • 6. WHT/082311 HPCC Systems Overview Video
  • 7. WHT/082311 • The Programming Language (ECL) • The Delivery Engine (ROXIE) • Enterprise Readiness • Big Data … becomes Data (that might be Big) Features to Care About
  • 8. WHT/082311 Case Study #1 (Enterprise) – LexisNexis Risk Solutions
  • 9. WHT/082311 Case Study #1 (Enterprise) – LexisNexis Risk Solutions We are among the largest providers of risk solutions in the market today LexisNexis Reed Elsevier is a world leading provider of information solutions. LexisNexis Risk Solutions Revenue: $M LexisNexis Risk Solutions has seen sustained revenue and profit growth Government 8% LexisNexis Risk Solutions is a leading provider in the U.S. across Business Services, Insurance and Government segments. Insurance: 53% Business Services: 39% LexisNexis Risk Solutions Revenue by Segment
  • 10. WHT/082311 We are among the largest providers of risk solutions in the market today LexisNexis Our customers include: • 99 of the top 100 US banks • 90% of the Fortune 500 • 100% of US P&C insurance carriers • All 50 US states, 70% of local governments and 80% of US federal agencies • 97 of Am Law 100 firms Government 8% LexisNexis Risk Solutions is a leading provider in the U.S. across Business Services, Insurance and Government segments. Insurance: 53% Business Services: 39% LexisNexis Risk Solutions Revenue by Segment Case Study #1 (Enterprise) – LexisNexis Risk Solutions
  • 11. WHT/082311 Case Study #1 (Enterprise) – LexisNexis Risk Solutions We have a unique set of capabilities: Data, Linking, Analytics, and Product Development Customer-Focused Solutions + + = Linking & Analytics + Industry-Specific Expertise & Delivery Data Technology Vast Data Resources • Process • Sources • Coverage • Advanced linking & analytics • Accuracy & efficiency • Protect private information • Aligned with our customers’ industries • Deep industry expertise • Speed • Capacity • Cost savings • Predict, manage and assess risk across many industries.
  • 12. WHT/082311 Case Study #1 (Enterprise) – LexisNexis Risk Solutions Data Source # of records Data Source # of records Associates/Relatives 1.8 Billion People at Work 1.5 Billion Bankruptcy 23 Million Private Phones 172 Million Business BDID's 283 Million Professional Licenses 94 Million Business People Links 959 Million Property 2.5 Billion Canadian Phones 62 Million Sex Offenders 550,000 Consumer Header 10.8 Billion SSN's 7.2 Billion Criminal 216 Million Student Records 38 Million Date of Birth 5.2 Billion TIN 2.9 Million Death 98 Million Unique ADLs - active 257 Million Drivers Licenses 397 Million Utility 645 Million EDA Phones 124 Million Vehicle Titles 635 Million FEINs 10.4 Million Vehicle Registrations 2.5 Billion Historical Phones 800 Million White Pages 116 Million Hunting and Fishing Licenses 67 Million Wireless Phones 101 Million Liens and Judgments 244 Million Yellow Pages 14 Million Access to more than 25 billion public record filings Break-down of record counts for the more popular data sets:
  • 13. WHT/082311 Case Study #1 (Enterprise) – LexisNexis Risk Solutions
  • 14. WHT/082311 Case Study #2 (Boil the Ocean) – Property Transaction Risk
  • 15. WHT/082311 Case Study #2 (Boil the Ocean) – Property Transaction Risk Flipping Profit Collusion Three core transaction variables measured • Velocity • Profit (or not) • Buyer to Seller Relationship Distance (Potential of Collusion)
  • 16. WHT/082311 Case Study #2 (Boil the Ocean) – Property Transaction Risk Large Scale Suspicious Cluster Ranking ±700 mill Deeds Derived Public Data Relationships from +/- 50 terabyte database Collusion Graph Analytics Chronological Analysis of all property Sales Historical Property Sales Indicators and Counts Person / Network Level Indicators and Counts Data Factory Clean
  • 17. WHT/082311 Case Study #2 (Boil the Ocean) – Property Transaction Risk
  • 18. WHT/082311 Case Study #2 (Boil the Ocean) – Property Transaction Risk Large scale measurement of influencers strategically placed to potentially direct suspicious transactions. • All data on one supercomputer measuring over a decade of property transfers nationwide. • Data Products to turn other Data into compelling intelligence. • Large Scale Graph Analytics allow for identifying known unknowns. • Florida Proof of Concept – Highest ranked influencers  Identified known ringleaders in flipping and equity stripping schemes.  Typically not connected directly to suspicious transactions. – Known ringleaders not the Highest Ranking. • Clusters with high levels of potential collusion. • Clusters offloading property, generating defaults. • Agile Framework able to keep step with emerging schemes in real estate.
  • 19. WHT/082311 Case Study #3 (Serendipity) – Family Ties
  • 20. WHT/082311 Case Study #3 (Serendipity) – Family Ties between Claims Scenario This view of carrier data shows seven known fraud claims and an additional linked claim. The Insurance company data only finds a connection between two of the seven claims, and only identified one other claim as being weakly connected.
  • 21. WHT/082311 Case Study #3 (Serendipity) – Family Ties between Claims Task After adding the LexID to the carrier Data, LexisNexis HPCC technology then explored 2 additional degrees of relative separation Result The results showed two family groups interconnected on all of these seven claims. The links were much stronger than the carrier data previously supported. Family 1 Family 2
  • 22. WHT/082311 Case Study #4 (Tactical) – 30 Hour Job
  • 23. WHT/082311 Case Study #4 (Tactical) – 30 Hour Job Objective: • Re-engineer long-running legacy process (proof-of-concept) • 3m+ rows in … 500m rows out … 30+ hours • Use similar hardware to maximize comparability Pre-Coding Set Up: • Legacy developers … completed on-line ECL training • 1 ECL developer … exposed to legacy process and data • Dump of input data and known result files • Create HPCC hardware environment comparable to legacy environment • Amazon AWS 4 x m1.xlarge total (3 x m1.xlarge Thor Slaves) • 12-slave CPUs Coding: • Meet-up for 1 week coding session
  • 24. WHT/082311 Case Study #4 (Tactical) – 30 Hour Job Legacy Environment: • Oracle on Intel SMP • 16 cores HPCC Environment: • Amazon AWS • 1 x m1.xlarge (support node) • 3 x m1.xlarge (Thor Slaves) • 12-slave Cores Results: • 3 Days of Coding • 450-ish Lines of ECL • Legacy Run-Time: 30+ hours • HPCC Run-Time: 1.5 hours
  • 25. WHT/082311 Additional Infomation  LexisNexis Open Source HPCC Systems Platform: http://hpccsystems.com  Free Online Training: http://learn.lexisnexis.com/hpcc  SALT: http://hpccsystems.com/products-and-services/products/modules/SALT  Machine Learning portal: http://hpccsystems.com/ml  The HPCC Systems blog: http://hpccsystems.com/blog  Community Forums: http://hpccsystems.com/bb  Our GitHub portal: https://github.com/hpcc-systems  JIRA: https://track.hpccsystems.com