www.globalbigdataconference.com
Twitter : @bigdataconf
Big Data Processing Beyond MapReduce
Dr. Flavio Villanustre, VP Technology
October 7, 2016
The Data Centric Approach
A single source of data is insufficient to overcome inaccuracies in the data.
The holes are inaccuracies found in the data.
Our platform is built on the premise of absorbing data from multiple data sources and transforming them into highly intelligent social network graphs that can be processed to reveal non-obvious relationships.
The holes in the core data have been eliminated.
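The idea of filling holes by absorbing several sources can be sketched as follows. This is a minimal Python sketch, not HPCC Systems or ECL code; the source names, the field names and the "first non-missing value wins" rule are all assumptions made for illustration.

```python
# Hypothetical records for the same person from three sources; None marks a "hole".
SOURCES = {
    "public_records": {"person_id": 1, "name": "John Smith", "phone": None, "city": "Atlanta"},
    "proprietary": {"person_id": 1, "name": "John Smith", "phone": "555-0100", "city": None},
    "news": {"person_id": 1, "name": None, "phone": None, "city": "Atlanta"},
}

def fuse(records: list[dict]) -> dict:
    """Merge records field by field, keeping the first non-missing value."""
    fused: dict = {}
    for record in records:
        for field, value in record.items():
            if fused.get(field) is None:
                fused[field] = value
    return fused

def holes(record: dict) -> list[str]:
    """Return the names of fields that are still missing."""
    return [field for field, value in record.items() if value is None]

fused_record = fuse(list(SOURCES.values()))
print(holes(SOURCES["public_records"]))  # holes in a single source
print(holes(fused_record))               # holes left after fusing all sources
```

Each source alone has at least one missing field, while the fused record has none, which is the point the slide makes with its "holes" picture.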
The Big Data funnel
Big Data sources: Structured Records, Unstructured Records, News Articles, Proprietary Data, Public Records
Unstructured and Structured Content → High Performance Computing Cluster Platform (HPCC) → Analysis Applications
Diagram labels: Fusion, Linking, Refinery, Complex Analysis, Clustering Analysis, Link Analysis, Entity Resolution
Data Flow Oriented Big Data Platform
ESP Middleware Services
Raw data from several sources
Batch, Subscribers, Portal
Thor (Data Lake)
• Shared Nothing MPP Architecture
• Commodity Hardware
• Batch ETL and Analytics
ECL
• Easy to use
• Implicitly Parallel
• Compiles to C++
Batch requests for scoring and analytics
ROXIE (Query)
• Shared Nothing MPP Architecture
• Commodity Hardware
• Real-time Index-Based Query
• Low Latency, Highly Concurrent and Highly Redundant
Batch Processed Data
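The division of labour between a batch engine and a query engine can be sketched in a few lines. This is a single-process Python illustration, not the Thor or ROXIE API; the record fields and the function names `batch_build_index` and `query` are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw records arriving from several sources.
RAW = [
    {"id": 1, "name": "joe chambers", "state": "FL"},
    {"id": 2, "name": "john smith", "state": "GA"},
    {"id": 3, "name": "alex shaw", "state": "FL"},
]

def batch_build_index(records, key):
    """Batch (Thor-like) step: clean the records and build an index on `key`."""
    index = defaultdict(list)
    for rec in records:
        cleaned = {**rec, "name": rec["name"].upper()}
        index[cleaned[key]].append(cleaned)
    return dict(index)

def query(index, value):
    """Query (ROXIE-like) step: answer a request with a single keyed lookup."""
    return index.get(value, [])

state_index = batch_build_index(RAW, "state")
print([rec["name"] for rec in query(state_index, "FL")])
```

The expensive work happens once in the batch step, so each query only pays for an index lookup. That is the rationale for pairing a batch cluster with an index-based, low-latency query cluster.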
ECL – The Data Flow Oriented Programming Language
Diagram labels: Raw data from several sources; Thor; ECL; ROXIE; Batch; Subscribers; Reporting; Batch reporting requests
• An easy-to-use, data-centric programming language optimized for large-scale data management and query processing
• Highly efficient: automatically distributes workload across all nodes
• 80% more efficient than C++, Java and SQL; 1/3 reduction in programmer time to maintain/enhance existing applications
• Benchmark against SQL (5 times more efficient) for code generation
• Automatic parallelization and synchronization of sequential algorithms for parallel and distributed processing
• Large library of built-in modules to handle common data manipulation tasks
Declarative programming language … powerful, extensible, implicitly parallel, maintainable, complete and homogeneous
Scalability: ECL Is Inherently, and Extremely, Parallel
FEATURES
• At the foundation level, ECL works on data flows
• Imagine data flowing through a pipe: ECL can simultaneously execute operations on different parts of the pipe
• One-to-many pipelines are constructed based on achievable parallelism and available resources
• Laziness ensures that execution is only done as needed
IMPORT Std;
ds := somedata;
// PROJECT runs the inline TRANSFORM on every record; assumes somedata has a string field called name
transformed := PROJECT(ds, TRANSFORM(RECORDOF(ds), SELF.name := Std.Str.ToUpperCase(LEFT.name); SELF := LEFT));
Data Partitioned Parallel Processing
Original somedata dataset:
01 joe chambers
02 john smith
…
1001 alex shaw
1002 peter king
…
PIPELINE 1 – NODE 1 (direction of data flow): TRANSFORM → Next Operation…
Output: 01 JOE CHAMBERS, 02 JOHN SMITH
PIPELINE 2 – NODE 2 (direction of data flow): TRANSFORM → Next Operation…
Output: 1001 ALEX SHAW, 1002 PETER KING
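The partitioned pipeline in the diagram can be imitated on one machine. The sketch below is Python rather than ECL, and it only illustrates the shape of the idea: the two-way split, the helper names and the use of threads to stand in for cluster nodes are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

# The sample rows from the slide; the two-node layout is an assumption.
SOMEDATA = [(1, "joe chambers"), (2, "john smith"), (1001, "alex shaw"), (1002, "peter king")]

def partition(rows, node_count):
    """Split rows into contiguous, roughly equal parts, one per node."""
    size = max(1, -(-len(rows) // node_count))  # ceiling division, never zero
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def pipeline(rows):
    """Per-node pipeline: a lazy generator that upper-cases names on demand."""
    return ((row_id, name.upper()) for row_id, name in rows)

def run(rows, node_count=2):
    """Run the same pipeline on every partition concurrently, then concatenate."""
    parts = partition(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        results = pool.map(lambda part: list(pipeline(part)), parts)
        return [row for part in results for row in part]

transformed = run(SOMEDATA)
print(transformed)
```

The generator in `pipeline` does no work until its results are requested, which mirrors the laziness bullet. Threads here show the structure only; a real cluster spreads the partitions across separate machines.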
The HPCC stack (STRIKE)
Layers: Data Connect, Analytics Tools, Common Programming Language, Data Science Portal
Components: Dashboard Creator, Workflow Builder, ECL, Thor, ROXIE, Interlok, SALT, KEL
Capabilities: Cleaning, MDM, Profiling, Normalization, Predictive Analysis, Business Intelligence, Attribute Creation, Relationship Analysis
Master Data Management with SALT
From disparate data, to clustering, to showing relationships
SALT Enables Content Disambiguation to Increase Productivity
• The acronym stands for “Scalable Automated Linking Technology”
• Entity disambiguation using inference techniques
• Template-based ECL code generator
• Provides automated data profiling, parsing, cleansing, normalization and standardization
• Sophisticated specificity- and relatives-based linking and clustering
42 lines of SALT → 3,980 lines of ECL → 482,410 lines of C++
1. Data Preparation Processes (ETL): Data Sources, Profiling, Parsing, Cleansing, Normalization, Standardization
2. Record Linkage Process: Matching Weights & Threshold Computation, Blocking/Searching, Weight Assignment & Record Comparison, Record Match Decision, Linked Data File, Linking Iterations, Additional Data Ingest
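The record linkage steps named above (blocking, weight assignment, record comparison, match decision) can be sketched as follows. This is a Python illustration in the spirit of specificity-based weighting, not SALT's actual algorithm or generated ECL; the sample records, the blocking key, the log-rarity weight and the threshold are all assumptions.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical input records; field names and values are illustrative only.
RECORDS = [
    {"rid": 1, "first": "JOHN", "last": "SMITH", "zip": "30301"},
    {"rid": 2, "first": "JOHN", "last": "SMITH", "zip": "30301"},
    {"rid": 3, "first": "JANE", "last": "SMITH", "zip": "30301"},
    {"rid": 4, "first": "ALEX", "last": "SHAW", "zip": "33101"},
]
FIELDS = ("first", "last", "zip")
THRESHOLD = 1.5  # assumed cut-off for the match decision

def field_weights(records):
    """Weight each field value by its rarity: rarer values carry more evidence."""
    n = len(records)
    return {f: {v: math.log2(n / c) for v, c in Counter(r[f] for r in records).items()}
            for f in FIELDS}

def blocks(records):
    """Blocking: only records sharing last name and zip are compared."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[(rec["last"], rec["zip"])].append(rec)
    return grouped.values()

def link(records):
    """Compare candidate pairs inside each block and keep those above the threshold."""
    weights = field_weights(records)
    matches = []
    for block in blocks(records):
        for a, b in combinations(block, 2):
            score = sum(weights[f][a[f]] for f in FIELDS if a[f] == b[f])
            if score >= THRESHOLD:
                matches.append((a["rid"], b["rid"], round(score, 2)))
    return matches

print(link(RECORDS))
```

Blocking keeps the number of comparisons small, and the rarity weights mean that agreeing on a common value such as SMITH counts for less than agreeing on a rare one. A production system would iterate this process and tune weights and thresholds from the data.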
Relationship Analysis With KEL
KEL: an abstraction for network/graph processing
• Declarative model: describe what things are, rather than how to execute
• High level: vertices and edges are first class citizens
• A single model to describe graphs and queries
• Leverages Thor for heavy lifting and ROXIE for real-time analytics
• Compiles into ECL (and ECL compiles into C++, which compiles into assembler)
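The notion of treating vertices and edges as first-class data and then querying them can be illustrated with a small graph. The sketch below is Python, not KEL syntax; the people, the shared-address association and the second-degree query are hypothetical examples.

```python
from collections import defaultdict

# Hypothetical entities (vertices) and associations (edges), declared as plain data.
PEOPLE = {1: "JOE CHAMBERS", 2: "JOHN SMITH", 3: "ALEX SHAW", 4: "PETER KING"}
SHARED_ADDRESS = [(1, 2), (2, 3), (3, 4)]

def build_graph(edges):
    """Turn the edge list into an undirected adjacency map."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def second_degree(graph, start):
    """Vertices exactly two hops from `start`: a simple non-obvious relationship."""
    direct = graph[start]
    two_hops = set().union(*(graph[n] for n in direct)) if direct else set()
    return two_hops - direct - {start}

graph = build_graph(SHARED_ADDRESS)
print(sorted(PEOPLE[p] for p in second_degree(graph, 1)))
```

The data declarations say what the entities and associations are, while the query is a separate, reusable question over the same model.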
The Data Science Portal
Dashboards
Selected use cases
Introduction to HPCC Systems
LexisNexis Legal & Professional
Fast insight into case law
THE CHALLENGE
100+ million documents
Entity identification and resolution
Document and topic classification
Near real-time feedback
THE SOLUTION
• Generation 2 entity recognition employs the HPCC PARSE(…) function and pattern rules
• More entities recognized
• Faster development
• Faster operation
• SALT-based entity resolution
• Custom resolution for citation entities
• Case law and statute reference
• References can be anaphoric (like infra)
• Parallels (same case in more than one book)
• Active learning used to extend classification
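Pattern-rule entity recognition for citations can be illustrated with regular expressions. This is a Python sketch, not the ECL PARSE function; the two patterns are deliberately simplified assumptions that cover only a couple of reporter abbreviations and the words "supra" and "infra".

```python
import re

# Simplified, assumed patterns: "<volume> <reporter> <page>" and "supra"/"infra" references.
CITATION = re.compile(r"\b(?P<volume>\d{1,4})\s+(?P<reporter>U\.S\.|S\. ?Ct\.)\s+(?P<page>\d{1,5})\b")
ANAPHORIC = re.compile(r"\b(?P<kind>supra|infra)\b", re.IGNORECASE)

def find_entities(text):
    """Return citation and anaphoric-reference entities in order of appearance."""
    found = []
    for m in CITATION.finditer(text):
        found.append((m.start(), "citation", m.group("volume"), m.group("reporter"), m.group("page")))
    for m in ANAPHORIC.finditer(text):
        found.append((m.start(), "anaphoric", m.group("kind").lower()))
    return [entity[1:] for entity in sorted(found)]

SAMPLE = "See Roe v. Wade, 410 U.S. 113, also reported at 93 S. Ct. 705; discussed infra."
print(find_entities(SAMPLE))
```

The sample sentence contains a parallel citation (the same case in two reporters) and an anaphoric reference, the two cases the solution list calls out. Resolving them to a single case entity would be a separate linking step.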
THE OUTCOME:
Two enormous benefits:
• Huge lift on entity resolution and document classification because of SALT
• Ahead of customers in terms of performance and maintaining currency of data because of the rapid big data processing capabilities
SciVal
Fast insight into evidence-based research data
THE CHALLENGE
Scopus data covering 21,000 titles from 5,000 publishers that’s updated weekly
Data from 4,600 research institutions and 220 countries
Real-time user-specific data slicing
THE SOLUTION
• Clean, standardize and build Analytics Queries (HPCC Thor) from 40+ TB of data on a weekly basis
• Provide a rapid query interface for both ad hoc queries and canned queries (HPCC ROXIE)
THE OUTCOME:
Two enormous benefits:
• Users can customize their own visualizations, which are generated in just a few seconds
• HPCC Systems works in 2 modes:
1. offline crunching of huge, pre-defined requests
2. smaller calculations in real-time on customer-generated data slices
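The two modes can be contrasted in a short sketch: one function crunches the whole data set ahead of time, the other computes a small aggregate for whatever slice a user asks for. This is Python for illustration only; the publication records and field names are hypothetical.

```python
from collections import Counter

# Hypothetical publication records; fields are illustrative only.
PUBLICATIONS = [
    {"institution": "A", "country": "NL", "year": 2015, "citations": 10},
    {"institution": "A", "country": "NL", "year": 2016, "citations": 4},
    {"institution": "B", "country": "US", "year": 2016, "citations": 7},
]

def precompute_totals(records):
    """Offline mode: crunch the full data set once into per-institution totals."""
    totals = Counter()
    for rec in records:
        totals[rec["institution"]] += rec["citations"]
    return dict(totals)

def slice_total(records, **criteria):
    """Real-time mode: total citations for a user-defined slice of the data."""
    return sum(rec["citations"] for rec in records
               if all(rec[field] == value for field, value in criteria.items()))

TOTALS = precompute_totals(PUBLICATIONS)
print(TOTALS["A"], slice_total(PUBLICATIONS, year=2016))
```

Pre-defined requests are answered from `TOTALS` without touching the raw data, while `slice_total` accepts any combination of field filters at request time.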
Smart Hard Hat Ecosystem
THE CHALLENGE
4,000 workers die and millions are injured annually while working on the industrial floor
Maintaining safety is very costly for businesses
THE SOLUTION
• Equip workers’ hats with smart sensor technology
• Central real-time processing of (high volume) information with real-time alerting capability (HPCC Systems)
• Customizable dashboards, rules framework and data workflow frameworks (HPCC Systems)
• Predictive modeling and analytics (HPCC Systems)
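A rules framework with real-time alerting can be pictured as a set of named predicates applied to each incoming sensor reading. The sketch below is Python, not the system described in the deck; the sensor fields, rule names and limits are assumptions.

```python
# Hypothetical rules; names and limits are assumptions for illustration.
RULES = [
    ("high_temperature", lambda r: r["temp_c"] > 40.0),
    ("possible_fall", lambda r: r["accel_g"] > 3.0),
]

def alerts(readings):
    """Yield (worker, rule name) for every reading that violates a rule."""
    for reading in readings:
        for name, violated in RULES:
            if violated(reading):
                yield reading["worker"], name

# Hypothetical stream of hard-hat sensor readings.
STREAM = [
    {"worker": "w1", "temp_c": 36.5, "accel_g": 1.0},
    {"worker": "w2", "temp_c": 41.2, "accel_g": 1.1},
    {"worker": "w1", "temp_c": 36.7, "accel_g": 3.4},
]
print(list(alerts(STREAM)))
```

Because `alerts` is a generator, each alert is produced as soon as its reading arrives. Keeping the rules in a plain list means new rules can be added without changing the processing loop.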
THE OUTCOME:
Produced an industrial wearable that uses IoT and wireless communications systems to protect and empower industrial workers.
Driver Behavior with Smart Telematics
THE CHALLENGE
High cost of insurance
High car accident rates
Lack of tools to analyze driver behavior
THE SOLUTION
• Telematics smartphone application
• Central system to collect (very large) data and perform analytics (HPCC Systems)
• Journey-based feedback to all drivers to advise and correct behavior (HPCC Systems)
• Insurance enrollment to reduce premiums
THE OUTCOME:
• Recommend corrections to driver behavior that would avoid accidents
• Reduce overall insurance costs
• Correlate information from drivers’ data traversing the same path to create an understanding of predictable actions
• Examples include periods of traffic congestion, problem areas in the path and hazard detection
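Journey-based feedback can be derived from simple signals in the telematics data. The Python sketch below detects harsh braking from per-second speed samples; the sampling rate, the threshold and the wording of the feedback are assumptions, not details from the deck.

```python
HARSH_BRAKE_KMH_PER_S = 10.0  # assumed threshold for a harsh-braking event

def harsh_brakes(speeds):
    """Return the sample indexes where speed drops faster than the threshold."""
    return [i for i in range(1, len(speeds))
            if speeds[i - 1] - speeds[i] > HARSH_BRAKE_KMH_PER_S]

def feedback(speeds):
    """Turn the detected events into one line of journey-based feedback."""
    events = harsh_brakes(speeds)
    if not events:
        return "No harsh braking detected on this journey."
    return f"{len(events)} harsh braking event(s) at second(s) {events}."

# Hypothetical journey: speed samples in km/h taken once per second.
JOURNEY = [50, 52, 51, 38, 36, 20, 19]
print(feedback(JOURNEY))
```

Running the same detection over many drivers on the same road segment would show where events cluster, which is one way to read the "problem areas in the path" outcome.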
Contextual Marketing
THE CHALLENGE
Understanding an individual customer’s behavior based on past actions
Technical problem
• Huge volumes of data based on observed cell phone Wi-Fi
• Apply advanced machine learning techniques
THE SOLUTION
• Central analytics system to collect and analyze data (HPCC Systems)
• Leverage parallel algorithms to perform analytics on large quantities of data (HPCC Systems)
THE OUTCOME:
Created a platform to process any location-specific telecom data so that it can be analyzed rapidly to gauge consumer behavior and, in turn, help drive context-based marketing
Predict Passenger Volumes in Airports
THE CHALLENGE
How to interpret hundreds of millions of location points while adjusting to flight schedule changes
Complex clustering algorithm requirements
Understand passenger behaviors and interaction of local areas of activity
THE SOLUTION
HPCC used to solve Big Data challenges
• Raw data to refined data
• Clustering analysis
• Forecasting
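The step from raw location points to areas of activity can be sketched with grid binning. This is a simple stand-in written in Python, not the clustering algorithm used in the project; the cell size and the sample points are assumptions.

```python
from collections import Counter

CELL_SIZE = 10.0  # assumed grid cell size in metres

def cell_of(x, y):
    """Map a location point to the grid cell that contains it."""
    return int(x // CELL_SIZE), int(y // CELL_SIZE)

def busiest_cells(points, top=2):
    """Count points per grid cell and return the most crowded cells."""
    return Counter(cell_of(x, y) for x, y in points).most_common(top)

# Hypothetical location points (x, y) in metres inside a terminal.
POINTS = [(1.0, 2.0), (3.5, 4.0), (8.0, 9.9), (12.0, 3.0), (15.0, 4.0), (55.0, 71.0)]
print(busiest_cells(POINTS))
```

Counting per cell over successive time windows would give a series per area, which is the kind of refined data a forecasting step could then work from.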
THE OUTCOME:
Better passenger experience and better airport planning
Resources
• Portal: http://hpccsystems.com
• ECL Language Reference: https://hpccsystems.com/ecl-language-reference
• SALT: https://hpccsystems.com/enterprise-services/purchase-required-modules/SALT
• KEL: https://hpccsystems.com/download/free-modules/kel-lite
• Machine Learning: http://hpccsystems.com/ml
• Online Training: http://learn.lexisnexis.com/hpcc
• HPCC Systems Blog: http://hpccsystems.com/blog
• HPCC Systems Wiki & Red Book: https://wiki.hpccsystems.com
• Our GitHub portal: https://github.com/hpcc-systems
• Community Forums: http://hpccsystems.com/bb
• Case Studies: https://hpccsystems.com/resources/case-studies
Questions?
Dr. Flavio Villanustre
VP Technology, LexisNexis® Risk Solutions
Flavio.Villanustre@lexisnexis.com

More Related Content

What's hot

Credit Fraud Prevention with Spark and Graph Analysis
Credit Fraud Prevention with Spark and Graph AnalysisCredit Fraud Prevention with Spark and Graph Analysis
Credit Fraud Prevention with Spark and Graph AnalysisJen Aman
 
Obfuscating LinkedIn Member Data
Obfuscating LinkedIn Member DataObfuscating LinkedIn Member Data
Obfuscating LinkedIn Member DataDataWorks Summit
 
TopNotch: Systematically Quality Controlling Big Data by David Durst
TopNotch: Systematically Quality Controlling Big Data by David DurstTopNotch: Systematically Quality Controlling Big Data by David Durst
TopNotch: Systematically Quality Controlling Big Data by David DurstSpark Summit
 
Architecture for Real-Time and Batch Big Data Analytics
Architecture for Real-Time and Batch Big Data AnalyticsArchitecture for Real-Time and Batch Big Data Analytics
Architecture for Real-Time and Batch Big Data AnalyticsNir Rubinstein
 
Graph Data: a New Data Management Frontier
Graph Data: a New Data Management FrontierGraph Data: a New Data Management Frontier
Graph Data: a New Data Management FrontierDemai Ni
 
Dealing With Drift - Building an Enterprise Data Lake
Dealing With Drift - Building an Enterprise Data LakeDealing With Drift - Building an Enterprise Data Lake
Dealing With Drift - Building an Enterprise Data LakePat Patterson
 
Introduction to basic data analytics tools
Introduction to basic data analytics toolsIntroduction to basic data analytics tools
Introduction to basic data analytics toolsNascenia IT
 
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...DataWorks Summit/Hadoop Summit
 
Streamsets and spark at SF Hadoop User Group
Streamsets and spark at SF Hadoop User GroupStreamsets and spark at SF Hadoop User Group
Streamsets and spark at SF Hadoop User GroupHari Shreedharan
 
PNDA - Platform for Network Data Analytics
PNDA - Platform for Network Data AnalyticsPNDA - Platform for Network Data Analytics
PNDA - Platform for Network Data AnalyticsJohn Evans
 
Case Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets at Cisco IntercloudCase Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets at Cisco IntercloudRick Bilodeau
 
Comparing Big Data and Simulation Applications and Implications for Software ...
Comparing Big Data and Simulation Applications and Implications for Software ...Comparing Big Data and Simulation Applications and Implications for Software ...
Comparing Big Data and Simulation Applications and Implications for Software ...Geoffrey Fox
 
Building Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks DeltaBuilding Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks DeltaDatabricks
 
The Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data ScienceThe Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data ScienceRevolution Analytics
 
Spark Summit EU talk by Pat Patterson
Spark Summit EU talk by Pat PattersonSpark Summit EU talk by Pat Patterson
Spark Summit EU talk by Pat PattersonSpark Summit
 
How to Rebuild an End-to-End ML Pipeline with Databricks and Upwork with Than...
How to Rebuild an End-to-End ML Pipeline with Databricks and Upwork with Than...How to Rebuild an End-to-End ML Pipeline with Databricks and Upwork with Than...
How to Rebuild an End-to-End ML Pipeline with Databricks and Upwork with Than...Databricks
 
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demoDatabricks
 

What's hot (20)

Credit Fraud Prevention with Spark and Graph Analysis
Credit Fraud Prevention with Spark and Graph AnalysisCredit Fraud Prevention with Spark and Graph Analysis
Credit Fraud Prevention with Spark and Graph Analysis
 
Obfuscating LinkedIn Member Data
Obfuscating LinkedIn Member DataObfuscating LinkedIn Member Data
Obfuscating LinkedIn Member Data
 
TopNotch: Systematically Quality Controlling Big Data by David Durst
TopNotch: Systematically Quality Controlling Big Data by David DurstTopNotch: Systematically Quality Controlling Big Data by David Durst
TopNotch: Systematically Quality Controlling Big Data by David Durst
 
Architecture for Real-Time and Batch Big Data Analytics
Architecture for Real-Time and Batch Big Data AnalyticsArchitecture for Real-Time and Batch Big Data Analytics
Architecture for Real-Time and Batch Big Data Analytics
 
Data Science At Zillow
Data Science At ZillowData Science At Zillow
Data Science At Zillow
 
Graph Data: a New Data Management Frontier
Graph Data: a New Data Management FrontierGraph Data: a New Data Management Frontier
Graph Data: a New Data Management Frontier
 
Dealing With Drift - Building an Enterprise Data Lake
Dealing With Drift - Building an Enterprise Data LakeDealing With Drift - Building an Enterprise Data Lake
Dealing With Drift - Building an Enterprise Data Lake
 
Introduction to basic data analytics tools
Introduction to basic data analytics toolsIntroduction to basic data analytics tools
Introduction to basic data analytics tools
 
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...
 
Streamsets and spark at SF Hadoop User Group
Streamsets and spark at SF Hadoop User GroupStreamsets and spark at SF Hadoop User Group
Streamsets and spark at SF Hadoop User Group
 
Building a Scalable Data Science Platform with R
Building a Scalable Data Science Platform with RBuilding a Scalable Data Science Platform with R
Building a Scalable Data Science Platform with R
 
PNDA - Platform for Network Data Analytics
PNDA - Platform for Network Data AnalyticsPNDA - Platform for Network Data Analytics
PNDA - Platform for Network Data Analytics
 
Case Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets at Cisco IntercloudCase Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
 
Comparing Big Data and Simulation Applications and Implications for Software ...
Comparing Big Data and Simulation Applications and Implications for Software ...Comparing Big Data and Simulation Applications and Implications for Software ...
Comparing Big Data and Simulation Applications and Implications for Software ...
 
Building Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks DeltaBuilding Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks Delta
 
The Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data ScienceThe Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data Science
 
Spark Summit EU talk by Pat Patterson
Spark Summit EU talk by Pat PattersonSpark Summit EU talk by Pat Patterson
Spark Summit EU talk by Pat Patterson
 
ExecutiveWhitePaper
ExecutiveWhitePaperExecutiveWhitePaper
ExecutiveWhitePaper
 
How to Rebuild an End-to-End ML Pipeline with Databricks and Upwork with Than...
How to Rebuild an End-to-End ML Pipeline with Databricks and Upwork with Than...How to Rebuild an End-to-End ML Pipeline with Databricks and Upwork with Than...
How to Rebuild an End-to-End ML Pipeline with Databricks and Upwork with Than...
 
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
 

Viewers also liked

HPCC Systems - Using Big Data to Help Feed the World
HPCC Systems - Using Big Data to Help Feed the WorldHPCC Systems - Using Big Data to Help Feed the World
HPCC Systems - Using Big Data to Help Feed the WorldHPCC Systems
 
Introduction to the Open Source HPCC Systems Platform by Arjuna Chala
Introduction to the Open Source HPCC Systems Platform by Arjuna ChalaIntroduction to the Open Source HPCC Systems Platform by Arjuna Chala
Introduction to the Open Source HPCC Systems Platform by Arjuna ChalaHPCC Systems
 
Building Confidence in Big Data - IBM Smarter Business 2013
Building Confidence in Big Data - IBM Smarter Business 2013 Building Confidence in Big Data - IBM Smarter Business 2013
Building Confidence in Big Data - IBM Smarter Business 2013 IBM Sverige
 
The Data Driven Enterprise - Roadmap to Big Data & Analytics Success
The Data Driven Enterprise - Roadmap to Big Data & Analytics SuccessThe Data Driven Enterprise - Roadmap to Big Data & Analytics Success
The Data Driven Enterprise - Roadmap to Big Data & Analytics SuccessBigInsights
 
Meetup: Case Study - HPCC Systems implementation for an Aviation company
Meetup: Case Study - HPCC Systems implementation for an Aviation companyMeetup: Case Study - HPCC Systems implementation for an Aviation company
Meetup: Case Study - HPCC Systems implementation for an Aviation companyHPCC Systems
 
Big Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBig Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBernard Marr
 
About diamonds
About diamondsAbout diamonds
About diamondsloser08
 
Pdhpe rationalev2
Pdhpe rationalev2Pdhpe rationalev2
Pdhpe rationalev2Jen_e1986
 
A po afrohet momenti i kthesës për Evropën?!
A po afrohet momenti i kthesës për Evropën?!A po afrohet momenti i kthesës për Evropën?!
A po afrohet momenti i kthesës për Evropën?!Veton Rexhepi
 
Máquinas de Combustão Interna – Ciclo Otto
Máquinas de Combustão Interna – Ciclo OttoMáquinas de Combustão Interna – Ciclo Otto
Máquinas de Combustão Interna – Ciclo OttoAlbert Oliveira
 
Bab vii spektek kejari
Bab vii spektek kejariBab vii spektek kejari
Bab vii spektek kejariMohammad Rovik
 
Future office multi employer worksite ppt
Future office multi employer worksite pptFuture office multi employer worksite ppt
Future office multi employer worksite pptBethany Yorio
 
What is SPF record good for? | Part 7#17
What is SPF record good for? | Part 7#17What is SPF record good for? | Part 7#17
What is SPF record good for? | Part 7#17Eyal Doron
 
Праздничный каталог LR до 10.03.2015 Россия
Праздничный каталог LR до 10.03.2015 РоссияПраздничный каталог LR до 10.03.2015 Россия
Праздничный каталог LR до 10.03.2015 Россияt575ae
 

Viewers also liked (20)

HPCC Systems - Using Big Data to Help Feed the World
HPCC Systems - Using Big Data to Help Feed the WorldHPCC Systems - Using Big Data to Help Feed the World
HPCC Systems - Using Big Data to Help Feed the World
 
Introduction to the Open Source HPCC Systems Platform by Arjuna Chala
Introduction to the Open Source HPCC Systems Platform by Arjuna ChalaIntroduction to the Open Source HPCC Systems Platform by Arjuna Chala
Introduction to the Open Source HPCC Systems Platform by Arjuna Chala
 
Big Data Week - Chennai - 2014
Big Data Week - Chennai - 2014Big Data Week - Chennai - 2014
Big Data Week - Chennai - 2014
 
The BYTE Project
The BYTE Project The BYTE Project
The BYTE Project
 
Building Confidence in Big Data - IBM Smarter Business 2013
Building Confidence in Big Data - IBM Smarter Business 2013 Building Confidence in Big Data - IBM Smarter Business 2013
Building Confidence in Big Data - IBM Smarter Business 2013
 
The Data Driven Enterprise - Roadmap to Big Data & Analytics Success
The Data Driven Enterprise - Roadmap to Big Data & Analytics SuccessThe Data Driven Enterprise - Roadmap to Big Data & Analytics Success
The Data Driven Enterprise - Roadmap to Big Data & Analytics Success
 
Meetup: Case Study - HPCC Systems implementation for an Aviation company
Meetup: Case Study - HPCC Systems implementation for an Aviation companyMeetup: Case Study - HPCC Systems implementation for an Aviation company
Meetup: Case Study - HPCC Systems implementation for an Aviation company
 
Big Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBig Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must Know
 
About diamonds
About diamondsAbout diamonds
About diamonds
 
модуль 3а
модуль 3амодуль 3а
модуль 3а
 
Pdhpe rationalev2
Pdhpe rationalev2Pdhpe rationalev2
Pdhpe rationalev2
 
Spek komputer
Spek komputerSpek komputer
Spek komputer
 
Presentation1
Presentation1Presentation1
Presentation1
 
A po afrohet momenti i kthesës për Evropën?!
A po afrohet momenti i kthesës për Evropën?!A po afrohet momenti i kthesës për Evropën?!
A po afrohet momenti i kthesës për Evropën?!
 
Máquinas de Combustão Interna – Ciclo Otto
Máquinas de Combustão Interna – Ciclo OttoMáquinas de Combustão Interna – Ciclo Otto
Máquinas de Combustão Interna – Ciclo Otto
 
Bab vii spektek kejari
Bab vii spektek kejariBab vii spektek kejari
Bab vii spektek kejari
 
Future office multi employer worksite ppt
Future office multi employer worksite pptFuture office multi employer worksite ppt
Future office multi employer worksite ppt
 
What is SPF record good for? | Part 7#17
What is SPF record good for? | Part 7#17What is SPF record good for? | Part 7#17
What is SPF record good for? | Part 7#17
 
 
Праздничный каталог LR до 10.03.2015 Россия
Праздничный каталог LR до 10.03.2015 РоссияПраздничный каталог LR до 10.03.2015 Россия
Праздничный каталог LR до 10.03.2015 Россия
 

Similar to Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre

HUG Ireland Event - HPCC Presentation Slides
HUG Ireland Event - HPCC Presentation SlidesHUG Ireland Event - HPCC Presentation Slides
HUG Ireland Event - HPCC Presentation SlidesJohn Mulhall
 
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...Maya Lumbroso
 
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...Dataconomy Media
 
Stephen Cantrell, kdb+ Developer at Kx Systems “Kdb+: How Wall Street Tech c...
Stephen Cantrell, kdb+ Developer at Kx Systems  “Kdb+: How Wall Street Tech c...Stephen Cantrell, kdb+ Developer at Kx Systems  “Kdb+: How Wall Street Tech c...
Stephen Cantrell, kdb+ Developer at Kx Systems “Kdb+: How Wall Street Tech c...Dataconomy Media
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachSoftServe
 
Redefining Data Analytics Through Search
Redefining Data Analytics Through SearchRedefining Data Analytics Through Search
Redefining Data Analytics Through SearchConnexica
 
Qo Introduction V2
Qo Introduction V2Qo Introduction V2
Qo Introduction V2Joe_F
 
10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About 10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About Jesus Rodriguez
 
WWV2015: Jibes Paul van der Hulst big data
WWV2015: Jibes Paul van der Hulst big dataWWV2015: Jibes Paul van der Hulst big data
WWV2015: Jibes Paul van der Hulst big datawebwinkelvakdag
 
Presentation at Wright State University
Presentation at Wright State UniversityPresentation at Wright State University
Presentation at Wright State UniversityHPCC Systems
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amirydatastack
 
Dell Digital Transformation Through AI and Data Analytics Webinar
Dell Digital Transformation Through AI and  Data Analytics WebinarDell Digital Transformation Through AI and  Data Analytics Webinar
Dell Digital Transformation Through AI and Data Analytics WebinarBill Wong
 
Enabling Key Business Advantage from Big Data through Advanced Ingest Process...
Enabling Key Business Advantage from Big Data through Advanced Ingest Process...Enabling Key Business Advantage from Big Data through Advanced Ingest Process...
Enabling Key Business Advantage from Big Data through Advanced Ingest Process...StampedeCon
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Precisely
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data Geoffrey Fox
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptxElsonPaul2
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...confluent
 

Similar to Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre (20)

HUG Ireland Event - HPCC Presentation Slides
HUG Ireland Event - HPCC Presentation SlidesHUG Ireland Event - HPCC Presentation Slides
HUG Ireland Event - HPCC Presentation Slides
 
Analytics&IoT
Analytics&IoTAnalytics&IoT
Analytics&IoT
 
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
 
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
 
Stephen Cantrell, kdb+ Developer at Kx Systems “Kdb+: How Wall Street Tech c...
Stephen Cantrell, kdb+ Developer at Kx Systems  “Kdb+: How Wall Street Tech c...Stephen Cantrell, kdb+ Developer at Kx Systems  “Kdb+: How Wall Street Tech c...
Stephen Cantrell, kdb+ Developer at Kx Systems “Kdb+: How Wall Street Tech c...
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
 
Redefining Data Analytics Through Search
Redefining Data Analytics Through SearchRedefining Data Analytics Through Search
Redefining Data Analytics Through Search
 
Qo Introduction V2
Qo Introduction V2Qo Introduction V2
Qo Introduction V2
 
10 Big Data Technologies you Didn't Know About
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre

  • 2. Big Data Processing Beyond Map Reduce. Dr. Flavio Villanustre, VP Technology. October 7, 2016
  • 3. The Data Centric Approach. A single source of data is insufficient to overcome inaccuracies in the data. The holes are inaccuracies found in the data. Our platform is built on the premise of absorbing data from multiple data sources and transforming it into highly intelligent social network graphs that can be processed to uncover non-obvious relationships. The holes in the core data have been eliminated.
  • 4. The Big Data funnel. Big Data (unstructured and structured content): structured records, unstructured records, news articles, proprietary data, public records. High Performance Computing Cluster Platform (HPCC): fusion, linking, refinery. Analysis applications: complex analysis, clustering analysis, link analysis, entity resolution.
  • 5. Data Flow Oriented Big Data Platform. Diagram labels: raw data from several sources; ESP middleware services; batch, subscribers, portal; batch requests for scoring and analytics; batch processed data. Thor (Data Lake): shared-nothing MPP architecture; commodity hardware; batch ETL and analytics. ECL: easy to use; implicitly parallel; compiles to C++. ROXIE (Query): shared-nothing MPP architecture; commodity hardware; real-time index-based query; low latency, highly concurrent and highly redundant.
  • 6. ECL: The Data Flow Oriented Programming Language. Diagram labels: raw data from several sources; Thor; ECL; ROXIE; reporting; batch reporting requests; batch subscribers. Features: an easy-to-use, data-centric programming language optimized for large-scale data management and query processing; highly efficient, automatically distributing workload across all nodes; 80% more efficient than C++, Java and SQL, with a 1/3 reduction in programmer time to maintain/enhance existing applications; benchmark against SQL (5 times more efficient) for code generation; automatic parallelization and synchronization of sequential algorithms for parallel and distributed processing; large library of built-in modules to handle common data manipulation tasks. Declarative programming language: powerful, extensible, implicitly parallel, maintainable, complete and homogeneous.
  • 7. Scalability: ECL Is Inherently, and Extremely, Parallel. FEATURES: at the foundation level ECL works on data flows; imagine data flowing through a pipe: ECL can simultaneously execute operations on different parts of the pipe; one-to-many pipelines are constructed based on achievable parallelism and available resources; laziness ensures that execution is only done as needed. Code on slide: ds := somedata; transformed := TRANSFORM(ds, UPPER(NAME)); Diagram (Data Partitioned Parallel Processing): the original somedata dataset (01 joe chambers, 02 john smith, …, 1001 alex shaw, 1002 peter king, …) is split across nodes; on each node a pipeline applies TRANSFORM and then the next operation in the direction of data flow, yielding 01 JOE CHAMBERS and 02 JOHN SMITH on node 1, and 1001 ALEX SHAW and 1002 PETER KING on node 2.
  • 8. The HPCC stack (STRIKE). Diagram labels: Data Connect Analytics Tools; Common Programming Language; Data Science Portal; Cleaning; MDM; Dashboard Creator; Workflow Builder; ECL; Thor; ROXIE; Interlok; Profiling; Normalization; Predictive Analysis; Business Intelligence; Attribute Creation; Relationship Analysis; SALT; KEL.
  • 9. Master Data Management with SALT: from disparate data, to clustering, to showing relationships
  • 10. SALT Enables Content Disambiguation to Increase Productivity. The acronym stands for “Scalable Automated Linking Technology”; entity disambiguation using inference techniques; template-based ECL code generator; provides for automated data profiling, parsing, cleansing, normalization and standardization; sophisticated specificity- and relatives-based linking and clustering. 42 lines of SALT; 3,980 lines of ECL; 482,410 lines of C++. Diagram: (1) Data preparation processes (ETL) on the data sources: profiling, parsing, cleansing, normalization, standardization. (2) Record linkage process: blocking/searching; weight assignment and record comparison; matching weights and threshold computation; record match decision; linking iterations; linked data file; additional data ingest.
  • 11. Relationship Analysis With KEL. KEL: an abstraction for network/graph processing. Declarative model: describe what things are, rather than how to execute; high level: vertices and edges are first-class citizens; a single model to describe graphs and queries; leverages Thor for heavy lifting and ROXIE for real-time analytics; compiles into ECL (and ECL compiles into C++, which compiles into assembler).
  • 12. The Data Science Portal
  • 14. Selected use cases (Introduction to HPCC Systems)
  • 15. LexisNexis Legal & Professional: fast insight into case law. THE CHALLENGE: 100+ million documents; entity identification and resolution; document and topic classification; near real-time feedback.
  • 16. LexisNexis Legal & Professional. THE SOLUTION: Generation 2 entity recognition employs the HPCC PARSE(…) function and pattern rules (more entities recognized, faster development, faster operation); SALT-based entity resolution; custom resolution for citation entities (case law and statute references; references can be anaphoric, like infra; parallels, i.e. the same case in more than one book); active learning used to extend classification. THE OUTCOME: two enormous benefits: (1) a huge lift in entity resolution and document classification because of SALT; (2) staying ahead of customers in terms of performance and maintaining currency of data because of the rapid big data processing capabilities.
  • 17. SciVal: fast insight into evidence-based research data. THE CHALLENGE: Scopus data covering 21,000 titles from 5,000 publishers, updated weekly; data from 4,600 research institutions and 220 countries; real-time user-specific data slicing.
  • 18. SciVal. THE SOLUTION: clean, standardize and build analytics queries (HPCC Thor) from 40+ TB of data on a weekly basis; provide a rapid query interface for both ad hoc queries and canned queries (HPCC ROXIE). THE OUTCOME: two enormous benefits: (1) users can customize their own visualizations, which are generated in just a few seconds; (2) HPCC Systems works in two modes: offline crunching of huge, pre-defined requests, and smaller calculations in real time on customer-generated data slices.
  • 19. Smart Hard Hat Ecosystem. THE CHALLENGE: 4,000 workers die and millions are injured annually while working on the industrial floor; businesses face a very high cost of maintaining safety.
  • 20. Smart Hard Hat Ecosystem. THE SOLUTION: equip workers’ hats with smart sensor technology; central real-time processing of (high volume) information with real-time alerting capability (HPCC Systems); customizable dashboards, rules framework and data workflow frameworks (HPCC Systems); predictive modeling and analytics (HPCC Systems). THE OUTCOME: produced an industrial wearable that uses IoT and wireless communications systems to protect and empower industrial workers.
  • 21. Driver Behavior with Smart Telematics. THE CHALLENGE: high cost of insurance; high car accident rates; lack of tools to analyze driver behavior.
  • 22. Driver Behavior with Smart Telematics. THE SOLUTION: telematics smartphone application; central system to collect (very large) data and perform analytics (HPCC Systems); journey-based feedback to all drivers to advise on and correct behavior (HPCC Systems); insurance enrollment to reduce premiums. THE OUTCOME: recommend corrections to driver behavior that would avoid accidents; reduce overall insurance costs; correlate information from the data of drivers traversing the same path to create an understanding of predictable actions; examples include periods of traffic congestion, problem areas in the path and hazard detection.
  • 23. Contextual Marketing. THE CHALLENGE: understanding an individual customer’s behavior based on past actions. Technical problem: huge volumes of data based on observed cell phone Wi-Fi; apply advanced machine learning techniques.
  • 24. Contextual Marketing. THE SOLUTION: central analytics system to collect and analyze data (HPCC Systems); leverage parallel algorithms to perform analytics on large quantities of data (HPCC Systems). THE OUTCOME: created a platform to process any location-specific telecom data that can be analyzed rapidly to gauge consumer behavior and in turn help drive context-based marketing.
  • 25. Predict Passenger Volumes in Airports. THE CHALLENGE: how to interpret hundreds of millions of location points while adjusting to flight schedule changes; complex clustering algorithm requirements; understand passenger behaviors and the interaction of local areas of activity.
  • 26. Predict Passenger Volumes in Airports. THE SOLUTION: HPCC used to solve Big Data challenges: raw data to refined data; clustering analysis; forecasting. THE OUTCOME: better passenger experience and better airport planning.
  • 27. Resources. Portal: http://hpccsystems.com; ECL Language Reference: https://hpccsystems.com/ecl-language-reference; SALT: https://hpccsystems.com/enterprise-services/purchase-required-modules/SALT; KEL: https://hpccsystems.com/download/free-modules/kel-lite; Machine Learning: http://hpccsystems.com/ml; Online Training: http://learn.lexisnexis.com/hpcc; HPCC Systems Blog: http://hpccsystems.com/blog; HPCC Systems Wiki & Red Book: https://wiki.hpccsystems.com; Our GitHub portal: https://github.com/hpcc-systems; Community Forums: http://hpccsystems.com/bb; Case Studies: https://hpccsystems.com/resources/case-studies
  • 28. Questions? Dr. Flavio Villanustre, VP Technology, LexisNexis® Risk Solutions, Flavio.Villanustre@lexisnexis.com
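
As a rough illustration of the data-partitioned parallel processing described on slide 7, the sketch below splits a small dataset into per-node partitions, applies the same upper-casing transform to each partition independently, and concatenates the results. It is written in Python rather than ECL and is not how the HPCC platform is implemented: the function names (partition, transform, run_pipeline) are hypothetical, and threads merely stand in for cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_nodes):
    """Split records into at most n_nodes contiguous chunks (one per stand-in node)."""
    if not records:
        return []
    size = -(-len(records) // n_nodes)  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def transform(record):
    """Per-record transform from the slide: upper-case the name, keep the id."""
    rec_id, name = record
    return (rec_id, name.upper())


def run_pipeline(records, n_nodes=2):
    """Run the transform on every partition independently, then concatenate.

    Threads are an assumption standing in for nodes; map() returns results
    in partition order, so the output order matches the input order.
    """
    parts = partition(records, n_nodes)
    if not parts:
        return []
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        results = list(pool.map(lambda part: [transform(r) for r in part], parts))
    return [rec for part in results for rec in part]


# Sample rows taken from the slide's diagram.
somedata = [(1, "joe chambers"), (2, "john smith"),
            (1001, "alex shaw"), (1002, "peter king")]
transformed = run_pipeline(somedata, n_nodes=2)
```

With two stand-in nodes, the first partition holds records 1 and 2 and the second holds records 1001 and 1002, mirroring the two pipelines in the slide's diagram. Because the transform looks at one record at a time, no partition needs data from another.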