Grab some coffee and enjoy the pre-show banter before the top of the hour!
Has Traditional MDM Finally Met Its Match? 
The Briefing Room
Twitter Tag: #briefr 
Welcome 
Host: 
Eric Kavanagh 
eric.kavanagh@bloorgroup.com 
@eric_kavanagh
Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
Topics
This Month: INTEGRATION & DATA FLOW
October: ANALYTIC PLATFORMS
November: DISCOVERY & VISUALIZATION
2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room
There’s a New Sheriff in Town! 
Executive Summary 
• Speed and power trump the old way 
• Traditional MDM is officially archaic 
• YARN is the new fabric of MDM
Analyst: Robin Bloor 
Robin Bloor is Chief Analyst at The Bloor Group
robin.bloor@bloorgroup.com 
@robinbloor
RedPoint Global
• RedPoint Global is a data management and integrated marketing technology company
• RedPoint Data Management offers solutions designed for master data management (MDM), collaboration and architecture integration
• RedPoint Data Management for Hadoop is YARN-compliant and enables analysts to access and manipulate data directly within the Hadoop cluster
Guest: George Corugedo
George Corugedo is Chief Technology Officer & Co-Founder at RedPoint Global Inc. A mathematician and seasoned technology executive, George has over 20 years of business and technical expertise. As co-founder and CTO of RedPoint Global, George is responsible for leading the development of the RedPoint Convergent Marketing Platform™. A former math professor, George left academia to co-found Accenture’s Customer Insight Practice, which specialized in strategic data utilization, analytics and customer strategy. Previous positions include director of client delivery at ClarityBlue, Inc., a provider of hosted customer intelligence solutions to enterprise commercial entities, and COO/CIO of Riscuity, a receivables management company specializing in the utilization of analytics to drive collections.
MDM for the Modern Data Architecture
September 2014
Purpose of MDM 
Create correct and consistent data across the enterprise that earns trust in information and accelerates growth.
11 © RedPoint Global Inc. 2014 Confidential
Vicious Cycle of Unmanaged Data
1. Master data issues remain unaddressed or unresolved
2. Garbage in/garbage out creates process confusion
3. Lack of process trust slows business momentum
4. Data conflicts reinforce siloed operations
© Hortonworks Inc. 2014 
A Data Architecture Under Pressure
Broad Spectrum of Benefits Across Industries 
Gartner’s Nexus of Forces Making Things Worse 
Business Benefits of MDM 
Types of Data in a Typical Organization 
Challenges to Data Lake Approach
• Severe shortage of MapReduce-skilled resources
• Inconsistent skills lead to inconsistent results from code-based solutions
• Nascent technologies require multiple point solutions
• Technologies are not enterprise grade
• Some functionality may not be possible within these frameworks
Benefits of a Hadoop Data Lake
• Data is ingested in its raw state regardless of format, structure or lack of structure
• Raw data can be used and reused for differing purposes across the enterprise
• Beyond inexpensive storage, Hadoop is an extremely powerful, scalable and segmentable computational platform
• Master data can be fed across the enterprise, and deep analytics on clean data is immediately enabled
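The first two benefits describe what is often called schema-on-read: rows are kept exactly as ingested, and each consumer applies its own structure when it reads them. A minimal plain-Java sketch, assuming hypothetical pipe-delimited clickstream rows of the form `timestamp|customerId|page`; the row format, class and method names are illustrative only, not a RedPoint or Hadoop API.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class SchemaOnReadSketch {

    // Structure is applied only at read time (hypothetical 3-field format).
    // Rows that do not fit this view are skipped here but remain untouched
    // in the raw store for other consumers.
    static Optional<String[]> parse(String line) {
        String[] fields = line.split("\\|", -1);
        if (fields.length != 3 || fields[1].isBlank()) {
            return Optional.empty();
        }
        return Optional.of(fields);
    }

    // Consumer 1: page hit counts, e.g. for web analytics.
    public static Map<String, Integer> hitsPerPage(List<String> rawLines) {
        Map<String, Integer> hits = new TreeMap<>();
        for (String line : rawLines) {
            parse(line).ifPresent(f -> hits.merge(f[2].trim(), 1, Integer::sum));
        }
        return hits;
    }

    // Consumer 2: distinct customer IDs, e.g. as candidate input to MDM matching.
    public static Set<String> distinctCustomers(List<String> rawLines) {
        Set<String> ids = new TreeSet<>();
        for (String line : rawLines) {
            parse(line).ifPresent(f -> ids.add(f[1].trim()));
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> raw = List.of(
                "2014-09-01T10:00|C1|/home",
                "2014-09-01T10:01|C2|/home",
                "2014-09-01T10:02|C1|/pricing",
                "unparseable sensor blob");
        System.out.println(hitsPerPage(raw));       // {/home=2, /pricing=1}
        System.out.println(distinctCustomers(raw)); // [C1, C2]
    }
}
```

The same raw list serves both views without being rewritten, which is the reuse the slide refers to; in a real lake the list would be files in HDFS rather than an in-memory collection.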
Big Data Can Become Big Information
• Ingestion of all data available from any source, format, cadence, structure or non-structure
• ELT and data transformation, refinement, cleansing, completion, validation and standardization
• Geospatial processing and geocoding
• Data profiling, lineage and metadata management
• Identity resolution, persistent keying and entity profile management
• Attribute source and consumer mapping
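Cleansing, validation and standardization are what turn raw ingested values into something that can be matched. The sketch below is a minimal plain-Java illustration under stated assumptions: a hypothetical customer record, title-casing for names, a deliberately simple e-mail shape check, and a North American 10-digit phone rule. Production data quality tools rely on much richer parsing rules and reference data; none of these names are RedPoint APIs.

```java
import java.util.Locale;
import java.util.Optional;

public class CleanseSketch {

    /** Hypothetical cleansed customer record used only for this illustration. */
    public record Customer(String name, String email, String phone) {}

    /** Trims, collapses repeated whitespace and title-cases each word. */
    public static String standardizeName(String raw) {
        StringBuilder out = new StringBuilder();
        for (String part : raw.trim().split("\\s+")) {
            if (part.isEmpty()) continue;
            if (out.length() > 0) out.append(' ');
            out.append(Character.toUpperCase(part.charAt(0)))
               .append(part.substring(1).toLowerCase(Locale.ROOT));
        }
        return out.toString();
    }

    /** Keeps digits only; accepts 10 digits, or 11 with a leading 1 (assumed NANP rule). */
    public static Optional<String> standardizePhone(String raw) {
        String digits = raw.replaceAll("\\D", "");
        if (digits.length() == 11 && digits.startsWith("1")) {
            digits = digits.substring(1);
        }
        return digits.length() == 10 ? Optional.of(digits) : Optional.empty();
    }

    /** Lower-cases and applies a minimal shape check: one '@' and a dot in the domain. */
    public static Optional<String> standardizeEmail(String raw) {
        String e = raw.trim().toLowerCase(Locale.ROOT);
        int at = e.indexOf('@');
        boolean ok = at > 0
                && at == e.lastIndexOf('@')
                && e.indexOf('.', at) > at + 1
                && !e.endsWith(".");
        return ok ? Optional.of(e) : Optional.empty();
    }

    /** Returns a cleansed record, or empty if any field fails validation. */
    public static Optional<Customer> cleanse(String name, String email, String phone) {
        String n = standardizeName(name);
        Optional<String> e = standardizeEmail(email);
        Optional<String> p = standardizePhone(phone);
        if (n.isEmpty() || e.isEmpty() || p.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new Customer(n, e.get(), p.get()));
    }

    public static void main(String[] args) {
        System.out.println(cleanse("  mARY   smith ", " Mary.Smith@Example.COM ", "+1 (555) 123-4567"));
        // Optional[Customer[name=Mary Smith, email=mary.smith@example.com, phone=5551234567]]
        System.out.println(cleanse("Bob", "not-an-email", "555"));
        // Optional.empty
    }
}
```

Rejected rows are returned as empty rather than silently "fixed", so a pipeline can route them to a review queue instead of letting garbage flow downstream.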
Data Lake Architecture for MDM
[Diagram: data sources feeding the data lake: CRM, ERP, Billing, Subscriber, Product, Network, Weather, Compete, Manuf., Clickstream, Online Chat, Sensor Data, Social Media, Call Detail Records, Fabrication Logs, Sales Feedback, Field Feedback, and more]
Key Functions for Master Data Management
ETL & ELT
• Profiling, reads/writes, transformations
• Single project for all jobs
Data Quality
• Cleanse data
• Parsing, correction
• Geo-spatial analysis
Integration & Matching
• Grouping
• Fuzzy match
Master Key Management
• Create keys
• Track changes
• Maintain matches over time
Web Services Integration
• Consume and publish
• HTTP/HTTPS protocols
• XML/JSON/SOAP formats
Process Automation & Operations
• Job scheduling, monitoring, notifications
• Central point of control
• Meta Data Management
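Fuzzy matching and master key management work together: each incoming value is compared with known entities, and a match reuses that entity's persistent key instead of minting a new one, so keys stay stable from run to run. A minimal sketch, assuming normalized Levenshtein similarity, an arbitrary 0.8 threshold and an in-memory key map; a production system would persist the map, match on several attributes and use blocking instead of comparing against every known value. `MatchKeySketch` and the `MK-` key format are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class MatchKeySketch {

    /** Classic Levenshtein edit distance, computed with two rolling rows. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // last computed row is now in prev
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]: 1 minus edit distance over the longer length. */
    public static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }

    // Representative (normalized) value -> persistent master key.
    private final Map<String, String> keyByRepresentative = new LinkedHashMap<>();
    private final double threshold;
    private int nextId = 1;

    public MatchKeySketch(double threshold) {
        this.threshold = threshold;
    }

    /** Reuses the key of the most similar known value at or above threshold, else mints one. */
    public String assignKey(String value) {
        String norm = value.trim().toUpperCase(Locale.ROOT);
        String bestKey = null;
        double best = threshold;
        for (Map.Entry<String, String> e : keyByRepresentative.entrySet()) {
            double s = similarity(norm, e.getKey());
            if (s >= best) {
                best = s;
                bestKey = e.getValue();
            }
        }
        if (bestKey == null) {
            bestKey = String.format("MK-%06d", nextId++);
            keyByRepresentative.put(norm, bestKey);
        }
        return bestKey;
    }

    public static void main(String[] args) {
        MatchKeySketch keys = new MatchKeySketch(0.8);
        System.out.println(keys.assignKey("Jonathan Smith")); // MK-000001
        System.out.println(keys.assignKey("Jonathon Smith")); // MK-000001 (one edit apart)
        System.out.println(keys.assignKey("Maria Garcia"));   // MK-000002
    }
}
```

Because the key map outlives any single batch, "maintain matches over time" amounts to loading it before each run and saving it afterwards; the threshold is the main tuning knob trading false merges against missed matches.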
So How to Proceed? 
Overview - What is Hadoop/Hadoop 2.0?
Hadoop 1.0
• All operations based on MapReduce
• Intrinsic inconsistency of code-based solutions
• Highly skilled and expensive resources needed
• 3rd party applications constrained by the need to generate code
Hadoop 2.0
• Introduction of YARN: “a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters.”
• Mature applications can now operate directly on Hadoop
• Reduced skill requirements and increased consistency
RedPoint Data Management on Hadoop
[Diagram components: Parallel Section (UI), Partitioning AM / Tasks, Execution AM / Tasks, Key / Split Analysis, Data I/O, YARN, MapReduce]
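The "Key / Split Analysis" and "Partitioning" components indicate that work is divided by key so related records are processed together. The deck does not say how RedPoint partitions data; as a generic illustration only, the sketch below applies the arithmetic of Hadoop's default `HashPartitioner`, `(hash & Integer.MAX_VALUE) % numPartitions`, to plain Java strings. Since `String.hashCode()` is not the same function as Hadoop's `Text.hashCode()`, bucket numbers may not match a real job. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class KeySplitSketch {

    // Same arithmetic as Hadoop's default HashPartitioner: clear the sign bit so
    // the hash is non-negative, then take it modulo the number of partitions.
    public static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Splits (key, payload) records so every record with a given key lands in one partition.
    public static List<List<String[]>> split(List<String[]> records, int numPartitions) {
        List<List<String[]>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String[] record : records) {
            partitions.get(partitionFor(record[0], numPartitions)).add(record);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String[]> records = List.of(
                new String[] {"cust-1", "CRM row"},
                new String[] {"cust-2", "billing row"},
                new String[] {"cust-1", "web row"});
        List<List<String[]>> parts = split(records, 4);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("partition " + i + ": " + parts.get(i).size() + " record(s)");
        }
    }
}
```

Co-locating all rows for one key is what lets matching and survivorship for an entity run inside a single task without cross-node lookups; skewed keys are the usual cost of this choice.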
Reference Hadoop Architecture
[Diagram. Source data: sensor logs, clickstream, flat files, unstructured, sentiment, customer, inventory (from DBs, JMS queues and files). Load paths: SQOOP, Flume, WebHDFS, NFS, REST, HTTP, stream. Core: YARN and HDFS (nodes 1 to n). Data refinement: Pig, Hive, MapReduce. Structure: HCatalog (metadata services). Interactive: HiveServer2. Load to data sources (RDBMS, EDW): SQOOP/Hive, WebHDFS. Consumers: query/visualization/reporting/analytical tools and apps. Monitoring and management tools: Ambari.]
RedPoint Functional Footprint
[Diagram: the same components as the Reference Hadoop Architecture diagram, annotated with RedPoint's functional footprint]
RedPoint Benchmarks – Project Gutenberg
Sample MapReduce (small subset of the entire code, which totals nearly 150 lines):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Nested inside the job's driver class. WordOffset is a custom key type
// defined elsewhere in the full job.
public static class MapClass extends Mapper<WordOffset, Text, Text, IntWritable> {

    // Characters treated as word separators (the embedded double quote must be escaped)
    private final static String delimiters =
        "',./<>?;:\"[]{}-=_+()&*%^#$!@`~ |«»¡¢£¤¥¦©¬®¯±¶·¿";

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Emits (token, 1) for every token in the input line
    @Override
    public void map(WordOffset key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line, delimiters);
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}

Sample Pig script without the UDF:

SET pig.maxCombinedSplitSize 67108864
SET pig.splitCombination true
-- tokenize, upper-case, group, count and sort by frequency
A = LOAD '/testdata/pg/*/*/*';
B = FOREACH A GENERATE FLATTEN(TOKENIZE((chararray)$0)) AS word;
C = FOREACH B GENERATE UPPER(word) AS word;
D = GROUP C BY word;
E = FOREACH D GENERATE COUNT(C) AS occurrences, group;
F = ORDER E BY occurrences DESC;
STORE F INTO '/user/cleonardi/pg/pig-count';

Results:
MapReduce: >150 lines of MR code; 6 hours of development; 6 minutes runtime; extensive optimization needed
Pig: ~50 lines of script code; 3 hours of development; 15 minutes runtime; User Defined Functions required prior to running script
RedPoint: 0 lines of code; 15 min. of development; 3 minutes runtime; no tuning or optimization required
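The Pig script's pipeline (tokenize, upper-case, group, count, order by count descending) can be reproduced on a single machine to check expected results on a small sample before running at cluster scale. A minimal plain-Java sketch; the delimiter set is an assumption (the sample mapper uses a longer list and Pig's `TOKENIZE` has its own default), and `Map.merge` stands in for the shuffle-and-sum reduce step.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountSketch {

    // Assumed delimiters: whitespace plus common ASCII punctuation.
    private static final String DELIMITERS = " \t\n\r',./<>?;:\"[]{}-=_+()&*%^#$!@`~|";

    /** Counts upper-cased tokens and returns them ordered by count, descending. */
    public static List<Map.Entry<String, Integer>> countWords(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line, DELIMITERS);
            while (itr.hasMoreTokens()) {
                // "map" emits (WORD, 1); merge sums the 1s per key like the reduce step
                counts.merge(itr.nextToken().toUpperCase(Locale.ROOT), 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> result = new ArrayList<>(counts.entrySet());
        // ORDER ... BY occurrences DESC; ties broken alphabetically for deterministic output
        result.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = List.of("The quick brown fox.", "the lazy dog, the end");
        for (Map.Entry<String, Integer> e : countWords(sample)) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

This only works while the data fits in one JVM's memory; the point of the MapReduce, Pig and RedPoint versions is to run the same logic partitioned across a cluster.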
Perceptions & Questions 
Analyst: Robin Bloor
What Can You Do With a Data Lake?
Robin Bloor, Ph.D.
The Story So Far…
The old Data Warehouse World (environment) is fast dying, giving way to a dystopian future dominated by alien and mutant data, carried by vast, unruly data streams that flow rapidly into dank and murky data lakes.
This is Hadoop World.
HOW DO WE MAKE SENSE OF THIS?
The Big Data Architecture
[Diagram. The Data Layer (Data Refinery and Processing Hub): Data Reservoir (Hadoop) with filtering, replicating & routing; general data server(s) and specialist data server(s), each with local data; data preparation; ETL & data virtualization; local workloads; data flow (optimize); data export. The Application Layer: streaming apps, transactional apps, BI apps and office apps, each with its own data mart; events; data flow. Applications may use the Data Hub directly. Sources: streams, IoT, log files, DaaS, mobile devices, desktops, servers, the cloud, social media, etc.]
The Main Point to Note
This is WAY more complicated than the old Data Warehouse world
The Governance of Data
It’s all GOVERNANCE!!
[Diagram: Corporate Data Hub comprising Data Reservoir (Hadoop), General Data Server(s), Specialist Data Server(s), ETL & Data Virt'n and Local Workloads. Governance functions shown: Data Security, Data Life Cycle Mgt, MDM & Business Glossary, Data Cleansing, System Management, MetaData Management, Performance Monitoring & Mgt, Data Lineage, Data Mapping, Data Extracts, MetaData Discovery, Service Level Mgt]
The Evolution of Hadoop
• There were many components before YARN and Tez
• But YARN and Tez have changed the picture
• Hadoop is becoming the default scale-out file system and the OS for data flow
The Prognosis
The foundation is in place for a comprehensive Big Data Information Architecture…
But BUILDING such integrated systems will not be easy
u How does RedPoint see the role of Hadoop 
(ingest-point, ETL engines, MDM work area, 
analytical sandbox, database, etc.); some of 
these? All of these? 
u Often in the past, MDM implementations have 
proved to be disappointing. What makes RedPoint 
different given that the data environment is 
more challenging than ever? 
u Which companies/technologies do you see as 
competitive with RedPoint
u Which verticals have shown the greatest 
interest in RedPoint? 
u How does a RedPoint engagement normally pan 
out? 
u If you are intent upon doing MDM, where is it 
best to start?
Upcoming Topics
This Month: INTEGRATION & DATA FLOW
October: ANALYTIC PLATFORMS
November: DISCOVERY & VISUALIZATION
www.insideanalysis.com/webcasts/the-briefing-room
2014 Editorial Calendar at www.insideanalysis.com
THANK YOU for your ATTENTION!
Opening slide image courtesy of Wikimedia Commons

Fit For Purpose: Preventing a Big Data Letdown
 
To Serve and Protect: Making Sense of Hadoop Security
To Serve and Protect: Making Sense of Hadoop Security To Serve and Protect: Making Sense of Hadoop Security
To Serve and Protect: Making Sense of Hadoop Security
 
The Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On TimeThe Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On Time
 
Introducing: A Complete Algebra of Data
Introducing: A Complete Algebra of DataIntroducing: A Complete Algebra of Data
Introducing: A Complete Algebra of Data
 
The Role of Data Wrangling in Driving Hadoop Adoption
The Role of Data Wrangling in Driving Hadoop AdoptionThe Role of Data Wrangling in Driving Hadoop Adoption
The Role of Data Wrangling in Driving Hadoop Adoption
 
Ahead of the Stream: How to Future-Proof Real-Time Analytics
Ahead of the Stream: How to Future-Proof Real-Time AnalyticsAhead of the Stream: How to Future-Proof Real-Time Analytics
Ahead of the Stream: How to Future-Proof Real-Time Analytics
 
All Together Now: Connected Analytics for the Internet of Everything
All Together Now: Connected Analytics for the Internet of EverythingAll Together Now: Connected Analytics for the Internet of Everything
All Together Now: Connected Analytics for the Internet of Everything
 
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETLGoodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
 
The Biggest Picture: Situational Awareness on a Global Level
The Biggest Picture: Situational Awareness on a Global LevelThe Biggest Picture: Situational Awareness on a Global Level
The Biggest Picture: Situational Awareness on a Global Level
 
Structurally Sound: How to Tame Your Architecture
Structurally Sound: How to Tame Your ArchitectureStructurally Sound: How to Tame Your Architecture
Structurally Sound: How to Tame Your Architecture
 
SQL In Hadoop: Big Data Innovation Without the Risk
SQL In Hadoop: Big Data Innovation Without the RiskSQL In Hadoop: Big Data Innovation Without the Risk
SQL In Hadoop: Big Data Innovation Without the Risk
 
The Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big DataThe Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big Data
 
A Revolutionary Approach to Modernizing the Data Warehouse
A Revolutionary Approach to Modernizing the Data WarehouseA Revolutionary Approach to Modernizing the Data Warehouse
A Revolutionary Approach to Modernizing the Data Warehouse
 
The Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopThe Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of Hadoop
 
Rethinking Data Availability and Governance in a Mobile World
Rethinking Data Availability and Governance in a Mobile WorldRethinking Data Availability and Governance in a Mobile World
Rethinking Data Availability and Governance in a Mobile World
 
DisrupTech - Dave Duggal
DisrupTech - Dave DuggalDisrupTech - Dave Duggal
DisrupTech - Dave Duggal
 
Modus Operandi
Modus OperandiModus Operandi
Modus Operandi
 

Recently uploaded

PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 

Recently uploaded (20)

PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 

Has Traditional MDM Finally Met its Match?

  • 1. Grab some coffee and enjoy the pre-show banter before the top of the hour!
  • 2. Has Traditional MDM Finally Met Its Match? The Briefing Room
  • 3. Twitter Tag: #briefr The Briefing Room Welcome Host: Eric Kavanagh eric.kavanagh@bloorgroup.com @eric_kavanagh
  • 4. Mission • Reveal the essential characteristics of enterprise software, good and bad • Provide a forum for detailed analysis of today’s innovative technologies • Give vendors a chance to explain their product to savvy analysts • Allow audience members to pose serious questions... and get answers!
  • 5. Topics • This Month: INTEGRATION & DATA FLOW • October: ANALYTIC PLATFORMS • November: DISCOVERY & VISUALIZATION • 2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room
  • 6. Executive Summary: There’s a New Sheriff in Town! • Speed and power trump the old way • Traditional MDM is officially archaic • YARN is the new fabric of MDM
  • 7. Analyst: Robin Bloor. Robin Bloor is Chief Analyst at The Bloor Group. robin.bloor@bloorgroup.com @robinbloor
  • 8. RedPoint Global • RedPoint Global is a data management and integrated marketing technology company • RedPoint Data Management offers solutions designed for master data management (MDM), collaboration and architecture integration • RedPoint Data Management for Hadoop is YARN-compliant and enables analysts to access and manipulate data directly within the Hadoop cluster
  • 9. Guest: George Corugedo. George Corugedo is Chief Technology Officer & Co-Founder at RedPoint Global Inc. A mathematician and seasoned technology executive, George has over 20 years of business and technical expertise. As co-founder and CTO of RedPoint Global, George is responsible for leading the development of the RedPoint Convergent Marketing Platform™. A former math professor, George left academia to co-found Accenture’s Customer Insight Practice, which specialized in strategic data utilization, analytics and customer strategy. Previous positions include director of client delivery at ClarityBlue, Inc., a provider of hosted customer intelligence solutions to enterprise commercial entities, and COO/CIO of Riscuity, a receivables management company specializing in the utilization of analytics to drive collections.
  • 10. MDM for the Modern Data Architecture September 2014
  • 11. Purpose of MDM: Create correct and consistent data across the enterprise that earns trust in information and accelerates growth. © RedPoint Global Inc. 2014 Confidential
  • 12. Vicious Cycle of Unmanaged Data: 1. Master data issues remain unaddressed or unresolved 2. Garbage in/garbage out creates process confusion 3. Lack of process trust slows business momentum 4. Data conflicts reinforce siloed operations
  • 13. A Data Architecture Under Pressure (© Hortonworks Inc. 2014)
  • 14. Broad Spectrum of Benefits Across Industries
  • 15. Gartner’s Nexus of Forces Making Things Worse
  • 16. Business Benefits of MDM
  • 17. Types of Data in a Typical Organization. Challenges to Data Lake Approach • Severe shortage of MapReduce-skilled resources • Inconsistent skills lead to inconsistent results from code-based solutions • Nascent technologies require multiple point solutions • Technologies are not enterprise grade • Some functionality may not be possible within these frameworks. Benefits of a Hadoop Data Lake • Data is ingested in its raw state regardless of format, structure or lack of structure • Raw data can be used and reused for differing purposes across the enterprise • Beyond inexpensive storage, Hadoop is an extremely powerful, scalable and segmentable computational platform • Master data can be fed across the enterprise, and deep analytics on clean data is immediately enabled
  • 18. Big Data Can Become Big Information • Ingestion of all data available from any source, format, cadence, structure or non-structure • ELT and data transformation, refinement, cleansing, completion, validation and standardization • Geospatial processing and geocoding • Data profiling, lineage and metadata management • Identity resolution, persistent keying and entity profile management • Attribute source and consumer mapping
  • 19. Data Lake Architecture for MDM (diagram). Data sources: CRM, ERP, Billing, Subscriber, Product, Network, Weather, Compete, Manuf., Clickstream, Online Chat, Sensor Data, Social Media, Call Detail Records, Fabrication Logs, Sales Feedback, Field Feedback
  • 20. Key Functions for Master Data Management • ETL & ELT: profiling, reads/writes, transformations; single project for all jobs • Data Quality: cleanse data; parsing, correction; geospatial analysis • Integration & Matching: grouping; fuzzy match • Master Key Management: create keys; track changes; maintain matches over time • Web Services Integration: consume and publish; HTTP/HTTPS protocols; XML/JSON/SOAP formats • Process Automation & Operations: job scheduling, monitoring, notifications; central point of control; metadata management
  • 21. So How to Proceed?
  • 22. Overview - What is Hadoop/Hadoop 2.0? Hadoop 1.0 • All operations based on MapReduce • Intrinsic inconsistency of code-based solutions • Highly skilled and expensive resources needed • Third-party applications constrained by the need to generate code. Hadoop 2.0 • Introduction of YARN: “a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters.” • Mature applications can now operate directly on Hadoop • Reduced skill requirements and increased consistency
  • 23. RedPoint Data Management on Hadoop (diagram labels: Parallel Section (UI); Key / Split Analysis; Partitioning AM / Tasks; Execution AM / Tasks; Data I/O; YARN; MapReduce)
  • 24. Reference Hadoop Architecture (diagram). Data sources: RDBMS, EDW and other source data (sensor logs, clickstream, flat files, JMS queues, unstructured, sentiment, customer, inventory). Load paths: Sqoop, Sqoop/Hive, WebHDFS, Flume, NFS (DBs, files), REST, HTTP and streams. Core: HDFS (nodes 1 to n) under YARN, with data refinement via Pig, Hive and MapReduce, interactive access via HiveServer2, and HCatalog metadata services. Monitoring and management tools: Ambari. Consumers: query, visualization, reporting and analytical tools and apps.
  • 25. RedPoint Functional Footprint (diagram): RedPoint's footprint drawn on the same reference Hadoop architecture shown on slide 24.
  • 26. RedPoint Benchmarks – Project Gutenberg word count. Sample MapReduce (a small subset of the entire code, which totals nearly 150 lines):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Mapper nested inside the job class; WordOffset is the custom input key type defined elsewhere in the full program.
public static class MapClass extends Mapper<WordOffset, Text, Text, IntWritable> {
    // Characters treated as word separators (the embedded double quote must be escaped).
    private final static String delimiters = "',./<>?;:\"[]{}-=_+()&*%^#$!@`~ |«»¡¢£¤¥¦©¬®¯±¶·¿";
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Emits (word, 1) for every token in the input line.
    public void map(WordOffset key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line, delimiters);
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}
```

Sample Pig script without the UDF:

```pig
SET pig.maxCombinedSplitSize 67108864
SET pig.splitCombination true
A = LOAD '/testdata/pg/*/*/*';
B = FOREACH A GENERATE FLATTEN(TOKENIZE((chararray)$0)) AS word;
C = FOREACH B GENERATE UPPER(word) AS word;
D = GROUP C BY word;
E = FOREACH D GENERATE COUNT(C) AS occurrences, group;
F = ORDER E BY occurrences DESC;
STORE F INTO '/user/cleonardi/pg/pig-count';
```

Results • MapReduce: >150 lines of MR code; 6 hours of development; 6 minutes runtime; extensive optimization needed • Pig: ~50 lines of script code; 3 hours of development; 15 minutes runtime; user-defined functions required prior to running the script • RedPoint: 0 lines of code; 15 minutes of development; 3 minutes runtime; no tuning or optimization required
  • 27. Data Lake Architecture for MDM (diagram). Data sources: CRM, ERP, Billing, Subscriber, Product, Network, Weather, Compete, Manuf., Clickstream, Online Chat, Sensor Data, Social Media, Call Detail Records, Fabrication Logs, Sales Feedback, Field Feedback
  • 28. Perceptions & Questions. Analyst: Robin Bloor
  • 29. What Can You Do With a Data Lake? Robin Bloor, Ph.D.
  • 30. The Story So Far… The old Data Warehouse World (environment) is fast dying – giving way to a dystopian future dominated by alien and mutant data, carried by vast unruly data streams that flow rapidly into dank and murky data lakes. This is Hadoop World. HOW DO WE MAKE SENSE OF THIS?
  • 31. The Big Data Architecture (diagram). Data sources: streams, IoT, log files, DaaS, mobile devices, desktops, servers, the cloud, social media, etc. The Data Layer: Data Reservoir (Hadoop), general data server(s) and specialist data server(s), each with local data, plus a Data Refinery and Processing Hub (filtering, replicating & routing; data preparation; data flow (optimize); ETL & data virtualization; local workloads). The Application Layer: streaming apps, transactional apps, BI apps and office apps, each with a data mart. Flows: events, data flow, data export. Applications may use the Data Hub directly.
  • 32. The Main Point to Note This is WAY more complicated than the old Data Warehouse world
  • 33. The Governance of Data: It’s all GOVERNANCE!! (diagram) Corporate Data Hub: Data Reservoir (Hadoop), general data server(s), specialist data server(s), ETL & data virtualization, local workloads. Governance functions: data security, data life cycle management, MDM & business glossary, data cleansing, system management, metadata management, performance monitoring & management, data lineage, data mapping, data extracts, metadata discovery, service level management.
  • 34. The Evolution of Hadoop • There were many components before YARN and Tez • But YARN and Tez have changed the picture • Hadoop is becoming the default scale-out file system and the OS for data flow
  • 35. The Prognosis The foundation is in place for a comprehensive Big Data Information Architecture… But BUILDING such integrated systems will not be easy
  • 36. u How does RedPoint see the role of Hadoop (ingest-point, ETL engines, MDM work area, analytical sandbox, database, etc.); some of these? All of these? u Often in the past, MDM implementations have proved to be disappointing. What makes RedPoint different given that the data environment is more challenging than ever? u Which companies/technologies do you see as competitive with RedPoint
  • 37. u Which verticals have shown the greatest interest in RedPoint? u How does a RedPoint engagement normally pan out? u If you are intent upon doing MDM, where is it best to start?
  • 38. Twitter Tag: #briefr The Briefing Room
  • 39. Upcoming Topics • This Month: INTEGRATION & DATA FLOW • October: ANALYTIC PLATFORMS • November: DISCOVERY & VISUALIZATION • 2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room
  • 40. THANK YOU for your ATTENTION! Opening slide image courtesy of Wikimedia Commons
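Slide 18 lists cleansing, standardization and validation as steps that turn big data into big information, but the deck does not show what those steps look like. The Java sketch below illustrates them for a single customer record. It is not RedPoint's implementation: the class name `RecordStandardizer`, the street-suffix table and the e-mail pattern are hypothetical choices made for illustration (Java 9 or later, for `Map.of`).

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper: standardizes and validates raw customer fields before matching. */
final class RecordStandardizer {
    // Illustrative street-suffix table; a production system would use a postal reference file.
    private static final Map<String, String> SUFFIXES =
            Map.of("STREET", "ST", "AVENUE", "AVE", "ROAD", "RD", "BOULEVARD", "BLVD");

    // Deliberately simple e-mail syntax check; real validation rules are much stricter.
    private static final Pattern EMAIL = Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[A-Za-z]{2,}$");

    /** Trims, collapses internal whitespace and upper-cases a name. */
    static String standardizeName(String raw) {
        return raw.trim().replaceAll("\\s+", " ").toUpperCase(Locale.ROOT);
    }

    /** Upper-cases an address line, strips periods and commas, and abbreviates known street suffixes. */
    static String standardizeAddress(String raw) {
        String[] tokens = raw.toUpperCase(Locale.ROOT).replaceAll("[.,]", " ").trim().split("\\s+");
        StringBuilder out = new StringBuilder();
        for (String token : tokens) {
            if (out.length() > 0) out.append(' ');
            out.append(SUFFIXES.getOrDefault(token, token));
        }
        return out.toString();
    }

    /** Returns true when the value looks like a syntactically plausible e-mail address. */
    static boolean isValidEmail(String raw) {
        return raw != null && EMAIL.matcher(raw.trim()).matches();
    }

    public static void main(String[] args) {
        System.out.println(standardizeName("  jane   DOE "));      // JANE DOE
        System.out.println(standardizeAddress("12 Main Street.")); // 12 MAIN ST
        System.out.println(isValidEmail("jane.doe@example.com"));  // true
    }
}
```

Standardizing before matching matters because fuzzy matching and keying (slide 20) compare values; two spellings of the same address that differ only in case or suffix should not count as a mismatch.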
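Slide 20 names fuzzy matching, key creation and "maintain matches over time" as core MDM functions. A minimal sketch of how these can fit together, assuming a normalized Levenshtein similarity and an arbitrary 0.85 threshold: each incoming (already standardized) name either reuses the persistent key of its best match or receives a newly minted key. The class `MasterKeyRegistry` and the `MK-` key format are hypothetical, and this is not RedPoint's matching algorithm (Java 16 or later, for the nested record).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: assigns persistent master keys by fuzzy-matching incoming names. */
final class MasterKeyRegistry {
    private record Master(String key, String name) {}

    private final List<Master> masters = new ArrayList<>();
    private final double threshold; // minimum similarity (0..1) to treat two names as one entity
    private int nextId = 1;

    MasterKeyRegistry(double threshold) {
        this.threshold = threshold;
    }

    /** Classic dynamic-programming Levenshtein edit distance using two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]: one minus edit distance divided by the longer string's length. */
    static double similarity(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        return longer == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / longer;
    }

    /** Returns the key of the best match at or above the threshold, or mints and stores a new key. */
    String resolve(String standardizedName) {
        Master best = null;
        double bestScore = -1;
        for (Master m : masters) {
            double score = similarity(m.name(), standardizedName);
            if (score > bestScore) {
                best = m;
                bestScore = score;
            }
        }
        if (best != null && bestScore >= threshold) return best.key();
        String key = String.format("MK-%06d", nextId++);
        masters.add(new Master(key, standardizedName));
        return key;
    }

    public static void main(String[] args) {
        MasterKeyRegistry registry = new MasterKeyRegistry(0.85);
        System.out.println(registry.resolve("JONATHAN SMITH")); // MK-000001
        System.out.println(registry.resolve("JONATHON SMITH")); // MK-000001 (one edit apart)
        System.out.println(registry.resolve("MARIA GARCIA"));   // MK-000002
    }
}
```

The linear scan compares each record with every master, which does not scale; real matching engines typically group candidates first (for example by postcode or name prefix) and only fuzzy-match within a group, which is one reading of the "grouping" entry on slide 20.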
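Slide 23 shows "Key / Split Analysis" and "Partitioning" stages in RedPoint's YARN execution but does not say how records are assigned to partitions. As a generic illustration only, the sketch below uses the formula of Hadoop's default `HashPartitioner`, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, without any Hadoop dependency. RedPoint's own partitioning may work differently; the class `KeyPartitioning` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative key-based partitioning using the same formula as Hadoop's default HashPartitioner. */
final class KeyPartitioning {
    /** Maps a key to one of numPartitions buckets; masking the sign bit keeps the result non-negative. */
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Splits keys into partitions so every occurrence of a given key lands in the same partition. */
    static List<List<String>> split(List<String> keys, int numPartitions) {
        List<List<String>> parts = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) parts.add(new ArrayList<>());
        for (String key : keys) parts.get(partitionFor(key, numPartitions)).add(key);
        return parts;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("MK-000001", "MK-000002", "MK-000001", "MK-000003");
        List<List<String>> parts = split(keys, 2);
        for (int p = 0; p < parts.size(); p++) {
            System.out.println("partition " + p + ": " + parts.get(p));
        }
    }
}
```

The property that matters for MDM workloads is that all records sharing a key are processed by the same task, so grouping and matching on that key need no cross-task coordination.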
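The benchmark on slide 26 compares three ways of running the same Project Gutenberg word count. For reference, the single-machine Java sketch below computes what the Pig script computes: tokenize, upper-case, count, and order by occurrences descending. Tokenization here splits on any run of non-letter, non-digit characters, which is an assumption and matches neither Pig's `TOKENIZE` nor the MapReduce delimiter set exactly; the class `LocalWordCount` is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Single-machine sketch of the benchmark's word count: tokenize, upper-case, count, sort descending. */
final class LocalWordCount {
    static List<Map.Entry<String, Integer>> count(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            // Assumption: split on any run of characters that are not letters or digits.
            for (String token : line.split("[^\\p{L}\\p{N}]+")) {
                if (token.isEmpty()) continue; // a leading separator yields one empty token
                counts.merge(token.toUpperCase(Locale.ROOT), 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        // Order by occurrences descending, then alphabetically so ties are deterministic.
        sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("It was the best of times,", "it was the worst of times.");
        for (Map.Entry<String, Integer> e : count(lines)) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

On a cluster the same logic is split across mappers (emit each token with a count of 1) and reducers (sum per token), which is what the `MapClass` excerpt on slide 26 implements; the sort step corresponds to Pig's `ORDER E BY occurrences DESC`.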