Grab some coffee and enjoy
the pre-show banter before
the top of the hour!
Five Critical Success Factors for Big Data and Traditional BI

The Briefing Room
Welcome

Host:
Eric Kavanagh
eric.kavanagh@bloorgroup.com

Twitter Tag: #briefr

Mission

•  Reveal the essential characteristics of enterprise software, good and bad
•  Provide a forum for detailed analysis of today’s innovative technologies
•  Give vendors a chance to explain their product to savvy analysts
•  Allow audience members to pose serious questions... and get answers!

Topics

This Month: INNOVATORS
January: ANALYTICS
February: BIG DATA
2014 Editorial Calendar at

www.insideanalysis.com/webcasts/the-briefing-room

Data Discovery & Visualization

INNOVATORS
Analyst: Robin Bloor

Robin Bloor is Chief Analyst at The Bloor Group

robin.bloor@bloorgroup.com

VelociData
•  VelociData offers purpose-built big data operations appliances
•  Its solutions combine field-programmable gate arrays (FPGAs), graphics processing units (GPUs) and central processing units (CPUs) to enable high-speed parallelism
•  VelociData can improve data transformation and data quality performance by several orders of magnitude

Guests: Ron Indeck and Chris O’Malley
Ron Indeck is President,
CTO and Founder
of VelociData

Chris O’Malley is
CEO of VelociData

VelociData

Solving the Need for Speed in Big DataOps

The Bloor Group – December 10, 2013
Fall 2013

www.velocidata.com
info@velocidata.com
@velocidata
tel.: 314.785.0601
Dr. Ronald Indeck – Founder and President, VelociData
•  Founder and CTO, Exegy
•  Former Professor, Washington University
•  Das Family Distinguished Professor
•  Director, Center for Security Technologies
•  Former President, Institute of Electrical & Electronics Engineers (IEEE) Magnetics Society
•  Past Recipient, Bar Association Inventor of the Year
Five Critical Success Factors for Leveraging Data

1.  Don’t ignore data ingest and transformation
2.  Data Integration speed and cost really count
3.  Hadoop alone does not solve the problem
4.  VelociData eliminates data ingest bottlenecks
5.  Big Data project risks can be mitigated effectively

Why Data is Breaking the Seams of Conventional Options
Competitive advantage comes from seizing the opportunities presented in transient business moments. This is creating a crisis between the growth of data sources and the relentless quest for faster insights:
•  Volume: Data volume is growing exponentially, at 55% annually
•  Variety: Numerous new data sources must be harnessed
•  Velocity: Data moving at differing speeds (batch, streaming, archived) must be reconciled
These factors are compounded by Hadoop, which offers data management at ~80% less cost than conventional approaches and so justifies storing everything over longer periods of time. This is spawning business ideas for monetizing data, and the resulting use cases require massive acceleration of data operations that can handle the scale and complexity of the 3Vs.
Following conventional best practices no longer satisfies critical business applications.
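To put the 55% annual growth figure in perspective, the short sketch below works out the implied doubling time and the cumulative growth over five years. It assumes the rate stays constant and compounds once a year, which the slide does not state; the function names are illustrative only.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a constant, annually compounded growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def growth_factor(annual_growth: float, years: int) -> float:
    """Total multiplication of the starting volume after the given number of years."""
    return (1 + annual_growth) ** years


RATE = 0.55  # the 55% annual growth quoted on the slide

print(f"Doubling time: {doubling_time_years(RATE):.2f} years")   # about 1.58 years
print(f"Growth over 5 years: {growth_factor(RATE, 5):.1f}x")      # about 8.9x
```

Under these assumptions, a pipeline sized for today's volume would face roughly nine times the data within five years, which is the pressure on ingest and transformation that CSF #1 points to.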

CSF #1: Don’t ignore data ingest and transformation
What are Conventional Options for Accelerating DataOps?

Conventional options for improving data operations performance under the following requirements:
•  high volume (e.g., 10M+ row, densely populated tables)
•  high growth (e.g., >60% annually)
•  multiple varieties and sources (structured and unstructured)
•  high velocity (e.g., data available in less than an hour)

Options, compared on Performance, Scalability, Cost, and Complexity:
•  Add cores to existing ETL processes
•  Add MIPS to existing IBM mainframe data integration jobs
•  Push down optimization (ELT)
•  Hadoop (ELT)
•  Entirely new engineered system platform
CSF #2: Data integration speed and cost really count
CSF #3: Hadoop alone doesn’t solve the problem
VelociData Solution Palette
VelociData Suites, VelociData Solutions, and Examples, with throughput in records/second (Conventional; VelociData)

Data Transform
•  Lookup and Replace
   •  Data enrichment by populating fields from a master file (Conventional: <3000; VelociData: 600,000)
   •  Cardio Pulmonologist → CP (Conventional: 500; VelociData: 700,000)
•  Type Conversions
   •  XML → Fixed; Binary → Char (Conventional: 1000-2000; VelociData: 800,000)
•  Format Conversions
   •  2013-01-02 → 01/02/2013 (Conventional: 1000-3000; VelociData: 800,000)
   •  Rearrange, add, drop, or resize fields to change layouts (Conventional: 1000; VelociData: 650,000)
•  Surrogate Key Generation
   •  Hash multiple field values into a unique pseudo-key (Conventional: 3000; VelociData: > 1,000,000)
   •  Generate MD5 or SHA hash keys (Conventional: 3000; VelociData: > 1,000,000)
•  Data Masking
   •  Obfuscate data for non-production uses: Persistent or Dynamic; Format preserving encryption; AES-256 (Conventional: 500-1000; VelociData: > 1,000,000)

Data Quality
•  USPS Address Processing (CASS certification in process)
   •  Standardization, verification, and cleansing (Conventional: 600; VelociData: 400,000)
•  Domain Data Validation
   •  Validate a value based on a list of acceptable values (e.g., all states in the US; all countries in the world) (Conventional: 1000-3000; VelociData: 750,000)
•  Field Validation
   •  Validates based on patterns such as emails, dates, phone numbers, … (Conventional: 1000-3000; VelociData: > 1,000,000)
   •  Data type validation and bounds checking (Conventional: 3000; VelociData: > 1,000,000)

Data Platform Offload
•  Mainframe Data Offload
   •  Copybook parsing & data layout discovery; EBCDIC, COMP, COMP-3, … → ASCII, Integer, Float, … (Conventional: 200; VelociData: > 200,000)

Results are system dependent but data intended to provide magnitude comparison
CSF #4: VelociData eliminates data ingest bottlenecks
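As a software-only illustration of the Surrogate Key Generation entries in the solution palette (hashing multiple field values into a pseudo-key, and generating MD5 or SHA hash keys), here is a minimal Python sketch. It is not VelociData's implementation. The separator, the field values, and the function name are assumptions made for the example, and a hash-derived key is only unique with very high probability, not by guarantee.

```python
import hashlib

# Assumption: fields are joined with the ASCII unit separator so that
# ("ab", "c") and ("a", "bc") do not collapse into the same input string.
FIELD_SEPARATOR = "\x1f"


def surrogate_key(fields, algorithm="sha256"):
    """Hash several field values into one hex-encoded pseudo-key."""
    normalized = ["" if value is None else str(value) for value in fields]
    payload = FIELD_SEPARATOR.join(normalized).encode("utf-8")
    return hashlib.new(algorithm, payload).hexdigest()


# Hypothetical record fields, for illustration only.
record = ["ACME Corp", "63101", "2013-01-02"]
print(surrogate_key(record))                   # SHA-256 key, 64 hex characters
print(surrogate_key(record, algorithm="md5"))  # MD5 key, 32 hex characters
```

The same input fields always produce the same key, which is what lets a hashed key stand in for a multi-column natural key when joining or deduplicating records.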
	


The New World Data Challenges Being Solved
•  Credit card company reduces MIPS and improves performance to
integrate historical and fresh data into Hadoop analytics process by
processing 10 million records per minute
•  Financial processing network masks 5 million fields per second of
production data to sell opportunity information to retailers
•  Health benefits provider enables customer support by shortening a data integration process from 16 hours to 45 seconds
•  Property casualty company shortens a daily task of processing 450 million records from 5 hours to less than 1 hour
•  Retailer now processes XML data to integrate 360-degree customer data from in-store, online, and mobile sources in real time
CSF #5: Big Data project risks can be mitigated effectively
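The customer figures above can be converted to a common unit for comparison. The sketch below does only that arithmetic on the numbers quoted on the slide. The helper names are illustrative, and "less than 1 hour" is treated as exactly one hour, so the resulting rate and speedup are lower bounds.

```python
HOUR = 3600  # seconds


def records_per_second(records: float, seconds: float) -> float:
    """Average throughput for a job that handles `records` in `seconds`."""
    return records / seconds


def speedup(before_seconds: float, after_seconds: float) -> float:
    """How many times shorter the job became."""
    return before_seconds / after_seconds


# Credit card company: 10 million records per minute.
print(f"{records_per_second(10_000_000, 60):,.0f} records/s")       # about 166,667

# Health benefits provider: 16 hours down to 45 seconds.
print(f"{speedup(16 * HOUR, 45):,.0f}x faster")                      # 1,280x

# Property casualty company: 450 million records, 5 hours down to under 1 hour.
print(f"{records_per_second(450_000_000, 5 * HOUR):,.0f} records/s before")
print(f"{records_per_second(450_000_000, 1 * HOUR):,.0f} records/s after (at least)")
```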
VelociData: Continuous Innovation
•  3Q13
   •  Format Preserving Encryption and Data Masking
   •  Extensive Mainframe Data Conversion
   •  Extensive XML Processing
•  4Q13
   •  Expanded Hashing and Key Generation Options
   •  Additional Mainframe Record Types
   •  Scalable Deployment Management
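For readers unfamiliar with the mainframe data conversion named in the roadmap, the sketch below shows in plain Python the kind of work involved: decoding EBCDIC text and unpacking a COMP-3 (packed decimal) field. It is an illustration, not the appliance's method. It assumes EBCDIC code page 037 (real files may use another code page) and the common sign nibbles C or F for positive and D or B for negative; the function names are hypothetical.

```python
from decimal import Decimal


def ebcdic_to_text(data: bytes, codepage: str = "cp037") -> str:
    """Decode EBCDIC bytes to a Python string (code page 037 is an assumption)."""
    return data.decode(codepage)


def unpack_comp3(data: bytes, scale: int = 0) -> Decimal:
    """Unpack a COMP-3 field: two digits per byte, sign in the final nibble."""
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign_nibble = nibbles.pop()          # last nibble carries the sign
    if any(digit > 9 for digit in nibbles):
        raise ValueError("invalid packed-decimal digit")
    value = int("".join(str(digit) for digit in nibbles))
    if sign_nibble in (0x0D, 0x0B):      # D (and rarely B) mark negative values
        value = -value
    return Decimal(value).scaleb(-scale)  # apply the implied decimal places


print(ebcdic_to_text(bytes.fromhex("C8C5D3D3D6")))        # HELLO
print(unpack_comp3(bytes.fromhex("12345C")))              # 12345
print(unpack_comp3(bytes.fromhex("12345D"), scale=2))     # -123.45
```

In practice the field offsets, lengths, and scales come from the COBOL copybook, which is why the palette lists copybook parsing alongside the conversions themselves.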

Let’s Start the Conversation Now

For more information visit: http://velocidata.com
Helpful Resources:
Alternatives for Data Integration: http://velocidata.com/our-solution
Industry Analyst Research Reports: http://velocidata.com/resources
Data Ops – Meeting Big Data Organizational Challenges: http://velocidata.com/blog
Join us on social media:
Twitter: @VelociData
LinkedIn: http://www.linkedin.com/company/velocidata?trk=company_name
Google+: https://plus.google.com/112063174918659483670/posts
Phone: +1-314-785-0601
E-Mail: rindeck@VelociData.com / info@VelociData.com
We will send a follow-up email containing this presentation and links to contact us

Questions?

How We Achieve Orders of Magnitude in Acceleration
VelociData Big Data Operations Appliance
•  Purpose-built solutions that combine a mix of software, firmware, and massively parallel hardware to provide acceleration often approaching wire speeds
•  Heterogeneous compute environment that includes FPGAs, GPUs, and CPUs to offer a level of internal parallelism that can dramatically outperform software on general-purpose computers
•  Business Micro Supercomputer in a 4U rack form factor

Business Value for Most Architectures
Big Data Operations Appliance to Maximize Data Transformation Acceleration to Wire Speed

Data sources: CSV, XML, zOS Data, RDBMS, Social Media, Sensor, Hadoop

Wire Rate Transformations
•  Normalize
•  Encrypt/Mask
•  Cleanse
•  Enrich

Destinations:
•  Hadoop
•  ETL Server
•  Data Warehouse
•  Database Appliances
•  BI Tools
•  Downstream zOS Process
•  Cloud

Platform Processes Offloaded to VelociData
Wire-rate transformations – purpose-built for better price performance

•  Mainframe: Too expensive to keep adding mainframe MIPS?
•  Hadoop: Are self-service business analytics users frustrated with the time required to transform unstructured and legacy data into something useful for decision making?
•  MPP Platforms (Teradata, Netezza): Is using the MPP Platform for ELT and Push Down Optimization not an optimal use of resources?
•  ETL Server: ETL server having trouble keeping up with exploding data growth?

VelociData feeds Hadoop pre-processed, quality data for real-time BI efforts
Seamlessly offload to VelociData the heavy lifting ETL/ELT processes from Ab Initio, IBM, and Informatica

Common ETL Bottlenecks
Extract (sources): CSV, Mainframe, XML, RDBMS, Social Media, Sensor, Hadoop
Transform (ETL Server; the diagram marks some steps as Candidates for Acceleration):
•  Lookup & replace
•  Field validation: datatype validation
•  Field validation: bounds checking
•  Aggregation
•  USPS address standardization
•  Business rules
•  Entity resolution
•  Exception / error handling
Load: Staging DB, Primary RDBMS
ETL Processes Offloaded to VelociData
•  Keep Existing Input Interfaces
•  Accelerate Bottlenecks at Wire Speed
•  Reduce ETL Server Workload
•  Faster Total Processing Time

Extract (sources): CSV, Mainframe, XML, RDBMS, Social Media, Sensor, Hadoop
Transform (ETL Server, with bottleneck steps offloaded to VelociData):
•  Lookup & replace
•  Field validation: datatype validation
•  Field validation: bounds checking
•  USPS address standardization
•  Aggregation
•  Business rules
•  Entity resolution
•  Exception / error handling
Load: Staging DB, Primary RDBMS

Perceptions & Questions

Analyst:
Robin Bloor

Technology Evolution (Bloor Curve)
Disruption on Disruption
u  We

are no longer certain
that the pattern still holds
u  We used to encounter new
technologies that were 10x
because of Moore’s Law
u  Now we encounter new
technologies that are 100x
or even 1000x
u  This is not because of
Moore’s Law but because of
parallelism
Parallelism Will Become the Norm

u  This is not just about
software
u  It is also about hardware
architectures
u  But it affects all software
u  Eventually everything will
execute in parallel
u  Everything will go much
faster
CPUs, GPUs and FPGAs

u  CPUs, GPUs and FPGAs are
commodities
u  They can be harnessed to
deliver extreme
parallelism on a single
server
u  The use of such chips can
deliver acceleration above
100x for some applications
The Memory Cascade
u  On chip speed v RAM
•  L1(32K) = 100x
•  L2(246K) = 30x
•  L3(8-20Mb) = 8.6x
u  RAM v SSD
•  RAM = 300x
u  SSD v Disk
•  SSD = 10x
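The ratios above are each stated against the next slower tier, so it can help to chain them into one scale. The sketch below multiplies them through to express every tier relative to disk. It assumes the slide's multipliers compose by simple multiplication and that the L1, L2, and L3 figures are all relative to RAM, as the first heading suggests; real access times vary by hardware and workload.

```python
# Multipliers as quoted on the slide.
ON_CHIP_VS_RAM = {"L1": 100, "L2": 30, "L3": 8.6}
RAM_VS_SSD = 300
SSD_VS_DISK = 10


def speed_vs_disk() -> dict:
    """Express each tier's speed as a multiple of disk speed."""
    ssd = SSD_VS_DISK
    ram = RAM_VS_SSD * ssd
    tiers = {"Disk": 1, "SSD": ssd, "RAM": ram}
    for name, factor in ON_CHIP_VS_RAM.items():
        tiers[name] = factor * ram
    return tiers


for name, factor in sorted(speed_vs_disk().items(), key=lambda item: item[1]):
    print(f"{name:>4}: {factor:>9,.0f}x disk")
```

On these figures RAM comes out 3,000 times faster than disk and L1 cache 300,000 times faster, which is the gap that keeping data in memory and on chip is meant to exploit.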
Going Forward

The old limitations are no longer SO LIMITING

•  Can one VelociData Appliance serve many applications?
•  What of data cleansing functionality (e.g., cleansing rules, deduplication, etc.)?
•  Please explain wire-speed in a little more detail.
•  How long does it take to implement and what is the process? Please describe.
•  With Hadoop, what are the possibilities?
•  What does the roadmap look like?
Upcoming Topics

This Month: INNOVATORS
January: ANALYTICS
February: BIG DATA
2014 Editorial Calendar at

www.insideanalysis.com/webcasts/the-briefing-room

www.insideanalysis.com

Thank You
for Your
Attention

Twitter Tag: #briefr

The Briefing Room

More Related Content

What's hot

Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013
Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013
Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013
Publicis Sapient Engineering
 
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
DataWorks Summit
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by Databricks
Caserta
 
Optimizing the
 Data Supply Chain
 for Data Science
Optimizing the
 Data Supply Chain
 for Data ScienceOptimizing the
 Data Supply Chain
 for Data Science
Optimizing the
 Data Supply Chain
 for Data Science
Vital.AI
 
Lightning Talk: Get Even More Value from MongoDB Applications
Lightning Talk: Get Even More Value from MongoDB ApplicationsLightning Talk: Get Even More Value from MongoDB Applications
Lightning Talk: Get Even More Value from MongoDB Applications
MongoDB
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Innovative Management Services
 

What's hot (20)

Going Beyond Rows and Columns with Graph Analytics
Going Beyond Rows and Columns with Graph AnalyticsGoing Beyond Rows and Columns with Graph Analytics
Going Beyond Rows and Columns with Graph Analytics
 
Integrating Relational Databases with the Semantic Web: A Reflection
Integrating Relational Databases with the Semantic Web: A ReflectionIntegrating Relational Databases with the Semantic Web: A Reflection
Integrating Relational Databases with the Semantic Web: A Reflection
 
What's New in Syncsort's Trillium Line of Data Quality Software - TSS Enterpr...
What's New in Syncsort's Trillium Line of Data Quality Software - TSS Enterpr...What's New in Syncsort's Trillium Line of Data Quality Software - TSS Enterpr...
What's New in Syncsort's Trillium Line of Data Quality Software - TSS Enterpr...
 
Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013
Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013
Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013
 
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
 
Big Data, NoSQL with MongoDB and Cassasdra
Big Data, NoSQL with MongoDB and CassasdraBig Data, NoSQL with MongoDB and Cassasdra
Big Data, NoSQL with MongoDB and Cassasdra
 
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
 
BDaas- BigData as a service
BDaas- BigData as a service  BDaas- BigData as a service
BDaas- BigData as a service
 
Key Considerations for Putting Hadoop in Production SlideShare
Key Considerations for Putting Hadoop in Production SlideShareKey Considerations for Putting Hadoop in Production SlideShare
Key Considerations for Putting Hadoop in Production SlideShare
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by Databricks
 
Optimizing the
 Data Supply Chain
 for Data Science
Optimizing the
 Data Supply Chain
 for Data ScienceOptimizing the
 Data Supply Chain
 for Data Science
Optimizing the
 Data Supply Chain
 for Data Science
 
Lightning Talk: Get Even More Value from MongoDB Applications
Lightning Talk: Get Even More Value from MongoDB ApplicationsLightning Talk: Get Even More Value from MongoDB Applications
Lightning Talk: Get Even More Value from MongoDB Applications
 
Sustainability Investment Research Using Cognitive Analytics
Sustainability Investment Research Using Cognitive AnalyticsSustainability Investment Research Using Cognitive Analytics
Sustainability Investment Research Using Cognitive Analytics
 
Solution architecture for big data projects
Solution architecture for big data projectsSolution architecture for big data projects
Solution architecture for big data projects
 
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
 
How to build a successful Data Lake
How to build a successful Data LakeHow to build a successful Data Lake
How to build a successful Data Lake
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
 
Organising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data WorldOrganising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data World
 

Similar to Five Critical Success Factors for Big Data and Traditional BI

Customer Intelligence_ Harnessing Elephants at Transamerica Presentation (1)
Customer Intelligence_ Harnessing Elephants at Transamerica    Presentation (1)Customer Intelligence_ Harnessing Elephants at Transamerica    Presentation (1)
Customer Intelligence_ Harnessing Elephants at Transamerica Presentation (1)
Vishal Bamba
 

Similar to Five Critical Success Factors for Big Data and Traditional BI (20)

Customer Intelligence_ Harnessing Elephants at Transamerica Presentation (1)
Customer Intelligence_ Harnessing Elephants at Transamerica    Presentation (1)Customer Intelligence_ Harnessing Elephants at Transamerica    Presentation (1)
Customer Intelligence_ Harnessing Elephants at Transamerica Presentation (1)
 
Qo Introduction V2
Qo Introduction V2Qo Introduction V2
Qo Introduction V2
 
The Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopThe Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of Hadoop
 
Filling the Data Lake
Filling the Data LakeFilling the Data Lake
Filling the Data Lake
 
J1 - Keynote Data Platform - Rohan Kumar
J1 - Keynote Data Platform - Rohan KumarJ1 - Keynote Data Platform - Rohan Kumar
J1 - Keynote Data Platform - Rohan Kumar
 
Data Treatment MongoDB
Data Treatment MongoDBData Treatment MongoDB
Data Treatment MongoDB
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
KNIME Meetup 2016-04-16
KNIME Meetup 2016-04-16KNIME Meetup 2016-04-16
KNIME Meetup 2016-04-16
 
How Businesses use Big Data to Impact the Bottom Line
How Businesses use Big Data to Impact the Bottom LineHow Businesses use Big Data to Impact the Bottom Line
How Businesses use Big Data to Impact the Bottom Line
 
Big Data and Business Insight
Big Data and Business InsightBig Data and Business Insight
Big Data and Business Insight
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantage
 
Track B-1 建構新世代的智慧數據平台
Track B-1 建構新世代的智慧數據平台Track B-1 建構新世代的智慧數據平台
Track B-1 建構新世代的智慧數據平台
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
 
The Double win business transformation and in-year ROI and TCO reduction
The Double win business transformation and in-year ROI and TCO reductionThe Double win business transformation and in-year ROI and TCO reduction
The Double win business transformation and in-year ROI and TCO reduction
 
Big Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseBig Data's Impact on the Enterprise
Big Data's Impact on the Enterprise
 
Achieving Business Value by Fusing Hadoop and Corporate Data
Achieving Business Value by Fusing Hadoop and Corporate DataAchieving Business Value by Fusing Hadoop and Corporate Data
Achieving Business Value by Fusing Hadoop and Corporate Data
 
Customer value analysis of big data products
Customer value analysis of big data productsCustomer value analysis of big data products
Customer value analysis of big data products
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017
 
There are 250 Database products, are you running the right one?
There are 250 Database products, are you running the right one?There are 250 Database products, are you running the right one?
There are 250 Database products, are you running the right one?
 

More from Inside Analysis

Rethinking Data Availability and Governance in a Mobile World
Rethinking Data Availability and Governance in a Mobile WorldRethinking Data Availability and Governance in a Mobile World
Rethinking Data Availability and Governance in a Mobile World
Inside Analysis
 

More from Inside Analysis (20)

An Ounce of Prevention: Forging Healthy BI
An Ounce of Prevention: Forging Healthy BIAn Ounce of Prevention: Forging Healthy BI
An Ounce of Prevention: Forging Healthy BI
 
Agile, Automated, Aware: How to Model for Success
Agile, Automated, Aware: How to Model for SuccessAgile, Automated, Aware: How to Model for Success
Agile, Automated, Aware: How to Model for Success
 
First in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter IntegrationFirst in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter Integration
 
Fit For Purpose: Preventing a Big Data Letdown
Fit For Purpose: Preventing a Big Data LetdownFit For Purpose: Preventing a Big Data Letdown
Fit For Purpose: Preventing a Big Data Letdown
 
To Serve and Protect: Making Sense of Hadoop Security
To Serve and Protect: Making Sense of Hadoop Security To Serve and Protect: Making Sense of Hadoop Security
To Serve and Protect: Making Sense of Hadoop Security
 
The Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On TimeThe Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On Time
 
Introducing: A Complete Algebra of Data
Introducing: A Complete Algebra of DataIntroducing: A Complete Algebra of Data
Introducing: A Complete Algebra of Data
 
The Role of Data Wrangling in Driving Hadoop Adoption
The Role of Data Wrangling in Driving Hadoop AdoptionThe Role of Data Wrangling in Driving Hadoop Adoption
The Role of Data Wrangling in Driving Hadoop Adoption
 
Ahead of the Stream: How to Future-Proof Real-Time Analytics
Ahead of the Stream: How to Future-Proof Real-Time AnalyticsAhead of the Stream: How to Future-Proof Real-Time Analytics
Ahead of the Stream: How to Future-Proof Real-Time Analytics
 
All Together Now: Connected Analytics for the Internet of Everything
All Together Now: Connected Analytics for the Internet of EverythingAll Together Now: Connected Analytics for the Internet of Everything
All Together Now: Connected Analytics for the Internet of Everything
 
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETLGoodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
 
The Biggest Picture: Situational Awareness on a Global Level
The Biggest Picture: Situational Awareness on a Global LevelThe Biggest Picture: Situational Awareness on a Global Level
The Biggest Picture: Situational Awareness on a Global Level
 
Structurally Sound: How to Tame Your Architecture
Structurally Sound: How to Tame Your ArchitectureStructurally Sound: How to Tame Your Architecture
Structurally Sound: How to Tame Your Architecture
 
SQL In Hadoop: Big Data Innovation Without the Risk
SQL In Hadoop: Big Data Innovation Without the RiskSQL In Hadoop: Big Data Innovation Without the Risk
SQL In Hadoop: Big Data Innovation Without the Risk
 
The Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big DataThe Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big Data
 
A Revolutionary Approach to Modernizing the Data Warehouse
A Revolutionary Approach to Modernizing the Data WarehouseA Revolutionary Approach to Modernizing the Data Warehouse
A Revolutionary Approach to Modernizing the Data Warehouse
 
Rethinking Data Availability and Governance in a Mobile World
Rethinking Data Availability and Governance in a Mobile WorldRethinking Data Availability and Governance in a Mobile World
Rethinking Data Availability and Governance in a Mobile World
 
DisrupTech - Dave Duggal
DisrupTech - Dave DuggalDisrupTech - Dave Duggal
DisrupTech - Dave Duggal
 
Modus Operandi
Modus OperandiModus Operandi
Modus Operandi
 
Phasic Systems - Dr. Geoffrey Malafsky
Phasic Systems - Dr. Geoffrey MalafskyPhasic Systems - Dr. Geoffrey Malafsky
Phasic Systems - Dr. Geoffrey Malafsky
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Five Critical Success Factors for Big Data and Traditional BI

  • 1. Grab some coffee and enjoy the pre-show banter before the top of the hour!
  • 2. Five Critical Success Factors for Big Data and Traditional BI The Briefing Room
  • 4. Mission •  Reveal the essential characteristics of enterprise software, good and bad •  Provide a forum for detailed analysis of today's innovative technologies •  Give vendors a chance to explain their product to savvy analysts •  Allow audience members to pose serious questions... and get answers! Twitter Tag: #briefr The Briefing Room
  • 5. Topics •  This Month: INNOVATORS •  January: ANALYTICS •  February: BIG DATA •  2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room
  • 6. Data Discovery & Visualization INNOVATORS
  • 7. Analyst: Robin Bloor. Robin Bloor is Chief Analyst at The Bloor Group (robin.bloor@bloorgroup.com)
  • 8. VelociData •  VelociData offers purpose-built big data operations appliances •  Its solutions combine field-programmable gate arrays (FPGAs), graphics processing units (GPUs) and central processing units (CPUs) to enable high-speed parallelism •  VelociData can improve data transformation and data quality performance by several orders of magnitude
  • 9. Guests: Ron Indeck and Chris O’Malley •  Ron Indeck is President, CTO and Founder of VelociData •  Chris O’Malley is CEO of VelociData
  • 10. VelociData: Solving the Need for Speed in Big DataOps. The Bloor Group – December 10, 2013. Fall 2013. www.velocidata.com @velocidata tel.: 314.785.0601 info@velocidata.com
  • 11. Dr. Ronald Indeck – Founder and President, VelociData •  Founder and CTO, Exegy •  Former Professor, Washington University •  Das Family Distinguished Professor •  Director, Center for Security Technologies •  Former President, Institute of Electrical & Electronics Engineers (IEEE) Magnetics Society •  Past Recipient Bar Association Inventor of the Year
  • 12. Five Critical Success Factors for Leveraging Data 1.  Don’t ignore data ingest and transformation 2.  Data Integration speed and cost really count 3.  Hadoop alone does not solve the problem 4.  VelociData eliminates data ingest bottlenecks 5.  Big Data project risks can be mitigated effectively
  • 13. Why Data is Breaking the Seams of Conventional Options. Competitive advantage is achieved by seizing the opportunity presented in transient business moments; this is creating a crisis between the growth of data sources and the relentless quest for faster insights. •  Volume: Data volume is growing exponentially, at 55% annually •  Variety: Must harness numerous new data sources •  Velocity: Must reconcile data moving at differing speeds: batch, streaming, archived. These factors are compounded by Hadoop, which offers data management at ~80% less cost than conventional approaches and so justifies storing everything over longer periods of time. This is spawning business ideas for monetizing data, creating use cases that require massive acceleration of data operations able to handle the scale and complexity of the 3Vs. Following conventional best practices no longer satisfies critical business applications. CSF #1: Don’t ignore data ingest and transformation
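To put the 55% annual growth figure from slide 13 in perspective, the short calculation below works out what it implies if the rate holds steady. This is only a back-of-the-envelope sketch: the constant-rate assumption and the function names are illustrative and are not from the presentation.

```python
import math

ANNUAL_GROWTH = 0.55  # "55% annually", as stated on the slide


def growth_factor(years: float) -> float:
    """Multiple of today's volume after `years`, assuming the rate stays constant."""
    return (1 + ANNUAL_GROWTH) ** years


def doubling_time_years() -> float:
    """Years needed for the volume to double at the stated rate."""
    return math.log(2) / math.log(1 + ANNUAL_GROWTH)


print(f"Doubling time: {doubling_time_years():.2f} years")
print(f"Volume after 5 years: {growth_factor(5):.1f}x today's")
```

Under that assumption, data volume doubles roughly every 1.6 years and is close to 9x its current size after five years.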
  • 14. What are Conventional Options for Accelerating DataOps? Conventional options for improving data operations performance under the following requirements: •  high volume (e.g., 10M+ row, densely populated tables) •  high growth (e.g., >60% annually) •  multiple varieties and sources (structured and unstructured) •  high velocity (e.g., data available in less than an hour). Options: •  Add cores to existing ETL processes •  Add MIPS to existing IBM mainframe data integration jobs •  Push down optimization (ELT) •  Hadoop (ELT) •  Entirely new engineered system platform. Column headings: Complexity, Cost, Scalability, Performance. CSF #2: Data integration speed and cost really count. CSF #3: Hadoop alone doesn’t solve the problem
  • 15. VelociData Solution Palette. Throughput in records/second, Conventional vs. VelociData, by suite and solution:
      Data Transform
      •  Lookup and Replace: data enrichment by populating fields from a master file (<3000 vs. 600,000); Cardio Pulmonologist → CP (500 vs. 700,000)
      •  Type Conversions: XML → Fixed; Binary → Char (1000-2000 vs. 800,000)
      •  Format Conversions: 2013-01-02 → 01/02/2013 (1000-3000 vs. 800,000); rearrange, add, drop, or resize fields to change layouts (1000 vs. 650,000)
      •  Surrogate Key Generation: hash multiple field values into a unique pseudo-key (3000 vs. >1,000,000); generate MD5 or SHA hash keys (3000 vs. >1,000,000)
      •  Data Masking: obfuscate data for non-production uses: persistent or dynamic; format preserving encryption; AES-256 (500-1000 vs. >1,000,000)
      Data Quality
      •  USPS Address Processing (CASS certification in process): standardization, verification, and cleansing (600 vs. 400,000)
      •  Domain Data Validation: validate a value based on a list of acceptable values, e.g., all states in the US; all countries in the world (1000-3000 vs. 750,000)
      •  Field Validation: validate based on patterns such as emails, dates, phone numbers, … (1000-3000 vs. >1,000,000); data type validation and bounds checking (3000 vs. >1,000,000)
      Data Platform Offload
      •  Mainframe Data Offload: copybook parsing & data layout discovery; EBCDIC, COMP, COMP-3, … → ASCII, Integer, Float, … (200 vs. >200,000)
      Results are system dependent; the data is intended to provide a magnitude comparison. CSF #4: VelociData eliminates data ingest bottlenecks
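A few of the operations named on slide 15 can be sketched in plain Python to make them concrete. This only illustrates what each operation does to a record; it is not VelociData's implementation, which the deck says runs on FPGAs, GPUs, and CPUs. The function names, the sample lookup table, the partial state list, and the choice of SHA-256 with a unit-separator delimiter are all assumptions made for the example.

```python
import hashlib
from datetime import datetime

# Hypothetical lookup table (slide example: Cardio Pulmonologist -> CP).
SPECIALTY_CODES = {"Cardio Pulmonologist": "CP"}

# Hypothetical, deliberately partial list of acceptable values.
US_STATES = {"MO", "IL", "NY", "CA"}


def lookup_and_replace(value: str) -> str:
    """Lookup and replace: swap a value for its code, or pass it through unchanged."""
    return SPECIALTY_CODES.get(value, value)


def convert_date(iso_date: str) -> str:
    """Format conversion from the slide: 2013-01-02 -> 01/02/2013."""
    return datetime.strptime(iso_date, "%Y-%m-%d").strftime("%m/%d/%Y")


def surrogate_key(*fields: str) -> str:
    """Surrogate key generation: hash several field values into one pseudo-key.

    The unit-separator character (assumed absent from the data) keeps
    ("ab", "c") and ("a", "bc") from producing the same key.
    """
    joined = "\x1f".join(fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def is_valid_state(code: str) -> bool:
    """Domain data validation: accept only values on the list."""
    return code in US_STATES


record = {"specialty": "Cardio Pulmonologist", "visit": "2013-01-02", "state": "MO"}
print(lookup_and_replace(record["specialty"]))
print(convert_date(record["visit"]))
print(surrogate_key(record["specialty"], record["visit"]))
print(is_valid_state(record["state"]))
```

Each function handles one record at a time, which is roughly the per-record work behind the "conventional" column; the slide's claim is that the appliance performs the same kinds of operations in parallel at far higher rates.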
  • 16. The New World: Data Challenges Being Solved •  Credit card company reduces MIPS and improves performance, integrating historical and fresh data into a Hadoop analytics process by processing 10 million records per minute •  Financial processing network masks 5 million fields per second of production data to sell opportunity information to retailers •  Health benefits provider enables customer support by shortening a data integration process from 16 hours to 45 seconds •  Property casualty company shortens a daily task of processing 450 million records from 5 hours to less than 1 hour •  Retailer now processes XML data to integrate 360-degree customer data from in-store, on-line, and mobile sources in real time. CSF #5: Big Data project risks can be mitigated effectively
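The customer figures on slide 16 are quoted in mixed units (records per minute, hours, seconds). The sketch below converts them to a common basis so they can be compared with the records/second figures on slide 15. It uses only numbers stated on the slide; the helper and variable names are illustrative.

```python
HOUR = 3600  # seconds


def records_per_second(records: float, seconds: float) -> float:
    """Average throughput over the stated time window."""
    return records / seconds


# Credit card company: 10 million records per minute.
credit_card_rate = records_per_second(10_000_000, 60)

# Property casualty company: 450 million records, 5 hours before, under 1 hour after.
before_rate = records_per_second(450_000_000, 5 * HOUR)
after_rate_floor = records_per_second(450_000_000, 1 * HOUR)  # lower bound ("less than 1 hour")

# Health benefits provider: 16 hours down to 45 seconds.
health_speedup = (16 * HOUR) / 45

print(f"Credit card:       {credit_card_rate:,.0f} records/s")
print(f"Property casualty: {before_rate:,.0f} -> at least {after_rate_floor:,.0f} records/s")
print(f"Health benefits:   {health_speedup:,.0f}x faster")
```

On these numbers the credit card workload runs at about 167,000 records/second, the property casualty job goes from 25,000 to at least 125,000 records/second, and the health benefits process is 1,280 times faster.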
  • 17. VelociData: Continuous Innovation •  3Q13: Format Preserving Encryption and Data Masking; Extensive Mainframe Data Conversion; Extensive XML Processing •  4Q13: Expanded Hashing and Key Generation Options; Additional Mainframe Record Types; Scalable Deployment Management
  • 18. Let’s Start the Conversation Now. For more information visit: http://velocidata.com Helpful Resources: •  Alternatives for Data Integration: http://velocidata.com/our-solution •  Industry Analyst Research Reports: http://velocidata.com/resources •  Data Ops – Meeting Big Data Organizational Challenges: http://velocidata.com/blog Join us on social media: •  Twitter: @VelociData •  LinkedIn: http://www.linkedin.com/company/velocidata?trk=company_name •  Google+: https://plus.google.com/112063174918659483670/posts Phone: +1-314-785-0601 E-Mail: rindeck@VelociData.com / info@VelociData.com We will send a follow-up email containing this presentation and links to contact us
  • 20. How We Achieve Orders of Magnitude in Acceleration: VelociData Big Data Operations Appliance •  Purpose-built solutions that combine a mix of software, firmware, and massively parallel hardware to provide acceleration often approaching wire speeds •  Heterogeneous compute environment that includes FPGAs, GPUs, and CPUs to offer a level of internal parallelism that can dramatically outperform software on general-purpose computers •  Business Micro Supercomputer in a 4U rack form factor
  • 21. Business Value for Most Architectures: Big Data Operations Appliance to Maximize Data Transformation Acceleration to Wire Speed. [Diagram] Inputs shown: CSV, XML, zOS Data, RDBMS, Social Media, Sensor, Hadoop. Wire Rate Transformations: •  Normalize •  Encrypt/Mask •  Cleanse •  Enrich. Outputs shown: •  Hadoop •  ETL Server •  Data Warehouse •  Database Appliances •  BI Tools •  Downstream zOS Process •  Cloud
  • 22. Platform Processes Offloaded to VelociData. Wire-rate transformations, purpose-built for better price performance. •  Mainframe: Too expensive to keep adding mainframe MIPS? •  Hadoop: Are self-service business analytics users frustrated with the time required to transform unstructured and legacy data into something useful for decision making? VelociData feeds Hadoop pre-processed, quality data for real-time BI efforts •  MPP Platforms (Teradata, Netezza): Is using the MPP Platform for ELT and Push Down Optimization not an optimal use of resources? •  ETL Server: ETL server having trouble keeping up with exploding data growth? Seamlessly offload to VelociData the heavy lifting ETL/ELT processes from Ab Initio, IBM, and Informatica
  • 23. Common ETL Bottlenecks. [Diagram: Extract, Transform (ETL Server), Load] Data stores shown: CSV, Mainframe, XML, RDBMS, Social Media, Sensor, Hadoop, Primary RDBMS, Staging DB. ETL Server steps shown: Lookup & replace; Field validation: datatype validation; Field validation: bounds checking; Aggregation; USPS address standardization; Business rules; Entity resolution; Exception / error handling. Callout: Candidates for Acceleration
  • 24. ETL Processes Offloaded to VelociData. [Diagram: Extract, Transform, Load] Callouts: Keep Existing Input Interfaces; Accelerate Bottlenecks at Wire Speed; Reduce ETL Server Workload; Faster Total Processing Time. Data stores shown: CSV, Mainframe, XML, RDBMS, Social Media, Sensor, Hadoop, Primary RDBMS, Staging DB. Steps shown: Lookup & replace; Aggregation; Field validation: datatype validation; Business rules; Entity resolution; Field validation: bounds checking; USPS address standardization; Exception / error handling
  • 25. Perceptions & Questions. Analyst: Robin Bloor
  • 28. Disruption on Disruption •  We are no longer certain that the pattern still holds •  We used to encounter new technologies that were 10x because of Moore’s Law •  Now we encounter new technologies that are 100x or even 1000x •  This is not because of Moore’s Law but because of parallelism
  • 29. Parallelism Will Become the Norm •  This is not just about software •  It is also about hardware architectures •  But it affects all software •  Eventually everything will execute in parallel •  Everything will go much faster
  • 30. CPUs, GPUs and FPGAs •  CPUs, GPUs and FPGAs are commodities •  They can be harnessed to deliver extreme parallelism on a single server •  The use of such chips can deliver acceleration above 100x for some applications
  • 31. The Memory Cascade •  On-chip speed vs. RAM: L1 (32K) = 100x; L2 (256K) = 30x; L3 (8-20 MB) = 8.6x •  RAM vs. SSD: RAM = 300x •  SSD vs. Disk: SSD = 10x
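The ratios on slide 31 are each given relative to the next slower tier. Assuming they compound multiplicatively (an interpretation, since the slide does not say so explicitly), the end-to-end gaps can be worked out as follows. The dictionary layout and function name are illustrative.

```python
# Speed-up ratios as stated on the slide: (faster tier, slower tier) -> ratio.
RATIOS = {
    ("L1", "RAM"): 100,
    ("L2", "RAM"): 30,
    ("L3", "RAM"): 8.6,
    ("RAM", "SSD"): 300,
    ("SSD", "Disk"): 10,
}


def compound(*hops: tuple[str, str]) -> float:
    """Multiply the ratios along a chain of hops (assumption: they compound)."""
    result = 1.0
    for hop in hops:
        result *= RATIOS[hop]
    return result


ram_vs_disk = compound(("RAM", "SSD"), ("SSD", "Disk"))
l1_vs_disk = compound(("L1", "RAM"), ("RAM", "SSD"), ("SSD", "Disk"))

print(f"RAM vs. disk: {ram_vs_disk:,.0f}x")
print(f"L1 vs. disk:  {l1_vs_disk:,.0f}x")
```

On that reading, RAM is about 3,000x faster than disk and L1 cache about 300,000x faster, which is the scale of gap behind the slide's point that keeping data close to the processor matters.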
  • 32. Going Forward The old limitations are no longer SO LIMITING
  • 33. •  Can one VelociData Appliance serve many applications? •  What of data cleansing functionality (e.g., cleansing rules, deduplication, etc.)? •  Please explain wire-speed in a little more detail.
  • 34. •  How long does it take to implement and what is the process? Please describe. •  With Hadoop, what are the possibilities? •  What does the roadmap look like?
  • 36. Upcoming Topics •  This Month: INNOVATORS •  January: ANALYTICS •  February: BIG DATA •  2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room •  www.insideanalysis.com
  • 37. Thank You for Your Attention