VelociData offers big data operations appliances that combine FPGAs, GPUs, and CPUs to enable high-speed parallel data processing. This allows VelociData to accelerate common data transformation and data quality tasks by several orders of magnitude compared with conventional approaches. Examples include accelerating lookups from 3,000 to 600,000 records per second. VelociData appliances can offload bottlenecks from ETL servers, mainframes, Hadoop, and data warehouses to improve overall performance. The presentation demonstrated how VelociData solutions provide 100x or greater acceleration through massively parallel hardware architectures.
4. Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today's innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
Twitter Tag: #briefr
The Briefing Room
5. Topics
This Month: INNOVATORS
January: ANALYTICS
February: BIG DATA
2014 Editorial Calendar at
www.insideanalysis.com/webcasts/the-briefing-room
6. Data Discovery & Visualization
INNOVATORS
7. Analyst: Robin Bloor
Robin Bloor is Chief Analyst at The Bloor Group
robin.bloor@bloorgroup.com
8. VelociData
• VelociData offers purpose-built big data operations appliances
• Its solutions combine field-programmable gate arrays (FPGAs), graphics processing units (GPUs) and central processing units (CPUs) to enable high-speed parallelism
• VelociData can improve data transformation and data quality performance by several orders of magnitude
9. Guests: Ron Indeck and Chris O’Malley
Ron Indeck is President, CTO and Founder of VelociData
Chris O’Malley is CEO of VelociData
10. VelociData
Solving the Need for Speed in Big DataOps
The Bloor Group – December 10, 2013
Fall 2013
www.velocidata.com
@velocidata
tel.: 314.785.0601
info@velocidata.com
11. Dr. Ronald Indeck – Founder and President, VelociData
• Founder and CTO, Exegy
• Former Professor, Washington University
• Das Family Distinguished Professor
• Director, Center for Security Technologies
• Former President, Institute of Electrical & Electronics
Engineers (IEEE) Magnetics Society
• Past Recipient, Bar Association Inventor of the Year
12. Five Critical Success Factors for Leveraging Data
1. Don’t ignore data ingest and transformation
2. Data Integration speed and cost really count
3. Hadoop alone does not solve the problem
4. VelociData eliminates data ingest bottlenecks
5. Big Data project risks can be mitigated effectively
www.velocidata.com
info@velocidata.com
13. Why Data is Breaking the Seams of Conventional Options
Competitive advantage is achieved by seizing the opportunities presented in transient business moments; this creates a crisis between the growth of data sources and the relentless quest for faster insights.
• Volume: Data volume growing exponentially at 55% annually
• Variety: Must harness numerous new data sources
• Velocity: Reconcile data moving at differing speeds: batch, streaming, archived
These factors are compounded by Hadoop, which offers data management at roughly 80% less cost than conventional approaches, justifying storage of everything over longer periods of time. This is spawning business ideas for monetizing data, creating use cases that require massive acceleration of data operations at the scale and complexity of the 3Vs.
Following conventional best practices no longer satisfies critical business applications.
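The 55% growth claim implies a concrete doubling time; a quick back-of-the-envelope check (only the 55% rate comes from the slide; the 5-year horizon is illustrative):

```python
import math

# Annual data growth rate cited on the slide.
growth_rate = 0.55

# Years for data volume to double at this rate:
# (1 + r)^t = 2  =>  t = ln(2) / ln(1 + r)
doubling_years = math.log(2) / math.log(1 + growth_rate)
print(f"Doubling time: {doubling_years:.2f} years")  # ≈ 1.58 years

# Volume multiplier after 5 years of compounding.
multiplier_5y = (1 + growth_rate) ** 5
print(f"5-year growth multiplier: {multiplier_5y:.1f}x")  # ≈ 8.9x
```

At 55% annual growth, data volume roughly doubles every year and a half and grows almost ninefold in five years, which is the pressure the slide describes.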
CSF #1: Don’t ignore data ingest and transformation
14. What are Conventional Options for Accelerating DataOps?
Conventional options for improving data operations performance — judged on performance, scalability, complexity, and cost — under the following requirements:
• high volume (e.g., 10M+ row, densely populated tables)
• high growth (e.g., >60% annually)
• multiple varieties and sources (structured and unstructured)
• high velocity (e.g., data available in less than an hour)
The options:
• Add cores to existing ETL processes
• Add MIPS to existing IBM mainframe data integration jobs
• Push down optimization (ELT)
• Hadoop (ELT)
• Entirely new engineered system platform
CSF #2: Data integration speed and cost really count
CSF #3: Hadoop alone doesn’t solve the problem
15. VelociData Solution Palette
VelociData suites and solutions, with example throughput (conventional → VelociData, records/second):

Data Transform
• Lookup and Replace – Data enrichment by populating fields from a master file: <3,000 → 600,000
• Lookup and Replace – Abbreviation substitution (e.g., Cardio Pulmonologist → CP): 500 → 700,000
• Format Conversions – XML → Fixed; Binary → Char: 1,000-2,000 → 800,000
• Type Conversions – 2013-01-02 → 01/02/2013: 1,000-3,000 → 800,000
• Rearrange, add, drop, or resize fields to change layouts: 1,000 → 650,000
• Surrogate Key Generation – Hash multiple field values into a unique pseudo-key: 3,000 → >1,000,000
• Surrogate Key Generation – Generate MD5 or SHA hash keys: 3,000 → >1,000,000
• Data Masking – Obfuscate data for non-production uses (Persistent or Dynamic; format-preserving encryption; AES-256): 500-1,000 → >1,000,000

Data Quality
• Domain Data Validation – Validate a value based on a list of acceptable values (e.g., all states in the US; all countries in the world): 1,000-3,000 → 750,000
• Field Validation – Validate based on patterns such as emails, dates, phone numbers, …: 1,000-3,000 → >1,000,000
• Field Validation – Data type validation and bounds checking: 3,000 → >1,000,000
• USPS Address Processing – Standardization, verification, and cleansing (CASS certification in process): 200 → >200,000

Data Platform Offload
• Mainframe Data Offload – Copybook parsing & data layout discovery; EBCDIC, COMP, COMP-3, … → ASCII, Integer, Float, …: 600 → 400,000

Results are system dependent but data intended to provide magnitude comparison
CSF #4: VelociData eliminates data ingest bottlenecks
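The surrogate-key rows above describe hashing multiple field values into one pseudo-key. A minimal software sketch of the same idea (the field values and separator choice are illustrative; the appliance does this in hardware at far higher rates):

```python
import hashlib

def surrogate_key(*field_values) -> str:
    """Hash multiple field values into a single pseudo-key.

    MD5 is used here; SHA-256 works the same way via hashlib.sha256.
    """
    # Join with a separator unlikely to appear in the data so that
    # ("ab", "c") and ("a", "bc") hash to different keys.
    material = "\x1f".join(str(v) for v in field_values)
    return hashlib.md5(material.encode("utf-8")).hexdigest()

# Hypothetical customer record fields.
key = surrogate_key("ACME Corp", "314-785-0601", "63105")
print(key)  # 32-hex-digit pseudo-key, stable for identical inputs
```

The key is deterministic, so the same field values always map to the same surrogate key, which is what makes it usable for joins across loads.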
16. The New World Data Challenges Being Solved
• Credit card company reduces MIPS and improves performance to
integrate historical and fresh data into Hadoop analytics process by
processing 10 million records per minute
• Financial processing network masks 5 million fields per second of
production data to sell opportunity information to retailers
• Health benefits provider enables customer support by shortening a data integration process from 16 hours to 45 seconds
• Property casualty company shortens a daily task of processing 450
million records from 5 hours to less than 1 hour
• Retailer now processes XML data to integrate 360-degree customer data from in-store, on-line, and mobile sources in real time
CSF #5: Big Data project risks can be mitigated effectively
17. VelociData: Continuous Innovation
• 3Q13
• Format Preserving Encryption and Data Masking
• Extensive Mainframe Data Conversion
• Extensive XML Processing
• 4Q13
• Expanded Hashing and Key Generation Options
• Additional Mainframe Record Types
• Scalable Deployment Management
18. Let’s Start the Conversation Now
For more information visit: http://velocidata.com
Helpful Resources:
Alternatives for Data Integration: http://velocidata.com/our-solution
Industry Analyst Research Reports: http://velocidata.com/resources
Data Ops – Meeting Big Data Organizational Challenges: http://velocidata.com/blog
Join us on social media:
Twitter: @VelociData
LinkedIn: http://www.linkedin.com/company/velocidata?trk=company_name
Google+: https://plus.google.com/112063174918659483670/posts
Phone: +1-314-785-0601
E-Mail: rindeck@VelociData.com / info@VelociData.com
We will send a follow-up email containing this presentation and links to contact us
20. How We Achieve Orders of Magnitude in Acceleration
VelociData Big Data Operations Appliance
• Purpose-built solutions that combine software, firmware, and massively parallel hardware to provide acceleration often approaching wire speed
• Heterogeneous compute environment that includes FPGAs, GPUs, and CPUs to offer a level of internal parallelism that can dramatically outperform software on general-purpose computers
• Business Micro Supercomputer in a 4U rack form factor
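VelociData's parallelism lives in FPGAs and GPUs rather than software, but the data-parallel idea behind the speedup can be sketched on a general-purpose CPU: split a batch of records across worker processes and transform chunks concurrently (the upper-casing stand-in transform is purely illustrative):

```python
from concurrent.futures import ProcessPoolExecutor

def transform(record: str) -> str:
    # Stand-in for a per-record transformation (e.g., a type conversion).
    return record.upper()

def parallel_transform(records, workers=4):
    # Split the batch across CPU cores; each worker handles a chunk.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transform, records, chunksize=1024))

if __name__ == "__main__":
    out = parallel_transform([f"rec-{i}" for i in range(10_000)])
    print(out[:3])
```

Software parallelism like this scales with core count; the appliance's point is that FPGA/GPU pipelines expose orders of magnitude more parallel units than a CPU socket does.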
21. Business Value for Most Architectures
Sources (CSV, XML, zOS Data, RDBMS, Social Media, Sensor, Hadoop) flow through the Big Data Operations Appliance, which applies wire-rate transformations – normalize, encrypt/mask, cleanse, enrich – to maximize data transformation acceleration to wire speed, then feeds:
• Hadoop
• ETL Server
• Data Warehouse
• Database Appliances
• BI Tools
• Downstream zOS Process
• Cloud
22. Platform Processes Offloaded to VelociData
Wire-rate transformations – purpose-built for better price performance
• Hadoop: Are self-service business analytics users frustrated with the time required to transform unstructured and legacy data into something useful for decision making? VelociData feeds Hadoop pre-processed, quality data for real-time BI efforts.
• Mainframe: Too expensive to keep adding mainframe MIPS?
• ETL/ELT: Seamlessly offload to VelociData the heavy-lifting ETL/ELT processes from Ab Initio, IBM, and Informatica.
• MPP Platforms (Teradata, Netezza): Is using the MPP platform for ELT and push-down optimization not an optimal use of resources?
• ETL Server: ETL server having trouble keeping up with exploding data growth?
23. Common ETL Bottlenecks
Sources (CSV, Mainframe, XML, RDBMS, Social Media, Sensor, Hadoop) feed the ETL server (Extract → Transform → Load), which writes to a staging DB and the primary RDBMS. Candidates for acceleration:
• Lookup & replace
• Field validation: datatype validation
• Field validation: bounds checking
• Aggregation
• USPS address standardization
• Business rules
• Entity resolution
• Exception / error handling
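Several of these bottleneck operations are simple to state even though they dominate ETL runtime. A minimal Python sketch of lookup & replace plus domain, pattern, and bounds validation (the master table, state list, and record layout are invented for illustration):

```python
import re

# Lookup & replace: enrich a record by populating a field from a master table.
MASTER = {"63105": "Clayton", "63130": "University City"}  # hypothetical ZIP -> city

# Domain data validation: value must come from a list of acceptable values.
VALID_STATES = {"MO", "IL", "KS"}

# Field validation by pattern (e.g., a loose email shape).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def transform(record: dict) -> dict:
    record = dict(record)
    record["city"] = MASTER.get(record["zip"], "UNKNOWN")       # lookup & replace
    record["state_ok"] = record["state"] in VALID_STATES        # domain validation
    record["email_ok"] = bool(EMAIL_RE.match(record["email"]))  # pattern validation
    record["amount_ok"] = 0 <= record["amount"] <= 1_000_000    # bounds checking
    return record

row = {"zip": "63105", "state": "MO", "email": "info@velocidata.com", "amount": 42}
print(transform(row))
```

Each step is per-record and independent of the others, which is exactly the shape of work that parallel hardware can run at wire rate.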
24. ETL Processes Offloaded to VelociData
VelociData sits in the Extract → Transform → Load flow: keep existing input interfaces (CSV, Mainframe, XML, RDBMS, Social Media, Sensor, Hadoop), accelerate bottlenecks at wire speed, reduce ETL server workload, and achieve faster total processing time. Processes offloaded to VelociData:
• Lookup & replace
• Aggregation
• Field validation: datatype validation
• Business rules
• Entity resolution
• Field validation: bounds checking
• USPS address standardization
• Exception / error handling
Output lands in the staging DB and primary RDBMS as before.
28. Disruption on Disruption
• We used to encounter new technologies that were 10x because of Moore’s Law
• We are no longer certain that the pattern still holds
• Now we encounter new technologies that are 100x or even 1000x
• This is not because of Moore’s Law but because of parallelism
29. Parallelism Will Become the Norm
• This is not just about software
• It is also about hardware architectures
• But it affects all software
• Eventually everything will execute in parallel
• Everything will go much faster
30. CPUs, GPUs and FPGAs
• CPUs, GPUs and FPGAs are commodities
• They can be harnessed to deliver extreme parallelism on a single server
• The use of such chips can deliver acceleration above 100x for some applications
31. The Memory Cascade
• On-chip speed vs. RAM: L1 (32K) = 100x; L2 (256K) = 30x; L3 (8-20MB) = 8.6x
• RAM vs. SSD: RAM = 300x
• SSD vs. disk: SSD = 10x
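Chaining the slide's ratios shows why the cascade matters: the end-to-end gap between L1 cache and spinning disk is the product of the per-level ratios:

```python
# Relative speeds from the slide (each level vs. the next one down).
l1_vs_ram = 100
ram_vs_ssd = 300
ssd_vs_disk = 10

# Multiplying the per-level ratios gives the L1-cache-to-disk gap.
l1_vs_disk = l1_vs_ram * ram_vs_ssd * ssd_vs_disk
print(f"L1 cache vs. spinning disk: ~{l1_vs_disk:,}x")  # ~300,000x
```

A roughly 300,000x spread between the fastest and slowest tiers is why keeping data high in the cascade dominates performance.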
33. • Can one VelociData Appliance serve many applications?
• What of data cleansing functionality (e.g., cleansing rules, deduplication, etc.)?
• Please explain wire-speed in a little more detail.
34. • How long does it take to implement and what is the process? Please describe.
• With Hadoop, what are the possibilities?
• What does the roadmap look like?
36. Upcoming Topics
This Month: INNOVATORS
January: ANALYTICS
February: BIG DATA
2014 Editorial Calendar at
www.insideanalysis.com/webcasts/the-briefing-room
www.insideanalysis.com