This presentation was delivered at the 2016 Strata + Hadoop World in San Jose. During the presentation, Ryft VP of Engineering, Pat McGarry, took a close look at five of the most common data analytics bottlenecks that hinder real-time insights. Additionally, he discussed how new hybrid FPGA/x86 computing architectures help overcome these hurdles.
2. Information, the fuel of business, is trapped in analysis platforms built on 70-year-old architectures.
3. IoT is exacerbating the widening data analytics technology divide.
REQUIREMENTS
Real-time insights as events occur, close to the source of data
Combined analysis of data from IoT devices, video, mobile, batch stores and other sources
Ultra-small, efficient analytics infrastructure
Systems that are easy to deploy, use & maintain
Low operational costs
No security or performance trade-offs
REALITY
Persistent compute/IO/storage bottlenecks
Data analyzed in silos
Data movement & ETL delays
Sprawling, inefficient analytics infrastructures
Persistent data privacy & security issues
4. THE CHALLENGE
Complex, Closed Systems vs. Low-Performance Open Source Software
Closed analytics systems are expensive, hard to use and require large teams to implement
Open source frameworks are easier to use, but their performance is limited by the commodity x86 servers they run on
Organizations have been forced to sacrifice either performance or simplicity
5. THE CHALLENGE
Slow Networking Speeds That Extend Data Transport Times
Current infrastructures lack the power and efficiency to be deployed at the network's edge
Slow or unreliable networks can drastically slow down data analytics
6. THE CHALLENGE
Time-Consuming ETL and Indexing Bottlenecks
Traditional x86-based architectures require lengthy Extract, Transform and Load (ETL) and indexing processes
These processes create transformed copies and indexes that can greatly inflate the stored data size
Data preparation time often makes the difference between actionable insights and poor business decisions
7. THE CHALLENGE
Complex, and Sometimes Impossible, Analytic Functions
Analytics functions such as fuzzy search often require more, or different, computing power than today's analytics infrastructures provide
Traditional analytics ecosystems require massive indexes and extensive data preparation
Together, data preparation time and analysis limitations prevent real-time analytics that capture all relevant insights
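As a rough illustration of why fuzzy search is compute-heavy when no index is available, here is a minimal Python sketch of an index-free fuzzy scan. It assumes Hamming distance (character substitutions only) as the similarity measure; the function name `fuzzy_hamming_search` is hypothetical, and this is not Ryft's implementation. Every window of the raw data is compared position by position against the pattern, so the work grows with data size times pattern length. Each window's comparison is independent of the others, which is why this kind of scan lends itself to massively parallel hardware.

```python
from typing import List, Tuple


def fuzzy_hamming_search(text: str, pattern: str, max_distance: int) -> List[Tuple[int, str]]:
    """Hypothetical helper: brute-force fuzzy scan over raw text, no index.

    Returns (offset, window) for every window of len(pattern) characters
    that differs from `pattern` in at most `max_distance` positions
    (Hamming distance, so insertions/deletions are not matched).
    """
    m = len(pattern)
    matches: List[Tuple[int, str]] = []
    # Slide over every possible start offset; each window is checked
    # independently, so the loop could be split across parallel workers.
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        # Count positions where the window and the pattern disagree.
        mismatches = sum(1 for a, b in zip(window, pattern) if a != b)
        if mismatches <= max_distance:
            matches.append((i, window))
    return matches
```

For example, `fuzzy_hamming_search("the quick brown fox", "quack", 1)` returns `[(4, 'quick')]`. An edit-distance (Levenshtein) measure would also tolerate inserted or deleted characters, at a higher cost per comparison.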
8. THE CHALLENGE
Costly, Complex and Inefficient x86-based Clusters
Hardware bottlenecks still stifle data analytics performance
Mandatory data processing, indexing, data sharding and similar steps inherently slow down analytics
Cluster complexity can produce data center infrastructures that cannot deliver real-time performance
9. Heterogeneous (Hybrid) Computing is the solution…
SOURCES: BLOOMBERG BUSINESS, THE PLATFORM
Heterogeneous or hybrid computing refers to systems that use more than one kind of processor or core. These systems gain performance or energy efficiency not just by adding more processors of the same type, but by adding dissimilar processors, usually incorporating specialized processing capabilities to handle particular tasks.
10. …because optimal performance & efficiency demand the right "engine" for the job.
CPUs
• General-purpose computing
• Sequential in nature
• Non-deterministic performance
• Interrupts
• Memory allocation
FPGAs
• Not general purpose, but can be reprogrammed via firmware
• Best at data-heavy analysis such as search, fuzzy search, image and video analysis, and deep learning
• Inherently massively parallel, delivering more output with less power
GPUs
• Some general-purpose computing
• Can excel at certain complex algorithms
• Generally more parallel than CPUs, since GPUs have more cores