The Briefing Room with Dr. Robin Bloor and Teradata RainStor
Live Webcast October 13, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=012bb2c290097165911872b1f241531d
Hadoop data lakes are emerging as peers to corporate data warehouses. However, successful data management solutions require a fusion of all relevant data, new and old, which has proven challenging for many companies. With a data lake that’s been optimized for fast queries, solid governance and lifecycle management, users can take data management to a whole new level.
Register for this episode of The Briefing Room to learn from veteran analyst Dr. Robin Bloor as he discusses the relevance of data lakes in today’s information landscape. He’ll be briefed by Mark Cusack of Teradata, who will explain how his company’s archiving solution has developed into a storage point for raw data. He’ll show how the proven compression, scalability and governance of Teradata RainStor, combined with Hadoop, can enable an optimized data lake that serves as both a reservoir for historical data and a “system of record” for the enterprise.
Visit InsideAnalysis.com for more information.
3. Twitter Tag: #briefr The Briefing Room
Welcome
Host: Eric Kavanagh
eric.kavanagh@bloorgroup.com
@eric_kavanagh
Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
Topics
October: DATA MANAGEMENT
November: ANALYTICS
December: INNOVATORS
What Goes In, Should Come Out
• Well Begun = Half Done
• Smart Architecture > Clever Queries
• Low Cost for Planning < Optimal
• Schema on Read ≠ Haphazard Ingestion
Analyst: Robin Bloor
Robin Bloor is Chief Analyst at The Bloor Group
robin.bloor@bloorgroup.com
@robinbloor
Teradata RainStor
• Teradata RainStor is well known for its data archiving solutions
• Its capabilities include an archive on Hadoop’s HDFS, which allows for SQL queries over the archive
• When combined with Hadoop, Teradata RainStor can enable an optimized data lake capable of storing raw data and acting as an enterprise system of record
Guest: Mark Cusack
Mark joined Teradata in 2014 as part of its RainStor acquisition. As a founding developer and Chief Architect at RainStor, he has worked on many different aspects of the product since 2004. Most recently, he led the efforts to integrate RainStor with Hadoop and with Teradata. He was formerly a senior scientist and team lead at QinetiQ, where he researched distributed simulation techniques and developed physics-based models of human behavior to support military training and operations. He also led government and industry projects in the areas of grid and pervasive computing. Before joining QinetiQ, Mark worked in academia, where he combined cluster computing methods with quantum mechanics to predict the properties of semiconductor microstructures. Mark holds a Master’s in Computing and a PhD in Physics from Newcastle University.
33. But Not Much Changed
• Nothing changed in respect to enterprise operational discipline
• Nothing changed in respect to service level policy
• Nothing changed in respect to data governance (although it may have gotten more demanding)
• Possibly the data got dirtier
• Security became more onerous
• Some things became more onerous
• Data volumes increased
34. Hadoop: Good, Bad, Ugly
• GOOD: scalability and parallelism, some components (like Kafka and Presto), costs
• BAD: security, lack of system management components, some components (like Hive)
• UGLY: lack of stability, a servant with three masters, skills and experience, cultural issues
35. The Consequence
You need to make sensible COMPONENT decisions and sensible ARCHITECTURAL decisions
36.
• Can RainStor simply be used as a SQL-capable query-only database sitting on Hadoop? What are the gating factors?
• How fast is data ingest? Are there any limits to how this is done?
• What is the data compression limitation, if any? How much space would be saved over Hive or HBase?
• Walk me through a data lake implementation.
37.
• Is there any Hadoop distribution that you prefer, or doesn’t it matter?
• What if I’m not a Teradata user? Is there any downside to using RainStor?