Grab some coffee and enjoy the pre-show banter before the top of the hour
The Briefing Room
First in Class: Optimizing the Data Lake for Tighter Integration
Twitter Tag: #briefr The Briefing Room
Welcome
Host:
Eric Kavanagh
eric.kavanagh@bloorgroup.com
@eric_kavanagh
Mission
•  Reveal the essential characteristics of enterprise software, good and bad
•  Provide a forum for detailed analysis of today's innovative technologies
•  Give vendors a chance to explain their product to savvy analysts
•  Allow audience members to pose serious questions... and get answers!
Topics
October: DATA MANAGEMENT
November: ANALYTICS
December: INNOVATORS
What Goes In, Should Come Out
•  Well Begun = Half Done
•  Smart Architecture > Clever Queries
•  Low Cost for Planning < Optimal
•  Schema on Read ≠ Haphazard Ingestion
Analyst: Robin Bloor
Robin Bloor is Chief Analyst at The Bloor Group
robin.bloor@bloorgroup.com
@robinbloor
Teradata RainStor
•  Teradata RainStor is well known for its data archiving solutions
•  Its capabilities include an archive on Hadoop’s HDFS, which allows for SQL queries over the archive
•  When combined with Hadoop, Teradata RainStor can enable an optimized data lake capable of storing raw data and acting as an enterprise system of record
Guest: Mark Cusack
Mark joined Teradata in 2014 as part of its
RainStor acquisition. As a founding developer and
Chief Architect at RainStor, he has worked on
many different aspects of the product since 2004.
Most recently, he led the efforts to integrate
RainStor with Hadoop and with Teradata. He was
formerly a senior scientist and team lead at
QinetiQ, where he researched distributed
simulation techniques and developed physics-
based models of human behavior to support
military training and operations. He also led
government and industry projects in the areas of
grid and pervasive computing. Before joining
QinetiQ, Mark worked in academia, where he
combined cluster computing methods with
quantum mechanics to predict the properties of
semiconductor microstructures. Mark holds a
Master's in Computing and a PhD in Physics from
Newcastle University.
Teradata RainStor® for the Data Lake
2 © 2015 Teradata
•  Cost Savings
–  Convert CapEx to OpEx
–  Decrease storage footprint
–  Future-proof capacity
•  Fast Flexible Access
–  Standards based
–  Compression optimizes queries
•  Data Governance
–  Privacy
–  Security
–  Integrity
Teradata RainStor® – The Structured Data Lake Foundation
Simple, Efficient, Scalable, Cost-Effective
Teradata RainStor is the most efficient,
scalable and accessible way to store
structured or semi-structured data in
your data lake
INGEST: At Network Speed
COMPRESS: 50-80% Cluster Reduction
ANALYZE: 10-100% Performance Boost
[Diagram labels: RainStor Partitions, HDFS Files...]
Data Lake Use Cases Applicable to RainStor
•  System of Record for Structured Data
–  Provide a trusted source of data with tracking
–  Meet commercial and regulatory requirements
•  Archive for Structured Data
–  Offload historical data
–  Central control of restore capabilities
•  Discovery
–  Data profiling to discover correlations
•  Analysis
–  Custom analytics, signal analysis, and event reporting
•  ETL Remix
–  Staging platform for data cleanup prior to EDW analysis
Archive or a System of Record
Depends on the position of RainStor with respect to the source
[Diagram: two configurations, labeled Archive and System of Record, each showing a Source, a Warehouse or Database, and RainStor]
Teradata RainStor® Overview
LOAD: Billions Records/Day
COMPRESS: 10-40X (90%+)
QUERY: SQL; BI Tools: Hive, MapReduce
SCALE – Any Platform (MPP, Shared Everything)
AVAILABILITY: Replication
GOVERN: Rules Based
SECURE – Enterprise-Grade
[Diagram labels: MOVE; EDW/DB, Apps, Network, Tape, Hadoop, NAS, CAS]
Challenges
–  Log, clickstream, and sensor data,
tax import systems
–  Encryption is expensive
–  Extended wait times for data access
–  Maintaining data integrity
RainStor Solutions
–  MPP (scalable data load)
–  Encryption of compressed data
–  Data immediately available for query
–  Fingerprinted
[Diagram labels: Data Collection Process; RainStor Node 1 ... RainStor Node N; Service Manager, Load, Query, Fork; HDFS Data Node]
“Move your costs by a decimal point!”
~ Architect, Global Financial Services Company
How does RainStor do this?
–  Separately stages
delimited text data
–  De-dupes and builds
partitions stored on HDFS
–  Shows multi-node query
process view of data
across HDFS
[Diagram labels: Source, Staging Area]
Data Load
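The load steps on this slide (stage delimited text, build partitions, fingerprint the result for integrity) can be pictured with a minimal Python sketch. It is not RainStor's loader: the pipe delimiter, the rows-per-partition setting, the function names, and the use of SHA-256 as the fingerprint are all assumptions made for illustration.

```python
import csv
import hashlib
import io


def build_partitions(delimited_text, delimiter="|", rows_per_partition=2):
    """Split staged delimited text into fixed-size partitions.

    Each partition keeps its parsed rows, a serialized payload, and a
    SHA-256 fingerprint of that payload (hypothetical integrity check).
    """
    reader = csv.reader(io.StringIO(delimited_text), delimiter=delimiter)
    rows = [tuple(row) for row in reader if row]  # skip blank lines
    partitions = []
    for start in range(0, len(rows), rows_per_partition):
        chunk = rows[start:start + rows_per_partition]
        payload = "\n".join(delimiter.join(row) for row in chunk).encode("utf-8")
        partitions.append({
            "rows": chunk,
            "payload": payload,
            "fingerprint": hashlib.sha256(payload).hexdigest(),
        })
    return partitions


def verify(partition):
    """Return True if the payload still matches its stored fingerprint."""
    return hashlib.sha256(partition["payload"]).hexdigest() == partition["fingerprint"]


# Hypothetical staged file contents.
staged = "0001|100|$12|AA\n0002|200|$12|BAC\n0003|100|$13|AA\n"
partitions = build_partitions(staged)
print(len(partitions), all(verify(p) for p in partitions))
```

Recomputing the fingerprint at read time is one simple way to detect a partition that changed after it was written.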
Challenges
–  Storage costs outstrip budgets
–  Queries take longer than ever
RainStor Solutions
–  Patented compression techniques
–  In-memory and on-disk compression is a performance multiplier and storage saver
–  Stored in binary tree format
–  Algorithms that query compressed data
–  Hardware & bandwidth multiplier
–  2-10X more compressed than ORC
–  Cost saving on floor, cooling and personnel
–  Drives efficient query execution framework
–  CPU rather than IO bound
How does RainStor do this?
“Now we can keep years of history that
wasn’t economically feasible until now.”
~ Architect, Communication Service Provider
Compression
[Diagram: Stock Trades Example, with record IDs 0001 to 0004 and field values 100, 200, $12, $13, AA, BAC]
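The idea of de-duplicating values and querying the compressed form can be illustrated with a small Python sketch. This is generic per-column dictionary encoding, not RainStor's patented technique or its binary tree format; the trade rows are a hypothetical arrangement of the values shown in the stock trades example, and the function names are invented for illustration.

```python
def encode(rows):
    """Dictionary-encode each column so every distinct value is stored once."""
    n_cols = len(rows[0])
    value_to_code = [{} for _ in range(n_cols)]
    encoded = []
    for row in rows:
        codes = []
        for col, value in enumerate(row):
            mapping = value_to_code[col]
            # Reuse the existing code, or assign the next free one.
            codes.append(mapping.setdefault(value, len(mapping)))
        encoded.append(tuple(codes))
    # Dicts keep insertion order, so position in the list equals the code.
    code_to_value = [list(mapping) for mapping in value_to_code]
    return encoded, code_to_value


def decode(encoded, code_to_value):
    """Rebuild the original rows from integer codes."""
    return [
        tuple(code_to_value[col][code] for col, code in enumerate(row))
        for row in encoded
    ]


def count_where(encoded, code_to_value, col, wanted):
    """Count matching rows by comparing codes, without decoding any row."""
    if wanted not in code_to_value[col]:
        return 0  # the value never occurs, so no row can match
    code = code_to_value[col].index(wanted)
    return sum(1 for row in encoded if row[col] == code)


# Hypothetical trades: (trade id, quantity, price, symbol).
trades = [
    ("0001", 100, "$12", "AA"),
    ("0002", 200, "$12", "BAC"),
    ("0003", 100, "$13", "AA"),
    ("0004", 200, "$13", "BAC"),
]
encoded, dictionaries = encode(trades)
print(count_where(encoded, dictionaries, 3, "AA"))
```

Because the predicate is translated to a code once, the scan compares small integers instead of the original values, which is one reason querying compressed data can shift work from IO to CPU.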
Challenges
–  Query speed is #1 concern
–  Hive queries aren’t standards based
–  Rewriting queries is a huge task
–  Data transparency
RainStor Solutions
–  SQL access – 2-10x faster than Hive
–  Improved Hive performance
–  Efficient parallel query execution
–  User defined functions supported
–  Teradata connectivity via QueryGrid
–  BI Tool access
–  Query access via HCatalog
Query
[Diagram labels: SQL, Hive, Pig, HCatalog, Predicates, Static Metadata, Dynamic, Bloom Filter, Fields, Stats, Types, partitions (P), HDFS]
“RainStor doesn’t care what hardware it runs on.
It’s just as good on Tier-2 or Tier-3 hardware.”
~ Chief Architect, Global Investment Bank
How does RainStor do this?
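The query diagram names a Bloom filter next to predicates and partitions. One common use of that structure, assumed here rather than confirmed by the slide, is to skip partitions that cannot contain a value in an equality predicate. The sketch below is a textbook Bloom filter in Python; the bit-array size, hash count, SHA-256 hashing, and partition names are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Textbook Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value):
        # Derive n_hashes bit positions by salting the value with a seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def partitions_to_scan(filters, wanted):
    """Keep only partitions whose filter says the value might be present."""
    return [name for name, bloom in filters.items() if bloom.might_contain(wanted)]


# Hypothetical partitions and the symbols stored in each.
contents = {"p1": ["AA", "BAC"], "p2": ["IBM", "MSFT"]}
filters = {}
for name, symbols in contents.items():
    bloom = BloomFilter()
    for symbol in symbols:
        bloom.add(symbol)
    filters[name] = bloom

print(partitions_to_scan(filters, "AA"))
```

A partition that truly holds the value is never skipped; the cost of a false positive is only an unnecessary scan.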
RainStor Governance
•  Data Encryption
•  Data Masking
•  Log Masking
•  View-Based Dynamic Masking
•  Authentication
–  Kerberos
–  LDAP/AD
–  Linux PAM
•  SQL92 Authorization
•  Immutable Data Model
•  Record-Level Delete
•  Schema Evolution
•  Data Disposition
•  Replication
•  Audit Trail
Privacy
Security
Integrity
Designed to support PCI-DSS,
SEC17a-4, etc.
“How did you guys get it right and others didn’t.”
~ Architect, U.S. Bank
RainStor 7 Architecture
[Architecture diagram labels]
Apache: Hive, Pig, Java, HCatalog, MapReduce / YARN
Teradata RainStor®: Interactive SQL (Oracle, SQLServer, SybaseIQ, Netezza extensions), ODBC, JDBC, Data Loader, User-Defined Functions, RainStor Files, MapReduce, Management, Alerting, Security, Retention Rules, Replication, Compliance
Teradata IDW & Tools: Teradata IDW, Teradata BAR, Teradata QueryGrid™, FastConnect™, FastForward™
Storage: HDFS (CDH/HDS); NAS, CAS, SAN, WORM (vendor specific)
Integration with Teradata QueryGrid
TD QueryGrid Support for RainStor
[Diagram labels: Business users, Data scientists; Teradata Database, Teradata Aster, RainStor on Hadoop, Hadoop, Other Databases]
RainStor Integration with Teradata
[Diagram labels: FastForward™, FastConnect™, QueryGrid™, BAR, PBs of history, PS Engagement, RainStor]
US Telco Case Study #1: Data Lake
Network Performance
Problem
•  Storage & analysis of network events
–  Performance, faults, changes
Challenges
•  50TB raw data/day
•  Demanding query SLAs
Results
•  Storing 30 days of data – up from 3 days
•  8 node Hadoop cluster
•  83% reduction in storage footprint
•  Data lake system of record
“RainStor addresses data growth at the root cause.”
~ Architect, U.S. Bank
[Diagram labels: Network Events, Dual Load, RainStor]
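The figures in this case study can be combined into a rough sizing estimate. The short Python sketch below assumes the 83% reduction applies to the full raw volume and ignores HDFS replication and any other overhead, neither of which the slide specifies; the function name is hypothetical.

```python
def stored_footprint_tb(raw_tb_per_day, days_retained, reduction):
    """Estimated stored size in TB after a fractional footprint reduction."""
    raw_total_tb = raw_tb_per_day * days_retained
    return raw_total_tb * (1.0 - reduction)


# Slide figures: 50 TB of raw data per day, 30 days retained, 83% reduction.
raw_30_days = 50 * 30
stored_30_days = stored_footprint_tb(50, 30, 0.83)
print(f"raw: {raw_30_days} TB, stored: about {stored_30_days:.0f} TB")
```

Under those assumptions, 1,500 TB of raw events would occupy roughly 255 TB before replication.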
US Telco Case Study #2: Data Lake
Compliant and Secure Analytics
Problem
•  Usage data must be encrypted on Hadoop
•  Avoid any query performance impact
Challenge
•  Deliver cost-effective & secure scalability
Solution
•  RainStor 15x compression vs. ORC 7x
•  Encryption with only 3% query overhead
•  Queries 3X faster than Hive
[Diagram labels: Clickstream/Usage Data, Network, Customer, Data Extract/Scrubbing, RainStor; 1.2PB, 62 Nodes, Running Hortonworks 2.1]
“We keep finding new stuff we can do with RainStor! We are just getting rolling!”
~ Principal Architect, Global CSP
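To compare the 15x and 7x compression figures on a common footing, the ratios can be converted to percentage reductions. This Python sketch is plain arithmetic on the slide's two numbers; the function names are hypothetical, and it assumes both ratios are measured against the same raw input.

```python
def ratio_to_reduction(ratio):
    """Fraction of the original size removed by an N-to-1 compression ratio."""
    return 1.0 - 1.0 / ratio


def relative_size(ratio_a, ratio_b):
    """How many times larger the output at ratio_b is than at ratio_a."""
    return ratio_a / ratio_b


rainstor_ratio, orc_ratio = 15, 7
print(f"15x removes about {ratio_to_reduction(rainstor_ratio):.1%} of the raw size")
print(f"7x removes about {ratio_to_reduction(orc_ratio):.1%} of the raw size")
print(f"7x output is about {relative_size(rainstor_ratio, orc_ratio):.2f} times larger")
```

The reductions look close (about 93% versus 86%), but for the same input the 7x output is a little more than twice the size of the 15x output.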
US Telco Case Study #3: Data Lake
Application Retirement & Access
Problem
•  Hundreds of applications taking up space
Challenges
•  Needed to lower TCO
•  Free up capacity and maintain user access
Results
•  Hundreds of apps retired into RainStor
•  Users access data using BI tool of their choice
•  Administration is minimal on low cost NAS
•  Saving $800K for every 100TB stored in RainStor
“I installed RainStor in less than 5 minutes and was querying the data 30 minutes later.”
~ Principal Architect, U.S. Telco
[Diagram labels: Before, After, RainStor]
Backup Slides
Teradata Appliance for Hadoop
•  Future-proof capacity
–  2x to 8x more compressed, including ORC
•  Fast analysis (2x to 100x performance boost)
–  Mature SQL stack
-  Multiple parsers – Oracle, SQL Server, Sybase
–  Fast Hive QL, Pig, MapReduce
–  Support for BI tools
•  Security and compliance
–  Encryption
–  LDAP/AD/PAM/Kerberos/PCI
–  SQL92 users, tables, views, and data masking
–  Audit trails & logging
•  Life cycle management
–  Retention rules & expiry policies
–  Schema evolution
•  Faster time-to-value
Perceptions & Questions
Analyst:
Robin Bloor
The Quality of the Data Lake
Robin Bloor, PhD
The Departure Point
A data lake is a WHOLLY NEW
architectural idea
Pouring Data Into the Lake
But Not Much Changed
•  Nothing changed in respect to enterprise operational discipline
•  Nothing changed in respect to service level policy
•  Nothing changed in respect to data governance (although it may have gotten more demanding)
•  Possibly the data got dirtier
•  Security became more onerous
•  Some things became more onerous
•  Data volumes increased
Hadoop: Good, Bad, Ugly
•  GOOD: scalability and parallelism, some components (like Kafka and Presto), costs
•  BAD: security, lack of system management components, some components (like Hive)
•  UGLY: Lack of stability, a servant with three masters, skills and experience, cultural issues
The Consequence
You need to make sensible COMPONENT decisions and sensible ARCHITECTURAL decisions
•  Can RainStor simply be used as a SQL-capable query-only database sitting on Hadoop? What are the gating factors?
•  How fast is data ingest? Are there any limits to how this is done?
•  What is the data compression limitation, if any? How much space would be saved over Hive or HBase?
•  Walk me through a data lake implementation.
•  Is there any Hadoop distribution that you prefer, or doesn’t it matter?
•  What if I’m not a Teradata user? Is there any downside to using RainStor?
Upcoming Topics
www.insideanalysis.com
October: DATA MANAGEMENT
November: ANALYTICS
December: INNOVATORS
THANK YOU for your ATTENTION!
Some images provided courtesy of Wikimedia Commons

Audax Group: CIO Perspectives - Managing The Copy Data Explosion
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FS
 

First in Class: Optimizing the Data Lake for Tighter Integration

  • 1. Grab some coffee and enjoy the pre-show banter before the top of the hour
  • 2. The Briefing Room First in Class: Optimizing the Data Lake for Tighter Integration
  • 3. Twitter Tag: #briefr The Briefing Room Welcome Host: Eric Kavanagh eric.kavanagh@bloorgroup.com @eric_kavanagh
  • 4. Mission: Reveal the essential characteristics of enterprise software, good and bad. Provide a forum for detailed analysis of today’s innovative technologies. Give vendors a chance to explain their product to savvy analysts. Allow audience members to pose serious questions... and get answers!
  • 5. Topics. October: DATA MANAGEMENT; November: ANALYTICS; December: INNOVATORS
  • 6. What Goes In, Should Come Out. Well Begun = Half Done; Smart Architecture > Clever Queries; Low Cost for Planning < Optimal; Schema on Read ≠ Haphazard Ingestion
  • 7. Analyst: Robin Bloor. Robin Bloor is Chief Analyst at The Bloor Group. robin.bloor@bloorgroup.com @robinbloor
  • 8. Teradata RainStor. Teradata RainStor is well known for its data archiving solutions. Its capabilities include an archive on Hadoop’s HDFS, which allows for SQL queries over the archive. When combined with Hadoop, Teradata RainStor can enable an optimized data lake capable of storing raw data and acting as an enterprise system of record.
  • 9. Guest: Mark Cusack. Mark joined Teradata in 2014 as part of its RainStor acquisition. As a founding developer and Chief Architect at RainStor, he has worked on many different aspects of the product since 2004. Most recently, he led the efforts to integrate RainStor with Hadoop and with Teradata. He was formerly a senior scientist and team lead at QinetiQ, where he researched distributed simulation techniques and developed physics-based models of human behavior to support military training and operations. He also led government and industry projects in the areas of grid and pervasive computing. Before joining QinetiQ, Mark worked in academia, where he combined cluster computing methods with quantum mechanics to predict the properties of semiconductor microstructures. Mark holds a Masters in Computing and a PhD in Physics from Newcastle University.
  • 11. Teradata RainStor® – The Structured Data Lake Foundation: Simple, Efficient, Scalable, Cost-Effective. Teradata RainStor is the most efficient, scalable and accessible way to store structured or semi-structured data in your data lake. Cost Savings: convert CapEx to OpEx; decrease storage footprint; future-proof capacity. Fast Flexible Access: standards based; compression optimizes queries. Data Governance: privacy; security; integrity. INGEST: at network speed. COMPRESS: 50-80% cluster reduction. ANALYZE: 10-100% performance boost. Diagram labels: RainStor Partitions; HDFS Files... © 2015 Teradata
  • 12. Data Lake Use Cases Applicable to RainStor. System of Record for Structured Data: provide a trusted source of data with tracking; meet commercial and regulatory requirements. Archive for Structured Data: offload historical data; central control of restore capabilities. Discovery: data profiling to discover correlations. Analysis: custom analytics, signal analysis, and event reporting. ETL Remix: staging platform for data cleanup prior to EDW analysis.
  • 13. Archive or a System of Record: depends on the position of RainStor with respect to the source. Diagram labels: Warehouse or Database; Archive; System of Record; Source; RainStor
  • 14. Teradata RainStor® Overview. LOAD: billions of records/day. COMPRESS: 10-40X (90%+). QUERY: SQL; BI Tools: Hive, MapReduce. SCALE: any platform (MPP, shared everything). AVAILABILITY: replication. GOVERN: rules based. SECURE: enterprise-grade. MOVE. Diagram labels: EDW/DB; Network; Tape; Hadoop; NAS, CAS; Apps
  • 15. Challenges: log, clickstream, and sensor data, tax import systems; encryption is expensive; extended wait times for data access; maintaining data integrity. RainStor Solutions: MPP (scalable data load); encryption of compressed data; data immediately available for query; fingerprinted data collection process. How does RainStor do this? Separately stages delimited text data; de-dupes and builds partitions stored on HDFS; shows multi-node query process view of data across HDFS. “Move your costs by a decimal point!” ~ Architect, Global Financial Services Company. Diagram labels: RainStor Node 1; Service Manager; Load; Query; Fork; RainStor Node N; HDFS Data Node; Source; Staging Area; Data Load
  • 16. Challenges: storage costs outstrip budgets; queries take longer than ever. RainStor Solutions / How does RainStor do this? Patented compression techniques; in-memory and on-disk compression is a performance multiplier and storage saver; stored in binary tree format; algorithms that query compressed data; hardware & bandwidth multiplier; 2-10X more compressed than ORC; cost saving on floor, cooling and personnel; drives efficient query execution framework; CPU rather than IO bound. “Now we can keep years of history that wasn’t economically feasible until now.” ~ Architect, Communication Service Provider. Diagram labels: Compression; Stock Trades Example: 0001 0002 200 100 0003 0004 $12 $13 AA BAC
  • 17. Challenges: query speed is #1 concern; Hive queries aren’t standards based; rewriting queries is a huge task; data transparency. RainStor Solutions / How does RainStor do this? SQL access, 2-10x faster than Hive; improved Hive performance; efficient parallel query execution; user defined functions supported; Teradata connectivity via QueryGrid; BI tool access; query access via HCatalog. “RainStor doesn’t care what hardware it runs on. It’s just as good on Tier-2 or Tier-3 hardware.” ~ Chief Architect, Global Investment Bank. Diagram labels: Query; Static Metadata; SQL; Hive; Pig; HCatalog; Predicates; Bloom Filter; Dynamic; Fields; Stats; Types; HDFS
  • 18. RainStor Governance. Data Encryption; Data Masking; Log Masking; View-Based Dynamic Masking; Authentication (Kerberos, LDAP/AD, Linux PAM); SQL92 Authorization; Immutable Data Model; Record-Level Delete; Schema Evolution; Data Disposition; Replication; Audit Trail. Column headings: Privacy; Security; Integrity. Designed to support PCI-DSS, SEC 17a-4, etc. “How did you guys get it right and others didn’t?” ~ Architect, U.S. Bank
  • 19. RainStor 7 Architecture. Diagram labels: Apache; Teradata RainStor®; Teradata IDW & Tools; RainStor Files; MapReduce; Teradata BAR; Teradata IDW; Hive; Pig; Java; HCatalog; MapReduce / YARN; Teradata QueryGrid™; Interactive SQL; Oracle, SQLServer, SybaseIQ, Netezza extensions; ODBC; JDBC; Data Loader; FastConnect™; FastForward™; HDFS (CDH/HDS); Management; Alerting; Security; Retention Rules; Replication; Compliance; NAS, CAS, SAN, WORM; Vendor specific; User-Defined Functions
  • 20. Integration with Teradata QueryGrid. Diagram labels: TERADATA ASTER; RAINSTOR ON HADOOP; TERADATA DATABASE; HADOOP; OTHER DATABASES; TD QueryGrid Support for RainStor; Business users; Data scientists
  • 21. RainStor Integration with Teradata. Diagram labels: FastForward™; FastConnect™; BAR; PBs of history; QueryGrid™; PS Engagement; RainStor
  • 22. US Telco Case Study #1: Data Lake, Network Performance. Problem: storage & analysis of network events (performance, faults, changes). Challenges: 50TB raw data/day; demanding query SLAs. Results: storing 30 days of data, up from 3 days; 8 node Hadoop cluster; 83% reduction in storage footprint; data lake system of record. “RainStor addresses data growth at the root cause.” ~ Architect, U.S. Bank. Diagram labels: Dual Load; RainStor; Network Events
  • 23. US Telco Case Study #2: Data Lake, Compliant and Secure Analytics. Problem: usage data must be encrypted on Hadoop; avoid any query performance impact. Challenge: deliver cost-effective & secure scalability. Solution: RainStor 15x compression vs. ORC 7x; encryption with only 3% query overhead; queries 3X faster than Hive. “We keep finding new stuff we can do with RainStor! We are just getting rolling!” ~ Principal Architect, Global CSP. Diagram labels: Clickstream/Usage Data; Network; Customer Data; Extract/Scrubbing; 1.2PB; 62 Nodes Running Hortonworks 2.1; RainStor
  • 24. US Telco Case Study #3: Data Lake, Application Retirement & Access. Problem: hundreds of applications taking up space. Challenges: needed to lower TCO; free up capacity and maintain user access. Results: hundreds of apps retired into RainStor; users access data using BI tool of their choice; administration is minimal on low cost NAS; saving $800K for every 100TB stored in RainStor. “I installed RainStor in less than 5 minutes and was querying the data 30 minutes later.” ~ Principal Architect, U.S. Telco. Diagram labels: Before; After; RainStor
  • 25. Teradata RainStor® – The Structured Data Lake Foundation: Simple, Efficient, Scalable, Cost-Effective. Teradata RainStor is the most efficient, scalable and accessible way to store structured or semi-structured data in your data lake. Cost Savings: convert CapEx to OpEx; decrease storage footprint; future-proof capacity. Fast Flexible Access: standards based; compression optimizes queries. Data Governance: privacy; security; integrity. INGEST: at network speed. COMPRESS: 50-80% cluster reduction. ANALYZE: 10-100% performance boost. Diagram labels: RainStor Partitions; HDFS Files...
  • 28. Teradata Appliance for Hadoop. Future-proof capacity: 2x to 8x more compressed, including ORC. Fast analysis (2x to 100x performance boost): mature SQL stack with multiple parsers (Oracle, SQL Server, Sybase); fast Hive QL, Pig, MapReduce; support for BI tools. Security and compliance: encryption; LDAP/AD/PAM/Kerberos/PCI; SQL92 users, tables, views, and data masking; audit trails & logging. Life cycle management: retention rules & expiry policies; schema evolution. Faster time-to-value.
  • 29. Perceptions & Questions. Analyst: Robin Bloor
  • 30. The Quality of the Data Lake Robin Bloor, PhD
  • 31. The Departure Point: A data lake is a WHOLLY NEW architectural idea
  • 32. Pouring Data Into the Lake
  • 33. But Not Much Changed. Nothing changed in respect to enterprise operational discipline; nothing changed in respect to service level policy; nothing changed in respect to data governance (although it may have gotten more demanding); possibly the data got dirtier; security became more onerous; some things became more onerous; data volumes increased
  • 34. Hadoop: Good, Bad, Ugly. GOOD: scalability and parallelism, some components (like Kafka and Presto), costs. BAD: security, lack of system management components, some components (like Hive). UGLY: lack of stability, a servant with three masters, skills and experience, cultural issues
  • 35. The Consequence: You need to make sensible COMPONENT decisions and sensible ARCHITECTURAL decisions
  • 36. Can RainStor simply be used as a SQL-capable query-only database sitting on Hadoop? What are the gating factors? How fast is data ingest? Are there any limits to how this is done? What is the data compression limitation, if any? How much space would be saved over Hive or HBase? Walk me through a data lake implementation.
  • 37. Is there any Hadoop distribution that you prefer, or doesn’t it matter? What if I’m not a Teradata user? Is there any downside to using RainStor?
  • 39. Upcoming Topics (www.insideanalysis.com). October: DATA MANAGEMENT; November: ANALYTICS; December: INNOVATORS
  • 40. THANK YOU for your ATTENTION! Some images provided courtesy of Wikimedia Commons
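
The storage figures quoted in the case studies (slides 22 and 23) can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the function names are made up for this example, and it assumes the 83% footprint reduction applies uniformly to the 50TB/day of raw data and ignores HDFS replication, neither of which the slides state.

```python
def compression_ratio(reduction_pct: float) -> float:
    """Convert a storage-footprint reduction in percent into an N-to-1 ratio."""
    return 1.0 / (1.0 - reduction_pct / 100.0)


def stored_tb(raw_tb_per_day: float, days: int, ratio: float) -> float:
    """Space needed to retain `days` of raw data at the given compression ratio."""
    return raw_tb_per_day * days / ratio


# Figures quoted on slide 22; combining them this way is an assumption.
ratio = compression_ratio(83)            # 83% reduction in storage footprint
retained_tb = stored_tb(50, 30, ratio)   # 50 TB/day of raw data kept for 30 days

# Figures quoted on slide 23: RainStor 15x compression vs. ORC 7x.
space_vs_orc = 7 / 15                    # RainStor footprint as a fraction of ORC's

print(f"83% reduction is about {ratio:.1f}x compression")
print(f"30 days at 50 TB/day: about {retained_tb:.0f} TB stored (1500 TB raw)")
print(f"15x vs. 7x: about {space_vs_orc:.0%} of the ORC footprint")
```

Under these assumptions, an 83% reduction corresponds to a ratio of a little under 6 to 1, which sits below the 10-40X range quoted on the overview slide; the slides do not explain the difference.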
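
Slide 17 lists "Predicates" and "Bloom Filter" among its query diagram labels, which suggests that partitions are filtered before they are scanned. The sketch below shows the general Bloom-filter pruning technique, not RainStor's actual implementation: the class, the function names, the partition names and the ticker values are all hypothetical, and the hashing scheme is chosen only for simplicity.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value: str):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_filters(partitions: dict) -> dict:
    """Build one Bloom filter per partition over its ticker values."""
    filters = {}
    for name, tickers in partitions.items():
        bloom = BloomFilter()
        for ticker in tickers:
            bloom.add(ticker)
        filters[name] = bloom
    return filters


def candidate_partitions(filters: dict, ticker: str) -> list:
    """Return partitions that cannot be ruled out for `ticker = <value>`."""
    return [name for name, bloom in filters.items() if bloom.might_contain(ticker)]


# Hypothetical partitions of a stock-trades table, keyed by partition name.
partitions = {
    "p1": ["AA", "BAC"],
    "p2": ["IBM", "GE"],
    "p3": ["BAC", "GE"],
}
filters = build_filters(partitions)
print(candidate_partitions(filters, "BAC"))
```

A query with an equality predicate then only needs to open the candidate partitions. Because a Bloom filter can return false positives, a candidate may turn out not to contain the value, but a partition that does contain it is never skipped.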