SlideShare a Scribd company logo
Innovation in the Data Warehouse
Kit Menke, Software Architect
StampedeCon 2016
July 27, 2016
Agenda
▪ Use Case
▪ Architectures
▪ Decision Points
Enterprise Holdings, Inc.
▪ Our Business
• 9 thousand locations
• 80 countries
• 93 thousand employees
• 1.7 million vehicles
▪ Data Warehouse
• Near capacity: Used about 75+ of 80 Terabytes
• Streaming and batch data feeds from over 50 internal systems &
external sources
• 100+ databases and 22+ thousand tables
• Around 1 billion queries executed per month
• Over 45,000 reporting users with 5+ million report executions
every month.
• Statistical Modeling & Advanced Analytics - 40+ Projects
Implemented for Predictive & Diagnostic Analytics
Data Warehouse - Present
Data Warehouse Growth
Challenges – Current Platform
▪ System Capacity Constraints
• Overall Current System Utilization is High
• Space & CPU Constraints
• Most of these challenges can be overcome by adding
more Teradata capacity or doing augmentation
▪ Use Cases not good fit for Teradata EDW
• Unstructured data
• Source structures changing frequently
• Data for exploration, discovery, & analytics
• Staging, transient, & history data
• These challenges can be overcome by augmentation
▪ Bottom-line: Improved agility & greater value
Augmentation Recommendation: Hadoop
▪ Leverage Hadoop to complement Teradata
EDW
• Hybrid Approach
▪ The Hortonworks distribution of Hadoop
• Compatibility/integration with Teradata EDW to
achieve high degree of interoperability
▪ Intent is not to have a centralized Hadoop
service
• EDW Augmentation Only
7
Data Warehouse - Future
Architectures
▪ Data warehouse augmentation contains
streaming and batch use cases
▪ Three Big Data architectures to explore:
1. Batch
2. Lambda
3. Kappa
Batch
Batch
▪ Land data into Hadoop first
▪ ETL in Hadoop to build reporting tables and
publish to Teradata
▪ Archive old data from Teradata DB
▪ Data available for analysis in Hive
▪ Great for semi-structured data files
▪ But… too slow for streaming data
Lambda
Lambda
▪ Attempts to combine batch and streaming
to get benefits from both
▪ Batch layer is comprehensive and accurate
▪ Streaming layer is fast but might only be
able to keep recent data
▪ Potentially have to maintain two codebases
Kappa
Kappa
▪ Everything is a stream (no batch!)
▪ Depends largely on your log data store
usually Kafka
▪ All raw data is stored in Kafka
▪ Much simpler architecture than lambda
• New version? Re-deploy app and start
reprocessing from the start and generate new
output table
• Once complete point app to new output table
Choosing an Architecture
▪ Batch – process data in batches
• All data processed in batches to create an
output
▪ Lambda – split streaming data into batch
and real-time
• Stream processing for the data you need fast
and the rest is batch processed
▪ Kappa – everything is a stream
• All data is processed as a stream even when it
needs to be reprocessed
Implementing an Architecture
▪ Requirements for the use case drives
architecture
▪ Walk through decision points
1. Cloud or on premises
2. Physical or virtual machines
3. Cluster workload
▪ Plus others!
Cloud vs on premises
▪ Scalability
• Much easier to scale a Cloud solution
• Physical hardware requires an infrastructure team to manage
▪ Data source location (data gravity) / integration points
• Cluster should be as close as possible to your data source
• Cloud is good option for internet data sources
▪ Cloud offerings
• Hadoop: Azure HDInsight, Amazon EMR, Google Cloud
• Integration with other PaaS services
▪ Network
• Bandwidth to/from cloud implementation
Physical vs virtual
▪ Performance
• Physical hardware will perform better, Hadoop is
designed with physical hardware in mind
▪ Maintenance
• No hardware to maintain for virtual servers
▪ Time to market
• Virtual machines much faster to provision
• For physical hardware if infrastructure team is a
roadblock then appliance is good option instead of
commodity
▪ Development and test environments make more
sense to virtualize
Workload
▪ Streaming
• Running 24/7
• Need dedicated resources
▪ Batch
• Scheduled
• Periods of high utilization (scalability)
▪ Multi-Tenancy
• Blended workloads
• YARN (queues, node labels)
• Think about Isolating nodes for real-time
Other considerations
▪ Disaster recovery
• Data is locally redundant
• Backups not usually required unless you need geo-redundancy
▪ Security - Many different things to secure!
• Kerberos for user, service, and host authentication
• Authorization: Apache Ranger (Hortonworks) or Apache Sentry
(Cloudera) or MapR Control System
• Network isolation for Hadoop services
• Data at rest (HFDS encryption)
▪ Hadoop Distribution - Race to include the most Apache projects
• Top 3: Hortonworks, Cloudera, MapR
• Big companies with Hadoop offering:
– Teradata Hadoop aka TDH (Hortonworks, Cloudera, MapR)
– Oracle Big Data Applicance (Cloudera)
Spectrum of Options
▪ Cloud PaaS
• No hardware or software to manage
• Amazon S3, Azure Data Lake
▪ Cloud
• Weird space between IaaS and PaaS
• Amazon EMR
• HDInsight is more PaaS
▪ Cloud IaaS
• All virtual, no hardware to manage
• You manage all software
▪ Third party hosted
• Rackspace
• Software managed by you
▪ Appliance
• Infrastructure handled for you
• Dell, HP, Cisco, Teradata, Oracle
• Software (varies depending on vendor)
▪ Commodity
• DIY
Lessons Learned
▪ Workload isolation is hard
• Multi-tenancy is possible
• Takes work to make sure batch jobs don’t impact
the real-time streaming processes
▪ Things we like: Hive, Hbase
▪ Things we don’t like: SOLR, debugging
▪ Debugging / development is hard
• Lots of moving pieces
• Logs spread out across many machines
• Development environments require a lot of software
• Distributed systems just work differently
Questions?
▪ Hortonworks Community
• https://community.hortonworks.com/answers/
index.html
▪ Kit Menke
• @kitmenke on Twitter
Resources
▪ Lambda Architecture
• http://lambda-architecture.net/
▪ Kappa Architecture
• http://kappa-architecture.com/
▪ Kappa Architecture - Our Experience by ASPgems
• http://events.linuxfoundation.org/sites/events/files/slides/
ASPgems%20-%20Kappa%20Architecture.pdf
▪ Apache Hadoop YARN – Multi-Tenancy, Capacity
Scheduler & Preemption - StampedeCon 2015
• http://www.slideshare.net/StampedeCon/apache-hadoop-
yarn-multitenancy-capacity-scheduler-preemption-
stampedecon-2015

More Related Content

What's hot

Data Engineer's Lunch #55: Get Started in Data Engineering
Data Engineer's Lunch #55: Get Started in Data EngineeringData Engineer's Lunch #55: Get Started in Data Engineering
Data Engineer's Lunch #55: Get Started in Data Engineering
Anant Corporation
 
2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_final2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_final
Adam Muise
 
Data lake – On Premise VS Cloud
Data lake – On Premise VS CloudData lake – On Premise VS Cloud
Data lake – On Premise VS Cloud
Idan Tohami
 
Big Data in Azure
Big Data in AzureBig Data in Azure
The EDW Ecosystem
The EDW EcosystemThe EDW Ecosystem
The Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data PlatformsThe Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data Platforms
Mark Rittman
 
Beyond TCO
Beyond TCOBeyond TCO
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and Deployment
Cisco Canada
 
Navigating the World of User Data Management and Data Discovery
Navigating the World of User Data Management and Data DiscoveryNavigating the World of User Data Management and Data Discovery
Navigating the World of User Data Management and Data Discovery
DataWorks Summit/Hadoop Summit
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
DataWorks Summit
 
Integrated Data Warehouse with Hadoop and Oracle Database
Integrated Data Warehouse with Hadoop and Oracle DatabaseIntegrated Data Warehouse with Hadoop and Oracle Database
Integrated Data Warehouse with Hadoop and Oracle Database
Gwen (Chen) Shapira
 
Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data...
Practical guide to architecting data lakes -  Avinash Ramineni - Phoenix Data...Practical guide to architecting data lakes -  Avinash Ramineni - Phoenix Data...
Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data...
Avinash Ramineni
 
Big Data Computing Architecture
Big Data Computing ArchitectureBig Data Computing Architecture
Big Data Computing Architecture
Gang Tao
 
Strata San Jose 2017 - Ben Sharma Presentation
Strata San Jose 2017 - Ben Sharma PresentationStrata San Jose 2017 - Ben Sharma Presentation
Strata San Jose 2017 - Ben Sharma Presentation
Zaloni
 
Big Data Introduction
Big Data IntroductionBig Data Introduction
Big Data Introduction
Durga Gadiraju
 
Hadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesHadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data Architectures
DataWorks Summit
 
CWIN17 India / Insights platform architecture v1 0 virtual - subhadeep dutta
CWIN17 India / Insights platform architecture v1 0   virtual - subhadeep duttaCWIN17 India / Insights platform architecture v1 0   virtual - subhadeep dutta
CWIN17 India / Insights platform architecture v1 0 virtual - subhadeep dutta
Capgemini
 
High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark
DataWorks Summit/Hadoop Summit
 
Data-In-Motion Unleashed
Data-In-Motion UnleashedData-In-Motion Unleashed
Data-In-Motion Unleashed
DataWorks Summit
 

What's hot (19)

Data Engineer's Lunch #55: Get Started in Data Engineering
Data Engineer's Lunch #55: Get Started in Data EngineeringData Engineer's Lunch #55: Get Started in Data Engineering
Data Engineer's Lunch #55: Get Started in Data Engineering
 
2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_final2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_final
 
Data lake – On Premise VS Cloud
Data lake – On Premise VS CloudData lake – On Premise VS Cloud
Data lake – On Premise VS Cloud
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
 
The EDW Ecosystem
The EDW EcosystemThe EDW Ecosystem
The EDW Ecosystem
 
The Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data PlatformsThe Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data Platforms
 
Beyond TCO
Beyond TCOBeyond TCO
Beyond TCO
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and Deployment
 
Navigating the World of User Data Management and Data Discovery
Navigating the World of User Data Management and Data DiscoveryNavigating the World of User Data Management and Data Discovery
Navigating the World of User Data Management and Data Discovery
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
 
Integrated Data Warehouse with Hadoop and Oracle Database
Integrated Data Warehouse with Hadoop and Oracle DatabaseIntegrated Data Warehouse with Hadoop and Oracle Database
Integrated Data Warehouse with Hadoop and Oracle Database
 
Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data...
Practical guide to architecting data lakes -  Avinash Ramineni - Phoenix Data...Practical guide to architecting data lakes -  Avinash Ramineni - Phoenix Data...
Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data...
 
Big Data Computing Architecture
Big Data Computing ArchitectureBig Data Computing Architecture
Big Data Computing Architecture
 
Strata San Jose 2017 - Ben Sharma Presentation
Strata San Jose 2017 - Ben Sharma PresentationStrata San Jose 2017 - Ben Sharma Presentation
Strata San Jose 2017 - Ben Sharma Presentation
 
Big Data Introduction
Big Data IntroductionBig Data Introduction
Big Data Introduction
 
Hadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesHadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data Architectures
 
CWIN17 India / Insights platform architecture v1 0 virtual - subhadeep dutta
CWIN17 India / Insights platform architecture v1 0   virtual - subhadeep duttaCWIN17 India / Insights platform architecture v1 0   virtual - subhadeep dutta
CWIN17 India / Insights platform architecture v1 0 virtual - subhadeep dutta
 
High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark
 
Data-In-Motion Unleashed
Data-In-Motion UnleashedData-In-Motion Unleashed
Data-In-Motion Unleashed
 

Viewers also liked

Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016
StampedeCon
 
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
StampedeCon
 
Intuit Analytics Cloud 101
Intuit Analytics Cloud 101Intuit Analytics Cloud 101
Intuit Analytics Cloud 101
DataWorks Summit/Hadoop Summit
 
Driving Innovation with Open Data
Driving Innovation with Open DataDriving Innovation with Open Data
Driving Innovation with Open Data
Amazon Web Services
 
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
StampedeCon
 
Floods of Twitter Data - StampedeCon 2016
Floods of Twitter Data - StampedeCon 2016Floods of Twitter Data - StampedeCon 2016
Floods of Twitter Data - StampedeCon 2016
StampedeCon
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
Anton Nazaruk
 
Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters
Node Architecture Implications for In-Memory Data Analytics on Scale-in ClustersNode Architecture Implications for In-Memory Data Analytics on Scale-in Clusters
Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters
Ahsan Javed Awan
 
Visualizing Big Data – The Fundamentals
Visualizing Big Data – The FundamentalsVisualizing Big Data – The Fundamentals
Visualizing Big Data – The Fundamentals
StampedeCon
 
ASPgems - kappa architecture
ASPgems - kappa architectureASPgems - kappa architecture
ASPgems - kappa architecture
Juantomás García Molina
 
Real time data ingestion and Hybrid Cloud
Real time data ingestion and Hybrid CloudReal time data ingestion and Hybrid Cloud
Real time data ingestion and Hybrid Cloud
Neeraj Sabharwal
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best Practices
Venu Anuganti
 
Streaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka APIStreaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka API
Carol McDonald
 
Insurtech presentation
Insurtech presentationInsurtech presentation
Insurtech presentation
Fabrizio Villani
 
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Gyula Fóra
 
Voldemort : Prototype to Production
Voldemort : Prototype to ProductionVoldemort : Prototype to Production
Voldemort : Prototype to Production
Vinoth Chandar
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation
Caserta
 
Interplay of Big Data and IoT - StampedeCon 2016
Interplay of Big Data and IoT - StampedeCon 2016Interplay of Big Data and IoT - StampedeCon 2016
Interplay of Big Data and IoT - StampedeCon 2016
StampedeCon
 
What’s New in Spark 2.0: Structured Streaming and Datasets - StampedeCon 2016
What’s New in Spark 2.0: Structured Streaming and Datasets - StampedeCon 2016What’s New in Spark 2.0: Structured Streaming and Datasets - StampedeCon 2016
What’s New in Spark 2.0: Structured Streaming and Datasets - StampedeCon 2016
StampedeCon
 
Kappa Architecture, IoT of the cars - LibreCon 2016
Kappa Architecture, IoT of the cars - LibreCon 2016Kappa Architecture, IoT of the cars - LibreCon 2016
Kappa Architecture, IoT of the cars - LibreCon 2016
LibreCon
 

Viewers also liked (20)

Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016
 
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
 
Intuit Analytics Cloud 101
Intuit Analytics Cloud 101Intuit Analytics Cloud 101
Intuit Analytics Cloud 101
 
Driving Innovation with Open Data
Driving Innovation with Open DataDriving Innovation with Open Data
Driving Innovation with Open Data
 
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
 
Floods of Twitter Data - StampedeCon 2016
Floods of Twitter Data - StampedeCon 2016Floods of Twitter Data - StampedeCon 2016
Floods of Twitter Data - StampedeCon 2016
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
 
Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters
Node Architecture Implications for In-Memory Data Analytics on Scale-in ClustersNode Architecture Implications for In-Memory Data Analytics on Scale-in Clusters
Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters
 
Visualizing Big Data – The Fundamentals
Visualizing Big Data – The FundamentalsVisualizing Big Data – The Fundamentals
Visualizing Big Data – The Fundamentals
 
ASPgems - kappa architecture
ASPgems - kappa architectureASPgems - kappa architecture
ASPgems - kappa architecture
 
Real time data ingestion and Hybrid Cloud
Real time data ingestion and Hybrid CloudReal time data ingestion and Hybrid Cloud
Real time data ingestion and Hybrid Cloud
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best Practices
 
Streaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka APIStreaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka API
 
Insurtech presentation
Insurtech presentationInsurtech presentation
Insurtech presentation
 
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
 
Voldemort : Prototype to Production
Voldemort : Prototype to ProductionVoldemort : Prototype to Production
Voldemort : Prototype to Production
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation
 
Interplay of Big Data and IoT - StampedeCon 2016
Interplay of Big Data and IoT - StampedeCon 2016Interplay of Big Data and IoT - StampedeCon 2016
Interplay of Big Data and IoT - StampedeCon 2016
 
What’s New in Spark 2.0: Structured Streaming and Datasets - StampedeCon 2016
What’s New in Spark 2.0: Structured Streaming and Datasets - StampedeCon 2016What’s New in Spark 2.0: Structured Streaming and Datasets - StampedeCon 2016
What’s New in Spark 2.0: Structured Streaming and Datasets - StampedeCon 2016
 
Kappa Architecture, IoT of the cars - LibreCon 2016
Kappa Architecture, IoT of the cars - LibreCon 2016Kappa Architecture, IoT of the cars - LibreCon 2016
Kappa Architecture, IoT of the cars - LibreCon 2016
 

Similar to Innovation in the Data Warehouse - StampedeCon 2016

Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
N Masahiro
 
Hadoop ppt on the basics and architecture
Hadoop ppt on the basics and architectureHadoop ppt on the basics and architecture
Hadoop ppt on the basics and architecture
saipriyacoool
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
Shweta Patnaik
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
Shweta Patnaik
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
Shweta Patnaik
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in HadoopBackup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
larsgeorge
 
Hadoop ppt1
Hadoop ppt1Hadoop ppt1
Hadoop ppt1
chariorienit
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
Roorkee College of Engineering, Roorkee
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
arslanhaneef
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
sonukumar379092
 
Bigdata workshop february 2015
Bigdata workshop  february 2015 Bigdata workshop  february 2015
Bigdata workshop february 2015
clairvoyantllc
 
Piranha vs. mammoth predator appliances that chew up big data
Piranha vs. mammoth   predator appliances that chew up big dataPiranha vs. mammoth   predator appliances that chew up big data
Piranha vs. mammoth predator appliances that chew up big data
Jack (Yaakov) Bezalel
 
20160331 sa introduction to big data pipelining berlin meetup 0.3
20160331 sa introduction to big data pipelining berlin meetup   0.320160331 sa introduction to big data pipelining berlin meetup   0.3
20160331 sa introduction to big data pipelining berlin meetup 0.3
Simon Ambridge
 
Intro to Big Data
Intro to Big DataIntro to Big Data
Intro to Big Data
Zohar Elkayam
 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
Venneladonthireddy1
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
Ramesh Pabba - seeking new projects
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
Ramesh Pabba - seeking new projects
 
Big data Hadoop
Big data  Hadoop   Big data  Hadoop
Big data Hadoop
Ayyappan Paramesh
 
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming dataUsing Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Mike Percy
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introduction
Sandeep Singh
 

Similar to Innovation in the Data Warehouse - StampedeCon 2016 (20)

Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
 
Hadoop ppt on the basics and architecture
Hadoop ppt on the basics and architectureHadoop ppt on the basics and architecture
Hadoop ppt on the basics and architecture
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in HadoopBackup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
Hadoop ppt1
Hadoop ppt1Hadoop ppt1
Hadoop ppt1
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Bigdata workshop february 2015
Bigdata workshop  february 2015 Bigdata workshop  february 2015
Bigdata workshop february 2015
 
Piranha vs. mammoth predator appliances that chew up big data
Piranha vs. mammoth   predator appliances that chew up big dataPiranha vs. mammoth   predator appliances that chew up big data
Piranha vs. mammoth predator appliances that chew up big data
 
20160331 sa introduction to big data pipelining berlin meetup 0.3
20160331 sa introduction to big data pipelining berlin meetup   0.320160331 sa introduction to big data pipelining berlin meetup   0.3
20160331 sa introduction to big data pipelining berlin meetup 0.3
 
Intro to Big Data
Intro to Big DataIntro to Big Data
Intro to Big Data
 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
 
Big data Hadoop
Big data  Hadoop   Big data  Hadoop
Big data Hadoop
 
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming dataUsing Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introduction
 

More from StampedeCon

Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
StampedeCon
 
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
StampedeCon
 
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
StampedeCon
 
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
StampedeCon
 
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
StampedeCon
 
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
StampedeCon
 
Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017
StampedeCon
 
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
StampedeCon
 
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
StampedeCon
 
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
StampedeCon
 
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
StampedeCon
 
A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017
StampedeCon
 
Graph in Customer 360 - StampedeCon Big Data Conference 2017
Graph in Customer 360 - StampedeCon Big Data Conference 2017Graph in Customer 360 - StampedeCon Big Data Conference 2017
Graph in Customer 360 - StampedeCon Big Data Conference 2017
StampedeCon
 
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
StampedeCon
 
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
StampedeCon
 
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
StampedeCon
 
Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016
StampedeCon
 
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
StampedeCon
 
Resource Management in Impala - StampedeCon 2016
Resource Management in Impala - StampedeCon 2016Resource Management in Impala - StampedeCon 2016
Resource Management in Impala - StampedeCon 2016
StampedeCon
 
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...
StampedeCon
 

More from StampedeCon (20)

Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
 
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
 
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
 
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
 
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
 
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
 
Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017
 
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
 
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
 
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
 
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
 
A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017
 
Graph in Customer 360 - StampedeCon Big Data Conference 2017
Graph in Customer 360 - StampedeCon Big Data Conference 2017Graph in Customer 360 - StampedeCon Big Data Conference 2017
Graph in Customer 360 - StampedeCon Big Data Conference 2017
 
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
 
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
 
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
 
Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016
 
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
 
Resource Management in Impala - StampedeCon 2016
Resource Management in Impala - StampedeCon 2016Resource Management in Impala - StampedeCon 2016
Resource Management in Impala - StampedeCon 2016
 
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...
 

Recently uploaded

Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
Hiike
 

Recently uploaded (20)

Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
 

Innovation in the Data Warehouse - StampedeCon 2016

  • 1. Innovation in the Data Warehouse Kit Menke, Software Architect StampedeCon 2016 July 27, 2016
  • 2. Agenda ▪ Use Case ▪ Architectures ▪ Decision Points
  • 3. Enterprise Holdings, Inc. ▪ Our Business • 9 thousand locations • 80 countries • 93 thousand employees • 1.7 million vehicles ▪ Data Warehouse • Near capacity: Used about 75+ of 80 Terabytes • Streaming and batch data feeds from over 50 internal systems & external sources • 100+ databases and 22+ thousand tables • Around 1 billion queries executed per month • Over 45,000 reporting users with 5+ million report executions every month. • Statistical Modeling & Advanced Analytics - 40+ Projects Implemented for Predictive & Diagnostic Analytics
  • 6. Challenges – Current Platform ▪ System Capacity Constraints • Overall Current System Utilization is High • Space & CPU Constraints • Most of these challenges can be overcome by adding more Teradata capacity or doing augmentation ▪ Use Cases not good fit for Teradata EDW • Unstructured data • Source structures changing frequently • Data for exploration, discovery, & analytics • Staging, transient, & history data • These challenges can be overcome by augmentation ▪ Bottom-line: Improved agility & greater value
  • 7. Augmentation Recommendation: Hadoop ▪ Leverage Hadoop to complement Teradata EDW • Hybrid Approach ▪ The Hortonworks distribution of Hadoop • Compatibility/integration with Teradata EDW to achieve high degree of interoperability ▪ Intent is not to have a centralized Hadoop service • EDW Augmentation Only 7
  • 9. Architectures ▪ Data warehouse augmentation contains streaming and batch use cases ▪ Three Big Data architectures to explore: 1. Batch 2. Lambda 3. Kappa
  • 10. Batch
  • 11. Batch ▪ Land data into Hadoop first ▪ ETL in Hadoop to build reporting tables and publish to Teradata ▪ Archive old data from Teradata DB ▪ Data available for analysis in Hive ▪ Great for semi-structured data files ▪ But… too slow for streaming data
  • 13. Lambda ▪ Attempts to combine batch and streaming to get benefits from both ▪ Batch layer is comprehensive and accurate ▪ Streaming layer is fast but might only be able to keep recent data ▪ Potentially have to maintain two codebases
  • 14. Kappa
  • 15. Kappa ▪ Everything is a stream (no batch!) ▪ Depends largely on your log data store usually Kafka ▪ All raw data is stored in Kafka ▪ Much simpler architecture than lambda • New version? Re-deploy app and start reprocessing from the start and generate new output table • Once complete point app to new output table
  • 16. Choosing an Architecture ▪ Batch – process data in batches • All data processed in batches to create an output ▪ Lambda – split streaming data into batch and real-time • Stream processing for the data you need fast and the rest is batch processed ▪ Kappa – everything is a stream • All data is processed as a stream even when it needs to be reprocessed
  • 17. Implementing an Architecture ▪ Requirements for the use case drives architecture ▪ Walk through decision points 1. Cloud or on premises 2. Physical or virtual machines 3. Cluster workload ▪ Plus others!
  • 18. Cloud vs on premises ▪ Scalability • Much easier to scale a Cloud solution • Physical hardware requires an infrastructure team to manage ▪ Data source location (data gravity) / integration points • Cluster should be as close as possible to your data source • Cloud is good option for internet data sources ▪ Cloud offerings • Hadoop: Azure HDInsight, Amazon EMR, Google Cloud • Integration with other PaaS services ▪ Network • Bandwidth to/from cloud implementation
  • 19. Physical vs virtual ▪ Performance • Physical hardware will perform better, Hadoop is designed with physical hardware in mind ▪ Maintenance • No hardware to maintain for virtual servers ▪ Time to market • Virtual machines much faster to provision • For physical hardware if infrastructure team is a roadblock then appliance is good option instead of commodity ▪ Development and test environments make more sense to virtualize
  • 20. Workload ▪ Streaming • Running 24/7 • Need dedicated resources ▪ Batch • Scheduled • Periods of high utilization (scalability) ▪ Multi-Tenancy • Blended workloads • YARN (queues, node labels) • Think about Isolating nodes for real-time
  • 21. Other considerations ▪ Disaster recovery • Data is locally redundant • Backups not usually required unless you need geo-redundancy ▪ Security - Many different things to secure! • Kerberos for user, service, and host authentication • Authorization: Apache Ranger (Hortonworks) or Apache Sentry (Cloudera) or MapR Control System • Network isolation for Hadoop services • Data at rest (HFDS encryption) ▪ Hadoop Distribution - Race to include the most Apache projects • Top 3: Hortonworks, Cloudera, MapR • Big companies with Hadoop offering: – Teradata Hadoop aka TDH (Hortonworks, Cloudera, MapR) – Oracle Big Data Applicance (Cloudera)
  • 22. Spectrum of Options ▪ Cloud PaaS • No hardware or software to manage • Amazon S3, Azure Data Lake ▪ Cloud • Weird space between IaaS and PaaS • Amazon EMR • HDInsight is more PaaS ▪ Cloud IaaS • All virtual, no hardware to manage • You manage all software ▪ Third party hosted • Rackspace • Software managed by you ▪ Appliance • Infrastructure handled for you • Dell, HP, Cisco, Teradata, Oracle • Software (varies depending on vendor) ▪ Commodity • DIY
  • 23. Lessons Learned ▪ Workload isolation is hard • Multi-tenancy is possible • Takes work to make sure batch jobs don’t impact the real-time streaming processes ▪ Things we like: Hive, Hbase ▪ Things we don’t like: SOLR, debugging ▪ Debugging / development is hard • Lots of moving pieces • Logs spread out across many machines • Development environments require a lot of software • Distributed systems just work differently
  • 24. Questions? ▪ Hortonworks Community • https://community.hortonworks.com/answers/ index.html ▪ Kit Menke • @kitmenke on Twitter
  • 25. Resources ▪ Lambda Architecture • http://lambda-architecture.net/ ▪ Kappa Architecture • http://kappa-architecture.com/ ▪ Kappa Architecture - Our Experience by ASPgems • http://events.linuxfoundation.org/sites/events/files/slides/ ASPgems%20-%20Kappa%20Architecture.pdf ▪ Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - StampedeCon 2015 • http://www.slideshare.net/StampedeCon/apache-hadoop- yarn-multitenancy-capacity-scheduler-preemption- stampedecon-2015

Editor's Notes

  1. Explain our use case Expanding reporting windows and shrinking ETL windows