Realtime Analytics in Hadoop 
Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved 
Rommel Garcia – Solution Engineer 
October 10, 2014
Hadoop 
Hadoop provides 
• Terabytes to Petabytes of storage on commodity hardware (HDFS) 
• Massively parallel computation on enormous amounts of data (YARN) 
Hadoop is essentially a supercomputer for the masses! 
HDFS: Scalable, Reliable, Secure Storage Platform 
The Storage Platform for the Modern Data Architecture 
YARN: Data Operating System 
HDFS 
(Hadoop Distributed File System) 
Reliable 
• Highly available & fault tolerant 
• Protects against data loss & corruption 
Cost Effective 
• Horizontally scales on commodity hardware 
Secure 
• Strong access controls, integrated with authentication mechanisms 
• Granular data access controls to datasets across users and groups 
Standards-Based Data Interfaces 
• NFS, REST and RPC, each connecting a source/destination 
Ingest and store any data in any format 
Flexible read access enables a variety of workloads
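As an illustration of the REST interface listed on this slide, the sketch below builds WebHDFS request URLs. The `/webhdfs/v1` path prefix and the `op` query parameter are part of the WebHDFS REST API; the host name, the file paths and the `webhdfs_url` helper are assumptions made for this example, and port 50070 is only the usual Hadoop 2 NameNode HTTP default, so a real cluster may differ.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, path: str, op: str, port: int = 50070, **params) -> str:
    """Build a WebHDFS URL for an absolute HDFS path (hypothetical helper)."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # The operation always goes first, followed by any extra query parameters.
    query = urlencode({"op": op, **params})
    # quote() keeps "/" intact and percent-encodes characters such as spaces.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical NameNode host and paths, for illustration only.
list_url = webhdfs_url("namenode.example.com", "/data/truck events", "LISTSTATUS")
open_url = webhdfs_url("namenode.example.com", "/tmp/a.txt", "OPEN", offset=0)
```

Any HTTP client can then issue a GET against these URLs, which is what lets non-Java tools read from and write to HDFS.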
Hadoop 1 
Single Use Data Platform 
Hive Pig 
Batch 
HADOOP 1 
MapReduce 
Redundant, Reliable Storage 
(HDFS) 
Java
Hadoop 2 & YARN based Architecture 
Timeline: 2006, 2009, October 23, 2013 
MR-279: YARN 
Hadoop w/ MapReduce 
• Largely batch processing: MapReduce on HDFS (Hadoop Distributed File System), nodes 1 to N 
• Silo'd clusters 
• Largely batch system 
• Difficult to integrate 
Hadoop 2 & YARN 
• Batch, Interactive, Real-Time: YARN: Data Operating System on HDFS (Hadoop Distributed File System), nodes 1 to N 
• Enabled the Modern Data Architecture
Hadoop 
Multi Use Data Platform 
Batch, Interactive, Realtime, Online, Streaming, … 
Management & Shared Services 
HADOOP 2 
Efficient Cluster Resource 
(YARN) 
Redundant, Reliable Storage 
(HDFS) 
Standard Query 
Processing 
Hive 
Batch 
MapReduce 
Online Data 
Processing 
Interactive 
Tez 
Real Time Stream 
Processing Others
Why Are Enterprises Using Hadoop? 
Traditional systems under pressure 
…and difficult to manage new data 
• Silos of Data 
• Costly to Scale 
• Constrained Schemas 
SOURCES 
• Existing Sources (CRM, ERP,…) 
• New Data Types: Clickstream; Geolocation; Sentiment, Web Data; Sensor, Machine Data (IoT); Unstructured docs, emails; Server logs 
DATA SYSTEM 
• RDBMS, EDW, MPP 
APPLICATIONS 
• Business Analytics, Custom Applications, Packaged Applications
Hadoop 2 and YARN enable the Modern Data Architecture 
Common data set, multiple applications 
• Optionally land all data in a single cluster 
• Batch, interactive & real-time use cases 
• Support multi-tenant access, processing & segmentation of data 
YARN: Architectural center of Hadoop 
• Consistent security, governance & operations 
• Ecosystem applications run natively in Hadoop 
SOURCES 
• Existing Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured 
DATA SYSTEM 
• RDBMS, EDW, MPP 
• YARN: Data Operating System (Batch, Interactive, Real-Time) on HDFS (Hadoop Distributed File System), nodes 1 to N 
APPLICATIONS 
• Business Analytics, Custom Applications, Packaged Applications
Real-Time Use Cases 
Realtime Analytics in… 
Financial Services 
• Fraud Detection/Prevention 
Retail 
• Brand Sentiment Analysis 
• Localized, Personalized Promotions 
Telecom 
• Cell tower diagnostics 
• Bandwidth Allocation 
Manufacturing 
• Proactive Maintenance 
Healthcare 
• Monitor patient vitals 
• Patient care and safety 
• Reduce re-admittance rates 
Utilities, Oil & Gas 
• Smart meter stream analysis 
• Proactive equipment repair 
• Power and consumption matching 
Public Sector 
• Network intrusion detection and prevention 
• Disease outbreak detection 
Transportation 
• Unsafe driving detection and monitoring
Truck Demo: Real-Time Analytics 
Problem: 
• Today, “safe driving” can only be measured after the fact, through accident occurrences. 
• There is no real-time accident prevention mechanism in place. 
Solution: 
• Use Hadoop to analyze driving violations in real time 
• Provide a UI to view real-time violation alerts 
• Provide a dashboard to review violation reports 
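The solution above can be sketched as a small stream-processing step that counts violations per driver and emits an alert once a threshold is crossed. This is a minimal illustration only, not the demo's actual code: the event fields, the set of violation types and the threshold of 3 are all assumptions, and a plain Python iterable stands in for the real event stream.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator

# Hypothetical violation types and threshold; the slides do not specify them.
VIOLATION_TYPES = {"overspeed", "lane_departure", "unsafe_following_distance"}
ALERT_THRESHOLD = 3


@dataclass(frozen=True)
class TruckEvent:
    driver_id: str
    truck_id: str
    event_type: str
    timestamp: int  # seconds since epoch


@dataclass(frozen=True)
class Alert:
    driver_id: str
    violation_count: int
    last_event: TruckEvent


def detect_violations(events: Iterable[TruckEvent],
                      threshold: int = ALERT_THRESHOLD) -> Iterator[Alert]:
    """Yield an alert each time a driver's running violation count is at or above threshold."""
    counts: Dict[str, int] = defaultdict(int)
    for event in events:
        if event.event_type not in VIOLATION_TYPES:
            continue  # normal events do not affect the count
        counts[event.driver_id] += 1
        if counts[event.driver_id] >= threshold:
            yield Alert(event.driver_id, counts[event.driver_id], event)


sample_events = [
    TruckEvent("d1", "t1", "normal", 1),
    TruckEvent("d1", "t1", "overspeed", 2),
    TruckEvent("d1", "t1", "overspeed", 3),
    TruckEvent("d2", "t2", "lane_departure", 4),
    TruckEvent("d1", "t1", "lane_departure", 5),
]
alerts = list(detect_violations(sample_events))
```

Because the function is a generator, alerts are produced as events arrive rather than after a batch completes, which is the property the real-time UI depends on.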
Demo Time!
Truck Demo Real-Time Hadoop Architecture 
Components: 
• Truck Events 
• Kafka (High Speed Ingestion, Message Queue) 
• Storm (Distributed Processing) 
• HDFS/Hive, HBase 
• ActiveMQ 
• Solr (Reporting Dashboard) 
• Real-Time Monitoring App 
Data flows and actions labelled in the diagram: Truck Event Data, Alerts, Violations, Show Driving Report
Q&A
Hadoop 2.0 
Rommel Garcia – Solution Engineer 
October 10, 2014
Hadoop 2 Becoming A Critical Platform 
Hadoop 2 delivers a comprehensive data management platform 
Hadoop 2 Platform 
YARN is the architectural center of Hadoop 2 
• Enables batch, interactive and real-time workloads 
• Single SQL engine for both batch and interactive 
• Enable existing ISV apps to plug directly into Hadoop via YARN 
Provides comprehensive enterprise capabilities 
• Governance 
• Security 
• Operations 
The widest range of deployment options 
• Linux & Windows 
• On premise & cloud 
GOVERNANCE & INTEGRATION 
• Data Workflow, Lifecycle & Governance: Falcon, Sqoop, Flume, NFS, WebHDFS 
BATCH, INTERACTIVE & REAL-TIME DATA ACCESS 
• Script: Pig 
• SQL: Hive, HCatalog 
• NoSQL: HBase, Accumulo 
• Stream: Storm 
• Search: Solr 
• In-Memory: Spark 
• Others: ISV Engines 
• Tez 
DATA MANAGEMENT 
• YARN: Data Operating System 
• HDFS (Hadoop Distributed File System), nodes 1 to N 
SECURITY 
• Authentication, Authorization, Accounting, Data Protection 
• Storage: HDFS 
• Resources: YARN 
• Access: Hive, … 
• Pipeline: Falcon 
• Cluster: Knox 
OPERATIONS 
• Provision, Manage & Monitor: Ambari, Zookeeper 
• Scheduling: Oozie 
Deployment Choice 
• Linux, Windows, On-Premise, Cloud
YARN – Roadmap 
YARN Development Framework 
Layers: API, Engine, System 
YARN: Data Operating System on HDFS (Hadoop Distributed File System), nodes 1 to N 
• Batch: MapReduce 
• Real-Time: Slider 
• Direct: Java, .NET 
• Scripting: Pig 
• SQL: Hive 
• Cascading: Java, Scala 
• NoSQL: HBase, Accumulo 
• Stream: Storm 
• Others: Spark 
• Other ISV 
• Applications 
• Tez 
Several entries in the diagram are marked "New".
YARN General Store – The Future 
• A Data Lake that has a General Store to continually serve you… 
– App Store – YARN Ready Applications 
– Data Store – Where do I get the interesting data (Weather, Geo, etc.)? 
– View Store – How do I get UIs to the cluster? 
– Processing Store – Falcon, Pig, etc. for “standard” data sets or common “processing patterns”
Argus – Security 
Argus: Security needs are changing 
5 areas of security focus 
• Administration: Central management & consistent security 
• Authentication: Authenticate users and systems 
• Authorization: Provision access to data 
• Audit: Maintain a record of data access 
• Data Protection: Protect data at rest and in motion 
Security needs are changing 
• YARN unlocks the data lake 
• Multi-tenant: Multiple applications for data access 
• Changing and complex compliance environment 
• ETL of non-sensitive data can yield sensitive data 
Fall 2013: Largely silo’d deployments with single workload clusters 
Summer 2014: 65% of clusters host multiple workloads
Security in Hadoop with HDP + Argus (XA Secure) 
Centralized Security Administration 
Authentication: Who am I/prove it? 
• Hadoop 2: Kerberos in native Apache Hadoop; HTTP/REST API secured with Apache Knox Gateway 
• Argus: As-is, works with current authentication methods 
Authorization: Restrict access to explicit data 
• Hadoop 2: HDFS Permissions, HDFS ACL, Hive ATZ-NG 
• Argus: HDFS, Hive and HBase; fine grain access control; RBAC 
Audit: Understand who did what 
• Hadoop 2: Audit logs in HDFS & MR 
• Argus: Centralized audit reporting; policy and access history 
Data Protection: Encrypt data at rest & in motion 
• Hadoop 2: Wire encryption in Hadoop; Open Source Initiatives; Partner Solutions 
• Argus: Future Integration
Hive – SQL In Hadoop & Roadmap 
Hive: The De-Facto SQL Interface for Hadoop 
Data Abstractions in Hive 
Partitions, buckets and skews facilitate faster, more direct data access. Cube, windowing and aggregation functions are supported as well. 
Hierarchy: a Database contains Tables; a Table contains Partitions; a Partition contains Buckets 
Optional Per Table: Partitions, Buckets, Unskewed Keys / Skewed Keys
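To make the "more direct data access" claim concrete, the sketch below models a table whose rows are grouped first by a partition column and then hashed into a fixed number of buckets, so that a query naming the partition and the bucketed key reads one bucket instead of the whole table. It is a toy model under stated assumptions: the `PartitionedTable` class, the column names and the use of CRC32 are illustrative stand-ins, not Hive's storage layout or its actual bucketing hash.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]


def bucket_for(key: str, num_buckets: int) -> int:
    """Map a key to a bucket number; CRC32 is a stand-in, not Hive's own hash."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


class PartitionedTable:
    """Toy table: partition value -> bucket number -> rows."""

    def __init__(self, partition_col: str, bucket_col: str, num_buckets: int) -> None:
        self.partition_col = partition_col
        self.bucket_col = bucket_col
        self.num_buckets = num_buckets
        self._data: Dict[str, Dict[int, List[Row]]] = defaultdict(lambda: defaultdict(list))

    def insert(self, row: Row) -> None:
        bucket = bucket_for(row[self.bucket_col], self.num_buckets)
        self._data[row[self.partition_col]][bucket].append(row)

    def scan(self, partition: Optional[str] = None,
             bucket_key: Optional[str] = None) -> Tuple[List[Row], int]:
        """Return matching rows and the number of buckets that were read."""
        partitions = [partition] if partition is not None else list(self._data)
        rows: List[Row] = []
        buckets_read = 0
        for p in partitions:
            buckets = self._data.get(p, {})
            if bucket_key is not None:
                # Only the bucket the key hashes to can contain it.
                wanted = [bucket_for(bucket_key, self.num_buckets)]
            else:
                wanted = list(buckets)
            for b in wanted:
                buckets_read += 1
                rows.extend(r for r in buckets.get(b, [])
                            if bucket_key is None or r[self.bucket_col] == bucket_key)
        return rows, buckets_read


table = PartitionedTable("event_date", "driver_id", num_buckets=4)
for day in ("2014-10-09", "2014-10-10"):
    for driver in ("d1", "d2", "d3", "d4", "d5"):
        table.insert({"event_date": day, "driver_id": driver, "event": "overspeed"})
```

Calling `table.scan(partition="2014-10-10", bucket_key="d3")` touches a single bucket, whereas `table.scan()` has to read every bucket of every partition.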
Stinger.Next – Roadmap 
Stinger.Next – Release Cycle 
Hive Demo Using DBVisualizer or Excel? 
Falcon – Data Governance 
Data Pipeline Tracing 
Data pipeline dependencies 
• View dependencies between clusters, datasets and processes 
• Diagram: Customer feed, Purchase feed, Product feed, Store feed 
Data pipeline tagging 
• Add arbitrary tags to feeds & processes 
• Diagram: Credit feed, with the tags Sensitive and encrypted 
Data pipeline audits 
• Know who modified a dataset, when, and into what 
Data pipeline lineage 
• Analyze how a dataset reached a particular state 
• Diagram: File-1, File-2, File-3
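The audit and lineage ideas on this slide can be sketched with a small in-memory graph: each recorded step notes which inputs produced a dataset, who ran it and when, and lineage is recovered by walking back through the inputs. The `LineageGraph` class, the process names and the user name are hypothetical and exist only for this example; this is not Falcon's API or data model.

```python
from typing import Dict, List


class LineageGraph:
    """Toy lineage store: dataset -> list of direct input datasets."""

    def __init__(self) -> None:
        self._inputs: Dict[str, List[str]] = {}
        self.audit_log: List[dict] = []

    def record(self, output: str, inputs: List[str], process: str,
               user: str, when: str) -> None:
        """Register that `process`, run by `user` at `when`, derived `output` from `inputs`."""
        known = self._inputs.setdefault(output, [])
        for name in inputs:
            if name not in known:
                known.append(name)
        self.audit_log.append({"dataset": output, "inputs": list(inputs),
                               "process": process, "user": user, "when": when})

    def lineage(self, dataset: str) -> List[str]:
        """Return all ancestors of `dataset`, nearest first (breadth-first walk)."""
        seen = set()
        order: List[str] = []
        queue = [dataset]
        while queue:
            current = queue.pop(0)
            for parent in self._inputs.get(current, []):
                if parent not in seen:
                    seen.add(parent)
                    order.append(parent)
                    queue.append(parent)
        return order


graph = LineageGraph()
# File names follow the slide; processes, user and timestamps are made up.
graph.record("File-2", ["File-1"], process="cleanse", user="etl_user", when="2014-10-09T01:00")
graph.record("File-3", ["File-2"], process="conform", user="etl_user", when="2014-10-09T02:00")
```

Here `graph.lineage("File-3")` answers "how did this dataset reach its state", while `graph.audit_log` answers "who modified it, and when".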
Example: Multi-Cluster Replication 
Diagram: a Primary Hadoop Cluster and a Failover Hadoop Cluster linked by Replication; datasets shown are Raw Data, Staged Data, Cleansed Data, Conformed Data and Presented Data; BI and Analytic Applications consume the data 
• Falcon manages workflow and replication 
• Enables business continuity without requiring full data reprocessing 
• Failover clusters can be smaller than primary clusters 
…and many more
Example: Retention 
Retention Policy 
• Staged Data: Retain 5 Years 
• Presented Data: Retain Last Copy Only 
• Cleansed Data: Retain 3 Years 
• Conformed Data: Retain 3 Years 
• Sophisticated retention policies expressed in one place 
• Simplify data retention for audit, compliance, or for data re-processing
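A minimal sketch of what "retention policies expressed in one place" could look like: one table maps each dataset to its rule, and one function applies it. The pairing of datasets to periods (Staged 5 years, Cleansed and Conformed 3 years, Presented last copy only) is read off the slide's labels and should be treated as an assumption, as should the function name and the simplification of a year to 365 days. This is not Falcon's policy syntax.

```python
from datetime import date, timedelta
from typing import Dict, List, Union

KEEP_LAST_ONLY = "last-copy-only"

# Single place where every dataset's retention rule lives (years, or keep-last-only).
POLICIES: Dict[str, Union[int, str]] = {
    "staged": 5,
    "cleansed": 3,
    "conformed": 3,
    "presented": KEEP_LAST_ONLY,
}


def instances_to_keep(dataset: str, instances: List[date], today: date) -> List[date]:
    """Return the dataset instances that survive the dataset's retention policy."""
    policy = POLICIES[dataset]
    if policy == KEEP_LAST_ONLY:
        return [max(instances)] if instances else []
    # A year is approximated as 365 days for simplicity.
    cutoff = today - timedelta(days=365 * policy)
    return sorted(d for d in instances if d >= cutoff)


today = date(2014, 10, 10)
history = [date(2009, 1, 1), date(2010, 6, 1), date(2012, 6, 1), date(2014, 6, 1)]
kept = {name: instances_to_keep(name, history, today) for name in POLICIES}
```

Changing a retention period then means editing one entry in `POLICIES` rather than touching each pipeline separately.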
Ambari – Hadoop Cluster Monitoring 
Ambari Dashboard 
Ambari 2H 2014 
1.7.0 (September) 
Features 
• Config versioning + history 
• Config <final> Properties 
• Flume Support 
• Ubuntu Support 
• ResourceManager HA 
• HDFS Rebalance 
• Ambari Views Framework 
• Slider Support 
Tech Preview 
• Windows Support 
• Ambari Shell 
1.8.0 (October) 
Features 
• ServiceX on YARN via Slider 
• Log Access + Search 
• Rack Awareness 
• Simplified Kerberos Setup 
• NameNode SafeMode 
• Ambari Shell GA 
2.0.0 (December) 
Features 
• Automated Rolling Upgrades 
• Oozie HA 
• Ambari Alerts 
• Ambari Metrics 
• Windows Support GA
Hadoop 2 Deployment Options 
Efficient Data Lakes can Span to the Cloud 
On-Premises 
• HDP on Windows, HDP on Linux: Full control of HW and software configs 
• Analytics Platform System: Turnkey Hadoop and relational warehouse appliance 
Cloud 
• HDP on Windows, HDP on Linux: Your deployment of Hadoop hosted as a VM in Azure 
• HDInsight: Managed Hadoop Service, built on Azure storage 
Enjoy cross-platform interoperability based on 100% open source HDP
Q&A
Thank You! 
Rommel Garcia – Solution Engineer 
Twitter: @rommelgarcia 
LinkedIn: /rommelgarcia 

More Related Content

What's hot

Stinger Initiative - Deep Dive
Stinger Initiative - Deep DiveStinger Initiative - Deep Dive
Stinger Initiative - Deep DiveHortonworks
 
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...DataWorks Summit
 
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopDiscover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopHortonworks
 
End-to-End Security and Auditing in a Big Data as a Service Deployment
End-to-End Security and Auditing in a Big Data as a Service DeploymentEnd-to-End Security and Auditing in a Big Data as a Service Deployment
End-to-End Security and Auditing in a Big Data as a Service DeploymentDataWorks Summit/Hadoop Summit
 
Realtime Analytics in Hadoop
Realtime Analytics in HadoopRealtime Analytics in Hadoop
Realtime Analytics in HadoopRommel Garcia
 
One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)DataWorks Summit
 
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...Cloudera, Inc.
 
A New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseA New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseDataWorks Summit/Hadoop Summit
 
2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_final2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_finalAdam Muise
 
Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo ScaleManaging Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo ScaleDataWorks Summit/Hadoop Summit
 
Deploying Docker applications on YARN via Slider
Deploying Docker applications on YARN via SliderDeploying Docker applications on YARN via Slider
Deploying Docker applications on YARN via SliderHortonworks
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsDataWorks Summit/Hadoop Summit
 
Hadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureHadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureDataWorks Summit
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...DataWorks Summit
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnhdhappy001
 
Data warehousing with Hadoop
Data warehousing with HadoopData warehousing with Hadoop
Data warehousing with Hadoophadooparchbook
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...StampedeCon
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive DataWorks Summit
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesDataWorks Summit
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureDataWorks Summit
 

What's hot (20)

Stinger Initiative - Deep Dive
Stinger Initiative - Deep DiveStinger Initiative - Deep Dive
Stinger Initiative - Deep Dive
 
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
 
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopDiscover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
 
End-to-End Security and Auditing in a Big Data as a Service Deployment
End-to-End Security and Auditing in a Big Data as a Service DeploymentEnd-to-End Security and Auditing in a Big Data as a Service Deployment
End-to-End Security and Auditing in a Big Data as a Service Deployment
 
Realtime Analytics in Hadoop
Realtime Analytics in HadoopRealtime Analytics in Hadoop
Realtime Analytics in Hadoop
 
One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)
 
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
 
A New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseA New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouse
 
2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_final2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_final
 
Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo ScaleManaging Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
 
Deploying Docker applications on YARN via Slider
Deploying Docker applications on YARN via SliderDeploying Docker applications on YARN via Slider
Deploying Docker applications on YARN via Slider
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
 
Hadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureHadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and Future
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
 
Data warehousing with Hadoop
Data warehousing with HadoopData warehousing with Hadoop
Data warehousing with Hadoop
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
 

Similar to Realtime analytics + hadoop 2.0

Discover hdp 2.2 hdfs - final
Discover hdp 2.2   hdfs - finalDiscover hdp 2.2   hdfs - final
Discover hdp 2.2 hdfs - finalHortonworks
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Hortonworks
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championAmeet Paranjape
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemShivaji Dutta
 
How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopPOSSCON
 
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceDiscover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceHortonworks
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalHortonworks
 
YARN - Strata 2014
YARN - Strata 2014YARN - Strata 2014
YARN - Strata 2014Hortonworks
 
Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...
Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...
Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...Hortonworks
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopHortonworks
 
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextDiscover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextHortonworks
 
Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Hortonworks
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataHortonworks
 
Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks
 
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksThe Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksData Con LA
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSHortonworks
 
Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]Hortonworks
 
Discover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchDiscover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchHortonworks
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataWANdisco Plc
 

Similar to Realtime analytics + hadoop 2.0 (20)

Discover hdp 2.2 hdfs - final
Discover hdp 2.2   hdfs - finalDiscover hdp 2.2   hdfs - final
Discover hdp 2.2 hdfs - final
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a champion
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
 
How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in Hadoop
 
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceDiscover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.final
 
YARN - Strata 2014
YARN - Strata 2014YARN - Strata 2014
YARN - Strata 2014
 
Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...
Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...
Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
 
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextDiscover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
 
Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 
Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2
 
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksThe Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
 
Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]
 
Discover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchDiscover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop Search
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 


Realtime analytics + hadoop 2.0

  • 1. Realtime Analytics in Hadoop Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Rommel Garcia – Solution Engineer October 10, 2014
  • 2. Hadoop
  • 3. Hadoop provides • Terabytes to Petabytes of storage on commodity hardware (HDFS) • Massive parallel computation on enormous amounts of data (YARN) Hadoop is essentially a supercomputer for the masses!
  • 4. HDFS: Scalable, Reliable, Secure Storage Platform The Storage Platform for the Modern Data Architecture YARN: Data Operating System HDFS (Hadoop Distributed File System) Reliable Highly Available & Fault Tolerant Protects against data loss & corruption Cost Effective Horizontally scales on Commodity Hardware Secure Strong access controls, integrated with authentication mechanisms Granular data access controls to datasets across users and groups NFS Source/Destination REST RPC Source/Destination Source/Destination Standards Based Data Interfaces Ingest and store any data in any format Flexible read access enables a variety of workloads
  • 5. Hadoop 1 Single Use Data Platform Hive Pig Batch HADOOP 1 MapReduce Redundant, Reliable Storage (HDFS) Java
  • 6. 2006 2009 MR-279: YARN Hadoop w/ MapReduce MapReduce Largely Batch Processing HDFS (Hadoop Distributed File System) Hadoop 2 & YARN based Architecture YARN: Data Operating System HDFS (Hadoop Distributed File System) Silo’d clusters Largely batch system Difficult to integrate Hadoop 2 & YARN Batch Interactive Real-Time Enabled the Modern Data Architecture October 23, 2013
  • 7. Hadoop Multi Use Data Platform Batch, Interactive, Realtime, Online, Streaming, … Management & Shared Services HADOOP 2 Efficient Cluster Resource (YARN) Redundant, Reliable Storage (HDFS) Standard Query Processing Hive Batch MapReduce Online Data Processing Interactive Tez Real Time Stream Processing Others
  • 8. Why Are Enterprises Using Hadoop?
  • 9. Traditional systems under pressure DATA SYSTEM APPLICATIONS Business Analytics Custom Applications RDBMS EDW MPP Packaged Applications • Silos of Data • Costly to Scale • Constrained Schemas Clickstream Geolocation Sentiment, Web Data Sensor, Machine Data (IoT) Unstructured docs, emails Server logs SOURCES Existing Sources (CRM, ERP,…) New Data Types …and difficult to manage new data
  • 10. Hadoop 2 and YARN enable the Modern Data Architecture Batch Interactive Real-Time HDFS (Hadoop Distributed File System) Common data set, multiple applications • Optionally land all data in a single cluster • Batch, interactive & real-time use cases • Support multi-tenant access, processing & segmentation of data YARN: Architectural center of Hadoop • Consistent security, governance & operations • Ecosystem applications run natively in Hadoop SOURCES EXISTING Systems Clickstream Web & Social Geolocation Sensor & Machine Server Logs Unstructured DATA SYSTEM APPLICATIONS Business Analytics Custom Applications Packaged Applications RDBMS EDW MPP YARN: Data Operating System
  • 11. Real-Time Use Cases
  • 12. Realtime Analytics in… • Fraud Detection/Prevention • Cell tower diagnostics • Proactive Maintenance • Bandwidth Allocation • Brand Sentiment Analysis • Localized, Personalized Promotions Financial Services Retail Telecom Manufacturing Healthcare Utilities, Oil & Gas Public Sector • Monitor patient vitals • Patient care and safety • Reduce re-admittance rates • Smart meter stream analysis • Proactive equipment repair • Power and consumption matching • Network intrusion detection and prevention • Disease outbreak detection Transportation • Unsafe driving detection and monitoring
  • 13. Truck Demo: Real-Time Analytics Problem: • The only way to measure “safe driving” is through accident occurrences. • There’s no realtime accident prevention mechanism in place. Solution: • Use Hadoop to analyze driving violations in real time • Provide a UI to view real-time violation alerts • Provide a dashboard to review violation reports
  • 14. Demo Time!
  • 15. Truck Demo Real-Time Hadoop Architecture Truck Events High Speed Ingestion Message Queue Distributed Processing Kafka Storm Show Driving Report HDFS/Hive HBase (ActiveMQ) Solr (Reporting Dashboard) Real-Time Monitoring App Truck Event Data Alerts Violations Show
  • 16. Q&A
  • 17. Hadoop 2.0 Rommel Garcia – Solution Engineer October 10, 2014
  • 18. Hadoop 2 Becoming A Critical Platform
  • 19. Hadoop 2 delivers a comprehensive data management platform Hadoop 2 Platform Provision, Manage & Monitor Ambari ZooKeeper Scheduling Oozie Data Workflow, Lifecycle & Governance Falcon Sqoop Flume NFS WebHDFS In-Memory Spark YARN: Data Operating System DATA MANAGEMENT SECURITY BATCH, INTERACTIVE & REAL-TIME DATA ACCESS GOVERNANCE & INTEGRATION Authentication Authorization Accounting Data Protection Storage: HDFS Resources: YARN Access: Hive, … Pipeline: Falcon Cluster: Knox OPERATIONS Script Pig Search Solr SQL Hive HCatalog NoSQL HBase Accumulo Stream Storm Others ISV Engines HDFS (Hadoop Distributed File System) Deployment Choice Linux Windows On-Premise Cloud YARN is the architectural center of Hadoop 2 • Enables batch, interactive and real-time workloads • Single SQL engine for both batch and interactive • Enable existing ISV apps to plug directly into Hadoop via YARN Provides comprehensive enterprise capabilities • Governance • Security • Operations The widest range of deployment options • Linux & Windows • On premise & cloud Tez Tez
  • 20. YARN – Roadmap
  • 21. YARN Development Framework API Engine System YARN: Data Operating System HDFS (Hadoop Distributed File System) Batch MapReduce Real-Time Slider Direct Java .NET Scripting Pig SQL Hive Cascading Java Scala NoSQL HBase Accumulo Stream Storm Other ISV Other ISV Applications Others Spark Other ISV New New New New Tez Tez Tez Tez New
  • 22. YARN General Store – The Future • A Data Lake that has a General Store to continually serve you…. – App Store – YARN Ready Applications – Data Store – Where do I get the interesting data…Weather, Geo, ..etc. – View Store – How do I get UI’s to the cluster – Processing Store – Falcon, Pig...etc. for “standard” data sets or common “processing patterns”
  • 23. Argus – Security
  • 24. Argus: Security needs are changing Administration Central management & consistent security Authentication Authenticate users and systems Authorization Provision access to data Audit Maintain a record of data access Data Protection Protect data at rest and in motion Security needs are changing • YARN unlocks the data lake • Multi-tenant: Multiple applications for data access • Changing and complex compliance environment • ETL of non-sensitive data can yield sensitive data Summer 2014 65% of clusters host multiple workloads Fall 2013 Largely silo’d deployments with single workload clusters 5 areas of security focus
  • 25. Security in Hadoop with HDP + Argus (XA Secure) Authorization Restrict access to explicit data Audit Understand who did what Data Protection Encrypt data at rest & in motion • Kerberos in native Apache Hadoop • HTTP/REST API Secured with Apache Knox Gateway • HDFS Permissions, HDFS ACL, • Audit logs in with HDFS & MR • Hive ATZ-NG Authentication Who am I/prove it? • Wire encryption in Hadoop • Open Source Initiatives • Partner Solutions • HDFS, Hive and HBase • Fine grain access control • RBAC • Centralized audit reporting • Policy and access history • Future Integration Argus Hadoop 2 Centralized Security Administration • As-Is, works with current authentication methods
  • 26. Hive – SQL In Hadoop & Roadmap
  • 27. Hive: The De-Facto SQL Interface for Hadoop
  • 28. Data Abstractions in Hive Partitions, buckets and skews facilitate faster, more direct data access. Cube, windowing, aggregation functions supported as well Database Table Table Partition Partition Partition Bucket Bucket Bucket Optional Per Table Unskewed Keys Skewed Keys
  • 29. Stinger.Next - Roadmap
  • 30. Stinger.Next – Release Cycle
  • 31. Hive Demo Using DBVisualizer or Excel?
  • 32. Falcon – Data Governance
  • 33. Data Pipeline Tracing Data pipeline dependencies Customer feed Purchase feed Product feed Store feed View dependencies between clusters, datasets and processes Data pipeline tagging Sensitive encrypted Add arbitrary tags to feeds & processes Credit feed Data pipeline audits Know who modified a dataset when and into what Data pipeline lineage File-1 File-2 File-3 Analyze how a dataset reached a particular state
  • 34. Example: Multi-Cluster Replication Primary Hadoop Cluster Raw Data Presented Data Cleansed Data Conformed Data Staged Data Presented Data Replication Failover Hadoop Cluster Replication BI and Analytic Applications • Falcon manages workflow and replication • Enables business continuity without requiring full data reprocessing • Failover clusters can be smaller than primary clusters ..and many more
  • 35. Example: Retention Staged Data Retention Policy Presented Data Cleansed Data Conformed Data Retain 5 Years Retain Last Copy Only Retain 3 Years Retain 3 Years • Sophisticated retention policies expressed in one place • Simplify data retention for audit, compliance, or for data re-processing
  • 36. Ambari – Hadoop Cluster Monitoring
  • 37. Ambari Dashboard
  • 38. Ambari 2H 2014 1.7.0 (September) 1.8.0 (October) 2.0.0 (December) Features • Config versioning + history • Config <final> Properties • Flume Support • Ubuntu Support • ResourceManager HA • HDFS Rebalance • Ambari Views Framework • Slider Support Tech Preview • Windows Support • Ambari Shell Features • ServiceX on YARN via Slider • Log Access + Search • Rack Awareness • Simplified Kerberos Setup • NameNode SafeMode • Ambari Shell GA Features • Automated Rolling Upgrades • Oozie HA • Ambari Alerts • Ambari Metrics • Windows Support GA
  • 39. Hadoop 2 Deployment Options
  • 40. Efficient Data Lakes can Span to the Cloud On-Premises Cloud HDP on Windows HDP on Linux Your deployment of Hadoop hosted as a VM in Azure HDP on Windows HDP on Linux Full control of HW and software configs 1 2 Analytics Platform System Turnkey Hadoop and relational warehouse appliance HDInsight Managed Hadoop Service Built on Azure storage 3 4 Enjoy cross-platform interoperability based on 100% open source HDP
  • 41. Q&A
  • 42. Thank You! Rommel Garcia – Solution Engineer Twitter: @rommelgarcia LinkedIn: /rommelgarcia
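
The truck demo on slides 13 and 15 routes truck events through Kafka and Storm to raise violation alerts and feed a driving report. A minimal sketch of what the processing step might look like is given below in plain Python. It does not use the Kafka or Storm APIs, and the event fields (driver_id, event_type), the "normal" label, and the alert threshold are all assumptions made for illustration; the deck does not specify the event schema or the alerting rule.

```python
from collections import Counter, namedtuple

# Hypothetical event shape: the slides do not define the truck event schema.
TruckEvent = namedtuple("TruckEvent", ["driver_id", "event_type"])

# Assumed label for a non-violation event and an assumed alert threshold.
NORMAL = "normal"
ALERT_THRESHOLD = 3


def process_events(events, threshold=ALERT_THRESHOLD):
    """Count violations per driver and collect an alert each time a
    driver's running violation count reaches a multiple of the threshold."""
    counts = Counter()
    alerts = []
    for event in events:
        if event.event_type == NORMAL:
            continue  # only violations are counted
        counts[event.driver_id] += 1
        if counts[event.driver_id] % threshold == 0:
            alerts.append((event.driver_id, counts[event.driver_id]))
    return counts, alerts


if __name__ == "__main__":
    # A tiny in-memory stand-in for the event stream.
    stream = [
        TruckEvent("d1", "speeding"),
        TruckEvent("d2", "normal"),
        TruckEvent("d1", "lane_departure"),
        TruckEvent("d1", "unsafe_following"),
        TruckEvent("d2", "speeding"),
    ]
    counts, alerts = process_events(stream)
    print(dict(counts))
    print(alerts)
```

In the architecture on slide 15, the per-driver counts would correspond to what the reporting dashboard reads, and the alerts list to what the real-time monitoring app displays; here both are simply returned to the caller.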

Editor's Notes

  1. So, where does Hadoop fit in the data center? This picture is a very simple depiction of the typical data architecture in any organization.   - There are sources of data: ERP, CRM, other digital sources - That data is then stored in a data system: a data warehouse, MPP system, etc. - Then an application of some kind accesses that data system: a packaged application such as Excel or Tableau, a custom application written by a developer, or even another business application.   This has been the foundation of the data center for years. We have had some challenges with this architecture all along; however, we are seeing increased pressure to modify and improve this basic blueprint because A) this approach created silos of data, and it was difficult to either share the data or get a single view of it, B) these systems are costly to scale, and C) they are coupled to a very static schema. Changes to a data model are difficult if not impossible. This limits flexibility and insight.   Finally, NEW types of data that emerge as we digitize the world around us, such as clickstream and machine sensor data, are growing at exponential rates. We are all becoming data-driven organizations.   In fact, the sheer volume of data is expected to grow 20X between 2013 and 2020, which puts tremendous pressure on this architecture. The old architecture is neither technologically nor commercially practical.
  2. YARN is the element that enables the modern data architecture, as it turns Hadoop into a truly multi-purpose data platform with batch, interactive and real-time workloads all running in a single cluster.   It enables users to: - Create a central cluster into which data can be stored and then accessed using a range of processing engines: batch, interactive, real-time. - It is akin to the journey with virtualization: from a single virtual server to a pool of virtual infrastructure.   It is the architectural center of Hadoop - It provides the data operating system around which the core enterprise capabilities of security, governance and operations can be integrated - It is the integration point into which all data processing engines integrate, from the open source community but also from the commercial vendor ecosystem