Realtime Analytics in Hadoop 
Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved 
Rommel Garcia – Solution Engineer 
October 10, 2014
Hadoop 
Hadoop provides 
• Terabytes to Petabytes of storage on commodity hardware (HDFS) 
• Massively parallel computation on enormous amounts of data (YARN) 
Hadoop is essentially a supercomputer for the masses! 
HDFS: Scalable, Reliable, Secure Storage Platform 
The Storage Platform for the Modern Data Architecture 
YARN: Data Operating System 
HDFS 
(Hadoop Distributed File System) 
Reliable 
Highly Available & Fault Tolerant 
Protects against data loss & 
corruption 
Cost Effective 
Horizontally scales on 
Commodity Hardware 
Secure 
Strong access controls, integrated 
with authentication mechanisms 
Granular data access controls to 
datasets across users and groups 
Standards-Based Data Interfaces: NFS, REST, RPC 
(each usable as a data source/destination) 
Ingest and store any data in any format 
Flexible read access enables a variety 
of workloads
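The REST interface listed above is exposed in Hadoop as WebHDFS, whose URLs take the form `http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>`. A minimal sketch of building such a URL follows. The host name and path are made up for illustration, and the default port of 50070 is an assumption that depends on how the cluster is configured.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, port=50070, **params):
    """Build a WebHDFS REST URL for an HDFS path.

    port=50070 is an assumed NameNode HTTP port; check the cluster config.
    """
    query = urlencode({"op": op, **params})
    # quote() percent-encodes unsafe characters but keeps "/" separators.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical host and directory, used only to show the URL shape.
print(webhdfs_url("namenode.example.com", "/data/truck_events", "LISTSTATUS"))
```

The same helper covers other WebHDFS operations such as `OPEN` or `GETFILESTATUS` by changing the `op` argument.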
Hadoop 1 
Single Use Data Platform 
Hive Pig 
Batch 
HADOOP 1 
MapReduce 
Redundant, Reliable Storage 
(HDFS) 
Java
2006 2009 
MR-279: YARN 
Hadoop w/ MapReduce 
MapReduce 
Largely Batch Processing 
HDFS 
(Hadoop Distributed File System), nodes 1 … N 
Hadoop2 & YARN based Architecture 
YARN: Data Operating System 
HDFS 
(Hadoop Distributed File System), nodes 1 … N 
Silo’d clusters 
Largely batch system 
Difficult to integrate 
Hadoop 2 & YARN 
Batch Interactive Real-Time 
Enabled the 
Modern Data 
Architecture 
October 23, 2013
Hadoop 
Multi Use Data Platform 
Batch, Interactive, Realtime, Online, Streaming, … 
HADOOP 2 
Efficient Cluster Resource Management & Shared Services 
(YARN) 
Redundant, Reliable Storage 
(HDFS) 
Standard Query 
Processing 
Hive 
Batch 
MapReduce 
Online Data 
Processing 
Interactive 
Tez 
Real Time Stream 
Processing Others
Why Are Enterprises Using Hadoop? 
Traditional systems under pressure 
DATA SYSTEM APPLICATIONS 
Business 
Analytics 
Custom 
Applications 
RDBMS EDW MPP 
Packaged 
Applications 
• Silos of Data 
• Costly to Scale 
• Constrained Schemas 
Clickstream 
Geolocation 
Sentiment, Web Data 
Sensor, Machine Data (IoT) 
Unstructured docs, emails 
Server logs 
SOURCES 
Existing Sources 
(CRM, ERP,…) 
New Data Types 
…and difficult to 
manage new data
Hadoop 2 and YARN enable the Modern Data Architecture 
Batch Interactive Real-Time 
HDFS 
(Hadoop Distributed File System) 
Common data set, multiple applications 
• Optionally land all data in a single cluster 
• Batch, interactive & real-time use cases 
• Support multi-tenant access, processing 
& segmentation of data 
YARN: Architectural center of Hadoop 
• Consistent security, governance & operations 
• Ecosystem applications run natively in Hadoop 
SOURCES 
EXISTING 
Systems 
Clickstream 
Web & Social 
Geolocation 
Sensor & Machine 
Server Logs 
Unstructured 
DATA SYSTEM APPLICATIONS 
Business 
Analytics 
Custom 
Applications 
Packaged 
Applications 
RDBMS EDW MPP 
YARN: Data Operating System
Real-Time Use Cases 
Realtime Analytics in… 
Financial Services 
• Fraud Detection/Prevention 
Telecom 
• Cell tower diagnostics 
• Bandwidth Allocation 
Manufacturing 
• Proactive Maintenance 
Retail 
• Brand Sentiment Analysis 
• Localized, Personalized Promotions 
Healthcare 
• Monitor patient vitals 
• Patient care and safety 
• Reduce re-admittance rates 
Utilities, Oil & Gas 
• Smart meter stream analysis 
• Proactive equipment repair 
• Power and consumption matching 
Public Sector 
• Network intrusion detection and prevention 
• Disease outbreak detection 
Transportation 
• Unsafe driving detection and monitoring
Truck Demo: Real-Time Analytics 
Problem: 
• The only way to measure “safe driving” is through accident 
occurrences. 
• There’s no real-time accident prevention mechanism in place. 
Solution: 
• Use Hadoop to analyze driving violations in real-time 
• Provide a UI to view real-time violation alerts 
• Provide a dashboard to review violation reports 
Demo Time ! 
Truck Demo Real-Time Hadoop Architecture 
Truck Events 
High Speed Ingestion 
Message Queue 
Distributed Processing 
Kafka 
Storm 
Show Driving Report 
HDFS/Hive HBase 
(ActiveMQ) 
Solr 
(Reporting 
Dashboard) 
Real-Time 
Monitoring App 
Truck Event Data Alerts Violations 
Show
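To make the processing stage of this architecture concrete, here is a small sketch in plain Python of what the distributed processing step (Storm in the diagram) does conceptually: inspect each truck event as it arrives, emit an alert for violations, and aggregate violations per driver for the report. It does not use the Kafka or Storm APIs. The event fields and violation types are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical violation types; the deck does not list the actual ones.
VIOLATIONS = {"speeding", "lane_departure", "unsafe_following"}


def detect_violations(events):
    """Yield one alert per incoming event whose type counts as a violation."""
    for event in events:
        if event["type"] in VIOLATIONS:
            yield {
                "driver": event["driver"],
                "truck": event["truck"],
                "violation": event["type"],
            }


def violation_report(alerts):
    """Count violations per driver, as a dashboard report might."""
    return Counter(alert["driver"] for alert in alerts)


# Stand-in for the stream of truck events read from the ingestion layer.
events = [
    {"driver": "d1", "truck": "t1", "type": "normal"},
    {"driver": "d1", "truck": "t1", "type": "speeding"},
    {"driver": "d2", "truck": "t2", "type": "lane_departure"},
    {"driver": "d1", "truck": "t1", "type": "unsafe_following"},
]

alerts = list(detect_violations(events))
report = violation_report(alerts)
print(alerts)
print(report)
```

In the real pipeline the alerts would be pushed to the monitoring app and the events persisted for reporting. Here both are kept in memory to keep the sketch self-contained.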
Q&A
Hadoop 2.0 
Rommel Garcia – Solution Engineer 
October 10, 2014
Hadoop 2 Becoming A Critical Platform 
Hadoop 2 delivers a comprehensive data management platform 
Hadoop 2 Platform 
Provision, 
Manage & 
Monitor 
Ambari 
Zookeeper 
Scheduling 
Oozie 
Data Workflow, 
Lifecycle & 
Governance 
Falcon 
Sqoop 
Flume 
NFS 
WebHDFS 
In-Memory 
Spark 
YARN: Data Operating System 
DATA MANAGEMENT 
SECURITY 
BATCH, INTERACTIVE & REAL-TIME 
DATA ACCESS 
GOVERNANCE 
& INTEGRATION 
Authentication 
Authorization 
Accounting 
Data Protection 
Storage: HDFS 
Resources: YARN 
Access: Hive, … 
Pipeline: Falcon 
Cluster: Knox 
OPERATIONS 
Script 
Pig 
Search 
Solr 
SQL 
Hive 
HCatalog 
NoSQL 
HBase 
Accumulo 
Stream 
Storm 
Others 
ISV 
Engines 
HDFS 
(Hadoop Distributed File System), nodes 1 … N 
Deployment Choice 
Linux, Windows, On-Premise, Cloud 
YARN is the architectural 
center of Hadoop 2 
• Enables batch, interactive 
and real-time workloads 
• Single SQL engine for both batch 
and interactive 
• Enable existing ISV apps to plug 
directly into Hadoop via YARN 
Provides comprehensive 
enterprise capabilities 
• Governance 
• Security 
• Operations 
The widest range of 
deployment options 
• Linux & Windows 
• On premise & cloud 
YARN – Roadmap 
YARN Development Framework 
API 
Engine 
System 
YARN : Data Operating System 
HDFS 
(Hadoop Distributed File System), nodes 1 … N 
Batch 
MapReduce 
Real-Time 
Slider 
Direct 
Java 
.NET 
Scripting 
Pig 
SQL 
Hive 
Cascading 
Java 
Scala 
NoSQL 
HBase 
Accumulo 
Stream 
Storm 
Other 
ISV 
Other 
ISV 
Applications 
Others 
Spark 
Other ISV 
YARN General Store – The Future 
• A Data Lake that has a General Store to continually serve you: 
– App Store – YARN Ready Applications 
– Data Store – Where do I get the interesting data (weather, geo, etc.)? 
– View Store – How do I get UIs to the cluster? 
– Processing Store – Falcon, Pig, etc. for “standard” data sets or common “processing patterns” 
Argus – Security 
Argus: Security needs are changing 
Administration 
Central management & 
consistent security 
Authentication 
Authenticate users and systems 
Authorization 
Provision access to data 
Audit 
Maintain a record of data access 
Data Protection 
Protect data at rest and in motion 
Security needs are changing 
• YARN unlocks the data lake 
• Multi-tenant: Multiple applications for data access 
• Changing and complex compliance environment 
• ETL of non-sensitive data can yield sensitive data 
Summer 2014 
65% of clusters host 
multiple workloads 
Fall 2013 
Largely silo’d deployments 
with single workload clusters 
5 areas of security focus
Security in Hadoop with HDP + Argus (XA Secure) 
Authorization 
Restrict access to 
explicit data 
Audit 
Understand who 
did what 
Data Protection 
Encrypt data at 
rest & in motion 
• Kerberos in native 
Apache Hadoop 
• HTTP/REST API 
Secured with 
Apache Knox 
Gateway 
• HDFS Permissions, HDFS ACL, 
• Audit logs in HDFS & MR 
• Hive ATZ-NG 
Authentication 
Who am I/prove it? 
• Wire encryption 
in Hadoop 
• Open Source 
Initiatives 
• Partner 
Solutions 
• HDFS, Hive and 
HBase 
• Fine-grained 
access control 
• RBAC 
• Centralized 
audit reporting 
• Policy and 
access history 
• Future 
Integration 
Argus Hadoop 2 
Centralized Security Administration 
• As-Is, works with 
current 
authentication 
methods
Hive – SQL in Hadoop & Roadmap 
Hive: The De-Facto SQL Interface for Hadoop 
Data Abstractions in Hive 
Partitions, buckets and skews facilitate 
faster, more direct data access. Cube, windowing, aggregation 
functions supported as well 
Database 
Table Table 
Partition Partition Partition 
Bucket 
Bucket 
Bucket 
Optional Per Table 
Unskewed Keys Skewed Keys
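A rough sketch of how partitions and buckets narrow down where a row lives: the partition value selects a directory under the table, and a hash of the bucketing key modulo the bucket count selects one bucket inside it. The CRC32 hash and the path layout below are stand-ins chosen for illustration, not Hive's actual hash function or file naming. The table and column names are hypothetical.

```python
import zlib


def bucket_for(key, num_buckets):
    """Map a bucketing key to a bucket index.

    CRC32 is a stand-in hash for this sketch, not the one Hive uses.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def locate(table, partition_col, partition_val, key, num_buckets):
    """Return an illustrative location: table / partition / bucket."""
    bucket = bucket_for(key, num_buckets)
    return f"{table}/{partition_col}={partition_val}/bucket-{bucket:05d}"


# Hypothetical table partitioned by date and bucketed by driver id.
print(locate("truck_events", "dt", "2014-10-10", "driver-42", 8))
```

A query that filters on the partition column and the bucketing key only needs to read one bucket of one partition, which is where the faster, more direct access comes from.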
Stinger.Next - Roadmap 
Stinger.Next – Release Cycle 
Hive Demo Using DBVisualizer or Excel? 
Falcon – Data Governance 
Data Pipeline Tracing 
Data pipeline 
dependencies 
Customer 
feed 
Purchase 
feed 
Product 
feed 
Store 
feed 
View dependencies 
between clusters, datasets 
and processes 
Data pipeline 
tagging 
Sensitive encrypted 
Add arbitrary tags to 
feeds & processes 
Credit 
feed 
Data pipeline 
audits 
Know who modified a 
dataset, when, and into 
what 
Data pipeline 
lineage 
File-1 
File-2 
File-3 
Analyze how a dataset 
reached a particular 
state
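Lineage and dependency tracing boil down to walking a graph of datasets and the inputs they were derived from. The sketch below shows that idea in plain Python. It does not use Falcon's API, and the pipeline (a sales report built from the feeds named above) is a hypothetical example.

```python
def upstream(dataset, inputs):
    """Return every dataset that `dataset` depends on, directly or indirectly."""
    seen = []
    stack = list(inputs.get(dataset, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.append(node)
            # Follow this node's own inputs to reach indirect dependencies.
            stack.extend(inputs.get(node, []))
    return sorted(seen)


# Hypothetical pipeline: each dataset maps to the datasets it is built from.
inputs = {
    "sales_report": ["enriched_purchases", "store_feed"],
    "enriched_purchases": ["purchase_feed", "customer_feed", "product_feed"],
}

print(upstream("sales_report", inputs))
```

Answering "how did this dataset reach its current state" is then a matter of listing its upstream datasets together with the audit records for each.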
Example: Multi-Cluster Replication 
Primary Hadoop Cluster 
Raw Data 
Presented 
Data 
Cleansed 
Data 
Conformed 
Data 
Staged Data 
Presented 
Data 
Replication 
Failover Hadoop Cluster 
Replication 
BI and Analytic Applications 
• Falcon manages workflow and replication 
• Enables business continuity without requiring full data reprocessing 
• Failover clusters can be smaller than primary clusters 
..and many more
Example: Retention 
Staged Data 
Retention 
Policy 
Presented 
Data 
Cleansed 
Data 
Conformed 
Data 
Retain 5 
Years 
Retain Last 
Copy Only 
Retain 3 
Years 
Retain 3 
Years 
• Sophisticated retention policies expressed in one place 
• Simplify data retention for audit, compliance, or for data re-processing
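Expressing retention "in one place" can be pictured as a single table of limits that every dataset is checked against. The sketch below is plain Python, not Falcon's feed configuration. The assignment of the 5-year and 3-year limits to particular stages is an assumption, since the slide layout does not survive in this transcript, and the "last copy only" rule is left out for brevity.

```python
from datetime import date, timedelta

# Assumed mapping of stage to retention period (years approximated as 365 days).
RETENTION_DAYS = {
    "staged": 5 * 365,
    "cleansed": 3 * 365,
    "conformed": 3 * 365,
}


def is_expired(stage, created, today):
    """Return True if data created on `created` is past its stage's retention."""
    limit = timedelta(days=RETENTION_DAYS[stage])
    return today - created > limit


today = date(2014, 10, 10)
print(is_expired("cleansed", date(2010, 1, 1), today))
print(is_expired("staged", date(2010, 1, 1), today))
```

Keeping the limits in one mapping means an audit or compliance change touches a single definition rather than every job that writes data.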
Ambari – Hadoop Cluster Monitoring 
Ambari Dashboard 
Ambari 2H 2014 
1.7.0 (September) 
Features 
• Config versioning + history 
• Config <final> Properties 
• Flume Support 
• Ubuntu Support 
• ResourceManager HA 
• HDFS Rebalance 
• Ambari Views Framework 
• Slider Support 
Tech Preview 
• Windows Support 
• Ambari Shell 
1.8.0 (October) 
Features 
• ServiceX on YARN via Slider 
• Log Access + Search 
• Rack Awareness 
• Simplified Kerberos Setup 
• NameNode SafeMode 
• Ambari Shell GA 
2.0.0 (December) 
Features 
• Automated Rolling Upgrades 
• Oozie HA 
• Ambari Alerts 
• Ambari Metrics 
• Windows Support GA
Hadoop 2 Deployment Options 
Efficient Data Lakes can Span to the Cloud 
On-Premises Cloud 
HDP on Windows 
HDP on Linux 
Your deployment of Hadoop 
hosted as a VM in Azure 
HDP on Windows 
HDP on Linux 
Full control of HW and 
software configs 
1 2 
Analytics Platform System 
Turnkey Hadoop and 
relational warehouse appliance 
HDInsight 
Managed Hadoop Service 
Built on Azure storage 
3 4 
Enjoy cross-platform interoperability based on 100% open source HDP
Q&A
Thank You! 
Rommel Garcia – Solution Engineer 
Twitter: @rommelgarcia 
LinkedIn: /rommelgarcia 

More Related Content

What's hot

Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopDiscover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopHortonworks
 
How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopPOSSCON
 
Hortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache HadoopHortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache HadoopHortonworks
 
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache HiveDiscover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache HiveHortonworks
 
Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Hortonworks
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataHortonworks
 
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...Hortonworks
 
Introduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramIntroduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramHortonworks
 
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Hortonworks
 
Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks
 
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks
 
Enrich a 360-degree Customer View with Splunk and Apache Hadoop
Enrich a 360-degree Customer View with Splunk and Apache HadoopEnrich a 360-degree Customer View with Splunk and Apache Hadoop
Enrich a 360-degree Customer View with Splunk and Apache HadoopHortonworks
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez Hortonworks
 
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopDiscover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopHortonworks
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalHortonworks
 
Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Hortonworks
 
Discover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchDiscover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchHortonworks
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopHortonworks
 
State of the Union with Shaun Connolly
State of the Union with Shaun ConnollyState of the Union with Shaun Connolly
State of the Union with Shaun ConnollyHortonworks
 

What's hot (20)

Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopDiscover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
 
How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in Hadoop
 
Hortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache HadoopHortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
 
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache HiveDiscover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
 
Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
 
Introduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramIntroduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready Program
 
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
 
Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014
 
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - Webinar
 
Enrich a 360-degree Customer View with Splunk and Apache Hadoop
Enrich a 360-degree Customer View with Splunk and Apache HadoopEnrich a 360-degree Customer View with Splunk and Apache Hadoop
Enrich a 360-degree Customer View with Splunk and Apache Hadoop
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez
 
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopDiscover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.final
 
Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014
 
Discover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchDiscover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop Search
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
 
State of the Union with Shaun Connolly
State of the Union with Shaun ConnollyState of the Union with Shaun Connolly
State of the Union with Shaun Connolly
 

Viewers also liked

Realtime analytics + hadoop 2.0
Realtime analytics + hadoop 2.0Realtime analytics + hadoop 2.0
Realtime analytics + hadoop 2.0Rommel Garcia
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkGuido Schmutz
 
Hadoop 2.0 - The Next Level
Hadoop 2.0 - The Next LevelHadoop 2.0 - The Next Level
Hadoop 2.0 - The Next LevelSascha Dittmann
 
SAS Forum Switzerland 2015: Big Data - Guido Oswald
SAS Forum Switzerland 2015: Big Data - Guido OswaldSAS Forum Switzerland 2015: Big Data - Guido Oswald
SAS Forum Switzerland 2015: Big Data - Guido OswaldGuido Oswald
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGskumpf
 
Hadoop Einführung @codecentric
Hadoop Einführung @codecentricHadoop Einführung @codecentric
Hadoop Einführung @codecentricimalik8088
 
Wer gewinnt das SQL-Rennen auf der Hadoop-Strecke?
Wer gewinnt das SQL-Rennen auf der Hadoop-Strecke?Wer gewinnt das SQL-Rennen auf der Hadoop-Strecke?
Wer gewinnt das SQL-Rennen auf der Hadoop-Strecke?inovex GmbH
 
Realtime BigData Step by Step mit Lambda, Kafka, Storm und Hadoop
Realtime BigData Step by Step mit Lambda, Kafka, Storm und HadoopRealtime BigData Step by Step mit Lambda, Kafka, Storm und Hadoop
Realtime BigData Step by Step mit Lambda, Kafka, Storm und HadoopValentin Zacharias
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time AnalyticsAmazon Web Services
 
Microsoft on Big Data
Microsoft on Big DataMicrosoft on Big Data
Microsoft on Big DataYvette Teiken
 
Big Data Bullshit Bingo
Big Data Bullshit BingoBig Data Bullshit Bingo
Big Data Bullshit BingoDanny Linden
 
Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive Hortonworks
 
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...Impetus Technologies
 
Real-time Market Basket Analysis for Retail with Hadoop
Real-time Market Basket Analysis for Retail with HadoopReal-time Market Basket Analysis for Retail with Hadoop
Real-time Market Basket Analysis for Retail with HadoopDataWorks Summit
 
Big Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyBig Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyNati Shalom
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data ArchitectureGuido Schmutz
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDataWorks Summit
 
Analyzing and Interpreting AWR
Analyzing and Interpreting AWRAnalyzing and Interpreting AWR
Analyzing and Interpreting AWRpasalapudi
 

Viewers also liked (20)

Realtime analytics + hadoop 2.0
Realtime analytics + hadoop 2.0Realtime analytics + hadoop 2.0
Realtime analytics + hadoop 2.0
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache Spark
 
Hadoop 2.0 - The Next Level
Hadoop 2.0 - The Next LevelHadoop 2.0 - The Next Level
Hadoop 2.0 - The Next Level
 
SAS Forum Switzerland 2015: Big Data - Guido Oswald
SAS Forum Switzerland 2015: Big Data - Guido OswaldSAS Forum Switzerland 2015: Big Data - Guido Oswald
SAS Forum Switzerland 2015: Big Data - Guido Oswald
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
 
Hadoop Einführung @codecentric
Hadoop Einführung @codecentricHadoop Einführung @codecentric
Hadoop Einführung @codecentric
 
Wer gewinnt das SQL-Rennen auf der Hadoop-Strecke?
Wer gewinnt das SQL-Rennen auf der Hadoop-Strecke?Wer gewinnt das SQL-Rennen auf der Hadoop-Strecke?
Wer gewinnt das SQL-Rennen auf der Hadoop-Strecke?
 
SAP HORTONWORKS
SAP HORTONWORKSSAP HORTONWORKS
SAP HORTONWORKS
 
Hdfs high availability
Hdfs high availabilityHdfs high availability
Hdfs high availability
 
Realtime BigData Step by Step mit Lambda, Kafka, Storm und Hadoop
Realtime BigData Step by Step mit Lambda, Kafka, Storm und HadoopRealtime BigData Step by Step mit Lambda, Kafka, Storm und Hadoop
Realtime BigData Step by Step mit Lambda, Kafka, Storm und Hadoop
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time Analytics
 
Microsoft on Big Data
Microsoft on Big DataMicrosoft on Big Data
Microsoft on Big Data
 
Big Data Bullshit Bingo
Big Data Bullshit BingoBig Data Bullshit Bingo
Big Data Bullshit Bingo
 
Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive
 
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
 
Real-time Market Basket Analysis for Retail with Hadoop
Real-time Market Basket Analysis for Retail with HadoopReal-time Market Basket Analysis for Retail with Hadoop
Real-time Market Basket Analysis for Retail with Hadoop
 
Big Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyBig Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case Study
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
 
Analyzing and Interpreting AWR
Analyzing and Interpreting AWRAnalyzing and Interpreting AWR
Analyzing and Interpreting AWR
 

Realtime Analytics in Hadoop

  • 1. Realtime Analytics in Hadoop Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Rommel Garcia – Solution Engineer October 10, 2014
  • 2. Hadoop
  • 3. Hadoop provides:
      • Terabytes to petabytes of storage on commodity hardware (HDFS)
      • Massively parallel computation on enormous amounts of data (YARN)
    Hadoop is essentially a supercomputer for the masses!
  • 4. HDFS: Scalable, Reliable, Secure Storage Platform. The Storage Platform for the Modern Data Architecture
      • Reliable: highly available and fault tolerant; protects against data loss and corruption
      • Cost effective: horizontally scales on commodity hardware
      • Secure: strong access controls, integrated with authentication mechanisms; granular data access controls to datasets across users and groups
      • Standards-based data interfaces (NFS, REST, RPC) to sources/destinations: ingest and store any data in any format; flexible read access enables a variety of workloads
      • [Diagram labels: YARN: Data Operating System; HDFS (Hadoop Distributed File System)]
  • 5. Hadoop 1: Single Use Data Platform [Diagram labels: Hive, Pig, Java; Batch (MapReduce); Redundant, Reliable Storage (HDFS)]
  • 6. Hadoop 2 & YARN based Architecture
      • 2006: Hadoop w/ MapReduce. Largely batch processing on HDFS; siloed clusters, largely batch system, difficult to integrate
      • 2009: MR-279: YARN
      • October 23, 2013: Hadoop 2 & YARN. Batch, interactive and real-time workloads on YARN (Data Operating System) over HDFS; enabled the Modern Data Architecture
  • 7. Hadoop 2: Multi Use Data Platform (batch, interactive, realtime, online, streaming, …) [Diagram labels: Standard Query Processing (Hive); Batch (MapReduce); Interactive (Tez); Online Data Processing; Real Time Stream Processing; Others; Efficient Cluster Resource Management & Shared Services (YARN); Redundant, Reliable Storage (HDFS)]
  • 8. Why Are Enterprises Using Hadoop?
  • 9. Traditional systems under pressure …and difficult to manage new data
      • Silos of data
      • Costly to scale
      • Constrained schemas
      • [Diagram labels: APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications. DATA SYSTEM: RDBMS, EDW, MPP. SOURCES: Existing Sources (CRM, ERP, …); New Data Types: Clickstream; Geolocation; Sentiment, Web Data; Sensor, Machine Data (IoT); Unstructured docs, emails; Server logs]
  • 10. Hadoop 2 and YARN enable the Modern Data Architecture
      • Common data set, multiple applications: optionally land all data in a single cluster; batch, interactive & real-time use cases; support multi-tenant access, processing & segmentation of data
      • YARN: architectural center of Hadoop: consistent security, governance & operations; ecosystem applications run natively in Hadoop
      • [Diagram labels: SOURCES: Existing Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured. DATA SYSTEM: RDBMS, EDW, MPP; Batch, Interactive, Real-Time on YARN (Data Operating System) and HDFS (Hadoop Distributed File System). APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications]
  • 11. Real-Time Use Cases
  • 12. Realtime Analytics in…
      • Financial Services, Retail, Telecom, Manufacturing: fraud detection/prevention; cell tower diagnostics; proactive maintenance; bandwidth allocation; brand sentiment analysis; localized, personalized promotions
      • Healthcare: monitor patient vitals; patient care and safety; reduce re-admittance rates
      • Utilities, Oil & Gas: smart meter stream analysis; proactive equipment repair; power and consumption matching
      • Public Sector: network intrusion detection and prevention; disease outbreak detection
      • Transportation: unsafe driving detection and monitoring
  • 13. Truck Demo: Real-Time Analytics
      • Problem: the only way to measure “safe driving” is through accident occurrences; there is no real-time accident prevention mechanism in place
      • Solution: use Hadoop to analyze driving violations in real time; provide a UI to view real-time violation alerts; provide a dashboard to review violation reports
  • 14. Demo Time!
  • 15. Truck Demo Real-Time Hadoop Architecture [Diagram labels: Truck Events; Truck Event Data; stages: High Speed Ingestion, Message Queue, Distributed Processing; components: Kafka, Storm, HDFS/Hive, HBase, ActiveMQ, Solr; outputs: Real-Time Monitoring App (Alerts, Show Violations), Reporting Dashboard (Show Driving Report)]
  • 16. Q&A
  • 17. Hadoop 2.0. Rommel Garcia – Solution Engineer, October 10, 2014
  • 18. Hadoop 2 Becoming A Critical Platform
  • 19. Hadoop 2 delivers a comprehensive data management platform
      • YARN is the architectural center of Hadoop 2: enables batch, interactive and real-time workloads; single SQL engine for both batch and interactive; enables existing ISV apps to plug directly into Hadoop via YARN
      • Provides comprehensive enterprise capabilities: governance, security, operations
      • The widest range of deployment options: Linux & Windows; on premise & cloud
      • [Diagram: Hadoop 2 Platform. GOVERNANCE & INTEGRATION: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS). BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Script (Pig), SQL (Hive, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), Search (Solr), In-Memory (Spark), Others (ISV Engines), Tez. DATA MANAGEMENT: YARN: Data Operating System; HDFS (Hadoop Distributed File System). SECURITY: Authentication, Authorization, Accounting, Data Protection; Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox. OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie). Deployment Choice: Linux, Windows, On-Premise, Cloud]
  • 20. YARN – Roadmap
  • 21. YARN Development Framework [Diagram labels: layers: API, Engine, System; Batch (MapReduce); Real-Time (Slider); Direct (Java, .NET); Scripting (Pig); SQL (Hive); Cascading (Java, Scala); NoSQL (HBase, Accumulo); Stream (Storm); Others (Spark); Other ISV applications; Tez; several components marked “New”; YARN: Data Operating System; HDFS (Hadoop Distributed File System)]
  • 22. YARN General Store – The Future. A Data Lake that has a General Store to continually serve you:
      • App Store: YARN Ready Applications
      • Data Store: where do I get the interesting data (weather, geo, etc.)?
      • View Store: how do I get UIs to the cluster?
      • Processing Store: Falcon, Pig, etc. for “standard” data sets or common “processing patterns”
  • 23. Argus – Security
  • 24. Argus: Security needs are changing
      • YARN unlocks the data lake
      • Multi-tenant: multiple applications for data access
      • Changing and complex compliance environment
      • ETL of non-sensitive data can yield sensitive data
      • Fall 2013: largely siloed deployments with single-workload clusters. Summer 2014: 65% of clusters host multiple workloads
      • 5 areas of security focus: Administration (central management & consistent security); Authentication (authenticate users and systems); Authorization (provision access to data); Audit (maintain a record of data access); Data Protection (protect data at rest and in motion)
  • 25. Security in Hadoop with HDP + Argus (XA Secure)
      • Centralized Security Administration
      • Authentication (Who am I/prove it?). Hadoop 2: Kerberos in native Apache Hadoop; HTTP/REST API secured with Apache Knox Gateway. Argus: as-is, works with current authentication methods
      • Authorization (Restrict access to explicit data). Hadoop 2: HDFS permissions, HDFS ACL, Hive ATZ-NG. Argus: HDFS, Hive and HBase; fine-grained access control; RBAC
      • Audit (Understand who did what). Hadoop 2: audit logs in HDFS & MR. Argus: centralized audit reporting; policy and access history
      • Data Protection (Encrypt data at rest & in motion). Hadoop 2: wire encryption in Hadoop; open source initiatives; partner solutions. Argus: future integration
  • 26. Hive – SQL In Hadoop & Roadmap
  • 27. Hive: The De-Facto SQL Interface for Hadoop
  • 28. Data Abstractions in Hive. Partitions, buckets and skews facilitate faster, more direct data access. Cube, windowing and aggregation functions are supported as well. [Diagram labels: Database; Table; Partition; Bucket; Optional Per Table; Skewed Keys; Unskewed Keys]
  • 29. Stinger.Next – Roadmap
  • 30. Stinger.Next – Release Cycle
  • 31. Hive Demo Using DBVisualizer or Excel?
  • 32. Falcon – Data Governance
  • 33. Data Pipeline Tracing
      • Data pipeline dependencies: view dependencies between clusters, datasets and processes
      • Data pipeline tagging: add arbitrary tags to feeds & processes
      • Data pipeline audits: know who modified a dataset, when, and into what
      • Data pipeline lineage: analyze how a dataset reached a particular state
      • [Diagram labels: Customer feed, Purchase feed, Product feed, Store feed, Credit feed; tags: Sensitive, encrypted; File-1, File-2, File-3]
  • 34. Example: Multi-Cluster Replication
      • Falcon manages workflow and replication
      • Enables business continuity without requiring full data reprocessing
      • Failover clusters can be smaller than primary clusters
      • …and many more
      • [Diagram labels: Primary Hadoop Cluster; Failover Hadoop Cluster; Replication; Raw Data, Staged Data, Cleansed Data, Conformed Data, Presented Data; BI and Analytic Applications]
  • 35. Example: Retention
      • Sophisticated retention policies expressed in one place
      • Simplify data retention for audit, compliance, or data re-processing
      • [Diagram labels: Retention Policy; Staged Data, Cleansed Data, Conformed Data, Presented Data; Retain 5 Years; Retain Last Copy Only; Retain 3 Years; Retain 3 Years]
  • 36. Ambari – Hadoop Cluster Monitoring
  • 37. Ambari Dashboard
  • 38. Ambari 2H 2014
      • 1.7.0 (September) features: Config versioning + history; Config <final> Properties; Flume Support; Ubuntu Support; ResourceManager HA; HDFS Rebalance; Ambari Views Framework; Slider Support Tech Preview; Windows Support; Ambari Shell
      • 1.8.0 (October) features: ServiceX on YARN via Slider; Log Access + Search; Rack Awareness; Simplified Kerberos Setup; NameNode SafeMode; Ambari Shell GA
      • 2.0.0 (December) features: Automated Rolling Upgrades; Oozie HA; Ambari Alerts; Ambari Metrics; Windows Support GA
  • 39. Hadoop 2 Deployment Options
  • 40. Efficient Data Lakes can Span to the Cloud. Enjoy cross-platform interoperability based on 100% open source HDP
      • On-Premises: HDP on Windows or HDP on Linux (full control of HW and software configs); Analytics Platform System (turnkey Hadoop and relational warehouse appliance)
      • Cloud: HDP on Windows or HDP on Linux (your deployment of Hadoop hosted as a VM in Azure); HDInsight (managed Hadoop service built on Azure storage)
  • 41. Q&A
  • 42. Thank You! Rommel Garcia – Solution Engineer. Twitter: @rommelgarcia. LinkedIn: /rommelgarcia
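
Slides 13 and 15 describe the truck demo only at the architecture level: events are ingested, a stream-processing stage flags driving violations, and the results feed real-time alerts and a violation report. The following is a minimal Python sketch of that flagging and reporting logic only, with no Kafka or Storm involved. The event fields, the set of violation types, and the function names are all hypothetical; the deck does not specify the demo's schema or code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Iterator


# Hypothetical event shape; the slides do not define the real schema.
@dataclass(frozen=True)
class TruckEvent:
    driver_id: str
    event_type: str   # e.g. "normal", "speeding", "lane_departure"
    timestamp: float


# Assumed violation categories, chosen only for illustration.
VIOLATION_TYPES = {"speeding", "lane_departure", "unsafe_following_distance"}


def detect_violations(events: Iterable[TruckEvent]) -> Iterator[TruckEvent]:
    """Yield each violation as soon as it is seen (stand-in for the
    distributed stream-processing stage in the architecture slide)."""
    for event in events:
        if event.event_type in VIOLATION_TYPES:
            yield event


def violation_report(violations: Iterable[TruckEvent]) -> Counter:
    """Count violations per driver (stand-in for the reporting dashboard)."""
    return Counter(v.driver_id for v in violations)
```

In a real deployment the input iterable would be replaced by a message-queue consumer and the counts would be written to a store such as HBase or Hive; here a plain list of `TruckEvent` objects is enough to exercise the logic.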

Editor's Notes

  1. So, where does Hadoop fit in the data center? This picture is a very simple depiction of the typical data architecture in any organization.
       - There are sources of data: ERP, CRM, other digital sources.
       - That data is then stored in a data system: a data warehouse, MPP system, etc.
       - Then an application of some kind accesses that data system: a packaged application such as Excel or Tableau, a custom application written by a developer, or even another business application.
     This has been the foundation of the data center for years. We have had some challenges with this architecture all along; however, we are seeing increased pressure to modify and improve this basic blueprint because (A) this approach created silos of data, and it was difficult to either share the data or get a single view of it; (B) these systems are costly to scale; and (C) they are coupled to a very static schema. Changes to a data model are difficult if not impossible, which limits flexibility and insight.
     Finally, NEW types of data that emerge as we digitize the world around us, such as clickstream and machine sensor data, are growing at exponential rates. We are all becoming data-driven organizations.
     In fact, the sheer volume of data is set to grow 20X between 2013 and 2020, which puts tremendous pressure on this architecture. The old architecture is neither technologically nor commercially practical.
  2. YARN is the element that enables the modern data architecture, as it turns Hadoop into a truly multi-purpose data platform with batch, interactive and real-time workloads all running in a single cluster.
     It enables users to create a central cluster in which data can be stored and then accessed using a range of processing engines: batch, interactive, real-time. This is akin to the journey with virtualization: from a single virtual server to a pool of virtual infrastructure.
     It is the architectural center of Hadoop:
       - It provides the data operating system around which the core enterprise capabilities of security, governance and operations can be integrated.
       - It is the integration point for all data processing engines, from the open source community but also from the commercial vendor ecosystem.