Hadoop and Netezza - Co-existence or Competition?


Hadoop is rapidly emerging as a viable platform for big data analytics. Thanks to early adoption by organizations like Yahoo and Facebook, and an active open source community, we have seen significant innovation around this platform. With support for relational constructs and a SQL-like query interface, many experts believe that Hadoop will eventually subsume some data warehousing tasks. Even though Hadoop and parallel databases have some architectural similarities, they are designed to solve different problems. In this presentation, you will be introduced to the Hadoop architecture, its salient differences from Netezza, and typical use cases. You will learn about common co-existence deployment models that Netezza's customers have put into practice to benefit from both technologies. You will also learn about Netezza's current support for Hadoop and its future strategy.

1. Hadoop and Netezza: Co-existence or competition? Krishnan Parasuraman, CTO - Digital Media, Netezza (@kparasuraman). Tweet about EnzeeUniverse using #enzee11.

2. The Buzz

3.

4. Fuelling the debate

5. A brief history of wannabe RDBMS killers

6. Open source distributed storage and processing engine. Manage complex data, relational and non-relational, in a single repository; fault-tolerant distributed processing; self-healing, distributed storage; an abstraction for parallel computing. Store source data forever and analyze it as and when needed; commodity hardware with inexpensive storage; process at the source to eliminate data movement. Related tools: Oozie (workflow), Sqoop (integration), ZooKeeper (service coordination), and Flume, Chukwa, Scribe (data collection).

7. Hadoop: Origin and evolution. Timeline from 2003 to 2011. Phases: early research; open source dev momentum; initial success stories; commercialization. Milestones shown: Google GFS paper; Google MapReduce paper; Google Bigtable paper; Apache Lucene subproject; Apache Hadoop project; Apache HBase project; Yahoo 10K-core cluster; Netezza Hadoop Connector and MapReduce support.

8. Common perceptions: cloud; large volumes; ad-hoc queries; low cost; complex analytics; unstructured.

9. Parallel data warehouse systems. Diagram labels: SQL; host controllers (hosts); network fabric; massively parallel compute nodes, each with an FPGA, a CPU, and memory; storage units.

10. Hadoop. Diagram labels: MapReduce; master node (JobTracker, NameNode); network fabric; parallel compute nodes, each with a TaskTracker and a DataNode; storage units.
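
The MapReduce model named on this slide can be illustrated with a minimal single-process sketch. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the word-count task are assumptions chosen for illustration, and a real cluster would spread the map and reduce calls across TaskTrackers rather than run them in one loop.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the list of values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence over an iterable of records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Each string stands in for one block of a file held by a DataNode.
    print(run_job(["hadoop stores data", "netezza queries data"]))
```

Because each `map_phase` call only sees its own record, the calls can run wherever the data already lives, which is the property the later slides describe as executing code next to the data.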

11. The similarities: massive parallelism; execute code and algorithms next to data; scalable; highly available. Diagram labels: MapReduce; JobTracker and NameNode; TaskTracker and DataNode pairs.

12. The differences. Schema on read: data loading is fast (data loading = file copy; "Look Ma, no ETL"). Batch-mode data access: not intended for real-time access. Doesn’t support random access. No joins, no query engine, no types, no SQL.
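
A small sketch may help make "schema on read" concrete. The raw text below is stored untouched, as a file copy would leave it, and column names and types are applied only when a job reads it. The sample rows, the `SCHEMA` list, and the `read_with_schema` helper are all made up for illustration and do not come from the deck.

```python
import csv
import io

# Raw data as it would sit in the file system after a plain file copy.
# No types or column names are attached at load time. (Made-up sample rows.)
RAW_FILE = "2011-06-20,/home,200\n2011-06-20,/cart,500\n"

# The schema lives with the reader, not with the stored data (hypothetical).
SCHEMA = [("day", str), ("url", str), ("status", int)]


def read_with_schema(raw_text, schema):
    """Apply column names and type casts while reading, one row at a time."""
    for fields in csv.reader(io.StringIO(raw_text)):
        yield {name: cast(value) for (name, cast), value in zip(schema, fields)}


rows = list(read_with_schema(RAW_FILE, SCHEMA))

# A different job could read the same bytes with a different schema.
errors = [row["url"] for row in rows if row["status"] >= 500]

if __name__ == "__main__":
    print(errors)
```

The trade-off is the one the slide points at: loading is cheap because nothing is validated up front, but every reader pays the parsing cost and there is no engine enforcing types or joins.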

13. Where does it work well? (1) Queryable archive: moving computation is cheaper than moving data. (2) Exploratory analysis: relationships not defined yet; can’t put in a process for ETL; evolving schema. (3) Complex data: parallel ETL in Java.

14. Imperatives for co-existence

15. Expressiveness of SQL coupled with the flexibility of procedural code, i.e. MapReduce

16. Low cost of storing and analyzing not-so-hot data

17. Parse and analyze complex data such as video and images. Data: point of origination (files, structured and unstructured sources)

18. Netezza-Hadoop: Co-existence use cases. Create context (classification, text mining); analyze unstructured data; analyze, report; parse, aggregate semi-structured data; active archival; long running queries; analyze, report structured data.

19. Pattern 1: Data ingestion. Diagram labels: raw weblogs; Hadoop cluster (NameNode, JobTracker, and three DataNode/TaskTracker nodes); Netezza environment; steps numbered 1 to 4.
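
The transcript names the pieces of this pattern but not what the numbered steps do. One plausible reading, offered here as an assumption, is that raw weblogs are parsed and aggregated on the Hadoop side and the much smaller structured result is then loaded into the warehouse. The sketch below shows only the parse-and-aggregate part in plain Python; the Common Log Format layout, the sample lines, the pipe-delimited output, and every function name are illustrative assumptions.

```python
import re
from collections import Counter

# Assumed input layout: Common Log Format, e.g.
# host ident user [time] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return (path, status) for a well-formed line, or None for junk."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return match.group("path"), int(match.group("status"))


def aggregate(lines):
    """Count hits per (path, status), skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            counts[parsed] += 1
    return counts


def to_load_rows(counts):
    """Format the aggregates as pipe-delimited rows for a bulk loader."""
    return [f"{path}|{status}|{hits}" for (path, status), hits in sorted(counts.items())]


# Made-up sample input, including one malformed line.
SAMPLE = [
    '10.0.0.1 - - [20/Jun/2011:10:00:00 -0400] "GET /home HTTP/1.1" 200 512',
    '10.0.0.2 - - [20/Jun/2011:10:00:01 -0400] "GET /home HTTP/1.1" 200 512',
    '10.0.0.3 - - [20/Jun/2011:10:00:02 -0400] "GET /cart HTTP/1.1" 500 -',
    'garbage line',
]

if __name__ == "__main__":
    for row in to_load_rows(aggregate(SAMPLE)):
        print(row)
```

In a real deployment the parsing would run as map tasks and the counting as reduce tasks, and the output rows would be handed to whatever loader connects the cluster to the warehouse.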

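20. Pattern 2: Low cost storage and dynamic provisioning. Diagram labels: Amazon cloud with Amazon S3 and Elastic MapReduce; steps numbered 1 to 3.

21. Pattern 3: Queryable archive. Diagram labels: data sources; steps numbered 1 and 2.

22. Pattern 4: Support low interaction partners. Diagram labels: data sources; steps numbered 1 to 3.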

23.

24. Use Hadoop for ingesting and parsing web logs and for offline analytics. High-speed data loader (bidirectional). Diagram label: weblogs.

25. Summary: Leveraging the best of both worlds. (1) Hadoop is not a replacement for a parallel data warehouse. (2) Hadoop and Netezza are complementary technologies. (3) Don’t let the hype drive the need. (4) We have only solved the integration problem.
