Hadoop HDFS and Oracle
HDFS Hadoop Distributed File System
Introduction
Johan Louwers – Lead Architect Oracle Technology
Copyright © 2014 Capgemini. All rights reserved.
Hadoop HDFS introduction
HDFS – Hadoop Distributed File System
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity
hardware. It has many similarities with existing distributed file systems. However, the differences from
other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed
on low-cost hardware. HDFS provides high throughput access to application data and is suitable for
applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming
access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search
engine project. HDFS is now an Apache Hadoop subproject. The project URL
is http://hadoop.apache.org/hdfs/.
HDFS – Simple Cluster Setup
Simple HDFS Cluster Setup
A) An HDFS cluster consisting of a number of commodity servers.
B) A single server containing both a “name
node” and a “data node”
C) Multiple servers containing a “data node”
HDFS – introduction
HDFS Name Node
• Primary index of where data is stored within
the cluster.
• Primary entry point for all clients (applications) that request access to HDFS.
• It is advisable to size the Name Node server larger than the Data Node servers.
• Option to run a Data Node instance on the same server as the Name Node.
• Hadoop 2.0.0 and higher offer the option of a highly available Name Node setup. Prior to 2.0.0, the Name Node was a single point of failure.
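To make the "primary index" idea concrete, here is a minimal toy model in Python of the metadata the Name Node keeps: which blocks make up a file, and which Data Nodes hold each block. The class and method names (`ToyNameNode`, `add_block`, `get_block_locations`) are hypothetical and are not the Hadoop API; the point is only that the Name Node answers "where is it?" while the data itself stays on the Data Nodes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Toy model of the Name Node's in-memory index (hypothetical, not the Hadoop API).

    Only metadata lives here: the ordered blocks of each file and the Data Nodes
    holding a copy of each block. File contents never pass through this object.
    """
    file_blocks: dict[str, list[str]] = field(default_factory=dict)
    block_locations: dict[str, set[str]] = field(default_factory=dict)

    def add_block(self, path: str, block_id: str, datanodes: list[str]) -> None:
        # Record a new block of `path` and the Data Nodes that store it.
        self.file_blocks.setdefault(path, []).append(block_id)
        self.block_locations[block_id] = set(datanodes)

    def get_block_locations(self, path: str) -> list[tuple[str, list[str]]]:
        # A client asks the Name Node where each block lives, then reads the
        # bytes directly from one of the listed Data Nodes.
        if path not in self.file_blocks:
            raise FileNotFoundError(path)
        return [(b, sorted(self.block_locations[b])) for b in self.file_blocks[path]]


nn = ToyNameNode()
nn.add_block("/data/logs.txt", "blk_1", ["dn1", "dn2", "dn3"])
nn.add_block("/data/logs.txt", "blk_2", ["dn2", "dn3", "dn4"])
print(nn.get_block_locations("/data/logs.txt"))
# [('blk_1', ['dn1', 'dn2', 'dn3']), ('blk_2', ['dn2', 'dn3', 'dn4'])]
```

Because every lookup starts at this index, losing the Name Node without an HA standby makes the whole file system unreachable even though the blocks are still on disk.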
HDFS – introduction
HDFS Storage
• A (large) file is “chopped” into blocks.
• Blocks are written to the different data nodes
in the cluster.
• The name node keeps track of which block is
written to which node.
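The "chopping" step can be sketched as simple offset arithmetic. This Python sketch assumes a block size of 128 MB, the Hadoop 2.x default for the `dfs.blocksize` property (Hadoop 1.x used 64 MB). The round-robin assignment is a hypothetical stand-in for real block placement and ignores replication; it only shows the kind of block-to-node map the Name Node keeps track of.

```python
MB = 1024 * 1024
DEFAULT_BLOCK_SIZE = 128 * MB  # Hadoop 2.x default for dfs.blocksize (64 MB in Hadoop 1.x)


def split_into_blocks(file_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> list[tuple[int, int]]:
    """Return (offset, length) for each block a file of `file_size` bytes is chopped into."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)  # the last block may be shorter
        blocks.append((offset, length))
        offset += length
    return blocks


def assign_round_robin(num_blocks: int, datanodes: list[str]) -> dict[str, str]:
    """Hypothetical placement: hand blocks to Data Nodes in turn (no replication)."""
    return {f"blk_{i}": datanodes[i % len(datanodes)] for i in range(num_blocks)}


blocks = split_into_blocks(300 * MB)
print([length // MB for _, length in blocks])           # [128, 128, 44]
print(assign_round_robin(len(blocks), ["dn1", "dn2"]))  # {'blk_0': 'dn1', 'blk_1': 'dn2', 'blk_2': 'dn1'}
```

Note that the last block of a file only holds the remaining 44 MB; it does not consume a full 128 MB on the Data Node.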
HDFS – introduction
On startup, the NameNode enters a special state
called Safemode. Replication of data blocks does
not occur when the NameNode is in the Safemode
state.
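While in Safemode, the NameNode waits for Data Nodes to report their blocks. A block counts as safely replicated once at least the minimum number of replicas (`dfs.namenode.replication.min`, default 1) has checked in, and the NameNode leaves Safemode once a configurable fraction of blocks (`dfs.namenode.safemode.threshold-pct`, default 0.999) is safe, plus an extension period (`dfs.namenode.safemode.extension`, default 30 seconds). The Python sketch below models only the percentage check; it ignores the extension period and any minimum Data Node count, and the function name is hypothetical. On a real cluster the current state can be queried with `hdfs dfsadmin -safemode get`.

```python
def can_leave_safemode(replicas_per_block: dict[str, int],
                       min_replication: int = 1,
                       threshold_pct: float = 0.999) -> bool:
    """Simplified Safemode exit check (sketch, not Hadoop's code).

    replicas_per_block: replicas reported so far by Data Nodes for every block
    the NameNode knows about.
    """
    if not replicas_per_block:
        return True  # an empty namespace has nothing to wait for
    safe = sum(1 for n in replicas_per_block.values() if n >= min_replication)
    return safe / len(replicas_per_block) >= threshold_pct


# 1000 blocks, 10 of which no Data Node has reported yet.
reports = {f"blk_{i}": (0 if i < 10 else 3) for i in range(1000)}
print(can_leave_safemode(reports))  # False: 99.0% < 99.9%
reports.update({f"blk_{i}": 1 for i in range(10)})
print(can_leave_safemode(reports))  # True
```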
HDFS Storage
• Data blocks are replicated over different nodes
in the cluster to ensure availability when a node
fails.
• The default replication level is 3, configured with the dfs.replication property in the HDFS configuration (hdfs-site.xml).
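The dfs.replication setting lives in `hdfs-site.xml` as a `<property>` with a `<name>` and `<value>`. As a sketch, the Python below writes and reads such a fragment with the standard `xml.etree.ElementTree` module; the helper names are hypothetical, and a real `hdfs-site.xml` usually carries more properties. The value acts as the default for newly created files; the replication of an existing file can be changed with `hdfs dfs -setrep`.

```python
import xml.etree.ElementTree as ET


def hdfs_site_xml(properties: dict[str, str]) -> str:
    """Build a minimal Hadoop-style <configuration> document (hypothetical helper)."""
    config = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(config, encoding="unicode")


def replication_factor(xml_text: str, default: int = 3) -> int:
    """Read dfs.replication from the XML, falling back to the HDFS default of 3."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == "dfs.replication":
            return int(prop.findtext("value"))
    return default


xml_text = hdfs_site_xml({"dfs.replication": "3"})
print(xml_text)
# <configuration><property><name>dfs.replication</name><value>3</value></property></configuration>
print(replication_factor(xml_text))  # 3
```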
HDFS – introduction
HDFS Storage
• When operating a large cluster, make sure that rack awareness is enabled.
• Refer to the HADOOP-692 improvement for more details: http://goo.gl/dQ012n
Thanks to ChrisDag for the image
Typically, large Hadoop clusters are arranged in racks, and network traffic between nodes within the same rack is much more desirable than network traffic across racks. In addition, the NameNode tries to place replicas of a block on multiple racks for improved fault tolerance.
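The Apache HDFS documentation describes the default rack-aware policy for a replication factor of 3: one replica on the writer's node if the writer is a Data Node (otherwise a random node), a second on a node in a different rack, and the third on a different node in that same remote rack. This keeps two of the three copies on one rack (less cross-rack traffic) while surviving the loss of a whole rack. The Python below is a simplified sketch of that rule, not Hadoop's implementation: it assumes the node-to-rack map is already known (on a real cluster it comes from the topology script set in `net.topology.script.file.name`) and that some other rack has at least two Data Nodes.

```python
import random


def place_replicas(writer: str, topology: dict[str, str], rng: random.Random) -> list[str]:
    """Pick 3 Data Nodes for a new block, loosely following the default
    rack-aware policy (simplified sketch, not Hadoop's code).

    topology maps each Data Node hostname to its rack id, e.g. "/rack1".
    """
    nodes_by_rack: dict[str, list[str]] = {}
    for node, rack in sorted(topology.items()):
        nodes_by_rack.setdefault(rack, []).append(node)

    # Replica 1: the writer's own node if it is a Data Node, else a random node.
    first = writer if writer in topology else rng.choice(sorted(topology))

    # Replicas 2 and 3: two different nodes on one rack other than the first's,
    # so a whole-rack failure cannot destroy every copy.
    remote_racks = [r for r, nodes in nodes_by_rack.items()
                    if r != topology[first] and len(nodes) >= 2]
    if not remote_racks:
        raise ValueError("sketch needs another rack with at least two Data Nodes")
    second, third = rng.sample(nodes_by_rack[rng.choice(remote_racks)], 2)
    return [first, second, third]


topology = {"dn1": "/rack1", "dn2": "/rack1", "dn3": "/rack2", "dn4": "/rack2", "dn5": "/rack3"}
replicas = place_replicas("dn1", topology, random.Random(42))
print(replicas, [topology[n] for n in replicas])
```

In this topology only `/rack2` has two nodes besides the writer's rack, so the remote pair is always `dn3` and `dn4`; the writer's copy stays local.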
HDFS – Oracle & Big Data
Oracle Big Data Appliance Introduction
• Oracle Big Data Appliance is a high-
performance, secure platform for running
diverse workloads on Hadoop and NoSQL
systems.
HDFS – Oracle & Big Data
Oracle Big Data Appliance Introduction
• Oracle Big Data Appliance includes (almost needless to say) an HDFS storage component for storing data.
HDFS – Oracle & Big Data
Oracle & Hadoop
• Oracle XQuery for Hadoop
HDFS – Oracle & Big Data
Oracle & Hadoop
• Oracle SQL connector for HDFS
HDFS – Oracle & Big Data
Oracle & Hadoop
• Oracle Loader for Hadoop
  • Online mode
  • Offline mode
HDFS – Oracle & Big Data
Oracle & Hadoop
• Oracle Big Data SQL
Contact me
Johan Louwers
Capgemini Lead Architect Oracle Technology
• Mail : Johan.Louwers@capgemini.com
• Twitter : @johanlouwers
• Blog 1 : http://www.capgemini.com/blog/capgemini-oracle-blog
• Blog 2 : http://johanlouwers.blogspot.com
The information contained in this presentation is proprietary.
© 2014 Capgemini. All rights reserved.
Rightshore® is a trademark belonging to Capgemini.
www.capgemini.com

More Related Content

What's hot

Redis memory optimization sripathi, CTO hashedin
Redis memory optimization   sripathi, CTO hashedinRedis memory optimization   sripathi, CTO hashedin
Redis memory optimization sripathi, CTO hashedinHashedIn Technologies
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 
Kite SDK: Working with Datasets
Kite SDK: Working with DatasetsKite SDK: Working with Datasets
Kite SDK: Working with DatasetsCloudera, Inc.
 
HDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, Datatypes
HDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, DatatypesHDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, Datatypes
HDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, DatatypesThe HDF-EOS Tools and Information Center
 
DataLogix Hadoop Solution
DataLogix Hadoop SolutionDataLogix Hadoop Solution
DataLogix Hadoop SolutionDataLogix B.V.
 
HDFS: Hadoop Distributed Filesystem
HDFS: Hadoop Distributed FilesystemHDFS: Hadoop Distributed Filesystem
HDFS: Hadoop Distributed FilesystemSteve Loughran
 
Yahoo! - Arun Murthy - Hadoop World 2010
Yahoo! - Arun Murthy - Hadoop World 2010Yahoo! - Arun Murthy - Hadoop World 2010
Yahoo! - Arun Murthy - Hadoop World 2010Cloudera, Inc.
 
Redis Modules - Redis India Tour - 2017
Redis Modules - Redis India Tour - 2017Redis Modules - Redis India Tour - 2017
Redis Modules - Redis India Tour - 2017HashedIn Technologies
 
Setting High Availability in Hadoop Cluster
Setting High Availability in Hadoop ClusterSetting High Availability in Hadoop Cluster
Setting High Availability in Hadoop ClusterEdureka!
 
SAS-Hadoop Foundation
SAS-Hadoop FoundationSAS-Hadoop Foundation
SAS-Hadoop FoundationAshish Jain
 
Kite SDK introduction for Portland Big Data
Kite SDK introduction for Portland Big DataKite SDK introduction for Portland Big Data
Kite SDK introduction for Portland Big Data_blue
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drillJulien Le Dem
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Community
 
Keynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at ContinuentKeynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at ContinuentContinuent
 
Kite (Big Data Applications Meetup @ Cask)
Kite (Big Data Applications Meetup @ Cask)Kite (Big Data Applications Meetup @ Cask)
Kite (Big Data Applications Meetup @ Cask)_blue
 

What's hot (20)

Apache Kite
Apache KiteApache Kite
Apache Kite
 
Redis memory optimization sripathi, CTO hashedin
Redis memory optimization   sripathi, CTO hashedinRedis memory optimization   sripathi, CTO hashedin
Redis memory optimization sripathi, CTO hashedin
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Kite SDK: Working with Datasets
Kite SDK: Working with DatasetsKite SDK: Working with Datasets
Kite SDK: Working with Datasets
 
HDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, Datatypes
HDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, DatatypesHDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, Datatypes
HDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, Datatypes
 
DataLogix Hadoop Solution
DataLogix Hadoop SolutionDataLogix Hadoop Solution
DataLogix Hadoop Solution
 
HDFS: Hadoop Distributed Filesystem
HDFS: Hadoop Distributed FilesystemHDFS: Hadoop Distributed Filesystem
HDFS: Hadoop Distributed Filesystem
 
Yahoo! - Arun Murthy - Hadoop World 2010
Yahoo! - Arun Murthy - Hadoop World 2010Yahoo! - Arun Murthy - Hadoop World 2010
Yahoo! - Arun Murthy - Hadoop World 2010
 
Redis Modules - Redis India Tour - 2017
Redis Modules - Redis India Tour - 2017Redis Modules - Redis India Tour - 2017
Redis Modules - Redis India Tour - 2017
 
Setting High Availability in Hadoop Cluster
Setting High Availability in Hadoop ClusterSetting High Availability in Hadoop Cluster
Setting High Availability in Hadoop Cluster
 
SAS-Hadoop Foundation
SAS-Hadoop FoundationSAS-Hadoop Foundation
SAS-Hadoop Foundation
 
Kite SDK introduction for Portland Big Data
Kite SDK introduction for Portland Big DataKite SDK introduction for Portland Big Data
Kite SDK introduction for Portland Big Data
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drill
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
 
Lecture 2 part 2
Lecture 2 part 2Lecture 2 part 2
Lecture 2 part 2
 
Hadoop
HadoopHadoop
Hadoop
 
Keynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at ContinuentKeynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at Continuent
 
Hadoop technology
Hadoop technologyHadoop technology
Hadoop technology
 
Kite (Big Data Applications Meetup @ Cask)
Kite (Big Data Applications Meetup @ Cask)Kite (Big Data Applications Meetup @ Cask)
Kite (Big Data Applications Meetup @ Cask)
 
HDF5 I/O Performance
HDF5 I/O PerformanceHDF5 I/O Performance
HDF5 I/O Performance
 

Similar to Hadoop HDFS and Oracle

Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Jonathan Seidman
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014hadooparchbook
 
Hadoop - HDFS
Hadoop - HDFSHadoop - HDFS
Hadoop - HDFSKavyaGo
 
Big Data Training in Amritsar
Big Data Training in AmritsarBig Data Training in Amritsar
Big Data Training in AmritsarE2MATRIX
 
Big Data Training in Mohali
Big Data Training in MohaliBig Data Training in Mohali
Big Data Training in MohaliE2MATRIX
 
Big Data Training in Ludhiana
Big Data Training in LudhianaBig Data Training in Ludhiana
Big Data Training in LudhianaE2MATRIX
 
Introduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop EcosystemIntroduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop EcosystemMahabubur Rahaman
 
HDFS presented by VIJAY
HDFS presented by VIJAYHDFS presented by VIJAY
HDFS presented by VIJAYthevijayps
 
Hadoop online training
Hadoop online training Hadoop online training
Hadoop online training Keylabs
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoopmarkgrover
 
Introduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache HadoopIntroduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache HadoopCloudera, Inc.
 
Hadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapaHadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapakapa rohit
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemKoushik Mondal
 
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon ValleyIntro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valleymarkgrover
 
Introduction To Hadoop Administration - SpringPeople
Introduction To Hadoop Administration - SpringPeopleIntroduction To Hadoop Administration - SpringPeople
Introduction To Hadoop Administration - SpringPeopleSpringPeople
 
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Cloudera, Inc.
 
SQL-on-Hadoop Tutorial
SQL-on-Hadoop TutorialSQL-on-Hadoop Tutorial
SQL-on-Hadoop TutorialDaniel Abadi
 

Similar to Hadoop HDFS and Oracle (20)

Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014
 
Hadoop hdfs
Hadoop hdfsHadoop hdfs
Hadoop hdfs
 
Hadoop - HDFS
Hadoop - HDFSHadoop - HDFS
Hadoop - HDFS
 
Big Data Training in Amritsar
Big Data Training in AmritsarBig Data Training in Amritsar
Big Data Training in Amritsar
 
Big Data Training in Mohali
Big Data Training in MohaliBig Data Training in Mohali
Big Data Training in Mohali
 
Big Data Training in Ludhiana
Big Data Training in LudhianaBig Data Training in Ludhiana
Big Data Training in Ludhiana
 
Introduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop EcosystemIntroduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop Ecosystem
 
HDFS presented by VIJAY
HDFS presented by VIJAYHDFS presented by VIJAY
HDFS presented by VIJAY
 
Hadoop online training
Hadoop online training Hadoop online training
Hadoop online training
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoop
 
Introduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache HadoopIntroduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache Hadoop
 
Hadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapaHadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapa
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon ValleyIntro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
 
Introduction To Hadoop Administration - SpringPeople
Introduction To Hadoop Administration - SpringPeopleIntroduction To Hadoop Administration - SpringPeople
Introduction To Hadoop Administration - SpringPeople
 
Hadoop in action
Hadoop in actionHadoop in action
Hadoop in action
 
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
 
Hadoop Tutorial for Beginners
Hadoop Tutorial for BeginnersHadoop Tutorial for Beginners
Hadoop Tutorial for Beginners
 
SQL-on-Hadoop Tutorial
SQL-on-Hadoop TutorialSQL-on-Hadoop Tutorial
SQL-on-Hadoop Tutorial
 

More from Johan Louwers

Multi Domain REST API routing for Data Mesh based Data Products
Multi Domain REST API routing for Data Mesh based Data ProductsMulti Domain REST API routing for Data Mesh based Data Products
Multi Domain REST API routing for Data Mesh based Data ProductsJohan Louwers
 
TClab Dynamic Solar Panel Positioning Systems
TClab Dynamic Solar Panel Positioning SystemsTClab Dynamic Solar Panel Positioning Systems
TClab Dynamic Solar Panel Positioning SystemsJohan Louwers
 
Oracle Cloud With Azure DevOps Pipelines
Oracle Cloud With Azure DevOps PipelinesOracle Cloud With Azure DevOps Pipelines
Oracle Cloud With Azure DevOps PipelinesJohan Louwers
 
Oracle Cloud native functions - create application from cli
Oracle Cloud native functions - create application from cliOracle Cloud native functions - create application from cli
Oracle Cloud native functions - create application from cliJohan Louwers
 
Oracle Labs - research mission & project potfolio
Oracle Labs - research mission & project potfolioOracle Labs - research mission & project potfolio
Oracle Labs - research mission & project potfolioJohan Louwers
 
Install Redis on Oracle Linux
Install Redis on Oracle LinuxInstall Redis on Oracle Linux
Install Redis on Oracle LinuxJohan Louwers
 
Fn project quick installation guide
Fn project quick installation guideFn project quick installation guide
Fn project quick installation guideJohan Louwers
 
Oracle python pandas merge DataFrames
Oracle python pandas merge DataFramesOracle python pandas merge DataFrames
Oracle python pandas merge DataFramesJohan Louwers
 
import data from Oracle Database into Python Pandas Dataframe
import data from Oracle Database into Python Pandas Dataframeimport data from Oracle Database into Python Pandas Dataframe
import data from Oracle Database into Python Pandas DataframeJohan Louwers
 
Voice assistants for the insurance industry
Voice assistants for the insurance industry Voice assistants for the insurance industry
Voice assistants for the insurance industry Johan Louwers
 
Enterprise wide publish subscribe with Apache Kafka
Enterprise wide publish subscribe with Apache KafkaEnterprise wide publish subscribe with Apache Kafka
Enterprise wide publish subscribe with Apache KafkaJohan Louwers
 
Industry 4.0 and Oracle Cloud
Industry 4.0 and Oracle CloudIndustry 4.0 and Oracle Cloud
Industry 4.0 and Oracle CloudJohan Louwers
 
Docker and microservices - moving from a monolith to microservices
Docker and microservices - moving from a monolith to microservicesDocker and microservices - moving from a monolith to microservices
Docker and microservices - moving from a monolith to microservicesJohan Louwers
 
Cloud native applications for banking
Cloud native applications for bankingCloud native applications for banking
Cloud native applications for bankingJohan Louwers
 
Conversational retail
Conversational retailConversational retail
Conversational retailJohan Louwers
 
Oracle Cloudday security
Oracle Cloudday securityOracle Cloudday security
Oracle Cloudday securityJohan Louwers
 
Oracle Cloudday - the future of retail
Oracle Cloudday - the future of retailOracle Cloudday - the future of retail
Oracle Cloudday - the future of retailJohan Louwers
 
Capgemini Oracle Cloud Access Security Broker
Capgemini Oracle Cloud Access Security BrokerCapgemini Oracle Cloud Access Security Broker
Capgemini Oracle Cloud Access Security BrokerJohan Louwers
 
Microservices in the oracle cloud
Microservices in the oracle cloudMicroservices in the oracle cloud
Microservices in the oracle cloudJohan Louwers
 
Oracle cloud, private, public and hybrid
Oracle cloud, private, public and hybridOracle cloud, private, public and hybrid
Oracle cloud, private, public and hybridJohan Louwers
 

More from Johan Louwers (20)

Multi Domain REST API routing for Data Mesh based Data Products
Multi Domain REST API routing for Data Mesh based Data ProductsMulti Domain REST API routing for Data Mesh based Data Products
Multi Domain REST API routing for Data Mesh based Data Products
 
TClab Dynamic Solar Panel Positioning Systems
TClab Dynamic Solar Panel Positioning SystemsTClab Dynamic Solar Panel Positioning Systems
TClab Dynamic Solar Panel Positioning Systems
 
Oracle Cloud With Azure DevOps Pipelines
Oracle Cloud With Azure DevOps PipelinesOracle Cloud With Azure DevOps Pipelines
Oracle Cloud With Azure DevOps Pipelines
 
Oracle Cloud native functions - create application from cli
Oracle Cloud native functions - create application from cliOracle Cloud native functions - create application from cli
Oracle Cloud native functions - create application from cli
 
Oracle Labs - research mission & project potfolio
Oracle Labs - research mission & project potfolioOracle Labs - research mission & project potfolio
Oracle Labs - research mission & project potfolio
 
Install Redis on Oracle Linux
Install Redis on Oracle LinuxInstall Redis on Oracle Linux
Install Redis on Oracle Linux
 
Fn project quick installation guide
Fn project quick installation guideFn project quick installation guide
Fn project quick installation guide
 
Oracle python pandas merge DataFrames
Oracle python pandas merge DataFramesOracle python pandas merge DataFrames
Oracle python pandas merge DataFrames
 
import data from Oracle Database into Python Pandas Dataframe
import data from Oracle Database into Python Pandas Dataframeimport data from Oracle Database into Python Pandas Dataframe
import data from Oracle Database into Python Pandas Dataframe
 
Voice assistants for the insurance industry
Voice assistants for the insurance industry Voice assistants for the insurance industry
Voice assistants for the insurance industry
 
Enterprise wide publish subscribe with Apache Kafka
Enterprise wide publish subscribe with Apache KafkaEnterprise wide publish subscribe with Apache Kafka
Enterprise wide publish subscribe with Apache Kafka
 
Industry 4.0 and Oracle Cloud
Industry 4.0 and Oracle CloudIndustry 4.0 and Oracle Cloud
Industry 4.0 and Oracle Cloud
 
Docker and microservices - moving from a monolith to microservices
Docker and microservices - moving from a monolith to microservicesDocker and microservices - moving from a monolith to microservices
Docker and microservices - moving from a monolith to microservices
 
Cloud native applications for banking
Cloud native applications for bankingCloud native applications for banking
Cloud native applications for banking
 
Conversational retail
Conversational retailConversational retail
Conversational retail
 
Oracle Cloudday security
Oracle Cloudday securityOracle Cloudday security
Oracle Cloudday security
 
Oracle Cloudday - the future of retail
Oracle Cloudday - the future of retailOracle Cloudday - the future of retail
Oracle Cloudday - the future of retail
 
Capgemini Oracle Cloud Access Security Broker
Capgemini Oracle Cloud Access Security BrokerCapgemini Oracle Cloud Access Security Broker
Capgemini Oracle Cloud Access Security Broker
 
Microservices in the oracle cloud
Microservices in the oracle cloudMicroservices in the oracle cloud
Microservices in the oracle cloud
 
Oracle cloud, private, public and hybrid
Oracle cloud, private, public and hybridOracle cloud, private, public and hybrid
Oracle cloud, private, public and hybrid
 

Recently uploaded

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 

Recently uploaded (20)

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 

Hadoop HDFS and Oracle

  • 1. HDFS Hadoop Distributed File System Introduction Johan Louwers – Lead Architect Oracle Technology
  • 2. Copyright © 2014 Capgemini. All rights reserved. Hadoop HDFS introduction. HDFS – Hadoop Distributed File System: The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems; however, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high-throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is now an Apache Hadoop subproject. The project URL is http://hadoop.apache.org/hdfs/.
  • 3. HDFS – Simple Cluster Setup: A) An HDFS cluster consisting of a number of commodity servers. B) A single server running both a “name node” and a “data node”. C) Multiple servers, each running a “data node”.
  • 4. HDFS – introduction: HDFS Name Node • Primary index of where data is stored within the cluster. • Primary entry point for all application clients that request access to HDFS. • Advisable to size the Name Node server larger than the Data Node servers, since it holds the file system metadata in memory. • Option to run a Data Node instance on the same server as the Name Node. • Hadoop 2.0.0 and higher provide the option of a highly available Name Node setup. Prior to 2.0.0, the Name Node was a single point of failure.
  • 5. HDFS – introduction: HDFS Storage • A (large) file is “chopped” into blocks. • Blocks are written to the different data nodes in the cluster. • The name node keeps track of which block is written to which node.
  • 6. HDFS – introduction: On startup, the NameNode enters a special state called Safemode. Replication of data blocks does not occur while the NameNode is in Safemode. HDFS Storage • Data blocks are replicated over different nodes in the cluster to ensure availability when a node fails. • The default replication factor is 3; it is configured with the dfs.replication property in the HDFS configuration (hdfs-site.xml).
  • 7. HDFS – introduction: HDFS Storage • When operating a large cluster, ensure that you have enabled the rack-aware option. • Refer to the HADOOP-692 improvement for more details: http://goo.gl/dQ012n (Thanks to ChrisDag for the image.) Large Hadoop clusters are typically arranged in racks, and network traffic between nodes within the same rack is much more desirable than network traffic across racks. In addition, the NameNode tries to place replicas of a block on multiple racks for improved fault tolerance.
  • 8. HDFS – Oracle & Big Data: Oracle Big Data Appliance Introduction • Oracle Big Data Appliance is a high-performance, secure platform for running diverse workloads on Hadoop and NoSQL systems.
  • 9. HDFS – Oracle & Big Data: Oracle Big Data Appliance Introduction • As one would expect, Oracle Big Data Appliance includes an HDFS storage component for storing data.
  • 10. HDFS – Oracle & Big Data: Oracle & Hadoop • Oracle XQuery for Hadoop
  • 11. HDFS – Oracle & Big Data: Oracle & Hadoop • Oracle SQL Connector for HDFS
  • 12–13. HDFS – Oracle & Big Data: Oracle & Hadoop • Oracle Loader for Hadoop • Online mode • Offline mode
  • 14–15. HDFS – Oracle & Big Data: Oracle & Hadoop • Oracle Big Data SQL
  • 16. Contact me: Johan Louwers, Capgemini, Lead Architect Oracle Technology • Mail: Johan.Louwers@capgemini.com • Twitter: @johanlouwers • Blog 1: http://www.capgemini.com/blog/capgemini-oracle-blog • Blog 2: http://johanlouwers.blogspot.com
  • 17. The information contained in this presentation is proprietary. © 2014 Capgemini. All rights reserved. Rightshore® is a trademark belonging to Capgemini. www.capgemini.com. About Capgemini: With almost 140,000 people in over 40 countries, Capgemini is one of the world's foremost providers of consulting, technology and outsourcing services. The Group reported 2013 global revenues of EUR 10.1 billion. Together with its clients, Capgemini creates and delivers business and technology solutions that fit their needs and drive the results they want. A deeply multicultural organization, Capgemini has developed its own way of working, the Collaborative Business Experience™, and draws on Rightshore®, its worldwide delivery model. Learn more about us at www.capgemini.com.
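Slide 4 describes the Name Node as the primary index of where data is stored and the entry point for clients. The Python sketch below is a toy model of that role, not Hadoop's API: the class `ToyNameNode`, the function `read_file` and the in-memory "DataNode" dictionaries are all hypothetical. It shows the read path the slide implies: the client first asks the Name Node which blocks make up a file and which Data Nodes hold them, then fetches the block contents directly from a Data Node, moving on to the next replica if one is unavailable.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Hypothetical in-memory stand-in for the Name Node's metadata.

    It stores only metadata (which blocks make up a file and which
    Data Nodes hold each block); it never stores file contents.
    """
    files: dict[str, list[str]] = field(default_factory=dict)      # path -> ordered block IDs
    locations: dict[str, list[str]] = field(default_factory=dict)  # block ID -> Data Node hosts

    def add_file(self, path: str, blocks: dict[str, list[str]]) -> None:
        # Record the file's blocks in order, plus where each block lives.
        self.files[path] = list(blocks)
        for block_id, hosts in blocks.items():
            self.locations[block_id] = list(hosts)

    def get_block_locations(self, path: str) -> list[tuple[str, list[str]]]:
        # The client's first call: "which blocks, and on which nodes?"
        if path not in self.files:
            raise FileNotFoundError(path)
        return [(block_id, self.locations[block_id]) for block_id in self.files[path]]


def read_file(namenode: ToyNameNode,
              datanodes: dict[str, dict[str, bytes]], path: str) -> bytes:
    """Ask the Name Node for block locations, then read each block from a Data Node."""
    data = b""
    for block_id, hosts in namenode.get_block_locations(path):
        # Try replicas in order; skip hosts that are down or lack the block.
        for host in hosts:
            block = datanodes.get(host, {}).get(block_id)
            if block is not None:
                data += block
                break
        else:
            raise IOError(f"no live replica for {block_id}")
    return data


if __name__ == "__main__":
    nn = ToyNameNode()
    nn.add_file("/logs/app.log", {"blk_1": ["dn1", "dn2"], "blk_2": ["dn2", "dn3"]})
    # Only dn2 is reachable; dn1 and dn3 are treated as down.
    live = {"dn2": {"blk_1": b"hello ", "blk_2": b"world"}}
    print(read_file(nn, live, "/logs/app.log"))  # b'hello world'
```

The point of the split is that file data never flows through the Name Node: it answers small metadata queries, while the bulk reads go straight to the Data Nodes.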
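Slide 5 states that a large file is chopped into blocks that are written to different Data Nodes, with the Name Node tracking which block went where. A minimal Python sketch of the block arithmetic follows, assuming a 128 MB block size (the `dfs.blocksize` default in Hadoop 2.x; Hadoop 1.x used 64 MB). The helper names `split_into_blocks` and `assign_round_robin` are hypothetical, and the round-robin assignment is only an illustration of "spreading" blocks: real HDFS places several replicas per block using the rack-aware policy mentioned on slide 7.

```python
MB = 1024 * 1024
DEFAULT_BLOCK_SIZE = 128 * MB  # assumed: dfs.blocksize default in Hadoop 2.x


def split_into_blocks(file_size: int,
                      block_size: int = DEFAULT_BLOCK_SIZE) -> list[tuple[int, int]]:
    """Return (offset, length) for each block of a file of file_size bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    # Every block is block_size bytes except the last, which holds the remainder.
    return [(offset, min(block_size, file_size - offset))
            for offset in range(0, file_size, block_size)]


def assign_round_robin(num_blocks: int, datanodes: list[str]) -> dict[str, str]:
    """Hypothetical placement: one copy of each block, Data Nodes taken in turn.

    This is the kind of block -> node map the Name Node keeps; real HDFS
    stores several replicas per block and chooses nodes rack-aware.
    """
    return {f"blk_{i}": datanodes[i % len(datanodes)] for i in range(num_blocks)}


if __name__ == "__main__":
    blocks = split_into_blocks(300 * MB)
    print([length // MB for _, length in blocks])      # [128, 128, 44]
    print(assign_round_robin(len(blocks), ["dn1", "dn2"]))
    # {'blk_0': 'dn1', 'blk_1': 'dn2', 'blk_2': 'dn1'}
```

A 300 MB file therefore becomes two full blocks and one 44 MB block. HDFS does not pad that last block to the full block size on disk.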
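Slide 6 combines two ideas: blocks are kept at a replication factor (`dfs.replication`, default 3), and the NameNode does not replicate blocks while it is in Safemode. The Python sketch below models both in a simplified way, with hypothetical function names. It assumes the NameNode may leave Safemode once the fraction of blocks reported with at least a minimum number of replicas reaches a threshold, loosely following the Hadoop 2.x properties `dfs.namenode.replication.min` (default 1) and `dfs.namenode.safemode.threshold-pct` (default 0.999). The real NameNode applies further conditions, such as the `dfs.namenode.safemode.extension` waiting period, which are not modeled here.

```python
def under_replicated(replicas: dict[str, set[str]], replication: int = 3) -> dict[str, int]:
    """Return {block_id: missing_replica_count} for blocks below the target factor."""
    return {blk: replication - len(hosts)
            for blk, hosts in replicas.items() if len(hosts) < replication}


def can_leave_safemode(replicas: dict[str, set[str]], min_replication: int = 1,
                       threshold_pct: float = 0.999) -> bool:
    """Simplified Safemode exit check: enough blocks have min_replication replicas."""
    if not replicas:
        return True  # no blocks to wait for
    safe = sum(1 for hosts in replicas.values() if len(hosts) >= min_replication)
    return safe / len(replicas) >= threshold_pct


def replication_work(replicas: dict[str, set[str]], in_safemode: bool,
                     replication: int = 3) -> dict[str, int]:
    """Blocks to schedule for re-replication; nothing is scheduled in Safemode."""
    if in_safemode:
        return {}
    return under_replicated(replicas, replication)


def fail_node(replicas: dict[str, set[str]], host: str) -> dict[str, set[str]]:
    """Simulate a Data Node failure by dropping its replicas from the map."""
    return {blk: hosts - {host} for blk, hosts in replicas.items()}


if __name__ == "__main__":
    block_map = {"blk_0": {"dn1", "dn2", "dn3"}, "blk_1": {"dn1", "dn2", "dn4"}}
    after = fail_node(block_map, "dn1")
    print(under_replicated(after))                     # {'blk_0': 1, 'blk_1': 1}
    print(replication_work(after, in_safemode=True))   # {}
```

With a replication factor of 3, losing one Data Node leaves every block readable from its two remaining replicas, and the NameNode only has to restore the missing copy once it is out of Safemode.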
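Slide 7 recommends enabling rack awareness and notes that the NameNode spreads a block's replicas over racks. For a replication factor of 3, the HDFS Architecture Guide describes the default policy as: one replica on the writer's node (when the writer runs on a Data Node), one on a node in a different rack, and the last on a different node in that same remote rack. The Python sketch below is a simplified model of that policy, not Hadoop's actual placement code. The function name `place_replicas` and the host-to-rack dictionary are assumptions; in a real cluster the rack mapping comes from the configured topology. As a further simplification, a writer that is not a Data Node gets a random first node rather than one in its own rack.

```python
import random
from typing import Optional


def place_replicas(writer: str, topology: dict[str, str], replication: int = 3,
                   rng: Optional[random.Random] = None) -> list[str]:
    """Pick Data Nodes for one block's replicas in a rack-aware way.

    topology maps Data Node host -> rack ID. Simplified rules:
      1st replica: the writer's node if it is a Data Node, else a random node.
      2nd replica: a node on a different rack than the 1st.
      3rd replica: a different node on the same rack as the 2nd.
      Further replicas: any unused node.
    Falls back to any unused node when a rule cannot be satisfied.
    """
    if rng is None:
        rng = random.Random()
    if not 1 <= replication <= len(topology):
        raise ValueError("replication must be between 1 and the number of Data Nodes")
    chosen: list[str] = []

    def pick(candidates: list[str]) -> str:
        # Prefer unused nodes among the candidates, else any unused node.
        pool = [h for h in candidates if h not in chosen]
        if not pool:
            pool = [h for h in topology if h not in chosen]
        return rng.choice(sorted(pool))

    nodes = sorted(topology)
    chosen.append(writer if writer in topology else pick(nodes))
    if replication >= 2:
        first_rack = topology[chosen[0]]
        chosen.append(pick([h for h in nodes if topology[h] != first_rack]))
    if replication >= 3:
        second_rack = topology[chosen[1]]
        chosen.append(pick([h for h in nodes if topology[h] == second_rack]))
    while len(chosen) < replication:
        chosen.append(pick(nodes))
    return chosen


if __name__ == "__main__":
    topo = {"dn1": "rack1", "dn2": "rack1", "dn3": "rack2",
            "dn4": "rack2", "dn5": "rack3", "dn6": "rack3"}
    print(place_replicas("dn1", topo, rng=random.Random(42)))
```

The design trade-off is that the write pipeline crosses racks only once (writer's rack to the remote rack), while the block still survives the loss of an entire rack.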