WHY HADOOP?
• PROCESSES VERY LARGE DATA SETS
• MACHINES THAT CAN PROCESS LARGE DATA QUICKLY ARE EXPENSIVE
• EFFICIENT, RELIABLE, AND EASY TO USE
• OPEN SOURCE
HADOOP
• OPEN-SOURCE SOFTWARE FROM APACHE FOR RELIABLE, HIGHLY SCALABLE DISTRIBUTED
COMPUTING
• DISTRIBUTED PROCESSING OF LARGE DATA SETS ACROSS A CLUSTER USING A
SIMPLE PROGRAMMING MODEL
• DETECTS AND HANDLES FAILURES AT THE APPLICATION LAYER TO DELIVER
A HIGH-AVAILABILITY SERVICE ON EACH CLUSTER
HADOOP
• HDFS
  • NAME NODE
  • DATA NODE
• MAP/REDUCE
  • JOB TRACKER
  • TASK TRACKER
HDFS (HADOOP DISTRIBUTED FILE SYSTEM)
• HADOOP'S DATA STORE, MADE UP OF STORAGE NODES
• CAN STORE LARGE VOLUMES OF DATA
• HIGH AVAILABILITY (EVERY PIECE OF DATA IS REPLICATED)
• DATA IS SPLIT INTO BLOCKS BEFORE IT IS WRITTEN TO HDFS
• CONSISTS OF DATANODES AND A NAMENODE
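
The splitting and replication described above can be sketched in a few lines of Python. This is a toy illustration, not HDFS code: the function names, the node names (`dn1` to `dn3`), the tiny block size, and the round-robin placement are all assumptions made for the example. Real HDFS uses its own placement policy.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, datanodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical placement rule chosen for simplicity only.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    count = len(datanodes)
    return {
        block: [datanodes[(block + r) % count] for r in range(replication)]
        for block in range(num_blocks)
    }


data = b"x" * 10                       # a 10-byte "file"
blocks = split_into_blocks(data, 4)    # block size of 4 bytes for the demo
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3"], replication=2)

for block_id, nodes in placement.items():
    print(block_id, len(blocks[block_id]), nodes)
```

With two replicas per block, losing any single node in this sketch still leaves one copy of every block available.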
NAME NODE
• STORES THE LOCATIONS OF DATA WRITTEN TO THE DATA NODES (METADATA)
• MANAGES CLUSTER CONFIGURATION
• MAPS DATA BLOCKS TO DATANODES
• EACH CLUSTER HAS ONE RUNNING NAMENODE
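
As a rough mental model, the metadata role above amounts to two lookup tables: file to blocks, and block to DataNodes. The class below is a hypothetical sketch (the name `ToyNameNode`, the block ID format, and the paths are invented for illustration) and does not mirror the real NameNode implementation.

```python
class ToyNameNode:
    """Keeps only metadata: which blocks make up a file and where they live."""

    def __init__(self) -> None:
        self.file_to_blocks: dict[str, list[str]] = {}   # path -> ordered block IDs
        self.block_to_nodes: dict[str, list[str]] = {}   # block ID -> DataNode names

    def add_file(self, path: str, block_locations: list[list[str]]) -> None:
        """Register a file given the DataNodes that hold each of its blocks."""
        block_ids = []
        for index, nodes in enumerate(block_locations):
            block_id = f"{path}#blk{index}"              # made-up ID scheme
            self.block_to_nodes[block_id] = list(nodes)
            block_ids.append(block_id)
        self.file_to_blocks[path] = block_ids

    def locate(self, path: str) -> list[tuple[str, list[str]]]:
        """Return, in order, each block of the file and the nodes storing it."""
        return [(b, self.block_to_nodes[b]) for b in self.file_to_blocks[path]]


namenode = ToyNameNode()
namenode.add_file("/logs/a.txt", [["dn1", "dn2"], ["dn2", "dn3"]])
locations = namenode.locate("/logs/a.txt")
print(locations)
```

Note that the sketch holds no file contents at all; a client would use the returned locations to read the blocks from the DataNodes directly.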
DATA NODE
• STORES FILE BLOCKS
• A CLUSTER CONSISTS OF SEVERAL DATANODES
• BLOCK SIZE IS SET BY THE ADMINISTRATOR (TYPICALLY 64 MB, 128 MB, ETC.)
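
To see what the block size choice means in practice, the arithmetic below counts how many blocks a file needs at 64 MB and 128 MB. The 200 MB file size is an arbitrary example value, and the helper names are assumptions for this sketch.

```python
MB = 1024 * 1024


def block_count(file_size: int, block_size: int) -> int:
    """Number of blocks needed; integer ceiling division."""
    return (file_size + block_size - 1) // block_size


def last_block_size(file_size: int, block_size: int) -> int:
    """Size of the final block, which is usually only partly filled."""
    remainder = file_size % block_size
    return remainder if remainder else block_size


file_size = 200 * MB  # example file, chosen arbitrarily
for block_size in (64 * MB, 128 * MB):
    print(
        block_size // MB, "MB blocks:",
        block_count(file_size, block_size), "blocks, last block",
        last_block_size(file_size, block_size) // MB, "MB",
    )
```

A larger block size means fewer blocks for the same file, and so fewer entries for the NameNode to track.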
MAP/REDUCE
• PROGRAMMING MODEL FOR DISTRIBUTED DATA PROCESSING
• PROCESSING IS SPLIT INTO TWO PHASES: MAP AND REDUCE
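
The two phases can be sketched as a small single-process function. This is only a model of the idea, not Hadoop's API: `map_reduce`, `parity`, and `total` are hypothetical names, and the grouping step between the phases runs in memory here, whereas on a cluster it moves data between machines.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Run a map phase, group values by key, then run a reduce phase."""
    # Map phase: every record yields zero or more (key, value) pairs.
    intermediate = []
    for record in records:
        intermediate.extend(map_fn(record))

    # Grouping: collect all values that share a key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: collapse each key's values into one result.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def parity(n):
    """Example map function: tag each number as even or odd."""
    return [("even" if n % 2 == 0 else "odd", n)]


def total(key, values):
    """Example reduce function: sum the values for one key."""
    return sum(values)


totals = map_reduce(range(1, 7), parity, total)
print(totals)
```

Because each call to the map function looks at one record only, the map phase can be spread over many machines without coordination.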
WORD COUNT EXAMPLE
• MAPPER
  • INPUT: VALUE: LINES OF INPUT TEXT
  • OUTPUT: KEY: WORD, VALUE: 1
• REDUCER
  • INPUT: KEY: WORD, VALUE: SET OF COUNTS
  • OUTPUT: KEY: WORD, VALUE: SUM
• LAUNCHING PROGRAM
  • DEFINES THIS JOB
  • SUBMITS JOB TO CLUSTER
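
The three roles listed above can be imitated in plain Python. This is a local simulation under stated assumptions, not a Hadoop job: `mapper`, `reducer`, and `run_job` are illustrative names, the input lines are made up, and `run_job` stands in for the launching program by running everything in one process instead of submitting to a cluster.

```python
from collections import defaultdict


def mapper(line):
    """Input: one line of text. Output: a (word, 1) pair per word."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Input: a word and all of its counts. Output: (word, sum)."""
    return word, sum(counts)


def run_job(lines):
    """Stand-in for the launching program: wires mapper and reducer together."""
    grouped = defaultdict(list)
    for line in lines:
        for word, one in mapper(line):
            grouped[word].append(one)
    return dict(reducer(word, counts) for word, counts in sorted(grouped.items()))


result = run_job(["the quick brown fox", "the lazy dog", "the fox"])
print(result)
```

The mapper never sees more than one line and the reducer never sees more than one word, which is what lets a real cluster run many of each in parallel.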
WORD COUNT DATAFLOW
THANK YOU
