SlideShare a Scribd company logo
Background
● Big data challenge - (link)
User Case - Retails
●
知己知彼,百戰不殆 – Retail and The Big Data Evolution:
User Case - Retails
● Data is providing gains in three main ways: opening new channels,
tailoring my service, and driving revenue:
Pain Point
● Big data is the term for a collection of data sets so large and complex that it
becomes difficult to process using on-hand database management tools or
traditional data processing applications.
So what is Hadoop?
● Hadoop was created by Doug Cutting and Mike Cafarella.
● Hadoop provides the reliable shared storage and analysis system. (HDFS)
● It is designed to scale up from a single server to thousand of machines, with a
high degree of fault tolerance. (HDFS)
● It provides a programming model and an associated implementation for
processing and generating large data sets with a parallel, distributed
algorithm on a cluster. (MapReduce)
History
Hadoop Distributed FileSystem (HDFS)
● A given file is broken down into blocks (default=64MB), then blocks are replicated
across cluster (default=3).
● The number and size of file have no limition.
● HDFS allows you to put/get/delete files. (No update!)
● Follows the philosophy – "Write Once, Read Multiple Times!"
MapReduceFlow
Hadoop Ecosystem
● The Hadoop Ecosystem Table
Flume
● Flume is a distributed, reliable, and available service for efficiently collecting,
aggregating, and moving large amounts of log data.
● Each Flume agent has a source, a sink and a channel
Sqoop
● Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data
between Apache Hadoop and structured datastores such as relational databases.
Hive
● Apache Hive is a high-level abstraction on top of MapReduce
● Uses an SQL-like language called HiveQL
● Generates MapReduce jobs that run on the Hadoop cluster
● Originally developed by Facebook for data warehousing - Now an open-source
Apache project.
HBase
● Apache HBase™ is the Hadoop database, a distributed, scalable, big data store.
● Linear scalability, capabile of storing hundreds of terabytes of data
● Automatic and configurable sharding of tables
● Automatic failover support
● Block cache and Bloom Filters for real-time queries
● Provides realtime random read/write access to data stored in HDFS.
Pig
● Apache Pig is a platform for analyzing large data sets that consists of a high-level
language for expressing data analysis programs, coupled with infrastructure for
evaluating these programs.
● The data flow language (Pig Latin)
● The interactive shell where you can type Pig Latin statements (Grunt)
● The Pig interpreter and execution engine
Oozie
● Oozie is a 'workflow engine' which runs on a server and typically outside the
cluster. It can runs workflows of Hadoop jobs including Pig, Hive, Sqoop jobs and
submit those jobs to the cluster based on a workflow definition.
Why?
Why?
Appendix
● What is Hadoop?
● https://www.youtube.com/watch?v=4DgTLaFNQq0
● https://www.youtube.com/watch?v=9s-vSeWej1U
● Intro to MapReduce
● https://www.youtube.com/watch?v=HFplUBeBhcM
● What is Big Data? Big Data Explained (Hadoop & MapReduce)
● Big Data University - Hadoop Fundamentals I - v2
● Big Data Challenges
Appendix
● Hadoop Tutorial 1 - What is Hadoop?
● Hadoop Tutorial 2 - Challenges Created by Big Data
● Hadoop Tutorial 3 - History Behind Creation of Hadoop (Google, Yahoo, and Apac
● Hadoop Tutorial 4 - Overview of Hadoop Projects
● Hadoop Tutorial 5 - Steps to Install Hadoop on a Personal Computer (Windows/OS
● Hadoop Tutorial 6 - Downloading and Installing Oracle VirtualBox
● Hadoop Tutorial 7 - Downloading Hadoop Appliance for Oracle VirtualBox
●

More Related Content

What's hot

Hadoop technology
Hadoop technologyHadoop technology
Hadoop technology
Sohini~~ Music
 
Hadoop training
Hadoop trainingHadoop training
Hadoop training
TIB Academy
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
sunera pathan
 
Hadoop hive presentation
Hadoop hive presentationHadoop hive presentation
Hadoop hive presentation
Arvind Kumar
 
Hadoop ecosystem J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...
Hadoop ecosystem  J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...Hadoop ecosystem  J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...
Hadoop ecosystem J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...
AyeeshaParveen
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
Sandip Darwade
 
Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science Bon Secours...
Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science  Bon Secours...Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science  Bon Secours...
Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science Bon Secours...
AyeeshaParveen
 
Hadoop Technology
Hadoop TechnologyHadoop Technology
Hadoop Technology
Ece Seçil AKBAŞ
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
Rahul Agarwal
 
Anju
AnjuAnju
Hadoop
Hadoop Hadoop
Hadoop
Shamama Kamal
 
Hadoop and Distributed Computing
Hadoop and Distributed ComputingHadoop and Distributed Computing
Hadoop and Distributed Computing
Federico Cargnelutti
 
Hadoop overview
Hadoop overviewHadoop overview
Hadoop overview
Siva Pandeti
 
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaWhat are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
Edureka!
 
Big data and tools
Big data and tools Big data and tools
Big data and tools
Shivam Shukla
 
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for womenHadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
maharajothip1
 
Introduction to Hadoop Technology
Introduction to Hadoop TechnologyIntroduction to Hadoop Technology
Introduction to Hadoop Technology
Manish Borkar
 
Apache Hadoop at 10
Apache Hadoop at 10Apache Hadoop at 10
Apache Hadoop at 10
Cloudera, Inc.
 
Bn1028 demo hadoop administration and development
Bn1028 demo  hadoop administration and developmentBn1028 demo  hadoop administration and development
Bn1028 demo hadoop administration and development
conline training
 

What's hot (19)

Hadoop technology
Hadoop technologyHadoop technology
Hadoop technology
 
Hadoop training
Hadoop trainingHadoop training
Hadoop training
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
Hadoop hive presentation
Hadoop hive presentationHadoop hive presentation
Hadoop hive presentation
 
Hadoop ecosystem J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...
Hadoop ecosystem  J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...Hadoop ecosystem  J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...
Hadoop ecosystem J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
 
Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science Bon Secours...
Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science  Bon Secours...Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science  Bon Secours...
Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science Bon Secours...
 
Hadoop Technology
Hadoop TechnologyHadoop Technology
Hadoop Technology
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
 
Anju
AnjuAnju
Anju
 
Hadoop
Hadoop Hadoop
Hadoop
 
Hadoop and Distributed Computing
Hadoop and Distributed ComputingHadoop and Distributed Computing
Hadoop and Distributed Computing
 
Hadoop overview
Hadoop overviewHadoop overview
Hadoop overview
 
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaWhat are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
 
Big data and tools
Big data and tools Big data and tools
Big data and tools
 
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for womenHadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
 
Introduction to Hadoop Technology
Introduction to Hadoop TechnologyIntroduction to Hadoop Technology
Introduction to Hadoop Technology
 
Apache Hadoop at 10
Apache Hadoop at 10Apache Hadoop at 10
Apache Hadoop at 10
 
Bn1028 demo hadoop administration and development
Bn1028 demo  hadoop administration and developmentBn1028 demo  hadoop administration and development
Bn1028 demo hadoop administration and development
 

Viewers also liked

Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache Hive
Tapan Avasthi
 
Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce FundamentalsIntroduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Skillspeed
 
Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache Hive
Avkash Chauhan
 
Hive tuning
Hive tuningHive tuning
Hive tuning
Michael Zhang
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
Lynn Langit
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on Hadoop
Zheng Shao
 
Hive Quick Start Tutorial
Hive Quick Start TutorialHive Quick Start Tutorial
Hive Quick Start Tutorial
Carl Steinbach
 
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it final
Hortonworks
 
Integration of HIve and HBase
Integration of HIve and HBaseIntegration of HIve and HBase
Integration of HIve and HBase
Hortonworks
 
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureHadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Skillspeed
 

Viewers also liked (10)

Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache Hive
 
Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce FundamentalsIntroduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
 
Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache Hive
 
Hive tuning
Hive tuningHive tuning
Hive tuning
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on Hadoop
 
Hive Quick Start Tutorial
Hive Quick Start TutorialHive Quick Start Tutorial
Hive Quick Start Tutorial
 
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it final
 
Integration of HIve and HBase
Integration of HIve and HBaseIntegration of HIve and HBase
Integration of HIve and HBase
 
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureHadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
 

Similar to Hadoop introduction

Hadoop jon
Hadoop jonHadoop jon
Hadoop jon
Humoyun Ahmedov
 
BIGDATA ppts
BIGDATA pptsBIGDATA ppts
BIGDATA ppts
Krisshhna Daasaarii
 
Hadoop seminar
Hadoop seminarHadoop seminar
Hadoop seminar
KrishnenduKrishh
 
Getting started big data
Getting started big dataGetting started big data
Getting started big data
Kibrom Gebrehiwot
 
What is Apache Hadoop and its ecosystem?
What is Apache Hadoop and its ecosystem?What is Apache Hadoop and its ecosystem?
What is Apache Hadoop and its ecosystem?
tommychauhan
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
Dr. C.V. Suresh Babu
 
Introduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop EcosystemIntroduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop Ecosystem
Mahabubur Rahaman
 
Seminar ppt
Seminar pptSeminar ppt
Seminar ppt
RajatTripathi34
 
CSB_community
CSB_communityCSB_community
Hadoop An Introduction
Hadoop An IntroductionHadoop An Introduction
Hadoop An Introduction
Mohanasundaram Ponnusamy
 
Hadoop
HadoopHadoop
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
Hitendra Kumar
 
Introduction to Apache hadoop
Introduction to Apache hadoopIntroduction to Apache hadoop
Introduction to Apache hadoop
Omar Jaber
 
Bigdata and Hadoop Introduction
Bigdata and Hadoop IntroductionBigdata and Hadoop Introduction
Bigdata and Hadoop Introduction
umapavankumar kethavarapu
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
Thanh Nguyen
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony Nguyen
Thanh Nguyen
 
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
amrutupre
 
Hadoop and Big Data for Absolute Beginners
Hadoop and Big Data for Absolute BeginnersHadoop and Big Data for Absolute Beginners
Hadoop and Big Data for Absolute Beginners
Sam Dias
 
Intro to Apache Hadoop
Intro to Apache HadoopIntro to Apache Hadoop
Intro to Apache Hadoop
Sufi Nawaz
 

Similar to Hadoop introduction (20)

Hadoop jon
Hadoop jonHadoop jon
Hadoop jon
 
BIGDATA ppts
BIGDATA pptsBIGDATA ppts
BIGDATA ppts
 
Hadoop seminar
Hadoop seminarHadoop seminar
Hadoop seminar
 
Getting started big data
Getting started big dataGetting started big data
Getting started big data
 
What is Apache Hadoop and its ecosystem?
What is Apache Hadoop and its ecosystem?What is Apache Hadoop and its ecosystem?
What is Apache Hadoop and its ecosystem?
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Introduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop EcosystemIntroduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop Ecosystem
 
Seminar ppt
Seminar pptSeminar ppt
Seminar ppt
 
CSB_community
CSB_communityCSB_community
CSB_community
 
Hadoop An Introduction
Hadoop An IntroductionHadoop An Introduction
Hadoop An Introduction
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
Introduction to Apache hadoop
Introduction to Apache hadoopIntroduction to Apache hadoop
Introduction to Apache hadoop
 
Bigdata and Hadoop Introduction
Bigdata and Hadoop IntroductionBigdata and Hadoop Introduction
Bigdata and Hadoop Introduction
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony Nguyen
 
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
 
Hadoop and Big Data for Absolute Beginners
Hadoop and Big Data for Absolute BeginnersHadoop and Big Data for Absolute Beginners
Hadoop and Big Data for Absolute Beginners
 
Intro to Apache Hadoop
Intro to Apache HadoopIntro to Apache Hadoop
Intro to Apache Hadoop
 

Recently uploaded

Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
TIPNGVN2
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Zilliz
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 

Recently uploaded (20)

Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 

Hadoop introduction

  • 1.
  • 2. Background ● Big data challenge - (link)
  • 3. User Case - Retails ● 知己知彼,百戰不殆 – Retail and The Big Data Evolution:
  • 4. User Case - Retails ● Data is providing gains in three main ways: opening new channels, tailoring my service, and driving revenue:
  • 5. Pain Point ● Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
  • 6. So what is Hadoop? ● Hadoop was created by Doug Cutting and Mike Cafarella. ● Hadoop provides the reliable shared storage and analysis system. (HDFS) ● It is designed to scale up from a single server to thousand of machines, with a high degree of fault tolerance. (HDFS) ● It provides a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster. (MapReduce)
  • 8. Hadoop Distributed FileSystem (HDFS) ● A given file is broken down into blocks (default=64MB), then blocks are replicated across cluster (default=3). ● The number and size of file have no limition. ● HDFS allows you to put/get/delete files. (No update!) ● Follows the philosophy – "Write Once, Read Multiple Times!"
  • 10. Hadoop Ecosystem ● The Hadoop Ecosystem Table
  • 11. Flume ● Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. ● Each Flume agent has a source, a sink and a channel
  • 12. Sqoop ● Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
  • 13. Hive ● Apache Hive is a high-level abstraction on top of MapReduce ● Uses an SQL-like language called HiveQL ● Generates MapReduce jobs that run on the Hadoop cluster ● Originally developed by Facebook for data warehousing - Now an open-source Apache project.
  • 14. HBase ● Apache HBase™ is the Hadoop database, a distributed, scalable, big data store. ● Linear scalability, capabile of storing hundreds of terabytes of data ● Automatic and configurable sharding of tables ● Automatic failover support ● Block cache and Bloom Filters for real-time queries ● Provides realtime random read/write access to data stored in HDFS.
  • 15. Pig ● Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. ● The data flow language (Pig Latin) ● The interactive shell where you can type Pig Latin statements (Grunt) ● The Pig interpreter and execution engine
  • 16. Oozie ● Oozie is a 'workflow engine' which runs on a server and typically outside the cluster. It can runs workflows of Hadoop jobs including Pig, Hive, Sqoop jobs and submit those jobs to the cluster based on a workflow definition.
  • 17. Why?
  • 18. Why?
  • 19. Appendix ● What is Hadoop? ● https://www.youtube.com/watch?v=4DgTLaFNQq0 ● https://www.youtube.com/watch?v=9s-vSeWej1U ● Intro to MapReduce ● https://www.youtube.com/watch?v=HFplUBeBhcM ● What is Big Data? Big Data Explained (Hadoop & MapReduce) ● Big Data University - Hadoop Fundamentals I - v2 ● Big Data Challenges
  • 20. Appendix ● Hadoop Tutorial 1 - What is Hadoop? ● Hadoop Tutorial 2 - Challenges Created by Big Data ● Hadoop Tutorial 3 - History Behind Creation of Hadoop (Google, Yahoo, and Apac ● Hadoop Tutorial 4 - Overview of Hadoop Projects ● Hadoop Tutorial 5 - Steps to Install Hadoop on a Personal Computer (Windows/OS ● Hadoop Tutorial 6 - Downloading and Installing Oracle VirtualBox ● Hadoop Tutorial 7 - Downloading Hadoop Appliance for Oracle VirtualBox ●