Big Data Hadoop – Hands On Workshop
Data Processing Solutions – Comparison Guide
Big Data Workshop Series
Danairat T.
Core Hadoop vs. Traditional Data Warehouse
[Figure: Core Hadoop in the cloud goes from Data Inputs to Results in 2 steps. A traditional data warehouse goes from Data Inputs to Results through 6 layers: Data Source, Data Staging, Data Warehouse, Data Mart, Cube, and Analytic Results.]
Big Data Hadoop
Solution 1. Core Hadoop processing
No data staging transformation and no data movement required.
[Figure: Data Inputs to Analytic Results in 2 steps, cloud ready]
Top Benefits
1. Cloud- and IoT-ready architecture roadmap
2. No data duplication, which reduces data storage cost
3. Fast data processing, with built-in fault tolerance for all processing
4. Aligns with a unified data architecture and data governance
5. Fewer data processing steps than a traditional DWH
The Effort Investment:
1. Learn core Hadoop
Solution 2. Using BI tools to analyze Hadoop data
Requires a single transformation to CSV raw text, stored in Hadoop HDFS, so that BI tools can connect to it and visualize the data.
[Figure: Data Inputs to Hadoop HDFS (CSV raw text) to Results in 3 steps, cloud ready]
Top Benefits
1. Lower cost with a cloud- and IoT-ready architecture
2. Fast data processing, with built-in fault tolerance for all processing
3. Fewer data processing steps than a traditional DWH
The Effort Investment:
1. Learn Hadoop
2. Transform data to CSV raw text for BI tools
Solution 3. Creating a data warehouse in Hadoop
Requires a single transformation, with a DWH set up on Hadoop for BI tools.
[Figure: Data Inputs to Hadoop HDFS to Hadoop DWH (Hive, or Cassandra, HBase) to Results in 4 steps, cloud ready]
Top Benefits
1. Lower cost with a cloud- and IoT-ready architecture
2. Fast data processing, with built-in fault tolerance for all processing
3. Fewer data processing steps than a traditional DWH
The Effort Investment:
1. Learn core Hadoop
2. Transform data to CSV raw text for BI tools
3. Set up a DWH on Hadoop (Hive, Cassandra, HBase)
Solution 4. Implementing a traditional data warehouse
[Figure: Data Inputs to Results through 6 layers: Data Source, Data Staging, Data Warehouse, Data Mart, Cube, and Analytic Results. Annotation: the more the data grows, the slower the data processing.]
Top Concerns from Traditional Data Warehouse Architecture
1. Heavy data duplication drives up data storage cost
2. Very slow data processing, and a job must be restarted or rolled back if any step fails
3. Data security issues from keeping too many copies of the data in various formats
Benefits Comparison Summary
| Solution | Cloud-ready architecture | Built-in parallel processing | IoT architecture roadmap | Without DB cube investment | Without data mart investment | Without DWH investment | Without staging data (raw text) | Unstructured and raw source content processing |
|---|---|---|---|---|---|---|---|---|
| 1. Core Hadoop | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| 2. Hadoop and Pentaho/Power BI | Yes | Yes | Yes | Yes | Yes | Yes | No (requires CSV) | No (requires CSV) |
| 3. Hadoop and Cognos, RapidMiner, BO, Tableau | Yes | Yes | Yes | Yes | Yes | No (requires Hive connector) | No (requires Hive connector) | No (requires Hive connector) |
| 4. Traditional Data Warehouse | No | No | No | No | No | No | No | No |
Appendix
Pentaho supports Big Data inputs
Power BI supports Big Data inputs
Tableau supports Big Data inputs
RapidMiner supports Big Data inputs
Hadoop Cluster Installation and Excel Parser Processing
Clone the Hadoop master to slave1 and slave2
[Figure: three nodes: master, slave1, slave2]
At master node: edit the hosts file
At master node: copy the key file to slave1 and slave2
scp /home/ubuntu/.ssh/id_dsa.pub ip-172-31-1-8:/home/ubuntu/.ssh/master.pub
scp /home/ubuntu/.ssh/id_dsa.pub 172.31.15.16:/home/ubuntu/.ssh/master.pub
From this slide on, three cascaded windows represent the master node, the slave1 node, and the slave2 node.
At slave1 and slave2: cat /home/ubuntu/.ssh/master.pub >> /home/ubuntu/.ssh/authorized_keys
At master: test ssh to slave1 and slave2
$ ssh ip-172-31-1-8
$ exit
$ ssh ip-172-31-15-16
$ exit
At master: add slave1 and slave2 to the Hadoop slave file
At master: edit hdfs-site.xml for 2 replication servers
At all nodes: remove the namenode and datanode directories
At master: format the namenode
At master: execute start-dfs.sh
At slave1 and slave2: check the jps result; you will see that DataNode has been started
At master: execute start-yarn.sh
At slave1 and slave2: check the jps result; you will see that NodeManager has been started
Importing data into the HDFS Cluster
At master: import data into HDFS
At slave1 and slave2: review the imported data from HDFS
Running MapReduce in Cluster Mode
At master: execute the YARN MapReduce program
At slave1, slave2: you will see the Application Master and the YARN child containers
At master: review the output file from HDFS
At slave1, slave2: review the output file from HDFS using the command:
hdfs dfs -cat /outputs/wordcount_output_dir01/part-r-00000
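The output path above suggests that the job run in this step is a word count. As a rough illustration only, the sketch below imitates the map, shuffle, and reduce phases in memory with plain Java and formats each result as key, tab, value, which is the layout Hadoop's default text output uses for files such as part-r-00000. The class and method names (WordCountSketch, map, reduce, run) are hypothetical and are not taken from the workshop; a real job would use the Hadoop Mapper and Reducer classes instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory illustration of word count; not the workshop's actual job. */
class WordCountSketch {

    /** Map phase: emit one token per whitespace-separated word in the line. */
    static List<String> map(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    /** Shuffle and reduce phases: group identical words (sorted by key) and sum their counts. */
    static Map<String, Integer> reduce(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    /** Runs both phases and formats each result as "word<TAB>count". */
    static List<String> run(List<String> inputLines) {
        List<String> mapped = new ArrayList<>();
        for (String line : inputLines) {
            mapped.addAll(map(line));
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : reduce(mapped).entrySet()) {
            output.add(entry.getKey() + "\t" + entry.getValue());
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("big data hadoop");
        lines.add("hadoop data");
        for (String outputLine : run(lines)) {
            System.out.println(outputLine);
        }
    }
}
```

On the cluster, the map calls run in parallel on the nodes that hold the input blocks, and the grouping is done by the framework between the two phases; the sketch only shows the data flow.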
At master: review the output result data from the web console
Process Excel Worksheet
1. Create a Java class using the Apache POI libraries
2. Traverse the data in the Excel spreadsheet
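// Needs java.util.Iterator, org.apache.poi.ss.usermodel.* and
// org.apache.poi.xssf.usermodel.XSSFWorkbook on the classpath
Workbook workbook = new XSSFWorkbook(inputStream);
Sheet firstSheet = workbook.getSheetAt(0);
// Walk every row of the first sheet, then every cell of each row
Iterator<Row> iterator = firstSheet.iterator();
while (iterator.hasNext()) {
    Row nextRow = iterator.next();
    Iterator<Cell> cellIterator = nextRow.cellIterator();
    while (cellIterator.hasNext()) {
        Cell cell = cellIterator.next();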
3. Extract data from the Excel spreadsheet
        // Integer constants from older POI releases; newer releases use the CellType enum
        switch (cell.getCellType()) {
            case Cell.CELL_TYPE_STRING:
                System.out.print(cell.getStringCellValue());
                break;
            case Cell.CELL_TYPE_BOOLEAN:
                System.out.print(cell.getBooleanCellValue());
                break;
            case Cell.CELL_TYPE_NUMERIC:
                System.out.print(cell.getNumericCellValue());
                break;
        }
    } // end of the cell loop
} // end of the row loop
For further integration into HDFS, please emit data to output collector.
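Since Solutions 2 and 3 expect CSV raw text in HDFS, one way to prepare a row before emitting it is to join its cell values into a single CSV line. The sketch below is a minimal illustration in plain Java, assuming the cell values have already been read as strings; CsvRowFormatter, escape, and toCsvLine are hypothetical names, and the quoting follows the common convention of wrapping fields that contain commas, quotes, or line breaks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: turns the cell values of one spreadsheet row into a CSV line. */
class CsvRowFormatter {

    /** Quotes a field only when it contains a comma, a double quote, or a line break. */
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        // Double any embedded quotes, then wrap the whole field in quotes
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Joins the already-extracted cell values with commas. */
    static String toCsvLine(List<String> cellValues) {
        List<String> escaped = new ArrayList<>();
        for (String value : cellValues) {
            escaped.add(escape(value));
        }
        return String.join(",", escaped);
    }

    public static void main(String[] args) {
        // Values as they might look after reading string, numeric, and boolean cells
        List<String> row = Arrays.asList("Bangkok, TH", "42.0", "true");
        System.out.println(toCsvLine(row));
    }
}
```

In a MapReduce job, a line built this way would be the value written to the output collector, one line per spreadsheet row.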
4. Close the Excel spreadsheet
workbook.close();
inputStream.close();
Excel Processing Results in Hadoop
Stopping the Hadoop Cluster
At master: execute stop-yarn.sh
At slave1 and slave2: use jps to confirm that NodeManager has been stopped
At master: execute stop-dfs.sh
At slave1 and slave2: use jps to confirm that DataNode has been stopped
Thank you very much

Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 

Recently uploaded (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 

Big data Hadoop Analytic and Data warehouse comparison guide

  • 1. Big Data Hadoop – Hands On Workshop: Data Processing Solutions – Comparison Guide. Big Data Workshop Series, Danairat T. [Diagram: Core Hadoop (2 steps: Data Inputs, Results; cloud) vs. Traditional Data Warehouse (6 layers: Data Source, Data Staging, Data Warehouse, Data Mart, Cube, Analytic Results)]
  • 2. Big Data Hadoop Solution 1: Core Hadoop processing. No data staging transformation and no data movement required. [Diagram: Data Inputs, Analytic Results; 2 steps, cloud ready]
      Top benefits:
      1. Cloud- and IoT-ready architecture roadmap
      2. No data duplication, which reduces data storage cost
      3. Fast data processing, with built-in fault tolerance for all processing
      4. Aligns with a unified data architecture and data governance
      5. Fewer data processing steps than a traditional DWH
      Effort investment:
      1. Learn core Hadoop
  • 3. Big Data Hadoop Solution 2: Using BI tools to analyze Hadoop data. Requires a single transformation to raw CSV text, stored in Hadoop HDFS, which the BI tools connect to for visualization. [Diagram: Data Inputs, Hadoop HDFS (CSV raw text), Results; 3 steps, cloud ready]
      Top benefits:
      1. Lower cost with a cloud/IoT-ready architecture
      2. Fast data processing, with built-in fault tolerance for all processing
      3. Fewer data processing steps than a traditional DWH
      Effort investment:
      1. Learn Hadoop
      2. Transform the data to raw CSV text for the BI tools
  • 4. Big Data Hadoop Solution 3: Creating a data warehouse in Hadoop. Requires a single transformation, plus a DWH set up on Hadoop for the BI tools. [Diagram: Data Inputs, Hadoop HDFS, Hadoop DWH (Hive, or Cassandra, HBase), Results; 4 steps, cloud ready]
      Top benefits:
      1. Lower cost with a cloud/IoT-ready architecture
      2. Fast data processing, with built-in fault tolerance for all processing
      3. Fewer data processing steps than a traditional DWH
      Effort investment:
      1. Learn core Hadoop
      2. Transform the data to raw CSV text for the BI tools
      3. Set up a DWH on Hadoop (Hive, Cassandra, HBase)
  • 5. Big Data Hadoop Solution 4: Implementing a traditional data warehouse. The more the data grows, the slower the data processing. [Diagram: Data Inputs to Results through 6 layers: Data Source, Data Staging, Data Warehouse, Data Mart, Cube, Analytic Results]
      Top concerns with the traditional data warehouse architecture:
      1. Heavy data duplication drives up data storage cost
      2. Data processing is very slow, and a job must be restarted or rolled back if any part of it fails
      3. Data security issues arise from keeping too many copies of the data in various formats
  • 6. Big Data Hadoop Benefits Comparison Summary

      | Solution | Cloud-ready architecture | Built-in parallel processing | IoT architecture roadmap | Without DB cube investment | Without data mart investment | Without DWH investment | Without staging data (RAW text) | Unstructured and RAW source content processing |
      | --- | --- | --- | --- | --- | --- | --- | --- | --- |
      | 1. Core Hadoop | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
      | 2. Hadoop and Pentaho/Power BI | Yes | Yes | Yes | Yes | Yes | Yes | No (requires CSV) | No (requires CSV) |
      | 3. Hadoop and Cognos, RapidMiner, BO, Tableau | Yes | Yes | Yes | Yes | Yes | No (requires Hive connector) | No (requires Hive connector) | No (requires Hive connector) |
      | 4. Traditional Data Warehouse | No | No | No | No | No | No | No | No |

  • 8. Big Data Hadoop Pentaho supports Big Data Inputs
  • 9. Big Data Hadoop PowerBI supports Big Data Inputs
  • 10. Big Data Hadoop Tableau supports Big Data Inputs
  • 11. Big Data Hadoop RapidMiner supports Big Data Inputs
  • 12. Big Data Hadoop Hadoop Cluster Installation and Excel Parser Processing
  • 13. Big Data Hadoop Clone the Hadoop master to slave1 and slave2 [Diagram: master, slave1, slave2]
  • 14. Big Data Hadoop At master node: Edit host file
  • 15. Big Data Hadoop At master node: Copy key file to slave1 and slave2
      scp /home/ubuntu/.ssh/id_dsa.pub ip-172-31-1-8:/home/ubuntu/.ssh/master.pub
      scp /home/ubuntu/.ssh/id_dsa.pub 172.31.15.16:/home/ubuntu/.ssh/master.pub
  • 16. Big Data Hadoop From this slide on, three cascaded windows represent the master node, the slave1 node, and the slave2 node
  • 17. Big Data Hadoop At slave1 and slave2: append the master's public key to the authorized keys
      cat /home/ubuntu/.ssh/master.pub >> /home/ubuntu/.ssh/authorized_keys
  • 18. Big Data Hadoop At master: Test ssh to slave1 and slave2
      $ ssh ip-172-31-1-8
      $ exit
      $ ssh ip-172-31-15-16
      $ exit
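
Re-running the append command from slide 17 would add the same key a second time. A minimal sketch of an idempotent variant, meant to be run on each slave, might look like the following. The helper name `authorize_key` is hypothetical and not part of the workshop; the `chmod` is included on the assumption that sshd is running with its default strict permission checks.

```shell
# authorize_key KEY_FILE AUTH_FILE
# Append the public key in KEY_FILE to AUTH_FILE unless that exact line is
# already present. (Hypothetical helper, not taken from the slides.)
authorize_key() {
  key_file=$1
  auth_file=$2
  # -x: whole-line match, -F: fixed string, -f: read the pattern from KEY_FILE
  if grep -qxF -f "$key_file" "$auth_file" 2>/dev/null; then
    return 0
  fi
  cat "$key_file" >> "$auth_file"
  chmod 600 "$auth_file"
}
```

With the paths used in the slides, the call would be `authorize_key /home/ubuntu/.ssh/master.pub /home/ubuntu/.ssh/authorized_keys`.
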
  • 19. Big Data Hadoop At master: add slave1 and slave2 to Hadoop slave file
  • 20. Big Data Hadoop At master: add slave1 and slave2 to Hadoop slave file
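
The slave file is a plain list with one worker hostname per line. As a rough sketch, it could be written from the command line as below. The helper name `write_slaves_file` is hypothetical, and the slides do not show the file's location or exact contents, so the path and hostnames in the usage note are assumptions.

```shell
# write_slaves_file FILE HOST...
# Write one worker hostname per line to FILE, replacing its previous contents.
# (Hypothetical helper, not taken from the slides.)
write_slaves_file() {
  slaves_file=$1
  shift
  if [ "$#" -eq 0 ]; then
    echo "write_slaves_file: no hosts given" >&2
    return 1
  fi
  printf '%s\n' "$@" > "$slaves_file"
}
```

Assuming a Hadoop 2.x layout, a call could be `write_slaves_file "$HADOOP_HOME/etc/hadoop/slaves" ip-172-31-1-8 ip-172-31-15-16`, using the hostnames from slide 18. Hadoop 3 names this file `workers` instead of `slaves`.
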
  • 21. Big Data Hadoop At master: edit hdfs-site.xml
  • 22. Big Data Hadoop At master: edit hdfs-site.xml for 2 replication servers
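
The slide shows the edit only as a screenshot, so the exact content is not in this transcript. Assuming "2 replication servers" means setting the standard HDFS property `dfs.replication` to 2, the element to place inside `<configuration>` in hdfs-site.xml could be generated with a sketch like this (the helper name `replication_property` is hypothetical):

```shell
# replication_property N
# Print an hdfs-site.xml <property> element that sets dfs.replication to N.
# (Hypothetical helper; assumes the slide's edit is the dfs.replication setting.)
replication_property() {
  case $1 in
    ''|*[!0-9]*)
      echo "replication_property: expected a whole number" >&2
      return 1
      ;;
  esac
  cat <<EOF
<property>
  <name>dfs.replication</name>
  <value>$1</value>
</property>
EOF
}
```

Running `replication_property 2` prints the element for a two-replica setup; it prints to standard output rather than editing the file, so any other properties already in hdfs-site.xml stay untouched.
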
  • 23. Big Data Hadoop At all nodes: remove directories of namenode and datanode
  • 24. Big Data Hadoop At master: format namenode
  • 25. Big Data Hadoop At master: format namenode
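
Removing the old NameNode and DataNode directories is destructive, so a guarded helper may be safer than a bare `rm -rf`. The sketch below is one possible shape. The name `remove_hdfs_dirs` is hypothetical, and the directory paths are not given in the slides; they would be whatever `dfs.namenode.name.dir` and `dfs.datanode.data.dir` point to in hdfs-site.xml.

```shell
# remove_hdfs_dirs DIR...
# Delete old NameNode/DataNode storage directories before a fresh format.
# Refuses empty, relative, or slash-only paths to limit accidental damage.
# (Hypothetical helper, not taken from the slides.)
remove_hdfs_dirs() {
  for dir in "$@"; do
    case $dir in
      /*[!/]*) ;;  # absolute path containing at least one non-slash character
      *)
        echo "remove_hdfs_dirs: refusing '$dir'" >&2
        return 1
        ;;
    esac
  done
  for dir in "$@"; do
    rm -rf -- "$dir"
  done
}
```

After the directories are removed on all nodes, the format step on the master is `hdfs namenode -format`.
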
  • 26. Big Data Hadoop At master: Execute start-dfs.sh
  • 27. Big Data Hadoop At slave1: Check the jps output; DataNode should now be running
  • 28. Big Data Hadoop At slave2: Check the jps output; DataNode should now be running
  • 29. Big Data Hadoop At master: Execute start-yarn.sh
  • 30. Big Data Hadoop At slave1: Check the jps output; NodeManager should now be running
  • 31. Big Data Hadoop At slave2: Check the jps output; NodeManager should now be running
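
The jps checks above are done by eye in the workshop. A small sketch of how the same check could be scripted is shown here; the helper name `has_daemon` is hypothetical, and it assumes the usual jps output of one `<pid> <name>` pair per line.

```shell
# has_daemon NAME
# Read jps output ("<pid> <name>" per line) on stdin and succeed if a
# process called NAME is listed. (Hypothetical helper, not from the slides.)
has_daemon() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}
```

On a slave, `jps | has_daemon DataNode && echo "DataNode is running"` would confirm the result of start-dfs.sh, and the same call with `NodeManager` would confirm start-yarn.sh.
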
  • 32. Big Data Hadoop Importing data into HDFS Cluster
  • 33. Big Data Hadoop At master: import data to hdfs
  • 34. Big Data Hadoop At slave1: review imported result data from hdfs
  • 35. Big Data Hadoop At slave2: review imported result data from hdfs
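
The import commands appear only in screenshots, so the following is a sketch of a typical sequence rather than the workshop's exact commands. The helper name `import_to_hdfs` and the paths in the usage note are hypothetical; the `hdfs dfs` subcommands themselves (`-mkdir -p`, `-put`, `-ls`) are standard.

```shell
# import_to_hdfs LOCAL_PATH HDFS_DIR
# Create HDFS_DIR if needed, upload LOCAL_PATH into it, then list the result.
# (Hypothetical helper, not taken from the slides.)
import_to_hdfs() {
  local_path=$1
  hdfs_dir=$2
  hdfs dfs -mkdir -p "$hdfs_dir" &&
    hdfs dfs -put "$local_path" "$hdfs_dir" &&
    hdfs dfs -ls "$hdfs_dir"
}
```

For example, `import_to_hdfs ./input.txt /inputs` on the master. Because HDFS is a single shared namespace, running `hdfs dfs -ls /inputs` on slave1 or slave2 should then show the same file.
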
  • 36. Big Data Hadoop Running MapReduce in Cluster Mode
  • 37. Big Data Hadoop At master: execute YARN mapreduce program
  • 38. Big Data Hadoop At slave1, slave2: you will see Application Master and Yarn Child Container
  • 39. Big Data Hadoop At master: review output file from hdfs
  • 40. Big Data Hadoop At master: review output file from hdfs
  • 41. Big Data Hadoop At slave1, slave2: review output file from hdfs by using the command:
      hdfs dfs -cat /outputs/wordcount_output_dir01/part-r-00000
  • 42. Big Data Hadoop At master: review output result data from web console
  • 43. Big Data Hadoop At master: review output result data from web console
  • 44. Big Data Hadoop At master: review output result data from web console
  • 45. Big Data Hadoop At master: review output result data from web console
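
The job submission itself is shown only as a screenshot. Judging by the output directory name on slide 41, the program is probably a word count; the sketch below assumes it is the `wordcount` program from the examples jar shipped with Hadoop, which may not match the workshop's actual jar. The helper name `run_wordcount` is hypothetical.

```shell
# run_wordcount EXAMPLES_JAR INPUT_DIR OUTPUT_DIR
# Submit the "wordcount" example job, then print the reducer output.
# MapReduce will not write into an existing output directory, so check first.
# (Hypothetical helper; assumes the stock Hadoop examples jar.)
run_wordcount() {
  jar=$1
  input_dir=$2
  output_dir=$3
  if hdfs dfs -test -e "$output_dir"; then
    echo "run_wordcount: $output_dir already exists in HDFS" >&2
    return 1
  fi
  hadoop jar "$jar" wordcount "$input_dir" "$output_dir" &&
    hdfs dfs -cat "$output_dir/part-r-00000"
}
```

A call might look like `run_wordcount /path/to/hadoop-mapreduce-examples.jar /inputs /outputs/wordcount_output_dir01`, where the jar path and `/inputs` are placeholders and the output directory is the one named on slide 41.
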
  • 46. Big Data Hadoop Process Excel Worksheet
  • 47. Big Data Hadoop 1. Create Java Class using POI Libs
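  • 48. Big Data Hadoop 2. Traverse Data in Excel Spreadsheet
      // Needs Workbook, Sheet, Row, Cell from org.apache.poi.ss.usermodel,
      // XSSFWorkbook from org.apache.poi.xssf.usermodel, and java.util.Iterator.
      Workbook workbook = new XSSFWorkbook(inputStream);
      Sheet firstSheet = workbook.getSheetAt(0);  // first worksheet
      Iterator<Row> iterator = firstSheet.iterator();
      while (iterator.hasNext()) {
          Row nextRow = iterator.next();
          Iterator<Cell> cellIterator = nextRow.cellIterator();
          while (cellIterator.hasNext()) {
              Cell cell = cellIterator.next();
              // per-cell extraction (step 3) goes here
          }
      }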
  • 49. Big Data Hadoop 3. Extract Data from Excel Spreadsheet
      // The int constants below belong to older POI releases; newer releases
      // return the CellType enum, so the cases read STRING, BOOLEAN, NUMERIC.
      switch (cell.getCellType()) {
          case Cell.CELL_TYPE_STRING:
              System.out.print(cell.getStringCellValue());
              break;
          case Cell.CELL_TYPE_BOOLEAN:
              System.out.print(cell.getBooleanCellValue());
              break;
          case Cell.CELL_TYPE_NUMERIC:
              System.out.print(cell.getNumericCellValue());
              break;
      }
      To integrate further with HDFS, emit the data to the output collector.
  • 50. Big Data Hadoop 4. Close Excel Spreadsheet
      workbook.close();
      inputStream.close();
  • 51. Big Data Hadoop Excel Processing Results in Hadoop
  • 52. Big Data Hadoop Stopping Hadoop Cluster
  • 53. Big Data Hadoop At master: execute stop-yarn.sh
  • 54. Big Data Hadoop At slave1: use jps to confirm that NodeManager has stopped
  • 55. Big Data Hadoop At slave2: use jps to confirm that NodeManager has stopped
  • 56. Big Data Hadoop At master: execute stop-dfs.sh
  • 57. Big Data Hadoop At slave1: use jps to confirm that DataNode has stopped
  • 58. Big Data Hadoop At slave2: use jps to confirm that DataNode has stopped
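
The shutdown sequence above stops YARN before HDFS, the reverse of the start order. A minimal sketch that wraps both steps and scripts the jps check is given below. The helper names `stop_cluster` and `daemon_stopped` are hypothetical, and the sketch assumes `stop-yarn.sh` and `stop-dfs.sh` are on the `PATH`.

```shell
# stop_cluster
# Stop YARN first, then HDFS. Run on the master.
# (Hypothetical helper, not taken from the slides.)
stop_cluster() {
  stop-yarn.sh && stop-dfs.sh
}

# daemon_stopped NAME
# Read jps output ("<pid> <name>" per line) on stdin and succeed only if
# no process called NAME is listed.
daemon_stopped() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit found + 0 }'
}
```

After `stop_cluster` on the master, `jps | daemon_stopped NodeManager && jps | daemon_stopped DataNode` on each slave would confirm that both daemons are gone.
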
  • 59. Big Data Hadoop Thank you very much