This document outlines an introductory workshop on big data held by the BigData Community. The agenda covers an introduction to big data and the Hadoop ecosystem, Hadoop installation in standalone and pseudo-distributed modes, and a small hands-on Java application. Attendees are guided through setting up a test environment, downloading and configuring Hadoop, and testing the installation. The stated objectives are to train 120 fourth-year students from engineering and computer science faculties and to run five awareness sessions targeting five universities in the first phase.
6. Introduction
BigData Community: Intro to BigData Workshop
Objectives
● Training 120 fourth-year students from engineering and computer science faculties.
● Running five awareness sessions targeting five universities in the first phase.
9. Introduction
Call for volunteers
● Business Development
● Administration
● Development
● Administrative & Logistics
10. Introduction
Today's Roadmap
1. Big Data Introduction
2. Hadoop installation
3. Small Java application
15. BigData & Ecosystem
What is the maximum file size you have?
(Movies, files, or streaming video that you have used?)
What is the maximum download speed you get?
How much time does it take just to transfer that file?
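To put the last question in numbers, here is a rough sketch that estimates transfer time. The data size (1 TB) and the link speed (10 Mbit/s) are assumed example values, not figures from the workshop, and `transfer_hours` is a hypothetical helper.

```shell
#!/bin/sh
# Estimate transfer time in whole hours (rounded down).
# Arguments: size in decimal gigabytes, link speed in megabits per second.
transfer_hours() {
    size_gb=$1
    speed_mbps=$2
    # 1 GB = 8000 megabits; seconds = megabits / (megabits per second)
    seconds=$(( size_gb * 8000 / speed_mbps ))
    echo $(( seconds / 3600 ))
}

# Assumed example: a 1000 GB (1 TB) data set over a 10 Mbit/s link.
echo "About $(transfer_hours 1000 10) hours"
```

Under these assumptions the transfer alone takes roughly nine days, before any processing has started.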
16. BigData & Ecosystem
How do you process massive data?
Distributed System
Distributed Computing System
17. BigData & Ecosystem
Now, what is the difference between Massive Data and Big Data?
18. BigData & Ecosystem
Where Is Data Generated?
● Social media and networks (all of us are generating data)
● Scientific instruments (collecting all sorts of data)
● Mobile devices (tracking all objects all the time)
● Sensor technology and networks (measuring all kinds of data)
24. BigData & Ecosystem
Hadoop != Database
Hadoop is a Distributed Storage and Computation Framework
43. What’s Hadoop?
Hadoop is a Java-based programming framework that supports the processing of large data sets in a distributed computing environment.
BigData practice
44. Hadoop installation modes
○ Standalone mode.
○ Pseudo-distributed mode.
○ Fully distributed mode.
56. Download & Install
○ In the Linux terminal, type the following command and press ENTER:
wget http://supergsego.com/apache/hadoop/common/hadoop-1.2.1/hadoop-1.2.1-bin.tar.gz
57. Editing .bashrc file
○ #gedit ~/.bashrc
○ Add the following lines at the end of the file
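The lines shown on the original slide are not preserved in this transcript. As a minimal sketch, the entries below are an assumption: they use `/opt/hadoop` because later steps edit files under that path, and the `JAVA_HOME` value is a placeholder that must be changed to the actual JDK location. The snippet is written to a scratch file first so it can be reviewed before touching the real `~/.bashrc`.

```shell
#!/bin/sh
# Sketch only: the actual lines from the slide are not available.
# Assumptions: Hadoop lives in /opt/hadoop (the path used in later steps);
# the JAVA_HOME path is a placeholder and must match your JDK location.
bashrc_snippet=$(mktemp)
cat > "$bashrc_snippet" <<'EOF'
export JAVA_HOME=/usr/lib/jvm/default-java
export HADOOP_HOME=/opt/hadoop
export PATH="$PATH:$HADOOP_HOME/bin"
EOF

# Review the snippet, then append it to ~/.bashrc by hand:
#   cat "$bashrc_snippet" >> ~/.bashrc && . ~/.bashrc
cat "$bashrc_snippet"
```

Adding `$HADOOP_HOME/bin` to `PATH` is what lets commands such as `hadoop` and `start-all.sh` be run without typing their full path.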
58. Main Installation
○ #tar -zxvf hadoop-1.2.1-bin.tar.gz
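The later steps edit files under `/opt/hadoop`, but the transcript does not show how the extracted directory gets there. The following is one possible sketch, not the workshop's own procedure: `install_archive` is a hypothetical helper that unpacks an archive into a temporary directory and moves its single top-level directory to a target path that does not exist yet.

```shell
#!/bin/sh
# Sketch: unpack an archive and move its top-level directory to a target path.
# install_archive is a hypothetical helper, not part of Hadoop.
# It assumes the archive contains exactly one top-level directory
# and that the target path does not exist yet.
install_archive() {
    archive=$1
    target=$2
    workdir=$(mktemp -d)
    tar -zxf "$archive" -C "$workdir" || return 1
    # Rename the extracted directory to the target path
    mv "$workdir"/* "$target" || return 1
    rmdir "$workdir"
}
```

On a real system this would be called as `install_archive hadoop-1.2.1-bin.tar.gz /opt/hadoop`, which typically requires root privileges because of the `/opt` location.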
59. Editing hadoop-env.sh
○ #gedit /opt/hadoop/conf/hadoop-env.sh
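The transcript does not show what is changed in this file. In Hadoop 1.x the usual edit is to set `JAVA_HOME`, so the sketch below assumes that is the intent. `set_java_home` is a hypothetical helper, and the JDK path is a placeholder. The demonstration runs against a scratch file rather than the real `hadoop-env.sh`.

```shell
#!/bin/sh
# Sketch: append a JAVA_HOME export to a hadoop-env.sh style file.
# set_java_home is a hypothetical helper, not part of Hadoop.
set_java_home() {
    env_file=$1
    java_home=$2
    printf 'export JAVA_HOME=%s\n' "$java_home" >> "$env_file"
}

# Demonstration on a scratch file; the JDK path is a placeholder.
env_demo=$(mktemp)
set_java_home "$env_demo" /usr/lib/jvm/default-java
cat "$env_demo"
```

Against the real installation the call would be `set_java_home /opt/hadoop/conf/hadoop-env.sh <your JDK path>`.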
60. Editing conf/*-site.xml files
○ 1- “core-site.xml” file:
○ #gedit /opt/hadoop/conf/core-site.xml
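The file contents from the slide are not preserved here. As a hedged sketch for a pseudo-distributed Hadoop 1.x setup, the file usually names the default file system through the `fs.default.name` property. The host and port below (`localhost:9000`) are assumed values, and `write_core_site` is a hypothetical helper that writes into whatever directory it is given.

```shell
#!/bin/sh
# Sketch: write a minimal pseudo-distributed core-site.xml into a directory.
# write_core_site is a hypothetical helper; host and port are assumed values.
write_core_site() {
    conf_dir=$1
    cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
}

# Demonstration in a scratch directory instead of /opt/hadoop/conf.
core_demo=$(mktemp -d)
write_core_site "$core_demo"
cat "$core_demo/core-site.xml"
```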
61. Editing conf/*-site.xml files
○ 2- “mapred-site.xml” file:
○ #gedit /opt/hadoop/conf/mapred-site.xml
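The slide's file contents are missing from this transcript. In a pseudo-distributed Hadoop 1.x setup this file typically tells MapReduce where the JobTracker runs, via the `mapred.job.tracker` property. The sketch below assumes `localhost:9001`; `write_mapred_site` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: write a minimal pseudo-distributed mapred-site.xml into a directory.
# write_mapred_site is a hypothetical helper; host and port are assumed values.
write_mapred_site() {
    conf_dir=$1
    cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF
}

# Demonstration in a scratch directory instead of /opt/hadoop/conf.
mapred_demo=$(mktemp -d)
write_mapred_site "$mapred_demo"
cat "$mapred_demo/mapred-site.xml"
```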
62. Editing conf/*-site.xml files
○ 3- “hdfs-site.xml” file:
○ #gedit /opt/hadoop/conf/hdfs-site.xml
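Again the slide's file contents are not available. On a single machine there is only one DataNode, so a common choice is to lower the block replication factor to 1 through the `dfs.replication` property. The sketch below assumes that setting; `write_hdfs_site` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: write a minimal single-node hdfs-site.xml into a directory.
# write_hdfs_site is a hypothetical helper; replication 1 is an assumed value
# that suits a machine running a single DataNode.
write_hdfs_site() {
    conf_dir=$1
    cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
}

# Demonstration in a scratch directory instead of /opt/hadoop/conf.
hdfs_demo=$(mktemp -d)
write_hdfs_site "$hdfs_demo"
cat "$hdfs_demo/hdfs-site.xml"
```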
63. Formatting the NameNode File System
○ #hadoop namenode -format
64. Starting Hadoop Daemons
○ #start-all.sh
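One way to test the installation after this step is to list the running Java processes with the JDK's `jps` tool and check that the five Hadoop 1.x daemons are present. The sketch below is an illustration rather than part of the workshop material: `missing_daemons` is a hypothetical helper that reads `jps`-style output on standard input and prints the name of each daemon it does not find.

```shell
#!/bin/sh
# Sketch: report which Hadoop 1.x daemons are missing from jps-style output.
# missing_daemons is a hypothetical helper; it reads "PID Name" lines on stdin.
missing_daemons() {
    running=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # -w matches whole words, so SecondaryNameNode does not count as NameNode
        echo "$running" | grep -qw "$daemon" || echo "$daemon"
    done
}

# On a real installation: jps | missing_daemons
# Demonstration with sample output in which the DataNode failed to start:
printf '101 NameNode\n102 SecondaryNameNode\n103 JobTracker\n104 TaskTracker\n105 Jps\n' | missing_daemons
```

An empty result means all five daemons were found. Any name printed points to the daemon whose log file is worth checking first.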