SlideShare a Scribd company logo
Data Science Bootcamp, Day 2
Presented By:
Chetan Khatri, Volunteer Teaching assistant, Data Science Lab.
Guidance By:
Prof. Devji D. Chhanga, University of Kachchh.
Agenda
Understanding Git.
Understanding Apache Maven.
Hello World Java Program with Apache Maven.
Understanding of Hadoop Administrative Commands.
WordCount Hadoop Program on Hadoop Cluster with Maven.
Git with Github
● Github: Repository storage where you can store your source code and share
with team member work interactively.
● Installation: sudo apt-get install git
● Steps TODO:
1. Create Repository
2. Clone - Copy someone else's repository
3. Commit - Ready to submit your code to repository.
Let’s have Demo with Git
● Create Repository at Github named hadoopdemo
● Cloning Repository: git clone https://github.com/dskskv/hadoopdemo.git
● Configure github with your credentials:
git config --global user.email "you@example.com"
git config --global user.name "Your Name"
Add individual file: sudo git add README.md
for adding every files: sudo git add .
Let’s have Demo with Git (Conti…)
commit command - sudo git commit -m "Comment anything"
Submit request to github repository with whatever has been added:
sudo git origin master
pull - is to get latest updated code from repository
Example : git pull https://github.com/dskskv/hadoopdemo.git
Git Branches:
Are Different Modules of the Repository, Such as Development, Test, Production phase of the
software development.
Master branch has always updated code.
Understanding Apache Maven
Apache Maven is Build Tool for Java, where you can use Other Artifacts(Jar files
written by someone else) and build your Jar file which contains all other’s
added before.
Maven Life Cycle:
Create Maven Project
Update Maven Project
Write Java Code
Maven Clean
Maven Build (For building your Jar file)
Understanding Hadoop Administrative Commands
1. Cloning github cccs936 repository
git clone https://github.com/dskskv/CCCS936.git
2. Start Hadoop Cluster
sbin/start-dfs.sh
sbin/start-yarn.sh
3. Check Hadoop Version
hadoop version
4. Check all the options under hadoop command
hadoop
5. Create Directory as "dskskv" at HDFS
hadoop fs -mkdir /dskskv
Understanding Hadoop Administrative Commands
6. List out the contents of dskskv object inside HDFS
hadoop fs -ls /dskskv
7. Create Text file
sudo gedit inputfile.txt
8. Put text file inside HDFS block
hadoop fs -put inputfile.txt /dskskv
9. Read the content of HDFS textfile object
hadoop fs -cat /dskskv/inputfile.txt
Understanding Hadoop Administrative Commands
10. hadoop deprecated, use hdfs also for the same operations.
hdfs dfs -mkdir /chetan
hdfs dfs -put inputfile.txt /chetan
hdfs dfs -cat /chetan/inputfile.txt
11. Deleting file from HDFS
hadoop fs -rm /dskskv/inputfile.txt
12. Deleting Directory from HDFS
hadoop fs -rm -r /dskskv
WordCount Hadoop Program on Hadoop Cluster with
Maven
1) Login as a Hadoop User:
su hduser
2) Start hadoop deamon services
sbin/start-dfs.sh
sbin/start-yarn.sh
3) Check whether all deamon services are up or not
jps
4) Create directory in HDFS, Note: make sure wherever you are in the console , Hadoop user should
have previlegies to access it.
hadoop fs -mkdir /input
5) Transfer textfile to HDFS
hadoop fs -put inputfile.txt /input
WordCount Hadoop Program on Hadoop Cluster with
Maven
6) Check whether file is transferred successfully
hadoop fs -ls /input
7) execute hadoop job by providing Hadoop Program executable Jar file and input directory path where
text file is there and output directory path where you are looking to store process data.
hadoop jar WordCountDSKSKV-0.0.1-SNAPSHOT.jar /input /output
8) Check Processed Directory has processed files ?
hadoop fs -ls /output
9) Read your desired output from Hadoop Job.
hadoop fs -cat /output/part-r-00000

More Related Content

What's hot

時代在變 Docker 要會:台北 Docker 一日入門篇
時代在變 Docker 要會:台北 Docker 一日入門篇時代在變 Docker 要會:台北 Docker 一日入門篇
時代在變 Docker 要會:台北 Docker 一日入門篇
Philip Zheng
 
容器與資料科學應用
容器與資料科學應用容器與資料科學應用
容器與資料科學應用
Philip Zheng
 
桃園市教育局Docker技術入門與實作
桃園市教育局Docker技術入門與實作桃園市教育局Docker技術入門與實作
桃園市教育局Docker技術入門與實作
Philip Zheng
 
Drush in the Composer Era
Drush in the Composer EraDrush in the Composer Era
Drush in the Composer Era
Pantheon
 
Using Containers for Continuous Integration and Continuous Delivery
Using Containers for Continuous Integration and Continuous DeliveryUsing Containers for Continuous Integration and Continuous Delivery
Using Containers for Continuous Integration and Continuous Delivery
Carlos Sanchez
 
Docker for mere mortals
Docker for mere mortalsDocker for mere mortals
Docker for mere mortals
Henryk Konsek
 
Introduzione a GitHub Actions (beta)
Introduzione a GitHub Actions (beta)Introduzione a GitHub Actions (beta)
Introduzione a GitHub Actions (beta)
Giulio Vian
 
Карманный PaaS с Dokku (Александр Белецкий)
Карманный PaaS с Dokku (Александр Белецкий)Карманный PaaS с Dokku (Александр Белецкий)
Карманный PaaS с Dokku (Александр Белецкий)
GeeksLab Odessa
 
Production FS: Adapt or die - Claudia Beresford & Tiago Scolar
Production FS: Adapt or die - Claudia Beresford & Tiago ScolarProduction FS: Adapt or die - Claudia Beresford & Tiago Scolar
Production FS: Adapt or die - Claudia Beresford & Tiago Scolar
Paris Container Day
 
Guava
GuavaGuava
Guava
fbenault
 
Cloud Foundry V2 | Intermediate Deep Dive
Cloud Foundry V2 | Intermediate Deep DiveCloud Foundry V2 | Intermediate Deep Dive
Cloud Foundry V2 | Intermediate Deep Dive
Kazuto Kusama
 
Docker研習營
Docker研習營Docker研習營
Docker研習營
Philip Zheng
 
Introduction to Project atomic (CentOS Dojo Bangalore)
Introduction to Project atomic (CentOS Dojo Bangalore)Introduction to Project atomic (CentOS Dojo Bangalore)
Introduction to Project atomic (CentOS Dojo Bangalore)
Lalatendu Mohanty
 
Drupal in Libraries
Drupal in LibrariesDrupal in Libraries
Drupal in Libraries
Cary Gordon
 
Composer Tools & Frameworks for Drupal
Composer Tools & Frameworks for DrupalComposer Tools & Frameworks for Drupal
Composer Tools & Frameworks for Drupal
Pantheon
 
Using Git with Drupal
Using Git with DrupalUsing Git with Drupal
Using Git with Drupal
Ryan Cross
 
Plone in news media
Plone in news mediaPlone in news media
Plone in news media
Héctor Velarde
 
Challenges of container configuration
Challenges of container configurationChallenges of container configuration
Challenges of container configuration
lutter
 
Automating Mendix application deployments with Nix
Automating Mendix application deployments with NixAutomating Mendix application deployments with Nix
Automating Mendix application deployments with Nix
Sander van der Burg
 
Docker for Developers: Dev, Test, Deploy @ BucksCo Devops at MeetMe HQ
Docker for Developers: Dev, Test, Deploy @ BucksCo Devops at MeetMe HQDocker for Developers: Dev, Test, Deploy @ BucksCo Devops at MeetMe HQ
Docker for Developers: Dev, Test, Deploy @ BucksCo Devops at MeetMe HQ
Erica Windisch
 

What's hot (20)

時代在變 Docker 要會:台北 Docker 一日入門篇
時代在變 Docker 要會:台北 Docker 一日入門篇時代在變 Docker 要會:台北 Docker 一日入門篇
時代在變 Docker 要會:台北 Docker 一日入門篇
 
容器與資料科學應用
容器與資料科學應用容器與資料科學應用
容器與資料科學應用
 
桃園市教育局Docker技術入門與實作
桃園市教育局Docker技術入門與實作桃園市教育局Docker技術入門與實作
桃園市教育局Docker技術入門與實作
 
Drush in the Composer Era
Drush in the Composer EraDrush in the Composer Era
Drush in the Composer Era
 
Using Containers for Continuous Integration and Continuous Delivery
Using Containers for Continuous Integration and Continuous DeliveryUsing Containers for Continuous Integration and Continuous Delivery
Using Containers for Continuous Integration and Continuous Delivery
 
Docker for mere mortals
Docker for mere mortalsDocker for mere mortals
Docker for mere mortals
 
Introduzione a GitHub Actions (beta)
Introduzione a GitHub Actions (beta)Introduzione a GitHub Actions (beta)
Introduzione a GitHub Actions (beta)
 
Карманный PaaS с Dokku (Александр Белецкий)
Карманный PaaS с Dokku (Александр Белецкий)Карманный PaaS с Dokku (Александр Белецкий)
Карманный PaaS с Dokku (Александр Белецкий)
 
Production FS: Adapt or die - Claudia Beresford & Tiago Scolar
Production FS: Adapt or die - Claudia Beresford & Tiago ScolarProduction FS: Adapt or die - Claudia Beresford & Tiago Scolar
Production FS: Adapt or die - Claudia Beresford & Tiago Scolar
 
Guava
GuavaGuava
Guava
 
Cloud Foundry V2 | Intermediate Deep Dive
Cloud Foundry V2 | Intermediate Deep DiveCloud Foundry V2 | Intermediate Deep Dive
Cloud Foundry V2 | Intermediate Deep Dive
 
Docker研習營
Docker研習營Docker研習營
Docker研習營
 
Introduction to Project atomic (CentOS Dojo Bangalore)
Introduction to Project atomic (CentOS Dojo Bangalore)Introduction to Project atomic (CentOS Dojo Bangalore)
Introduction to Project atomic (CentOS Dojo Bangalore)
 
Drupal in Libraries
Drupal in LibrariesDrupal in Libraries
Drupal in Libraries
 
Composer Tools & Frameworks for Drupal
Composer Tools & Frameworks for DrupalComposer Tools & Frameworks for Drupal
Composer Tools & Frameworks for Drupal
 
Using Git with Drupal
Using Git with DrupalUsing Git with Drupal
Using Git with Drupal
 
Plone in news media
Plone in news mediaPlone in news media
Plone in news media
 
Challenges of container configuration
Challenges of container configurationChallenges of container configuration
Challenges of container configuration
 
Automating Mendix application deployments with Nix
Automating Mendix application deployments with NixAutomating Mendix application deployments with Nix
Automating Mendix application deployments with Nix
 
Docker for Developers: Dev, Test, Deploy @ BucksCo Devops at MeetMe HQ
Docker for Developers: Dev, Test, Deploy @ BucksCo Devops at MeetMe HQDocker for Developers: Dev, Test, Deploy @ BucksCo Devops at MeetMe HQ
Docker for Developers: Dev, Test, Deploy @ BucksCo Devops at MeetMe HQ
 

Viewers also liked

Data science bootcamp day 3
Data science bootcamp day 3Data science bootcamp day 3
Data science bootcamp day 3
Chetan Khatri
 
Data science bootcamp day1
Data science bootcamp day1Data science bootcamp day1
Data science bootcamp day1
Chetan Khatri
 
Pycon india-2016-success-story
Pycon india-2016-success-storyPycon india-2016-success-story
Pycon india-2016-success-story
Chetan Khatri
 
Alumni talk-university-of-kachchh
Alumni talk-university-of-kachchhAlumni talk-university-of-kachchh
Alumni talk-university-of-kachchh
Chetan Khatri
 
Internet of things initiative-cskskv
Internet of things   initiative-cskskvInternet of things   initiative-cskskv
Internet of things initiative-cskskv
Chetan Khatri
 
Think Machine Learning with Scikit-Learn (Python)
Think Machine Learning with Scikit-Learn (Python)Think Machine Learning with Scikit-Learn (Python)
Think Machine Learning with Scikit-Learn (Python)
Chetan Khatri
 
Data Analytics with Pandas and Numpy - Python
Data Analytics with Pandas and Numpy - PythonData Analytics with Pandas and Numpy - Python
Data Analytics with Pandas and Numpy - Python
Chetan Khatri
 
Feature Release
Feature ReleaseFeature Release
Feature Release
Charlisa Bailey
 
Filme terror 2013
Filme terror 2013Filme terror 2013
Filme terror 2013
Rafael Wolf
 
8617 Taylor Road
8617 Taylor Road8617 Taylor Road
8617 Taylor Road
Christopher Sandine
 
News Release
News ReleaseNews Release
News Release
Charlisa Bailey
 
Job fair at seattle
Job fair at seattleJob fair at seattle
Job fair at seattle
zeenatkassam
 
Publication plan slideshare
Publication plan slidesharePublication plan slideshare
Publication plan slideshare
caitlinhardinASmedia
 
Mart6ha
Mart6haMart6ha
Romance powerpoint
Romance powerpointRomance powerpoint
Romance powerpoint
betha2media
 
Davidson Capital - NOAH15 London
Davidson Capital - NOAH15 LondonDavidson Capital - NOAH15 London
Davidson Capital - NOAH15 London
NOAH Advisors
 
Numpy, the Python foundation for number crunching
Numpy, the Python foundation for number crunchingNumpy, the Python foundation for number crunching
Numpy, the Python foundation for number crunching
Data Science London
 

Viewers also liked (17)

Data science bootcamp day 3
Data science bootcamp day 3Data science bootcamp day 3
Data science bootcamp day 3
 
Data science bootcamp day1
Data science bootcamp day1Data science bootcamp day1
Data science bootcamp day1
 
Pycon india-2016-success-story
Pycon india-2016-success-storyPycon india-2016-success-story
Pycon india-2016-success-story
 
Alumni talk-university-of-kachchh
Alumni talk-university-of-kachchhAlumni talk-university-of-kachchh
Alumni talk-university-of-kachchh
 
Internet of things initiative-cskskv
Internet of things   initiative-cskskvInternet of things   initiative-cskskv
Internet of things initiative-cskskv
 
Think Machine Learning with Scikit-Learn (Python)
Think Machine Learning with Scikit-Learn (Python)Think Machine Learning with Scikit-Learn (Python)
Think Machine Learning with Scikit-Learn (Python)
 
Data Analytics with Pandas and Numpy - Python
Data Analytics with Pandas and Numpy - PythonData Analytics with Pandas and Numpy - Python
Data Analytics with Pandas and Numpy - Python
 
Feature Release
Feature ReleaseFeature Release
Feature Release
 
Filme terror 2013
Filme terror 2013Filme terror 2013
Filme terror 2013
 
8617 Taylor Road
8617 Taylor Road8617 Taylor Road
8617 Taylor Road
 
News Release
News ReleaseNews Release
News Release
 
Job fair at seattle
Job fair at seattleJob fair at seattle
Job fair at seattle
 
Publication plan slideshare
Publication plan slidesharePublication plan slideshare
Publication plan slideshare
 
Mart6ha
Mart6haMart6ha
Mart6ha
 
Romance powerpoint
Romance powerpointRomance powerpoint
Romance powerpoint
 
Davidson Capital - NOAH15 London
Davidson Capital - NOAH15 LondonDavidson Capital - NOAH15 London
Davidson Capital - NOAH15 London
 
Numpy, the Python foundation for number crunching
Numpy, the Python foundation for number crunchingNumpy, the Python foundation for number crunching
Numpy, the Python foundation for number crunching
 

Similar to Data science bootcamp day2

Apache HDFS - Lab Assignment
Apache HDFS - Lab AssignmentApache HDFS - Lab Assignment
Apache HDFS - Lab Assignment
Farzad Nozarian
 
Word count program execution steps in hadoop
Word count program execution steps in hadoopWord count program execution steps in hadoop
Word count program execution steps in hadoop
jijukjoseph
 
Big data using Hadoop, Hive, Sqoop with Installation
Big data using Hadoop, Hive, Sqoop with InstallationBig data using Hadoop, Hive, Sqoop with Installation
Big data using Hadoop, Hive, Sqoop with Installation
mellempudilavanya999
 
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop lab s...
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop lab s...IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop lab s...
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop lab s...
Leons Petražickis
 
Run wordcount job (hadoop)
Run wordcount job (hadoop)Run wordcount job (hadoop)
Run wordcount job (hadoop)
valeri kopaleishvili
 
Hdfs lab hands-on
Hdfs lab hands-on Hdfs lab hands-on
Hdfs lab hands-on
GrowwithParmesh
 
2012 09-08-josug-jeff
2012 09-08-josug-jeff2012 09-08-josug-jeff
2012 09-08-josug-jeff
Zheng (Jeff) Xu
 
Top 10 Hadoop Shell Commands
Top 10 Hadoop Shell Commands Top 10 Hadoop Shell Commands
Top 10 Hadoop Shell Commands
SimoniShah6
 
Assignment 1 MapReduce With Hadoop
Assignment 1  MapReduce With HadoopAssignment 1  MapReduce With Hadoop
Assignment 1 MapReduce With Hadoop
Allison Thompson
 
Lean Drupal Repositories with Composer and Drush
Lean Drupal Repositories with Composer and DrushLean Drupal Repositories with Composer and Drush
Lean Drupal Repositories with Composer and Drush
Pantheon
 
BIGDATA ANALYTICS LAB MANUAL final.pdf
BIGDATA  ANALYTICS LAB MANUAL final.pdfBIGDATA  ANALYTICS LAB MANUAL final.pdf
BIGDATA ANALYTICS LAB MANUAL final.pdf
ANJALAI AMMAL MAHALINGAM ENGINEERING COLLEGE
 
Hadoop Installation presentation
Hadoop Installation presentationHadoop Installation presentation
Hadoop Installation presentation
puneet yadav
 
Single node hadoop cluster installation
Single node hadoop cluster installation Single node hadoop cluster installation
Single node hadoop cluster installation
Mahantesh Angadi
 
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Nag Arvind Gudiseva
 
Hdfs java api
Hdfs java apiHdfs java api
Hdfs java api
Trieu Dao Minh
 
Hadoop_content_by_sasidhar2
Hadoop_content_by_sasidhar2Hadoop_content_by_sasidhar2
Hadoop installation on windows
Hadoop installation on windows Hadoop installation on windows
Hadoop installation on windows
habeebulla g
 
Hadoop installation steps
Hadoop installation stepsHadoop installation steps
Hadoop installation steps
Mayank Sharma
 
GitHub Basics - Derek Bable
GitHub Basics - Derek BableGitHub Basics - Derek Bable
GitHub Basics - Derek Bable
"FENG "GEORGE"" YU
 
Single node setup
Single node setupSingle node setup
Single node setup
KBCHOW123
 

Similar to Data science bootcamp day2 (20)

Apache HDFS - Lab Assignment
Apache HDFS - Lab AssignmentApache HDFS - Lab Assignment
Apache HDFS - Lab Assignment
 
Word count program execution steps in hadoop
Word count program execution steps in hadoopWord count program execution steps in hadoop
Word count program execution steps in hadoop
 
Big data using Hadoop, Hive, Sqoop with Installation
Big data using Hadoop, Hive, Sqoop with InstallationBig data using Hadoop, Hive, Sqoop with Installation
Big data using Hadoop, Hive, Sqoop with Installation
 
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop lab s...
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop lab s...IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop lab s...
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop lab s...
 
Run wordcount job (hadoop)
Run wordcount job (hadoop)Run wordcount job (hadoop)
Run wordcount job (hadoop)
 
Hdfs lab hands-on
Hdfs lab hands-on Hdfs lab hands-on
Hdfs lab hands-on
 
2012 09-08-josug-jeff
2012 09-08-josug-jeff2012 09-08-josug-jeff
2012 09-08-josug-jeff
 
Top 10 Hadoop Shell Commands
Top 10 Hadoop Shell Commands Top 10 Hadoop Shell Commands
Top 10 Hadoop Shell Commands
 
Assignment 1 MapReduce With Hadoop
Assignment 1  MapReduce With HadoopAssignment 1  MapReduce With Hadoop
Assignment 1 MapReduce With Hadoop
 
Lean Drupal Repositories with Composer and Drush
Lean Drupal Repositories with Composer and DrushLean Drupal Repositories with Composer and Drush
Lean Drupal Repositories with Composer and Drush
 
BIGDATA ANALYTICS LAB MANUAL final.pdf
BIGDATA  ANALYTICS LAB MANUAL final.pdfBIGDATA  ANALYTICS LAB MANUAL final.pdf
BIGDATA ANALYTICS LAB MANUAL final.pdf
 
Hadoop Installation presentation
Hadoop Installation presentationHadoop Installation presentation
Hadoop Installation presentation
 
Single node hadoop cluster installation
Single node hadoop cluster installation Single node hadoop cluster installation
Single node hadoop cluster installation
 
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
 
Hdfs java api
Hdfs java apiHdfs java api
Hdfs java api
 
Hadoop_content_by_sasidhar2
Hadoop_content_by_sasidhar2Hadoop_content_by_sasidhar2
Hadoop_content_by_sasidhar2
 
Hadoop installation on windows
Hadoop installation on windows Hadoop installation on windows
Hadoop installation on windows
 
Hadoop installation steps
Hadoop installation stepsHadoop installation steps
Hadoop installation steps
 
GitHub Basics - Derek Bable
GitHub Basics - Derek BableGitHub Basics - Derek Bable
GitHub Basics - Derek Bable
 
Single node setup
Single node setupSingle node setup
Single node setup
 

More from Chetan Khatri

Data Science for Beginner by Chetan Khatri and Deptt. of Computer Science, Ka...
Data Science for Beginner by Chetan Khatri and Deptt. of Computer Science, Ka...Data Science for Beginner by Chetan Khatri and Deptt. of Computer Science, Ka...
Data Science for Beginner by Chetan Khatri and Deptt. of Computer Science, Ka...
Chetan Khatri
 
Demystify Information Security & Threats for Data-Driven Platforms With Cheta...
Demystify Information Security & Threats for Data-Driven Platforms With Cheta...Demystify Information Security & Threats for Data-Driven Platforms With Cheta...
Demystify Information Security & Threats for Data-Driven Platforms With Cheta...
Chetan Khatri
 
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-AirflowPyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
Chetan Khatri
 
ScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in productionScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in production
Chetan Khatri
 
No more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionNo more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in production
Chetan Khatri
 
PyConLT19-No_more_struggles_with_Apache_Spark_(PySpark)_workloads_in_production
PyConLT19-No_more_struggles_with_Apache_Spark_(PySpark)_workloads_in_productionPyConLT19-No_more_struggles_with_Apache_Spark_(PySpark)_workloads_in_production
PyConLT19-No_more_struggles_with_Apache_Spark_(PySpark)_workloads_in_production
Chetan Khatri
 
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaAutomate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Chetan Khatri
 
HBaseConAsia 2018 - Scaling 30 TB's of Data lake with Apache HBase and Scala ...
HBaseConAsia 2018 - Scaling 30 TB's of Data lake with Apache HBase and Scala ...HBaseConAsia 2018 - Scaling 30 TB's of Data lake with Apache HBase and Scala ...
HBaseConAsia 2018 - Scaling 30 TB's of Data lake with Apache HBase and Scala ...
Chetan Khatri
 
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
Chetan Khatri
 
An Introduction to Spark with Scala
An Introduction to Spark with ScalaAn Introduction to Spark with Scala
An Introduction to Spark with Scala
Chetan Khatri
 
HBase with Apache Spark POC Demo
HBase with Apache Spark POC DemoHBase with Apache Spark POC Demo
HBase with Apache Spark POC Demo
Chetan Khatri
 
HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...
HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...
HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...
Chetan Khatri
 
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
Chetan Khatri
 
Fossasia 2018-chetan-khatri
Fossasia 2018-chetan-khatriFossasia 2018-chetan-khatri
Fossasia 2018-chetan-khatri
Chetan Khatri
 
Fossasia ai-ml technologies and application for product development-chetan kh...
Fossasia ai-ml technologies and application for product development-chetan kh...Fossasia ai-ml technologies and application for product development-chetan kh...
Fossasia ai-ml technologies and application for product development-chetan kh...
Chetan Khatri
 
An Introduction Linear Algebra for Neural Networks and Deep learning
An Introduction Linear Algebra for Neural Networks and Deep learningAn Introduction Linear Algebra for Neural Networks and Deep learning
An Introduction Linear Algebra for Neural Networks and Deep learning
Chetan Khatri
 
Introduction to Computer Science
Introduction to Computer ScienceIntroduction to Computer Science
Introduction to Computer Science
Chetan Khatri
 
An introduction to Git with Atlassian Suite
An introduction to Git with Atlassian SuiteAn introduction to Git with Atlassian Suite
An introduction to Git with Atlassian Suite
Chetan Khatri
 
Think machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetanThink machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetan
Chetan Khatri
 
A step towards machine learning at accionlabs
A step towards machine learning at accionlabsA step towards machine learning at accionlabs
A step towards machine learning at accionlabs
Chetan Khatri
 

More from Chetan Khatri (20)

Data Science for Beginner by Chetan Khatri and Deptt. of Computer Science, Ka...
Data Science for Beginner by Chetan Khatri and Deptt. of Computer Science, Ka...Data Science for Beginner by Chetan Khatri and Deptt. of Computer Science, Ka...
Data Science for Beginner by Chetan Khatri and Deptt. of Computer Science, Ka...
 
Demystify Information Security & Threats for Data-Driven Platforms With Cheta...
Demystify Information Security & Threats for Data-Driven Platforms With Cheta...Demystify Information Security & Threats for Data-Driven Platforms With Cheta...
Demystify Information Security & Threats for Data-Driven Platforms With Cheta...
 
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-AirflowPyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
 
ScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in productionScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in production
 
No more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionNo more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in production
 
PyConLT19-No_more_struggles_with_Apache_Spark_(PySpark)_workloads_in_production
PyConLT19-No_more_struggles_with_Apache_Spark_(PySpark)_workloads_in_productionPyConLT19-No_more_struggles_with_Apache_Spark_(PySpark)_workloads_in_production
PyConLT19-No_more_struggles_with_Apache_Spark_(PySpark)_workloads_in_production
 
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaAutomate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
 
HBaseConAsia 2018 - Scaling 30 TB's of Data lake with Apache HBase and Scala ...
HBaseConAsia 2018 - Scaling 30 TB's of Data lake with Apache HBase and Scala ...HBaseConAsia 2018 - Scaling 30 TB's of Data lake with Apache HBase and Scala ...
HBaseConAsia 2018 - Scaling 30 TB's of Data lake with Apache HBase and Scala ...
 
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
 
An Introduction to Spark with Scala
An Introduction to Spark with ScalaAn Introduction to Spark with Scala
An Introduction to Spark with Scala
 
HBase with Apache Spark POC Demo
HBase with Apache Spark POC DemoHBase with Apache Spark POC Demo
HBase with Apache Spark POC Demo
 
HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...
HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...
HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...
 
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
 
Fossasia 2018-chetan-khatri
Fossasia 2018-chetan-khatriFossasia 2018-chetan-khatri
Fossasia 2018-chetan-khatri
 
Fossasia ai-ml technologies and application for product development-chetan kh...
Fossasia ai-ml technologies and application for product development-chetan kh...Fossasia ai-ml technologies and application for product development-chetan kh...
Fossasia ai-ml technologies and application for product development-chetan kh...
 
An Introduction Linear Algebra for Neural Networks and Deep learning
An Introduction Linear Algebra for Neural Networks and Deep learningAn Introduction Linear Algebra for Neural Networks and Deep learning
An Introduction Linear Algebra for Neural Networks and Deep learning
 
Introduction to Computer Science
Introduction to Computer ScienceIntroduction to Computer Science
Introduction to Computer Science
 
An introduction to Git with Atlassian Suite
An introduction to Git with Atlassian SuiteAn introduction to Git with Atlassian Suite
An introduction to Git with Atlassian Suite
 
Think machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetanThink machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetan
 
A step towards machine learning at accionlabs
A step towards machine learning at accionlabsA step towards machine learning at accionlabs
A step towards machine learning at accionlabs
 

Recently uploaded

一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
lzdvtmy8
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
hqfek
 
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
asyed10
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
Cell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docxCell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docx
vasanthatpuram
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
 
Sample Devops SRE Product Companies .pdf
Sample Devops SRE  Product Companies .pdfSample Devops SRE  Product Companies .pdf
Sample Devops SRE Product Companies .pdf
Vineet
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
Vietnam Cotton & Spinning Association
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
oaxefes
 
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
slg6lamcq
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Kaxil Naik
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
ytypuem
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
hyfjgavov
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
xclpvhuk
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 

Recently uploaded (20)

一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
 
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
Cell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docxCell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docx
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 
Sample Devops SRE Product Companies .pdf
Sample Devops SRE  Product Companies .pdfSample Devops SRE  Product Companies .pdf
Sample Devops SRE Product Companies .pdf
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
 
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 

Data science bootcamp day2

  • 1. Data Science Bootcamp, Day 2 Presented By: Chetan Khatri, Volunteer Teaching assistant, Data Science Lab. Guidance By: Prof. Devji D. Chhanga, University of Kachchh.
  • 2. Agenda Understanding Git. Understanding Apache Maven. Hello World Java Program with Apache Maven. Understanding of Hadoop Administrative Commands. WordCount Hadoop Program on Hadoop Cluster with Maven.
  • 3. Git with Github ● Github: Repository storage where you can store your source code and share with team member work interactively. ● Installation: sudo apt-get install git ● Steps TODO: 1. Create Repository 2. Clone - Copy someone else's repository 3. Commit - Ready to submit your code to repository.
  • 4. Let’s have Demo with Git ● Create Repository at Github named hadoopdemo ● Cloning Repository: git clone https://github.com/dskskv/hadoopdemo.git ● Configure github with your credentials: git config --global user.email "you@example.com" git config --global user.name "Your Name" Add individual file: sudo git add README.md for adding every files: sudo git add .
  • 5. Let’s have Demo with Git (Conti…) commit command - sudo git commit -m "Comment anything" Submit request to github repository with whatever has been added: sudo git origin master pull - is to get latest updated code from repository Example : git pull https://github.com/dskskv/hadoopdemo.git Git Branches: Are Different Modules of the Repository, Such as Development, Test, Production phase of the software development. Master branch has always updated code.
  • 6. Understanding Apache Maven Apache Maven is Build Tool for Java, where you can use Other Artifacts(Jar files written by someone else) and build your Jar file which contains all other’s added before. Maven Life Cycle: Create Maven Project Update Maven Project Write Java Code Maven Clean Maven Build (For building your Jar file)
  • 7. Understanding Hadoop Administrative Commands 1. Cloning github cccs936 repository git clone https://github.com/dskskv/CCCS936.git 2. Start Hadoop Cluster sbin/start-dfs.sh sbin/start-yarn.sh 3. Check Hadoop Version hadoop version 4. Check all the options under hadoop command hadoop 5. Create Directory as "dskskv" at HDFS hadoop fs -mkdir /dskskv
  • 8. Understanding Hadoop Administrative Commands 6. List out the contents of dskskv object inside HDFS hadoop fs -ls /dskskv 7. Create Text file sudo gedit inputfile.txt 8. Put text file inside HDFS block hadoop fs -put inputfile.txt /dskskv 9. Read the content of HDFS textfile object hadoop fs -cat /dskskv/inputfile.txt
  • 9. Understanding Hadoop Administrative Commands 10. hadoop deprecated, use hdfs also for the same operations. hdfs dfs -mkdir /chetan hdfs dfs -put inputfile.txt /chetan hdfs dfs -cat /chetan/inputfile.txt 11. Deleting file from HDFS hadoop fs -rm /dskskv/inputfile.txt 12. Deleting Directory from HDFS hadoop fs -rm -r /dskskv
  • 10. WordCount Hadoop Program on Hadoop Cluster with Maven 1) Login as a Hadoop User: su hduser 2) Start hadoop deamon services sbin/start-dfs.sh sbin/start-yarn.sh 3) Check whether all deamon services are up or not jps 4) Create directory in HDFS, Note: make sure wherever you are in the console , Hadoop user should have previlegies to access it. hadoop fs -mkdir /input 5) Transfer textfile to HDFS hadoop fs -put inputfile.txt /input
  • 11. WordCount Hadoop Program on Hadoop Cluster with Maven 6) Check whether file is transferred successfully hadoop fs -ls /input 7) execute hadoop job by providing Hadoop Program executable Jar file and input directory path where text file is there and output directory path where you are looking to store process data. hadoop jar WordCountDSKSKV-0.0.1-SNAPSHOT.jar /input /output 8) Check Processed Directory has processed files ? hadoop fs -ls /output 9) Read your desired output from Hadoop Job. hadoop fs -cat /output/part-r-00000