SlideShare a Scribd company logo
Data Science Bootcamp, Day 2
Presented By:
Chetan Khatri, Volunteer Teaching assistant, Data Science Lab.
Guidance By:
Prof. Devji D. Chhanga, University of Kachchh.
Agenda
Understanding Git.
Understanding Apache Maven.
Hello World Java Program with Apache Maven.
Understanding of Hadoop Administrative Commands.
WordCount Hadoop Program on Hadoop Cluster with Maven.
Git with Github
● Github: Repository storage where you can store your source code and share
with team member work interactively.
● Installation: sudo apt-get install git
● Steps TODO:
1. Create Repository
2. Clone - Copy someone else's repository
3. Commit - Ready to submit your code to repository.
Let’s have Demo with Git
● Create Repository at Github named hadoopdemo
● Cloning Repository: git clone https://github.com/dskskv/hadoopdemo.git
● Configure github with your credentials:
git config --global user.email "you@example.com"
git config --global user.name "Your Name"
Add individual file: sudo git add README.md
for adding every files: sudo git add .
Let’s have Demo with Git (Conti…)
commit command - sudo git commit -m "Comment anything"
Submit request to github repository with whatever has been added:
sudo git origin master
pull - is to get latest updated code from repository
Example : git pull https://github.com/dskskv/hadoopdemo.git
Git Branches:
Are Different Modules of the Repository, Such as Development, Test, Production phase of the
software development.
Master branch has always updated code.
Understanding Apache Maven
Apache Maven is Build Tool for Java, where you can use Other Artifacts(Jar files
written by someone else) and build your Jar file which contains all other’s
added before.
Maven Life Cycle:
Create Maven Project
Update Maven Project
Write Java Code
Maven Clean
Maven Build (For building your Jar file)
Understanding Hadoop Administrative Commands
1. Cloning github cccs936 repository
git clone https://github.com/dskskv/CCCS936.git
2. Start Hadoop Cluster
sbin/start-dfs.sh
sbin/start-yarn.sh
3. Check Hadoop Version
hadoop version
4. Check all the options under hadoop command
hadoop
5. Create Directory as "dskskv" at HDFS
hadoop fs -mkdir /dskskv
Understanding Hadoop Administrative Commands
6. List out the contents of dskskv object inside HDFS
hadoop fs -ls /dskskv
7. Create Text file
sudo gedit inputfile.txt
8. Put text file inside HDFS block
hadoop fs -put inputfile.txt /dskskv
9. Read the content of HDFS textfile object
hadoop fs -cat /dskskv/inputfile.txt
Understanding Hadoop Administrative Commands
10. hadoop deprecated, use hdfs also for the same operations.
hdfs dfs -mkdir /chetan
hdfs dfs -put inputfile.txt /chetan
hdfs dfs -cat /chetan/inputfile.txt
11. Deleting file from HDFS
hadoop fs -rm /dskskv/inputfile.txt
12. Deleting Directory from HDFS
hadoop fs -rm -r /dskskv
WordCount Hadoop Program on Hadoop Cluster with
Maven
1) Login as a Hadoop User:
su hduser
2) Start hadoop deamon services
sbin/start-dfs.sh
sbin/start-yarn.sh
3) Check whether all deamon services are up or not
jps
4) Create directory in HDFS, Note: make sure wherever you are in the console , Hadoop user should
have previlegies to access it.
hadoop fs -mkdir /input
5) Transfer textfile to HDFS
hadoop fs -put inputfile.txt /input
WordCount Hadoop Program on Hadoop Cluster with
Maven
6) Check whether file is transferred successfully
hadoop fs -ls /input
7) execute hadoop job by providing Hadoop Program executable Jar file and input directory path where
text file is there and output directory path where you are looking to store process data.
hadoop jar WordCountDSKSKV-0.0.1-SNAPSHOT.jar /input /output
8) Check Processed Directory has processed files ?
hadoop fs -ls /output
9) Read your desired output from Hadoop Job.
hadoop fs -cat /output/part-r-00000

More Related Content

What's hot

時代在變 Docker 要會:台北 Docker 一日入門篇
時代在變 Docker 要會:台北 Docker 一日入門篇時代在變 Docker 要會:台北 Docker 一日入門篇
時代在變 Docker 要會:台北 Docker 一日入門篇
Philip Zheng
 
容器與資料科學應用
容器與資料科學應用容器與資料科學應用
容器與資料科學應用
Philip Zheng
 
桃園市教育局Docker技術入門與實作
桃園市教育局Docker技術入門與實作桃園市教育局Docker技術入門與實作
桃園市教育局Docker技術入門與實作
Philip Zheng
 
Drush in the Composer Era
Drush in the Composer EraDrush in the Composer Era
Drush in the Composer Era
Pantheon
 
Using Containers for Continuous Integration and Continuous Delivery
Using Containers for Continuous Integration and Continuous DeliveryUsing Containers for Continuous Integration and Continuous Delivery
Using Containers for Continuous Integration and Continuous Delivery
Carlos Sanchez
 
Docker for mere mortals
Docker for mere mortalsDocker for mere mortals
Docker for mere mortals
Henryk Konsek
 
Introduzione a GitHub Actions (beta)
Introduzione a GitHub Actions (beta)Introduzione a GitHub Actions (beta)
Introduzione a GitHub Actions (beta)
Giulio Vian
 
Карманный PaaS с Dokku (Александр Белецкий)
Карманный PaaS с Dokku (Александр Белецкий)Карманный PaaS с Dokku (Александр Белецкий)
Карманный PaaS с Dokku (Александр Белецкий)GeeksLab Odessa
 
Production FS: Adapt or die - Claudia Beresford & Tiago Scolar
Production FS: Adapt or die - Claudia Beresford & Tiago ScolarProduction FS: Adapt or die - Claudia Beresford & Tiago Scolar
Production FS: Adapt or die - Claudia Beresford & Tiago Scolar
Paris Container Day
 
Cloud Foundry V2 | Intermediate Deep Dive
Cloud Foundry V2 | Intermediate Deep DiveCloud Foundry V2 | Intermediate Deep Dive
Cloud Foundry V2 | Intermediate Deep DiveKazuto Kusama
 
Docker研習營
Docker研習營Docker研習營
Docker研習營
Philip Zheng
 
Introduction to Project atomic (CentOS Dojo Bangalore)
Introduction to Project atomic (CentOS Dojo Bangalore)Introduction to Project atomic (CentOS Dojo Bangalore)
Introduction to Project atomic (CentOS Dojo Bangalore)
Lalatendu Mohanty
 
Drupal in Libraries
Drupal in LibrariesDrupal in Libraries
Drupal in Libraries
Cary Gordon
 
Composer Tools & Frameworks for Drupal
Composer Tools & Frameworks for DrupalComposer Tools & Frameworks for Drupal
Composer Tools & Frameworks for Drupal
Pantheon
 
Using Git with Drupal
Using Git with DrupalUsing Git with Drupal
Using Git with DrupalRyan Cross
 
Plone in news media
Plone in news mediaPlone in news media
Plone in news media
Héctor Velarde
 
Challenges of container configuration
Challenges of container configurationChallenges of container configuration
Challenges of container configuration
lutter
 
Automating Mendix application deployments with Nix
Automating Mendix application deployments with NixAutomating Mendix application deployments with Nix
Automating Mendix application deployments with Nix
Sander van der Burg
 
Docker for Developers: Dev, Test, Deploy @ BucksCo Devops at MeetMe HQ
Docker for Developers: Dev, Test, Deploy @ BucksCo Devops at MeetMe HQDocker for Developers: Dev, Test, Deploy @ BucksCo Devops at MeetMe HQ
Docker for Developers: Dev, Test, Deploy @ BucksCo Devops at MeetMe HQ
Erica Windisch
 

What's hot (20)

時代在變 Docker 要會:台北 Docker 一日入門篇
時代在變 Docker 要會:台北 Docker 一日入門篇時代在變 Docker 要會:台北 Docker 一日入門篇
時代在變 Docker 要會:台北 Docker 一日入門篇
 
容器與資料科學應用
容器與資料科學應用容器與資料科學應用
容器與資料科學應用
 
桃園市教育局Docker技術入門與實作
桃園市教育局Docker技術入門與實作桃園市教育局Docker技術入門與實作
桃園市教育局Docker技術入門與實作
 
Drush in the Composer Era
Drush in the Composer EraDrush in the Composer Era
Drush in the Composer Era
 
Using Containers for Continuous Integration and Continuous Delivery
Using Containers for Continuous Integration and Continuous DeliveryUsing Containers for Continuous Integration and Continuous Delivery
Using Containers for Continuous Integration and Continuous Delivery
 
Docker for mere mortals
Docker for mere mortalsDocker for mere mortals
Docker for mere mortals
 
Introduzione a GitHub Actions (beta)
Introduzione a GitHub Actions (beta)Introduzione a GitHub Actions (beta)
Introduzione a GitHub Actions (beta)
 
Карманный PaaS с Dokku (Александр Белецкий)
Карманный PaaS с Dokku (Александр Белецкий)Карманный PaaS с Dokku (Александр Белецкий)
Карманный PaaS с Dokku (Александр Белецкий)
 
Production FS: Adapt or die - Claudia Beresford & Tiago Scolar
Production FS: Adapt or die - Claudia Beresford & Tiago ScolarProduction FS: Adapt or die - Claudia Beresford & Tiago Scolar
Production FS: Adapt or die - Claudia Beresford & Tiago Scolar
 
Guava
GuavaGuava
Guava
 
Cloud Foundry V2 | Intermediate Deep Dive
Cloud Foundry V2 | Intermediate Deep DiveCloud Foundry V2 | Intermediate Deep Dive
Cloud Foundry V2 | Intermediate Deep Dive
 
Docker研習營
Docker研習營Docker研習營
Docker研習營
 
Introduction to Project atomic (CentOS Dojo Bangalore)
Introduction to Project atomic (CentOS Dojo Bangalore)Introduction to Project atomic (CentOS Dojo Bangalore)
Introduction to Project atomic (CentOS Dojo Bangalore)
 
Drupal in Libraries
Drupal in LibrariesDrupal in Libraries
Drupal in Libraries
 
Composer Tools & Frameworks for Drupal
Composer Tools & Frameworks for DrupalComposer Tools & Frameworks for Drupal
Composer Tools & Frameworks for Drupal
 
Using Git with Drupal
Using Git with DrupalUsing Git with Drupal
Using Git with Drupal
 
Plone in news media
Plone in news mediaPlone in news media
Plone in news media
 
Challenges of container configuration
Challenges of container configurationChallenges of container configuration
Challenges of container configuration
 
Automating Mendix application deployments with Nix
Automating Mendix application deployments with NixAutomating Mendix application deployments with Nix
Automating Mendix application deployments with Nix
 
Docker for Developers: Dev, Test, Deploy @ BucksCo Devops at MeetMe HQ
Docker for Developers: Dev, Test, Deploy @ BucksCo Devops at MeetMe HQDocker for Developers: Dev, Test, Deploy @ BucksCo Devops at MeetMe HQ
Docker for Developers: Dev, Test, Deploy @ BucksCo Devops at MeetMe HQ
 

Viewers also liked

Data science bootcamp day 3
Data science bootcamp day 3Data science bootcamp day 3
Data science bootcamp day 3
Chetan Khatri
 
Data science bootcamp day1
Data science bootcamp day1Data science bootcamp day1
Data science bootcamp day1
Chetan Khatri
 
Pycon india-2016-success-story
Pycon india-2016-success-storyPycon india-2016-success-story
Pycon india-2016-success-story
Chetan Khatri
 
Alumni talk-university-of-kachchh
Alumni talk-university-of-kachchhAlumni talk-university-of-kachchh
Alumni talk-university-of-kachchh
Chetan Khatri
 
Internet of things initiative-cskskv
Internet of things   initiative-cskskvInternet of things   initiative-cskskv
Internet of things initiative-cskskv
Chetan Khatri
 
Think Machine Learning with Scikit-Learn (Python)
Think Machine Learning with Scikit-Learn (Python)Think Machine Learning with Scikit-Learn (Python)
Think Machine Learning with Scikit-Learn (Python)
Chetan Khatri
 
Data Analytics with Pandas and Numpy - Python
Data Analytics with Pandas and Numpy - PythonData Analytics with Pandas and Numpy - Python
Data Analytics with Pandas and Numpy - Python
Chetan Khatri
 
Filme terror 2013
Filme terror 2013Filme terror 2013
Filme terror 2013Rafael Wolf
 
Job fair at seattle
Job fair at seattleJob fair at seattle
Job fair at seattlezeenatkassam
 
Romance powerpoint
Romance powerpointRomance powerpoint
Romance powerpointbetha2media
 
Davidson Capital - NOAH15 London
Davidson Capital - NOAH15 LondonDavidson Capital - NOAH15 London
Davidson Capital - NOAH15 London
NOAH Advisors
 
Numpy, the Python foundation for number crunching
Numpy, the Python foundation for number crunchingNumpy, the Python foundation for number crunching
Numpy, the Python foundation for number crunching
Data Science London
 

Viewers also liked (17)

Data science bootcamp day 3
Data science bootcamp day 3Data science bootcamp day 3
Data science bootcamp day 3
 
Data science bootcamp day1
Data science bootcamp day1Data science bootcamp day1
Data science bootcamp day1
 
Pycon india-2016-success-story
Pycon india-2016-success-storyPycon india-2016-success-story
Pycon india-2016-success-story
 
Alumni talk-university-of-kachchh
Alumni talk-university-of-kachchhAlumni talk-university-of-kachchh
Alumni talk-university-of-kachchh
 
Internet of things initiative-cskskv
Internet of things   initiative-cskskvInternet of things   initiative-cskskv
Internet of things initiative-cskskv
 
Think Machine Learning with Scikit-Learn (Python)
Think Machine Learning with Scikit-Learn (Python)Think Machine Learning with Scikit-Learn (Python)
Think Machine Learning with Scikit-Learn (Python)
 
Data Analytics with Pandas and Numpy - Python
Data Analytics with Pandas and Numpy - PythonData Analytics with Pandas and Numpy - Python
Data Analytics with Pandas and Numpy - Python
 
Feature Release
Feature ReleaseFeature Release
Feature Release
 
Filme terror 2013
Filme terror 2013Filme terror 2013
Filme terror 2013
 
8617 Taylor Road
8617 Taylor Road8617 Taylor Road
8617 Taylor Road
 
News Release
News ReleaseNews Release
News Release
 
Job fair at seattle
Job fair at seattleJob fair at seattle
Job fair at seattle
 
Publication plan slideshare
Publication plan slidesharePublication plan slideshare
Publication plan slideshare
 
Mart6ha
Mart6haMart6ha
Mart6ha
 
Romance powerpoint
Romance powerpointRomance powerpoint
Romance powerpoint
 
Davidson Capital - NOAH15 London
Davidson Capital - NOAH15 LondonDavidson Capital - NOAH15 London
Davidson Capital - NOAH15 London
 
Numpy, the Python foundation for number crunching
Numpy, the Python foundation for number crunchingNumpy, the Python foundation for number crunching
Numpy, the Python foundation for number crunching
 

Similar to Data science bootcamp day2

Apache HDFS - Lab Assignment
Apache HDFS - Lab AssignmentApache HDFS - Lab Assignment
Apache HDFS - Lab Assignment
Farzad Nozarian
 
Word count program execution steps in hadoop
Word count program execution steps in hadoopWord count program execution steps in hadoop
Word count program execution steps in hadoop
jijukjoseph
 
Big data using Hadoop, Hive, Sqoop with Installation
Big data using Hadoop, Hive, Sqoop with InstallationBig data using Hadoop, Hive, Sqoop with Installation
Big data using Hadoop, Hive, Sqoop with Installation
mellempudilavanya999
 
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop lab s...
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop lab s...IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop lab s...
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop lab s...
Leons Petražickis
 
Hdfs lab hands-on
Hdfs lab hands-on Hdfs lab hands-on
Hdfs lab hands-on
GrowwithParmesh
 
Top 10 Hadoop Shell Commands
Top 10 Hadoop Shell Commands Top 10 Hadoop Shell Commands
Top 10 Hadoop Shell Commands
SimoniShah6
 
Assignment 1 MapReduce With Hadoop
Assignment 1  MapReduce With HadoopAssignment 1  MapReduce With Hadoop
Assignment 1 MapReduce With Hadoop
Allison Thompson
 
Lean Drupal Repositories with Composer and Drush
Lean Drupal Repositories with Composer and DrushLean Drupal Repositories with Composer and Drush
Lean Drupal Repositories with Composer and Drush
Pantheon
 
BIGDATA ANALYTICS LAB MANUAL final.pdf
BIGDATA  ANALYTICS LAB MANUAL final.pdfBIGDATA  ANALYTICS LAB MANUAL final.pdf
BIGDATA ANALYTICS LAB MANUAL final.pdf
ANJALAI AMMAL MAHALINGAM ENGINEERING COLLEGE
 
Hadoop Installation presentation
Hadoop Installation presentationHadoop Installation presentation
Hadoop Installation presentation
puneet yadav
 
Single node hadoop cluster installation
Single node hadoop cluster installation Single node hadoop cluster installation
Single node hadoop cluster installation
Mahantesh Angadi
 
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Nag Arvind Gudiseva
 
Hdfs java api
Hdfs java apiHdfs java api
Hdfs java api
Trieu Dao Minh
 
Hadoop installation on windows
Hadoop installation on windows Hadoop installation on windows
Hadoop installation on windows
habeebulla g
 
Hadoop installation steps
Hadoop installation stepsHadoop installation steps
Hadoop installation steps
Mayank Sharma
 
GitHub Basics - Derek Bable
GitHub Basics - Derek BableGitHub Basics - Derek Bable
GitHub Basics - Derek Bable
"FENG "GEORGE"" YU
 
Single node setup
Single node setupSingle node setup
Single node setupKBCHOW123
 

Similar to Data science bootcamp day2 (20)

Apache HDFS - Lab Assignment
Apache HDFS - Lab AssignmentApache HDFS - Lab Assignment
Apache HDFS - Lab Assignment
 
Word count program execution steps in hadoop
Word count program execution steps in hadoopWord count program execution steps in hadoop
Word count program execution steps in hadoop
 
Big data using Hadoop, Hive, Sqoop with Installation
Big data using Hadoop, Hive, Sqoop with InstallationBig data using Hadoop, Hive, Sqoop with Installation
Big data using Hadoop, Hive, Sqoop with Installation
 
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop lab s...
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop lab s...IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop lab s...
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop lab s...
 
Run wordcount job (hadoop)
Run wordcount job (hadoop)Run wordcount job (hadoop)
Run wordcount job (hadoop)
 
Hdfs lab hands-on
Hdfs lab hands-on Hdfs lab hands-on
Hdfs lab hands-on
 
2012 09-08-josug-jeff
2012 09-08-josug-jeff2012 09-08-josug-jeff
2012 09-08-josug-jeff
 
Top 10 Hadoop Shell Commands
Top 10 Hadoop Shell Commands Top 10 Hadoop Shell Commands
Top 10 Hadoop Shell Commands
 
Assignment 1 MapReduce With Hadoop
Assignment 1  MapReduce With HadoopAssignment 1  MapReduce With Hadoop
Assignment 1 MapReduce With Hadoop
 
Lean Drupal Repositories with Composer and Drush
Lean Drupal Repositories with Composer and DrushLean Drupal Repositories with Composer and Drush
Lean Drupal Repositories with Composer and Drush
 
BIGDATA ANALYTICS LAB MANUAL final.pdf
BIGDATA  ANALYTICS LAB MANUAL final.pdfBIGDATA  ANALYTICS LAB MANUAL final.pdf
BIGDATA ANALYTICS LAB MANUAL final.pdf
 
Hadoop Installation presentation
Hadoop Installation presentationHadoop Installation presentation
Hadoop Installation presentation
 
Single node hadoop cluster installation
Single node hadoop cluster installation Single node hadoop cluster installation
Single node hadoop cluster installation
 
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
 
Hdfs java api
Hdfs java apiHdfs java api
Hdfs java api
 
Hadoop_content_by_sasidhar2
Hadoop_content_by_sasidhar2Hadoop_content_by_sasidhar2
Hadoop_content_by_sasidhar2
 
Hadoop installation on windows
Hadoop installation on windows Hadoop installation on windows
Hadoop installation on windows
 
Hadoop installation steps
Hadoop installation stepsHadoop installation steps
Hadoop installation steps
 
GitHub Basics - Derek Bable
GitHub Basics - Derek BableGitHub Basics - Derek Bable
GitHub Basics - Derek Bable
 
Single node setup
Single node setupSingle node setup
Single node setup
 

More from Chetan Khatri

Data Science for Beginner by Chetan Khatri and Deptt. of Computer Science, Ka...
Data Science for Beginner by Chetan Khatri and Deptt. of Computer Science, Ka...Data Science for Beginner by Chetan Khatri and Deptt. of Computer Science, Ka...
Data Science for Beginner by Chetan Khatri and Deptt. of Computer Science, Ka...
Chetan Khatri
 
Demystify Information Security & Threats for Data-Driven Platforms With Cheta...
Demystify Information Security & Threats for Data-Driven Platforms With Cheta...Demystify Information Security & Threats for Data-Driven Platforms With Cheta...
Demystify Information Security & Threats for Data-Driven Platforms With Cheta...
Chetan Khatri
 
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-AirflowPyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
Chetan Khatri
 
ScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in productionScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in production
Chetan Khatri
 
No more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionNo more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in production
Chetan Khatri
 
PyConLT19-No_more_struggles_with_Apache_Spark_(PySpark)_workloads_in_production
PyConLT19-No_more_struggles_with_Apache_Spark_(PySpark)_workloads_in_productionPyConLT19-No_more_struggles_with_Apache_Spark_(PySpark)_workloads_in_production
PyConLT19-No_more_struggles_with_Apache_Spark_(PySpark)_workloads_in_production
Chetan Khatri
 
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaAutomate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Chetan Khatri
 
HBaseConAsia 2018 - Scaling 30 TB's of Data lake with Apache HBase and Scala ...
HBaseConAsia 2018 - Scaling 30 TB's of Data lake with Apache HBase and Scala ...HBaseConAsia 2018 - Scaling 30 TB's of Data lake with Apache HBase and Scala ...
HBaseConAsia 2018 - Scaling 30 TB's of Data lake with Apache HBase and Scala ...
Chetan Khatri
 
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
Chetan Khatri
 
An Introduction to Spark with Scala
An Introduction to Spark with ScalaAn Introduction to Spark with Scala
An Introduction to Spark with Scala
Chetan Khatri
 
HBase with Apache Spark POC Demo
HBase with Apache Spark POC DemoHBase with Apache Spark POC Demo
HBase with Apache Spark POC Demo
Chetan Khatri
 
HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...
HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...
HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...
Chetan Khatri
 
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
Chetan Khatri
 
Fossasia 2018-chetan-khatri
Fossasia 2018-chetan-khatriFossasia 2018-chetan-khatri
Fossasia 2018-chetan-khatri
Chetan Khatri
 
Fossasia ai-ml technologies and application for product development-chetan kh...
Fossasia ai-ml technologies and application for product development-chetan kh...Fossasia ai-ml technologies and application for product development-chetan kh...
Fossasia ai-ml technologies and application for product development-chetan kh...
Chetan Khatri
 
An Introduction Linear Algebra for Neural Networks and Deep learning
An Introduction Linear Algebra for Neural Networks and Deep learningAn Introduction Linear Algebra for Neural Networks and Deep learning
An Introduction Linear Algebra for Neural Networks and Deep learning
Chetan Khatri
 
Introduction to Computer Science
Introduction to Computer ScienceIntroduction to Computer Science
Introduction to Computer Science
Chetan Khatri
 
An introduction to Git with Atlassian Suite
An introduction to Git with Atlassian SuiteAn introduction to Git with Atlassian Suite
An introduction to Git with Atlassian Suite
Chetan Khatri
 
Think machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetanThink machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetan
Chetan Khatri
 
A step towards machine learning at accionlabs
A step towards machine learning at accionlabsA step towards machine learning at accionlabs
A step towards machine learning at accionlabs
Chetan Khatri
 

More from Chetan Khatri (20)

Data Science for Beginner by Chetan Khatri and Deptt. of Computer Science, Ka...
Data Science for Beginner by Chetan Khatri and Deptt. of Computer Science, Ka...Data Science for Beginner by Chetan Khatri and Deptt. of Computer Science, Ka...
Data Science for Beginner by Chetan Khatri and Deptt. of Computer Science, Ka...
 
Demystify Information Security & Threats for Data-Driven Platforms With Cheta...
Demystify Information Security & Threats for Data-Driven Platforms With Cheta...Demystify Information Security & Threats for Data-Driven Platforms With Cheta...
Demystify Information Security & Threats for Data-Driven Platforms With Cheta...
 
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-AirflowPyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
 
ScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in productionScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in production
 
No more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionNo more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in production
 
PyConLT19-No_more_struggles_with_Apache_Spark_(PySpark)_workloads_in_production
PyConLT19-No_more_struggles_with_Apache_Spark_(PySpark)_workloads_in_productionPyConLT19-No_more_struggles_with_Apache_Spark_(PySpark)_workloads_in_production
PyConLT19-No_more_struggles_with_Apache_Spark_(PySpark)_workloads_in_production
 
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaAutomate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
 
HBaseConAsia 2018 - Scaling 30 TB's of Data lake with Apache HBase and Scala ...
HBaseConAsia 2018 - Scaling 30 TB's of Data lake with Apache HBase and Scala ...HBaseConAsia 2018 - Scaling 30 TB's of Data lake with Apache HBase and Scala ...
HBaseConAsia 2018 - Scaling 30 TB's of Data lake with Apache HBase and Scala ...
 
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
 
An Introduction to Spark with Scala
An Introduction to Spark with ScalaAn Introduction to Spark with Scala
An Introduction to Spark with Scala
 
HBase with Apache Spark POC Demo
HBase with Apache Spark POC DemoHBase with Apache Spark POC Demo
HBase with Apache Spark POC Demo
 
HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...
HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...
HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...
 
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
 
Fossasia 2018-chetan-khatri
Fossasia 2018-chetan-khatriFossasia 2018-chetan-khatri
Fossasia 2018-chetan-khatri
 
Fossasia ai-ml technologies and application for product development-chetan kh...
Fossasia ai-ml technologies and application for product development-chetan kh...Fossasia ai-ml technologies and application for product development-chetan kh...
Fossasia ai-ml technologies and application for product development-chetan kh...
 
An Introduction Linear Algebra for Neural Networks and Deep learning
An Introduction Linear Algebra for Neural Networks and Deep learningAn Introduction Linear Algebra for Neural Networks and Deep learning
An Introduction Linear Algebra for Neural Networks and Deep learning
 
Introduction to Computer Science
Introduction to Computer ScienceIntroduction to Computer Science
Introduction to Computer Science
 
An introduction to Git with Atlassian Suite
An introduction to Git with Atlassian SuiteAn introduction to Git with Atlassian Suite
An introduction to Git with Atlassian Suite
 
Think machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetanThink machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetan
 
A step towards machine learning at accionlabs
A step towards machine learning at accionlabsA step towards machine learning at accionlabs
A step towards machine learning at accionlabs
 

Recently uploaded

Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 

Recently uploaded (20)

Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 

Data science bootcamp day2

  • 1. Data Science Bootcamp, Day 2 Presented By: Chetan Khatri, Volunteer Teaching assistant, Data Science Lab. Guidance By: Prof. Devji D. Chhanga, University of Kachchh.
  • 2. Agenda Understanding Git. Understanding Apache Maven. Hello World Java Program with Apache Maven. Understanding of Hadoop Administrative Commands. WordCount Hadoop Program on Hadoop Cluster with Maven.
  • 3. Git with Github ● Github: Repository storage where you can store your source code and share with team member work interactively. ● Installation: sudo apt-get install git ● Steps TODO: 1. Create Repository 2. Clone - Copy someone else's repository 3. Commit - Ready to submit your code to repository.
  • 4. Let’s have Demo with Git ● Create Repository at Github named hadoopdemo ● Cloning Repository: git clone https://github.com/dskskv/hadoopdemo.git ● Configure github with your credentials: git config --global user.email "you@example.com" git config --global user.name "Your Name" Add individual file: sudo git add README.md for adding every files: sudo git add .
  • 5. Let’s have Demo with Git (Conti…) commit command - sudo git commit -m "Comment anything" Submit request to github repository with whatever has been added: sudo git origin master pull - is to get latest updated code from repository Example : git pull https://github.com/dskskv/hadoopdemo.git Git Branches: Are Different Modules of the Repository, Such as Development, Test, Production phase of the software development. Master branch has always updated code.
  • 6. Understanding Apache Maven Apache Maven is Build Tool for Java, where you can use Other Artifacts(Jar files written by someone else) and build your Jar file which contains all other’s added before. Maven Life Cycle: Create Maven Project Update Maven Project Write Java Code Maven Clean Maven Build (For building your Jar file)
  • 7. Understanding Hadoop Administrative Commands 1. Cloning github cccs936 repository git clone https://github.com/dskskv/CCCS936.git 2. Start Hadoop Cluster sbin/start-dfs.sh sbin/start-yarn.sh 3. Check Hadoop Version hadoop version 4. Check all the options under hadoop command hadoop 5. Create Directory as "dskskv" at HDFS hadoop fs -mkdir /dskskv
  • 8. Understanding Hadoop Administrative Commands 6. List out the contents of dskskv object inside HDFS hadoop fs -ls /dskskv 7. Create Text file sudo gedit inputfile.txt 8. Put text file inside HDFS block hadoop fs -put inputfile.txt /dskskv 9. Read the content of HDFS textfile object hadoop fs -cat /dskskv/inputfile.txt
  • 9. Understanding Hadoop Administrative Commands 10. hadoop deprecated, use hdfs also for the same operations. hdfs dfs -mkdir /chetan hdfs dfs -put inputfile.txt /chetan hdfs dfs -cat /chetan/inputfile.txt 11. Deleting file from HDFS hadoop fs -rm /dskskv/inputfile.txt 12. Deleting Directory from HDFS hadoop fs -rm -r /dskskv
  • 10. WordCount Hadoop Program on Hadoop Cluster with Maven 1) Login as a Hadoop User: su hduser 2) Start hadoop deamon services sbin/start-dfs.sh sbin/start-yarn.sh 3) Check whether all deamon services are up or not jps 4) Create directory in HDFS, Note: make sure wherever you are in the console , Hadoop user should have previlegies to access it. hadoop fs -mkdir /input 5) Transfer textfile to HDFS hadoop fs -put inputfile.txt /input
  • 11. WordCount Hadoop Program on Hadoop Cluster with Maven 6) Check whether file is transferred successfully hadoop fs -ls /input 7) execute hadoop job by providing Hadoop Program executable Jar file and input directory path where text file is there and output directory path where you are looking to store process data. hadoop jar WordCountDSKSKV-0.0.1-SNAPSHOT.jar /input /output 8) Check Processed Directory has processed files ? hadoop fs -ls /output 9) Read your desired output from Hadoop Job. hadoop fs -cat /output/part-r-00000