SlideShare a Scribd company logo
WANT $100K
JOB?
Harvard Business Review ranks Data Science to
be
the sexiest job in the 21st century
Glassdoor ranks Data Scientist to be
the best job!
HADOOP JOBS
VACANCY PERCENTILE
PAY SCALE
AVG INDIAN SALARY
TOP LANGUAGES OR COURSES
FOR DATA SCIENCE
APPOINTED BY EX US PRESIDENT
BARACK.H.OBAMA
AVERAGE SALARIES OF DATA ANALYST
ACROSS INDIA
h

a

d

o

o

p
OVERVIEW
• Operating System: Cross Platform

• Type: Distributed File System

• License: Apache License 2.0

• Website: hadoop.apache.org

• Written in: Java
HADOOP FRAMEWORK
• Hadoop Common – contains libraries and utilities needed by other Hadoop modules
• Hadoop Distributed File System (HDFS) – a distributed file-system that stores data on
commodity machines, providing very high aggregate bandwidth across the cluster
• Hadoop YARN – a platform responsible for managing computing resources in clusters
and using them for scheduling users' applications
• Hadoop MapReduce – an implementation of the MapReduce programming model for
large-scale data processing
ARCHITECTURE
Hadoop Common Package
comprises of:
• File system and operating system level
abstractions
• MapReduce engine (either MapReduce/
MR1 or YARN/MR2)
• Hadoop Distributed File System(HDFS)
Hadoop requires Java Runtime
Environment (JRE) 1.6 or higher
The standard startup and shutdown scripts
require that Secure Shell(SSH) be set up
between nodes in the cluster
MapReduce is a framework for
processing parallelizable problems
across large datasets using a large
number of computers (nodes),
collectively referred to as a cluster(if
all nodes are on the same local
network and use similar hardware) or
a grid.
Processing can occur on data stored
either in a filesystem(unstructured) or
in a database (structured).
MapReduce can take advantage of the
locality of data, processing it near the
place it is stored in order to minimize
communication overhead.
MapReduce Engine
MapReduce is as a 5-step parallel
and distributed computation:
1 Prepare the Map() input – the "MapReduce
system" designates Map processors,
assigns the input key value K1 that each
processor would work on, and provides that
processor with all the input data associated
with that key value.
2 Run the user-provided Map() code – Map() is
run exactly once for each K1 key value,
generating output organized by key values
K2.
3 "Shuffle" the Map output to the Reduce
processors – the MapReduce system
designates Reduce processors, assigns the
K2 key value each processor should work
on, and provides that processor with all the
Map-generated data associated with that
key value.
4 Run the user-provided Reduce() code –
Reduce() is run exactly once for each K2 key
value produced by the Map step.
5 Produce the final output – the MapReduce
system collects all the Reduce output, and
sorts it by K2 to produce the final outcome.
Map Reduce Algorithm
Managers in India
As stated in the article
The story of Jonathan Goldman illustrates, their greatest opportunity to add value is not in
creating reports or presentations for senior executives but in innovating with customer-facing
products and processes.
In India there are so many E-Commerce companies that are in their development stage. Like the
data scientists of big companies like Intuit, Google, GE, Zynga have worked their way out to
optimize the service contracts and maintenance intervals for industrial products, either by core
search, ad servicing algorithms, MapReduce algorithm etc..
So the Managers in India should focus more on their data analytical skills mentioned in the
above slides
instead of creating reports for senior executives.
They should be thorough with Machine Learning, R, Python, Apache Hadoop Packages,
Mathematical and Statistical knowledge.
This will help them in their business life as data science is the sexiest job of 21st century!!!!!
Bibliography
• Harvard Business Review

• wikipedia.org

• https://www.sas.com/en_ae/insights/big-data/
hadoop.html

• Apache Hadoop

More Related Content

What's hot

The Past, Present and Future of Big Data @LinkedIn
The Past, Present and Future of Big Data @LinkedInThe Past, Present and Future of Big Data @LinkedIn
The Past, Present and Future of Big Data @LinkedIn
Suja Viswesan
 
Apache Hadoop Big Data Technology
Apache Hadoop Big Data TechnologyApache Hadoop Big Data Technology
Apache Hadoop Big Data Technology
Jay Nagar
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystem
nallagangus
 
About Streaming Data Solutions for Hadoop
About Streaming Data Solutions for HadoopAbout Streaming Data Solutions for Hadoop
About Streaming Data Solutions for Hadoop
Lynn Langit
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
sravya raju
 
Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS...
Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS...Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS...
Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS...
Amazon Web Services
 
Big data and tools
Big data and tools Big data and tools
Big data and tools
Shivam Shukla
 
Hadoop Presentation - PPT
Hadoop Presentation - PPTHadoop Presentation - PPT
Hadoop Presentation - PPT
Anand Pandey
 
Show me the Money! Cost & Resource Tracking for Hadoop and Storm
Show me the Money! Cost & Resource  Tracking for Hadoop and Storm Show me the Money! Cost & Resource  Tracking for Hadoop and Storm
Show me the Money! Cost & Resource Tracking for Hadoop and Storm
DataWorks Summit/Hadoop Summit
 
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
HARMAN Services
 
Hadoop J.G.Rohini II M.Sc.,computer science Bon secours college for women
Hadoop J.G.Rohini II M.Sc.,computer science Bon secours college for womenHadoop J.G.Rohini II M.Sc.,computer science Bon secours college for women
Hadoop J.G.Rohini II M.Sc.,computer science Bon secours college for women
rohinig10
 
Comparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and sparkComparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and spark
AgnihotriGhosh2
 
Big data hadoop
Big data  hadoopBig data  hadoop
Big data hadoop
Vikram Dey
 
Hourglass: a Library for Incremental Processing on Hadoop
Hourglass: a Library for Incremental Processing on HadoopHourglass: a Library for Incremental Processing on Hadoop
Hourglass: a Library for Incremental Processing on Hadoop
Matthew Hayes
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data
Geoffrey Fox
 
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
Hw09   Rethinking The Data Warehouse With Hadoop And HiveHw09   Rethinking The Data Warehouse With Hadoop And Hive
Hw09 Rethinking The Data Warehouse With Hadoop And HiveCloudera, Inc.
 
Big data hadoop rdbms
Big data hadoop rdbmsBig data hadoop rdbms
Big data hadoop rdbms
Arjen de Vries
 
Agile Data Science on Greenplum Using Airflow - Greenplum Summit 2019
Agile Data Science on Greenplum Using Airflow - Greenplum Summit 2019Agile Data Science on Greenplum Using Airflow - Greenplum Summit 2019
Agile Data Science on Greenplum Using Airflow - Greenplum Summit 2019
VMware Tanzu
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
sunera pathan
 
The Microsoft BigData Story
The Microsoft BigData StoryThe Microsoft BigData Story
The Microsoft BigData Story
Lynn Langit
 

What's hot (20)

The Past, Present and Future of Big Data @LinkedIn
The Past, Present and Future of Big Data @LinkedInThe Past, Present and Future of Big Data @LinkedIn
The Past, Present and Future of Big Data @LinkedIn
 
Apache Hadoop Big Data Technology
Apache Hadoop Big Data TechnologyApache Hadoop Big Data Technology
Apache Hadoop Big Data Technology
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystem
 
About Streaming Data Solutions for Hadoop
About Streaming Data Solutions for HadoopAbout Streaming Data Solutions for Hadoop
About Streaming Data Solutions for Hadoop
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS...
Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS...Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS...
Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS...
 
Big data and tools
Big data and tools Big data and tools
Big data and tools
 
Hadoop Presentation - PPT
Hadoop Presentation - PPTHadoop Presentation - PPT
Hadoop Presentation - PPT
 
Show me the Money! Cost & Resource Tracking for Hadoop and Storm
Show me the Money! Cost & Resource  Tracking for Hadoop and Storm Show me the Money! Cost & Resource  Tracking for Hadoop and Storm
Show me the Money! Cost & Resource Tracking for Hadoop and Storm
 
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
 
Hadoop J.G.Rohini II M.Sc.,computer science Bon secours college for women
Hadoop J.G.Rohini II M.Sc.,computer science Bon secours college for womenHadoop J.G.Rohini II M.Sc.,computer science Bon secours college for women
Hadoop J.G.Rohini II M.Sc.,computer science Bon secours college for women
 
Comparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and sparkComparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and spark
 
Big data hadoop
Big data  hadoopBig data  hadoop
Big data hadoop
 
Hourglass: a Library for Incremental Processing on Hadoop
Hourglass: a Library for Incremental Processing on HadoopHourglass: a Library for Incremental Processing on Hadoop
Hourglass: a Library for Incremental Processing on Hadoop
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data
 
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
Hw09   Rethinking The Data Warehouse With Hadoop And HiveHw09   Rethinking The Data Warehouse With Hadoop And Hive
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
 
Big data hadoop rdbms
Big data hadoop rdbmsBig data hadoop rdbms
Big data hadoop rdbms
 
Agile Data Science on Greenplum Using Airflow - Greenplum Summit 2019
Agile Data Science on Greenplum Using Airflow - Greenplum Summit 2019Agile Data Science on Greenplum Using Airflow - Greenplum Summit 2019
Agile Data Science on Greenplum Using Airflow - Greenplum Summit 2019
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
The Microsoft BigData Story
The Microsoft BigData StoryThe Microsoft BigData Story
The Microsoft BigData Story
 

Similar to Data scientist a perfect job

B04 06 0918
B04 06 0918B04 06 0918
A performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
A performance analysis of OpenStack Cloud vs Real System on Hadoop ClustersA performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
A performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
Kumari Surabhi
 
hadoop
hadoophadoop
hadoop
Deep Mehta
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overview
Abhishek Roy
 
Learn what is Hadoop-and-BigData
Learn  what is Hadoop-and-BigDataLearn  what is Hadoop-and-BigData
Learn what is Hadoop-and-BigData
Thanusha154
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystem
Gregg Barrett
 
Hadoop live online training
Hadoop live online trainingHadoop live online training
Hadoop live online training
Harika583
 
IRJET- Analysis of Boston’s Crime Data using Apache Pig
IRJET- Analysis of Boston’s Crime Data using Apache PigIRJET- Analysis of Boston’s Crime Data using Apache Pig
IRJET- Analysis of Boston’s Crime Data using Apache Pig
IRJET Journal
 
Unstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus ModelUnstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus Model
Editor IJCATR
 
Map reduce advantages over parallel databases report
Map reduce advantages over parallel databases reportMap reduce advantages over parallel databases report
Map reduce advantages over parallel databases report
Ahmad El Tawil
 
IBM Data Engine for Hadoop and Spark - POWER System Edition ver1 March 2016
IBM Data Engine for Hadoop and Spark - POWER System Edition ver1 March 2016IBM Data Engine for Hadoop and Spark - POWER System Edition ver1 March 2016
IBM Data Engine for Hadoop and Spark - POWER System Edition ver1 March 2016
Anand Haridass
 
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Deanna Kosaraju
 
Analyzing Big data in R and Scala using Apache Spark 17-7-19
Analyzing Big data in R and Scala using Apache Spark  17-7-19Analyzing Big data in R and Scala using Apache Spark  17-7-19
Analyzing Big data in R and Scala using Apache Spark 17-7-19
Ahmed Elsayed
 
Big data-hadoop-training-course-content-content
Big data-hadoop-training-course-content-contentBig data-hadoop-training-course-content-content
Big data-hadoop-training-course-content-content
Training Institute
 
Cascading concurrent yahoo lunch_nlearn
Cascading concurrent   yahoo lunch_nlearnCascading concurrent   yahoo lunch_nlearn
Cascading concurrent yahoo lunch_nlearn
Cascading
 
IRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOPIRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET Journal
 
Introduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopIntroduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to Hadoop
GERARDO BARBERENA
 

Similar to Data scientist a perfect job (20)

B04 06 0918
B04 06 0918B04 06 0918
B04 06 0918
 
A performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
A performance analysis of OpenStack Cloud vs Real System on Hadoop ClustersA performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
A performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
 
hadoop
hadoophadoop
hadoop
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overview
 
Learn what is Hadoop-and-BigData
Learn  what is Hadoop-and-BigDataLearn  what is Hadoop-and-BigData
Learn what is Hadoop-and-BigData
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystem
 
Hadoop live online training
Hadoop live online trainingHadoop live online training
Hadoop live online training
 
IRJET- Analysis of Boston’s Crime Data using Apache Pig
IRJET- Analysis of Boston’s Crime Data using Apache PigIRJET- Analysis of Boston’s Crime Data using Apache Pig
IRJET- Analysis of Boston’s Crime Data using Apache Pig
 
Unstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus ModelUnstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus Model
 
Map reduce advantages over parallel databases report
Map reduce advantages over parallel databases reportMap reduce advantages over parallel databases report
Map reduce advantages over parallel databases report
 
BigData_Krishna Kumar Sharma
BigData_Krishna Kumar SharmaBigData_Krishna Kumar Sharma
BigData_Krishna Kumar Sharma
 
Big Data & Hadoop
Big Data & HadoopBig Data & Hadoop
Big Data & Hadoop
 
IBM Data Engine for Hadoop and Spark - POWER System Edition ver1 March 2016
IBM Data Engine for Hadoop and Spark - POWER System Edition ver1 March 2016IBM Data Engine for Hadoop and Spark - POWER System Edition ver1 March 2016
IBM Data Engine for Hadoop and Spark - POWER System Edition ver1 March 2016
 
Hadoop
HadoopHadoop
Hadoop
 
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
 
Analyzing Big data in R and Scala using Apache Spark 17-7-19
Analyzing Big data in R and Scala using Apache Spark  17-7-19Analyzing Big data in R and Scala using Apache Spark  17-7-19
Analyzing Big data in R and Scala using Apache Spark 17-7-19
 
Big data-hadoop-training-course-content-content
Big data-hadoop-training-course-content-contentBig data-hadoop-training-course-content-content
Big data-hadoop-training-course-content-content
 
Cascading concurrent yahoo lunch_nlearn
Cascading concurrent   yahoo lunch_nlearnCascading concurrent   yahoo lunch_nlearn
Cascading concurrent yahoo lunch_nlearn
 
IRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOPIRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOP
 
Introduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopIntroduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to Hadoop
 

More from Sidharth Raj Agarwal

You may not need big data after all
You may not need big data after allYou may not need big data after all
You may not need big data after all
Sidharth Raj Agarwal
 
Big data hype
Big data hypeBig data hype
Big data hype
Sidharth Raj Agarwal
 
How to use data to make hit tv show
How to use data to make hit tv showHow to use data to make hit tv show
How to use data to make hit tv show
Sidharth Raj Agarwal
 
Big data revolution in healthcare
Big data revolution in healthcareBig data revolution in healthcare
Big data revolution in healthcare
Sidharth Raj Agarwal
 
Leader’s guide to data analytics
Leader’s guide to data  analyticsLeader’s guide to data  analytics
Leader’s guide to data analytics
Sidharth Raj Agarwal
 
How to spot bad statistics
How to spot bad statisticsHow to spot bad statistics
How to spot bad statistics
Sidharth Raj Agarwal
 
Predictive analytics
Predictive analyticsPredictive analytics
Predictive analytics
Sidharth Raj Agarwal
 
Difference that good statistics can make
Difference that good statistics can make Difference that good statistics can make
Difference that good statistics can make
Sidharth Raj Agarwal
 
Data is worthless if you don’t communicate it
Data is worthless if you don’t communicate itData is worthless if you don’t communicate it
Data is worthless if you don’t communicate it
Sidharth Raj Agarwal
 
Data visualization TED Talk
Data visualization TED TalkData visualization TED Talk
Data visualization TED Talk
Sidharth Raj Agarwal
 
Are you a data driven
Are you a data drivenAre you a data driven
Are you a data driven
Sidharth Raj Agarwal
 
Make data more human
Make data more humanMake data more human
Make data more human
Sidharth Raj Agarwal
 
How to think like a data scientist
How to think like a data scientistHow to think like a data scientist
How to think like a data scientist
Sidharth Raj Agarwal
 
Big data : A TED TALK
Big data : A TED TALKBig data : A TED TALK
Big data : A TED TALK
Sidharth Raj Agarwal
 
Tedtalk presentation
Tedtalk presentationTedtalk presentation
Tedtalk presentation
Sidharth Raj Agarwal
 

More from Sidharth Raj Agarwal (15)

You may not need big data after all
You may not need big data after allYou may not need big data after all
You may not need big data after all
 
Big data hype
Big data hypeBig data hype
Big data hype
 
How to use data to make hit tv show
How to use data to make hit tv showHow to use data to make hit tv show
How to use data to make hit tv show
 
Big data revolution in healthcare
Big data revolution in healthcareBig data revolution in healthcare
Big data revolution in healthcare
 
Leader’s guide to data analytics
Leader’s guide to data  analyticsLeader’s guide to data  analytics
Leader’s guide to data analytics
 
How to spot bad statistics
How to spot bad statisticsHow to spot bad statistics
How to spot bad statistics
 
Predictive analytics
Predictive analyticsPredictive analytics
Predictive analytics
 
Difference that good statistics can make
Difference that good statistics can make Difference that good statistics can make
Difference that good statistics can make
 
Data is worthless if you don’t communicate it
Data is worthless if you don’t communicate itData is worthless if you don’t communicate it
Data is worthless if you don’t communicate it
 
Data visualization TED Talk
Data visualization TED TalkData visualization TED Talk
Data visualization TED Talk
 
Are you a data driven
Are you a data drivenAre you a data driven
Are you a data driven
 
Make data more human
Make data more humanMake data more human
Make data more human
 
How to think like a data scientist
How to think like a data scientistHow to think like a data scientist
How to think like a data scientist
 
Big data : A TED TALK
Big data : A TED TALKBig data : A TED TALK
Big data : A TED TALK
 
Tedtalk presentation
Tedtalk presentationTedtalk presentation
Tedtalk presentation
 

Recently uploaded

Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
alex933524
 

Recently uploaded (20)

Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 

Data scientist a perfect job

  • 2.
  • 3. Harvard Business Review ranks Data Science to be the sexiest job in the 21st century Glassdoor ranks Data Scientist to be the best job!
  • 4. HADOOP JOBS VACANCY PERCENTILE PAY SCALE AVG INDIAN SALARY
  • 5. TOP LANGUAGES OR COURSES FOR DATA SCIENCE APPOINTED BY EX US PRESIDENT BARACK.H.OBAMA AVERAGE SALARIES OF DATA ANALYST ACROSS INDIA
  • 6.
  • 8. OVERVIEW • Operating System: Cross Platform • Type: Distributed File System • License: Apache License 2.0 • Website: hadoop.apache.org • Written in: Java
  • 9. HADOOP FRAMEWORK • Hadoop Common – contains libraries and utilities needed by other Hadoop modules • Hadoop Distributed File System (HDFS) – a distributed file-system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster • Hadoop YARN – a platform responsible for managing computing resources in clusters and using them for scheduling users' applications • Hadoop MapReduce – an implementation of the MapReduce programming model for large-scale data processing
  • 10. ARCHITECTURE Hadoop Common Package comprises of: • File system and operating system level abstractions • MapReduce engine (either MapReduce/ MR1 or YARN/MR2) • Hadoop Distributed File System(HDFS) Hadoop requires Java Runtime Environment (JRE) 1.6 or higher The standard startup and shutdown scripts require that Secure Shell(SSH) be set up between nodes in the cluster
  • 11. MapReduce is a framework for processing parallelizable problems across large datasets using a large number of computers (nodes), collectively referred to as a cluster(if all nodes are on the same local network and use similar hardware) or a grid. Processing can occur on data stored either in a filesystem(unstructured) or in a database (structured). MapReduce can take advantage of the locality of data, processing it near the place it is stored in order to minimize communication overhead. MapReduce Engine
  • 12. MapReduce is as a 5-step parallel and distributed computation: 1 Prepare the Map() input – the "MapReduce system" designates Map processors, assigns the input key value K1 that each processor would work on, and provides that processor with all the input data associated with that key value. 2 Run the user-provided Map() code – Map() is run exactly once for each K1 key value, generating output organized by key values K2. 3 "Shuffle" the Map output to the Reduce processors – the MapReduce system designates Reduce processors, assigns the K2 key value each processor should work on, and provides that processor with all the Map-generated data associated with that key value. 4 Run the user-provided Reduce() code – Reduce() is run exactly once for each K2 key value produced by the Map step. 5 Produce the final output – the MapReduce system collects all the Reduce output, and sorts it by K2 to produce the final outcome. Map Reduce Algorithm
  • 13. Managers in India As stated in the article The story of Jonathan Goldman illustrates, their greatest opportunity to add value is not in creating reports or presentations for senior executives but in innovating with customer-facing products and processes. In India there are so many E-Commerce companies that are in their development stage. Like the data scientists of big companies like Intuit, Google, GE, Zynga have worked their way out to optimize the service contracts and maintenance intervals for industrial products, either by core search, ad servicing algorithms, MapReduce algorithm etc.. So the Managers in India should focus more on their data analytical skills mentioned in the above slides instead of creating reports for senior executives. They should be thorough with Machine Learning, R, Python, Apache Hadoop Packages, Mathematical and Statistical knowledge. This will help them in their business life as data science is the sexiest job of 21st century!!!!!
  • 14.
  • 15. Bibliography • Harvard Business Review • wikipedia.org • https://www.sas.com/en_ae/insights/big-data/ hadoop.html • Apache Hadoop