WANT A $100K JOB?
Harvard Business Review ranks data scientist as the sexiest job of the 21st century.
Glassdoor ranks data scientist as the best job!
HADOOP JOBS
VACANCY PERCENTILE
PAY SCALE
AVG INDIAN SALARY
TOP LANGUAGES OR COURSES FOR DATA SCIENCE
APPOINTED BY EX US PRESIDENT BARACK H. OBAMA
AVERAGE SALARIES OF DATA ANALYSTS ACROSS INDIA
HADOOP
OVERVIEW
• Operating System: Cross Platform

• Type: Distributed File System

• License: Apache License 2.0

• Website: hadoop.apache.org

• Written in: Java
HADOOP FRAMEWORK
• Hadoop Common – contains libraries and utilities needed by other Hadoop modules
• Hadoop Distributed File System (HDFS) – a distributed file-system that stores data on
commodity machines, providing very high aggregate bandwidth across the cluster
• Hadoop YARN – a platform responsible for managing computing resources in clusters
and using them for scheduling users' applications
• Hadoop MapReduce – an implementation of the MapReduce programming model for
large-scale data processing
ARCHITECTURE
Hadoop consists of:
• The Hadoop Common package, which provides file system and operating system level abstractions
• A MapReduce engine (either MapReduce/MR1 or YARN/MR2)
• The Hadoop Distributed File System (HDFS)
Hadoop requires Java Runtime Environment (JRE) 1.6 or higher.
The standard startup and shutdown scripts require that Secure Shell (SSH) be set up between nodes in the cluster.
MapReduce is a framework for processing parallelizable problems across large datasets using a large number of computers (nodes), collectively referred to as a cluster (if all nodes are on the same local network and use similar hardware) or a grid.
Processing can occur on data stored either in a filesystem (unstructured) or in a database (structured).
MapReduce can take advantage of the locality of data, processing it near the place it is stored in order to minimize communication overhead.
MapReduce Engine
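To make the data-locality idea concrete, here is a minimal sketch in plain Python. It is not Hadoop's scheduler: the function name `assign_tasks`, the node names, and the greedy strategy are all assumptions made for illustration. Each data block is given to a node that already stores a copy of it when that node has a free slot, and only otherwise to some other free node.

```python
def assign_tasks(block_locations, node_slots):
    """Greedy, locality-aware task assignment (illustrative only).

    block_locations: dict mapping a block id to the list of nodes storing it.
    node_slots: dict mapping a node name to its number of free task slots.
    Returns a dict mapping block id -> (chosen node, True if the data is local).
    """
    slots = dict(node_slots)  # copy so the caller's dict is not modified
    assignment = {}
    for block, holders in block_locations.items():
        # Prefer a node that already holds the block and has a free slot.
        local = [n for n in holders if slots.get(n, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # Fall back to any free node; the block must then travel over the network.
            free = [n for n, count in slots.items() if count > 0]
            if not free:
                raise RuntimeError("no free task slots left")
            node, is_local = free[0], False
        slots[node] -= 1
        assignment[block] = (node, is_local)
    return assignment


# Hypothetical cluster: three blocks, four nodes, n3 is fully busy.
example_blocks = {"b1": ["n1", "n2"], "b2": ["n1"], "b3": ["n3"]}
example_slots = {"n1": 1, "n2": 1, "n3": 0, "n4": 1}
example_assignment = assign_tasks(example_blocks, example_slots)
```

The greedy choice is deliberately simple and can miss a better overall assignment; the point is only that a task is moved to the data whenever possible, rather than the data to the task.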
MapReduce can be viewed as a 5-step parallel and distributed computation:
1. Prepare the Map() input – the "MapReduce system" designates Map processors, assigns the input key value K1 that each processor would work on, and provides that processor with all the input data associated with that key value.
2. Run the user-provided Map() code – Map() is run exactly once for each K1 key value, generating output organized by key values K2.
3. "Shuffle" the Map output to the Reduce processors – the MapReduce system designates Reduce processors, assigns the K2 key value each processor should work on, and provides that processor with all the Map-generated data associated with that key value.
4. Run the user-provided Reduce() code – Reduce() is run exactly once for each K2 key value produced by the Map step.
5. Produce the final output – the MapReduce system collects all the Reduce output, and sorts it by K2 to produce the final outcome.
Map Reduce Algorithm
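The five steps can be sketched as a single-process word count in plain Python. This is only a simulation of the programming model, not Hadoop's API: `run_mapreduce`, `map_fn`, `reduce_fn`, and the sample documents are hypothetical names chosen for illustration, and a real cluster would run steps 2 and 4 in parallel on many nodes.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """User-provided Map(): emit (word, 1) for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """User-provided Reduce(): sum all counts collected for one word."""
    yield word, sum(counts)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Run the five steps sequentially over a dict of K1 -> input data."""
    # Step 1: prepare the Map() input; each K1 key carries its own data.
    # Step 2: run Map() exactly once per K1 key, producing (K2, value) pairs.
    mapped = []
    for k1, value in inputs.items():
        mapped.extend(map_fn(k1, value))

    # Step 3: shuffle, i.e. group every Map output value under its K2 key.
    groups = defaultdict(list)
    for k2, value in mapped:
        groups[k2].append(value)

    # Step 4: run Reduce() exactly once per K2 key.
    reduced = []
    for k2, values in groups.items():
        reduced.extend(reduce_fn(k2, values))

    # Step 5: collect the Reduce output and sort it by K2.
    return sorted(reduced)


# Hypothetical input: document id (K1) -> document text.
example_docs = {"d1": "big data big jobs", "d2": "data jobs"}
example_counts = run_mapreduce(example_docs, map_fn, reduce_fn)
# example_counts == [("big", 2), ("data", 2), ("jobs", 2)]
```

Here K1 is the document id and K2 is the word; swapping in different `map_fn` and `reduce_fn` functions changes the job without touching the five-step driver.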
Managers in India
As the story of Jonathan Goldman in the article illustrates, data scientists' greatest opportunity to add value is not in creating reports or presentations for senior executives but in innovating with customer-facing products and processes.
India has many e-commerce companies that are still in their development stage. Data scientists at large companies such as Intuit, Google, GE, and Zynga have already done this kind of work, for example by optimizing the service contracts and maintenance intervals for industrial products, or by working on core search, ad-serving algorithms, the MapReduce algorithm, and so on.
Managers in India should therefore focus more on the data-analysis skills covered in the slides above than on creating reports for senior executives.
They should be well versed in machine learning, R, Python, the Apache Hadoop packages, and mathematics and statistics.
This will help them in their business careers, since data science is the sexiest job of the 21st century!
Bibliography
• Harvard Business Review

• wikipedia.org

• https://www.sas.com/en_ae/insights/big-data/hadoop.html

• Apache Hadoop

Uploaded by Sidharth Raj Agarwal

Data scientist a perfect job
