© SpringPeople Software Private Limited, All Rights Reserved. 
Introduction to Big Data Analytics on Hadoop
What is Big Data? 
Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. Big data may be as important to business, and to society, as the Internet has become.
What Is Hadoop? 
Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.
What is HDFS? 
•The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. 
•HDFS is like the bucket of the Hadoop system: you dump in your data and it sits there until you want to do something with it, whether that means running an analysis on it within Hadoop or exporting a set of data to another tool and performing the analysis there.
Architecture Of HDFS
About Map Reduce 
MapReduce is a software framework that allows developers to write programs that process massive amounts of unstructured data in parallel across a distributed cluster of processors or stand-alone computers.
The framework is divided into two parts: 
•Map, a function that parcels out work to different nodes in the distributed cluster.
•Reduce, another function that collates the work and resolves the results for each key into a single value.
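The two parts can be sketched on a single machine with a word count, the usual introductory MapReduce example. This is a minimal illustration in plain Python, not Hadoop's own API: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample sentences are assumptions made for this sketch, and the grouping step stands in for the work the framework performs between map and reduce on a real cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group the intermediate values by key (done by the framework on a cluster)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of text records."""
    intermediate = []
    for doc in documents:  # on a cluster, records would be mapped on different nodes
        intermediate.extend(map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input records, used only for illustration.
    docs = ["big data on Hadoop", "Hadoop stores big data"]
    print(word_count(docs))
```

Because each call to map_phase looks at one record only, and each call to reduce_phase looks at one key only, both steps can be spread across many machines without changing the result.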
Pig Latin Statement 
A Pig Latin statement is a command that produces a relation. A relation is simply a data bag with a name. That name is called the relation's alias. The simplest Pig Latin statement is LOAD, which reads a relation from a file in the file system. Other Pig Latin statements process one or more input relations and produce a new relation as a result.
Data Preparation & Management 
•Types of variables 
•Identifying the business Y 
•Basic Statistics 
•Merging and Appending data –Primary key concept 
•Missing values 
•Outliers
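Three of the topics above (merging on a primary key, missing values, and outliers) can be illustrated with a small sketch in plain Python. Everything here is an assumption made for illustration: the customer and order records are invented, missing amounts are filled with the median, and outliers are flagged with the common 1.5 x IQR rule of thumb. Other choices are equally valid in practice.

```python
from statistics import median, quantiles

# Hypothetical sample data: customer_id acts as the primary key.
customers = {101: "Asha", 102: "Ravi", 103: "Meera"}
orders = [(101, 250.0), (102, None), (103, 240.0), (101, 260.0), (103, 5000.0)]


def merge_on_key(orders, customers):
    """Join each order to its customer through the shared primary key."""
    return [(cid, customers[cid], amount)
            for cid, amount in orders if cid in customers]


def fill_missing(rows):
    """Replace missing amounts with the median of the observed amounts."""
    observed = [amount for _, _, amount in rows if amount is not None]
    fill = median(observed)
    return [(cid, name, fill if amount is None else amount)
            for cid, name, amount in rows]


def flag_outliers(rows):
    """Return rows whose amount lies more than 1.5 * IQR outside the quartiles."""
    amounts = [amount for _, _, amount in rows]
    q1, _, q3 = quantiles(amounts, n=4, method="inclusive")
    spread = 1.5 * (q3 - q1)
    return [row for row in rows if not (q1 - spread <= row[2] <= q3 + spread)]


if __name__ == "__main__":
    merged = merge_on_key(orders, customers)
    cleaned = fill_missing(merged)
    print(cleaned)
    print(flag_outliers(cleaned))
```

The median is used for filling because, unlike the mean, it is not pulled upward by the one very large order in the sample.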
Data Visualization 
•Data visualization is the presentation of data in a pictorial or graphical format. For centuries, people have depended on visual representations such as charts and maps to understand information more easily and quickly. 
•Visualizations help people see things that were not obvious to them before. Even when data volumes are very large, patterns can be spotted quickly and easily. 
•Visualizations convey information in a universal manner and make it simple to share ideas with others.
Normal Distribution 
•A normal distribution is an arrangement of a data set in which most values cluster in the middle of the range and the rest taper off symmetrically toward either extreme. 
•Normal distribution curves are sometimes drawn with a histogram inside the curve. The graphs are commonly used in mathematics, statistics and corporate data analytics.
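The clustering and the symmetry described above can be checked numerically. The sketch below uses NormalDist from Python's standard statistics module (Python 3.8 or later); the helper name share_within is an assumption made for this example.

```python
from statistics import NormalDist


def share_within(k, mu=0.0, sigma=1.0):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"within {k} standard deviation(s): {share_within(k):.4f}")

    # Symmetry: the density is the same at equal distances on either side of the mean.
    standard = NormalDist()
    print(standard.pdf(-1.5), standard.pdf(1.5))
```

Roughly 68% of values fall within one standard deviation of the mean, about 95% within two, and about 99.7% within three, which is what "most values cluster in the middle" means in numbers.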
Hypothesis Testing 
Hypothesis testing refers to the process of choosing between competing hypotheses about a probability distribution, based on observed data from the distribution.
Two main types of test are:
•t-test
•ANOVA
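As an illustration of the first test, the sketch below computes the t statistic for the difference between two sample means using only Python's standard library. It uses Welch's form of the statistic, which does not assume equal variances; that choice, the function name welch_t, and the sample values are all assumptions made for this example. Turning the statistic into a p-value needs the t distribution, which is usually taken from a statistics library (for instance scipy.stats.ttest_ind with equal_var=False) rather than computed by hand.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference between two sample means."""
    # Sample variance of each group divided by its size gives the
    # squared standard error of that group's mean.
    se_a = variance(sample_a) / len(sample_a)
    se_b = variance(sample_b) / len(sample_b)
    return (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)


if __name__ == "__main__":
    # Hypothetical measurements for two groups.
    group_a = [2.0, 4.0, 6.0]
    group_b = [1.0, 2.0, 3.0]
    print(welch_t(group_a, group_b))
```

A statistic near zero means the observed difference is small relative to the spread of the data; the larger its absolute value, the stronger the evidence against the hypothesis that both groups share the same mean.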
Deductive vs. Inductive Reasoning
•Deductive reasoning happens when a researcher works from more general information to the more specific. This is sometimes called the “top-down” approach because the researcher starts at the top with a very broad spectrum of information and works down to a specific conclusion.
•Inductive reasoning works the opposite way, moving from specific observations to broader generalizations and theories. This is sometimes called the “bottom-up” approach. The researcher begins with specific observations and measures, then detects patterns and regularities, formulates tentative hypotheses to explore, and finally develops general conclusions or theories.
Become a Big Data Expert in Just 2 Days
The Big Data Analytics on Hadoop course will teach you all you need to learn about Big Data analytics on Hadoop.
More Details
Suggested Audience 
•Data analysts / Data scientists who want to know how to use their expertise on Big Data 
•Database Managers with a knowledge of Hadoop / Java who want to know what to do next in their career and how to manage and draw insights from their data 
•Consultants who want to know what Big Data analytics is. 
Syllabus
For further info/assistance contact 
training@springpeople.com 
+91 80 656 79700 
www.springpeople.com 
Our Partners
