www.sensaran.wordpress.com
Building Big Data Analytics with
Apache Hadoop for
Beginners
ABOUT ME
 I am Senthil Kumar Srinivasan.
 11+ years of experience in the programming industry.
 Working as a senior consultant at Capgemini
as a UI Architect since 2011.
 Catch me at:
 www.sensaran.wordpress.com
ROAD MAP
About Me
Targeted Audience
Introduction to Big Data
Big Data and the 4 V’s
About Apache Hadoop
Hadoop Features
Big Data Sources
Traditional vs. Big Data Approach
Quiz/Question
TARGETED AUDIENCE ?
 Professionals aspiring for a career in Big
Data analytics using Apache Hadoop.
 Analytics professionals, IT professionals,
ETL developers, project managers, and
testing novices and experts.
 Other aspirants and students looking to gain a thorough
understanding of how the Hadoop framework is implemented.
WHAT IS BIG DATA?
 Big data is a collection of data sets that are large and complex in
nature.
 They comprise both structured and unstructured data that grow
so large, so fast, that they cannot be managed by traditional relational
database systems or conventional statistical tools.
 Today’s cheap commodity hardware, cloud architectures and open
source software bring big data processing within reach of the less
well-resourced.
How Big is “Big Data”?
 With the advent of new technologies, devices and social
networking sites, the amount of data produced by mankind
increases every year.
 The amount of data produced up to 2003 was about 5 billion gigabytes.
Piled up in the form of disks, it would fill an entire football
field.
 We now generate the same amount of data every two days.
 90% of the world’s data was generated in the last few years.
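To get a feel for these figures, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes decimal units (1 exabyte = 1 billion gigabytes) and takes the "5 billion gigabytes every two days" figure at face value. The function name is made up for this example.

```python
# Assumption: decimal (SI) units, where 1 exabyte = 1 billion gigabytes.
GB_PER_EB = 1_000_000_000


def exabytes_per_year(gigabytes: float, every_n_days: float) -> float:
    """Convert 'X gigabytes every N days' into exabytes per 365-day year."""
    per_day_gb = gigabytes / every_n_days
    return per_day_gb * 365 / GB_PER_EB


total_until_2003_gb = 5_000_000_000  # figure quoted on the slide
rate = exabytes_per_year(total_until_2003_gb, every_n_days=2)
print(f"{rate:.1f} EB per year")
```

Under these assumptions, one year at today's rate is several hundred times everything produced up to 2003.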
“In its raw form, oil has little value. Once processed and
refined, it helps power the world.”
—Ann Winblad
How Analysis of Big Data Is Useful for an
Organization
 Discovering what we do not know from data
 For instance, people nowadays rely on Facebook or Twitter, or
buy products on eBay, Flipkart, Amazon, etc.
 Communicating relevant business stories from data
 Building confidence in decisions that drive business value
 Creating data for products that have business impact now
BYTES IN COMPUTER STORAGE
 Byte
 Kilobyte (KB)
 Megabyte (MB)
 Gigabyte (GB)
 Terabyte (TB)
 Petabyte (PB)
 Exabyte (EB)
 Zettabyte (ZB)
 Yottabyte (YB)
 Byte -- 8 bits.
 Kilobyte -- 1,024 bytes.
 Megabyte -- 1,024 Kilobytes.
 Gigabyte -- 1,024 Megabytes.
 Terabyte -- 1,024 Gigabytes.
 Petabyte -- 1,024 Terabytes.
 Exabyte -- 1,024 Petabytes.
 Zettabyte -- 1,024 Exabytes.
 Yottabyte -- 1,024 Zettabytes.
Scale labels shown on the slides: Hobbyist, Desktop, Internet, Big Data, Our Future.
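The 1,024-based ladder of units can be turned into a small helper that expresses any byte count in the largest fitting unit. This is a minimal sketch in Python; the function name and output format are choices made for this example, not part of the slides.

```python
# Units in ascending order; each step is a factor of 1,024.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop once the value fits in this unit, or we run out of units.
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


print(human_readable(1024))            # one kilobyte
print(human_readable(500 * 1024**4))   # 500 terabytes
```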
TYPES OF DATA
FOUR CHARACTERISTICS OF BIG DATA: THE 4 V’s
BIG DATA - VOLUME
 A typical PC might have had 10 gigabytes of storage in 2000.
 Today, Facebook ingests 500 terabytes of new data every day.
 A Boeing 737 is reported to generate 240 terabytes of flight data during a
single flight across the US.
 Smartphones, the data they create and consume, and sensors
embedded into everyday objects will soon result in billions of
new, constantly updated data feeds containing environmental,
location, and other information, including video.
BIG DATA - VELOCITY
 Clickstreams and ad impressions capture user behavior at
millions of events per second.
 High-frequency stock trading algorithms reflect market changes
within microseconds.
 Machine to machine processes exchange data between billions of
devices.
 Infrastructure and sensors generate massive log data in real-time.
 On-line gaming systems support millions of concurrent users,
each producing multiple inputs per second.
BIG DATA - VARIETY
 Big Data isn't just numbers, dates, and strings. Big Data is also
geospatial data, 3D data, audio and video, and unstructured
text, including log files and social media.
 Traditional database systems were designed to address
smaller volumes of structured data, fewer updates, and a
predictable, consistent data structure.
 Big Data analysis includes different types of data.
BIG DATA - VERACITY
 The quality of captured data can vary greatly, affecting
accurate analysis.
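One common response to uneven data quality is to validate records before analysis. The sketch below is only an illustration in Python: the record fields (`sensor_id`, `temp_c`), the sample readings and the plausible temperature range are all assumptions made for this example.

```python
def is_valid(reading: dict) -> bool:
    """Accept a reading only if it has a sensor id and a plausible temperature."""
    temp = reading.get("temp_c")
    return (
        bool(reading.get("sensor_id"))
        and isinstance(temp, (int, float))
        and -90.0 <= temp <= 60.0
    )


# Hypothetical sensor readings, some of them unusable.
readings = [
    {"sensor_id": "s1", "temp_c": 31.5},
    {"sensor_id": "s2", "temp_c": 999.0},   # out of range
    {"sensor_id": "", "temp_c": 22.0},      # missing id
    {"sensor_id": "s3", "temp_c": None},    # missing value
]

clean = [r for r in readings if is_valid(r)]
print(f"kept {len(clean)} of {len(readings)} readings")
```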
BIG DATA SOURCES
 Web logs
 Sensor networks
 Social media
 Internet text and documents
 Internet pages
 Search index data
 Atmospheric science, astronomy,
biochemical and medical records
 Scientific research
 Military surveillance & photography archives.
LET’S SEE CLEARLY
APPLICATION DATA
 Business applications produce high volumes of data.
 It is in a structured format.
 E.g. weather reports, health care services,
stocks, banking and other web services.
MACHINE DATA
 Companies that use devices
equipped with sensors and network
connectivity can leverage them as a data source as well.
 It is in a semi-structured format.
 E.g. medical devices, car sensors, satellites,
traffic recording devices and cell towers.
SOCIAL MEDIA DATA
 It is highly unstructured.
 Facebook generates 10 TB daily.
 Twitter generates 7 TB of data
daily.
ARCHIVED DATA
 Archives of scanned documents, statements,
insurance forms, medical records, paper archives
and print stream files that contain the original system
of record between an organization and its customers.
 Highly unstructured and high in volume.
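The difference between structured, semi-structured and unstructured data is easiest to see side by side. The following Python sketch uses made-up sample values. It reads a CSV table (structured), a JSON message (semi-structured) and a free-text post (unstructured) with the standard library only.

```python
import csv
import io
import json
import re

# Structured: fixed columns, as in application data (hypothetical sample).
structured = "city,temp_c\nChennai,34\nPune,29\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing fields that may differ per record,
# as in machine data (hypothetical sensor message).
semi_structured = (
    '{"device": "car-17", "speed_kmh": 62, '
    '"gps": {"lat": 13.08, "lon": 80.27}}'
)
record = json.loads(semi_structured)

# Unstructured: free text, as in social media posts; any structure
# has to be inferred, here with a simple hashtag pattern.
unstructured = "Loving the new phone! #gadgets #bigdata"
hashtags = re.findall(r"#(\w+)", unstructured)

print(rows[0]["city"], record["gps"]["lat"], hashtags)
```

The further down the list, the more work the reader of the data has to do: the CSV columns are known in advance, the JSON fields have to be looked up by name, and the text has to be searched with a pattern.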
TRADITIONAL ANALYTICS APPROACH
 The requirements are defined,
followed by solution design and
build.
 Once the solution is implemented,
queries are executed.
 If there are new requirements or
queries, the system is redesigned
and rebuilt.
BIG DATA APPROACH
Overview of Apache Hadoop History
ABOUT APACHE HADOOP
 Apache Hadoop (named after a toy elephant) is an
open source Java framework that lets you store large amounts
of data on clusters of low-cost commodity hardware.
 It provides techniques to process the distributed data using
simple programming models.
 The Hadoop framework lets you quickly write and test distributed
systems.
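The best-known of these programming models is MapReduce. The sketch below imitates its three stages (map, shuffle, reduce) with a word count in plain Python on a single machine. It does not use any Hadoop API, and the function names and sample lines are made up for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine the values for one key into a single count."""
    return key, sum(values)


lines = ["big data is big", "hadoop stores big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

In a real cluster the map and reduce functions run on many nodes in parallel and the framework handles the grouping step. The programmer mainly writes the two small functions.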
WHY DO I NEED TO USE HADOOP?
 It does not rely on hardware to provide fault
tolerance and high availability.
 It runs on all major platforms because it is
Java-based.
 Hadoop was initially inspired by papers published
by Google outlining its approach to handling an
avalanche of data, and has since become the de
facto standard for storing, processing and
analyzing hundreds of terabytes, and even
petabytes, of data.
 It runs applications on distributed
systems with thousands of nodes involving
petabytes of data.
 It has a distributed file system, called the Hadoop
Distributed File System (HDFS), which enables fast
data transfer among the nodes.
HADOOP FEATURES
Scalable:
Hadoop is a highly scalable storage platform because it can store
and distribute very large data sets across hundreds of inexpensive
servers that operate in parallel.
Economical/Cost:
Hadoop is a cost-effective way to process large data sets. It also
makes it affordable to keep large volumes of data for later use,
which helps large companies reduce implementation costs.
Flexible:
It can easily process both structured and unstructured data, such as
data from social networking sites, and supports log processing, data
warehousing and market analysis.
Fast:
It can process terabytes of data in minutes and
petabytes in hours.
Recovery/Reliable :
A key advantage of using Hadoop is its fault tolerance. When data is
sent to an individual node, that data is also replicated to other nodes
in the cluster, which means that in the event of failure, there is
another copy available for use.
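The replication idea can be simulated in a few lines. The sketch below is a simplified illustration in Python, not HDFS code. It assumes three copies per block (the HDFS default replication factor), uses made-up node and block names, and places copies at random, whereas real HDFS placement also takes racks into account.

```python
import random


def place_replicas(block_id, nodes, replication=3, seed=0):
    """Pick `replication` distinct nodes to hold copies of one block."""
    rng = random.Random(f"{seed}-{block_id}")
    return rng.sample(nodes, replication)


def readable_blocks(placement, failed):
    """Return ids of blocks that still have at least one copy on a live node."""
    return [b for b, holders in placement.items() if set(holders) - set(failed)]


# Hypothetical five-node cluster holding three blocks.
nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = {b: place_replicas(b, nodes) for b in ["blk_1", "blk_2", "blk_3"]}

# Simulate the loss of one node and check which blocks remain readable.
survivors = readable_blocks(placement, failed=["node2"])
print(survivors)
```

Because every block lives on three different nodes, losing any single node still leaves at least two copies of each block.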
READY FOR QUIZ TIME?
1 . Apache Hadoop was inspired by white papers published by
a) Amazon
b) Google
c) Yahoo
d) All the above
2 . What do the four V’s in Big Data denote?
a) Volume , Vertical , Variable , Value
b) Velocity , Volume , Variety , Veracity
c) Velocity , Variable , Value , Vertical
d) Volume, Validity, Value, Variable
3 . Which of the following aspects of Big Data refers to data size?
a) Volume
b) Velocity
c) Veracity
d) Value
4 . Which of the following is semi-structured data ?
a) Collection of tables in databases
b) Collection of text files
c) Collection of tickets
d) Collection of XML files
5 . What was Hadoop named after?
a) Creator Doug Cutting's favorite circus act
b) Cutting's high school rock band
c) The toy elephant of Cutting's son
d) A sound Cutting's laptop made during Hadoop's development
6 . Does Hadoop support both unstructured and structured data?
a) True
b) False
7 . All of the following accurately describe Hadoop, EXCEPT?
a) Open source
b) Real-time
c) Java-based
d) Distributed computing approach
8 . Which of the following aspects of Big Data refers to multiple data sources?
a) Volume
b) Velocity
c) Variety
d) Value
9. __________ has the world’s largest Hadoop cluster.
a) Apple
b) Datamatics
c) Facebook
d) None of the mentioned
Installing Hadoop in Windows
VMware Player is a software package, offered by VMware, Inc.,
used to create and work on virtual machines.
VMware Player can be downloaded free of cost from:
http://www.vmware.com/in/products/player/
Download the 32-bit or 64-bit version of VMware Player,
depending on your system.
VMware Player: Hardware Requirements
The hardware requirements for working on VMware Player are as
follows:
 1 GHz or faster processor (2 GHz recommended)
 1 GB RAM minimum
 50 GB of disk space to install the application
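The disk-space requirement can be checked before starting the installation. This is a minimal Python sketch using the standard library's `shutil.disk_usage`. It assumes 1 GB = 1,024³ bytes and checks the drive that holds the current directory. The function name is made up for this example.

```python
import shutil

REQUIRED_DISK_GB = 50  # from the requirements list above


def has_enough_disk(free_bytes: int, required_gb: int = REQUIRED_DISK_GB) -> bool:
    """Compare free space against the requirement, using 1 GB = 1,024**3 bytes."""
    return free_bytes >= required_gb * 1024**3


# Free space on the drive that holds the current working directory.
free = shutil.disk_usage(".").free
print("Enough disk space:", has_enough_disk(free))
```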
VMware Player
1. Download the required version.
2. Click Next on the installation wizard.
3. Accept the default location and click Next to continue
with the installation process.
4. Select the checkbox for software update, if required.
5. Select the shortcut option.
6. The installation wizard states that the installation
process is yet to begin. Click Continue.
7. Click Finish on the Setup Wizard Complete section.
8. Double-click on the desktop icon for VMware Player,
accept the license agreement, and click Next.
9. Create the virtual machine.
Please reach me at:
Senthil.srinivasan@capgemini.com
www.sensaran.wordpress.com

 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
 
一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
 

Big data - Apache Hadoop for Beginner's

  • 1. Building Big Data Analytics with Apache Hadoop for Beginners (www.sensaran.wordpress.com)
  • 2. ABOUT ME: I am Senthil Kumar Srinivasan. 11+ years of experience in the programming industry. Working as a senior consultant (UI Architect) at Capgemini since 2011. Catch me at www.sensaran.wordpress.com
  • 3. ROAD MAP: About Me; Targeted Audience; Introduction to Big Data; Big Data and the 4 V's; About Apache Hadoop; Hadoop Features; Big Data Sources; Traditional vs. Big Data Approach; Quiz/Questions
  • 4. TARGETED AUDIENCE: Professionals aspiring to a career in Big Data analytics using Apache Hadoop. Analytics professionals, IT professionals, ETL developers, project managers, and testers, from novices to experts. Other aspirants and students who want a thorough understanding of how the Hadoop framework is implemented.
  • 5. WHAT IS BIG DATA? Big data is a collection of data sets that are large and complex in nature. They comprise both structured and unstructured data that grow so large, so fast, that they cannot be managed by traditional relational database systems or conventional statistical tools. Today's cheap commodity hardware, cloud architectures and open source software bring big data processing within reach of the less well-resourced.
  • 6. HOW BIG IS “BIG DATA”? With the advent of new technologies, devices and social networking sites, the amount of data produced by mankind is increasing every year. The amount of data produced up to 2003 was 5 billion gigabytes; piled up in the form of disks, it would fill an entire football field. Today we generate the same amount of data every two days. 90% of the data was generated in the last few years.
  • 7. “In its raw form, oil has little value. Once processed and refined, it helps power the world.” (Ann Winblad)
  • 8. How analysis of Big Data is useful for organizations
  • 9. Discovering what we do not know from data; for instance, people nowadays rely on Facebook or Twitter, or buy products on eBay, Flipkart, Amazon, etc. Communicating relevant business stories from data. Building confidence in decisions that drive business value. Creating data for products that have business impact now.
  • 10. BYTES IN COMPUTER STORAGE: Byte; Kilobyte (KB); Megabyte (MB); Gigabyte (GB); Terabyte (TB); Petabyte (PB); Exabyte (EB); Zettabyte (ZB); Yottabyte (YB)
  • 11-16. Byte: 8 bits. Kilobyte: 1,024 bytes. Megabyte: 1,024 kilobytes. Gigabyte: 1,024 megabytes. Terabyte: 1,024 gigabytes. Petabyte: 1,024 terabytes. Exabyte: 1,024 petabytes. Zettabyte: 1,024 exabytes. Yottabyte: 1,024 zettabytes. Across these slides the scale is labelled step by step: Hobbyist, Desktop, Internet, Big Data, Our Future.
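The unit ladder above can be checked with a few lines of Python. This is only a minimal sketch: the helper names `to_bytes` and `humanize` are made up for illustration, and it assumes the 1,024-based multipliers listed on the slides.

```python
# Unit names in ascending order; each step is 1,024 times the previous one,
# following the multipliers listed on the slides.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(amount: float, unit: str) -> int:
    """Convert an amount in the given unit to a number of bytes."""
    return int(amount * 1024 ** UNITS.index(unit))


def humanize(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value below 1,024."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


if __name__ == "__main__":
    # The "5 billion gigabytes" figure from slide 6, expressed in larger units.
    print(humanize(to_bytes(5_000_000_000, "GB")))
    # The daily Facebook ingest figure quoted on the Volume slide.
    print(humanize(to_bytes(500, "TB")))
```

With these multipliers, the 5 billion gigabytes mentioned on slide 6 works out to a little under 5 exabytes.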
  • 18. FOUR CHARACTERISTICS OF BIG DATA: THE 4 V's
  • 19. BIG DATA - VOLUME: A typical PC might have had 10 gigabytes of storage in 2000. Today, Facebook ingests 500 terabytes of new data every day. A Boeing 737 generates 240 terabytes of flight data during a single flight across the US. Smartphones and the data they create and consume, together with sensors embedded into everyday objects, will soon result in billions of new, constantly updated data feeds containing environmental, location, and other information, including video.
  • 20. BIG DATA - VELOCITY: Clickstreams and ad impressions capture user behavior at millions of events per second. High-frequency stock trading algorithms reflect market changes within microseconds. Machine-to-machine processes exchange data between billions of devices. Infrastructure and sensors generate massive log data in real time. Online gaming systems support millions of concurrent users, each producing multiple inputs per second.
  • 21. BIG DATA - VARIETY: Big Data isn't just numbers, dates, and strings. Big Data is also geospatial data, 3D data, audio and video, and unstructured text, including log files and social media. Traditional database systems were designed to address smaller volumes of structured data, fewer updates, or a predictable, consistent data structure. Big Data analysis includes different types of data.
  • 22. BIG DATA - VERACITY: The quality of captured data can vary greatly, which affects the accuracy of analysis.
  • 23. BIG DATA SOURCES: Web logs; sensor networks; social media; Internet text and documents; Internet pages; search index data; atmospheric science, astronomy, biochemical and medical records; scientific research; military surveillance and photography archives.
  • 24. A CLOSER LOOK: APPLICATION DATA. Business applications produce high-volume data in a structured format. Examples: weather reports, health care services, stocks, banking and other web services.
  • 25. MACHINE DATA: Companies that use devices equipped with sensors and network connectivity can leverage them as a data source as well. The data is in a semi-structured format. Examples: medical devices, car sensors, satellites, traffic recording devices and cell towers.
  • 26. SOCIAL MEDIA DATA: It is highly unstructured. Facebook generates 10 TB of data daily. Twitter generates 7 TB of data daily.
  • 27. ARCHIVED DATA: Archives of scanned documents, statements, insurance forms, medical records, paper archives and print stream files that contain the original system of record between an organization and its customers. Highly unstructured and high in volume.
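To make the structured / semi-structured / unstructured distinction from the slides above concrete, here is a small Python sketch. All sample records, field names and function names are invented for illustration; it only uses the standard library.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET


def parse_structured(text: str) -> list:
    """Structured data: fixed columns, so every row has the same fields."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_semi_structured(text: str) -> dict:
    """Semi-structured data: tagged (here XML), but fields may vary per record."""
    root = ET.fromstring(text)
    return {"device": root.get("device"),
            "speed": int(root.find("speed").text)}


def parse_unstructured(text: str) -> list:
    """Unstructured data: free text; we can only pattern-match for signals."""
    return re.findall(r"#(\w+)", text)


# Hypothetical samples, one per category.
weather_csv = "city,temp_c\nChennai,34\nPune,29\n"
sensor_xml = '<reading device="car-17"><speed unit="kmh">62</speed></reading>'
social_post = "Loving the new phone, battery lasts all day! #happy #battery"

if __name__ == "__main__":
    print(parse_structured(weather_csv))
    print(parse_semi_structured(sensor_xml))
    print(parse_unstructured(social_post))
```

The further down this list a source sits, the less the parser can assume about its shape, which is one reason a fixed relational schema fits it poorly.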
  • 28. TRADITIONAL ANALYTICS APPROACH: The requirements are defined, followed by solution design and build. Once the solution is implemented, queries are executed. If there are new requirements or queries, the system is redesigned and rebuilt.
  • 31. ABOUT APACHE HADOOP: Apache Hadoop (the name comes from a toy elephant) is an open source Java framework that lets you store large amounts of data on clusters of low-cost commodity hardware. It provides techniques to process the distributed data using simple programming models. The Hadoop framework lets you quickly write and test distributed systems.
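The "simple programming model" Hadoop is best known for is MapReduce. The following is a single-process Python sketch of the idea only: it does not use Hadoop's Java API, the function names are invented for illustration, and on a real cluster the map and reduce steps would run in parallel on many nodes.

```python
from collections import defaultdict


def map_phase(line: str):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs) -> dict:
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list) -> tuple:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines) -> dict:
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data needs big storage", "hadoop stores big data"]))
```

The programmer writes only the map and reduce functions; splitting the input, moving data between nodes and retrying failed tasks are left to the framework.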
  • 32. It does not rely on hardware to provide fault tolerance and high availability. Because it is written in Java, it runs on any platform that has a Java runtime. Hadoop was initially inspired by papers published by Google outlining its approach to handling an avalanche of data, and has since become the de facto standard for storing, processing and analyzing hundreds of terabytes, and even petabytes, of data.
  • 33. WHY DO I NEED TO USE HADOOP? It runs applications on distributed systems with thousands of nodes involving petabytes of data. It has a distributed file system, the Hadoop Distributed File System (HDFS), which enables fast data transfer among the nodes.
  • 34. HADOOP FEATURES. Scalable: Hadoop is a highly scalable storage platform because it can store and distribute very large data sets across hundreds of inexpensive servers that operate in parallel. Economical: Hadoop is cost-effective for processing large data sets. It also makes it affordable to retain large volumes of data for later use, which helps companies reduce implementation costs.
  • 35. Flexible: It can easily process both structured and unstructured data, such as data from social networking sites, and supports uses such as log processing, data warehousing and market analysis. Fast: It can process terabytes of data in minutes and petabytes in hours. Reliable: A key advantage of using Hadoop is its fault tolerance. When data is sent to an individual node, that data is also replicated to other nodes in the cluster, which means that in the event of failure, there is another copy available for use.
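The replication idea behind the reliability point can be illustrated with a toy model. This is a sketch under stated assumptions: the round-robin placement and the function names are made up for illustration and are not how HDFS actually chooses nodes (its real policy is rack-aware); the replication factor of 3 matches the HDFS default.

```python
def place_blocks(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (simple round-robin)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(i + r) % len(nodes)] for r in range(replication)]
        for i, block in enumerate(blocks)
    }


def surviving_copies(placement, failed_nodes):
    """Return, per block, the nodes that still hold a copy after failures."""
    return {
        block: [node for node in holders if node not in failed_nodes]
        for block, holders in placement.items()
    }


if __name__ == "__main__":
    placement = place_blocks(["b0", "b1", "b2"], ["n0", "n1", "n2", "n3"])
    after = surviving_copies(placement, failed_nodes={"n1"})
    # Every block should still be readable from at least one other node.
    for block, holders in after.items():
        print(block, "still available on", holders)
```

Because each block lives on three different nodes in this model, losing any single node never makes a block unavailable.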
  • 37. 1. Apache Hadoop was derived from a white paper published by: a) Amazon b) Google c) Yahoo d) All of the above
  • 38. 2. What do the four V's of Big Data denote? a) Volume, Vertical, Variable, Value b) Velocity, Volume, Variety, Veracity c) Velocity, Variable, Value, Vertical d) Volume, Validity, Value, Variable
  • 39. 3. Which of the following aspects of Big Data refers to data size? a) Volume b) Velocity c) Veracity d) Value
  • 40. 4. Which of the following is semi-structured data? a) Collection of tables in databases b) Collection of text files c) Collection of tickets d) Collection of XML files
  • 41. 5. What was Hadoop named after? a) Creator Doug Cutting's favorite circus act b) Cutting's high school rock band c) The toy elephant of Cutting's son d) A sound Cutting's laptop made during Hadoop's development
  • 42. 6. Does Hadoop support both unstructured and structured data? a) True b) False
  • 43. 7. All of the following accurately describe Hadoop, EXCEPT: a) Open source b) Real-time c) Java-based d) Distributed computing approach
  • 44. 8. Which of the following aspects of Big Data refers to multiple data sources? a) Volume b) Velocity c) Variety d) Value
  • 45. 9. __________ has the world’s largest Hadoop cluster. a) Apple b) Datamatics c) Facebook d) None of the mentioned
  • 47. INSTALLING HADOOP ON WINDOWS: VMware Player is a software package offered by VMware, Inc. for creating and running virtual machines. It can be downloaded free of cost (http://www.vmware.com/in/products/player/). Download the 32-bit or 64-bit version of VMware Player to match your system.
  • 48. VMware Player hardware requirements: a 1 GHz or faster processor (2 GHz recommended); at least 1 GB of RAM; 50 GB of disk space to install the application.
  • 49. VMware Player: 1. Download the required version. 2. Click Next in the installation wizard.
  • 50. VMware Player: Accept the default location and click Next to continue with the installation.
  • 51. VMware Player: Select the checkbox for software updates, if required.
  • 52. VMware Player: Select the shortcut option.
  • 53. VMware Player: The installation wizard indicates that it is ready to begin the installation. Click Continue.
  • 54. VMware Player: Click Finish on the Setup Wizard Complete screen.
  • 55. VMware Player: Double-click the VMware Player desktop icon, accept the license agreement, and click Next.
  • 56-62. VMware Player: Create the virtual machine.

Editor's Notes

  1. sample
  2. www.scmGalaxy.com, Author - Rajesh Kumar
  3. www.scmGalaxy.com, Author - Rajesh Kumar
  4. According to IBM