Presented By – Saurav Sinha
CONTENT
• Introduction
• What is Big Data
• Who’s generating Big Data?
• Characteristics of Big Data
• Storing and Processing of Big Data
• Why Big Data
• Setting up the Environment
• IBM InfoSphere BigInsights
• Working with the tools
• Advantages & Disadvantages
• Companies in Big Data and Hadoop
Big Data Definition
• No single standard definition…
“Big Data” is data whose scale, diversity, and complexity
require new architecture, techniques, algorithms, and
analytics to manage it and extract value and hidden
knowledge from it…
Who’s Generating Big Data?
• Social media and networks (all of us are generating data)
• Scientific instruments (collecting all sorts of data)
• Mobile devices (tracking all objects all the time)
• Sensor technology and networks (measuring all kinds of data)
• Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely and scalable manner.
THREE CHARACTERISTICS OF BIG DATA
• The characteristics of ‘Big Data’ are commonly summarized as the three V’s: Volume, Velocity, and Variety.
Some make it 4 V’s.
STORING OF BIG DATA
Analyzing your data characteristics
  • Selecting data sources for analysis
  • Eliminating unnecessary data
Overview of Big Data stores
  • Hadoop Distributed File System (HDFS)
  • HBase
  • Hive
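HDFS stores a large file by cutting it into fixed-size blocks that are spread across the nodes of the cluster. The minimal Python sketch below only computes that block layout for a file of a given size. It does not talk to HDFS. The 128 MB default is an assumption: the block size is a cluster setting and differs between Hadoop versions (64 MB was common in Hadoop 1.x).

```python
# A sketch of how a file is divided into fixed-size HDFS-style blocks.
# Assumption: 128 MB blocks; the real value is a configurable cluster setting.
MB = 1024 * 1024


def block_layout(file_size: int, block_size: int = 128 * MB) -> list[tuple[int, int]]:
    """Return (offset, length) pairs, one per block, for a file of file_size bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    blocks = []
    offset = 0
    while offset < file_size:
        # The last block only holds the remaining bytes; it is not padded.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


for offset, length in block_layout(300 * MB):
    print(f"block at offset {offset // MB} MB, length {length // MB} MB")
```

For a 300 MB file this prints three blocks of 128, 128, and 44 MB. Each of those blocks is what HDFS then replicates and distributes across nodes.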
PROCESSING OF BIG DATA
Integrating Disparate Data Stores
  • Connecting to and extracting data from storage
  • Subdividing data in preparation for Hadoop MapReduce
Employing Hadoop MapReduce
  • Creating the components of Hadoop MapReduce jobs
  • Distributing data processing across server farms (groups of networked servers)
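The MapReduce steps above can be illustrated with the classic word-count job. The sketch below is a minimal, single-process Python simulation. It subdivides the input into splits, maps each split to (word, 1) pairs, shuffles the pairs by key, and reduces each key to a total. It does not use Hadoop's Java API, and all function names are hypothetical. In a real cluster the map and reduce tasks run in parallel on different nodes.

```python
from collections import defaultdict


def split_input(lines: list[str], num_splits: int) -> list[list[str]]:
    """Subdivide the input into contiguous chunks, one per (simulated) map task."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    size = max(1, -(-len(lines) // num_splits))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_phase(split: list[str]) -> list[tuple[str, int]]:
    """Map: emit (word, 1) for every word in the split."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


def word_count(lines: list[str], num_splits: int = 2) -> dict[str, int]:
    """Run split -> map -> shuffle -> reduce over the input lines."""
    mapped = [pair for split in split_input(lines, num_splits) for pair in map_phase(split)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


print(word_count(["Big data needs new tools", "Hadoop processes big data"]))
```

Here "big" and "data" each come from both splits, so they are counted twice once the shuffle brings their pairs together.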
Setting up the Environment
• System configuration
  • CPU frequency: min. 2.40 GHz
  • OS: 64-bit Windows 7 or 8
  • RAM: min. 4 GB
  • Hard disk: 1 TB (1024 GB), with 160 GB free space for the Hadoop installation
  • Graphics: 2 GB
  • Virtualization Technology must be enabled.
• Software required
  • VMware Workstation 12.1 Pro
  • iibi30_QuickStart_Single_VMware_2 (the BigInsights Quick Start VMware image)
Enable virtualization in the BIOS settings of the system.
IBM InfoSphere BigInsights
• 1st step: Download the BigInsights 2.7 Quick Start Edition VMware image from IBM’s external download site. Use the image for the single-node cluster.
• 2nd step: Install VMware Player or other software that can run VMware images.
• 3rd step: Decompress (unzip) the file and install the image on your laptop/PC. Be patient! Unzipping takes around 25-30 minutes.
• 4th step: Launch VMware Player and select the image file.
Start the VMware image by clicking the Play virtual machine button in VMware Player, if it is not already running.
STATE
• Powered Off means the virtual machine is off.
OS
• Shows which virtual OS is selected.
Edit Virtual Machine Settings
• Lets us edit the settings of the virtual machine; we can even increase or decrease its RAM.
Play Virtual Machine
• Starts the virtual OS.
When logging in for the first time, use the root ID (with a password of password). Follow the instructions to configure your environment, accept the licensing agreement, and enter the passwords for the root and biadmin IDs (root/password and biadmin/biadmin) when prompted. This is a one-time requirement.
[Screenshots: the IBM InfoSphere BigInsights image while booting up, followed by the next setup screen.]
When the one-time configuration process is complete, you are presented with a SUSE Linux login screen.
Log in with username biadmin and password biadmin.
[Screenshot: the home screen of the virtual image after booting up.]
Click Start BigInsights to start all required services. (Alternatively, open a terminal window and run $BIGINSIGHTS_HOME/bin/start-all.sh.)
Double-click “Start BigInsights”.
Now we can use the “BigInsights Shell” for further operations.
OR
We can use a terminal (right-click, then click Terminal).
Type this command: cd $BIGINSIGHTS_HOME/bin
Next, type this command: ./start-all.sh
Wait until the operation completes. This may take several minutes, depending on your machine’s resources.
start-all.sh reports the status of each service it starts. Verify that, at a minimum, the following components started successfully: Hadoop, Hive, SQL, and Console.
From a terminal window, run this command:
$BIGINSIGHTS_HOME/bin/status.sh
All processes have been started, each listed with its process ID.
Now we are ready to start working with big data!
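The start and status steps can also be scripted. The Python sketch below is a hedged example. It assumes it runs inside the BigInsights VM, where the BIGINSIGHTS_HOME environment variable is set, and it only resolves and runs the two scripts named above. Environment-variable names are case-sensitive, so `$BIGINSIGHTS_HOME` works but `$biginsights_home` does not. The helper names are hypothetical.

```python
import os
import subprocess
from pathlib import Path


def biginsights_script(name: str) -> Path:
    """Resolve a script under $BIGINSIGHTS_HOME/bin (the variable name is case-sensitive)."""
    home = os.environ.get("BIGINSIGHTS_HOME")
    if not home:
        raise RuntimeError("BIGINSIGHTS_HOME is not set; run this inside the BigInsights VM")
    return Path(home) / "bin" / name


def run_script(name: str) -> int:
    """Run the script, letting its output go to the terminal, and return its exit status."""
    completed = subprocess.run([str(biginsights_script(name))], check=False)
    return completed.returncode
```

Inside the VM, calling `run_script("start-all.sh")` and then `run_script("status.sh")` mirrors the two commands above. The status.sh output still has to be read to confirm that Hadoop, Hive, SQL, and Console are running.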
Working with the tools
HDFS
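• hadoop fs -<command> <arguments>
• 1. -ls: list a directory
• 2. -mkdir: make a directory
• 3. -ls -R: list a directory recursively
• 4. -du: show the size of each file in a directory
• 5. -du -s: show the total size of a directory
• 6. -cp: copy
• 7. -rm: remove a file
• 8. -rm -r: remove a directory recursively
• 9. -mv: move a file or directory
• 10. -tail: show the last kilobyte of a file
• 11. Pattern matching: hadoop fs has no grep option, so pipe the file through grep: hadoop fs -cat <file> | grep <pattern>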
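The `hadoop fs` commands above all follow one pattern: `hadoop fs`, a dash-prefixed option, then its arguments. The small Python sketch below builds those commands and, by default, only prints them, so it can be tried without a cluster. Passing `dry_run=False` would run them through `subprocess`, assuming `hadoop` is on the PATH. The helper names and the `/user/biadmin` paths are hypothetical. The recursive spellings shown (`-ls -R`, `-du -s`, `-rm -r`) are the Hadoop 2 forms; older Hadoop 1.x releases used `-lsr`, `-dus`, and `-rmr`.

```python
import shlex
import subprocess
from typing import Optional


def hdfs_command(option: str, *args: str) -> list[str]:
    """Build the argument list for `hadoop fs -<option> [args...]`."""
    return ["hadoop", "fs", f"-{option}", *args]


def run_hdfs(option: str, *args: str, dry_run: bool = True) -> Optional[str]:
    """Print the command (dry run) or execute it and return its standard output."""
    cmd = hdfs_command(option, *args)
    if dry_run:
        # Only show the command, so the sketch can be tried without a cluster.
        print(shlex.join(cmd))
        return None
    # check=True raises subprocess.CalledProcessError if the command fails.
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


# Hypothetical paths under the biadmin user's HDFS home directory.
run_hdfs("mkdir", "/user/biadmin/demo")
run_hdfs("ls", "-R", "/user/biadmin")
run_hdfs("du", "-s", "/user/biadmin")
run_hdfs("rm", "-r", "/user/biadmin/demo")
```

Building the command as a list, rather than a single string, avoids shell-quoting problems with paths that contain spaces.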
HIVE
• Open a terminal from the BigInsights Shell and start the Hive shell.
• CREATE DATABASE someonedb;
• SHOW DATABASES;
• DESCRIBE DATABASE someonedb;
Working with the tools
SQOOP
Sqoop allows you to move data between a relational database
system and Hadoop. Sqoop is able to import data from a
relational table into Hadoop and is also able to export data
from Hadoop into a relational database table.
• Create a database and table
Open a command window: right-click on the desktop and select Open in Terminal. Switch user to db2inst1 (the password is db2inst1) with this command:
su db2inst1
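Once the DB2 database and table exist, Sqoop moves data in either direction with `sqoop import` and `sqoop export`. The Python sketch below only builds and prints those command lines. It uses standard Sqoop options (`--connect`, `--username`, `-P` to prompt for the password, `--table`, `--target-dir`/`--export-dir`, `-m` for the number of map tasks). It assumes the DB2 JDBC driver is available to Sqoop. The database name SAMPLEDB, the table names, the HDFS paths, and the port 50000 (a common DB2 default) are hypothetical placeholders.

```python
import shlex


def sqoop_import(jdbc_url: str, user: str, table: str, target_dir: str, mappers: int = 1) -> list[str]:
    """Build a `sqoop import` command: relational table -> files in HDFS."""
    return ["sqoop", "import",
            "--connect", jdbc_url,
            "--username", user,
            "-P",                        # prompt for the password instead of passing it on the command line
            "--table", table,
            "--target-dir", target_dir,  # HDFS directory that receives the imported files
            "-m", str(mappers)]          # number of parallel map tasks


def sqoop_export(jdbc_url: str, user: str, table: str, export_dir: str) -> list[str]:
    """Build a `sqoop export` command: files in HDFS -> an existing relational table."""
    return ["sqoop", "export",
            "--connect", jdbc_url,
            "--username", user,
            "-P",
            "--table", table,
            "--export-dir", export_dir]  # HDFS directory whose files are exported


# Hypothetical DB2 database and tables; adjust host, port, and names to your setup.
url = "jdbc:db2://localhost:50000/SAMPLEDB"
print(shlex.join(sqoop_import(url, "db2inst1", "CUSTOMERS", "/user/biadmin/customers")))
print(shlex.join(sqoop_export(url, "db2inst1", "CUSTOMERS_COPY", "/user/biadmin/customers")))
```

With `-m 1` the import runs as a single map task. This avoids needing a split column, at the cost of parallelism.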
Advantages
• Scalable:
Hadoop is a highly scalable platform, because it can store and distribute very large data sets across hundreds of inexpensive servers that operate in parallel.
• Cost effective:
Hadoop also offers a cost-effective storage solution for businesses with exploding data sets.
• Flexible:
Hadoop can be used for a wide variety of purposes, such as data warehousing.
• Fast:
Hadoop’s storage method is based on a distributed file system, and processing tasks are scheduled on the nodes where the data is stored, so large data sets do not have to be moved across the network.
• Resilient to failure:
Data is not lost when a single node fails, because each block is replicated on several nodes (three copies by default).
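The replication argument can be illustrated with a small Python simulation. It assumes the default replication factor of 3 and places replicas round-robin across nodes. This is a simplification: real HDFS uses a rack-aware placement policy and re-replicates blocks after a failure. The sketch also shows the limit of the guarantee. If every node holding a block's replicas fails at once, that block becomes unavailable. Node names are hypothetical.

```python
def place_replicas(num_blocks: int, nodes: list[str], replication: int = 3) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes (round-robin, not HDFS's rack-aware policy)."""
    if not 1 <= replication <= len(nodes):
        raise ValueError("replication must be between 1 and the number of nodes")
    return {b: [nodes[(b + r) % len(nodes)] for r in range(replication)] for b in range(num_blocks)}


def readable_blocks(placement: dict[int, list[str]], failed: set[str]) -> set[int]:
    """A block stays readable while at least one of its replicas is on a live node."""
    return {b for b, replicas in placement.items() if any(n not in failed for n in replicas)}


nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(8, nodes)
print(len(readable_blocks(placement, {"node1"})))                    # one node down
print(len(readable_blocks(placement, {"node1", "node2", "node3"})))  # three of four nodes down
```

With one node down, all 8 blocks stay readable. With three of the four nodes down, blocks 0 and 4 (whose three replicas were all on failed nodes) become unavailable, so the second line prints 6.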
Disadvantages
• Data warehouses used for big data will, at some point, run out of capacity to store all of that data and will require another warehouse.
• Vulnerable by nature:
The framework is written almost entirely in Java, a language whose security record is controversial.
Top Companies in Big Data and Hadoop
Thank You
Big Data and Hadoop
