SlideShare a Scribd company logo
1 of 39
18-01-2018
Introducing Technologies for Handling
BIG Data
JASEELA,TKMCE
18-01-2018
Contents
• Distributed and parallel Computing for Big Data
• Introducing Hadoop
• Cloud Computing and Big Data
• In-Memory Computing Technology for Big Data
18-01-2018
Big Data
• Big Data can’t be handled by traditional data storage and
processing systems.
• For handling such type of data,
Distributed and Parallel Technologies are more suitable.
18-01-2018
Distributed and Parallel Computing for Big Data
• Distributed Computing
Multiple computing resources are connected in a network and
computing tasks are distributed across these resources.
 Increases the Speed
 Increases the Efficiency
 more suitable to process huge amount of data in a limited time
18-01-2018
Distributed Computing
Task
Control
Server
Grid
Node
18-01-2018
Distributed and Parallel Computing for Big Data
Parallel Computing
• Also improves the processing capability of a computer system by
adding additional computational resources to it.
• Divide complex computations into subtasks, handled individually
by processing units, running in parallel.
Concept – processing capability will increase with the increase in
the level of parallelism.
18-01-2018
Parallel Computing
Network Disk
Storage
Internet
Server
Ethernet
Switch
ComputeNodes
18-01-2018
Big Data Processing Techniques
• With the increase in data, forcing organizations to adopt a data
analysis strategy that can be used for analysing the entire data in a
very short time.
Done by Powerful h/w components and new s/w programs.
The procedure followed by the s/w applications are:
1) Break up the given task
2) Surveying the available resources
3) Assigning the subtask to the nodes
18-01-2018
Issues in the system
• Resources develop some technical problems and fail to respond
• virtualization.
Some processing and analytical tasks are delegated to other
resources.
Latency : can be defined as the aggregate delay in the s/m bcoz of
delays in the completion of individual tasks.
System delay
Also affects data management and communication
Affecting the productivity & profitability of an organization.
18-01-2018
Distributed Computing Technique for
Processing Large Data
Big Data
Job
Task 1
Task 2
Task 3
Task 4
Task N
Job divided into
tasks for
distributed
execution
o/p collected and
returned to user
Data User Invokes
18-01-2018
Merits of the System
 Scalability
The system with added scalability, can accommodate the growing
amounts of data more efficiently and flexibly.
 Virtualization and Load Balancing Features
Load Balancing – The sharing of workload across various systems.
Virtualization – creates a virtual environment
h/w platform, storage device and OS
18-01-2018
Parallel Computing Techniques
1) Cluster or Grid Computing
• primarily used in Hadoop.
• based on a connection of multiple servers in a network (clusters)
• servers share the workload among them.
• a cluster can be either homogeneous or heterogeneous.
• overall cost may be very high.
2) Massively Parallel Processing (MPP)
• used in data warehouses.
18-01-2018
Parallel Computing Techniques
• Single machine working as a grid is used in the MPP platform.
• Capable of handling the storage, memory and computing activities.
• Software written specifically for MPP platform is used for
optimization.
• MPP platforms, EMC Greenplum, ParAccel , suited for high-value use
cases.
3) High Performance Computing (HPC)
• Offer high performance and scalability by using IMC.
18-01-2018
Parallel Computing Techniques
• Suitable for processing floating point data at high speeds.
• Used in research and business organization where the result is more
valuable than the cost.
18-01-2018
Difference b/w Distributed and Parallel Systems
Distributed System Parallel System
• Independent autonomous s/m
connected in a n/w for accomplishing
specific task.
• Computer s/m with several
processing units attached to it.
• Coordination is possible b/w
connected computers that have their
own m/y and CPU
• Common shared m/y can be directly
accessed by every processing unit in
a n/w.
• Loose coupling of computers
connected in a n/w, providing access
to data and remotely located
resources.
• Tight coupling of processing
resources that are used for solving a
single, complex problem.
18-01-2018
Introducing Hadoop
18-01-2018
• Hadoop is a distributed system like distributed database.
• Hadoop is a ‘software library’ that allows its users to process large
datasets across distributed clusters of computers, thereby
enabling them to gather, store and analyse huge sets of data.
• It provides various tools and technologies, collectively termed as
the Hadoop Ecosystem.
Computing Model of Distributed Database
18-01-2018
INSERT INSERT UPDATE DELETE COMMIT
ROLLBACK ROLLBACK ROLLBACK ROLLBACK
Computing Model of Hadoop
18-01-2018
Map
Map
….
Map
Map
Map
….
Map
Reduce
Service
Start job1
Start job2
Combine
Combine
Start Job1
Start job2
Store status
& Results
Get Result
Reduce
Hadoop Multinode Cluster Architecture
18-01-2018
• Hadoop cluster consist of single Master Node and a multiple
Worker Nodes.
• Master Node Worker Node
Name Node
Job Tracker
Data Node
Task Tracker
• Hadoop requires JRE 1.6 or a higher version.
• Start up and shut down script requires Secure Shell to be setup b/w
nodes in the cluster.
Hadoop Multinode Cluster Architecture
18-01-2018
HDFS Client
Name Node
(NN)
Secondary (2⁰)
Name Node
Data Node Data Node Data Node Data Node Data Node
Heartbeats, balancing, replication etc..
Nodes write to Local Disk
• To process the data, Job Tracker assigns tasks to the Task Tracker.
• If a data node cluster goes down while processing is going on, then
the NN should know that some data node is down in the cluster,
otherwise it can’t continue processing.
• Each DN sends a “Heart Beat Signal” after every few minutes to make
NN aware of active/inactive status of DNs. – Heartbeat Mechanism.
18-01-2018
HDFS and MapReduce
- HDFS – used for storage.
- MapReduce – used for processing.
Hadoop Distributed File System (HDFS)
• fault tolerant storage s/m.
• Large size files from terabytes to petabytes.
• attains reliability by replicating the data over multiple hosts.
• The default replication value is 3.
18-01-2018
HDFS and MapReduce
• File in HDFS is split into large block size of 64 MB.
• Each block is independently replicated at multiple data nodes.
MapReduce
• Parallel processing frame work.
• Helps developers to write programs to process large volumes of
unstructured data.
- JobTracker
- TaskTracker
- JobHistoryServer
18-01-2018
Cloud Computing and Big Data
Cloud Computing is the delivery of computing services—servers,
storage, databases, networking, software, analytics and more—over
the Internet (“the cloud”).
Companies offering these computing services are called cloud
providers and typically charge for cloud computing services based on
usage, similar to how you are billed for water or electricity at home.
18-01-2018
Cloud Computing and Big Data
18-01-2018
Internet
Cloud
Provider
SaaS
PaaS
IaaS
Laptop
Desktop
Mobiles or
PDAs
Features of Cloud Computing
• Scalability – addition of new resources to an existing infrastructure.
- increase in the amount of data , requires organization to improve
h/w components.
- The new h/w may not provide complete support to the s/w, that
used to run properly on the earlier set of h/w.
- Solution to this problem is using cloud services - that employ the
distributed computing technique to provide scalability.
18-01-2018
• Elasticity – Hiring certain resources, as and when required, and
paying for those resources.
- no extra payment is required for acquiring specific cloud services.
- A cloud does not require customers to declare their resource
requirements in advance.
• Resource Pooling - multiple organizations, which use similar kinds
of resources to carry out computing practices, have no need to
individually hire all the resources.
18-01-2018
• Self Service – cloud computing involves a simple user interface that
helps customers to directly access the cloud services they want.
• Low Cost – cloud offers customized solutions, especially to
organizations that cannot afford too much initial investment.
- cloud provides pay-us-you-use option, in which organizations need
to sign for those resources only that are essential.
• Fault Tolerance – offering uninterrupted services to customers
18-01-2018
Cloud Deployment Models
Depending upon the architecture used in forming the n/w, services
and applications used, and the target consumers, cloud services form
various deployment models. They are,
 Public Cloud
 Private Cloud
 Community Cloud
 Hybrid Cloud
18-01-2018
• Public Cloud (End-User Level Cloud)
- Owned and managed by a company than the one using it.
- Third party administrator.
- Eg : Savvis, Verizon, Amazon Web Services, and Rackspace.
- The workload is categorized on the basis of service category,
h/w customization is possible to provide optimized performance.
- The process of computing becomes very flexible and scalable
through customized h/w resources.
- The primary concern with a public cloud include security and
latency.
18-01-2018
18-01-2018
Public Cloud
Cloud
Services(IaaS/
PaaS/SaaS)
Company X
Company Y
Company Z
Fig : Level of Accessibility in a Public Cloud
• Private Cloud (Enterprise Level Cloud)
- Remains entirely in the ownership of the organization using it.
- Infrastructure is solely designed for a single organization.
- Can automate several processes and operations that require manual
handling in a public cloud.
- Can also provide firewall protection to the cloud, solving latency and
security concerns.
- A private cloud can be either on-premises or hosted externally.
on premises : service is exclusively used and hosted by a single
organization.
hosted externally : used by a single organization and are not shared
with other organizations.
18-01-2018
18-01-2018
Private
Cloud
Cloud
services
Fig: Level of Accessibility in a Private Cloud
• Community Cloud
-Type of cloud that is shared among various organizations with a
common tie.
-Managed by third party cloud services.
-Available on or off premises.
Eg. In any state, the community cloud can provided so that almost all
govt. organizations of that state can share the resources available on
the cloud. Because of the sharing of resources on community cloud, the
data of all citizens of that state can be easily managed by the govt.
organizations.
18-01-2018
Organization having common
tie to share resources
18-01-2018
Community
Cloud for
Level A
Community
Cloud for
Level A
Cloud Services Cloud Services
Fig : Level of Accessibility in Community Clouds
• Hybrid Cloud
-various internal or external service providers offer services to many
organizations.
-In hybrid clouds, an organization can use both types of cloud, i.e.
public and private together – situations such as cloud bursting.
• Organization uses its own computing infrastructure, high load
requirement, access clouds.
The organization using the hybrid cloud can manage an internal private
cloud for general use and migrate the entire or part of an application to
the public cloud during the peak periods.
18-01-2018
18-01-2018
Public
Cloud
Private
Cloud
Migrated
Application
Cloud Services
Organization X
Organization Y
Fig : Implementation of a Hybrid Cloud
Thank You
  
18-01-2018

More Related Content

What's hot

4.2 spatial data mining
4.2 spatial data mining4.2 spatial data mining
4.2 spatial data miningKrish_ver2
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notesMohit Saini
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 
Apache Hadoop
Apache HadoopApache Hadoop
Apache HadoopAjit Koti
 
Map reduce in BIG DATA
Map reduce in BIG DATAMap reduce in BIG DATA
Map reduce in BIG DATAGauravBiswas9
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...
Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...
Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...sumithragunasekaran
 
Data Streaming in Big Data Analysis
Data Streaming in Big Data AnalysisData Streaming in Big Data Analysis
Data Streaming in Big Data AnalysisVincenzo Gulisano
 
VTU 6th Sem Elective CSE - Module 4 cloud computing
VTU 6th Sem Elective CSE - Module 4  cloud computingVTU 6th Sem Elective CSE - Module 4  cloud computing
VTU 6th Sem Elective CSE - Module 4 cloud computingSachin Gowda
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data miningSlideshare
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and UsesSuvradeep Rudra
 
Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)Harish Chand
 
Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Miningidnats
 

What's hot (20)

4.2 spatial data mining
4.2 spatial data mining4.2 spatial data mining
4.2 spatial data mining
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notes
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Data Mining: Data Preprocessing
Data Mining: Data PreprocessingData Mining: Data Preprocessing
Data Mining: Data Preprocessing
 
Ppt
PptPpt
Ppt
 
Apache Hadoop
Apache HadoopApache Hadoop
Apache Hadoop
 
Map reduce in BIG DATA
Map reduce in BIG DATAMap reduce in BIG DATA
Map reduce in BIG DATA
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...
Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...
Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...
 
Data Streaming in Big Data Analysis
Data Streaming in Big Data AnalysisData Streaming in Big Data Analysis
Data Streaming in Big Data Analysis
 
VTU 6th Sem Elective CSE - Module 4 cloud computing
VTU 6th Sem Elective CSE - Module 4  cloud computingVTU 6th Sem Elective CSE - Module 4  cloud computing
VTU 6th Sem Elective CSE - Module 4 cloud computing
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data mining
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and Uses
 
Decision tree
Decision treeDecision tree
Decision tree
 
Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
 
Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Mining
 
Clustering in Data Mining
Clustering in Data MiningClustering in Data Mining
Clustering in Data Mining
 
04 olap
04 olap04 olap
04 olap
 

Similar to Introducing Technologies for Handling Big Data by Jaseela

Lecture 3.31 3.32.pptx
Lecture 3.31  3.32.pptxLecture 3.31  3.32.pptx
Lecture 3.31 3.32.pptxRATISHKUMAR32
 
Module-2_HADOOP.pptx
Module-2_HADOOP.pptxModule-2_HADOOP.pptx
Module-2_HADOOP.pptxShreyasKv13
 
BIg Data Analytics-Module-2 vtu engineering.pptx
BIg Data Analytics-Module-2 vtu engineering.pptxBIg Data Analytics-Module-2 vtu engineering.pptx
BIg Data Analytics-Module-2 vtu engineering.pptxVishalBH1
 
Traditional data word
Traditional data wordTraditional data word
Traditional data wordorcoxsm
 
Internet of Things and Hadoop
Internet of Things and HadoopInternet of Things and Hadoop
Internet of Things and Hadoopaziksa
 
MOD-2 presentation on engineering students
MOD-2 presentation on engineering studentsMOD-2 presentation on engineering students
MOD-2 presentation on engineering studentsrishavkumar1402
 
Cloud Storage and Cloud Computing.pptx
Cloud Storage and  Cloud Computing.pptxCloud Storage and  Cloud Computing.pptx
Cloud Storage and Cloud Computing.pptxANALEESUAREZ2
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAlluxio, Inc.
 
The Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine LearningThe Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine LearningModusOptimum
 
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)Denodo
 
Is cloud computing really ready for prime time
Is cloud computing really ready for prime timeIs cloud computing really ready for prime time
Is cloud computing really ready for prime timeVaishnavi
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2Aswini Ashu
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2aswini pilli
 
Cloud Computing for Small & Medium Businesses
Cloud Computing for Small & Medium BusinessesCloud Computing for Small & Medium Businesses
Cloud Computing for Small & Medium BusinessesAl Sabawi
 
Big Data_Architecture.pptx
Big Data_Architecture.pptxBig Data_Architecture.pptx
Big Data_Architecture.pptxbetalab
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-HadoopNagarjuna D.N
 
Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Alluxio, Inc.
 
Designing your SaaS Database for Scale with Postgres
Designing your SaaS Database for Scale with PostgresDesigning your SaaS Database for Scale with Postgres
Designing your SaaS Database for Scale with PostgresOzgun Erdogan
 

Similar to Introducing Technologies for Handling Big Data by Jaseela (20)

Lecture 3.31 3.32.pptx
Lecture 3.31  3.32.pptxLecture 3.31  3.32.pptx
Lecture 3.31 3.32.pptx
 
Module-2_HADOOP.pptx
Module-2_HADOOP.pptxModule-2_HADOOP.pptx
Module-2_HADOOP.pptx
 
BIg Data Analytics-Module-2 vtu engineering.pptx
BIg Data Analytics-Module-2 vtu engineering.pptxBIg Data Analytics-Module-2 vtu engineering.pptx
BIg Data Analytics-Module-2 vtu engineering.pptx
 
Big Data
Big DataBig Data
Big Data
 
Traditional data word
Traditional data wordTraditional data word
Traditional data word
 
Internet of Things and Hadoop
Internet of Things and HadoopInternet of Things and Hadoop
Internet of Things and Hadoop
 
MOD-2 presentation on engineering students
MOD-2 presentation on engineering studentsMOD-2 presentation on engineering students
MOD-2 presentation on engineering students
 
Cloud Storage and Cloud Computing.pptx
Cloud Storage and  Cloud Computing.pptxCloud Storage and  Cloud Computing.pptx
Cloud Storage and Cloud Computing.pptx
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud Era
 
The Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine LearningThe Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine Learning
 
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
 
Is cloud computing really ready for prime time
Is cloud computing really ready for prime timeIs cloud computing really ready for prime time
Is cloud computing really ready for prime time
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2
 
Cloud Computing for Small & Medium Businesses
Cloud Computing for Small & Medium BusinessesCloud Computing for Small & Medium Businesses
Cloud Computing for Small & Medium Businesses
 
Big Data_Architecture.pptx
Big Data_Architecture.pptxBig Data_Architecture.pptx
Big Data_Architecture.pptx
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-Hadoop
 
Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS
 
Designing your SaaS Database for Scale with Postgres
Designing your SaaS Database for Scale with PostgresDesigning your SaaS Database for Scale with Postgres
Designing your SaaS Database for Scale with Postgres
 

More from Student

Python for Machine Learning
Python for Machine LearningPython for Machine Learning
Python for Machine LearningStudent
 
Python for Machine Learning
Python for Machine LearningPython for Machine Learning
Python for Machine LearningStudent
 
Decision Trees
Decision TreesDecision Trees
Decision TreesStudent
 
A review of localization systems for robotic endoscopic
A review of localization systems for robotic endoscopicA review of localization systems for robotic endoscopic
A review of localization systems for robotic endoscopicStudent
 
Signature Free Virus Blocking Method to Detect Software Code Security (Intern...
Signature Free Virus Blocking Method to Detect Software Code Security (Intern...Signature Free Virus Blocking Method to Detect Software Code Security (Intern...
Signature Free Virus Blocking Method to Detect Software Code Security (Intern...Student
 
Sigfree ppt (International Journal of Computer Science and Mobile Computing)
Sigfree ppt (International Journal of Computer Science and Mobile Computing)Sigfree ppt (International Journal of Computer Science and Mobile Computing)
Sigfree ppt (International Journal of Computer Science and Mobile Computing)Student
 
Design and Implementation of Improved Authentication System for Android Smart...
Design and Implementation of Improved Authentication System for Android Smart...Design and Implementation of Improved Authentication System for Android Smart...
Design and Implementation of Improved Authentication System for Android Smart...Student
 
Desgn&imp authentctn.ppt by Jaseela
Desgn&imp authentctn.ppt by JaseelaDesgn&imp authentctn.ppt by Jaseela
Desgn&imp authentctn.ppt by JaseelaStudent
 
Google glass by Jaseela
Google glass by JaseelaGoogle glass by Jaseela
Google glass by JaseelaStudent
 

More from Student (9)

Python for Machine Learning
Python for Machine LearningPython for Machine Learning
Python for Machine Learning
 
Python for Machine Learning
Python for Machine LearningPython for Machine Learning
Python for Machine Learning
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 
A review of localization systems for robotic endoscopic
A review of localization systems for robotic endoscopicA review of localization systems for robotic endoscopic
A review of localization systems for robotic endoscopic
 
Signature Free Virus Blocking Method to Detect Software Code Security (Intern...
Signature Free Virus Blocking Method to Detect Software Code Security (Intern...Signature Free Virus Blocking Method to Detect Software Code Security (Intern...
Signature Free Virus Blocking Method to Detect Software Code Security (Intern...
 
Sigfree ppt (International Journal of Computer Science and Mobile Computing)
Sigfree ppt (International Journal of Computer Science and Mobile Computing)Sigfree ppt (International Journal of Computer Science and Mobile Computing)
Sigfree ppt (International Journal of Computer Science and Mobile Computing)
 
Design and Implementation of Improved Authentication System for Android Smart...
Design and Implementation of Improved Authentication System for Android Smart...Design and Implementation of Improved Authentication System for Android Smart...
Design and Implementation of Improved Authentication System for Android Smart...
 
Desgn&imp authentctn.ppt by Jaseela
Desgn&imp authentctn.ppt by JaseelaDesgn&imp authentctn.ppt by Jaseela
Desgn&imp authentctn.ppt by Jaseela
 
Google glass by Jaseela
Google glass by JaseelaGoogle glass by Jaseela
Google glass by Jaseela
 

Recently uploaded

VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 

Recently uploaded (20)

VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 

Introducing Technologies for Handling Big Data by Jaseela

  • 2. Introducing Technologies for Handling BIG Data JASEELA,TKMCE 18-01-2018
  • 3. Contents • Distributed and parallel Computing for Big Data • Introducing Hadoop • Cloud Computing and Big Data • In-Memory Computing Technology for Big Data 18-01-2018
  • 4. Big Data • Big Data can’t be handled by traditional data storage and processing systems. • For handling such type of data, Distributed and Parallel Technologies are more suitable. 18-01-2018
  • 5. Distributed and Parallel Computing for Big Data • Distributed Computing Multiple computing resources are connected in a network and computing tasks are distributed across these resources.  Increases the Speed  Increases the Efficiency  more suitable to process huge amount of data in a limited time 18-01-2018
  • 7. Distributed and Parallel Computing for Big Data Parallel Computing • Also improves the processing capability of a computer system by adding additional computational resources to it. • Divide complex computations into subtasks, handled individually by processing units, running in parallel. Concept – processing capability will increase with the increase in the level of parallelism. 18-01-2018
  • 9. Big Data Processing Techniques • With the increase in data, forcing organizations to adopt a data analysis strategy that can be used for analysing the entire data in a very short time. Done by Powerful h/w components and new s/w programs. The procedure followed by the s/w applications are: 1) Break up the given task 2) Surveying the available resources 3) Assigning the subtask to the nodes 18-01-2018
  • 10. Issues in the system • Resources develop some technical problems and fail to respond • virtualization. Some processing and analytical tasks are delegated to other resources. Latency : can be defined as the aggregate delay in the s/m bcoz of delays in the completion of individual tasks. System delay Also affects data management and communication Affecting the productivity & profitability of an organization. 18-01-2018
  • 11. Distributed Computing Technique for Processing Large Data Big Data Job Task 1 Task 2 Task 3 Task 4 Task N Job divided into tasks for distributed execution o/p collected and returned to user Data User Invokes 18-01-2018
  • 12. Merits of the System  Scalability The system with added scalability, can accommodate the growing amounts of data more efficiently and flexibly.  Virtualization and Load Balancing Features Load Balancing – The sharing of workload across various systems. Virtualization – creates a virtual environment h/w platform, storage device and OS 18-01-2018
  • 13. Parallel Computing Techniques 1) Cluster or Grid Computing • primarily used in Hadoop. • based on a connection of multiple servers in a network (clusters) • servers share the workload among them. • a cluster can be either homogeneous or heterogeneous. • overall cost may be very high. 2) Massively Parallel Processing (MPP) • used in data warehouses. 18-01-2018
  • 14. Parallel Computing Techniques • Single machine working as a grid is used in the MPP platform. • Capable of handling the storage, memory and computing activities. • Software written specifically for MPP platform is used for optimization. • MPP platforms, EMC Greenplum, ParAccel , suited for high-value use cases. 3) High Performance Computing (HPC) • Offer high performance and scalability by using IMC. 18-01-2018
  • 15. Parallel Computing Techniques • Suitable for processing floating point data at high speeds. • Used in research and business organization where the result is more valuable than the cost. 18-01-2018
  • 16. Difference b/w Distributed and Parallel Systems Distributed System Parallel System • Independent autonomous s/m connected in a n/w for accomplishing specific task. • Computer s/m with several processing units attached to it. • Coordination is possible b/w connected computers that have their own m/y and CPU • Common shared m/y can be directly accessed by every processing unit in a n/w. • Loose coupling of computers connected in a n/w, providing access to data and remotely located resources. • Tight coupling of processing resources that are used for solving a single, complex problem. 18-01-2018
  • 17. Introducing Hadoop 18-01-2018 • Hadoop is a distributed system like distributed database. • Hadoop is a ‘software library’ that allows its users to process large datasets across distributed clusters of computers, thereby enabling them to gather, store and analyse huge sets of data. • It provides various tools and technologies, collectively termed as the Hadoop Ecosystem.
  • 18. Computing Model of Distributed Database 18-01-2018 INSERT INSERT UPDATE DELETE COMMIT ROLLBACK ROLLBACK ROLLBACK ROLLBACK
  • 19. Computing Model of Hadoop 18-01-2018 Map Map …. Map Map Map …. Map Reduce Service Start job1 Start job2 Combine Combine Start Job1 Start job2 Store status & Results Get Result Reduce
  • 20. Hadoop Multinode Cluster Architecture 18-01-2018 • Hadoop cluster consist of single Master Node and a multiple Worker Nodes. • Master Node Worker Node Name Node Job Tracker Data Node Task Tracker • Hadoop requires JRE 1.6 or a higher version. • Start up and shut down script requires Secure Shell to be setup b/w nodes in the cluster.
  • 21. Hadoop Multinode Cluster Architecture 18-01-2018 HDFS Client Name Node (NN) Secondary (2⁰) Name Node Data Node Data Node Data Node Data Node Data Node Heartbeats, balancing, replication etc.. Nodes write to Local Disk
  • 22. • To process the data, Job Tracker assigns tasks to the Task Tracker. • If a data node cluster goes down while processing is going on, then the NN should know that some data node is down in the cluster, otherwise it can’t continue processing. • Each DN sends a “Heart Beat Signal” after every few minutes to make NN aware of active/inactive status of DNs. – Heartbeat Mechanism. 18-01-2018
  • 23. HDFS and MapReduce - HDFS – used for storage. - MapReduce – used for processing. Hadoop Distributed File System (HDFS) • fault tolerant storage s/m. • Large size files from terabytes to petabytes. • attains reliability by replicating the data over multiple hosts. • The default replication value is 3. 18-01-2018
  • 24. HDFS and MapReduce • File in HDFS is split into large block size of 64 MB. • Each block is independently replicated at multiple data nodes. MapReduce • Parallel processing frame work. • Helps developers to write programs to process large volumes of unstructured data. - JobTracker - TaskTracker - JobHistoryServer 18-01-2018
  • 25. Cloud Computing and Big Data Cloud Computing is the delivery of computing services—servers, storage, databases, networking, software, analytics and more—over the Internet (“the cloud”). Companies offering these computing services are called cloud providers and typically charge for cloud computing services based on usage, similar to how you are billed for water or electricity at home. 18-01-2018
  • 26. Cloud Computing and Big Data 18-01-2018 Internet Cloud Provider SaaS PaaS IaaS Laptop Desktop Mobiles or PDAs
  • 27. Features of Cloud Computing • Scalability – addition of new resources to an existing infrastructure. - increase in the amount of data , requires organization to improve h/w components. - The new h/w may not provide complete support to the s/w, that used to run properly on the earlier set of h/w. - Solution to this problem is using cloud services - that employ the distributed computing technique to provide scalability. 18-01-2018
  • 28. • Elasticity – Hiring certain resources, as and when required, and paying for those resources. - no extra payment is required for acquiring specific cloud services. - A cloud does not require customers to declare their resource requirements in advance. • Resource Pooling - multiple organizations, which use similar kinds of resources to carry out computing practices, have no need to individually hire all the resources. 18-01-2018
  • 29. • Self Service – cloud computing involves a simple user interface that helps customers to directly access the cloud services they want. • Low Cost – cloud offers customized solutions, especially to organizations that cannot afford too much initial investment. - cloud provides pay-us-you-use option, in which organizations need to sign for those resources only that are essential. • Fault Tolerance – offering uninterrupted services to customers 18-01-2018
  • 30. Cloud Deployment Models Depending upon the architecture used in forming the n/w, services and applications used, and the target consumers, cloud services form various deployment models. They are,  Public Cloud  Private Cloud  Community Cloud  Hybrid Cloud 18-01-2018
  • 31. • Public Cloud (End-User Level Cloud) - Owned and managed by a company than the one using it. - Third party administrator. - Eg : Savvis, Verizon, Amazon Web Services, and Rackspace. - The workload is categorized on the basis of service category, h/w customization is possible to provide optimized performance. - The process of computing becomes very flexible and scalable through customized h/w resources. - The primary concern with a public cloud include security and latency. 18-01-2018
  • 32. 18-01-2018 Public Cloud Cloud Services(IaaS/ PaaS/SaaS) Company X Company Y Company Z Fig : Level of Accessibility in a Public Cloud
  • 33. • Private Cloud (Enterprise Level Cloud) - Remains entirely in the ownership of the organization using it. - Infrastructure is solely designed for a single organization. - Can automate several processes and operations that require manual handling in a public cloud. - Can also provide firewall protection to the cloud, solving latency and security concerns. - A private cloud can be either on-premises or hosted externally. on premises : service is exclusively used and hosted by a single organization. hosted externally : used by a single organization and are not shared with other organizations. 18-01-2018
  • 34. 18-01-2018 Private Cloud Cloud services Fig: Level of Accessibility in a Private Cloud
  • 35. • Community Cloud -Type of cloud that is shared among various organizations with a common tie. -Managed by third party cloud services. -Available on or off premises. Eg. In any state, the community cloud can provided so that almost all govt. organizations of that state can share the resources available on the cloud. Because of the sharing of resources on community cloud, the data of all citizens of that state can be easily managed by the govt. organizations. 18-01-2018
  • 36. Organization having common tie to share resources 18-01-2018 Community Cloud for Level A Community Cloud for Level A Cloud Services Cloud Services Fig : Level of Accessibility in Community Clouds
  • 37. • Hybrid Cloud -various internal or external service providers offer services to many organizations. -In hybrid clouds, an organization can use both types of cloud, i.e. public and private together – situations such as cloud bursting. • Organization uses its own computing infrastructure, high load requirement, access clouds. The organization using the hybrid cloud can manage an internal private cloud for general use and migrate the entire or part of an application to the public cloud during the peak periods. 18-01-2018
  • 39. Thank You    18-01-2018

Editor's Notes

  1. -most effective & popular innovations have been in the fields of ds & pl cmptng. -Hadoop is an open source platform, used by data driven organizations to xtract max o/p from their normal data usage. -cloud cmptng – helps businesses to save cost and better manage their resources by allowing them to use resources as a service on the basis of spcfc reqrmnts and paying for only those services that have actually been used. -IMC helps u to organize and complete ur task faster by carrying out all the computational activities from the main memory.
  2. -We call such s/ms as parallel s/ms, in which multiple parallel computing resources are involved to carryout calculations simultaneously.
  3. Organizations use both ds & pl computing techniques to process Big Data. The most important constraint for business today is “Time”. 1) Breaking up the given task into smaller components. 2) Surveying the available resources at hand. 3) Assigning the subtasks to the nodes that are interconnected in a n/w.
  4. Latency can be defined as the aggregate delay in the s/m bcoz of delays in the completion of individual tasks. Such a delay automatically leads to the slowdown in s/m performance as a whole and is often termed as s/m delay.
  5. The figure elaborates the processing of a large dataset in a distributed computing environment. -The nodes are arranged within a system along with the elements that form the core of computing resources. (cpu, m/y, disk, etc). -These nodes are more beneficial for adding scalability to the Big Data environment.
  6. Load Balancing – The sharing of workload across various systems throughout the network to manage the load. Virtualization Feature - creates a virtual environment in which h/w platform, storage device and OS are included.
  7. MPP- Typically, the setup for MPP is more complicated. how to partition a common database among processors and how to assign work among the processors. An MPP system is also known as a "loosely coupled" or "shared nothing" system. -EMC Grennplum database is shared nothing MPP architecture, that has been designed for business intelligence and analytical processing. -Each server node act as self contained DBMS that owns and manages a distinct portion of the overall data. -The s/m automatically distributes data and parallelizes query workloads across all available h/w. -ParAccel –California based s/w company, it provided a dbms designed for advanced analytics for business intelligence.
  8. IMC- is the storage of info in the main RAM of dedicated servers rather than in complicated RDBs. - The drop in m/y prices in the present market is a major factor contributing to the IMC technology. -This has made IMC economical among a wide variety of aplctns.
  9. one of the open source platform designed to process Big Data. Earlier, ds environments were used to process high volume of data, multiple nodes in such an environment may not always co-operate with each other through a communication system, leaving a lot of scope for errors. Hadoop platform provides an improved programming model, which is used to create and run ds quickly and efficiently.
  10. Generates notations of a job divided into tasks. Implement MapReduce computing model. Considers every task as a map or reduce.
  11. -In larger cluster, HDFS is managed through a NameNode server to host the file system index. -2⁰ NN that keeps snap shot of the NNs and at the time of failure of NN the 2⁰NN replaces the 1⁰NN, thus preventing file system from getting corrupt and reducing data loss. 2⁰ NN takes snap shot of 1⁰ NN directory information after a regular interval of time, which is saved in local or remote directories. NN is the single point of storage and management of metadata.
  12. HDFS – is a cluster of storage solutions that is highly reliable, more efficient, and economical and provides facilities to manage files containing related data across machines. Hadoop MapReduce – is a computational framework used in hadoop to perform all the mathematical computations. It is based on a parallel and distribute implementation of MapReduce algthm that provide high performance.
  13. -The name node actively monitors the no : of replicas of a block. When a replica of a block is lost due to a DN failure or disk failure, the NN creates another replica of the block. Map reduce is a frame work that helps developers to write pgms to process large volumes of unstructured data parallel over distributed architecture. JobTracker – Master node that manages all jobs and resources in a cluster of commodity computers. TaskTracker – Agent deployed at each machine in the cluster to run the map and reduce task at the terminal. JobHistoryServer – Component that tracks completed jobs.
  14. Scalability – The increase in the amount of data being collected and analysed requires organizations to improve their h/w components processing ability. need to replace the existing h/w to improve data mngmnt and processing activities.
  15. -The resources are owned or hosted by the cloud service providers, and the services are sold to other companies.
  16. On premises - This format, also known as an “internal cloud,” is hosted within an organization’s own data center. It provides a more standardized process and protection, but is often limited in size and scalability. These are best used for applications that require complete control and configurability of the infrastructure and security. Hosted externally - This private cloud model is hosted by an external cloud computing provider. This format is recommended for organizations that prefer not to use a public cloud infrastructure due to the risks associated with the sharing of physical resources.