SlideShare a Scribd company logo
Big data
PREPARE BY:
AHMED ALTAYEB SHEIKH EDREES
FAWAZ AWAD YAHIA ABDELGADIR
Outlines
1. INTRODUCTION
2. WHAT IS BIG DATA
3. BIG DATA GENERATORS
4. CHARACTERISTIC OF BIG DATA
5. BENEFIT OF BIG DATA
6. HADOOP
 HDFS
 Map Reduce
7. BI VS BIG DATA
Introduction
What is Big data
Is very large data sets that may be analyzed computationally to reveal patterns,
trends, and associations, especially relating to Customers behavior and
interactions.
Big Data in general is defined as high volume, velocity and variety information
assets that demand cost-effective, innovative forms of information processing
for enhanced insight and decision making.”
A technology term about Data that becomes too large to be managed in a
manner that is previously known to work normally.
Big Data generators
This data comes from everywhere:
sensors used to gather climate information,
posts to social media sites,
digital pictures
online Shopping
Airlines
purchase transaction records, and many more…
This data is “ big data.”
Characteristic
“Big data is the data characterized by 3 attributes: volume, velocity and variety .”
Volume
It is the size of the data which determines the value and potential of the data under
consideration. The name ‘Big Data’ itself contains a term which is related to size and
hence the characteristic.
Variety
Data today comes in all types of formats. Structured, numeric data in traditional
databases. Unstructured text documents, email, stock ticker data and financial
transactions and semi-structured data too.
Velocity
speed of generation of data or how fast the data is generated and processed to meet the
demands and the challenges which lie ahead in the path of growth and development.
FB generates 100TB daily
Twitter generates 8TB of data Daily
Benefit of Big data
Cost Reduction from Big Data Technologies
Time Reduction from Big Data
Developing New Big Data-Based Offerings
Supporting Internal Business Decisions
Real-time big data isn’t just a process for storing petabytes or Exabyte's of data in
a data warehouse, It’s about the ability to make better decisions and take
meaningful actions at the right time.
What is Hadoop
Flexible and available architecture for large scale computation and data
performance on a network of commodity hardware
Framework that allows for distributed processing of large data sets across clusters
of commodity servers
– Store large amount of data
– Process the large amount of data stored
Why Hadoop ?
open source,
highly reliable,
distributed data processing platform
Handles large amounts of data
Stores data in native format
Delivers linear scalability at low cost
Resilient in case of infrastructure failures
Transparent application scalability
HDFS Hadoop Distributed File System
HDFS enables Hadoop to store huge files. It’s a scalable file system
that distributes and stores data across all machines in a Hadoop cluster.
Scale-Out Architecture - Add servers to increase capacity
High Availability - Serve mission-critical workflows and applications
Fault Tolerance - Automatically and seamlessly recover from failures
Load Balancing - Place data intelligently for maximum efficiency and
utilization
Tunable Replication - Multiple copies of each file provide data protection and
computational performance
Namenode and datanode
DataNode- There is a piece of software running on each of these nodes of the cluster called Datanode
which
runs on slave nodes which make up the majority of the machines of a cluster. The name node
places the data into these data nodes.
NameNode- It Runs on a master node that tracks and directs the storage of the cluster.
Also we know that the nodes or blocks which make up the original 150 MB file and
that is handled by a separate machine is the Namenode. Information stored here is
called as metadata.
MapReduce
MapReduce is a programming model for processing large data sets with a parallel,
distributed algorithm on a cluster
Scale-out Architecture - Add servers to increase processing power
Security & Authentication - Works with HDFS security to make sure that only
approved users can operate against the data in the system
Resource Manager - Employs data locality and server resources to determine optimal
computing operations
Optimized Scheduling - Completes jobs according to prioritization
Flexibility - Procedures can be written in virtually any programming language
Resiliency & High Availability - Multiple job and task trackers ensure that jobs fail
independently and restart automatically
BI VS Big Data
 big data and hadoop
 big data and hadoop

More Related Content

What's hot

Hadoop Training Tutorial for Freshers
Hadoop Training Tutorial for FreshersHadoop Training Tutorial for Freshers
Hadoop Training Tutorial for Freshers
rajkamaltibacademy
 
Big data introduction
Big data introductionBig data introduction
Big data introduction
Chirag Ahuja
 
Big data presentation
Big data presentationBig data presentation
Big data presentation
SreeSowmya7
 
Big Data
Big DataBig Data
Big Data
Neha Mehta
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
Tyrone Systems
 
Great Expectations Presentation
Great Expectations PresentationGreat Expectations Presentation
Great Expectations Presentation
Adam Doyle
 
Big Data Hadoop Training by Easylearning Guru
Big Data Hadoop Training by Easylearning GuruBig Data Hadoop Training by Easylearning Guru
Big Data Hadoop Training by Easylearning Guru
KCC Software Ltd. & Easylearning.guru
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
Lucian Neghina
 
Bigdata and Hadoop Bootcamp
Bigdata and Hadoop BootcampBigdata and Hadoop Bootcamp
Bigdata and Hadoop Bootcamp
Spotle.ai
 
Big data Ppt
Big data PptBig data Ppt
Big data Ppt
Prashant Navatre
 
Chapter 1 big data
Chapter 1 big dataChapter 1 big data
Chapter 1 big data
Prof .Pragati Khade
 
Big data management
Big data managementBig data management
Big data management
zeba khanam
 
Core concepts and Key technologies - Big Data Analytics
Core concepts and Key technologies - Big Data AnalyticsCore concepts and Key technologies - Big Data Analytics
Core concepts and Key technologies - Big Data AnalyticsKaniska Mandal
 
Introduction to Big Data Hadoop Training Online by www.itjobzone.biz
Introduction to Big Data Hadoop Training Online by www.itjobzone.bizIntroduction to Big Data Hadoop Training Online by www.itjobzone.biz
Introduction to Big Data Hadoop Training Online by www.itjobzone.biz
ITJobZone.biz
 
Big data processing using hadoop poster presentation
Big data processing using hadoop poster presentationBig data processing using hadoop poster presentation
Big data processing using hadoop poster presentationAmrut Patil
 
Big data
Big dataBig data
Big data analytics with Apache Hadoop
Big data analytics with Apache  HadoopBig data analytics with Apache  Hadoop
Big data analytics with Apache Hadoop
Suman Saurabh
 

What's hot (20)

Hadoop Training Tutorial for Freshers
Hadoop Training Tutorial for FreshersHadoop Training Tutorial for Freshers
Hadoop Training Tutorial for Freshers
 
Big data introduction
Big data introductionBig data introduction
Big data introduction
 
Big data presentation
Big data presentationBig data presentation
Big data presentation
 
Big data
Big dataBig data
Big data
 
Big Data
Big DataBig Data
Big Data
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Great Expectations Presentation
Great Expectations PresentationGreat Expectations Presentation
Great Expectations Presentation
 
Big Data Hadoop Training by Easylearning Guru
Big Data Hadoop Training by Easylearning GuruBig Data Hadoop Training by Easylearning Guru
Big Data Hadoop Training by Easylearning Guru
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
 
Bigdata and Hadoop Bootcamp
Bigdata and Hadoop BootcampBigdata and Hadoop Bootcamp
Bigdata and Hadoop Bootcamp
 
Big data Ppt
Big data PptBig data Ppt
Big data Ppt
 
Chapter 1 big data
Chapter 1 big dataChapter 1 big data
Chapter 1 big data
 
Big data management
Big data managementBig data management
Big data management
 
Big data hadoop
Big data hadoopBig data hadoop
Big data hadoop
 
Core concepts and Key technologies - Big Data Analytics
Core concepts and Key technologies - Big Data AnalyticsCore concepts and Key technologies - Big Data Analytics
Core concepts and Key technologies - Big Data Analytics
 
Introduction to Big Data Hadoop Training Online by www.itjobzone.biz
Introduction to Big Data Hadoop Training Online by www.itjobzone.bizIntroduction to Big Data Hadoop Training Online by www.itjobzone.biz
Introduction to Big Data Hadoop Training Online by www.itjobzone.biz
 
Big data processing using hadoop poster presentation
Big data processing using hadoop poster presentationBig data processing using hadoop poster presentation
Big data processing using hadoop poster presentation
 
Big data
Big dataBig data
Big data
 
Big Data ppt
Big Data pptBig Data ppt
Big Data ppt
 
Big data analytics with Apache Hadoop
Big data analytics with Apache  HadoopBig data analytics with Apache  Hadoop
Big data analytics with Apache Hadoop
 

Viewers also liked

Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Apex
 
Beyond Big Data: Harnessing the Industrial Internet for Wind Power
Beyond Big Data: Harnessing the Industrial Internet for Wind PowerBeyond Big Data: Harnessing the Industrial Internet for Wind Power
Beyond Big Data: Harnessing the Industrial Internet for Wind PowerGE_India
 
5 Big Data Use Cases for 2013
5 Big Data Use Cases for 20135 Big Data Use Cases for 2013
5 Big Data Use Cases for 2013
Infochimps, a CSC Big Data Business
 
Big Data Analytics for the Industrial Internet of Things
Big Data Analytics for the Industrial Internet of ThingsBig Data Analytics for the Industrial Internet of Things
Big Data Analytics for the Industrial Internet of Things
Anthony Chen
 
Big Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBig Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must Know
Bernard Marr
 

Viewers also liked (6)

Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
 
Beyond Big Data: Harnessing the Industrial Internet for Wind Power
Beyond Big Data: Harnessing the Industrial Internet for Wind PowerBeyond Big Data: Harnessing the Industrial Internet for Wind Power
Beyond Big Data: Harnessing the Industrial Internet for Wind Power
 
GE Predix - The IIoT Platform
GE Predix - The IIoT PlatformGE Predix - The IIoT Platform
GE Predix - The IIoT Platform
 
5 Big Data Use Cases for 2013
5 Big Data Use Cases for 20135 Big Data Use Cases for 2013
5 Big Data Use Cases for 2013
 
Big Data Analytics for the Industrial Internet of Things
Big Data Analytics for the Industrial Internet of ThingsBig Data Analytics for the Industrial Internet of Things
Big Data Analytics for the Industrial Internet of Things
 
Big Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBig Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must Know
 

Similar to big data and hadoop

Big data peresintaion
Big data peresintaion Big data peresintaion
Big data peresintaion
ahmed alshikh
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
almaraniabwmalk
 
Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeHadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | Sysfore
Sysfore Technologies
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
Subhas Kumar Ghosh
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation
Shivanee garg
 
Big data
Big dataBig data
Big data
revathireddyb
 
Big data
Big dataBig data
Big data
revathireddyb
 
Big data Presentation
Big data PresentationBig data Presentation
Big data Presentation
himanshu arora
 
Big Data
Big DataBig Data
Big Data
Kirubaburi R
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-System
Md. Hasan Basri (Angel)
 
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
IT Strategy Group
 
A Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data ScienceA Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data Science
ijtsrd
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
Sri Kanth
 
Big data and Hadoop overview
Big data and Hadoop overviewBig data and Hadoop overview
Big data and Hadoop overview
Nitesh Ghosh
 
Bigdata overview
Bigdata overviewBigdata overview
Bigdata overview
AllsoftSolutions
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
MaulikLakhani
 
Big Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and StoringBig Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and Storing
IRJET Journal
 

Similar to big data and hadoop (20)

Big data peresintaion
Big data peresintaion Big data peresintaion
Big data peresintaion
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
 
Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeHadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | Sysfore
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
Big data Presentation
Big data PresentationBig data Presentation
Big data Presentation
 
Big Data
Big DataBig Data
Big Data
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-System
 
paper
paperpaper
paper
 
Hadoop
HadoopHadoop
Hadoop
 
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
 
IJARCCE_49
IJARCCE_49IJARCCE_49
IJARCCE_49
 
A Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data ScienceA Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data Science
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Big data and Hadoop overview
Big data and Hadoop overviewBig data and Hadoop overview
Big data and Hadoop overview
 
Bigdata overview
Bigdata overviewBigdata overview
Bigdata overview
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Big Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and StoringBig Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and Storing
 

Recently uploaded

"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 

Recently uploaded (20)

"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 

big data and hadoop

  • 1. Big data PREPARE BY: AHMED ALTAYEB SHEIKH EDREES FAWAZ AWAD YAHIA ABDELGADIR
  • 2. Outlines 1. INTRODUCTION 2. WHAT IS BIG DATA 3. BIG DATA GENERATORS 4. CHARACTERISTIC OF BIG DATA 5. BENEFIT OF BIG DATA 6. HADOOP  HDFS  Map Reduce 7. BI VS BIG DATA
  • 4.
  • 5. What is Big data Is very large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to Customers behavior and interactions. Big Data in general is defined as high volume, velocity and variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.” A technology term about Data that becomes too large to be managed in a manner that is previously known to work normally.
  • 6. Big Data generators This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures online Shopping Airlines purchase transaction records, and many more… This data is “ big data.”
  • 7. Characteristic “Big data is the data characterized by 3 attributes: volume, velocity and variety .”
  • 8. Volume It is the size of the data which determines the value and potential of the data under consideration. The name ‘Big Data’ itself contains a term which is related to size and hence the characteristic.
  • 9. Variety Data today comes in all types of formats. Structured, numeric data in traditional databases. Unstructured text documents, email, stock ticker data and financial transactions and semi-structured data too.
  • 10. Velocity speed of generation of data or how fast the data is generated and processed to meet the demands and the challenges which lie ahead in the path of growth and development.
  • 11. FB generates 100TB daily Twitter generates 8TB of data Daily
  • 12. Benefit of Big data Cost Reduction from Big Data Technologies Time Reduction from Big Data Developing New Big Data-Based Offerings Supporting Internal Business Decisions Real-time big data isn’t just a process for storing petabytes or Exabyte's of data in a data warehouse, It’s about the ability to make better decisions and take meaningful actions at the right time.
  • 13. What is Hadoop Flexible and available architecture for large scale computation and data performance on a network of commodity hardware Framework that allows for distributed processing of large data sets across clusters of commodity servers – Store large amount of data – Process the large amount of data stored
  • 14. Why Hadoop ? open source, highly reliable, distributed data processing platform Handles large amounts of data Stores data in native format Delivers linear scalability at low cost Resilient in case of infrastructure failures Transparent application scalability
  • 15. HDFS Hadoop Distributed File System HDFS enables Hadoop to store huge files. It’s a scalable file system that distributes and stores data across all machines in a Hadoop cluster. Scale-Out Architecture - Add servers to increase capacity High Availability - Serve mission-critical workflows and applications Fault Tolerance - Automatically and seamlessly recover from failures Load Balancing - Place data intelligently for maximum efficiency and utilization Tunable Replication - Multiple copies of each file provide data protection and computational performance
  • 16. Namenode and datanode DataNode- There is a piece of software running on each of these nodes of the cluster called Datanode which runs on slave nodes which make up the majority of the machines of a cluster. The name node places the data into these data nodes. NameNode- It Runs on a master node that tracks and directs the storage of the cluster. Also we know that the nodes or blocks which make up the original 150 MB file and that is handled by a separate machine is the Namenode. Information stored here is called as metadata.
  • 17.
  • 18.
  • 19. MapReduce MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster Scale-out Architecture - Add servers to increase processing power Security & Authentication - Works with HDFS security to make sure that only approved users can operate against the data in the system Resource Manager - Employs data locality and server resources to determine optimal computing operations Optimized Scheduling - Completes jobs according to prioritization Flexibility - Procedures can be written in virtually any programming language Resiliency & High Availability - Multiple job and task trackers ensure that jobs fail independently and restart automatically
  • 20.
  • 21. BI VS Big Data