Presented By:
Nitesh Gupta
Nimish Kochhar
Acknowledgement
• We would like to express our most sincere gratitude and appreciation to our respected teacher, Mr. Vinay Arora, for his guidance, patience and encouragement throughout the development of the presentation.
• Thank you, Sir, for being a constant source of inspiration throughout this tedious process.
Table of Contents
1. Traditional Approach
2. The Beginning
3. What is Big Data
4. Characteristics of Big Data
5. Why Big Data
6. Big Data Analytics
7. Big Players
8. Hadoop as an Example
9. Components of Hadoop
10. References
The Beginning…
• Big data burst upon the scene in the first decade of the 21st century.
• The first organizations to embrace it were online and startup firms.
• Firms like Google, eBay, LinkedIn and Facebook were built around big data from the beginning.
• Big Data may well be the Next Big Thing in the IT world.
• Like many new information technologies, big data can bring about dramatic cost reductions, substantial improvements in the time required to perform a computing task, and new product and service offerings.
Traditional Approach
• In this approach, an enterprise used a single computer to store and process its data.
• Data was stored in an RDBMS such as Oracle Database, MS SQL Server or DB2.
• Sophisticated software was written to interact with the database, process the required data and present it to the users.
• This approach works well when the volume of data is small enough to be accommodated by standard database servers.
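The traditional flow described above (store rows in a relational database, then let application code query and present them) can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for the server databases named on the slide, and the orders table, its columns and the sample rows are hypothetical.

```python
import sqlite3


def top_customers(rows, limit=2):
    """Load (customer, amount) rows into a table and rank customers by total spend."""
    # An in-memory SQLite database stands in for a single database server.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        # The application layer asks the database to do the processing.
        cursor = conn.execute(
            "SELECT customer, SUM(amount) AS total FROM orders "
            "GROUP BY customer ORDER BY total DESC LIMIT ?",
            (limit,),
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical sample data, small enough for one machine.
    sample = [("asha", 120.0), ("ravi", 40.0), ("asha", 30.0), ("meera", 75.5)]
    for customer, total in top_customers(sample):
        print(customer, total)
```

Everything here lives on one machine, which is exactly the limit the rest of the deck is concerned with: once the table no longer fits on a standard database server, this design stops scaling.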
What is Big Data
• ‘Big Data’ is similar to ‘small data’, but bigger in size.
• Big Data refers to technologies and initiatives that involve data that is too diverse, fast-changing or massive for conventional technologies, skills and infrastructure to address efficiently.
• Big Data generates value from the storage and processing of very large quantities of digital information that cannot be analyzed with traditional computing techniques.
Characteristics of Big Data (4 V’s)
Volume
• Big data implies enormous volumes of data.
• Big Data requires processing high volumes of low-density data, that is, data of unknown value, such as Twitter data feeds, clicks on a web page, network traffic, readings captured continuously by sensor-enabled equipment, and many more.
• Facebook has reported ingesting about 500 terabytes of new data every day.
• By one widely quoted estimate, a Boeing 737 generates 240 terabytes of flight data during a single flight across the US.
• By another widely quoted estimate, every 2 days we create as much data as we did from the beginning of time until 2003.
Velocity
• Velocity refers to the speed at which new data is generated and the speed at which data moves around.
• Big data technology now allows us to analyse data while it is being generated, without first storing it in a database.
• Machine-to-machine processes exchange data between billions of devices.
• Infrastructure and sensors generate massive log data in real time.
• Online gaming systems support millions of concurrent users, each producing multiple inputs per second.
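Analysing data while it is being generated, without storing it first, can be illustrated with a running aggregate that keeps only a few numbers in memory. This is a minimal sketch, not how any particular streaming product works; the RunningStats class and the sensor_readings feed are hypothetical names invented for the example.

```python
class RunningStats:
    """Tracks count, mean and maximum of a stream using constant memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.maximum = None

    def update(self, value):
        # Incremental mean: no past readings need to be kept.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        if self.maximum is None or value > self.maximum:
            self.maximum = value


def sensor_readings():
    """Hypothetical sensor feed; a generator, so values are produced one at a time."""
    for value in (21.0, 22.5, 19.5, 25.0):
        yield value


stats = RunningStats()
for reading in sensor_readings():
    stats.update(reading)  # each reading is processed and then discarded

print(stats.count, stats.mean, stats.maximum)
```

The same pattern extends to any statistic that can be updated one value at a time, which is what makes it suitable for data arriving faster than it can be written to a database.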
Variety
• Variety refers to the many sources and types of data, both structured and unstructured.
• Traditional database systems were designed to address smaller volumes of structured data, fewer updates and a predictable, consistent data structure.
• Data now comes in the form of emails, photos, videos, monitoring-device readings, PDFs, audio, etc. This variety of unstructured data creates problems for storing, mining and analyzing data.
• Real-world data comes in many different formats, and that is the challenge Big Data needs to overcome.
Veracity
• Veracity refers to the messiness or trustworthiness of the data.
• With many forms of big data, quality and accuracy are less controllable, for example Twitter posts with hashtags, abbreviations, typos and colloquial speech.
• Big data and analytics technology now allow us to work with these types of data. The volume often makes up for the lack of quality or accuracy.
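One way to see why volume can make up for low accuracy is to average many noisy readings of the same quantity. The sketch below assumes the noise is random and unbiased (Gaussian noise around a hypothetical true value of 50.0); averaging does not remove systematic errors, so this shows the best case rather than a general guarantee.

```python
import random


def mean_of_noisy_readings(true_value, noise_sd, n, seed=0):
    """Average n simulated readings of true_value, each with Gaussian noise."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(n):
        total += rng.gauss(true_value, noise_sd)
    return total / n


if __name__ == "__main__":
    # Hypothetical setup: true value 50.0, noise standard deviation 5.0.
    for n in (10, 1_000, 200_000):
        estimate = mean_of_noisy_readings(50.0, 5.0, n)
        print(n, "readings, error:", abs(estimate - 50.0))
```

The expected error of the average shrinks roughly in proportion to one over the square root of the number of readings, so a very large set of individually unreliable values can still give a dependable aggregate.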
Sources of Big Data
Today organizations are using, sharing and storing more information in varying formats, including:
• E-mail and instant messaging
• Social media channels
• Video and audio files
This unstructured data adds up to as much as 85% of the information that businesses store.
The ability to extract high value from this data to enable innovation and competitive gain is the purpose of Big Data analytics.
Big Data Analytics
• Big data is critical to our lives and is emerging as one of the most important technologies in the modern world.
• Using the information kept on social networking sites like Facebook, marketing agencies are learning about the response to their campaigns, promotions and other advertising media.
• By analyzing data such as the preferences and product perceptions of their consumers, product companies and retail organizations are planning their production.
• Using data on the previous medical history of patients, hospitals are providing better and quicker service.
Big Players
Hadoop
• Hadoop is an open-source framework that allows users to store and process big data in a distributed environment across clusters of computers using simple programming models.
• It is designed to scale up from a single server to thousands of machines, each offering local computation and storage.
• Doug Cutting, working with Mike Cafarella, took the approach Google had published in its Google File System and MapReduce papers and started the open-source project that became Hadoop in 2005.
• Operates on both unstructured and structured data.
• Has a large and active ecosystem.
• Open source under the Apache License.
Hadoop Distributed File System
• Data is organized into files and directories.
• Files are divided into blocks, which are distributed across nodes.
• Blocks are replicated to handle failure.
• A reliable, redundant, distributed file system optimized for large files.
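The block-and-replica idea above can be sketched with a toy model. This is not HDFS code and does not use its API: the round-robin placement is an assumption made for illustration (real HDFS placement is rack-aware), and the tiny block size and node names are hypothetical. A replication factor of 3 matches the usual HDFS default.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = []
    remaining = file_size
    while remaining > 0:
        blocks.append(min(block_size, remaining))  # last block may be smaller
        remaining -= block_size
    return blocks


def place_replicas(num_blocks, nodes, replication):
    """Toy round-robin placement: each block lands on `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + offset) % len(nodes)] for offset in range(replication)
        ]
    return placement


def readable_after_failure(placement, failed_node):
    """True if every block still has at least one replica on a surviving node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    blocks = split_into_blocks(file_size=300, block_size=128)
    placement = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"], replication=3)
    print(blocks)
    print(placement)
    print(readable_after_failure(placement, "n2"))
```

Because every block lives on several nodes, losing any single node leaves at least one copy of each block available, which is the failure handling the slide refers to.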
MapReduce
• In Hadoop 1.x, the MapReduce framework consists of a single JobTracker and several TaskTrackers in a cluster.
• The JobTracker is responsible for resource management, tracking resource consumption and availability, and scheduling each job's component tasks onto the data nodes.
• The TaskTrackers execute the tasks as directed by the JobTracker and periodically report task status back to it.
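The slide covers how tasks are scheduled; the programming model those tasks follow is a map step that emits key/value pairs, a grouping step that collects values by key, and a reduce step that combines each group. The word count below is a single-process sketch of that model in plain Python. It does not use Hadoop's API, and the function names are chosen for this example only.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = []
    for line in lines:  # on a cluster, each line or split would go to a separate map task
        pairs.extend(map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data moves fast"]))
```

On a real cluster the map calls run in parallel on the nodes that hold the input blocks and the grouped keys are partitioned across reduce tasks; here everything runs in one loop so the data flow is easy to follow.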
References
• https://www.mongodb.com/big-data-explained
• https://en.wikipedia.org/wiki/Big_data
• www.tutorialspoint.com/hadoop/hadoop_big_data_overview.htm
Books:
• Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schönberger and Kenneth Cukier
• Hadoop: The Definitive Guide by Tom White