7. Breakthrough Enabling Technologies
1) Hadoop
2) Spark
• Apache Hadoop is an open-source software framework for storage and large-
scale processing of datasets on clusters of commodity hardware
- MapReduce: a programming model and software framework for processing
large datasets stored across a cluster of commodity hardware (Hadoop uses
MapReduce to process data)
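As a conceptual sketch (plain Python, not Hadoop's actual Java API), the MapReduce model can be illustrated with a word count: the map step emits (word, 1) pairs, a shuffle groups the pairs by key, and the reduce step sums the counts per word:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between stages.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: combine the values for each key (here, sum the counts).
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["big data big insights", "big clusters"]
print(reduce_phase(shuffle(map_phase(lines))))
# {'big': 3, 'data': 1, 'insights': 1, 'clusters': 1}
```

In real Hadoop the map and reduce tasks run in parallel on different nodes and the shuffle moves data over the network; the logic per stage is the same.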
9. Why is Big Data so important?
Enables businesses/organizations to derive new and better insights
It’s the backbone of the emerging technologies that are radically
changing our world
Big Data is the Lifeblood of the 4th Industrial Revolution
Artificial Intelligence
Machine Learning
Deep Learning
20. References
Slides on Hadoop ecosystem overview
Big Data Modelling and Management Systems course on Coursera:
https://www.coursera.org/learn/big-data-management
Mayor Andrew Ginther's 2017 State of the City speech:
https://www.columbusunderground.com/full-text-of-the-2017-state-of-the-city-address
Editor's Notes
There is no single definition of Big Data. In a nutshell, it refers to data so large in scale and so varied that it becomes impractical to store and process using traditional RDBMS technologies
Big data sources: Machines, people and organizations
Big Data characteristics: volume, velocity, variety and veracity
Structured data: data with a fixed schema, e.g. CSV files, RDBMS tables. Unstructured data: videos, audio, PDFs, email messages, etc. (data with no underlying model/structure). Semi-structured data: XML, JSON
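A small sketch of the distinction, using Python's standard library (the sample records are made up for illustration): structured CSV rows share one fixed set of columns, while a semi-structured JSON record carries its own nested, per-record structure:

```python
import csv
import io
import json

# Structured: every CSV row conforms to the same fixed schema (name, age).
csv_text = "name,age\nAda,36\nAlan,41\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: JSON embeds its own structure; fields and nesting
# can vary from record to record.
json_text = '{"name": "Ada", "skills": ["math", "computing"], "extra": {"born": 1815}}'
record = json.loads(json_text)

print(rows[0]["name"])        # Ada
print(record["skills"][0])    # math
```

Unstructured data (video, audio, free-form email text) has no such model at all, which is why it needs different storage and processing tools.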
Hadoop is an open-source ecosystem of software tools used to store and process Big Data. It was started by Doug Cutting while he worked at Yahoo. The idea came from papers Google published on its distributed storage and processing systems. It is currently an open-source Apache project with numerous contributors, and new software is added to the ecosystem from time to time
Spark is also a framework for processing data on a cluster. It can be up to 100x faster than MapReduce and supports a concept called "in-memory computation", which is very important for the field of machine learning. Spark has an SQL interface called Spark SQL and provides APIs for three programming languages (Java, Python and Scala)
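A toy illustration of why in-memory computation matters for iterative workloads like machine learning (plain Python with a hypothetical `Dataset` class, not Spark's actual API): without a cache, every pass over the data recomputes the transformation; with one, it is computed once and then served from memory:

```python
class Dataset:
    """Toy lazily-evaluated dataset with an optional in-memory cache."""

    def __init__(self, records, transform):
        self.records = list(records)
        self.transform = transform
        self.computations = 0   # how many times the transform actually ran
        self._cache = None

    def collect(self):
        if self._cache is not None:   # cache hit: no recomputation
            return self._cache
        self.computations += 1
        return [self.transform(r) for r in self.records]

    def cache(self):
        # Materialise the result in memory, analogous in spirit to
        # caching/persisting a dataset in Spark.
        self.computations += 1
        self._cache = [self.transform(r) for r in self.records]
        return self

data = Dataset(range(5), lambda x: x * x)
for _ in range(3):            # three "training iterations", no caching
    data.collect()
print(data.computations)      # 3: the transform ran on every pass

cached = Dataset(range(5), lambda x: x * x).cache()
for _ in range(3):
    cached.collect()
print(cached.computations)    # 1: computed once, then read from memory
```

In real Spark the savings are far larger, because an uncached dataset may be re-read from disk and re-shuffled across the network on every iteration.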
Autonomous trucks are already being used in the Australian outback; it is not a matter of if this will occur, just a matter of when. Many white-collar jobs that exist today will be gone in 15 years, and where they do still exist, the number of professionals needed per task will be significantly smaller.
HDFS (Hadoop Distributed File System) is basically the file system of Hadoop.
Manages resources in the ecosystem (CPU cores, RAM, storage); basically, it allocates resources to jobs run on the platform.
The MapReduce model operates by processing data in two stages: the "Map" stage and the "Reduce" stage. Each stage generates output based on the code you write for it
It’s a high-level scripting language used for ETL
Used to analyze data from social networks
Basically ensures all the software in the ecosystem stays in sync, i.e. works harmoniously
Sir Arthur C. Clarke was a British science fiction writer.