2. What is Big Data?
Lots of data (terabytes or petabytes of data).
Big Data is a term for collections of datasets so large and complex that they become
difficult to process using on-hand database tools or traditional data
processing applications.
The challenges include capture, curation, storage, search, sharing, transfer,
analysis and visualization.
3. Enterprises like IRCTC, Aadhaar, banks and stock markets generate
huge amounts of data, from terabytes to petabytes of information.
Where is all this data?
(Terabytes or petabytes of it)
What types of data are there?
Data Types and Examples
Structured Data: data from enterprise systems (ERP, CRM)
Semi-Structured Data: XML, JSON, CSV, log files
Unstructured Data / Documents: audio, video, images, archived documents
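To make the distinction concrete, here is a minimal Java sketch (not from the original slides; the record and its field names are hypothetical) that parses one semi-structured log/CSV line into named fields. Structured data arrives with a schema already enforced by the source system, semi-structured data only gets a schema when it is read, and unstructured data such as audio or video has no record structure to parse at all.

public class SemiStructuredExample {
    public static void main(String[] args) {
        // One semi-structured CSV/log line: regular enough to parse, but the
        // schema lives in the application, not in a database table as it
        // would for structured (ERP/CRM) data. The values are made up.
        String line = "u1001,login,2015-03-01T10:15:00Z";

        // Impose a schema at read time by splitting into named fields.
        String[] fields = line.split(",");
        String userId = fields[0];
        String action = fields[1];
        String timestamp = fields[2];

        System.out.println(userId + " performed '" + action + "' at " + timestamp);
    }
}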
4. Big Data Scenarios
Web and e-tailing
  Recommendation Engines
  Ad Targeting
  Search Quality
  Abuse and Click Fraud Detection
Telecommunication
  Customer Churn Analysis and Prevention
  Network Performance Optimization
  Call Data Record (CDR) Analysis
  Analysing the Network to Predict Failures
Government
  Fraud Detection and Cyber Security
  Welfare Schemes
  Justice
Healthcare and Life Sciences
  Health Information Exchange
  Gene Sequencing
  Serialization
  Healthcare Service Quality Improvements
  Drug Safety
5. Why Big Data with Hadoop?
Hadoop was designed to answer the question “How do we process big data at
reasonable cost and in reasonable time?”
It is an Apache top-level project: an open-source implementation of frameworks for
reliable, scalable, distributed computing and storage.
It provides a flexible and highly available architecture for large-scale computation and
data processing on a network of commodity hardware.
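As a rough illustration of how work is expressed on Hadoop, the sketch below follows the standard Apache Hadoop word-count tutorial (it is not part of the original slides): the mapper emits (word, 1) pairs from each line of input, the framework groups the pairs by word across the cluster, and the reducer sums the counts. Input and output directory paths are supplied on the command line.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: for every word in an input line, emit the pair (word, 1).
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: the framework delivers all counts for one word together;
    // sum them to get that word's total.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation on each node
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The same job code runs unchanged whether the input is a few megabytes on one machine or terabytes spread over thousands of commodity nodes, which is the point of the framework.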
6. Some examples
• Yahoo!
More than 100,000 CPUs in ~20,000 computers running Hadoop; biggest cluster: 2,000 nodes
(2 * 4-CPU boxes with 4 TB of disk each); used to support research for ad systems and web search.
• AOL
Used for a variety of things, ranging from statistics generation to running advanced algorithms for
behavioral analysis and targeting; cluster size is 50 machines, Intel Xeon, dual processors,
dual core, each with 16 GB RAM and an 800 GB hard disk, giving a total of 37 TB of HDFS capacity.
• Facebook
Used to store copies of internal log and dimension data sources and as a source for
reporting/analytics and machine learning; 320-machine cluster with 2,560 cores and about 1.3 PB
of raw storage.