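import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map step of word count: emits (word, 1) for every token in an input line.
public static class TokenizerMapper
    extends Mapper<Object, Text, Text, IntWritable> {
  // Reused across calls so that no new object is allocated per token.
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  @Override
  public void map(Object key, Text value, Context context
                  ) throws IOException, InterruptedException {
    // Splits on whitespace only; punctuation stays attached to the word.
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }
}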
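To see what the mapper emits for a single line without a Hadoop cluster, the tokenizing loop can be lifted into plain Java. This is only a sketch: `TokenizeDemo` and `emittedWords` are hypothetical names that do not appear in the slides, and the list stands in for the calls to `context.write(word, one)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper, not part of Hadoop: mimics the keys TokenizerMapper writes for one line.
final class TokenizeDemo {
    // Returns the tokens in the order the mapper would emit them; each would be paired with the value 1.
    static List<String> emittedWords(String line) {
        List<String> words = new ArrayList<>();
        // Default delimiters are whitespace characters; runs of delimiters yield no empty tokens.
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            words.add(itr.nextToken());
        }
        return words;
    }

    public static void main(String[] args) {
        for (String w : emittedWords("Hello Hadoop, hello  world")) {
            System.out.println(w + "\t1");
        }
    }
}
```

For the sample line the tokens are "Hello", "Hadoop,", "hello" and "world". The comma stays attached and case is preserved, so "Hello" and "hello" would be counted as different words unless the mapper normalizes them first.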
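import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Reduce step of word count: sums all the counts emitted for one word.
public static class IntSumReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {
  // Reused output value, set once per key.
  private IntWritable result = new IntWritable();

  @Override
  public void reduce(Text key, Iterable<IntWritable> values,
                     Context context
                     ) throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    result.set(sum);
    context.write(key, result);
  }
}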
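The two classes above only make sense together with the grouping step that the framework performs between them. The following plain-Java sketch imitates that flow in a single process, under the assumption that a sorted in-memory map is an acceptable stand-in for Hadoop's shuffle. `LocalWordCount` and its methods are hypothetical names, and nothing here uses the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical single-process imitation of map -> shuffle -> reduce for word count.
final class LocalWordCount {
    // Map phase plus shuffle: emit a 1 per token and group the 1s by word (sorted by key).
    static Map<String, List<Integer>> mapAndShuffle(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                grouped.computeIfAbsent(itr.nextToken(), k -> new ArrayList<>()).add(1);
            }
        }
        return grouped;
    }

    // Reduce phase: the same summation as IntSumReducer.reduce, applied once per key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int val : e.getValue()) {
                sum += val;
            }
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        return reduce(mapAndShuffle(lines));
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("big data", "big hadoop")));
    }
}
```

For the two input lines "big data" and "big hadoop", the grouped form is big → [1, 1], data → [1], hadoop → [1], and the reduced result is {big=2, data=1, hadoop=1}. On a real cluster the map calls and the reduce calls would run on different machines, with the grouped values moved over the network.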
Use a parallel database system
• eBay – 10 PB on 256 nodes
Use a NoSQL system
• Facebook – 20 PB on 2700 nodes
• Bing – 150 PB on 40K nodes
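The figures above are easier to compare as data per node. The sketch below does that arithmetic, assuming decimal units (1 PB = 1000 TB) and an even spread of data across nodes; neither assumption is stated on the slide, and `DataPerNode` is a hypothetical name.

```java
// Hypothetical helper for comparing the deployments listed on the slide.
final class DataPerNode {
    // Assumption: decimal units, 1 PB = 1000 TB, and data spread evenly over all nodes.
    static double terabytesPerNode(double petabytes, int nodes) {
        return petabytes * 1000.0 / nodes;
    }

    public static void main(String[] args) {
        System.out.printf("eBay:     %.2f TB/node%n", terabytesPerNode(10, 256));
        System.out.printf("Facebook: %.2f TB/node%n", terabytesPerNode(20, 2700));
        System.out.printf("Bing:     %.2f TB/node%n", terabytesPerNode(150, 40_000));
    }
}
```

Under those assumptions the parallel database example holds roughly 39 TB per node, while the two NoSQL examples hold about 7.4 TB and 3.75 TB per node, that is, far more nodes carrying much less data each.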
Data Model | Example Stores (apologies to the ones I did not list)
Simple Key-Value Pairs | Memcache, Redis, Dynamo, Voldemort, LevelDB, Azure Caching
Wide Sparse Column Sets | HyperTable, Big Table, Cassandra, HBase, Hyperbase, Amazon DynamoDB, Windows Azure Tables, SQL Server/Azure Sparse columns
BLOBs | Amazon S3, Oracle Berkeley NoSQL, Windows Azure Blob Store, SQL Server RBS/FileTable
JSON Documents | MongoDB, CouchBase, Riak, RavenDB
Graph | Neo4J, GraphDB, HypergraphDB, Stig, Intellidimension
Objects and XML Documents | Versant, Oracle Berkeley NoSQL, MarkLogic, existDB, EMC HiveDB, SQL Server/Azure, Oracle, IBM DB2
Extended Relational | Oracle, EMC SQLFire, IBM DB2, MySQL, Postgres, SQL Server/Azure
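The first two data models in the list differ mainly in how much structure the store sees inside a value. As a rough illustration only, they can be modeled with standard Java collections. `DataModelSketch`, the keys and the sample values are all made up for this sketch and do not correspond to the API of any store named above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-ins for two of the listed data models.
final class DataModelSketch {
    // Simple key-value pairs: one opaque value per key; the store does not look inside it.
    static Map<String, String> keyValue() {
        Map<String, String> store = new HashMap<>();
        store.put("session:42", "{\"user\":\"ann\",\"cart\":3}");
        return store;
    }

    // Wide sparse column set: row key -> (column name -> value); rows need not share columns.
    static Map<String, Map<String, String>> wideColumn() {
        Map<String, Map<String, String>> table = new TreeMap<>();
        table.computeIfAbsent("user:1", k -> new TreeMap<>()).put("name", "Ann");
        table.computeIfAbsent("user:1", k -> new TreeMap<>()).put("email", "ann@example.com");
        table.computeIfAbsent("user:2", k -> new TreeMap<>()).put("name", "Bob"); // no email column
        return table;
    }

    public static void main(String[] args) {
        System.out.println(keyValue().get("session:42"));
        System.out.println(wideColumn());
    }
}
```

In the key-value case the application must parse the value itself, whereas in the wide-column case individual columns can be read or written by name, and a missing column simply takes no space in that row.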
H2 2011
Hadoop on Azure CTP2
• More capacity
• Stability Improvements
H1 2012
CY
Hadoop on Azure Private CTP
Hadoop on Server Private TAP
• Hadoop Core & Common
• JavaScript Framework
Hadoop on Azure GA
• Portal Integration & Billing
• Azure SDK integration
Hadoop on Server GA
• JavaScript, Pig, Hive, HBase
• Active Directory Integration
• System Center Integration
H2 2012
Hive ODBC Driver
Azure Labs
• Data Explorer
• Social Analytics
• Private Data Market
Hadoop Connectors
Azure Data Market
Excel Integration
• Hive Add-in for Excel
• PowerPivot Add-in for Excel
• Power View for SharePoint Office 15 Integration
DATA MANAGEMENT
DATA ENRICHMENT
INSIGHTS
[Architecture diagram; labels: HTML Page, AJAX, Browser, Jetty Server, J2EE Servlets, Job Depot, Query Translator, Processes (hadoop, pig, hive), Web Resources, FsShell]
Introduction Big Data and Hadoop
