SlideShare a Scribd company logo

Big data technologies and Hadoop infrastructure

Big Data technologies are bringing new complexity, new tasks and new opportunities in this world. How good is Apache Hadoop for all of this?

1 of 39
Download to read offline
Roman Nikitchenko, 09.05.2014
BIG.DATA technologies
& HADOOP infrastructure
2www.vitech.com.ua
Agenda
Hadoop causes real
big data Industry
changes
What technology
is behind this
name?
Why Hadoop is so
promising solution?
BIG DATA
APPROACH
HADOOP
ENVIRONMENT
INDUSTRY FACE IS
CHANGING
3www.vitech.com.ua
No escape for you ;-)
4www.vitech.com.ua
What is BIG DATA?
● Really BIG DATA things: photo banks, video storage,
historical measurements.
● Intensive data transactions and high distribution: stores
(offline or online), banks, advertising networks.
● Realtime data: measurements and minitoring, gaming.
● Intensive processing: science, modelling.
● High volumes of small things: social networks,
healthcare
BIG DATA IS EVERYWHERE
5www.vitech.com.ua
BIG DATA in just 3 words
Indeed any real big
data is just about
DIGITAL LIFE
FOOTPRINT
6www.vitech.com.ua
WORLD is big data itself
Yet to remember....
WORLD ITSELF CAN
BE DIGITIZED TOO
● Earth weather and environment: realtime, really
big data volume, high potential for processing, lot
of things to be analysed, historical data.
● Space: unlimited potential for analysis, ocean is
yet unknow volume.
● Internet of things is going to be digital world itself.
● ???

Recommended

Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopAmir Shaikh
 
Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP vinoth kumar
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Mahantesh Angadi
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemMd. Hasan Basri (Angel)
 
Introduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & HadoopIntroduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & HadoopSavvycom Savvycom
 
Big Data Processing Using Hadoop Infrastructure
Big Data Processing Using Hadoop InfrastructureBig Data Processing Using Hadoop Infrastructure
Big Data Processing Using Hadoop InfrastructureDmitry Buzdin
 
Big data with Hadoop - Introduction
Big data with Hadoop - IntroductionBig data with Hadoop - Introduction
Big data with Hadoop - IntroductionTomy Rhymond
 
Whatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopWhatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopEdureka!
 

More Related Content

What's hot

Hadoop and big data
Hadoop and big dataHadoop and big data
Hadoop and big dataYukti Kaura
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in detailsMahmoud Yassin
 
Big data, map reduce and beyond
Big data, map reduce and beyondBig data, map reduce and beyond
Big data, map reduce and beyonddatasalt
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and HadoopFebiyan Rachman
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry PerspectiveCloudera, Inc.
 
Big Data - A brief introduction
Big Data - A brief introductionBig Data - A brief introduction
Big Data - A brief introductionFrans van Noort
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and HadoopFlavio Vit
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Ranjith Sekar
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and DeploymentCisco Canada
 
Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An OverviewC. Scyphers
 
Hadoop core concepts
Hadoop core conceptsHadoop core concepts
Hadoop core conceptsMaryan Faryna
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 
Big Data and Hadoop Basics
Big Data and Hadoop BasicsBig Data and Hadoop Basics
Big Data and Hadoop BasicsSonal Tiwari
 
Big data analytics with hadoop volume 2
Big data analytics with hadoop volume 2Big data analytics with hadoop volume 2
Big data analytics with hadoop volume 2Imviplav
 

What's hot (20)

Hadoop and big data
Hadoop and big dataHadoop and big data
Hadoop and big data
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in details
 
Big data, map reduce and beyond
Big data, map reduce and beyondBig data, map reduce and beyond
Big data, map reduce and beyond
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and Hadoop
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry Perspective
 
BIG DATA
BIG DATABIG DATA
BIG DATA
 
Big Data Concepts
Big Data ConceptsBig Data Concepts
Big Data Concepts
 
Big Data - A brief introduction
Big Data - A brief introductionBig Data - A brief introduction
Big Data - A brief introduction
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Hadoop and Big Data
Hadoop and Big DataHadoop and Big Data
Hadoop and Big Data
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and Deployment
 
Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An Overview
 
Hadoop core concepts
Hadoop core conceptsHadoop core concepts
Hadoop core concepts
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
Big Data and Hadoop Basics
Big Data and Hadoop BasicsBig Data and Hadoop Basics
Big Data and Hadoop Basics
 
Big data analytics - hadoop
Big data analytics - hadoopBig data analytics - hadoop
Big data analytics - hadoop
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
 
Big data analytics with hadoop volume 2
Big data analytics with hadoop volume 2Big data analytics with hadoop volume 2
Big data analytics with hadoop volume 2
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 

Viewers also liked

10 Most Effective Big Data Technologies
10 Most Effective Big Data Technologies10 Most Effective Big Data Technologies
10 Most Effective Big Data TechnologiesMahindra Comviva
 
Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsCognizant
 
Introduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopIntroduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopGERARDO BARBERENA
 
Data Infrastructure on Hadoop - Hadoop Summit 2011 BLR
Data Infrastructure on Hadoop - Hadoop Summit 2011 BLRData Infrastructure on Hadoop - Hadoop Summit 2011 BLR
Data Infrastructure on Hadoop - Hadoop Summit 2011 BLRSeetharam Venkatesh
 
Big data: current technology scope.
Big data: current technology scope.Big data: current technology scope.
Big data: current technology scope.Roman Nikitchenko
 
Core concepts and Key technologies - Big Data Analytics
Core concepts and Key technologies - Big Data AnalyticsCore concepts and Key technologies - Big Data Analytics
Core concepts and Key technologies - Big Data AnalyticsKaniska Mandal
 
Scalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and MesosScalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and MesosDataWorks Summit
 
Big Data Paradigm - Analysis, Application and Challenges
Big Data Paradigm - Analysis, Application and ChallengesBig Data Paradigm - Analysis, Application and Challenges
Big Data Paradigm - Analysis, Application and ChallengesUyoyo Edosio
 
Big data characteristics, value chain and challenges
Big data characteristics, value chain and challengesBig data characteristics, value chain and challenges
Big data characteristics, value chain and challengesMusfiqur Rahman
 
BigData_Chp2: Hadoop & Map-Reduce
BigData_Chp2: Hadoop & Map-ReduceBigData_Chp2: Hadoop & Map-Reduce
BigData_Chp2: Hadoop & Map-ReduceLilia Sfaxi
 
BigData_Chp1: Introduction à la Big Data
BigData_Chp1: Introduction à la Big DataBigData_Chp1: Introduction à la Big Data
BigData_Chp1: Introduction à la Big DataLilia Sfaxi
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data ArchitectureGuido Schmutz
 
Big data - Key Enablers, Drivers & Challenges
Big data - Key Enablers, Drivers & ChallengesBig data - Key Enablers, Drivers & Challenges
Big data - Key Enablers, Drivers & ChallengesShilpi Sharma
 
A Brief History of Big Data
A Brief History of Big DataA Brief History of Big Data
A Brief History of Big DataBernard Marr
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 

Viewers also liked (20)

Big data ppt
Big  data pptBig  data ppt
Big data ppt
 
10 Most Effective Big Data Technologies
10 Most Effective Big Data Technologies10 Most Effective Big Data Technologies
10 Most Effective Big Data Technologies
 
Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical Workloads
 
Introduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopIntroduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to Hadoop
 
Data Infrastructure on Hadoop - Hadoop Summit 2011 BLR
Data Infrastructure on Hadoop - Hadoop Summit 2011 BLRData Infrastructure on Hadoop - Hadoop Summit 2011 BLR
Data Infrastructure on Hadoop - Hadoop Summit 2011 BLR
 
Final White Paper_
Final White Paper_Final White Paper_
Final White Paper_
 
Big data: current technology scope.
Big data: current technology scope.Big data: current technology scope.
Big data: current technology scope.
 
Core concepts and Key technologies - Big Data Analytics
Core concepts and Key technologies - Big Data AnalyticsCore concepts and Key technologies - Big Data Analytics
Core concepts and Key technologies - Big Data Analytics
 
Scalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and MesosScalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and Mesos
 
HPE Keynote Hadoop Summit San Jose 2016
HPE Keynote Hadoop Summit San Jose 2016HPE Keynote Hadoop Summit San Jose 2016
HPE Keynote Hadoop Summit San Jose 2016
 
Big Data Paradigm - Analysis, Application and Challenges
Big Data Paradigm - Analysis, Application and ChallengesBig Data Paradigm - Analysis, Application and Challenges
Big Data Paradigm - Analysis, Application and Challenges
 
Big data characteristics, value chain and challenges
Big data characteristics, value chain and challengesBig data characteristics, value chain and challenges
Big data characteristics, value chain and challenges
 
Big Data and Analytics
Big Data and AnalyticsBig Data and Analytics
Big Data and Analytics
 
BigData_Chp2: Hadoop & Map-Reduce
BigData_Chp2: Hadoop & Map-ReduceBigData_Chp2: Hadoop & Map-Reduce
BigData_Chp2: Hadoop & Map-Reduce
 
Big Data and Analytics on AWS
Big Data and Analytics on AWS Big Data and Analytics on AWS
Big Data and Analytics on AWS
 
BigData_Chp1: Introduction à la Big Data
BigData_Chp1: Introduction à la Big DataBigData_Chp1: Introduction à la Big Data
BigData_Chp1: Introduction à la Big Data
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Big data - Key Enablers, Drivers & Challenges
Big data - Key Enablers, Drivers & ChallengesBig data - Key Enablers, Drivers & Challenges
Big data - Key Enablers, Drivers & Challenges
 
A Brief History of Big Data
A Brief History of Big DataA Brief History of Big Data
A Brief History of Big Data
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
 

Similar to Big data technologies and Hadoop infrastructure

Java/Scala Lab: Роман Никитченко - Big Data - Big Pitfalls.
Java/Scala Lab: Роман Никитченко - Big Data - Big Pitfalls.Java/Scala Lab: Роман Никитченко - Big Data - Big Pitfalls.
Java/Scala Lab: Роман Никитченко - Big Data - Big Pitfalls.GeeksLab Odessa
 
Big Data: fall seven times, stand up eight!
Big Data: fall seven times, stand up eight!Big Data: fall seven times, stand up eight!
Big Data: fall seven times, stand up eight!Roman Nikitchenko
 
Hadoop Training Tutorial for Freshers
Hadoop Training Tutorial for FreshersHadoop Training Tutorial for Freshers
Hadoop Training Tutorial for Freshersrajkamaltibacademy
 
Big data & frameworks: no book for you anymore.
Big data & frameworks: no book for you anymore.Big data & frameworks: no book for you anymore.
Big data & frameworks: no book for you anymore.Roman Nikitchenko
 
Big data & frameworks: no book for you anymore
Big data & frameworks: no book for you anymoreBig data & frameworks: no book for you anymore
Big data & frameworks: no book for you anymoreStfalcon Meetups
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-HadoopNagarjuna D.N
 
Café da manhã - São Paulo - Use-cases and opportunities in BigData with Hadoop
Café da manhã - São Paulo - Use-cases and opportunities in BigData with HadoopCafé da manhã - São Paulo - Use-cases and opportunities in BigData with Hadoop
Café da manhã - São Paulo - Use-cases and opportunities in BigData with HadoopOCTO Technology
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introductionSandeep Singh
 
Denodo Platform 7.0: Redefine Analytics with In-Memory Parallel Processing an...
Denodo Platform 7.0: Redefine Analytics with In-Memory Parallel Processing an...Denodo Platform 7.0: Redefine Analytics with In-Memory Parallel Processing an...
Denodo Platform 7.0: Redefine Analytics with In-Memory Parallel Processing an...Denodo
 
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysQuick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysDemi Ben-Ari
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointInside Analysis
 

Similar to Big data technologies and Hadoop infrastructure (20)

Java/Scala Lab: Роман Никитченко - Big Data - Big Pitfalls.
Java/Scala Lab: Роман Никитченко - Big Data - Big Pitfalls.Java/Scala Lab: Роман Никитченко - Big Data - Big Pitfalls.
Java/Scala Lab: Роман Никитченко - Big Data - Big Pitfalls.
 
Big Data - Big Pitfalls.
Big Data - Big Pitfalls.Big Data - Big Pitfalls.
Big Data - Big Pitfalls.
 
Big Data: fall seven times, stand up eight!
Big Data: fall seven times, stand up eight!Big Data: fall seven times, stand up eight!
Big Data: fall seven times, stand up eight!
 
Big data nyu
Big data nyuBig data nyu
Big data nyu
 
Hadoop seminar
Hadoop seminarHadoop seminar
Hadoop seminar
 
Hadoop Training Tutorial for Freshers
Hadoop Training Tutorial for FreshersHadoop Training Tutorial for Freshers
Hadoop Training Tutorial for Freshers
 
BigData Hadoop
BigData Hadoop BigData Hadoop
BigData Hadoop
 
bigdata 2.pptx
bigdata 2.pptxbigdata 2.pptx
bigdata 2.pptx
 
Big data & frameworks: no book for you anymore.
Big data & frameworks: no book for you anymore.Big data & frameworks: no book for you anymore.
Big data & frameworks: no book for you anymore.
 
Big data & frameworks: no book for you anymore
Big data & frameworks: no book for you anymoreBig data & frameworks: no book for you anymore
Big data & frameworks: no book for you anymore
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-Hadoop
 
bigdata.pptx
bigdata.pptxbigdata.pptx
bigdata.pptx
 
bigdata.pdf
bigdata.pdfbigdata.pdf
bigdata.pdf
 
Café da manhã - São Paulo - Use-cases and opportunities in BigData with Hadoop
Café da manhã - São Paulo - Use-cases and opportunities in BigData with HadoopCafé da manhã - São Paulo - Use-cases and opportunities in BigData with Hadoop
Café da manhã - São Paulo - Use-cases and opportunities in BigData with Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Big data
Big dataBig data
Big data
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introduction
 
Denodo Platform 7.0: Redefine Analytics with In-Memory Parallel Processing an...
Denodo Platform 7.0: Redefine Analytics with In-Memory Parallel Processing an...Denodo Platform 7.0: Redefine Analytics with In-Memory Parallel Processing an...
Denodo Platform 7.0: Redefine Analytics with In-Memory Parallel Processing an...
 
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysQuick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter Point
 

Recently uploaded

From Challenger to Champion: How SpiraPlan Outperforms JIRA+Plugins
From Challenger to Champion: How SpiraPlan Outperforms JIRA+PluginsFrom Challenger to Champion: How SpiraPlan Outperforms JIRA+Plugins
From Challenger to Champion: How SpiraPlan Outperforms JIRA+PluginsInflectra
 
Are Human-generated Demonstrations Necessary for In-context Learning?
Are Human-generated Demonstrations Necessary for In-context Learning?Are Human-generated Demonstrations Necessary for In-context Learning?
Are Human-generated Demonstrations Necessary for In-context Learning?MENGSAYLOEM1
 
Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...
Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...
Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...Adrian Sanabria
 
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERNRonnelBaroc
 
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24Umar Saif
 
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...Product School
 
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...htrindia
 
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...UiPathCommunity
 
The Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolThe Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolProduct School
 
Relationship Counselling: From Disjointed Features to Product-First Thinking ...
Relationship Counselling: From Disjointed Features to Product-First Thinking ...Relationship Counselling: From Disjointed Features to Product-First Thinking ...
Relationship Counselling: From Disjointed Features to Product-First Thinking ...Product School
 
Enterprise Architecture As Strategy - Book Review
Enterprise Architecture As Strategy - Book ReviewEnterprise Architecture As Strategy - Book Review
Enterprise Architecture As Strategy - Book ReviewAshraf Fouad
 
Enhancing Productivity and Insight A Tour of JDK Tools Progress Beyond Java 17
Enhancing Productivity and Insight  A Tour of JDK Tools Progress Beyond Java 17Enhancing Productivity and Insight  A Tour of JDK Tools Progress Beyond Java 17
Enhancing Productivity and Insight A Tour of JDK Tools Progress Beyond Java 17Ana-Maria Mihalceanu
 
Bit N Build Poland
Bit N Build PolandBit N Build Poland
Bit N Build PolandGDSC PJATK
 
My sample product research idea for you!
My sample product research idea for you!My sample product research idea for you!
My sample product research idea for you!KivenRaySarsaba
 
How we think about an advisor tech stack
How we think about an advisor tech stackHow we think about an advisor tech stack
How we think about an advisor tech stackSummit
 
Power of 2024 - WITforce Odyssey.pptx.pdf
Power of 2024 - WITforce Odyssey.pptx.pdfPower of 2024 - WITforce Odyssey.pptx.pdf
Power of 2024 - WITforce Odyssey.pptx.pdfkatalinjordans1
 
Introduction to Multimodal LLMs with LLaVA
Introduction to Multimodal LLMs with LLaVAIntroduction to Multimodal LLMs with LLaVA
Introduction to Multimodal LLMs with LLaVARobert McDermott
 
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner,  Challenge Like a VC by former CPO, TripadvisorAct Like an Owner,  Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner, Challenge Like a VC by former CPO, TripadvisorProduct School
 
"How we created an SRE team in Temabit as a part of FOZZY Group in conditions...
"How we created an SRE team in Temabit as a part of FOZZY Group in conditions..."How we created an SRE team in Temabit as a part of FOZZY Group in conditions...
"How we created an SRE team in Temabit as a part of FOZZY Group in conditions...Fwdays
 
Apex Replay Debugger and Salesforce Platform Events.pptx
Apex Replay Debugger and Salesforce Platform Events.pptxApex Replay Debugger and Salesforce Platform Events.pptx
Apex Replay Debugger and Salesforce Platform Events.pptxmohayyudin7826
 

Recently uploaded (20)

From Challenger to Champion: How SpiraPlan Outperforms JIRA+Plugins
From Challenger to Champion: How SpiraPlan Outperforms JIRA+PluginsFrom Challenger to Champion: How SpiraPlan Outperforms JIRA+Plugins
From Challenger to Champion: How SpiraPlan Outperforms JIRA+Plugins
 
Are Human-generated Demonstrations Necessary for In-context Learning?
Are Human-generated Demonstrations Necessary for In-context Learning?Are Human-generated Demonstrations Necessary for In-context Learning?
Are Human-generated Demonstrations Necessary for In-context Learning?
 
Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...
Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...
Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...
 
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
 
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
 
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
 
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
 
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
 
The Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolThe Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product School
 
Relationship Counselling: From Disjointed Features to Product-First Thinking ...
Relationship Counselling: From Disjointed Features to Product-First Thinking ...Relationship Counselling: From Disjointed Features to Product-First Thinking ...
Relationship Counselling: From Disjointed Features to Product-First Thinking ...
 
Enterprise Architecture As Strategy - Book Review
Enterprise Architecture As Strategy - Book ReviewEnterprise Architecture As Strategy - Book Review
Enterprise Architecture As Strategy - Book Review
 
Enhancing Productivity and Insight A Tour of JDK Tools Progress Beyond Java 17
Enhancing Productivity and Insight  A Tour of JDK Tools Progress Beyond Java 17Enhancing Productivity and Insight  A Tour of JDK Tools Progress Beyond Java 17
Enhancing Productivity and Insight A Tour of JDK Tools Progress Beyond Java 17
 
Bit N Build Poland
Bit N Build PolandBit N Build Poland
Bit N Build Poland
 
My sample product research idea for you!
My sample product research idea for you!My sample product research idea for you!
My sample product research idea for you!
 
How we think about an advisor tech stack
How we think about an advisor tech stackHow we think about an advisor tech stack
How we think about an advisor tech stack
 
Power of 2024 - WITforce Odyssey.pptx.pdf
Power of 2024 - WITforce Odyssey.pptx.pdfPower of 2024 - WITforce Odyssey.pptx.pdf
Power of 2024 - WITforce Odyssey.pptx.pdf
 
Introduction to Multimodal LLMs with LLaVA
Introduction to Multimodal LLMs with LLaVAIntroduction to Multimodal LLMs with LLaVA
Introduction to Multimodal LLMs with LLaVA
 
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner,  Challenge Like a VC by former CPO, TripadvisorAct Like an Owner,  Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
 
"How we created an SRE team in Temabit as a part of FOZZY Group in conditions...
"How we created an SRE team in Temabit as a part of FOZZY Group in conditions..."How we created an SRE team in Temabit as a part of FOZZY Group in conditions...
"How we created an SRE team in Temabit as a part of FOZZY Group in conditions...
 
Apex Replay Debugger and Salesforce Platform Events.pptx
Apex Replay Debugger and Salesforce Platform Events.pptxApex Replay Debugger and Salesforce Platform Events.pptx
Apex Replay Debugger and Salesforce Platform Events.pptx
 

Big data technologies and Hadoop infrastructure

  • 1. Roman Nikitchenko, 09.05.2014 BIG.DATA technologies & HADOOP infrastructure
  • 2. 2www.vitech.com.ua Agenda Hadoop causes real big data Industry changes What technology is behind this name? Why Hadoop is so promising solution? BIG DATA APPROACH HADOOP ENVIRONMENT INDUSTRY FACE IS CHANGING
  • 4. 4www.vitech.com.ua What is BIG DATA? ● Really BIG DATA things: photo banks, video storage, historical measurements. ● Intensive data transactions and high distribution: stores (offline or online), banks, advertising networks. ● Realtime data: measurements and minitoring, gaming. ● Intensive processing: science, modelling. ● High volumes of small things: social networks, healthcare BIG DATA IS EVERYWHERE
  • 5. 5www.vitech.com.ua BIG DATA in just 3 words Indeed any real big data is just about DIGITAL LIFE FOOTPRINT
  • 6. 6www.vitech.com.ua WORLD is big data itself Yet to remember.... WORLD ITSELF CAN BE DIGITIZED TOO ● Earth weather and environment: realtime, really big data volume, high potential for processing, lot of things to be analysed, historical data. ● Space: unlimited potential for analysis, ocean is yet unknow volume. ● Internet of things is going to be digital world itself. ● ???
  • 7. 7www.vitech.com.ua So... BIG DATA is not about the data. It is about OUR ABILITY TO HANDLE THEM.
  • 8. 8www.vitech.com.ua But how can I handle big data? … BUT HOW TO HANDLE IT? BIG DATA
  • 9. 9www.vitech.com.ua BIG DATA storage: requirements NO BACKUPS
  • 10. 10www.vitech.com.ua BIG DATA storage: requirements SIMPLE BUT RELIABLE ● Really big amount of data is to be stored in reliable manner. ● Storage is to be simple, recoverable and cheap.
  • 11. 11www.vitech.com.ua BIG DATA storage: requirements DECENTRALIZED ● No single point of failure. ● Scalable as close to linear as possible. ● No manual actions to recover in case of failures
  • 12. 12www.vitech.com.ua BIG DATA processing: requirements SIMPLE TO USE ● Complexity is to be burried inside. ● Interface is to be functional and compatible between versions.
  • 13. 13www.vitech.com.ua BIG DATA processing: requirements TOOLS TO BE CLOSE TO WORK ● Process data on the same nodes as it is stored on. ● Distributed storage — distributed processing.
  • 14. 14www.vitech.com.ua BIG DATA processing: requirements ● Work is to be balanced. ● Data placement is to be appropriate to balanced work. ● Amount of work is to be balanced in accordance to resources. SHARE LOAD
  • 15. 15www.vitech.com.ua Solution requirements in general WHAT FINALLY DO WE NEED? ● CPU+HDD in one place ● Cluster of replacable nodes ● Lot of storage space ● Way to control resources and balance load ● Everything is to be relatively simple and affordable x MAX + = BIG DATA
  • 16. 16www.vitech.com.ua … and what is the solution? HADOOP magic is here!
  • 17. 17www.vitech.com.ua What is it? What is HADOOP? ● Hadoop is open source framework for big data. Both distributed storage and processing. ● Hadoop is reliable and fault tolerant with no rely on hardware for these properties. ● Hadoop has unique horisontal scalability. Currently — from single computer up to thousands of cluster nodes.
  • 18. 18www.vitech.com.ua Facts and trends ● 2004, Was inspired by by Google MapReduce idea. Originally was named just after son's elephant toy. ● On June 13, 2012 Facebook announced their Hadoop cluster has 100 PB of data. On November 8, 2012 they announced the warehouse grows by roughly half a PB per day. ● On February 19, 2008, Yahoo! Inc. launched what it claimed was the world's largest Hadoop production application. The Yahoo! Search Webmap is a Hadoop application that runs on a more than 10,000 core Linux cluster.
  • 19. 19www.vitech.com.ua Hadoop: classical picture Hadoop historical top view ● HDFS serves as file system layer ● MapReduce originally served as distributed processing framework. ● Native client API is Java but there are lot of alternatives. ● This is only initial architecture and it is now more complex.
  • 20. 20www.vitech.com.ua HDFS top view ● Namenode is 'management' component. Keeps 'directory' of what file blocks are stored where. ● Actual work is performed by data nodes.
  • 21. 21www.vitech.com.ua HDFS files handling ● Files are stored in large enough blocks. Every block is replicated to several data nodes. ● Replication is tracked by namenode. Clients only locate blocks using namenode and actual load is taken by datanode. ● Datanode failure leads to replication recovery. Namenode could be backed by standby scheme.
  • 22. 22www.vitech.com.ua HDFS properties ● Designed for throughput, not for latency. ● Blocks are expected to be large. There is issue with lot of small files. ● Write once, read many times ideology. ● Only append, no 'edit' ability. ● Special tools are required to implement OLTP like Apache HBase. HDFS is ...
  • 23. 23www.vitech.com.ua MapReduce framework model ● 2 steps data processing: transform and then reduce. Really nice to do things in distributed manner. ● Large class of jobs can be adopted but not all of them.
  • 24. 24www.vitech.com.ua MapReduce service: top view ● One JobTracker with redundancy possible. ● Multiple TaskTrackers doing actual job. ● Ideology is similar to HDFS handling. ● HDFS is usually used as storage on all phases. MapReduce service
  • 25. 25www.vitech.com.ua Technology: Hadoop 2.0 concept ● New component (YARN) forms resource management layer and completes real distributed data OS. ● MapReduce is from now only one among other YARN appliactions.
  • 26. 26www.vitech.com.ua YARN: notable addition ● Resource manager dispatches client requests. ● Node managers manage node resources. ● Any application is set of containers including application master. YARN service
  • 27. 27www.vitech.com.ua YARN: notable addition ● Better resource balance for heterogeneous clusterss and multple applications. ● Dynamic applications over static services. ● Much wider applications model over simple MapReduce. Things like Spark ot Tez. Why YARN is SO important?
  • 28. 28www.vitech.com.ua Hadoop current picture ● HDFS2 is now about storage and YARN is about processing resources. ● Lot of things to do on top of this data OS starting from traditional MapReduce. Now there is lot of alternatives.
  • 29. 29www.vitech.com.ua Just several items around Infrastructure ● HBase: Scalable structured data storage for large tables. ● Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying. ● Mahout: A Scalable machine learning and data mining library. ● Pig: A high-level data-flow language and execution framework for parallel computation. ● ZooKeeper: A high-performance distributed coordination service.
  • 30. 30www.vitech.com.ua Most important concept First ever world DATA OS 10.000 nodes computer... Recent technology changes are focused on higher scale. Better resource usage and control, lower MTTR, higher security, redundancy, fault tolerance.
  • 31. 31www.vitech.com.ua Big data industry is changing. HADOOP has influence on whole BIG DATA INDUSTRY face
  • 32. 32www.vitech.com.ua New concepts DATA LAKE Take as much data about your business processes as you can take. The more data you have the more value you could get from it.
  • 33. 33www.vitech.com.ua New concepts ENTERPRISE DATA HUB Don't ruine your existing data warehouse. Just extend it with new, centralized big data storage through data migration solution.
  • 34. 34www.vitech.com.ua Trends Big data is goind BIGGER ● SSD are going to be widely used as storage and memory based replica is not a miracle anymore. ● Memory and SSD based caching schemes are going to be more and more aggressive. Particularry in HDFS and HBase. ● Clusters grow. Currently some open source features are targeted for clusters of 1K nodes. How about staging 300 nodes cluster in companies like EBay? ● Production clusters go beyond 4000 nodes (up to 10K). Node failure nearly every day.
  • 35. 35www.vitech.com.ua Trends ● Typecal node is expected to include at least 64G memory ● Starting from 4 x 2T drives for storage. 8-16 x 4T drives are not so rare. This is for general 'workload' node. ● 10 and more CPU cores. 2 CPUs is normal approach. ● SSD is starting to be widely used not only for OS and caching but for data itself. ● Main outcome — per node costs model is changing. HARDWARE IS GOING CHEAPER
  • 36. 36www.vitech.com.ua Most important concept ● You need to limit things you are guessing
  • 37. 37www.vitech.com.ua For whom bell tools? Old way ● Make assumptions about data you need. ● Make assumptions about data model. ● Make assumptions about algorithms you need. ● Get confirmation for your initial guess about result. Are you surprised? New way ● Get as much data as you can. ● Detect data model based on set of algorithms with extensive approach. ● Cluster your data, detect correlations, clean from anomalies... in all way you can afford on whole data set. ● Get grounded results. You still cen miss some fundamental aspects but isn't it much better in any case?
  • 38. 38www.vitech.com.ua Major Hadoop distributions ● HortonWorks are 'barely open source'. Innovative, but 'running too fast'. Most ot their key technologies are not so mature yet. ● Cloudera is stable enough but not stale. Hadoop 2.3 with YARN, HBase 0.96.x. Balance. ● MapR focuses on performance per node but they are slightly outdated in term of functionality and their distribution costs. For cases where node performance is high priority. ● Intel is newcomer on this market. Not for near future.