Big Data Consulting
Hadoop, big data
Robert Gibbon - www.bigindustries.be
The information age
■ The “economic third wave” has badly hit many blue chip
organisations
■ Manufacturing and retail are in rapid decline in Europe and the US
■ Tech, connectivity and information are restructuring our societies
■ Levels of political and social engagement have surged
■ Trading platforms are empowering small businesses
Innovation
■ Mass-production hates innovation
■ Innovation means change – a huge cost with little benefit for
production-line economies
■ Continuous improvement mentality
■ Knowledge services need to innovate to differentiate
■ Change in a virtual world can be cheap and yield huge rewards
■ Continuous reinvention mentality
The Rover bicycle, 1885
Big data vis-à-vis innovation
■ In a free market like the web, innovation can open up new
opportunities
■ Consumer access to grid computing tech is a recent innovation
■ Grid computing opens up new opportunities that would otherwise
not be viable
■ Ideal for ventures architected around the long-tail economic
model
The future - thingternet
■ The internet of things is with us
■ Billions of connected devices, even digital tattoos
Big data vis-à-vis internet of things
■ Billions of connected devices create a huge amount
of data
■ Until big data tech, Internet of Things was nearly
impossible to monetize
The internet of things is a wild west
■ Many new, unsolved challenges
■ Privacy
■ Governance
■ Civil liberties
■ New challenges = new opportunities
Let's get back to Hadoop
Hadoop refresher
■ FOSS solution for processing terabytes to petabytes of data
■ Using arrays of regular servers
■ Hadoop core:
■ HDFS - a scale-out file system
■ YARN - a scale-out application resource manager
■ Runtimes:
■ Spark, Impala, Flink, MapReduce, Kafka, SolrCloud etc.
■ Components for data protection, access control and operational management
■ NoSQL databases
■ HBase, Accumulo, Cassandra etc.
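The map, shuffle and reduce model behind the MapReduce runtime can be sketched in a few lines of plain Python. This is a single-process illustration only, not the Hadoop API: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, and a real cluster would run the map and reduce steps in parallel across many servers with the input stored in HDFS.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key, as a framework does between map and reduce."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each map call sees only one line and each reduce call sees only one key, the work can be split across arrays of regular servers without the steps needing to coordinate with each other.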
What can you do with Hadoop?
Storage
■ Pure online data storage, with no other processing
■ Low cost per-GB for petascale online storage
■ Option to directly query and analyse the data is
available if required.
Search
■ Example: huge, constantly changing catalogue of
products – like eBay and Amazon
■ SolrCloud – an advanced search engine serving
terabytes of content from Hadoop
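Search engines in the Lucene family, which includes Solr, are built around an inverted index: a mapping from each term to the documents that contain it. A minimal sketch of the idea follows, assuming a tiny in-memory product catalogue. The names `build_index` and `search` are hypothetical and are not part of any Solr API; a real deployment adds tokenisation, ranking and sharding across the cluster.

```python
from __future__ import annotations

from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every lower-cased term to the set of document ids containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

When a product changes, only the entries for its own terms need updating, which is what makes this structure workable for a constantly changing catalogue.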
Messaging
■ A distributed message queue that runs alongside a Hadoop
cluster - Apache Kafka
■ Elastically scalable
■ Messages are persisted and replicated for durability
■ TBs of messages per broker with predictable
performance
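The persistence and replication points can be illustrated with a toy append-only log. This is a sketch under loud assumptions: `PartitionLog` is a hypothetical class, not the Kafka client API, and it keeps its replicas as in-memory lists where a real broker writes to disk on separate machines. What it does show is that messages stay in the log after being read, and that each consumer chooses the offset it reads from.

```python
from __future__ import annotations


class PartitionLog:
    """Toy append-only log: one partition, copied to several replicas."""

    def __init__(self, replicas: int = 3) -> None:
        # Each replica holds a full copy of the message list.
        self._replicas: list[list[bytes]] = [[] for _ in range(replicas)]

    def append(self, message: bytes) -> int:
        """Write the message to every replica and return its offset."""
        for replica in self._replicas:
            replica.append(message)
        return len(self._replicas[0]) - 1

    def read(self, offset: int, max_messages: int = 10, replica: int = 0) -> list[bytes]:
        """Read from any replica, starting at a consumer-chosen offset."""
        return self._replicas[replica][offset:offset + max_messages]
```

Since reading never removes anything, two consumers can replay the same messages independently, and losing one replica does not lose the data.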
Targeting
■ Personalised content for users
■ Generates and consumes a huge amount of log data
■ for reporting
■ for predictive analysis
■ Predictive analysis is compute intensive
■ Can be TBs of data per day
Self-service Business Intelligence
■ Enterprise Data Hub paradigm
■ A very popular emerging use case
■ Business users directly access raw datasets
using specialised discovery tools built on top of
Hadoop - Datameer, Platfora and others
Data warehousing
■ Migration of Enterprise Data Warehouse to Hadoop
■ Big cost savings versus traditional vendors like Oracle and
Teradata
Machine learning
■ Predictive analytics with Spark MLlib or
Revolution R Enterprise
■ Automatically predict component failures for
proactive intervention
Big Database
■ Low latency, high throughput, high concurrency,
high volume
■ Algorithmic trading
■ Real-time ad auctions
■ Volumes of 200 billion transactions per day, reliably served
in real time
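To put the daily volume in perspective, it helps to convert it to an average per-second rate. The short calculation below assumes the load is spread evenly over 24 hours, which real traffic is not, so peak rates would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate_per_second(transactions_per_day: float) -> float:
    """Average transactions per second, assuming an evenly spread load."""
    return transactions_per_day / SECONDS_PER_DAY


rate = average_rate_per_second(200e9)
print(f"{rate:,.0f} transactions per second on average")
# prints "2,314,815 transactions per second on average"
```

That is roughly 2.3 million transactions every second, sustained around the clock.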
Device management
■ Analysis and response to threats detected by SPI
module on remote switch
■ Automated systems management – shut down
heating when nobody is home to reduce heating bill and
emissions
■ Monitor driver propensity to break the speed limit -
offer lower insurance premiums to good drivers
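The driver-monitoring example comes down to a simple aggregate over telemetry. A minimal sketch follows, assuming each reading carries the vehicle speed and the local limit. The `SpeedSample` type, the 3 km/h tolerance and the 5% discount threshold are all hypothetical values chosen for illustration; an insurer would set its own rules and compute this over far larger datasets.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SpeedSample:
    speed_kmh: float  # measured vehicle speed
    limit_kmh: float  # speed limit at that location


def speeding_propensity(samples: list[SpeedSample], tolerance_kmh: float = 3.0) -> float:
    """Fraction of samples where speed exceeded the limit plus a tolerance."""
    if not samples:
        return 0.0
    over = sum(1 for s in samples if s.speed_kmh > s.limit_kmh + tolerance_kmh)
    return over / len(samples)


def qualifies_for_discount(samples: list[SpeedSample], max_propensity: float = 0.05) -> bool:
    """A driver qualifies only with some data and a low speeding fraction."""
    return bool(samples) and speeding_propensity(samples) <= max_propensity
```

With millions of vehicles reporting every few seconds, the same per-driver aggregate becomes a natural fit for the grouped, parallel processing that Hadoop runtimes provide.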
Hadoop - mature?
Choice of vendors
Solid operational management
Impala v Teradata
Free grid computing
Free scale-out database
Growing commercial ecosystem
Secure and available
■ RPC authentication and encryption with PKI
■ Data encryption at rest and in transit
■ Kerberos resource access control - HDFS, YARN
■ Table cell level permissions - Accumulo
■ Online snapshot backups
■ No single point of failure (SPoF)
Thanks for listening
be.linkedin.com/in/robertgibbon

Big data, Hadoop - lunchtime talk 2015.02.26
