Big Data 
Overview – Part 1 
Wm. Barrett Simms 
barrett@wbsimms.com 
@wbsimms
Opening remarks 
• Sponsors 
• Pluralsight 
• Free month gift card giveaway. Enter your name in the pot! 
• DevExpress 
• $250 in developer JustCode tools. 
• O’Reilly 
• Book giveaway. Enter your name in the pot! 
• Boston Code Camp 22 (November 22nd) 
• http://www.bostoncodecamp.com/ 
• Thanks to 3thought for the space
About Me 
• Software Developer 
• Agile Team Member 
• Team Lead 
• Agile Advocate 
• SDLC Implementer
SDLC
Big Data 
“Big data is an all-encompassing term for any collection of data sets so 
large and complex that it becomes difficult to process using traditional 
data processing applications.” 
- Wikipedia
The 3 Vs 
• Volume 
• From a few gigabytes up to petabytes 
• Velocity 
• Arrives quickly 
• Variety 
• Multiple Sources
Volume 
• Traditional SQL architectures don’t scale to very large data sets 
• Maybe this isn’t so true 
…but the MPP (massively parallel processing) systems are expensive
An example problem (Volume) 
• You own a chain of stores 
• … with 25,000 stores and 100,000 POS systems 
• Need information on inventory changes 
• By region 
• By store
Velocity 
• Traditional solutions don’t handle fast inbound data 
• Maybe this isn’t so true 
…but you lose data.
Another example (Velocity) 
• You host a website 
• … on 10,000 servers 
• Monitor logs for errors
Variety 
• Most traditional solutions don’t handle a variety of data types well 
• Maybe this isn’t so true 
…but you need to write a custom importer for every type.
A final example (Variety) 
• You own a business 
• With sales and marketing teams 
• … in different regions around the world 
• Correlate sales numbers against marketing expenses
The First Problem: Computing Power 
[Diagram: five rows, each labeled First, Second, Third] 
Limited by cores 
(Scaling up)
Solution: Scale out (not up!) 
Server 1 Server 2 
Coordinator 
Server 3 Server 4
Coordination 
Job Coordinator 
Runner 
Runner 
Runner
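The coordinator/runner split can be sketched in a few lines. This is only an illustration under stated assumptions: threads on one machine stand in for the separate servers, and the names `coordinator` and `runner` are hypothetical, not part of Hadoop or any other framework.

```python
from concurrent.futures import ThreadPoolExecutor

def runner(chunk):
    """One runner: process its own slice of the data (here, just sum it)."""
    return sum(chunk)

def coordinator(data, runner_count=3):
    """Split the data, give one slice to each runner, then combine the partial results."""
    # Round-robin split so every runner gets roughly the same amount of work.
    chunks = [data[i::runner_count] for i in range(runner_count)]
    with ThreadPoolExecutor(max_workers=runner_count) as pool:
        partial_results = list(pool.map(runner, chunks))
    return sum(partial_results)

total = coordinator(list(range(1, 101)))
print(total)  # 5050
```

Adding capacity here means raising `runner_count` (more machines) rather than making any single runner faster, which is the scale-out idea.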
MapReduce 
• A programming model and an associated implementation for 
processing and generating large data sets with a parallel, distributed 
algorithm on a cluster. – Wikipedia 
WHAT?
Map and Reduce 
• Map 
• Process data, returning key/value pairs 
• Reduce 
• Aggregate or filter key/value pairs into a result 
[Diagram: two Data inputs, two Map steps, one Reduce step, one Result]
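A tiny in-memory version of the two phases may make this concrete. It is a sketch, not a real framework: `map_reduce`, `count_words`, and `total` are hypothetical names, and the word-count task is just the customary first example.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group the emitted pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for record in records:
        # Map: each record yields zero or more (key, value) pairs.
        for key, value in mapper(record):
            grouped[key].append(value)
    # Reduce: collapse the list of values for each key into one result.
    return {key: reducer(key, values) for key, values in grouped.items()}

def count_words(line):
    """Mapper: emit (word, 1) for every word in the line."""
    for word in line.split():
        yield word.lower(), 1

def total(key, values):
    """Reducer: add up the counts for one word."""
    return sum(values)

counts = map_reduce(["big data", "Big clusters"], count_words, total)
print(counts)  # {'big': 2, 'data': 1, 'clusters': 1}
```

Because each record is mapped independently, the map calls could run on different machines; only the grouping step needs to bring values for the same key together.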
Mapping 
• Easy example 
• Store Sales 
• Find most sales per store in 2010 
Year  Month  Store Id  SalesTotal 
2010  1      13        1,000 
2010  3      43        12,000 
2010  3      21        21,000 
2010  4      13        3,000 
2010  2      56        4,000 
2010  6      32        12,000 
2010  7      1         4,000 
2010  2      23        2,000
Solution – Map 
1. Mapper feeds document rows to your program 
2. You return key/value pairs 
StoreId Sales 
21 2,000 
23 3,000 
2 1,000 
21 23,000
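As a sketch of what "your program" could look like, the mapper below takes the rows of the sample sales table from the Mapping slide as input and emits one (StoreId, Sales) pair per row. The function name `map_sales` and the tuple layout are assumptions made for illustration; a real Hadoop mapper would receive its rows from the framework.

```python
# (year, month, store_id, sales_total), copied from the sample sales table.
ROWS = [
    (2010, 1, 13, 1000),
    (2010, 3, 43, 12000),
    (2010, 3, 21, 21000),
    (2010, 4, 13, 3000),
    (2010, 2, 56, 4000),
    (2010, 6, 32, 12000),
    (2010, 7, 1, 4000),
    (2010, 2, 23, 2000),
]

def map_sales(row, year=2010):
    """Emit a (store_id, sales_total) pair if the row belongs to the requested year."""
    row_year, _month, store_id, sales_total = row
    if row_year == year:
        yield store_id, sales_total

pairs = [pair for row in ROWS for pair in map_sales(row)]
print(pairs[:3])  # [(13, 1000), (43, 12000), (21, 21000)]
```

The mapper looks at one row at a time and keeps no state, so the rows can be split across as many runners as are available.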
Solution - Reduce 
• Data is merged 
• Merged into Key/Values: 
{21, [2,000, 23,000]} 
{23, [3,000]} 
{2, [1,000]} 
• You process each row
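The merge and reduce steps can be sketched with the key/value pairs from the Map slide. Two assumptions: "most sales per store" is read here as the largest single sales figure for each store, and `shuffle` and `reduce_max` are hypothetical names (in Hadoop the framework performs the merge for you).

```python
from collections import defaultdict

# Key/value pairs as listed on the Map slide.
PAIRS = [(21, 2000), (23, 3000), (2, 1000), (21, 23000)]

def shuffle(pairs):
    """Merge pairs into {key: [values]}."""
    merged = defaultdict(list)
    for key, value in pairs:
        merged[key].append(value)
    return dict(merged)

def reduce_max(store_id, sales):
    """Keep the largest sales figure seen for one store."""
    return store_id, max(sales)

merged = shuffle(PAIRS)
result = dict(reduce_max(store_id, sales) for store_id, sales in merged.items())
print(merged)  # {21: [2000, 23000], 23: [3000], 2: [1000]}
print(result)  # {21: 23000, 23: 3000, 2: 1000}
```

Swapping `max` for `sum` in the reducer would give total sales per store instead.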
Data Access 
• Each process needs access to data 
[Diagram: Typical vs. Desired data access]
HDFS 
• Hadoop Distributed File System 
• Open-source file system modeled on the Google File System (GFS) 
Hard drives last about 1,000 days. So, 
if you have 1,000 hard drives, you’ll lose 
about one per day on average.
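The arithmetic behind that rule of thumb, as a rough sketch. It assumes failures are spread evenly over the drive lifetime, which is a simplification; the function name is hypothetical.

```python
def expected_failures_per_day(drive_count, lifetime_days=1000):
    """Average drive failures per day, assuming failures are spread evenly over the lifetime."""
    return drive_count / lifetime_days

print(expected_failures_per_day(1000))   # 1.0
print(expected_failures_per_day(10000))  # 10.0
```

At cluster scale a failed drive is a daily event rather than an exception, which is why HDFS stores several copies of each block (three by default).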
The ecosystem 
• Hive 
• SQL-like query language 
• Define and enforce schema 
• Pig 
• Data-flow scripting language (Pig Latin) 
• Sqoop 
• SQL/Hadoop integration 
• Oozie 
• Scheduling 
• Mahout 
• Machine learning library 
• Storm 
• Distributed real-time stream processing 
… and Many Others
Vendors 
• Hortonworks 
• Single-click install of Sandbox 
• Cloudera 
• Downloadable VM 
• Syncfusion 
• Single-click install of Syncfusion Big Data 
• Amazon AWS 
• Elastic MapReduce 
• Microsoft Azure 
• HDInsight
Contact Me 
Barrett Simms 
barrett@wbsimms.com 
http://wbsimms.com 
Twitter: @wbsimms 
Phone: 781.405.4686

Big Data Overview Part 1

  • 1. Big Data Overview – Part 1 Wm. Barrett Simms barrett@wbsimms.com @wbsimms
  • 2. Opening remarks • Sponsors • Pluralsight • Free one-month gift card giveaway. Enter your name in the pot! • DevExpress • $250 in developer JustCode tools. • O’Reilly • Book giveaway. Enter your name in the pot! • Boston Code Camp 22 (November 22nd) • http://www.bostoncodecamp.com/ • Thanks to 3thought for the space
  • 3. About Me Software Developer Agile Team Member Team Lead Agile Advocate SDLC Implementer
  • 5. Big Data “Big data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications.” - Wikipedia
  • 6. The 3 Vs • Volume • From a few gigabytes up to petabytes • Velocity • Arrives quickly • Variety • Multiple sources
  • 7. Volume • Traditional SQL architectures don’t scale to very large data volumes • Maybe this isn’t so true …but the MPP (massively parallel processing) systems are expensive
  • 8. An example problem (Volume) • You own a chain of stores • … with 25,000 stores and 100,000 POS systems • Need information on inventory changes • By region • By store
  • 9. Velocity • Traditional solutions don’t handle fast inbound data • Maybe this isn’t so true …but you lose data.
  • 10. Another example (Velocity) • You host a website • … on 10,000 servers • Monitor logs for errors
  • 11. Variety • Most traditional solutions don’t handle a variety of data types well • Maybe this isn’t so true …But you need to write a custom importer for every type.
  • 12. A final example (Variety) • You own a business • With sales and marketing teams • … in different regions around the world • Correlate sales numbers against marketing expenses
  • 13. The First Problem: Computing Power • [Diagram: five inbound requests, each running a First, Second, and Third process] • Limited by cores (scaling up)
  • 14. Solution: Scale out (not up!) • [Diagram: a Coordinator with Server 1, Server 2, Server 3, and Server 4]
  • 15. Coordination • [Diagram: a Job, a Coordinator, and three Runners]
  • 16. MapReduce • “A programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster.” – Wikipedia • WHAT?
  • 17. Map and Reduce • Map • Process data, returning key/value pairs • Reduce • Aggregate/filter key/value pairs into a result • [Diagram: two Data inputs, two Map steps, one Reduce step, one Result]
  • 18. Mapping • Easy example • Store Sales • Find most sales per store in 2010
        Year | Month | Store Id | SalesTotal
        2010 | 1     | 13       | 1,000
        2010 | 3     | 43       | 12,000
        2010 | 3     | 21       | 21,000
        2010 | 4     | 13       | 3,000
        2010 | 2     | 56       | 4,000
        2010 | 6     | 32       | 12,000
        2010 | 7     | 1        | 4,000
        2010 | 2     | 23       | 2,000
  • 19. Solution – Map 1. Mapper feeds document rows to your program 2. You return key/value pairs
        StoreId | Sales
        21      | 2,000
        23      | 3,000
        2       | 1,000
        21      | 23,000
  • 20. Solution - Reduce • Data is merged • Merged into Key/Values: {21, [2,000, 23,000]} {23, [3,000]} {2, [1,000]} • You process each row
  • 21. Data Access • Each process needs access to data • [Diagram: Typical vs. Desired data access]
  • 22. HDFS • Hadoop Distributed File System • Open-source file system modeled on the Google File System (GFS) • Hard drives last about 1,000 days. So, if you have 1K hard drives, you’ll lose about one per day (1,000 drives ÷ 1,000 days per drive).
  • 23. The ecosystem • Hive • SQL-like query language • Define and enforce schema • Pig • Data-flow scripting language (Pig Latin) • Sqoop • SQL/Hadoop integration • Oozie • Scheduling • Mahout • Machine learning library • Storm • Stream-based processing … and many others
  • 24. Vendors • Hortonworks • Single-click install of Sandbox • Cloudera • Downloadable VM • Syncfusion • Single-click install of Syncfusion Big Data • Amazon AWS • Elastic MapReduce • Microsoft Azure • HDInsight
  • 25. Contact Me Barrett Simms barrett@wbsimms.com http://wbsimms.com Twitter: @wbsimms Phone: 781.405.4686
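
The map, merge, and reduce steps from slides 17 through 20 can be sketched in plain Python, using the rows from slide 18. This is a minimal single-process illustration, not Hadoop code. The function names (`map_row`, `shuffle`, `reduce_store`, `run_job`) are hypothetical, and it assumes "most sales per store" means the largest monthly SalesTotal recorded for each store.

```python
from collections import defaultdict

# Rows from slide 18: (year, month, store_id, sales_total)
ROWS = [
    (2010, 1, 13, 1000),
    (2010, 3, 43, 12000),
    (2010, 3, 21, 21000),
    (2010, 4, 13, 3000),
    (2010, 2, 56, 4000),
    (2010, 6, 32, 12000),
    (2010, 7, 1, 4000),
    (2010, 2, 23, 2000),
]


def map_row(row):
    """Map step: emit a (store_id, sales_total) pair for each 2010 row."""
    year, _month, store_id, sales_total = row
    if year == 2010:
        yield store_id, sales_total


def shuffle(pairs):
    """Merge step: group values by key, e.g. {13: [1000, 3000]}."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_store(store_id, sales):
    """Reduce step (assumed meaning): keep the largest total for one store."""
    return store_id, max(sales)


def run_job(rows):
    """Run map, merge, and reduce in sequence inside one process."""
    pairs = [pair for row in rows for pair in map_row(row)]
    grouped = shuffle(pairs)
    return dict(reduce_store(key, values) for key, values in grouped.items())


result = run_job(ROWS)
```

In a real cluster the map calls would run on many machines in parallel and the framework would perform the merge. Here everything runs in one process so that the data flow stays visible. Swapping `max` for `sum` in `reduce_store` would give total sales per store instead.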

Editor's Notes

  1. Welcome!
  2. Focus on technical product delivery
  3. Each inbound request spawns three processes. Spawning multiple processes isn’t scalable