From Big Data to Fast Data
Sina Sheikholeslami
s.sheikholeslami@digikala.com
9th Amirkabir Linux Festival
May 4 2017
Overview
• The War on Big Data Definition
• The Early Days
• State-of-the-art Big Data Processing Platforms
• The Rise of Fast Data: Applications & Platforms
• How to Get Involved
Part 1: The War on Big Data Definition
What is Big Data?
• “Big Data… everyone talks about it, nobody really
knows how to do it, everyone thinks everyone else
is doing it, so everyone claims they are doing it…”

- Dan Ariely
What is Big Data? (Cont’d)
• Big Data refers to extremely large data sets that
may be analyzed computationally to reveal
patterns, trends, and associations, especially
relating to human behavior and interactions.

- Oxford English Dictionary (Since 2013)
What is Big Data? (Cont’d)
• Big Data is high-volume, high-velocity and/or
high-variety information assets that demand cost-
effective, innovative forms of information
processing that enable enhanced insight, decision
making, and process automation. 

- Gartner IT Glossary
What is Big Data? (Cont’d)
• Big Data consists of extensive datasets - primarily
in the characteristics of volume, variety, velocity,
and/or variability - that require a scalable
architecture for efficient storage, manipulation, and
analysis.

- U.S. National Institute of Standards & Technology
What is Big Data? (Cont’d)
- UC Berkeley Datascience Survey, September 2014
Part 2: The Early Days
The Google File System
At SOSP’03, Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung published the GFS paper.

Google developed GFS to provide efficient, reliable access to data using large clusters of
commodity hardware.
Bringing the Computation Near Data:
MapReduce
Jeffrey Dean & Sanjay Ghemawat published the MapReduce paper in OSDI’04.

It has been cited more than 20,000 times since then.
Some say we can divide the human race in two:

Those who have never heard of the “Word Count” example,

and those who… well, let’s just say, don’t like it.
The Word Count Example
https://wikis.nyu.edu/display/NYUHPC/
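
For readers who have never met it, Word Count can be sketched in a few lines of plain Python. This is not Hadoop code: the names `map_phase`, `shuffle`, and `reduce_phase` are illustrative assumptions, and everything runs in a single process, whereas a real MapReduce framework would spread the map and reduce calls across a cluster and perform the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: sum all the 1s emitted for a single word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


counts = word_count(["big data fast data", "fast data"])
print(counts)
```

The point of the split is that `map_phase` looks at one line at a time and `reduce_phase` at one word at a time, so each can run in parallel on whichever machine already holds the data.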
The Yellow Elephant
Building on the GFS and MapReduce papers, Doug Cutting and Mike Cafarella, later backed by a team at Yahoo!, developed an open-source platform for distributed storage and processing of big datasets.

It started inside the Apache Nutch project and was split out as Apache Hadoop in January 2006.
The Hadoop Ecosystem
• Hadoop Common: The common
utilities that support the other
Hadoop modules.
• Hadoop Distributed File System
(HDFS): A distributed file system
that provides high-throughput
access to application data.
• Hadoop YARN: A framework for job
scheduling and cluster resource
management.
• Hadoop MapReduce: A YARN-
based system for parallel
processing of large data sets.
“Data” Got Bigger…
[Chart: number of Internet users (millions), December 1995 to June 2009; vertical axis from 0 to 2,000]
internetworldstats.com
And Bigger…
“There were 5 exabytes of information created by the entire world between the
dawn of civilization and 2003. Now that same amount is created every two days.”

Eric Schmidt (then CEO of Google),

at the Techonomy Conference in Lake Tahoe, California, August 2010
Part 3: State-of-the-art Big Data Processing Platforms
A Classic Batch Processing Architecture
Dean Wampler, “Fast Data Architectures For Streaming Applications”
The Big Data Stack
Courtesy of Amir H. Payberah, “Data Intensive Computing Platforms”
The Big Data Stack

Resource Management Layer
Courtesy of Amir H. Payberah, “Data Intensive Computing Platforms”
The Big Data Stack

Storage Layer
Courtesy of Amir H. Payberah, “Data Intensive Computing Platforms”
The Big Data Stack

Data Processing Layer
Courtesy of Amir H. Payberah, “Data Intensive Computing Platforms”
Apache Spark
• In-Memory Distributed Processing Platform
• Similar Semantics for Batch & Stream
Processing
• Started by Matei Zaharia at UC
Berkeley’s AMPLab in 2009
• Became a top-level Apache Project in
February 2014
• 11,935 forks and 1,068 contributors (GitHub, as of this talk)
• Written primarily in Scala, more than 1M
lines of code
Spark vs. Hadoop MapReduce
Courtesy of Amir H. Payberah, “Data Intensive Computing Platforms”
Spark Stack
The Bigger Picture
BDAS, the Berkeley Data Analytics Stack
Apache Flink
• “open-source stream processing
framework for distributed, high-
performing, always-available, and
accurate data streaming applications”
• Data is processed one event at a time
rather than as a series of batches
• Originally named “Stratosphere”, started in
2010 with funding from the German Research Foundation (DFG)
• Became a top-level Apache Project in
December 2014
• 1,598 forks and 309 contributors (GitHub, as of this talk)
• Written primarily in Java, more than 1M
lines of code
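
The difference between batch-style and event-at-a-time processing can be illustrated with a small plain-Python sketch. It uses neither the Flink nor the Spark API; the function names and the integer "events" are assumptions made up for illustration. Both functions maintain the same running total, but they differ in when a result becomes visible.

```python
def micro_batch_totals(events, batch_size):
    """Collect events into fixed-size batches; emit the running total per batch."""
    total = 0
    results = []
    for start in range(0, len(events), batch_size):
        batch = events[start:start + batch_size]
        total += sum(batch)
        results.append(total)  # a result appears only once a batch is complete
    return results


def event_at_a_time_totals(events):
    """Update state on every event; emit the running total after each one."""
    total = 0
    results = []
    for value in events:
        total += value
        results.append(total)  # a result appears as soon as the event arrives
    return results


events = [3, 1, 4, 1, 5]
batch_results = micro_batch_totals(events, batch_size=2)
stream_results = event_at_a_time_totals(events)
print(batch_results)
print(stream_results)
```

Both approaches end at the same total; the event-at-a-time version simply exposes every intermediate value, which is what lowers the latency between an event arriving and a result being available.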
Flink Stack
Part 4: The Rise of Fast Data: Applications & Platforms
They Don’t Wait For It
We Can’t Wait For It
They Won’t Wait For It
They Shouldn’t Wait For It
cabotsolutions.com
My Boss Won’t Wait For It
Fast Data: A Definition
“Fast data is the application of big data analytics to
smaller data sets in near-real or real-time in order to
solve a problem or create business value.”

- TechTarget
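
One common way to apply analytics to "smaller data sets in near-real time" is to aggregate events over short time windows. The sketch below is a minimal plain-Python illustration under stated assumptions: the `(timestamp_seconds, key)` event shape, the function name `tumbling_window_counts`, and the sample click data are all hypothetical, and a real deployment would use a streaming platform rather than an in-memory list.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per key within fixed, non-overlapping time windows.

    events: iterable of (timestamp_seconds, key) tuples (hypothetical shape).
    Returns a dict mapping (window_start, key) to the number of events.
    """
    counts = Counter()
    for timestamp, key in events:
        # Integer division assigns each event to the window it falls in.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical click events: (seconds since start, page name).
clicks = [(0, "home"), (3, "cart"), (4, "home"), (11, "home"), (12, "cart")]
per_window = tumbling_window_counts(clicks, window_seconds=10)
print(per_window)
```

Each window holds only a few seconds of data, so its result can be published shortly after the window closes instead of waiting for a nightly batch job.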
Looking Back at a Classic Batch
Processing Architecture
Dean Wampler, “Fast Data Architectures For Streaming Applications”
“Fast Data” Processing Architecture
Dean Wampler, “Fast Data Architectures For Streaming Applications”
How to Get Involved
Open-source!
And to Wrap it Up…
• Big Data History & Platforms
• Big Data vs. Fast Data
• Fast Data Architectures & Platforms
• Getting Involved
Attribution
• Thanks to Alekksall, Ddraw, Ibrandify, Yurlick,
and Makyzz of freepik.com, for the free pics!
• Thanks to the awesome people at the Apache Software
Foundation. For Everything. Including the graphics.
And To You…
