SlideShare a Scribd company logo
1 of 11
Redpoll  ,[object Object],Min Zhou (coderplay@gmail.com) CS Dept. Jinan University, Guangzhou
Introduction ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Basic Principles ... Decomposition Mappers Reducer Assume that we have a set of  m  data points each of length  n
Performance Bottlenecks ,[object Object],[object Object],[object Object],[object Object]
Current Works ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
An example: Canopy ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
An example: Canopy cont'd ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
What's the Next? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Algorithms under plan ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Welcome to join us! ,[object Object],[object Object],[object Object],[object Object],[object Object]
http://code.google.com/p/redpoll Check it out!

More Related Content

What's hot

Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...Kenta Oono
 
The convergence of HPC and BigData: What does it mean for HPC sysadmins?
The convergence of HPC and BigData: What does it mean for HPC sysadmins?The convergence of HPC and BigData: What does it mean for HPC sysadmins?
The convergence of HPC and BigData: What does it mean for HPC sysadmins?inside-BigData.com
 
Bioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pBioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pRobert Grossman
 
Deep Learning on Apache Spark
Deep Learning on Apache SparkDeep Learning on Apache Spark
Deep Learning on Apache SparkDash Desai
 
Использование стека Hadoop для построения сервиса сверки данных НДС
Использование стека Hadoop для построения сервиса сверки данных НДСИспользование стека Hadoop для построения сервиса сверки данных НДС
Использование стека Hadoop для построения сервиса сверки данных НДСCEE-SEC(R)
 
Standardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonRalf Gommers
 
Kim Hammar - Distributed Deep Learning - RISE Learning Machines Meetup
Kim Hammar - Distributed Deep Learning - RISE Learning Machines MeetupKim Hammar - Distributed Deep Learning - RISE Learning Machines Meetup
Kim Hammar - Distributed Deep Learning - RISE Learning Machines MeetupKim Hammar
 
If the Data Cannot Come To The Algorithm...
If the Data Cannot Come To The Algorithm...If the Data Cannot Come To The Algorithm...
If the Data Cannot Come To The Algorithm...Robert Burrell Donkin
 
OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3Robert Grossman
 
Pycon 2016-open-space
Pycon 2016-open-spacePycon 2016-open-space
Pycon 2016-open-spaceChetan Khatri
 
NumPy Roadmap presentation at NumFOCUS Forum
NumPy Roadmap presentation at NumFOCUS ForumNumPy Roadmap presentation at NumFOCUS Forum
NumPy Roadmap presentation at NumFOCUS ForumRalf Gommers
 
rasdaman: from barebone Arrays to DataCubes
rasdaman: from barebone Arrays to DataCubesrasdaman: from barebone Arrays to DataCubes
rasdaman: from barebone Arrays to DataCubesEUDAT
 
Clustbigfim frequent itemset mining of
Clustbigfim frequent itemset mining ofClustbigfim frequent itemset mining of
Clustbigfim frequent itemset mining ofijfcstjournal
 
Taste Java In The Clouds
Taste Java In The CloudsTaste Java In The Clouds
Taste Java In The CloudsJacky Chu
 
Visualization-Driven Data Aggregation
Visualization-Driven Data AggregationVisualization-Driven Data Aggregation
Visualization-Driven Data AggregationZbigniew Jerzak
 
DSD-INT 2018 Earth Science Through Datacubes - Merticariu
DSD-INT 2018 Earth Science Through Datacubes - MerticariuDSD-INT 2018 Earth Science Through Datacubes - Merticariu
DSD-INT 2018 Earth Science Through Datacubes - MerticariuDeltares
 

What's hot (19)

Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...
 
The convergence of HPC and BigData: What does it mean for HPC sysadmins?
The convergence of HPC and BigData: What does it mean for HPC sysadmins?The convergence of HPC and BigData: What does it mean for HPC sysadmins?
The convergence of HPC and BigData: What does it mean for HPC sysadmins?
 
Bioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pBioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9p
 
Deep Learning on Apache Spark
Deep Learning on Apache SparkDeep Learning on Apache Spark
Deep Learning on Apache Spark
 
Использование стека Hadoop для построения сервиса сверки данных НДС
Использование стека Hadoop для построения сервиса сверки данных НДСИспользование стека Hadoop для построения сервиса сверки данных НДС
Использование стека Hadoop для построения сервиса сверки данных НДС
 
Standardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for Python
 
Kim Hammar - Distributed Deep Learning - RISE Learning Machines Meetup
Kim Hammar - Distributed Deep Learning - RISE Learning Machines MeetupKim Hammar - Distributed Deep Learning - RISE Learning Machines Meetup
Kim Hammar - Distributed Deep Learning - RISE Learning Machines Meetup
 
If the Data Cannot Come To The Algorithm...
If the Data Cannot Come To The Algorithm...If the Data Cannot Come To The Algorithm...
If the Data Cannot Come To The Algorithm...
 
OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3
 
Pycon 2016-open-space
Pycon 2016-open-spacePycon 2016-open-space
Pycon 2016-open-space
 
NumPy Roadmap presentation at NumFOCUS Forum
NumPy Roadmap presentation at NumFOCUS ForumNumPy Roadmap presentation at NumFOCUS Forum
NumPy Roadmap presentation at NumFOCUS Forum
 
rasdaman: from barebone Arrays to DataCubes
rasdaman: from barebone Arrays to DataCubesrasdaman: from barebone Arrays to DataCubes
rasdaman: from barebone Arrays to DataCubes
 
Clustbigfim frequent itemset mining of
Clustbigfim frequent itemset mining ofClustbigfim frequent itemset mining of
Clustbigfim frequent itemset mining of
 
CCI DAY PRESENTATION
CCI DAY PRESENTATIONCCI DAY PRESENTATION
CCI DAY PRESENTATION
 
Taste Java In The Clouds
Taste Java In The CloudsTaste Java In The Clouds
Taste Java In The Clouds
 
Visualization-Driven Data Aggregation
Visualization-Driven Data AggregationVisualization-Driven Data Aggregation
Visualization-Driven Data Aggregation
 
Big dataanalyticsbeyondhadoop public_20_june_2013
Big dataanalyticsbeyondhadoop public_20_june_2013Big dataanalyticsbeyondhadoop public_20_june_2013
Big dataanalyticsbeyondhadoop public_20_june_2013
 
DSD-INT 2018 Earth Science Through Datacubes - Merticariu
DSD-INT 2018 Earth Science Through Datacubes - MerticariuDSD-INT 2018 Earth Science Through Datacubes - Merticariu
DSD-INT 2018 Earth Science Through Datacubes - Merticariu
 
Comp
CompComp
Comp
 

Similar to Redpoll

Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsGeoffrey Fox
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsGeoffrey Fox
 
Big Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsBig Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsDataWorks Summit
 
Cloud Computing ...changes everything
Cloud Computing ...changes everythingCloud Computing ...changes everything
Cloud Computing ...changes everythingLew Tucker
 
Getting Started on Hadoop
Getting Started on HadoopGetting Started on Hadoop
Getting Started on HadoopPaco Nathan
 
MACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORKMACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORKAbhi Jit
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onDony Riyanto
 
Update on the Mont-Blanc Project for ARM-based HPC
Update on the Mont-Blanc Project for ARM-based HPCUpdate on the Mont-Blanc Project for ARM-based HPC
Update on the Mont-Blanc Project for ARM-based HPCinside-BigData.com
 
Huawei Advanced Data Science With Spark Streaming
Huawei Advanced Data Science With Spark StreamingHuawei Advanced Data Science With Spark Streaming
Huawei Advanced Data Science With Spark StreamingJen Aman
 
Hadoop performance modeling for job
Hadoop performance modeling for jobHadoop performance modeling for job
Hadoop performance modeling for jobranjith kumar
 
Eclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science ProjectEclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science ProjectMatthew Gerring
 
Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22marpierc
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersDataWorks Summit/Hadoop Summit
 
Scientific
Scientific Scientific
Scientific marpierc
 
CloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning
 

Similar to Redpoll (20)

Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data Analytics
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data Analytics
 
Big Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsBig Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source Toolkits
 
Cloud Computing ...changes everything
Cloud Computing ...changes everythingCloud Computing ...changes everything
Cloud Computing ...changes everything
 
Getting Started on Hadoop
Getting Started on HadoopGetting Started on Hadoop
Getting Started on Hadoop
 
MACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORKMACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORK
 
Ajug april 2011
Ajug april 2011Ajug april 2011
Ajug april 2011
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
 
Update on the Mont-Blanc Project for ARM-based HPC
Update on the Mont-Blanc Project for ARM-based HPCUpdate on the Mont-Blanc Project for ARM-based HPC
Update on the Mont-Blanc Project for ARM-based HPC
 
Huawei Advanced Data Science With Spark Streaming
Huawei Advanced Data Science With Spark StreamingHuawei Advanced Data Science With Spark Streaming
Huawei Advanced Data Science With Spark Streaming
 
Big Data & Hadoop. Simone Leo (CRS4)
Big Data & Hadoop. Simone Leo (CRS4)Big Data & Hadoop. Simone Leo (CRS4)
Big Data & Hadoop. Simone Leo (CRS4)
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
 
Hadoop performance modeling for job
Hadoop performance modeling for jobHadoop performance modeling for job
Hadoop performance modeling for job
 
Eclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science ProjectEclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science Project
 
Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
 
Big data analytics_7_giants_public_24_sep_2013
Big data analytics_7_giants_public_24_sep_2013Big data analytics_7_giants_public_24_sep_2013
Big data analytics_7_giants_public_24_sep_2013
 
04 open source_tools
04 open source_tools04 open source_tools
04 open source_tools
 
Scientific
Scientific Scientific
Scientific
 
CloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use Case
 

More from Min Zhou

Distributed Data Analytics at Taobao
Distributed Data Analytics at TaobaoDistributed Data Analytics at Taobao
Distributed Data Analytics at TaobaoMin Zhou
 
Big Data Analytics Infrastructure
Big Data Analytics InfrastructureBig Data Analytics Infrastructure
Big Data Analytics InfrastructureMin Zhou
 
Java Concurrent Optimization: Concurrent Queue
Java Concurrent Optimization: Concurrent QueueJava Concurrent Optimization: Concurrent Queue
Java Concurrent Optimization: Concurrent QueueMin Zhou
 
准实时海量数据分析系统架构探究
准实时海量数据分析系统架构探究准实时海量数据分析系统架构探究
准实时海量数据分析系统架构探究Min Zhou
 
Java trouble shooting
Java trouble shootingJava trouble shooting
Java trouble shootingMin Zhou
 
Java程序员也需要了解CPU
Java程序员也需要了解CPUJava程序员也需要了解CPU
Java程序员也需要了解CPUMin Zhou
 
淘宝Hadoop数据分析实践
淘宝Hadoop数据分析实践淘宝Hadoop数据分析实践
淘宝Hadoop数据分析实践Min Zhou
 
Anthill: A Distributed DBMS Based On MapReduce
Anthill: A Distributed DBMS Based On MapReduceAnthill: A Distributed DBMS Based On MapReduce
Anthill: A Distributed DBMS Based On MapReduceMin Zhou
 

More from Min Zhou (9)

Distributed Data Analytics at Taobao
Distributed Data Analytics at TaobaoDistributed Data Analytics at Taobao
Distributed Data Analytics at Taobao
 
Big Data Analytics Infrastructure
Big Data Analytics InfrastructureBig Data Analytics Infrastructure
Big Data Analytics Infrastructure
 
Java Concurrent Optimization: Concurrent Queue
Java Concurrent Optimization: Concurrent QueueJava Concurrent Optimization: Concurrent Queue
Java Concurrent Optimization: Concurrent Queue
 
准实时海量数据分析系统架构探究
准实时海量数据分析系统架构探究准实时海量数据分析系统架构探究
准实时海量数据分析系统架构探究
 
Java trouble shooting
Java trouble shootingJava trouble shooting
Java trouble shooting
 
Hive
HiveHive
Hive
 
Java程序员也需要了解CPU
Java程序员也需要了解CPUJava程序员也需要了解CPU
Java程序员也需要了解CPU
 
淘宝Hadoop数据分析实践
淘宝Hadoop数据分析实践淘宝Hadoop数据分析实践
淘宝Hadoop数据分析实践
 
Anthill: A Distributed DBMS Based On MapReduce
Anthill: A Distributed DBMS Based On MapReduceAnthill: A Distributed DBMS Based On MapReduce
Anthill: A Distributed DBMS Based On MapReduce
 

Redpoll