RHadoop and MapR
The bad old days (i.e. now): Hadoop is a silo. HDFS isn’t a normal file system. Hadoop doesn’t really like C++. R is limited: one machine, one memory space. Isn’t there any way we can just get along?
The white knight: MapR changes things. Lots of new stuff like snapshots and NFS. All you need to know, you already know. NFS provides cluster-wide file access. Everything works the way you expect. Performance is high enough to use as a message bus.
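The point about NFS is that ordinary file I/O is enough to get at cluster data from R. A minimal sketch of what that might look like: the helper `read_features`, the file name, and the column names are hypothetical, and a temporary directory stands in for the cluster's NFS mount point so the sketch runs anywhere.

```r
# Hypothetical helper: read map-reduce output with ordinary file I/O.
# On a real system `mount_root` would be the cluster's NFS mount point.
read_features <- function(mount_root, rel_path) {
  read.csv(file.path(mount_root, rel_path), header = TRUE)
}

mount_root <- tempdir()       # stand-in for the NFS mount (assumption)
rel_path <- "features.csv"    # hypothetical file name

# Simulate a file that a map-reduce job might have written.
write.csv(data.frame(id = 1:3, x = c(0.5, 1.5, 2.5)),
          file.path(mount_root, rel_path), row.names = FALSE)

features <- read_features(mount_root, rel_path)
```

Nothing in the reading code is specific to Hadoop, which is the design point: R needs no special client library when the cluster looks like a file system.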
Example, out-of-core SVD: SVD provides a compressed matrix form, based on a sum of rank-1 matrices. [Figure: a matrix shown as approximately equal to a sum of rank-1 matrices]
More on SVD: SVD provides a very nice basis
And a nifty approximation property
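Both properties can be checked numerically. The following is a minimal sketch in base R; the random matrix, its size, and the helper name `rank_k` are illustrative assumptions rather than anything from the slides.

```r
# Illustrative only: a small random matrix stands in for real data.
set.seed(42)
a <- matrix(rnorm(8 * 6), nrow = 8)
s <- svd(a)

# Build a rank-k approximation as a sum of k rank-1 matrices d_i * u_i v_i'.
rank_k <- function(s, k) {
  approx <- matrix(0, nrow(s$u), nrow(s$v))
  for (i in seq_len(k)) {
    approx <- approx + s$d[i] * (s$u[, i] %o% s$v[, i])
  }
  approx
}

# Summing all the rank-1 terms recovers the original matrix.
max_err <- max(abs(rank_k(s, length(s$d)) - a))

# Approximation property: the Frobenius error of the rank-k truncation
# equals the root sum of squares of the discarded singular values.
k <- 2
err_k <- norm(a - rank_k(s, k), type = "F")
tail_k <- sqrt(sum(s$d[(k + 1):length(s$d)]^2))
```

Here `max_err` should be at rounding level, and `err_k` should match `tail_k` up to rounding.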
Also known as …: Latent Semantic Indexing, PCA, eigenvectors
An application, approximate translation: Translation distributes over concatenation. But counting turns concatenation into addition. This means that translation is linear! (ish)
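The step from concatenation to addition can be seen with plain word counts. A minimal sketch, assuming a tiny fixed vocabulary and a hypothetical helper `count_words`; neither appears in the slides.

```r
# Hypothetical helper: count word occurrences over a fixed vocabulary.
count_words <- function(text, vocab) {
  words <- strsplit(text, " ", fixed = TRUE)[[1]]
  as.vector(table(factor(words, levels = vocab)))
}

vocab <- c("the", "cat", "dog", "sat", "ran")   # illustrative vocabulary
x <- "the cat sat"
y <- "the dog ran"

counts_x <- count_words(x, vocab)
counts_y <- count_words(y, vocab)
counts_xy <- count_words(paste(x, y), vocab)    # counts of the concatenation

# Concatenating the texts adds their count vectors.
additive <- all(counts_xy == counts_x + counts_y)
```

Counting discards word order, which is one reason the linearity is only approximate.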
Traditional computation: Products of A are dominated by the large singular values and their corresponding vectors. Subtracting these dominant singular values allows the next ones to appear. Lanczos method; more generally, Krylov subspace methods.
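To make the idea concrete, here is a sketch using plain power iteration with deflation. This is the simplest relative of the Krylov methods named above, not the Lanczos method itself; the helper `top_singular`, the iteration count, and the test matrix are assumptions for illustration.

```r
# Hypothetical helper: leading singular triple by power iteration on A'A.
top_singular <- function(a, iters = 500) {
  v <- rnorm(ncol(a))
  v <- v / sqrt(sum(v^2))
  for (i in seq_len(iters)) {
    v <- crossprod(a, a %*% v)   # multiply by A'A: repeated products of A
    v <- v / sqrt(sum(v^2))      # renormalize to avoid overflow
  }
  av <- a %*% v
  d <- sqrt(sum(av^2))
  list(d = d, u = av / d, v = v)
}

set.seed(1)
a <- matrix(rnorm(20 * 10), nrow = 20)

first <- top_singular(a)
# Deflate: subtract the dominant rank-1 term so the next one can appear.
deflated <- a - first$d * (first$u %*% t(first$v))
second <- top_singular(deflated)

found <- c(first$d, second$d)
```

Each singular value costs many sequential passes over the matrix, which is the iteration cost that matters once the data lives in Hadoop.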
But …
The gotcha: Iteration in Hadoop is death. Huge process invocation costs. Lose all memory residency of data. Total lost cause.
Randomness to the rescue: To save the day, run all iterations at the same time. [Figure: matrix equations for the randomized projection of A]
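In R:

lsa = function(a, k, p) {
  # a: n x m input matrix; k: rank wanted; p: extra (oversampling) columns
  n = dim(a)[1]
  m = dim(a)[2]
  # project a onto k+p random directions
  y = a %*% matrix(rnorm(m*(k+p)), nrow=m)
  # orthonormal basis Q for the range of y
  y.qr = qr(y)
  # small (k+p) x m matrix b = Q' a
  b = t(qr.Q(y.qr)) %*% a
  # QR of t(b), so that b = t(R) %*% t(Q.b)
  b.qr = qr(t(b))
  # SVD of the (k+p) x (k+p) triangular factor
  s = svd(t(qr.R(b.qr)))
  # map the small factors back to the original row and column spaces
  list(u=qr.Q(y.qr) %*% s$u[,1:k],
       d=s$d[1:k],
       v=qr.Q(b.qr) %*% s$v[,1:k])
}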
Not good enough yet: Limited to memory size. After memory limits, feature extraction dominates.
Hybrid architecture [diagram labels: Map-reduce; Side-data; Via NFS; Feature extraction and down sampling; Input; Data join; R; Visualization; Sequential SVD]
Randomness to the rescue: To save the day again, use blocks. [Figure: block-wise matrix equations]
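The slides do not say how the blocks are combined, so the following is only one way the idea might be realized. It assumes the matrix is split into row blocks and uses a Cholesky factor of Y'Y in place of a QR decomposition of the full projection; the function name `block_rsvd` and the test data are hypothetical.

```r
# Sketch under stated assumptions: randomized SVD over row blocks of A.
# Each loop touches one block at a time, so only small (k+p)-sized
# summaries have to be held in memory between blocks.
block_rsvd <- function(blocks, k, p) {
  m <- ncol(blocks[[1]])
  omega <- matrix(rnorm(m * (k + p)), nrow = m)

  # Pass 1: accumulate Y'Y, where Y = A %*% omega, block by block.
  yty <- matrix(0, k + p, k + p)
  for (a_i in blocks) yty <- yty + crossprod(a_i %*% omega)

  # Y'Y = R'R, so Q = Y R^{-1} has orthonormal columns.
  r_inv <- solve(chol(yty))

  # Pass 2: accumulate B = Q'A block by block.
  b <- matrix(0, k + p, m)
  for (a_i in blocks) b <- b + crossprod(a_i %*% omega %*% r_inv, a_i)

  s <- svd(b)

  # Pass 3: rows of U for each block, stacked in block order.
  u <- do.call(rbind, lapply(blocks, function(a_i) {
    a_i %*% omega %*% r_inv %*% s$u[, 1:k]
  }))
  list(u = u, d = s$d[1:k], v = s$v[, 1:k])
}

# Illustrative data: a rank-5 signal plus small noise, in 4 row blocks.
set.seed(7)
a <- matrix(rnorm(200 * 5), 200) %*% matrix(rnorm(5 * 30), 5) +
  1e-3 * matrix(rnorm(200 * 30), 200)
blocks <- lapply(split(seq_len(nrow(a)), rep(1:4, each = 50)),
                 function(rows) a[rows, , drop = FALSE])
res <- block_rsvd(blocks, k = 5, p = 5)
```

The per-block sums in passes 1 and 2 are independent of each other, so each pass could run as a map step followed by a small reduce. The Cholesky step assumes Y has full column rank, which fails for an exactly low-rank matrix.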
Hybrid architecture [diagram labels: Map-reduce; Feature extraction and down sampling; Via NFS; Map-reduce; R; Visualization; Block-wise parallel SVD]
Conclusions: Interoperability allows massive scalability. Prototyping in R is not wasted. Map-reduce iteration is not needed for SVD. Feasible scale is ~10^9 non-zeros or more.

GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 

R user-group-2011-09

  • 2. The bad old days (i.e. now): Hadoop is a silo; HDFS isn’t a normal file system; Hadoop doesn’t really like C++; R is limited; one machine, one memory space. Isn’t there any way we can just get along?
  • 3. The white knight: MapR changes things; lots of new stuff like snapshots, NFS; all you need to know, you already know; NFS provides cluster-wide file access; everything works the way you expect; performance high enough to use as a message bus.
  • 4. Example, out-of-core SVD: SVD provides compressed matrix form; based on sum of rank-1 matrices. [figure: a matrix approximated (≈) by a sum of rank-1 terms]
  • 5. More on SVD: SVD provides a very nice basis
  • 6. And a nifty approximation property
  • 7. Also known as … Latent Semantic Indexing, PCA, Eigenvectors
  • 8. An application, approximate translation: translation distributes over concatenation; but counting turns concatenation into addition. This means that translation is linear! ish
  • 9. ish
  • 10. Traditional computation: products of A are dominated by large singular values and corresponding vectors; subtracting these dominant singular values allows the next ones to appear; Lanczos method, generally Krylov sub-space.
  • 12. The gotcha: iteration in Hadoop is death; huge process invocation costs; lose all memory residency of data; total lost cause.
  • 13. Randomness to the rescue: to save the day, run all iterations at the same time. [figure: matrix equations involving A]
  • 14. In R
        lsa = function(a, k, p) {
          n = dim(a)[1]
          m = dim(a)[2]
          y = a %*% matrix(rnorm(m*(k+p)), nrow=m)
          y.qr = qr(y)
          b = t(qr.Q(y.qr)) %*% a
          b.qr = qr(t(b))
          svd = svd(t(qr.R(b.qr)))
          list(u=qr.Q(y.qr) %*% svd$u[,1:k],
               d=svd$d[1:k],
               v=qr.Q(b.qr) %*% svd$v[,1:k])
        }
  • 15. Not good enough yet: limited to memory size; after memory limits, feature extraction dominates.
  • 16. Hybrid architecture [diagram labels: Input; Map-reduce; Feature extraction and down sampling; Side-data; Data join; Via NFS; Sequential SVD]
  • 17. Hybrid architecture [diagram labels: Input; Map-reduce; Feature extraction and down sampling; Side-data; Data join; Via NFS; R; Visualization; Sequential SVD]
  • 18. Randomness to the rescue: to save the day again, use blocks. [figure]
  • 19. Hybrid architecture [diagram labels: Map-reduce; Feature extraction and down sampling; Via NFS; Map-reduce; Block-wise parallel SVD; R; Visualization]
  • 20. Conclusions: inter-operability allows massive scalability; prototyping in R not wasted; map-reduce iteration not needed for SVD; feasible scale ~10^9 non-zeros or more.
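
The slides say that working in blocks removes the memory limit on the randomized SVD, but they do not show how. The following is a minimal sketch of one way it could work, not the author's implementation. It assumes the matrix can be read one row block at a time. The names `lsa_blocks`, `read_block`, and `n_blocks` are hypothetical. As a simplification, it takes the SVD of the small matrix B directly rather than going through the second QR step used in the slide-14 code.

```r
# Sketch only: block-wise randomized SVD. `lsa_blocks`, `read_block`, and
# `n_blocks` are hypothetical names that do not appear in the slides.
lsa_blocks <- function(read_block, n_blocks, m, k, p) {
  # One shared random projection, m x (k+p), reused for every row block.
  omega <- matrix(rnorm(m * (k + p)), nrow = m)

  # Pass 1: project each row block; only the thin n x (k+p) matrix Y is kept.
  y_blocks <- lapply(seq_len(n_blocks), function(i) read_block(i) %*% omega)
  q <- qr.Q(qr(do.call(rbind, y_blocks)))   # orthonormal basis for range(Y)

  # Pass 2: accumulate B = t(Q) %*% A one row block at a time.
  b <- matrix(0, nrow = k + p, ncol = m)
  row_end <- 0
  for (i in seq_len(n_blocks)) {
    a_i <- read_block(i)
    rows <- row_end + seq_len(nrow(a_i))
    b <- b + crossprod(q[rows, , drop = FALSE], a_i)
    row_end <- row_end + nrow(a_i)
  }

  # B is only (k+p) x m, so an ordinary in-memory SVD finishes the job.
  s <- svd(b)
  list(u = q %*% s$u[, 1:k, drop = FALSE],
       d = s$d[1:k],
       v = s$v[, 1:k, drop = FALSE])
}

# Usage: an in-memory matrix stands in for data that would live on disk.
set.seed(42)
n <- 200; m <- 50; r <- 8
a <- matrix(rnorm(n * r), nrow = n) %*% matrix(rnorm(r * m), nrow = r)  # rank 8
block_rows <- split(seq_len(n), rep(1:4, each = n / 4))
read_block <- function(i) a[block_rows[[i]], , drop = FALSE]

fit <- lsa_blocks(read_block, n_blocks = 4, m = m, k = 3, p = 5)
exact <- svd(a)$d[1:3]
max_err <- max(abs(fit$d - exact))   # gap to the exact top singular values
```

In this sketch only one row block of A and the thin matrices Y, Q, and B are in memory at any time. In the hybrid architecture of the slides, `read_block` would presumably read files over NFS, and the two passes could run as independent tasks per block. The test matrix has rank exactly k + p, so the random projection captures its whole range and the recovered singular values should match the exact ones up to rounding. For a general matrix the result is only an approximation.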