
Distributed computing abstractions_data_science_6_june_2016_ver_0.4


These slides outline the common distributed computing abstractions necessary to implement data science at scale. They start with a characterization of the computations required to realize common machine learning algorithms at scale, then introduce Hadoop MapReduce, Spark and GraphLab. Future revisions will add Flink, Titan and TensorFlow, show how to realize machine learning/deep learning algorithms on top of these frameworks, and discuss the trade-offs between them.

Published in: Software


  1. Distributed Computing Abstractions for Big Data Cloud Analytics. Dr. Vijay Srinivas Agneeswaran, Director of Technology, Data Sciences Group, SapientNitro, Bangalore.
  2. Big Data Computations. © Tally Solutions Pvt. Ltd. All Rights Reserved.
     Computations/operations (the "giants" of massive data analysis [1]):
     • Giant 1 (simple statistics) is perfect for Hadoop 1.0.
     • Giants 2 (linear algebra), 3 (N-body) and 4 (optimization): Spark from UC Berkeley is efficient. Examples: logistic regression, kernel SVMs, conjugate gradient descent, collaborative filtering, Gibbs sampling, alternating least squares. One application is the social-group-first approach to consumer churn analysis [2].
     • Interactive/on-the-fly data processing – Storm.
     • OLAP (data cube operations) – Dremel/Drill.
     • Data sets that are not embarrassingly parallel?
     • Deep learning (artificial neural networks/deep belief networks) – machine vision from Google [3], speech analysis from Microsoft.
     • Giant 5 (graph processing) – GraphLab, Pregel, Giraph.
     [1] National Research Council. Frontiers in Massive Data Analysis. Washington, DC: The National Academies Press, 2013.
     [2] Richter, Yossi; Yom-Tov, Elad; Slonim, Noam: Predicting Customer Churn in Mobile Networks through Analysis of Social Groups. In: Proceedings of the SIAM International Conference on Data Mining, 2010, pp. 732-741.
     [3] Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc'Aurelio Ranzato, Andrew W. Senior, Paul A. Tucker, Ke Yang, Andrew Y. Ng: Large Scale Distributed Deep Networks. NIPS 2012: 1232-1240.
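The deck does not include code, but the "Giant 4" workloads above share one trait: they are iterative optimizations that make many passes over the same data, which is exactly where in-memory frameworks like Spark beat disk-based Hadoop MR. A minimal, framework-free sketch of one such workload (logistic regression fitted by stochastic gradient descent; the data set and hyperparameters are invented for illustration):

```python
import math

def sgd_logistic_regression(points, lr=0.1, epochs=200):
    """Fit weights w for P(y=1|x) = sigmoid(w . x) by stochastic
    gradient descent. Each epoch is a full pass over the data; the
    repeated passes are what make this a "Giant 4" iterative job."""
    dim = len(points[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in points:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            # Gradient of the log-loss for one example is (p - y) * x.
            for i in range(dim):
                w[i] -= lr * (p - y) * x[i]
    return w

# Tiny separable data set (second feature is a bias term): y = 1 iff x0 > 0.
data = [([-2.0, 1.0], 0), ([-1.0, 1.0], 0), ([1.0, 1.0], 1), ([2.0, 1.0], 1)]
w = sgd_logistic_regression(data)
```

On a cluster, each epoch would rescan the training set; Spark caches it in memory across iterations, while Hadoop MR re-reads it from disk every pass.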
  3. What is Hadoop? [Diagram]
  4. Data Flow in Spark and Hadoop. [Diagram]
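The original slide is a diagram, but the Hadoop MR data flow it contrasts with Spark can be sketched in plain Python as the classic three stages of word count (map → shuffle → reduce). The example data is invented; in real Hadoop the shuffle stage spills to disk between map and reduce, whereas Spark pipelines stages through memory:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit (word, 1) pairs, as a Hadoop mapper would.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle/sort: group values by key. In Hadoop this stage writes
    # intermediate data to disk; Spark keeps it in memory when it can.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's grouped values.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big compute", "big graphs"]
counts = reduce_phase(shuffle(map_phase(lines)))
# counts == {"big": 3, "data": 1, "compute": 1, "graphs": 1}
```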
  5. Comparison of Graph Processing Paradigms.
     • GraphLab 1 — Computation: asynchronous. Determinism: deterministic (serializable computation), but uses inefficient locking protocols. Power-law graphs: inefficient; the locking protocol is unfair to high-degree vertices.
     • Pregel — Computation: synchronous (BSP-based). Determinism: deterministic (serial computation). Efficiency vs. asynchrony: N/A. Power-law graphs: inefficient; suffers the curse of slow jobs.
     • Piccolo — Computation: asynchronous. Determinism: non-deterministic (non-serializable computation); efficient, but may lead to incorrectness. Power-law graphs: may be efficient.
     • GraphLab 2 (PowerGraph) — Computation: asynchronous. Determinism: deterministic (serializable computation); uses parallel locking for efficient, serializable, asynchronous computation. Power-law graphs: efficient; parallel locking and other optimizations for processing natural graphs.
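Pregel's BSP model and its "curse of slow jobs" follow from its superstep structure: every active vertex computes from the previous superstep's state, then a global barrier synchronizes before the next round, so each round waits for the slowest worker. A minimal single-process sketch (invented graph; maximum-value propagation, a standard Pregel example pattern):

```python
def bsp_max(neighbors, values):
    """Pregel-style synchronous supersteps: messages sent in superstep t
    are only read in superstep t+1, mimicking the global barrier.
    Propagates the maximum value to every vertex in a component."""
    active = set(neighbors)
    superstep = 0
    while active:
        # Barrier semantics: all messages are computed from old values.
        messages = {v: [] for v in neighbors}
        for v in active:
            for u in neighbors[v]:
                messages[u].append(values[v])
        new_active = set()
        for v, msgs in messages.items():
            if msgs and max(msgs) > values[v]:
                values[v] = max(msgs)
                new_active.add(v)  # vertex votes to stay active
        active = new_active
        superstep += 1
    return values, superstep

graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
vals = {"a": 1, "b": 5, "c": 3}
final, steps = bsp_max(graph, vals)
# final == {"a": 5, "b": 5, "c": 5}
```

In a distributed run, the `while` loop's implicit barrier is exactly where one straggling worker stalls the entire superstep — the inefficiency the table attributes to Pregel.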
  6. GraphLab: Ideal Engine for Processing Graphs [YL12] – Dato.
     Goals – targeted at machine learning:
     • Model graph dependencies; be asynchronous, iterative and dynamic.
     • Data is associated with edges (weights, for instance) and with vertices (user profile data, current interests, etc.).
     Update functions – one lives on each vertex:
     • Transforms the data in the scope of its vertex.
     • Can choose to trigger neighbours (for example, only if the rank changes drastically).
     • Run asynchronously until convergence – no global barrier.
     Consistency is important in ML algorithms (some, such as collaborative filtering, do not even converge under inconsistent updates):
     • GraphLab provides varying levels of consistency – a parallelism vs. consistency trade-off.
     Several algorithms implemented, including ALS, K-means, SVM, belief propagation, matrix factorization, Gibbs sampling, SVD, Co-EM, etc.:
     • The Co-EM (Expectation-Maximization) algorithm ran 15x faster than Hadoop MR – on distributed GraphLab, only 0.3% of Hadoop execution time.
     [YL12] Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M. Hellerstein. 2012. Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment 5, 8 (April 2012), 716-727.
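The update-function model above can be sketched without GraphLab itself: an update function runs on one vertex at a time, and re-schedules only the neighbours whose values might still change, with no global barrier. A minimal single-process sketch using PageRank (the graph and tolerance are invented; real GraphLab adds consistency models and distributed locking):

```python
from collections import deque

def async_pagerank(out_links, in_links, damping=0.85, tol=1e-6):
    """GraphLab-style dynamic, asynchronous scheduling: update one
    vertex, and trigger its neighbours only if its rank changed by
    more than tol. No superstep barrier is ever taken."""
    n = len(out_links)
    rank = {v: 1.0 / n for v in out_links}
    queue = deque(out_links)       # vertices scheduled for an update
    scheduled = set(out_links)
    while queue:
        v = queue.popleft()
        scheduled.discard(v)
        incoming = sum(rank[u] / len(out_links[u]) for u in in_links[v])
        new_rank = (1 - damping) / n + damping * incoming
        if abs(new_rank - rank[v]) > tol:
            rank[v] = new_rank
            # Dynamic computation: re-trigger only dependent neighbours.
            for u in out_links[v]:
                if u not in scheduled:
                    scheduled.add(u)
                    queue.append(u)
        else:
            rank[v] = new_rank
    return rank

out_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
in_links = {"a": ["c"], "b": ["a"], "c": ["a", "b"]}
ranks = async_pagerank(out_links, in_links)
```

The `scheduled` set is a stand-in for GraphLab's scheduler; updates read whatever the current neighbour ranks are, which is exactly why the consistency levels on this slide matter.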
  7. Other Abstractions.
     • Functional data science.
     • Exploration of recent algorithms:
       • EMPCA (Expectation-Maximization Principal Component Analysis).
       • ADMM (Alternating Direction Method of Multipliers).
       • Deep learning.
     • Distributed computing abstractions for data science – useful for engineering RFPs and next-generation architectures for clients:
       • Iterative computations – Apache Spark, Flink, MPI or Concord.
       • Graph computations – TensorFlow, GraphLab, Titan.
       • High-performance computations – CUDA.
  8. Other Abstractions.
     • Ad hoc queries on large data sets – next-gen SQL:
       • Approximate queries – BlinkDB.
       • Ad hoc queries – Apache Drill.
       • Distributed SQL – Spark SQL, Apache Tajo, HAWQ.
     • Data governance – important for all big data architectures:
       • Security – policies for role-based, fine-grained access control.
       • Encryption – for data at rest as well as data in motion.
       • Tokenization/obfuscation – data masking.
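The approximate-query idea behind BlinkDB is simple to illustrate without the system itself: answer an aggregate on a small random sample and report an error bound, trading a little accuracy for a large speedup. A framework-free sketch (the table, column name and sampling fraction are invented for illustration; BlinkDB's actual stratified-sampling machinery is more sophisticated):

```python
import random
import statistics

def approximate_avg(table, column, sample_frac=0.01, seed=42):
    """Estimate AVG(column) from a uniform random sample, returning the
    estimate and a 95% confidence half-width from the standard error."""
    rng = random.Random(seed)
    sample = [row[column] for row in table if rng.random() < sample_frac]
    mean = statistics.fmean(sample)
    stderr = statistics.stdev(sample) / len(sample) ** 0.5
    return mean, 1.96 * stderr

# 100,000 rows whose amounts cycle 0..99, so the true mean is 49.5.
rows = [{"amount": float(i % 100)} for i in range(100_000)]
est, err = approximate_avg(rows, "amount")
```

Scanning ~1% of the rows gives an answer within a tight, quantified error bar of the true mean – the accuracy/latency trade-off that approximate query engines expose to the user.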
  9. Future Abstractions.
     • Internet of Things (IoT) analytics:
       • Fog/edge computing – strong devices, weakly connected.
     • Data lake management – next-gen data management:
       • Hadoop-based data lakes.
       • NewSQL data stores.
  10. Thank You! Mail • LinkedIn • Blogs • Twitter • @a_vijaysrinivas.