Performance of Spark vs MapReduce

  1. Performance of Spark vs MapReduce (www.edureka.co/apache-spark-scala-training)
  2. What will you learn today?
     • Beyond Hadoop MapReduce
     • How is Spark better than MapReduce?
     • Benchmark: Spark vs MapReduce
     • Hands-on: Analyzing data with Spark
  3. Word Count Problem - MapReduce
     MapReduce code for a simple word count problem
  4. Apache Spark
     Apache Spark is a general-purpose data processing engine with in-memory computing. Spark provides APIs for Scala, Java, Python and R, which has helped it become widely adopted for data processing.
  5. How does Spark fit into the Hadoop ecosystem?
     Spark is intended to enhance, not replace, the Hadoop stack. Spark is designed to read data from and write data to HDFS as well as other storage systems such as Amazon S3 and NoSQL databases, and file formats such as CSV.
  6. Word Count Problem - Spark
     Spark Scala code for the word count problem
     Spark Python code for the word count problem
     Processing data with Spark is clearly much simpler than with MapReduce, and Spark gives you the flexibility to choose your preferred language: Scala, Java, Python, etc.
  7. Why Spark for Big Data Analytics?
     What makes Spark suitable for Big Data analytics?
  8. Why Spark for Big Data Analytics?
     The following features make Spark the best fit for Big Data analytics:
     • Spark simplifies data analysis
     • Spark provides built-in libraries for advanced analytics
     • Spark speaks more than one language
     • Spark provides faster results
     • Spark works with Hadoop distributions from different vendors
  9. Benchmark: Spark is Blazingly Fast
  10. Isn’t Spark In-Memory Only?
     But I have heard Spark is only good for in-memory processing?
  11. Spark: Best of Both Worlds
     It’s a common misconception that Spark is only for in-memory processing. From its inception, Spark was designed to be a general execution engine that works both in memory and on disk. Almost all Spark operators perform external operations, spilling data to disk, when the data does not fit in memory.
  12. Spark Libraries
     • Spark SQL: Spark’s module for working with structured data
     • MLlib: Spark’s machine learning library
     • GraphX: Spark’s API for graph computation
     • Spark Streaming: Spark’s API for processing streaming data
  13. Spark in One Snapshot
  14. Spark Use Cases
     Different companies use Spark to solve a variety of problems, e.g. recommendation systems, business intelligence and fraud detection.
  15. Who is using Spark?
     A list of companies using Spark can be found here: https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark
  16. Spark is here to stay
     Spark is not one of those "here today, gone tomorrow" technologies. It is here to stay for the foreseeable future, and it is well worth getting your teeth into it to get value out of your data.
  17. Hands-on: Analyzing data with Spark
  18. References
     • IBM backs Apache Spark for Big Data Analytics: http://www.forbes.com/sites/paulmiller/2015/06/15/ibm-backs-apache-spark-for-big-data-analytics/
     • Why Cloudera is saying 'Goodbye, MapReduce' and 'Hello, Spark': http://fortune.com/2015/09/09/cloudera-spark-mapreduce/
     • 5 reasons to turn to Spark for Big Data Analytics: http://www.infoworld.com/article/2897287/big-data/5-reasons-to-turn-to-spark-for-big-data-analytics.html
  19. References
     • Spark sets a new record for large-scale sorting: https://databricks.com/blog/2014/11/05/spark-officially-sets-a-new-record-in-large-scale-sorting.html
     • How eBay uses Spark to ignite data analytics: http://www.ebaytechblog.com/2014/05/28/using-spark-to-ignite-data-analytics/
     • Spark is fast on disk too: https://gigaom.com/2014/10/10/databricks-demolishes-big-data-benchmark-to-prove-spark-is-fast-on-disk-too/
  20. Survey
     Your feedback is vital to us, be it a compliment, a suggestion or a complaint. It helps us make your experience better! Please spare a few minutes to take the survey after the webinar.
  21. Thank You
     Questions/Queries/Feedback
     The recording and presentation will be made available to you within 24 hours.
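Slides 3 and 6 compare word-count code in MapReduce and Spark, but the code itself is not part of this text. As a rough stand-in, the plain-Python sketch below contrasts the two styles on a single machine. The first half spells out the MapReduce phases (map each line to (word, 1) pairs, shuffle pairs into groups by key, reduce each group by summing). The second half uses `ToyRDD`, a hypothetical class invented here whose method names only mimic Spark's RDD transformations (`flatMap`, `map`, `reduceByKey`); it is not PySpark and does no distribution, laziness or fault tolerance.

```python
from collections import defaultdict
from itertools import chain

# --- MapReduce style: explicit map, shuffle and reduce phases ---

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every whitespace-separated word."""
    return [(word, 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle/sort: group every value emitted for the same key, sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)

def mapreduce_word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())

# --- Spark-like style: a hypothetical toy class whose camelCase method
# names imitate RDD transformations. It is NOT PySpark. ---

class ToyRDD:
    def __init__(self, items):
        self.items = list(items)

    def flatMap(self, f):
        # Apply f to each item and flatten the resulting iterables.
        return ToyRDD(chain.from_iterable(f(x) for x in self.items))

    def map(self, f):
        return ToyRDD(f(x) for x in self.items)

    def reduceByKey(self, f):
        # Combine values that share a key with the binary function f.
        acc = {}
        for key, value in self.items:
            acc[key] = f(acc[key], value) if key in acc else value
        return ToyRDD(acc.items())

    def collect(self):
        return self.items

def spark_style_word_count(lines):
    return dict(
        ToyRDD(lines)
        .flatMap(lambda line: line.split())
        .map(lambda word: (word, 1))
        .reduceByKey(lambda a, b: a + b)
        .collect()
    )

lines = ["to be or not to be", "to see or not"]
print(mapreduce_word_count(lines) == spark_style_word_count(lines))  # True
```

Both functions return the same counts (`to` appears 3 times, `be`, `or` and `not` twice, `see` once). The point of the comparison is the shape of the code: the chained version states the whole pipeline in one expression, which is the style the real PySpark RDD API supports through `flatMap`, `map` and `reduceByKey`.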

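Slide 11 says that almost all Spark operators perform external operations when data does not fit in memory. A classic example of an external operation is external merge sort: sort chunks that fit within a memory budget, spill each sorted run to disk, then merge the runs. The sketch below is the generic textbook technique, not Spark's actual sort or shuffle implementation; `max_in_memory` is a hypothetical parameter standing in for a memory budget, counted in items rather than bytes.

```python
import heapq
import os
import tempfile

def external_sort(values, max_in_memory):
    """Sort integers while buffering at most `max_in_memory` of them during
    run generation; full buffers are sorted and spilled to temporary files,
    and the runs are then k-way merged."""
    if max_in_memory < 1:
        raise ValueError("max_in_memory must be at least 1")
    run_paths = []
    buffer = []

    def spill():
        # Sort the in-memory buffer and write it to disk as one sorted run.
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as out:
            out.writelines(f"{v}\n" for v in buffer)
        run_paths.append(path)
        buffer.clear()

    for v in values:
        buffer.append(v)
        if len(buffer) >= max_in_memory:
            spill()
    if buffer:
        spill()

    files = [open(p) for p in run_paths]
    try:
        # heapq.merge keeps only the head of each run in memory at a time.
        streams = [(int(line) for line in f) for f in files]
        # Materialized as a list for simplicity; an engine would stream it on.
        return list(heapq.merge(*streams))
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

For example, `external_sort([5, 3, 9, 1, 7], max_in_memory=2)` spills three runs (`[3, 5]`, `[1, 9]`, `[7]`) and merges them into `[1, 3, 5, 7, 9]`. The trade-off matches the slide's point: the job still completes when the input exceeds the memory budget, at the cost of extra disk I/O.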