Apache Spark Briefing

Apache Spark is rapidly emerging as the prime platform for advanced analytics in Hadoop. This briefing is updated to reflect news and announcements as of July 2014.

Published in: Technology, Education

  1. Apache Spark: The Emerging Platform for Distributed Analytics. July 2014. Thomas W. Dinsmore
  2. What is Apache Spark?
       • Distributed in-memory analytics engine
       • Runs on standalone clusters or on Hadoop
       • Fully compatible with Hadoop storage APIs
       • Runs under YARN
       • Top-level Apache project
       • Supported in all major Hadoop distros
       • Open source and vendor neutral
  3. Spark Timeline, 2009 to 2014 (timeline graphic). Milestones shown: Project begins; Open sourced; Apache Incubator; Spark Summit 2013; Apache Top-Level; Cloudera Support; MapR Support; Hortonworks Support; SAP Support. News cascade starting in late 2013.
  4. What problems does Spark solve?
  5. Problem #1: MapReduce I/O sandbags runtime for advanced analytics. Advanced analytics often requires multiple passes through the data, and MapReduce must persist results to Hadoop storage after each pass. (Diagram: compute and store steps alternating against Hadoop storage.)
  6. Spark Vision: Distributed in-memory platform. Intermediate results stay in memory. 100X performance improvement for iterative algorithms. (Diagram: a chain of compute steps that touches Hadoop storage only once.)
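
The contrast between slides 5 and 6 can be sketched with a toy model. This is plain Python, not Spark's API: `CountingStorage`, `step`, and both `run_*` functions are hypothetical names invented for illustration, and the sketch counts storage operations only. It does not measure or reproduce the 100X figure.

```python
class CountingStorage:
    """Hypothetical stand-in for Hadoop storage that counts reads and writes."""

    def __init__(self, data):
        self._data = list(data)
        self.reads = 0
        self.writes = 0

    def read(self):
        self.reads += 1
        return list(self._data)

    def write(self, data):
        self.writes += 1
        self._data = list(data)


def step(values):
    """One pass of a made-up iterative algorithm: halve every value."""
    return [v / 2 for v in values]


def run_persist_each_pass(storage, passes):
    """MapReduce style: read the input and persist the result on every pass."""
    result = None
    for _ in range(passes):
        result = step(storage.read())
        storage.write(result)
    return result


def run_in_memory(storage, passes):
    """In-memory style: read once, keep intermediates in memory, write once."""
    values = storage.read()
    for _ in range(passes):
        values = step(values)
    storage.write(values)
    return values


disk_a = CountingStorage([8.0, 16.0])
disk_b = CountingStorage([8.0, 16.0])
result_a = run_persist_each_pass(disk_a, passes=3)
result_b = run_in_memory(disk_b, passes=3)

print(result_a == result_b)
print(disk_a.reads + disk_a.writes, disk_b.reads + disk_b.writes)
```

Both runs compute the same answer. The per-pass version touches storage twice for every pass, while the in-memory version touches it twice in total, so the gap widens as the number of passes grows.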
  7. Problem #2: Many “point” solutions for advanced analytics in Hadoop: Machine Learning, Queries, Graph Analytics, Streaming Analytics.
  8. Spark Vision: single integrated platform for advanced analytics in Hadoop.
       • Simplified administration
       • Integrated results
  9. How important is Spark?
  10. Mike Olson, Cloudera: “The leading candidate for ‘successor to MapReduce’ today is Apache Spark.”
  11. M.C. Srivas, MapR: “We believe Spark on Hadoop is a game changer for any business.”
  12. Ben Lorica, O’Reilly Media: “The number of companies that are using Spark in production has exploded over the last year.”
  13. Apache Spark is the most active project in the Hadoop ecosystem. (Chart: Commits, Past 12 Months; 22%. Source: Cloudera.)
  14. Spark’s Key Capabilities
  15. Spark 1.0 Machine Learning
       • Linear Regression
       • Logistic Regression
       • Linear Support Vector Machine
       • Regularization
       • Decision Trees
       • Naive Bayes
       • Alternating Least Squares
       • K-Means Plus-Plus
       • Singular Value Decomposition
       • Principal Components Analysis
       • Stochastic Gradient Descent
       • L-BFGS
      The Spark project expects to double the number of supported techniques in release 1.1 (August 2014).
  16. Spark SQL
       • Currently the most active project
       • Supports fast interactive queries
       • Hive-compatible
       • Works with Hive data
       • Runs unmodified queries
       • Roadmap to support more formats
       • Will absorb the Shark project
  17. Spark Streaming
       • Supports analysis of data streams in real time
       • Unifies streaming and batch data
       • Integrates with popular data sources: HDFS, Flume, Kafka, Twitter
       • Easy to use
       • Fault tolerant
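
One way to read "unifies streaming and batch data" is that the same analysis logic can run over a stored dataset or over a stream cut into small batches. The sketch below illustrates that idea in plain Python rather than with the Spark Streaming API; `count_words` and `micro_batches` are hypothetical helper names, and the input lines are made up.

```python
from collections import Counter


def count_words(lines):
    """Analysis logic written once: count words in any iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def micro_batches(stream, batch_size):
    """Group an incoming stream into small batches of at most batch_size items."""
    batch = []
    for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, batch


lines = ["spark streaming", "spark sql", "spark graph", "streaming data"]

# Batch mode: process the whole dataset at once.
batch_result = count_words(lines)

# Streaming mode: apply the same function to each small batch and merge.
stream_result = Counter()
for batch in micro_batches(iter(lines), batch_size=2):
    stream_result += count_words(batch)

print(batch_result == stream_result)
```

Because the per-batch results merge into the same totals as the one-shot run, the counting logic does not need a separate streaming implementation.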
  18. Spark Graph Analytics
       • Currently an Alpha release
       • Unifies graph-parallel and data-parallel computing under a single API
       • Performance parity with Giraph
       • Replaces Spark Bagel (Pregel on Spark)
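
To see how a graph computation can be phrased with data-parallel operations, here is a minimal PageRank sketch over an edge list. It is plain Python and does not use the GraphX API; the four-edge graph, the damping factor of 0.85, the 20 iterations, and the function name `pagerank_step` are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy graph; every vertex has at least one outgoing edge.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
vertices = sorted({v for edge in edges for v in edge})

out_degree = defaultdict(int)
for src, _ in edges:
    out_degree[src] += 1


def pagerank_step(ranks, damping=0.85):
    """One PageRank iteration written as a map followed by a reduce-by-key."""
    # Map: every edge emits (destination, share of the source's rank).
    messages = [(dst, ranks[src] / out_degree[src]) for src, dst in edges]

    # Reduce by key: sum the contributions arriving at each vertex.
    incoming = defaultdict(float)
    for dst, contribution in messages:
        incoming[dst] += contribution

    n = len(vertices)
    return {v: (1 - damping) / n + damping * incoming[v] for v in vertices}


ranks = {v: 1 / len(vertices) for v in vertices}
for _ in range(20):
    ranks = pagerank_step(ranks)

for vertex in vertices:
    print(vertex, round(ranks[vertex], 3))
```

The vertex-centric view (each vertex sums messages from its neighbours) and the table-centric view (map over edges, then group by destination) describe the same computation, which is the kind of overlap a single API can exploit.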
  19. Spark Performance
       • Machine Learning: 100x faster than MapReduce
       • Queries (Shark): comparable to Impala; 100x faster than Hive
       • Streaming: 2X throughput of Storm
       • Graph (GraphX): comparable to Giraph; 10X faster than MapReduce
  20. Spark Distributions: Every major Hadoop distribution, plus… Connector; Big Data Appliance; Interface to HANA. (Vendor logos.)
  21. Programming Interfaces: Supported APIs (logos); “Alpha” release: “SparkR”. The Spark project expects to release a production-grade R interface in early 2015.
  22. Spark Users
  23. Certified on Spark
  24. Who is Databricks?
       • Commercial venture, launched in 2013
       • Founded by Spark principals
       • Services and support business model
       • Gatekeepers to Spark
       • Just landed $33M in Series B funding from Andreessen Horowitz and New Enterprise Associates
       • Just announced Spark Cloud product
  25. Thank You
