
"Spark Summit 2016: Trends & Insights" -- Zurich Spark Meetup, July 2016

An invited talk I gave at the "Zurich Spark Meetup" in July 2016. Talking about the Apache Spark trends I observed at Spark Summit 2016, as well as some personal insights.

Published in: Data & Analytics

  1. Spark Summit 2016
  2. Who am I?
  3. Looking for a Machine Learning Summer Intern! bit.ly/nzzml
  4. Spark Summit 2016
  5. US / Europe & open to the world
  6. Trend #1: Spark 2.0
  7. Trend #1: Spark 2.0
  8. Trend #2: RDDs, DFs, DSs
  9. RDDs, DFs, DSs ... Why?
  10. RDDs, DFs, DSs ... Why?
  11. RDDs, DFs, DSs ... Why?
  12. RDDs, DFs, DSs ... Why? +
  13. Prefer DFs & DSs over RDDs!
  14. RDDs, DFs, DSs ... Why?
  15. Demo ...
  16. Trend #3: Streaming 2.0
  17. “The simplest way to do streaming analytics is when you don’t have to worry about streaming.”
  18. Streaming 2.0
  19. Streaming 2.0
  20. Demo ...
  21. Streaming 2.0 val structuredStream = sqlContext.read.format("json").stream(src_path); structuredStream.select($"constant_Value").groupBy($"constant_Value").count.write.format("parquet").startStream("/tmp/out/value.parquet")
  22. Trend #4: GraphFrames
  23. Trend #4: GraphFrames
  24. Trend #4: GraphFrames http://graphframes.github.io/
  25. Demo ...
  26. Trend #5: SparkR is catching up
  27. Trend #5: SparkR is catching up
  28. Trend #6: Deep Learning
  29. DNNs are coming: Watch it closely!
  30. Insight #1: Big Players ...
  31. … big community
  32. Insight #2: Same issues everywhere ...
  33. The user mailing list is your best friend!
  34. Insight #3: Stream, Compute, Dump
  35. Use Spark (Streaming) for what it’s meant for: real-time computation, not serving!
  36. Insight #4: 4 best practices
  37. GroupByKey! GroupByKey?
  38. Circumvent skew by “salting”: Key: Foo → Salted Key: Foo + random(1, saltDim)
  39. Think about resource allocation! --num-executors --executor-cores --executor-memory ? !
  40. You know, window functions ... first value, last value, rank, ...
  41. Looking for a Machine Learning Summer Intern! bit.ly/nzzml Check out TechTuesday! meetup.com/Tech-Tuesday-Zurich
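
The "GroupByKey" best-practice slide alludes to a well-known Spark pitfall: groupByKey ships every value across the network before reducing, whereas reduceByKey combines values map-side first and shuffles only partial results. A plain-Scala sketch (no Spark; the two "partitions" and the word-count data are hypothetical) of the difference in shuffle volume:

```scala
// Sketch (plain Scala, no Spark): why reduceByKey beats groupByKey for aggregation.
// Two hypothetical "partitions" of (word, count) records.
val partitions: Seq[Seq[(String, Int)]] = Seq(
  Seq(("a", 1), ("b", 1), ("a", 1)),
  Seq(("a", 1), ("b", 1))
)

// groupByKey-style: every record crosses the network, then is reduced.
val shuffledAll: Seq[(String, Int)] = partitions.flatten // 5 records shuffled
val grouped: Map[String, Int] =
  shuffledAll.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }

// reduceByKey-style: combine within each partition first, shuffle only partials.
val partials: Seq[(String, Int)] =
  partitions.flatMap(_.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum })
// Only 4 partial records are shuffled instead of 5, yet the result is identical:
val reduced: Map[String, Int] =
  partials.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }
```

On real data with many repeats per key, the saving is far larger than one record; the combine step bounds the shuffled data by (keys × partitions) rather than by record count.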
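
The salting slide compresses a two-stage trick for skewed joins/aggregations: append a random suffix so one hot key spreads over saltDim reducers, then strip the suffix and combine the partials. A plain-Scala sketch (no Spark; the "Foo" records, the `#` separator, and the fixed seed are illustrative choices):

```scala
import scala.util.Random

// Sketch: spread a hot key "Foo" over saltDim buckets, then recombine.
val saltDim = 4
val hotRecords = Seq.fill(100)(("Foo", 1)) // 100 records, all with the same key

// Stage 1: append a random salt so the skewed key hashes to several reducers.
val rng = new Random(42) // fixed seed only so the sketch is reproducible
val salted = hotRecords.map { case (k, v) => (s"$k#${rng.nextInt(saltDim)}", v) }
val partial: Map[String, Int] =
  salted.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }

// Stage 2: strip the salt and combine the (at most saltDim) partial sums.
val total: Map[String, Int] = partial.toSeq
  .map { case (k, v) => (k.split('#').head, v) }
  .groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }
```

The first aggregation now runs on at most saltDim buckets in parallel instead of one overloaded reducer, and the second stage only merges saltDim small partials.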
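
The resource-allocation slide lists the three spark-submit flags that most often go untuned. A hedged sketch of how they fit together, for a purely hypothetical cluster (3 worker nodes, 16 cores and 64 GB RAM each; the job name and every number below are illustrative, not a recommendation):

```shell
# Illustrative sizing for a hypothetical 3-node cluster (16 cores / 64 GB per node).
# Leave headroom per node for the OS and cluster daemons:
# 3 executors per node -> 9 executors, 5 cores each (15 of 16 cores used),
# and roughly (64 GB - headroom) / 3 minus off-heap overhead -> ~19 GB each.
spark-submit \
  --num-executors 9 \
  --executor-cores 5 \
  --executor-memory 19G \
  my-job.jar
```

The general trade-off: too few fat executors waste cores on GC pauses and HDFS-client contention, too many thin ones pay per-executor memory overhead; mid-sized executors (around 4-5 cores) are a common starting point.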
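
The window-functions slide name-drops first value, last value, and rank. In Spark SQL these run over a window (partitioned and ordered frame); the plain-Scala sketch below (no Spark; the `Row` case class and toy salary data are invented) mimics their semantics for a window partitioned by `dept` and ordered by `salary` descending:

```scala
// Sketch: what first_value / last_value / rank compute per window partition.
case class Row(dept: String, name: String, salary: Int)

val rows = Seq(
  Row("eng", "ada", 120), Row("eng", "bob", 100),
  Row("ops", "carol", 90), Row("ops", "dan", 95)
)

// For each row: (row, rank within dept, first value, last value),
// where the window is PARTITION BY dept ORDER BY salary DESC.
val windowed: Seq[(Row, Int, String, String)] =
  rows.groupBy(_.dept).toSeq.flatMap { case (_, group) =>
    val ordered = group.sortBy(r => -r.salary)
    ordered.zipWithIndex.map { case (r, i) =>
      (r, i + 1, ordered.head.name, ordered.last.name)
    }
  }
```

Unlike a groupBy aggregation, a window function keeps one output row per input row, attaching the per-partition aggregate (here: rank and the first/last names by salary) to each of them.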
