Ted Dunning - Keynote: How Can We Take Flink Forward?

  1. © 2014 MapR Technologies
  2. Me, Us
     • Ted Dunning, MapR Chief Application Architect, Apache Member
       – Committer and PMC member: ZooKeeper, Drill, others
       – Mentor for Flink, Beam (née Dataflow), Drill, Storm, Zeppelin
       – VP Incubator
       – Bought the beer at the first HUG
     • MapR
       – Produces the first converged platform for big and fast data
       – Includes data platform (files, streams, tables) + open source
       – Adds major technology for performance, HA, industry-standard APIs
     • Contact: @ted_dunning, ted.dunning@gmail.com, tdunning@mapr.com
  3. Note: I may need to rely on my laryngitis interpreter
  4. New book on Apache Flink: download a free PDF courtesy of MapR Technologies at mapr.com/flink-book
  5. What is happening now in computing has only happened a few times before
  6. Businesses are changing to become completely digital
  7. That is causing a complete re-implementation of the software that runs the world
  8. Comparable Events in Software
     • Accounting invented in Sumer
     • Indic numerals (including zero) brought to Europe by Arabs
     • Banking by letter of credit
     • Open source data
     • Electronic automation of business processes
     • SQL and the relational model
     • The Internet
     • ?? Whatever it is that is happening now ??
  9. Early Accounting
     • Most early writing samples were accounting records
     • This one is from Crete and records grain inventories
     • Accounting is a major advance because it allows you to abstract the count of a thing from the thing
  10. Letters of Credit
     • Used by the Knights Templar to record deposits to be protected on crusade
     • Popularized by the Italian banking system in the Renaissance
     • Destroyed competing systems, such as the Hansa, that required transfer of silver
  11. Big data project: Maury's Wind and Current Charts
     • At first, nobody was interested in them…
     • …until Captain Jackson shaved a month off the run from Baltimore to Rio de Janeiro
     • Then everybody wanted one!
  12. What is it that is happening now?
  13. There is a revolution going on
  14. Companies get more value from our data than we can get from it ourselves
  15. How Much Value?

      Symbol  Company              Cap rank  Market cap ($B)
      AAPL    Apple                1         521.1
      GOOGL   Alphabet             2         485.9
      MSFT    Microsoft            3         399.4
      XOM     Exxon Mobil          4         336.8
      BRK-A   Berkshire Hathaway   5         318.7
      FB      Facebook             6         290.3
      JNJ     Johnson & Johnson    7         281.7
      GE      General Electric     8         275.4
      WFC     Wells Fargo          9         240.9
      AMZN    Amazon.com           10        238.8
      (Rank and market cap as of 2/12/16)
  18. Data has value in the aggregate and in the moment
  19. But we can't aggregate it ourselves, nor pass it to each other
  20. But we can't aggregate it ourselves, nor pass it to each other. It's big.
  21. What's Going On?
     • Revolution in computing A
       – Big data just works better
     • Revolution in computing B
       – The database is not the core
     • Change in social structure
     • Change in computing technology
       – Big three replatforming events (SQL, Internet, streams)
     • What does it mean to us?
  22. Revolution A: Big is better
  23. More Data Beats Better Algorithms, -ish
     [Figure: Banko and Brill, 2001, "Scaling to Very Very Large Corpora for Natural Language Disambiguation"]
     • Increasing the data size has a much bigger effect than changing the algorithm
     • This does not imply that big and stupid is best
     • Big and smart is better
  24. Examples of Big Data Advantage
     • Credit card fraud detection
       – Data consortium wins, therefore data consortium wins
     • Speech recognition
       – Siri and others
     • Image analysis
       – Can you identify which of 120 breeds of dog is in the picture?
       – Real applications coming
       – Facebook tagging just the start
     • Digital marketing
       – Google's non-ad
  25. Revolution B: How to build big systems
  26. Evolution Beyond Massive Monolithic Systems
     • In monoliths, the complexity of mainframe systems led to specialization
       – Storage
       – DB
       – Systems analysis
       – Programmers
       – Operations
       – Data entry
     • This made n-tier architectures a natural next step
  27. 3-tier Architecture [Diagram: web tier, middle tier, data tier]
  28. 3-tier Architecture (essence) [Diagram: web tier, middle tier, data tier]
  29. 3-tier, in Practice [Diagram: four separate web tier / middle tier / data tier stacks]
  30. Enter micro-services
  31. Start with Service Partitioning [Diagram: three services, each with an RPC layer, logic, and disk]
  33. Make Systems Opaque [Diagram: the same three services, each with an RPC layer, logic, and disk]
  34. Give Them a Job, and a Way to Communicate: keep it very lightweight!
  35. This is called micro-services
  36. Results Can Be Stunning
     • Companies that adopted this style are associated with stunning success
       – Google, Facebook, Netflix (after DVD mail), Amazon, LinkedIn (v. 2)
       – And a gazillion less well-known companies
     • Companies that did not are associated with …
     • Of course, this may just be what happens when you hire smart folk
       – Correlation, causation, et cetera
  37. But …
     • Much of the discussion talks about RPC (call/response) services
     • This is fine, but limiting
     • Key idiom is deferred processing
       – Do something urgently
       – Queue message to complete later
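The deferred-processing idiom on slide 37 can be sketched in a few lines of Python. This is a minimal illustration, assuming an in-process `queue.Queue` as a stand-in for a real message queue (which, as the next slides argue, would have to be durable); the names `handle_order`, `drain`, and `orders` are hypothetical.

```python
import queue

# In-memory stand-in for a message queue (hypothetical; NOT durable).
deferred = queue.Queue()
orders = {}


def handle_order(order_id, item):
    """Do the urgent part now and queue the rest for later."""
    orders[order_id] = item                   # urgent: record the order
    deferred.put(("send_receipt", order_id))  # deferred: follow-up work
    return "accepted"


def drain(q):
    """Worker: complete queued work after the caller has already returned."""
    done = []
    while True:
        try:
            task, order_id = q.get_nowait()
        except queue.Empty:
            return done
        done.append(f"{task}:{order_id}")


print(handle_order(1, "book"))  # accepted
print(drain(deferred))          # ['send_receipt:1']
```

The caller gets its answer as soon as the urgent step is done; whether the follow-up work ever happens then depends entirely on the queue, which is the problem the "Who Has the Ball?" slides take up.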
  38. Who Has the Ball? [Diagram: sender and receiver] Sender wants to send a message
  39. Who Has the Ball? But the receiver might be indisposed for the moment
  40. Who Has the Ball? After sending, the sender may exit
  41. Who Has the Ball? The receiver has returned, but who has the message?
  42. Who Has the Ball? The message queue must retain the message
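One way to make "the queue has the ball" concrete is an append-only log on disk with consumer-held offsets: the sender can exit once its write returns, and a receiver that starts later reads from its last offset. The sketch below assumes a local file as the durable medium and a hypothetical `FileLog` class; it leaves out everything a production system needs (replication, partitioning, retention).

```python
import json
import os
import tempfile


class FileLog:
    """Hypothetical append-only message log kept in a local file."""

    def __init__(self, path):
        self.path = path

    def append(self, message):
        # One JSON message per line, forced to disk before returning,
        # so the sender may exit as soon as append() completes.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(message) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def read_from(self, offset):
        """Return (messages, next_offset), reading from a line offset."""
        if not os.path.exists(self.path):
            return [], offset
        with open(self.path, encoding="utf-8") as f:
            lines = f.readlines()
        messages = [json.loads(line) for line in lines[offset:]]
        return messages, offset + len(messages)


path = os.path.join(tempfile.mkdtemp(), "messages.log")

# Sender: writes, then goes away (its FileLog object is discarded).
FileLog(path).append({"order": 1})

# Receiver: starts later with a fresh object and its saved offset (0).
messages, offset = FileLog(path).read_from(0)
print(messages, offset)  # [{'order': 1}] 1
```

Keeping the offset with the receiver, rather than deleting messages on delivery, is what lets several independent receivers read the same log at their own pace.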
  43. For Message-Based Services
     • We need a persistent queue
     • The number of messages is plausibly very high
       – Total number of external requests (× 5-10)
       – Total number of persistence ops (× 2-3)
     • Millions of messages, GB/s of traffic quite plausible
     • Moving this to enterprise from startups adds challenges
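The multipliers on slide 43 support a back-of-envelope estimate. Assuming (an interpretation, not something the slide states) that each external request produces 5-10 messages, each persistence operation produces 2-3 more, and the two contributions add, a hypothetical load of 100,000 requests/s and 50,000 persistence ops/s with roughly 1 KB messages works out as follows:

```python
def message_rate_range(requests_per_s, persist_ops_per_s):
    """Assumed model: 5-10 messages per external request plus
    2-3 messages per persistence operation."""
    low = requests_per_s * 5 + persist_ops_per_s * 2
    high = requests_per_s * 10 + persist_ops_per_s * 3
    return low, high


MESSAGE_BYTES = 1_000  # hypothetical average message size

low, high = message_rate_range(100_000, 50_000)
print(low, high)  # 600000 1150000
print(low * MESSAGE_BYTES / 1e9, high * MESSAGE_BYTES / 1e9)  # 0.6 1.15 (GB/s)
```

Under these assumptions that is on the order of a million messages and a gigabyte per second, consistent with the slide's "millions of messages, GB/s of traffic."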
  44. Summary
     • Micro-services require durable, high-performance message queues
     • These systems don't just like durable, high-performance queues
     • These systems require durability. And high performance.
     • Old-school queues need not apply
  45. Streaming data is different
  46. [Diagram: input and output timelines, with latency Δt and a provisional output time t_provisional] Note that the existence of provisional outputs implies we have to handle provisional inputs as well
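Slide 46's point about provisional outputs can be illustrated with a windowed sum that emits a result every time an input arrives, including a late one. A downstream consumer that overwrites its previous value for each window is, in effect, handling provisional inputs. This is a minimal plain-Python sketch; the class name `ProvisionalSum`, the 10-unit windows, and the sample events are hypothetical.

```python
from collections import defaultdict


class ProvisionalSum:
    """Hypothetical per-window sum that emits a provisional result on
    every input; later inputs (including late ones) revise it."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.sums = defaultdict(int)

    def add(self, event_time, value):
        window = event_time // self.window_size
        self.sums[window] += value
        return window, self.sums[window]  # provisional (window, sum)


agg = ProvisionalSum(window_size=10)
latest = {}  # downstream view: latest value per window, overwritten on revision
events = [(1, 5), (4, 2), (12, 7), (3, 1)]  # (event_time, value); (3, 1) is late

for event_time, value in events:
    window, total = agg.add(event_time, value)
    latest[window] = total  # accept the revision rather than keep the first answer

print(latest)  # {0: 8, 1: 7}
```

If the consumer had kept the first answer for window 0 (5), the late event would have been silently lost; treating each output as a replaceable value is what "handling provisional inputs" means here.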
  47. More Complications
     • Our latency isn't the only story
     • We don't get data instantly
     • So we don't even start with zero latency
     • In fact, delay is the key problem in flow-based computing
  48. Thought Problem
     • What is the temperature everywhere on earth right now?
       – This is impossible
     • What was the temperature everywhere on earth an hour ago?
       – This is hard
     • What was the temperature everywhere on earth last month?
       – This is pretty easy
     • Does this mean we cannot talk about today's weather?
  49. The Problem of State
     • The present temperature of Earth may or may not exist
     • Only the delayed temperature can matter to a practical computation
     • But computations in different places will see different delays
     • (promise me you know that I'm not just talking about temperature)
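The "delayed temperature" idea on slide 49 is similar in spirit to what stream processors such as Flink call a watermark: an estimate of how far event time has progressed, given some bound on delay. The sketch below uses a deliberately simplified heuristic (largest event time seen minus an assumed maximum delay) to decide which windows a computation may treat as complete; it is not Flink's API, and `complete_windows` is a hypothetical name. Two observers with different delay bounds reach different conclusions from the same data.

```python
def complete_windows(event_times, window_size, max_delay):
    """Windows (by index) that end at or before the watermark, where the
    watermark is the largest event time seen minus an assumed max delay."""
    watermark = max(event_times) - max_delay
    return sorted({t // window_size for t in event_times
                   if (t // window_size + 1) * window_size <= watermark})


seen = [1, 4, 12, 25]  # event times observed so far

# An observer that assumes data is at most 5 units late.
print(complete_windows(seen, window_size=10, max_delay=5))   # [0, 1]
# An observer that must allow for 15 units of delay.
print(complete_windows(seen, window_size=10, max_delay=15))  # [0]
```

Neither observer knows "now"; each knows only how much of the past it is willing to call settled.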
  50. Summary
     • For important problems, we have to represent distributed computations as messages and flows
     • This isn't a matter of convenience
     • The concept of "now" is either dead or dying
  51. Getting stuff done in the real world
  52. Looking forward
  53. [Diagram components: log-synth, sort by time, replay, explode [2], by_sender and by_recipient streams, query by sender, query by recipient, real-time tick, replica of by_sender for off-line purposes, time and timemark fields, real-time processing [1]; rates shown: 300k/s, 300k/s, 3M/s]
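The diagram on slide 53 labels a step "explode" alongside a by_sender stream at 300k/s and a by_recipient stream at 3M/s. One plausible reading, which is an assumption rather than something the slide states, is that each message addressed to several recipients is re-emitted once per recipient and re-keyed by recipient; an average fan-out of about 10 would account for the tenfold rate increase. A minimal Python sketch of that reading, with hypothetical field names:

```python
def explode(messages):
    """Re-key sender-keyed messages by recipient: one output per recipient.
    (Hypothetical reading of the 'explode' step; field names are invented.)"""
    for msg in messages:
        for recipient in msg["to"]:
            yield recipient, {"from": msg["from"], "time": msg["time"], "to": recipient}


by_sender = [
    {"from": "alice", "time": 1, "to": ["bob", "carol"]},
    {"from": "bob", "time": 2, "to": ["alice"]},
]
by_recipient = list(explode(by_sender))
print([key for key, _ in by_recipient])  # ['bob', 'carol', 'alice']
```

Writing the same data under two keys trades extra traffic for cheap lookups in both directions, which is presumably why the diagram has separate "query by sender" and "query by recipient" paths.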
  54. Looking backwards
  55. [Diagram: web site, auth service, upload service, image extractor, transcoder, user profiles, search, user action logging, recommendation analysis, and video metadata, with stores: several MySQL databases, Oracle, Solr, Elastic, and files]
  57. [Diagram detail: upload service, image extractor, transcoder, and video metadata, with MySQL and file stores]
  58. [Diagram: upload service, thumbnail extractor, transcoder, and video metadata, each with its own files, connected by the streams uploads, thumbs, recodes, and video adds]
  59. Micro-service Diagram [Diagram: upload service, thumbnail extractor, transcoder, and video metadata; raw files, video files, and image files; streams: uploads, thumbs, recodes]
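The micro-service diagram on slide 59 can be mimicked with in-memory topics to show the shape of the interaction: services never call each other, they only publish to and read from named streams (uploads, thumbs, recodes), and messages carry references to files rather than file contents. Everything here is a hypothetical stand-in (a dict of lists instead of a durable stream, invented file paths, per-consumer offsets kept in memory); it illustrates the topology, not any product's API.

```python
from collections import defaultdict

# Hypothetical stand-in for durable streams: topic name -> list of messages.
topics = defaultdict(list)
offsets = defaultdict(int)  # (consumer, topic) -> next offset to read


def publish(topic, msg):
    topics[topic].append(msg)


def poll(consumer, topic):
    """Each consumer reads a topic independently from its own offset."""
    start = offsets[(consumer, topic)]
    msgs = topics[topic][start:]
    offsets[(consumer, topic)] = start + len(msgs)
    return msgs


def upload_service(video_id):
    # Storing the raw file is elided; the message carries only a reference.
    publish("uploads", {"id": video_id, "raw": f"raw/{video_id}.mov"})


def thumbnail_extractor():
    for m in poll("thumbnailer", "uploads"):
        publish("thumbs", {"id": m["id"], "thumb": f"images/{m['id']}.jpg"})


def transcoder():
    for m in poll("transcoder", "uploads"):
        publish("recodes", {"id": m["id"], "video": f"video/{m['id']}.mp4"})


def video_metadata(db):
    for topic in ("thumbs", "recodes"):
        for m in poll("metadata", topic):
            db.setdefault(m["id"], {}).update({k: v for k, v in m.items() if k != "id"})


db = {}
upload_service("v1")
thumbnail_extractor()
transcoder()
video_metadata(db)
print(db)  # {'v1': {'thumb': 'images/v1.jpg', 'video': 'video/v1.mp4'}}
```

Because the thumbnail extractor and transcoder each keep their own offset on "uploads", either one can be stopped, restarted, or scaled without the upload service noticing, provided the streams themselves are durable.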
  60. Real World Implications
     • Messaging must be durable and infrastructural
       – Can't depend on sender or receiver actually running
     • Messages aren't great for everything
       – 1TB message?
     • We need (scalable) files
     • We need (scalable) tables
     • We need (scalable) streams
     • We still should isolate persistence if possible
  61. The Third Replatforming
     • From 1970-1995 … relational database
     • From 1991-2005 … Internet
     • From 2005-? … flow-based, streaming computing
  62. Where does this go?
  63. General Questions to Ponder
     • What are the consequences of listening to customers?
       – Really listening?
     • We are willing to pay people to listen to us
       – Did we want that? Are the fears rational?
     • Will more data, better algorithms lead to a "cuddly" internet?
  64. Will Flink be at the core of this revolution?
  65. Will Flink be at the core of this revolution? It could be
  66. Will Flink be at the core of this revolution? It could be. Or not.
  67. It really depends on us. Everyone here. How can we drive adoption?
  68. The Lessons
     • Flink was built for the future
     • It is right in the core of these changes happening now
     • But what got Flink here isn't enough to get it there
     • Large-scale production adoption is the key
  70. Streaming Architecture by Ted Dunning and Ellen Friedman, © 2016 (published by O'Reilly): free signed hard copies at the MapR booth at Flink Forward; http://bit.ly/mapr-ebook-streams
  71. Short Books by Ted Dunning & Ellen Friedman
     • Published by O'Reilly in 2014-2016
     • For sale from Amazon or O'Reilly
     • Free e-books currently available courtesy of MapR; download PDFs: mapr.com/ebooks-pdf
  72. Thank You!
  73. Q&A: @mapr, maprtech, tdunning@maprtech.com. Engage with us! MapR, maprtech, mapr-technologies