How to manage large amounts of data with Iteratee - ScalaDays Berlin 2014

3,082 views

Published on

How to manage a large file with HTTP? Is it possible to manage data stream when you are in a HTTP POST request? How to be relax when managing data stream (ie : not use while(true) hack or something I/O block)?

This talk is about Play! iteratee and how to use it on real project : what is an iteratee? why it's useful? how to use it? how to manage it? is it clean? is your code readable or not?

http://scaladays.org/#schedule/How-to-manage-large-amounts-of-data-with-Iteratee

Published in: Technology

How to manage large amounts of data with Iteratee - ScalaDays Berlin 2014

  1. 1. HOW TO MANAGE LARGE AMOUNTS OF DATA WITH ITERATEE BY @WAXZCE – QUENTIN ADAM SCALA DAYS 2014 BERLIN
  2. 2. MY DAY TO DAY WORK : CLEVER CLOUD, MAKE YOUR APP RUN ALL THE TIME
  3. 3. And learn a lot of things about your code, apps, and good/bad design… KEEP YOUR APPS ONLINE. MADE WITH NODE.JS, SCALA, JAVA, RUBY, PHP, PYTHON, GO…
  4. 4. AND LEARN A LOT OF THINGS ABOUT YOUR CODE, APPS, AND GOOD/BAD DESIGN…
  5. 5. WHY STREAMS ARE SO TRENDY THESE DAYS?
  6. 6. 2 MAIN REASONS
  7. 7. DATA ARE COMING BIGGER AND BIGGER
  8. 8. WE NEED TO ACT WHILE DATA TRANSFERRING IS OCCURRING
  9. 9. &
  10. 10. WE ARE NOW USING LOTS OF PERMANENT CONNECTION
  11. 11. EXAMPLE
  12. 12. WHAT IS INSIDE AN HTTP REQUEST ? Verb • The action Resource • The object of the action Headers • The context of the action Body • Optional • The datas
  13. 13. IN MANY CASE THE REQUEST IS MANIPULATE ALL FROM MEMORY
  14. 14. UPLOADING A 20 GB FILE ON A POST REQUEST AND STORE IT EXAMPLE
  15. 15. IT’S A BAD IDEA TO PUT THE BODY PART IN MEMORY
  16. 16. CREATE A TEMP FILE TO STORE THE DATA
  17. 17. Client upload file Create a temp file Send a file in a backend Then answer to the client The file is transferring two times
  18. 18. NOT SO GOOD
  19. 19. Client upload file Directly stream it to the backend Then answer to the client
  20. 20. LET’S ACT ON STREAM!
  21. 21. CLASSIC JAVA STREAM MANAGEMENT
  22. 22. CLASSIC JAVA STREAM MANAGEMENT • Buffers • Buffer management • Buffer exception • Buffer overflow
  23. 23. CLASSIC JAVA STREAM MANAGEMENT • Low performances if not buffered • Not modular • Thread blocking • Code is ugly • No back pressure • Error handling is bad • i/o management and business code are mixed
  24. 24. I CAN’T SLEEP IN THE NIGHT
  25. 25. WE HAVE TO FIND A BETTER WAY
  26. 26. THAT’S WHY WE NEED ITERATEE
  27. 27. DATA STRUCTURES TO EXPRESS DATA STREAM MANAGEMENT
  28. 28. Like a recipe Consume the data ITERATEE : HOW TO MANAGE A STREAM
  29. 29. Produce the data ENUMERATOR : DATA STREAM
  30. 30. Set of tools to do cool things with Iteratee and Enumerator ENUMERATEE
  31. 31. SIMPLE EXAMPLE TO START
  32. 32. AN ENUMERATOR
  33. 33. AN ITERATEE
  34. 34. INPUTS • EOF • Empty • El
  35. 35. ITERATEE STATES • Done • Error • Cont
  36. 36. AN ITERATEE
  37. 37. AN ITERATEE TO MANAGE THE STREAM
  38. 38. AN ITERATEE
  39. 39. ITERATEE COMPOSITION
  40. 40. PURE FUNCTIONAL
  41. 41. TYPE SAFE
  42. 42. COMPOSITION
  43. 43. SO WE WILL JUST MANAGE THE BODY STREAM JUST DO NOT REWRITE HTTP PARSER
  44. 44. CREATE TEMP FILE Built in on play with
  45. 45. CREATE TEMP FILE Built in on play with
  46. 46. BODY PARSERS REQUEST HEADERS -> ITERATEE[ARRAY[BYTE], EITHER[SIMPLERESULT, ?]]
  47. 47. GET FILE AND CALCULATE HASH FROM CHUNK
  48. 48. DEMO
  49. 49. I WANT TO USE FORMS
  50. 50. SO PARTHANDLER ALLOW YOU TO COMPOSE A BODYPARSER ITERATEE <3 COMPOSITION
  51. 51. NOW WE CAN ACCESS TO STREAMS 
  52. 52. DATA STORAGE
  53. 53. I HATE FILE SYSTEM
  54. 54. I HATE FILE SYSTEM
  55. 55. DO NOT USE THE FILE SYSTEM AS A DATASTORE File system are POSIX compliant • POSIX is ACID • POSIX is powerful but is a bottleneck • File System is the nightmare of ops • File System creates coupling (host provider/OS/language) • SPOF-free multi tenant File System is a unicorn
  56. 56. • Key = hash of the chunk STORE FILE IN CHUNK INSIDE K/V STORE
  57. 57. STORE AS META DATA THE LIST OF CHUNKS
  58. 58. PROCESS File is post by browser Stream is chunk nomalized Each chunk is async store in kv List of chunk hash is store with file id No overhead for data store 
  59. 59. NEW DEMO • Play 2 + iteratee • Datastore is riak 
  60. 60. ARE YOU CONVINCE TO USE STREAM ?
  61. 61. DATA STREAMING PROGRAMMING IS NOW TRENDING
  62. 62. THX TO @CLEMENTD CLEMENT DELAFARGUE
  63. 63. I’m @waxzce on twitter I’m the CEO of A PaaS provider, give it a try ;-) THX FOR LISTENING & QUESTIONS TIME Coupon for Clever Cloud trial : berlin_scala_days

×