Stream Collections - Scala Days

These are the slides from the talk I gave at ScalaDays SF 2015.

Published in: Software

  1. Streams as Scala Collections: S3 Scala Client with Play Iteratees and Composable Operations. Greg Silin, Platform Engineer, @gregsbriefs. www.github.com/nitro/streamcollections. ScalaDays 2015
  2. Agenda: • Reactive at Nitro • Smart Documents at Scale • Motivation for Streaming Collections • Building Streams with Iteratees • Streams as Scala Collections • Applications • Questions
  3. The Old Way: Create & Prepare on the Desktop; Print Document; Sign Printed Document; Scan into Computer
  4. Knowledge workers spend approximately 11+ hours a week creating and managing documents. The New Way: Create; Prepare; Sign (Anywhere)
  5. Nitro accelerates the way businesses create, prepare, and sign documents. Anytime and anywhere. Smarter Documents for Everyone™
  6. Reactive Systems at Nitro: react to user expectations <- responsive; react to state changes <- message driven; react to variable load <- elastic; react to failure <- resilient
  7. Smart Documents at Scale: multiple pages and formats per document
  8. Smart Documents at Scale: each action (render, sign, approve, ...) results in a new document version
  9. Smart Documents at Scale: documents / second * versions / document * pages / version = billions of objects in S3
  10. Smart Documents at Scale: millions of new document uploads a day; 100M+ document state changes a day, resulting in 10x messages; billions of objects in S3
  11. Motivation for Streaming Collections: counting, copying, extracting, and cleanup become non-trivial at scale
  12. Motivation for Streaming Collections: 1 percent error margin = 10M objects. That’s money for the business
  13. How? How do we traverse the data?
  14. How? Command line tools don’t provide flexibility / scale
  15. How? Can’t load everything in memory; command line tools don’t provide flexibility / scale
  16. How? Can’t load everything in memory; need some batched solution; command line tools don’t provide flexibility / scale
  17. How? Amazon S3 SDK has a Java key iterator
  18. How? Amazon S3 SDK has a Java key iterator
  19. How? ... Amazon S3 SDK has a Java key iterator
  20. But we are Scala engineers! How?
  21. How? Streaming is a natural fit; Amazon SDK has a Java key iterator
  22. How? Streaming is a natural fit; we are reactive; Amazon SDK has a Java key iterator
  23. How? Streaming is a natural fit; Amazon SDK has a Java key iterator; thus asynchronous streams; we are reactive
  24. How? Streaming is a natural fit; Amazon SDK has a Java key iterator; thus asynchronous streams; we are reactive; can’t over-parallelize
  25. What Streams? Enter Play Iteratees: Enumerator - Source; Enumeratee - Transformer; Iteratee - Consumer / Sink
  26. Building Streams with Iteratees: Why Play Iteratees?
  27. Building Streams with Iteratees: Why Play Iteratees? Most mature technology at the time
  28. Building Streams with Iteratees: Why Play Iteratees? Most mature technology at the time; production experience
  29. Building Streams with Iteratees: Play Iteratees via a counting example
  30. Building Streams with Iteratees: Enumerator = Source
  31. Building Streams with Iteratees: Enumeratee = Transformer
  32. Building Streams with Iteratees: Iteratee = Sink / Reduce
  33. Building Streams with Iteratees: Tying things together...
  34. Building Streams with Iteratees: Can this be simplified?
  35. Streams as Scala Collections: We are all familiar with Scala collections
  36. Streams as Scala Collections: We are all familiar with Scala collections: map, filter, foreach, grouped, count
  37. Streams as Scala Collections: Can reason about iteratee streams as a collection
  38. Streams as Scala Collections: Can now redo our grouped & count example
  39. Streams as Scala Collections: Can now redo our grouped & count example
  40. Streams as Scala Collections: With the internals hidden, my counting code becomes simple
  41. Streams as Scala Collections - Examples: Cleaning up files
  42. Streams as Scala Collections - Examples: Extract data by date
  43. Streams as Scala Collections - Applications: Can extend this model onto other data sources. We don’t have to stop at S3: ➔ Relational DB ➔ ElasticSearch ➔ HBase / Cassandra ➔ Spark
  44. "Much of my work has come from being lazy." - John Backus. Quoted in the IBM employee magazine Think in 1979 (http://en.wikiquote.org/wiki/John_Backus)
  45. What We Learned: Iteratees are good for traversing large volumes of data; programming iteratees can get a bit tricky; scaling ain’t easy; the Stream Collections abstraction makes streams simple
  46. Future of Streams as Scala Collections: Continue developing a reactive S3 Client; in use in Nitro Production; introduce other stream implementations (akka streams, etc.)
  47. Open Sourcing: www.github.com/nitro/streamcollections. Contributors: www.github.com/gregsilin / @gregsbriefs; www.github.com/mkolod / @marekinfo. Are you interested? We welcome collaborators!
  48. San Francisco Scala Days 2015: • Nitro is a Gold sponsor • Meet us at our community booth. sfscala.org: • Wed: Scala D’Ehs meetup @ Stock in Trade • Thu: unconference @ Galvanize • Thu evening: Spark Notebook & Rapture @ Nitro • Fri: free Shapeless training @ Nitro
  49. We Are Hiring! gonitro.com/about/jobs
  50. Questions? @gregsbriefs greg.silin@gonitro.com
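
The code shown on slides 29 to 42 did not survive in this transcript. As a rough illustration of the idea only, here is a minimal sketch of a collection-style wrapper over an asynchronous, batched stream. It uses just the Scala standard library: it is not the Play Iteratees API and not the streamcollections API. The names `StreamCollection`, `fromIterator`, `CountDemo`, and `fakeKeys` are all hypothetical, and the keys are made up rather than listed from S3.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical wrapper (not the streamcollections API): a stream is a function
// that asynchronously pulls the next batch, or None once the source is exhausted.
final class StreamCollection[A](pull: () => Future[Option[Vector[A]]])(implicit ec: ExecutionContext) {

  // Transformers: rewrite each batch as it passes through; nothing runs yet.
  def map[B](f: A => B): StreamCollection[B] =
    new StreamCollection[B](() => pull().map(_.map(_.map(f))))

  def filter(p: A => Boolean): StreamCollection[A] =
    new StreamCollection[A](() => pull().map(_.map(_.filter(p))))

  // Sink: fold batch by batch. The next batch is requested only after the
  // previous one has been folded, so one batch is in memory at a time.
  def foldLeft[B](zero: B)(f: (B, A) => B): Future[B] = {
    def loop(acc: B): Future[B] = pull().flatMap {
      case Some(batch) => loop(batch.foldLeft(acc)(f))
      case None        => Future.successful(acc)
    }
    loop(zero)
  }

  def count: Future[Long] = foldLeft(0L)((n, _) => n + 1)
}

object StreamCollection {
  // Source: wraps any iterator (for example a paged key listing) in batches.
  // The iterator is not thread-safe, so the result supports a single consumer.
  def fromIterator[A](it: Iterator[A], batchSize: Int)(implicit ec: ExecutionContext): StreamCollection[A] = {
    val batches = it.grouped(batchSize)
    new StreamCollection[A](() => Future {
      if (batches.hasNext) Some(batches.next().toVector) else None
    })
  }
}

object CountDemo {
  import ExecutionContext.Implicits.global

  // Made-up keys standing in for an S3 listing; every fourth key is a PDF.
  def fakeKeys: Iterator[String] =
    Iterator.tabulate(1000)(i => if (i % 4 == 0) s"doc-$i.pdf" else s"doc-$i.png")

  // Reads like a collection call chain: source, transformer, sink.
  def countPdfs(): Long =
    Await.result(
      StreamCollection.fromIterator(fakeKeys, batchSize = 100).filter(_.endsWith(".pdf")).count,
      10.seconds)
}
```

The three roles from slide 25 map onto this sketch as follows: `fromIterator` plays the source, `map` and `filter` play the transformers, and `foldLeft` and `count` play the sink. Blocking with `Await` is only there to keep the demo short; a real client would presumably compose the returned `Future` instead.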
