Road Trip to Component
Marketa Adamova
NOMNOM INSIGHTS
Technical challenges
• Import large amounts of data in various formats
• Process, analyse and store data
• Present data to user
• Fast search and stats on data
Reasons for change
• Missing NLP libraries
• Better data storage
• Performance issues
• Prototype app
Taking the Clojure turn
Why Clojure?
• JVM
• Java Interop
• Concurrent processing
• Data representation
• Fun to write
Lukasz Korecki (CTO at NomNom)
“We moved to Clojure because of JVM and we stayed for everything else.”
Main roadblocks
• Correct JVM setup
• Application structure
• Managing shared application state
How to structure a Clojure application
Organising Clojure code
• Namespaces
• Extract code to libraries
• Protocols
Protocol & types
• Mechanism for abstraction
• Polymorphism
• Boundaries between subsystems
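A minimal sketch of the protocol + type + constructor pattern, using only clojure.core. The names `DocumentStore`, `InMemoryStore` and `in-memory-store` are hypothetical and chosen for illustration; they are not taken from the NomNom codebase.

```clojure
(ns example.storage)

;; The protocol is the specification only: names and arities, no implementation.
(defprotocol DocumentStore
  (save-doc [store doc] "Stores doc and returns the updated store.")
  (find-doc [store id] "Returns the doc with the given id, or nil."))

;; A record implements the protocol and still behaves like a map.
(defrecord InMemoryStore [docs]
  DocumentStore
  (save-doc [store doc]
    (assoc-in store [:docs (:id doc)] doc))
  (find-doc [_ id]
    (get docs id)))

;; Constructor function: builds the value and performs no side effects.
(defn in-memory-store []
  (->InMemoryStore {}))

(def store (save-doc (in-memory-store) {:id 1 :text "Great product"}))

(find-doc store 1)
```

Callers depend only on `DocumentStore`, so a different backend could be swapped in behind the same boundary without changing them.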
Handling state in Clojure
Mutable state
• https://clojure.org/about/state
• State = value associated with an identity at a given time
• Memory cache, concurrent programming, …
• Atoms, refs, agents
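The identity/value split can be sketched with an atom used as a small memory cache. This is an illustrative example only; `cache` and `cached` are hypothetical names.

```clojure
(ns example.cache)

;; The atom is the identity; the map it holds at any moment is its state.
(def cache (atom {}))

(defn cached
  "Returns the cached value for k, computing and storing it on a miss."
  [k compute]
  (if-let [entry (find @cache k)]
    (val entry)
    (let [v (compute k)]
      (swap! cache assoc k v)
      v)))

;; Dereferencing yields an immutable value that later updates never change.
(def snapshot @cache)

(cached "feedback" count)        ; miss: computes the value and stores it
(cached "feedback" (fn [_] -1))  ; hit: the compute function is not called
```

After both calls `snapshot` is still the empty map, while `@cache` holds the stored entry: the identity moved on to a new state, and the old value stayed the same.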
Shared state
• Accessible from various namespaces
• Open connections and channels
• Globally accessible configuration
• Mutability not required during runtime
Application configuration
• Function to load env variable
• Configuration in single atom
• Config library
• Mount
• Component
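The first two options could look roughly like the sketch below. The variable names `EXAMPLE_DB_URL` and `EXAMPLE_HTTP_PORT`, the default values and the function names are assumptions made for illustration.

```clojure
(ns example.config)

(defn env
  "Reads an environment variable, falling back to default when it is unset."
  [var-name default]
  (or (System/getenv var-name) default))

;; The whole configuration lives in a single atom, filled in at startup.
(defonce config (atom nil))

(defn load-config!
  "Reads the environment at call time and stores the result in the atom."
  []
  (reset! config
          {:db-url    (env "EXAMPLE_DB_URL" "postgres://localhost/dev")
           :http-port (Integer/parseInt ^String (env "EXAMPLE_HTTP_PORT" "8080"))}))

(load-config!)

(:http-port @config)
```

Reading the environment inside `load-config!` rather than in a top-level `def` keeps the lookup at runtime. A top-level `def` is evaluated when the namespace is loaded, which with ahead-of-time compilation can be at build time.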
Stuart Sierra’s Component
What is ‘Component’?
• managing lifecycle and dependencies of components with runtime state
• db access, external API services, web server
• system of components
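A small system of two components might be sketched as follows, assuming the `com.stuartsierra/component` library is on the classpath. `Database`, `Worker`, `new-system` and the fake connection map are hypothetical stand-ins for real resources.

```clojure
(ns example.system
  (:require [com.stuartsierra.component :as component]))

;; Hypothetical database component: holds a "connection" only while started.
(defrecord Database [url connection]
  component/Lifecycle
  (start [this]
    (assoc this :connection {:connected-to url}))
  (stop [this]
    (assoc this :connection nil)))

;; Hypothetical worker that declares a dependency on the database.
(defrecord Worker [database running?]
  component/Lifecycle
  (start [this]
    (assoc this :running? true))
  (stop [this]
    (assoc this :running? false)))

;; The system map is the single place where all stateful parts are wired up.
(defn new-system [config]
  (component/system-map
    :database (map->Database {:url (:db-url config)})
    :worker   (component/using (map->Worker {}) [:database])))

;; Starting the system starts :database first, then injects it into :worker.
(def system
  (component/start (new-system {:db-url "postgres://localhost/dev"})))

(get-in system [:worker :database :connection])
```

Calling `(component/stop system)` stops the components in reverse dependency order and returns the stopped system.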
How it solves our problem
• Enforcing structure in code
• State defined in single place
• Better visibility of system
Testing
• Mock components
• Integration tests for complex flow
• E2E test
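A mock component can be sketched as a second implementation of the same protocol. The example uses only clojure.core; `FeedbackClient`, `MockFeedbackClient` and `count-feedback` are hypothetical names.

```clojure
(ns example.testing)

;; Boundary to a hypothetical third-party feedback API.
(defprotocol FeedbackClient
  (fetch-feedback [client account-id]))

;; Mock component: a stub implementation that returns canned data.
(defrecord MockFeedbackClient [responses]
  FeedbackClient
  (fetch-feedback [_ account-id]
    (get responses account-id [])))

;; The code under test receives the client as an injected dependency.
(defn count-feedback [client account-id]
  (count (fetch-feedback client account-id)))

(def test-client
  (->MockFeedbackClient {42 [{:text "Love it"} {:text "Too slow"}]}))

(count-feedback test-client 42)
```

Because the dependency is passed in rather than looked up globally, the test needs no `with-redefs`. The same mock could be placed into a test system map in place of the real client.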
REPL interaction
• Define development system
• Multiple systems in single JVM
• No need to restart REPL
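One possible shape for a development system is sketched below, assuming the `com.stuartsierra/component` library is available. `Cache`, `dev-system`, `start!`, `stop!` and `restart!` are hypothetical helpers, and code reloading is left out of the sketch.

```clojure
(ns example.dev
  (:require [com.stuartsierra.component :as component]))

;; Hypothetical component whose runtime state is a fresh atom per start.
(defrecord Cache [store]
  component/Lifecycle
  (start [this] (assoc this :store (atom {})))
  (stop [this] (assoc this :store nil)))

(defn dev-system []
  (component/system-map :cache (map->Cache {})))

;; Holds the currently running development system, if any.
(def system nil)

(defn start! []
  (alter-var-root #'system (fn [_] (component/start (dev-system)))))

(defn stop! []
  (alter-var-root #'system #(some-> % component/stop)))

(defn restart!
  "Stops the running system and starts a fresh one in the same JVM."
  []
  (stop!)
  (start!))
```

Since `dev-system` returns a new value on every call, a second system can be started alongside the first in the same REPL session, for example to compare two versions.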
Production
• Avoid accessing the production system!
• Visualise system & strong subsystem boundaries
• Debugging
• Add ad hoc components when required
The bad parts …
• “all or nothing”
• Failures during system startup
• Trying to use “wrappers”
• Integration with other libraries
• OO approach
End of journey
Resources
https://clojure.org/index
https://github.com/stuartsierra/component
https://github.com/tolitius/mount
http://www.joyofclojure.com
http://thinkrelevance.com/blog/2013/11/07/when-should-you-use-clojures-object-oriented-features
https://purelyfunctional.tv/issues/clojure-gazette-180-how-do-you-structure-your-apps/
https://cb.codes/organizing-clojure-projects-and-libraries/
… and lots of other Clojure talks, articles and discussions
Questions?

Editor's Notes

  • #2 Welcome; introduce myself; quick overview of what the talk will be about.
  • #3 Intro to NomNom: a place to gather all your customer feedback … explain what it does … Founded April 2015, live November 2016. Some of our customers: Usertesting.com, Wix, Magento Analytics, RJMetrics, Sumologic, etc. Very small team of 7 people, distributed across 2 continents & 5 countries.
  • #4 Data import: each API is different; the amount of data. TODO: add stats about traffic (number of connected integrations, number of requests we make). Analyse & process: keep useful data. Present data: UX is important; maybe ClojureScript in future :) Search data: TODO add stats about performance + search query time.
  • #5 Easier to understand with visual input: play video. Search by keywords, the source of customer feedback and even the type of user who left the feedback.
  • #6 Original data model (simplified version): users connect integrations (OAuth, API keys) and we download data on their behalf. Data is stored in both PG and ES: query ES and retrieve docs from PG. In production we have all the extra infra stuff (RabbitMQ, Redis, webhooks, schedulers, load balancers, StatsD, S3).
  • #7 NLP libraries: not much support compared to Python or Java. Data model: Postgres + JSONB; it turns out that frequently creating and updating millions of JSON objects can put a lot of strain on the database. Performance: Ruby is slow and single-threaded … bad for async/concurrent processing; growing number of integrations/traffic. Prototype: how long should you keep your original data model? Rails serves well in areas you don’t want to worry about (user management, billing).
  • #8 Right tool for the job! Migration is not a single step but rather many small ones!
  • #9 JVM: battle-tested, easy to get running quickly. Java interop: NLP libraries. Concurrency: multithreading without much overhead. Data: immutability. Fun: who wants to write Java? Newbies: no previous experience in the team with writing web servers in Clojure; you don’t have to care that much about the real definition of pure functions, monads, etc.; code is easy to write and reason about, and has great performance.
  • #10 TODO spelling transducers
  • #11 Migration 1 - few NLP services
  • #12 Migration 2: main data processing/storing model in Clojure; move document storage to RethinkDB. Around this time I joined. The next challenge was to move the workers. As we started adding more services we started to see common issues.
  • #13 JVM: crashed on first startup; over-provisioned machines; misconfigured thread pool settings in Jetty; mixed high CPU with a lot of IO; we had a memory leak due to a bug in regexes (different syntax than in Ruby); you need at least basic tuning in place. Structure: no hand-holding like in Rails; not many books on how to structure large applications; people new to Clojure. State: why we can’t get away with pure functions; larger applications in production need state; the Clojure approach to the problem.
  • #14 Structure your code; structure the business logic. Stuart Halloway said ‘If your application is more than 2 weeks old your biggest problem is complexity of your code’.
  • #15 Namespaces: a specific set of related data & functions; separate functionality by comment blocks. Libraries: extract code to Clojure libraries. Protocols: create boundaries in the system.
  • #16 Protocol: specification only, no implementation; polymorphic functions + protocol object; a single type can implement multiple protocols. Dynamic polymorphism: dynamic, no compile effect; generates an interface with the same ns; a function can be used on multiple data types or behave differently based on an additional argument; dispatch on class type (90% of use cases of multimethods); multimethod = runtime polymorphism (dispatch on function); higher-level abstraction/organization. deftype vs defrecord: a record gives you a hash-map.
  • #17 Protocol & type & constructor. deftype vs defrecord: a record gives you a hash-map.
  • #18 Protocol & type & constructor. deftype vs defrecord: a record gives you a hash-map.
  • #19 State = value at a given time. I’ll be talking here about mutable state & shared state.
  • #20 In the past: state = the content of this memory block. Identity = has a state, exactly one at any point in time; the state does not change, the identity can have a new state! This is the Clojure model (Rich Hickey). Why we need state. Clojure has a great solution.
  • #21 GMS = global & mutable state (it can be, but do we need it?). Shared state = accessible from various namespaces. Examples. Mutability: usually defined on application startup and then not required to change!
  • #22 First attempt at handling the shared config …
  • #25 Library: avoids code duplication; issues: compile-time vs runtime values (TODO add code), opening too many channels to RabbitMQ. Mount: a more Clojure-like approach, only solving one problem. Component: was a good fit as we were still writing many new services (explain later why full buy-in is required).
  • #26 Back to our application … Multiple Clojure services; service-oriented rather than microservices; 90% of data processing done in Clojure; still to be migrated: legacy integrations from Ruby; again, the additional infra setup is not visible. Topic for another talk: overhead/advantages of managing multiple services rather than a single monolith.
  • #27 A single Clojure service does a lot of stuff: workers = logic; clients = connect to 3rd parties, fetch/store/send data; initiate jobs (web server & schedulers). How to structure, configure and manage?
  • #30 'Component' is a tiny Clojure framework for managing the lifecycle and dependencies of software components which have runtime state. This is primarily a design pattern with a few helper functions. It can be seen as a style of dependency injection using immutable data structures. All stateful parts are gathered together (rather than scattered atoms); related entities are grouped together; good REPL.
  • #31 Protocol & type & constructor. deftype vs defrecord: a record gives you a hash-map.
  • #32 The 3 parts: protocol (defines your interface) + type + constructor fn. No state changes in the constructor fn.
  • #35 Structure: using protocols, people can follow a pattern + boundaries. Visibility: system map of components, libraries for visualisation (too many cross-dependencies). State: configuration + open connections/channels.
  • #36 Mock = stub implementation (better than redefs). The cost of creating and starting a system is low enough. A mock component can do real things. E2E: set up a test system with a mix of real & test components.
  • #37 ;; simplify setup ;; start in dependencies
  • #39 ;; simplify setup ;; start in dependencies New instance of system for each test
  • #40 ;; simplify setup ;; start in dependencies New instance of system for each test
  • #41 Dev system: system map of components (replace those which need to be mocked). Multiple systems: test two versions. REPL restart = JVM restart (takes long); Component makes it unnecessary => rapid development cycle.
  • #43 Accessing production = bad idea, as you can reset the whole system. You know what state is running in each component. Debugging = nREPL. Ad hoc components = data migration, long-running job.
  • #46 Use the right tool for the right job. Async processing is great. Use the approach which suits you.
  • #47 TODO