Persistent Data Structures And Managed References


Published on

These are slide by Rich Hickey from a talk he did at QCon.

Published in: Technology
  • Persistent Data Structures And Managed References
    Are you sure you want to  Yes  No
    Your message goes here
  • Video of the talk that accompanied these slides is here:
    Are you sure you want to  Yes  No
    Your message goes here
No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Persistent Data Structures And Managed References

  1. 1. Persistent Data Structures and Managed References Clojure’s approach to Identity and State Rich Hickey
  2. 2. Agenda • Functions and processes • Identity, State, and Values • Persistent Data Structures • Clojure’s Managed References • Q&A
  3. 3. Clojure Fundamentals • Dynamic • Functional • emphasis on immutability • Supporting Concurrency • Hosted on the JVM • Compiles to JVM bytecode • Not Object-oriented • Ideas in this talk are not Clojure- specific
  4. 4. Functions • Function • Depends only on its arguments • Given the same arguments, always returns the same value • Has no effect on the world • Has no notion of time
  5. 5. Functional Programming • Emphasizes functions • Tremendous benefits • But - most programs are not functions • Maybe compilers, theorem provers? • But - They execute on a machine • Observably consume compute resources
  6. 6. Processes • Include some notion of change over time • Might have effects on the world • Might wait for external events • Might produce different answers at different times (i.e. have state) • Many real/interesting programs are processes • This talk is about one way to deal with state and time in the local context
  7. 7. State • Value of an identity at a time • Sounds like a variable/field? • Name that takes on successive ‘values’ • Not quite: • i=0 • i = 42 • j=i • j is 42? - depends
  8. 8. Variables • Variables (and fields) in traditional languages are predicated on a single thread of control, one timeline • Adding concurrency breaks them badly • Non-atomicity (e.g. of longs) • volatile, write visibility • Composite operations require locks • All workarounds for lack of a time model
  9. 9. Time • When things happen • Before/after • Later • At the same time (concurrency) • Now • Inherently relative
  10. 10. Value • An immutable magnitude, quantity, number... or composite thereof • 42 - easy to understand as value • But traditional OO tends to make us think of composites as something other than values • Big mistake • aDate.setMonth(“January”) - ugh! • Dates, collections etc are all values
  11. 11. Identity • A logical entity we associate with a series of causally related values (states) over time • Not a name, but can be named • I call my mom ‘Mom’, but you wouldn’t • Can be composite - the NY Yankees • Programs that are processes need identity
  12. 12. State • Value of an identity at a time • Why not use variables for state? • Variable might not refer to a proper value • Sets of variables/fields never constitute a proper composite value • No state transition management • I.e., no time coordination model
  13. 13. Philosophy • Things don't change in place • Becomes obvious once you incorporate time as a dimension • Place includes time • The future is a function of the past, and doesn’t change it • Co-located entities can observe each other without cooperation • Coordination is desirable in local context
  14. 14. Race-walker foul detector • Get left foot position • off the ground • Get right foot position • off the ground • Must be a foul, right?
  15. 15. • Snapshots are critical to perception and decision making • Can’t stop the runner/race (locking) • Not a problem if we can get runner’s value • Similarly don’t want to stop sales in order to calculate bonuses or sales report
  16. 16. Approach • Programming with values is critical • By eschewing morphing in place, we just need to manage the succession of values (states) of an identity • A timeline coordination problem • Several semantics possible • Managed references • Variable-like cells with coordination semantics
  17. 17. Persistent Data Structures • Composite values - immutable • ‘Change’ is merely a function, takes one value and returns another, ‘changed’ value • Collection maintains its performance guarantees • Therefore new versions are not full copies • Old version of the collection is still available after 'changes', with same performance • Example - hash map/set and vector based upon array mapped hash tries (Bagwell)
  18. 18. Bit-partitioned hash tries
  19. 19. Structural Sharing • Key to efficient ‘copies’ and therefore persistence • Everything is immutable so no chance of interference • Thread safe • Iteration safe
  20. 20. Path Copying HashMap HashMap int count 16 int count 15 INode root INode root
  21. 21. Coordination Methods • Conventional way: • Direct references to mutable objects • Lock and worry (manual/convention) • Clojure way: • Indirect references to immutable persistent data structures (inspired by SML’s ref) • Concurrency semantics for references • Automatic/enforced • No locks in user code!
  22. 22. Typical OO - Direct references to Mutable Objects foo :a ? :b ? :c 42 :d ? :e 6 • Unifies identity and value • Anything can change at any time • Consistency is a user problem • Encapsulation doesn’t solve concurrency problems
  23. 23. Clojure - Indirect references to Immutable Objects foo :a "fred" :b "ethel" @foo :c 42 :d 17 :e 6 • Separates identity and value • Obtaining value requires explicit dereference • Values can never change • Never an inconsistent value • Encapsulation is orthogonal
  24. 24. Clojure References • The only things that mutate are references themselves, in a controlled way • 4 types of mutable references, with different semantics: • Refs - shared/synchronous/coordinated • Agents - shared/asynchronous/autonomous • Atoms - shared/synchronous/autonomous • Vars - Isolated changes within threads
  25. 25. Uniform state transition model • (‘change-state’ reference function [args*]) • function will be passed current state of the reference (plus any args) • Return value of function will be the next state of the reference • Snapshot of ‘current’ state always available with deref • No user locking, no deadlocks
  26. 26. Persistent ‘Edit’ foo :a "fred" :b "ethel" @foo :c 42 :d 17 :e 6 :a "lucy" :b "ethel" • :c 42 New value is function of old :d 17 • :e 6 Shares immutable structure • Doesn’t impede readers • Not impeded by readers
  27. 27. Atomic State Transition foo :a "fred" :b "ethel" :c 42 :d 17 :e 6 @foo :a "lucy" :b "ethel" • Always coordinated :c 42 :d 17 • Multiple semantics :e 6 • Next dereference sees new value • Consumers of values unaffected
  28. 28. Refs and Transactions • Software transactional memory system (STM) • Refs can only be changed within a transaction • All changes are Atomic and Isolated • Every change to Refs made within a transaction occurs or none do • No transaction sees the effects of any other transaction while it is running • Transactions are speculative • Will be retried automatically if conflict • Must avoid side-effects!
  29. 29. The Clojure STM • Surround code with (dosync ...), state changes through alter/commute, using ordinary function (state=>new-state) • Uses Multiversion Concurrency Control (MVCC) • All reads of Refs will see a consistent snapshot of the 'Ref world' as of the starting point of the transaction, + any changes it has made. • All changes made to Refs during a transaction will appear to occur at a single point in the timeline.
  30. 30. Refs in action (def foo (ref {:a "fred" :b "ethel" :c 42 :d 17 :e 6})) @foo -> {:d 17, :a "fred", :b "ethel", :c 42, :e 6} (assoc @foo :a "lucy") -> {:d 17, :a "lucy", :b "ethel", :c 42, :e 6} @foo -> {:d 17, :a "fred", :b "ethel", :c 42, :e 6} (commute foo assoc :a "lucy") -> IllegalStateException: No transaction running (dosync (commute foo assoc :a "lucy")) @foo -> {:d 17, :a "lucy", :b "ethel", :c 42, :e 6}
  31. 31. Implementation - STM • Not a lock-free spinning optimistic design • Uses locks, wait/notify to avoid churn • Deadlock detection + barging • One timestamp CAS is only global resource • No read tracking • Coarse-grained orientation • Refs + persistent data structures • Readers don’t impede writers/readers, writers don’t impede readers, supports commute
  32. 32. Agents • Manage independent state • State changes through actions, which are ordinary functions (state=>new-state) • Actions are dispatched using send or send- off, which return immediately • Actions occur asynchronously on thread- pool threads • Only one action per agent happens at a time
  33. 33. Agents • Agent state always accessible, via deref/@, but may not reflect all actions • Any dispatches made during an action are held until after the state of the agent has changed • Agents coordinate with transactions - any dispatches made during a transaction are held until it commits • Agents are not Actors (Erlang/Scala)
  34. 34. Agents in Action (def foo (agent {:a "fred" :b "ethel" :c 42 :d 17 :e 6})) @foo -> {:d 17, :a "fred", :b "ethel", :c 42, :e 6} (send foo assoc :a "lucy") @foo -> {:d 17, :a "fred", :b "ethel", :c 42, :e 6} ... time passes ... @foo -> {:d 17, :a "lucy", :b "ethel", :c 42, :e 6}
  35. 35. Atoms • Manage independent state • State changes through swap!, using ordinary function (state=>new-state) • Change occurs synchronously on caller thread • Models compare-and-set (CAS) spin swap • Function may be called more than once! • Guaranteed atomic transition • Must avoid side-effects!
  36. 36. Atoms in Action (def foo (atom {:a "fred" :b "ethel" :c 42 :d 17 :e 6})) @foo -> {:d 17, :a "fred", :b "ethel", :c 42, :e 6} (swap! foo assoc :a "lucy") @foo -> {:d 17, :a "lucy", :b "ethel", :c 42, :e 6}
  37. 37. Uniform state transition ;refs (dosync (commute foo assoc :a "lucy")) ;agents (send foo assoc :a "lucy") ;atoms (swap! foo assoc :a "lucy")
  38. 38. Summary • Immutable values, a feature of the functional parts of our programs, are a critical component of the parts that deal with time • Persistent data structures provide efficient immutable composite values • Once you accept immutability, you can separate time management, and swap in various concurrency semantics • Managed references provide easy to use and understand time coordination
  39. 39. Thanks for listening! Questions?