Evaluating Simulation Software Components with Player Rating Systems (SIMUTools 2013)
In component-based simulation systems, simulation runs are usually executed by combinations of distinct components, each solving a particular sub-task. If multiple components are available for a given sub-task (e.g., different event queue implementations), a simulation system may rely on an automatic selection mechanism, on a user decision, or, if neither is available, on a predefined default component. However, deciding upon a default component for each kind of sub-task is difficult: such a component should work well across various application domains and various combinations with other components. Furthermore, the performance of individual components cannot be evaluated easily, since performance is typically measured for component combinations as a whole (e.g., the execution time of a simulation run). Finally, the selection of default components should be dynamic, as new and potentially superior components may be deployed to the system over time. We illustrate how player rating systems for team-based games can solve the above problems and evaluate our approach with an implementation of the TrueSkill™ rating system (Herbrich et al., 2007), applied in the context of the open-source modeling and simulation framework JAMES II. We also show how such systems can be used to steer performance analysis experiments for component ranking.

The paper can be found here: https://docs.google.com/file/d/0BxPrl7QoBqmoUDVXNmZUc29Nbmc/edit?usp=sharing

Transcript

  • 1. Evaluating Simulation Software Components with Player Rating Systems. SIMUTools 2013, March 6, 2013. Jonathan Wienß, Michael Stein, Roland Ewald. Universität Rostock, Modeling & Simulation Research Group.
  • 2. Component-Based Simulation Systems
      • Simulator: combination of components
      • Typical components: event management, collision detection, state saving, result storage, random number generation, etc.
      • Example: JAMES II
      (Image: http://flickr.com/photos/jdhancock/7239958506, CC BY)
  • 3. Problem: Evaluating Individual Components
      • Only component combinations are comparable
      • Dedicated performance studies are expensive and difficult
      (Image: https://commons.wikimedia.org/wiki/File:Rowing_-_USA_Lwt_4_@_World_Champs_2003.jpg)
  • 4. Solution: Player Rating Systems
      [Figure: a performance comparison read as a multiplayer team result. Event queues {A, B} and simulators {SC, SD, SE} form component combinations; execution times of 15 s, 25 s, and 17 s induce the ranking 1. SC, 2. SE+B, 3. SD+A.]
      1. Component combination = team of players
      2. Record results (of multiple combinations)
      3. Update global component rating
      ⇒ Component rating systems, e.g., to find good default components; a minimal sketch follows below.
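To make the team analogy concrete, here is a minimal sketch built on the open-source `trueskill` Python package, an independent implementation of the rating system the paper builds on. The component names and runtimes are invented for illustration; this is not the JAMES II integration itself:

```python
import trueskill

# One persistent rating per component ("player"); names are hypothetical.
ratings = {name: trueskill.Rating() for name in
           ["EventQueue:A", "EventQueue:B", "Simulator:SC", "Simulator:SD"]}

def record_match(combinations, runtimes):
    """Treat each component combination as a team; a lower runtime means a
    better rank. Updates the shared component ratings. Note that standard
    TrueSkill expects pairwise disjoint teams, so no component may appear
    in two combinations of the same match (cf. slide 10)."""
    order = sorted(runtimes)
    ranks = [order.index(t) for t in runtimes]  # 0 = best
    teams = [tuple(ratings[c] for c in combo) for combo in combinations]
    new_teams = trueskill.rate(teams, ranks=ranks)
    for combo, new_team in zip(combinations, new_teams):
        for name, new_rating in zip(combo, new_team):
            ratings[name] = new_rating

# Two combinations solve the same simulation problem in 15 s and 25 s:
record_match([("EventQueue:A", "Simulator:SC"),
              ("EventQueue:B", "Simulator:SD")],
             runtimes=[15.0, 25.0])

for name, r in sorted(ratings.items(), key=lambda kv: -kv[1].mu):
    print(f"{name}: mu={r.mu:.2f}, sigma={r.sigma:.2f}")
```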
  • 5. Component Rating Systems
      • What is required?
      • How does it work?
      • How well does it work?
  • 6. Component Rating Systems: Requirements
      • Re-usable (system-independent)
      • Inexpensive (memory, execution time)
      • Scalable (w.r.t. components / component combinations)
      • Robust (w.r.t. ‘outlier problems’)
      • Adaptive (component updates)
  • 7. Microsoft’s TrueSkill™ Approach¹ (used for Xbox Live™)
      • Input:
        • Teams defined by player indices, e.g., $A_i = \{4, 8, 125\}$
        • Team assignment $A = \{A_1, \ldots, A_k\}$ (pairwise disjoint)
        • Team ranking $r$ (game result)
      • Output: player skill ratings $\mu_i$
      • Assumptions (sampled forward in the sketch below):
        • Player skill: $s_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$
        • Player performance: $p_i \sim \mathcal{N}(s_i, \beta^2)$
        • Team performance: $t_j = \sum_{i \in A_j} p_i$
      ¹ Herbrich, Minka, and Graepel: TrueSkill™: A Bayesian Skill Rating System. Advances in Neural Information Processing Systems 19, 2007.
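As a reading aid, the following sketch simply samples this generative model forward; it is not the rating update itself, and the prior means, standard deviations, and the β value are illustrative defaults rather than values from the paper:

```python
import random

BETA = 25.0 / 6  # performance variability beta; illustrative default

def sample_team_performance(team_priors):
    """team_priors: list of (mu_i, sigma_i) skill priors for one team A_j."""
    t_j = 0.0
    for mu_i, sigma_i in team_priors:
        s_i = random.gauss(mu_i, sigma_i)  # s_i ~ N(mu_i, sigma_i^2)
        p_i = random.gauss(s_i, BETA)      # p_i ~ N(s_i, beta^2)
        t_j += p_i                         # t_j = sum of p_i, i in A_j
    return t_j

# Two teams; the higher sampled team performance wins this simulated game.
t1 = sample_team_performance([(25.0, 8.3), (25.0, 8.3)])
t2 = sample_team_performance([(31.0, 4.0)])
print("team 1 wins" if t1 > t2 else "team 2 wins")
```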
  • 8. Bayesian Inference in TrueSkill
      $$p(\mathbf{s} \mid r, A) \;=\; \frac{P(r \mid \mathbf{s}, A)\, p(\mathbf{s})}{P(r \mid A)} \;=\; \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} p(\mathbf{s}, \mathbf{p}, \mathbf{t} \mid r, A)\, d\mathbf{p}\, d\mathbf{t}$$
      where $r$: ranking, $A$: team assignment, $\mathbf{s}$: player skills, $\mathbf{p}$: player performances, $\mathbf{t}$: team performances.
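To make the marginalization concrete, here is a brute-force rejection-sampling sketch of the same posterior: draw skills and performances from the generative model and keep only draws whose induced ranking matches the observed $r$. This is purely illustrative; TrueSkill computes the posterior by message passing instead (next slide):

```python
import random

BETA = 25.0 / 6  # illustrative, as before

def posterior_skill_means(priors, teams, observed_ranking, n=100_000):
    """Approximate E[s_i | r, A] by rejection sampling.
    priors: per-player (mu_i, sigma_i); teams: lists of player indices (A);
    observed_ranking: team indices, best first (r)."""
    kept = [[] for _ in priors]
    for _ in range(n):
        s = [random.gauss(mu, sigma) for mu, sigma in priors]
        p = [random.gauss(s_i, BETA) for s_i in s]
        t = [sum(p[i] for i in team) for team in teams]
        ranking = sorted(range(len(teams)), key=lambda j: -t[j])
        if ranking == observed_ranking:       # draw consistent with r?
            for i, s_i in enumerate(s):
                kept[i].append(s_i)
    return [sum(v) / len(v) for v in kept]

# Players {0, 1} form team 0, player {2} is team 1; team 1 came first:
print(posterior_skill_means(priors=[(25.0, 8.3)] * 3,
                            teams=[[0, 1], [2]],
                            observed_ranking=[1, 0]))
```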
  • 9. Factor Graphs & Message Passing
      [Figure: factor graph for the event-queue example. Skill variables $s_{SC}, s_{SE}, s_B, s_{SD}, s_A$ connect to player performances $p_{SC}, p_{SE}, p_B, p_{SD}, p_A$, which combine into team performances $t_{SC}, t_{SE+B}, t_{SD+A}$; team performance differences $d_1, d_2$ are constrained by the observed ranking $r$.]
      1. Pass messages downwards: $s \to p \to t$
      2. Expectation propagation (approximate) at the ranking constraints: $t \leftrightarrow d(r)$
      3. Pass messages upwards: $t \to p \to s$
      The Gaussian message operations behind these steps are sketched below.
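The messages in this graph are Gaussian, and the recurring operations are products and quotients of Gaussian densities, conveniently computed in natural (precision) parameters. A minimal sketch of these building blocks only, not the full TrueSkill update schedule:

```python
class Gaussian:
    """Gaussian in natural parameters: pi = 1/sigma^2, tau = mu/sigma^2."""

    def __init__(self, pi=0.0, tau=0.0):
        self.pi, self.tau = pi, tau

    @classmethod
    def from_moments(cls, mu, sigma):
        return cls(pi=1.0 / sigma**2, tau=mu / sigma**2)

    def __mul__(self, other):      # product of densities: combine messages
        return Gaussian(self.pi + other.pi, self.tau + other.tau)

    def __truediv__(self, other):  # quotient: remove an old message
        return Gaussian(self.pi - other.pi, self.tau - other.tau)

    @property
    def mu(self):
        return self.tau / self.pi

    @property
    def sigma(self):
        return self.pi ** -0.5

# A skill prior combined with an upward message from a game outcome:
prior = Gaussian.from_moments(mu=25.0, sigma=8.3)
upward = Gaussian.from_moments(mu=30.0, sigma=6.0)
posterior = prior * upward
print(f"mu={posterior.mu:.2f}, sigma={posterior.sigma:.2f}")
```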
  • 10. Limitations & Adaptations
      • Strong assumptions that may not hold:
        • Player performance independence
        • Normally distributed performance
      • Team performance is not additive here → use the average instead
      • A player (component) may play in more than one team
      One way to emulate the averaging adaptation is sketched below.
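One way to emulate the averaging adaptation with the `trueskill` package is its partial-play weights: weighting every member of a team by 1/|team| rescales the summed team performance into a mean. That this matches the paper's exact adaptation is an assumption; the slide only names the change, and the stock package also does not handle a component appearing in several teams of one match:

```python
import trueskill

team_a = (trueskill.Rating(), trueskill.Rating())  # two-component combination
team_b = (trueskill.Rating(),)                     # single component

# Weight each player by 1/|team| so team performance behaves like an average.
weights = [tuple(1.0 / len(team) for _ in team) for team in (team_a, team_b)]
new_a, new_b = trueskill.rate([team_a, team_b], ranks=[0, 1], weights=weights)
```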
  • 11. Ranking Event Queues in JAMES II: Reference Data

    Event Queue           |  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 | Sum
    MList                 |  4  4  5  2  3  5  7  3  5  4  4  4  4  4  4  5  1  1  1  6 |  76
    LinkedList            |  1  1  1  5  1  3  3  5  3  7  6  6  6  6  6  2  5  5  5  3 |  80
    TwoList               |  2  2  2  6  2  2  2  7  2  3  7  7  7  7  7  4  6  6  6  4 |  91
    CalendarQueue         |  7  7  8  3  6  8  8  1  6  5  2  2  2  3  2  9  2  2  2  7 |  92
    BucketsThreshold      | 10  9  9  1  8  1  4  6 10  8  1  1  1  1  1  8  4  4  4  2 |  93
    MPLinkedList          |  3  3  3  7  4  6  5  4  4  1  9  9  9  9  9  3  7  7  7  5 | 114
    CalendarReQueue       |  9 10 10  4  7  9  6 10  9  9  3  3  3  2  3  7  3  3  3  8 | 121
    Heap                  |  5  5  4  8  5  4  1  9  8  6  8  8  8  8  8  6  9  9  9  1 | 129
    Simple                |  8  6  6  9  9  7  9  2  1 10  5  5  5  5  5  1 10 10 10  9 | 132
    DynamicCalendarQueue  |  6  8  7 10 10  9 10  8  7  2 10 10 10 10 10 10  8  8  8 10 | 171
    Column sum            | 55 55 55 55 55 54 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |

    (Columns are the 20 test models; entries are the rank of each event queue on that model, 1 = best.)

      • Five models for each formalism: SRS, stoch-π, PDEVS, SR
      • Per formalism: (1 + 3 + 1 + 3 = 8 simulators) × 10 event queues
      • 80 component combinations × 20 replications × 5 models = 8,000 runs
  • 12. Experiment Setup
      [Figure: experiment loop. Simulation problems are run with eligible component combinations (e.g., SD and SE); the measured execution times yield team assignments A and rankings r, which are fed to the component rating system. The system maintains the current event queue ranking (1. ... 10. ...), whose quality is assessed against the reference data by counting inversions; a sketch follows below.]
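Counting inversions between the system's current ranking and the reference ranking amounts to the Kendall tau distance. A short sketch; the event queue names are just examples from the reference table, and O(n²) is fine for the ten queues compared here:

```python
def count_inversions(predicted, reference):
    """Number of item pairs that the two rankings order differently."""
    pos = {item: i for i, item in enumerate(reference)}
    n = len(predicted)
    return sum(1
               for i in range(n)
               for j in range(i + 1, n)
               if pos[predicted[i]] > pos[predicted[j]])

# One inversion: LinkedList and TwoList are swapped.
print(count_inversions(["MList", "TwoList", "LinkedList"],
                       ["MList", "LinkedList", "TwoList"]))  # -> 1
```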
  • 13. Evaluation: Ranking Event Queues
      [Plot: average number of inversions (y-axis, 0 to 25) over the number of component combination comparisons (x-axis, 0 to 10,000), comparing the default setup with β = 833.3.]
  • 14. Summary
      Problem: How to evaluate individual components of a simulation system?
      Solution: A scalable and robust component rating system.
      Method: Bayesian inference (Microsoft’s TrueSkill algorithm).
      Outlook:
      • Global component rankings
      • Consider ‘margin of victory’
      • Improve usage for experiment steering
  • 15. Code: http://bitbucket.org/alesia (License: Apache 2.0)
  • 16. Thank you. Questions?
  • 17. Operation Modes
      [Figure: two data flows. Passive mode: simulation users pose problems to the simulation software; the resulting component combinations and performance results are recorded as component comparisons and fed to the component rating system, which returns component ranks as a by-product of regular use. Active mode: the component rating system itself performs match selection and experiment control, choosing which component combinations to run and feeding the resulting comparisons back into the ranks.]
      A possible match-selection criterion for active mode is sketched below.
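For active mode, one plausible match-selection criterion is to run the comparison whose outcome is least predictable, using `trueskill.quality` (the package's match-quality estimate) as the informativeness proxy. The slide itself does not fix the criterion, so this particular choice is an assumption for illustration:

```python
import itertools
import trueskill

def pick_next_match(teams):
    """teams: dict mapping a combination name to its tuple of Ratings.
    Returns the pair of combinations with the highest match quality,
    i.e., the most evenly matched (and thus most informative) comparison."""
    (name_a, _), (name_b, _) = max(
        itertools.combinations(teams.items(), 2),
        key=lambda pair: trueskill.quality([pair[0][1], pair[1][1]]))
    return name_a, name_b
```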
  • 18. Evaluation: Ranking Event Queues
      [Plot: average number of inversions (y-axis, 0 to 30) over the number of component combination comparisons (x-axis, 0 to 10,000), comparing passive mode and active mode.]