- 1. RELATING THE TIME REQUIRED TO OBSERVE A CERTAIN NUMBER OF EVENTS
  Asoka Korale, Ph.D., C.Eng., MIESL
- 2. MOTIVATIONS FOR RELATING TIME AND EVENTS
- 3. APPLICATIONS OF RELATING TIME AND EVENTS
  - Call Centers
  - Traffic Management
  - Transportation and Logistics
  - Packet Switching
  - Production Scheduling
  - Forecasting / Relating Time based Ev
- 4. INSIGHTS FROM RELATING TIME AND EVENTS
  - Relate an interval of observation to a sum of inter-arrival time random variables
  - Relate the interval of observation to
    - the total number of events observed in the interval
    - the uncertainty associated with the average number of events in the interval
    - the sum of the number of inter-arrival time intervals that compose the interval
  - Establish a probabilistic relationship for the time taken to observe a number of events
  - Relate the uncertainty in the interval of observation to a number of events
- 5. NOVEL STOCHASTIC RELATIONSHIP BETWEEN TIME AND EVENTS: RELATE TIME TAKEN TO OBSERVE A CERTAIN NUMBER OF EVENTS
- 6. UNCERTAINTY ASSOCIATED WITH EVENTS OVER TIME

  [Figure: timeline of events E1, E2, …, EN-1, EN separated by inter-arrival times $\Delta t_1, \Delta t_2, \ldots, \Delta t_N$; labels: "time interval Z to observe N events", "inter-arrival time random variables", "distribution of inter-arrival times", "events"]

  $$Z = \Delta t_1 + \Delta t_2 + \cdots + \Delta t_N$$

  - The total time (Z) to observe a number of events (N) is a sum of the same number of inter-arrival time intervals
  - Each inter-arrival time interval is a random variable ($\Delta t_i$)
  - The total uncertainty in the time interval (Z) reflects the uncertainty associated with each individual random variable ($\Delta t_i$)
  - The dependence between the random variables impacts the total uncertainty associated with the sum
  - The total uncertainty in the interval (Z) leads to the variance in the number of events observed in such an interval
- 7. A SUM OF INTER-ARRIVAL TIME RANDOM VARIABLES

  [Figure: timeline of events E1, E2, …, EN-1, EN separated by inter-arrival times $\Delta t_1, \Delta t_2, \ldots, \Delta t_N$]

  $$Z = \Delta t_1 + \Delta t_2 + \cdots + \Delta t_N$$

  - Each event inter-arrival time $\Delta t_i$ is a random variable
    - each such random variable has a certain uncertainty associated with it
  - N inter-arrival time random variables are required to observe an equivalent number of events
  - The total time Z taken to observe N events is a sum of N inter-arrival time random variables
  - The uncertainty associated with this sum of random variables translates into a number of events
    - a number of events associated with the uncertainty in the total time taken to observe the events
  - The distribution of the inter-arrival times may be estimated from historical data
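- 8. RELATING TIME AND EVENTS

  [Figure: timeline of events E1, E2, …, EN-1, EN separated by inter-arrival times $\Delta t_1, \Delta t_2, \ldots, \Delta t_N$]

  $$Z = \Delta t_1 + \Delta t_2 + \cdots + \Delta t_N$$

  When the inter-arrival times are drawn from a single distribution and are independent (IID), where

  $$E(\Delta t_i) = \mu_{\Delta t}, \qquad \mathrm{Var}(\Delta t_i) = \sigma_{\Delta t}^2,$$

  Z has mean and variance

  $$E(Z) = N\mu_{\Delta t}, \qquad \mathrm{Var}(Z) = N\sigma_{\Delta t}^2 .$$

  When the events are correlated, the variance of the sum of a number of inter-arrival times features the covariance between each pair of random variables that compose the sum:

  $$\mathrm{Var}(Z) = \sum_{i} \mathrm{Var}(\Delta t_i) + \sum_{i \neq j} \mathrm{Cov}(\Delta t_i, \Delta t_j)$$

  The uncertainty in Z corresponds to a number of events $\tilde{N}$, which gives a band $\hat{N}$ around N:

  $$\tilde{N} = \frac{\sqrt{\mathrm{Var}(Z)}}{\mu_{\Delta t}} = \frac{\sqrt{N\sigma_{\Delta t}^2}}{\mu_{\Delta t}}, \qquad \hat{N} = N \pm k\,\tilde{N}$$

  - to observe $\hat{N}$ events in a time interval of length Z
  - scale the variance (or standard deviation) via the constant k
  - $\tilde{N}$ is a measure of the degree of uncertainty in N, a measure of its deviation from the mean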
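
The IID relations on this slide can be checked numerically. The following is a minimal sketch, assuming exponentially distributed inter-arrival times with a mean of 0.5 s and N = 20 events (illustrative values, not taken from the slide); the function name `simulate_total_time` is hypothetical.

```python
import random
import statistics

def simulate_total_time(n_events, mean_dt, n_runs=5000, seed=1):
    """Draw Z = sum of n_events IID exponential inter-arrival times, n_runs times."""
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(1.0 / mean_dt) for _ in range(n_events))
        for _ in range(n_runs)
    ]

N = 20                 # number of events (assumed)
MU_DT = 0.5            # mean inter-arrival time in seconds (assumed)
VAR_DT = MU_DT ** 2    # for an exponential distribution, variance = mean squared

z_samples = simulate_total_time(N, MU_DT)
mean_z = statistics.fmean(z_samples)
var_z = statistics.variance(z_samples)

# Number of events associated with the uncertainty in Z: sqrt(Var(Z)) / mu
n_tilde = (N * VAR_DT) ** 0.5 / MU_DT

print(f"E(Z):   simulated {mean_z:.2f}, theory {N * MU_DT:.2f}")
print(f"Var(Z): simulated {var_z:.2f}, theory {N * VAR_DT:.2f}")
print(f"Events associated with the uncertainty: {n_tilde:.2f}")
```

For exponential inter-arrival times the standard deviation equals the mean, so the uncertainty reduces to the square root of N events.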
- 9. NOVEL STOCHASTIC MODEL OF AN M/M/1 QUEUE SYSTEM BY RELATING TIME AND EVENTS VIA A SUM OF INTER-ARRIVAL TIME RANDOM VARIABLES
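- 10. Birth-Death process model of an M/M/1 Queue System

  Deterministic approach: rates are deterministic, usually measured over an interval of time.

  [Figure: state transition diagram with states n = 0, 1, 2, …, n-1, n and probabilities $P_0, P_1, P_2, \ldots, P_{n-1}, P_n$; arrivals at rate λ move the state to the right, departures at rate μ move it to the left. Below it, the timeline of events E1 … EN with inter-arrival times $\Delta t_1 \ldots \Delta t_N$]

  - Balance equations: $\lambda P_{n-1} = \mu P_n$
  - Probability of state: $P_n = (\lambda/\mu)^n P_0$
  - Traffic intensity: $\rho = \lambda/\mu$
  - Use the sum to solve for $P_0$: $\sum_{n=0}^{N} P_n = 1$
  - Probability distribution of state:

  $$P_n = \frac{\rho - 1}{\rho^{N+1} - 1}\,\rho^n$$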
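
The closed form above can be evaluated directly. This is a small sketch, assuming a finite number of states N (called `capacity` here) and illustrative rates; the function name is hypothetical.

```python
def mm1_state_probabilities(lam, mu, capacity):
    """Return P_0..P_capacity from lam * P_{n-1} = mu * P_n, normalized to sum to 1."""
    rho = lam / mu  # traffic intensity
    if rho == 1.0:
        # Degenerate case: all states are equally likely
        return [1.0 / (capacity + 1)] * (capacity + 1)
    p0 = (rho - 1.0) / (rho ** (capacity + 1) - 1.0)
    return [p0 * rho ** n for n in range(capacity + 1)]

# Illustrative values (assumed): 2 arrivals/s, 2.5 departures/s, states 0..10
probs = mm1_state_probabilities(lam=2.0, mu=2.5, capacity=10)
for n, p in enumerate(probs):
    print(f"P_{n} = {p:.4f}")
```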
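- 11. Deterministic Approach and Stochastic Approach

  [Figure: state transition diagram with states n = 0, 1, 2, …, n-1, n and probabilities $P_0, \ldots, P_n$, arrival rate λ and service rate μ; timeline of events E1 … EN with inter-arrival times $\Delta t_1 \ldots \Delta t_N$]

  Balance equations with state-dependent rates:

  $$\lambda_{n-1} P_{n-1} = \mu_n P_n$$

  $$P_n = \frac{\lambda_{n-1}}{\mu_n}\,\frac{\lambda_{n-2}}{\mu_{n-1}} \cdots \frac{\lambda_0}{\mu_1}\,P_0, \qquad \sum_{n=0}^{N} P_n = 1, \qquad \rho_n = \frac{\lambda_{n-1}}{\mu_n}$$

  Instantaneous arrival and departure rates (superscript i), from the inter-arrival times $\Delta t_i^A$ and inter-departure times $\Delta t_i^D$:

  $$\lambda_i^{i} = 1/\Delta t_i^A, \qquad \mu_i^{i} = 1/\Delta t_i^D$$

  $$\lambda_i = E\{\lambda_i^{i}\} = E\{1/\Delta t_i^A\}, \qquad \mu_i = E\{\mu_i^{i}\} = E\{1/\Delta t_i^D\}$$

  Instantaneous probability of state:

  $$P_n^{i} = \frac{\Delta t_n^D}{\Delta t_{n-1}^A} \cdots \frac{\Delta t_1^D}{\Delta t_0^A}\,P_0$$

  The expected probability of state converges to the deterministic result:

  $$P_n = \frac{\lambda_{n-1}}{\mu_n}\,\frac{\lambda_{n-2}}{\mu_{n-1}} \cdots \frac{\lambda_0}{\mu_1}\,P_0$$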
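
The product form with state-dependent rates can be sketched as follows. The function names and the inter-event times are made up for illustration; the instantaneous rates are taken as the reciprocals of single inter-arrival and inter-departure times, as on the slide.

```python
def state_probabilities(arrival_rates, service_rates):
    """Solve lambda_{n-1} * P_{n-1} = mu_n * P_n for P_0..P_N.

    arrival_rates[n-1] holds lambda_{n-1}; service_rates[n-1] holds mu_n.
    """
    if len(arrival_rates) != len(service_rates):
        raise ValueError("need one (lambda_{n-1}, mu_n) pair per transition")
    weights = [1.0]  # P_n / P_0, starting with n = 0
    for lam, mu in zip(arrival_rates, service_rates):
        weights.append(weights[-1] * lam / mu)
    total = sum(weights)  # normalization: the probabilities must sum to 1
    return [w / total for w in weights]

def instantaneous_rates(intervals):
    """Instantaneous rate = 1 / inter-event time."""
    return [1.0 / dt for dt in intervals]

# Made-up inter-event times in seconds
arrival_gaps = [0.5, 0.4, 0.6]     # Delta t_0^A, Delta t_1^A, Delta t_2^A
departure_gaps = [0.3, 0.5, 0.2]   # Delta t_1^D, Delta t_2^D, Delta t_3^D

inst_probs = state_probabilities(
    instantaneous_rates(arrival_gaps), instantaneous_rates(departure_gaps)
)
print([round(p, 4) for p in inst_probs])
```

With constant rates the same function reproduces the geometric form $P_n = (\lambda/\mu)^n P_0$.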
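- 12. Probability of observing a particular sequence of events

  [Figure: timeline of events E1, E2, …, EN-1, EN separated by inter-arrival times $\Delta t_1, \Delta t_2, \ldots, \Delta t_N$]

  Let $P(Z) = P(\Delta t_1, \Delta t_2, \ldots, \Delta t_N)$ be the probability of the sequence.

  - When the inter-arrival times are independent, the probability of a sequence is the product of the individual probabilities of observing each particular inter-arrival time:

  $$P(Z) = \prod_{i=1}^{N} P(\Delta t_i)$$

  - When the inter-arrival times are independent, the expectation of the product is the product of the expectations:

  $$E\{P(Z)\} = E\left\{\prod_{i=1}^{N} P(\Delta t_i)\right\} = \prod_{i=1}^{N} E\{P(\Delta t_i)\}$$

  - Independent inter-arrival times are consistent with an M/M/1 scenario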
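
As an illustration of the product rule, the sketch below scores a sequence of inter-arrival times under an assumed exponential density, as in an M/M/1 arrival process. The rate and the sequence are made-up values, and working with log-densities is a numerical convenience rather than something the slide prescribes.

```python
import math

def exponential_log_density(dt, rate):
    """Log of the exponential density rate * exp(-rate * dt)."""
    return math.log(rate) - rate * dt

def sequence_log_density(gaps, rate):
    """Independence turns the product of densities into a sum of log-densities."""
    return sum(exponential_log_density(dt, rate) for dt in gaps)

gaps = [0.4, 0.7, 0.2, 0.5]   # made-up inter-arrival times in seconds
RATE = 2.0                    # assumed arrival rate (events per second)

log_p = sequence_log_density(gaps, RATE)
print(f"log-density of the sequence: {log_p:.4f}")
```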
- 13. ANOMALY DETECTION IN AN M/M/1 QUEUE SYSTEM CHARACTERIZING PERFORMANCE OF A SOFTWARE COMPONENT
- 14. Anomaly Detection Scheme
  - A system of components
  - Each component a queue / server
  - Component load: distribution of the number of messages in the system (arrivals minus departures in $\Delta T$), with a load trigger threshold
  - Dispersion of anomaly across component sy

  [Figure: chain of components Comp 1, Comp 2, Comp 3, …, Comp N; transition matrix M(I) with rows indexed by component in State N and columns by component in State N+1]
- 15. Estimating Load on a Software Component
  - Treat the system as a network of components
    - inter-arrival times help to characterize the performance best
  - Model each component as a queue-server system
    - Queue: buffering messages into the component
    - Server: processing all messages within a component
  - Number of messages in the "system" (in queuing parlance)
    - those waiting and in service: the difference between arrivals and departures
    - account for multiple queues within a component

  Common approach: a threshold-based alert system
  - Thresholds commonly measure performance at
    - component level
    - system level
  - Typically thresholds use latencies, queue lengths,
- 16. Performance Measures - Software Component
  - Variation in the number of messages in the "system" (in queuing parlance)
  - Performance measures
    - Variance, Mean of messages in the system
    - Variance / Mean of messages in the system
  - Estimate the performance measure from the distribution of
    - number of messages
    - Variance / Mean
  - Threshold setting
    - detect an outlier
    - a certain number of standard deviations from the mean
  - The time behavior of the distribution in the arrivals and departures will imp envision time dependent thresholds
- 17. Characterizing Variation in the load

  The total number of arrivals in the time interval $\Delta T$ is $N^A$ and the total number of departures in the time interval $\Delta T$ is $N^D$:

  $$Z^A = \Delta T = \Delta t_1^A + \Delta t_2^A + \cdots + \Delta t_N^A$$

  $$Z^D = \Delta T = \Delta t_1^D + \Delta t_2^D + \cdots + \Delta t_N^D$$

  Number of arrivals associated with the composition of $N^A$ events in the time interval $\Delta T$:

  $$\tilde{N}^A = k^A\,\frac{\sqrt{\mathrm{Var}(Z^A)}}{\mu_{\Delta t,A}} = k^A\,\frac{\sqrt{N^A \sigma_A^2}}{\mu_{\Delta t,A}}$$

  Number of departures associated with the composition of $N^D$ events in the time interval $\Delta T$:

  $$\tilde{N}^D = k^D\,\frac{\sqrt{\mathrm{Var}(Z^D)}}{\mu_{\Delta t,D}} = k^D\,\frac{\sqrt{N^D \sigma_D^2}}{\mu_{\Delta t,D}}$$

  The number of events in the system at the end of a common time interval $\Delta T$ is the difference between those that arrive and those that depart.

  Average number of events in the system at the end of the time interval $\Delta T$:

  $$\bar{N} = E\{N^A\} - E\{N^D\}$$

  Variance in the number of events in the system at the end of the time interval $\Delta T$ (the variances add when arrivals and departures are independent):

  $$\mathrm{Var}\{N^A - N^D\} = \mathrm{Var}\{N^A\} + \mathrm{Var}\{N^D\}$$

  The variance arises from the individual uncertainties associated with the individual random variables that compose the sum $\Delta T$.
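
The mean and variance relations for the number in the system can be checked by simulation. This sketch assumes exponential inter-arrival and inter-departure times and, as in the formulas above, treats the arrival and departure streams as independent (a real queue cannot emit more departures than it has received arrivals). The window length and mean gaps are illustrative, and the function names are hypothetical.

```python
import random
import statistics

def count_events(window, mean_gap, rng):
    """Count the events whose cumulative inter-event time fits inside the window."""
    elapsed, count = 0.0, 0
    while True:
        elapsed += rng.expovariate(1.0 / mean_gap)
        if elapsed > window:
            return count
        count += 1

def simulate_number_in_system(window, mean_arrival_gap, mean_departure_gap,
                              n_windows=4000, seed=7):
    """Return per-window arrival counts, departure counts and their difference."""
    rng = random.Random(seed)
    arrivals = [count_events(window, mean_arrival_gap, rng) for _ in range(n_windows)]
    departures = [count_events(window, mean_departure_gap, rng) for _ in range(n_windows)]
    net = [a - d for a, d in zip(arrivals, departures)]
    return arrivals, departures, net

# Assumed values: 10 s window, mean gaps of 0.5 s (arrivals) and 0.8 s (departures)
arrivals, departures, net = simulate_number_in_system(10.0, 0.5, 0.8)

print("mean in system:", round(statistics.fmean(net), 2))
print("variance in system:", round(statistics.variance(net), 2))
print("sum of variances:",
      round(statistics.variance(arrivals) + statistics.variance(departures), 2))
```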
- 18. Components
  - Model the anomaly state (yes 1 / no 0) at each component-interface
  - Track anomalies across the system and across time via a transition matrix (M)
  - Update the transition matrix entries at each change of state
    - the difference between matrix M(I+1) and M(I) will provide system state at M(I-1) and also the
  - The transition matrix gives insight into how

  [Figure: transition matrices M(I) and M(I+1), with rows indexed by component (Comp 1 … Comp N) in State N and columns by component in State N+1; an entry in the Comp 1 row increases from 1 to 2 between M(I) and M(I+1). Annotations: "update when system state changes", "record anomaly on a link-component"]
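
One possible reading of the transition matrix update is sketched below. It assumes that entry (i, j) counts how often component i was anomalous in one system state and component j was anomalous in the next; the slide does not spell out the exact rule, so both the rule and the function name are assumptions.

```python
def update_transition_matrix(matrix, prev_state, new_state):
    """Increment matrix[i][j] when component i was anomalous before and j is anomalous now.

    prev_state and new_state are lists of 0/1 anomaly flags, one per component.
    The matrix is only updated when the system state changes.
    """
    if prev_state == new_state:
        return matrix
    for i, was_anomalous in enumerate(prev_state):
        for j, is_anomalous in enumerate(new_state):
            if was_anomalous and is_anomalous:
                matrix[i][j] += 1
    return matrix

n_components = 3
M = [[0] * n_components for _ in range(n_components)]

# Anomaly on component 0 persists and spreads to component 1
update_transition_matrix(M, [1, 0, 0], [1, 1, 0])
# No change of state, so no update
update_transition_matrix(M, [1, 1, 0], [1, 1, 0])

for row in M:
    print(row)
```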
- 19. RESULTS: ANOMALY DETECTION IN AN M/M/1 QUEUE SYSTEM TO CHARACTERIZE PERFORMANCE OF A SOFTWARE COMPONENT
- 20. Test Scenarios and Validation of model

  Test Scenarios:
  - Different offered load and service discipline
  - Poisson arrivals (exponential inter-arrival times, independent increments)
  - Exponential service time (independent increments)

  Summary Results:
  - Behavior of number in system
  - Average number in system = difference in mean arrivals and departures
  - Variance of number in system = sum of variances in arrivals and departures

  | Inter-Arrival Time (s) | Scenario I | Scenario II | Scenario III |
  |---|---|---|---|
  | Arrivals - Mean Inter-Arrival Time | 0.50 | 0.51 | 0.79 |
  | Arrivals - Variance Inter-Arrival Time | 0.26 | 0.26 | 0.62 |
  | Departures - Mean Inter-Arrival Time | 0.50 | 0.80 | 1.00 |
  | Departures - Variance Inter-Arrival Time | 0.24 | 0.64 | 1.05 |

  | Number Over Window | Scenario I | Scenario II | Scenario III |
  |---|---|---|---|
  | Mean Arrivals | 19.93 | 19.79 | 12.67 |
  | Variance in Arrivals | 19.35 | 18.52 | 11.36 |
  | Mean Departures | 20.05 | 12.39 | 9.99 |
  | Variance in Departures | 18.71 | 10.98 | 10.41 |
  | Mean (Arrivals - Departures) | -0.13 | 7.39 | 2.68 |
  | Variance (Arrivals - Departures) | 37.57 | 29.45 | 21.38 |
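
The two summary relations can be checked against the reported figures. The sketch below copies the "Number Over Window" values from this slide and prints how far each reported mean and variance of the difference lies from the value implied by the arrival and departure rows; the dictionary keys and the function name are hypothetical.

```python
# Values copied from the "Number Over Window" results on this slide
TABLE = {
    "I":   dict(mean_arr=19.93, var_arr=19.35, mean_dep=20.05, var_dep=18.71,
                mean_net=-0.13, var_net=37.57),
    "II":  dict(mean_arr=19.79, var_arr=18.52, mean_dep=12.39, var_dep=10.98,
                mean_net=7.39, var_net=29.45),
    "III": dict(mean_arr=12.67, var_arr=11.36, mean_dep=9.99, var_dep=10.41,
                mean_net=2.68, var_net=21.38),
}

def residuals(row):
    """Reported value minus the value implied by the arrival and departure rows."""
    mean_gap = row["mean_net"] - (row["mean_arr"] - row["mean_dep"])
    var_gap = row["var_net"] - (row["var_arr"] + row["var_dep"])
    return mean_gap, var_gap

for name, row in TABLE.items():
    mean_gap, var_gap = residuals(row)
    print(f"Scenario {name}: mean residual {mean_gap:+.2f}, variance residual {var_gap:+.2f}")
```

The mean residuals are within rounding, and the variance residuals stay below 0.5, which is consistent with arrivals and departures being close to uncorrelated in these runs.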
- 21. Arrivals / Departures Process
  - Exponential service time with mean 0.5 seconds
  - Distribution of the number of arrivals in an interval of 10 s
  - The number of arrivals is equivalent to the sum of a number of inter-arrival times
    - which is a sum of random variables
    - the distribution of the sum converges to a normal distribution
- 22. Characterizing component load
  - Use the distribution of the average number of events in the system to characterize the load
  - Use the variance in the number of events in the system to set thresholds that trigger at a given probability level
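
A threshold that triggers at a given probability level might be computed as below. This is a sketch assuming the number in the system is approximately normal; the mean and variance are the Scenario II figures reported in the results, and the 0.99 level and the function name are illustrative choices.

```python
from statistics import NormalDist

def load_threshold(mean_in_system, variance_in_system, probability):
    """Load level that a normally distributed load stays below with the given probability."""
    sd = variance_in_system ** 0.5
    return NormalDist(mean_in_system, sd).inv_cdf(probability)

# Scenario II: mean 7.39 and variance 29.45 of (arrivals - departures)
threshold = load_threshold(7.39, 29.45, probability=0.99)
print(f"trigger when the number in system exceeds {threshold:.2f}")
```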
- 23. Variation in the Variance
  - Use the cumulative distribution of the variance to characterize the impact of window length on the variation in the variance
  - Longer windows feature a larger number of events, each event an inter-arrival time random variable
  - The uncertainty scales with the number of random variables in the sum
  - Longer intervals have larger uncertainty associated with the composition of the time interval
    - rightward shifting, flattening curves

  $$Z = \Delta t_1 + \Delta t_2 + \cdots + \Delta t_N$$

  $$\tilde{N} = \frac{\sqrt{\mathrm{Var}(Z)}}{\mu_{\Delta t}} = \frac{\sqrt{N\sigma_{\Delta t}^2}}{\mu_{\Delta t}}$$
- 24. IMPROVEMENTS AND FUTURE WORK: MESSAGE SCHEDULING AND THRESHOLD OPTIMIZATION
- 25. SCHEDULING AND IMPACT ON PERFORMANCE
  - Introduce load balancing to intelligently route messages
    - particularly in components with multiple queues
  - Assign messages
    - to the queue with the lowest load
    - to the queue that is most likely to process it fastest / most efficiently
  - Characterize
    - the processing time of messages as a function of
      - type of messages and expected processing time
      - messages in the queue …
  - Model inter-arrival times on a per-queue basis
    - see appendix: on relating events and time taken to observe them
  - Account for time dependence of statistics
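
Routing a message to the queue with the lowest load could look like the sketch below. It assumes load is measured as the sum of the expected processing times of the messages already waiting; the message types, times, queue names and function names are all hypothetical.

```python
def expected_backlog(queue, expected_time):
    """Total expected processing time of the messages waiting in one queue."""
    return sum(expected_time[msg_type] for msg_type in queue)

def pick_queue(queues, expected_time):
    """Name of the queue with the smallest expected backlog."""
    return min(queues, key=lambda name: expected_backlog(queues[name], expected_time))

# Hypothetical expected processing time per message type, in seconds
EXPECTED_TIME = {"order": 0.2, "cancel": 0.05, "report": 1.0}

queues = {
    "q1": ["order", "order", "cancel"],
    "q2": ["report"],
    "q3": ["cancel", "cancel", "cancel", "cancel"],
}

target = pick_queue(queues, EXPECTED_TIME)
print("route next message to", target)
```

Weighting by expected processing time rather than queue length means a short queue holding one slow message can rank behind a longer queue of fast ones.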
- 26. ALERT THRESHOLDS OPTIMIZATION
  - Critical Stats guide uses a fixed set of thresholds
  - Consider component load stat: use variation of number of messages in
    - stat based on existing / recorded measurements
  - Performance at component level
    - irrespective of input conditions
      - based on maximum design spec of component
    - depending on input conditions: traffic / trading / time dependent
      - set thresholds to account for behavior that also depends on expected / normal traffic
  - Determine threshold values based on Normal / Abnormal behavior
    - amount of load that is historically observed
  - Consider time-based thresholds
    - if feasible, as offered load is time varying
  - Tune anomaly threshold based on time-varying load
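
Time-based thresholds derived from historically observed load could be sketched as follows. The grouping by hour of day, the factor k, the sample data and the function name are assumptions made for illustration.

```python
import statistics
from collections import defaultdict

def hourly_thresholds(samples, k=3.0):
    """Map each hour of day to mean + k * standard deviation of its historical load.

    samples is an iterable of (hour_of_day, load) pairs.
    """
    by_hour = defaultdict(list)
    for hour, load in samples:
        by_hour[hour].append(load)
    return {
        hour: statistics.fmean(loads) + k * statistics.pstdev(loads)
        for hour, loads in by_hour.items()
    }

# Made-up history: (hour of day, number of messages in system)
history = [(9, 10), (9, 14), (15, 30), (15, 34)]
thresholds = hourly_thresholds(history)

for hour in sorted(thresholds):
    print(f"{hour:02d}:00 threshold = {thresholds[hour]:.1f}")
```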
- 27. THANK YOU
- 28. APPENDIX: EVENT (INTER) ARRIVAL TIME PROCESS
- 29. EVENT INTER-ARRIVAL TIME
  - Introduce a feature to characterize the "time property" in the event-based model
  - Each event has a time stamp, and between events there is an event "inter-arrival time"
  - Modeling this "time interval" will give insights into the "time patterns" of the events in characterizing trading behavior
  - It is natural to consider basic statistics related to inter-arrival time
    - descriptive statistics: means, variances, higher order statistics
    - but they don't necessarily capture the characteristics in the pattern of event inter-arrival times
  - Also, fitting distributions and estimating their characteristics may not be very viable / reliable
    - data dependent, too little data to estimate, degree of fit issues

  [Figure: sequence of events E1, E2, E3, E4, …, EN-1, EN, each with an event type (A, B, C, B, …, A, C), separated by inter-arrival times t1, t2, t3, …, tN-1]
- 30. MODELING THE EVENT INTER-ARRIVAL TIME
  - This time series captures the time patterns in the placing of market orders and trading events
  - We characterize and quantify these patterns through statistical analysis that captures its important properties
    - the randomness in the event inter-arrival times, via entropy
    - autocorrelation, which measures the degree of correlation between samples of inter-arrival times

  [Figure: the event sequence E1 … EN with event types A, B, C, … mapped to a time series of event inter-arrival times t1, t2, t3, …, tN-1, indexed by sample number 1, 2, 3, …, N-2, N-1; ti is the event inter-arrival time]
- 31. A DISTRIBUTION INDEPENDENT OF MEASUREMENT (TIME) WINDOW
  - Observe the distribution of the time between each pair of events
    - call it the event inter-arrival time
  - The distribution of this quantity does not change, as it is not dependent on a window of measurement
    - it is purely a function of the event arrival (generative) process
    - the process will depend on the particular quantity (orders, trades, etc.) we are observing
  - The underlying distribution, however, is fixed for a particular data set
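
The two measures named on this slide can be sketched as below. The entropy here is a histogram-based estimate with a fixed bin width, which is one possible choice the slide does not specify; the series and the function names are made up for illustration.

```python
import math
from collections import Counter

def autocorrelation(series, lag):
    """Sample autocorrelation of the inter-arrival time series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return cov / var

def histogram_entropy(series, bin_width):
    """Shannon entropy (bits) of the series after binning into fixed-width bins."""
    counts = Counter(int(x // bin_width) for x in series)
    n = len(series)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Made-up inter-arrival times (seconds) that alternate between short and long gaps
gaps = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0]

print("lag-1 autocorrelation:", round(autocorrelation(gaps, 1), 3))
print("entropy (bits):", round(histogram_entropy(gaps, bin_width=1.0), 3))
```

A strongly negative lag-1 autocorrelation reflects the alternating pattern, which a mean or variance alone would not reveal.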
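- 32. RELATING NUMBER OF EVENTS OBSERVED TO INTERVALS OF TIME

  [Figure: timeline of events E1, E2, E3, …, EN-1, EN (event number) separated by inter-arrival times $\Delta t_1, \Delta t_2, \Delta t_3, \ldots, \Delta t_N$]

  Let Z be the sum of N IID random variables drawn from the distribution of the inter-arrival time:

  $$Z = \Delta t_1 + \Delta t_2 + \cdots + \Delta t_N$$

  Let the mean and variance of the distribution of the inter-arrival time $\Delta t$ be

  $$E(\Delta t) = \mu_{\Delta t}, \qquad \mathrm{Var}(\Delta t) = \sigma_{\Delta t}^2 .$$

  Then

  $$E(Z) = N\mu_{\Delta t}, \qquad \mathrm{Var}(Z) = N\sigma_{\Delta t}^2 \quad \text{(for large } N\text{)}$$

  - Z is a random variable and is the time taken to observe N events
  - Its expected value (average) is E(Z)
  - A measure of the uncertainty in Z (about its mean) is its standard deviation
- 33. RELATING NUMBER OF EVENTS TO INTERVALS OF TIME

  [Figure: timeline of events E1, E2, E3, …, EN-1, EN (event number) separated by inter-arrival times $\Delta t_1, \Delta t_2, \Delta t_3, \ldots, \Delta t_N$]

  - The uncertainty in Z can be translated into an average number of events
  - The total time and the total number of IID events observed in that time are related probabilistically via the distribution of the inter-arrival time
  - So we may estimate an average number of events $\tilde{N}$ associated with this uncertainty:

  $$\sigma_z^2 = N\sigma_{\Delta t}^2, \qquad \tilde{N} = \frac{\sigma_z}{\mu_{\Delta t}} = \frac{\sqrt{N\sigma_{\Delta t}^2}}{\mu_{\Delta t}}$$

  Thus we may set a threshold "T" for the number of events observed in an interval of length $E(Z) = N\mu_{\Delta t}$ to detect outliers:

  $$T > N + a\,\tilde{N}, \qquad a \text{ a scaling factor}$$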
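
The outlier threshold can be sketched as a small helper. The expected number of events is taken as the interval length divided by the mean inter-arrival time, following E(Z) = Nμ; the interval, the inter-arrival statistics, the factor and the function names are illustrative assumptions.

```python
def event_count_threshold(interval, mean_gap, var_gap, factor):
    """Threshold N + factor * sqrt(N * var_gap) / mean_gap, with N = interval / mean_gap."""
    n_expected = interval / mean_gap
    n_uncertainty = (n_expected * var_gap) ** 0.5 / mean_gap
    return n_expected + factor * n_uncertainty

def is_outlier(observed_count, interval, mean_gap, var_gap, factor=3.0):
    """Flag a count that exceeds the threshold for the given interval."""
    return observed_count > event_count_threshold(interval, mean_gap, var_gap, factor)

# Assumed values: 10 s interval, inter-arrival mean 0.5 s and variance 0.25 s^2
T = event_count_threshold(10.0, 0.5, 0.25, factor=3.0)
print(f"threshold T = {T:.2f} events")
print("40 events is an outlier:", is_outlier(40, 10.0, 0.5, 0.25))
print("25 events is an outlier:", is_outlier(25, 10.0, 0.5, 0.25))
```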

- The element by element difference of the prices provides insights in to the underlying random processes …