Modeling, Simulation & Data Mining: Answering Tough Cost, Date & Staff Forecasts Questions provides techniques for using modeling, simulation, and data mining to answer difficult questions about project costs, dates, and staffing needs. The presentation discusses using the Scrum and Kanban frameworks in simulation models to forecast outcomes under different conditions. It emphasizes that forecasts should include uncertainty and risk, and that risk factors often have a bigger impact on outcomes than estimated backlog alone. Sensitivity analysis and Monte Carlo simulation are presented as ways to better understand uncertainty and communicate risk to executives. Best practices for model building and experimentation are also provided.
On Thursday, April 24th, 2014, the Asian Community Resource Center (ACRC) will hold the Inaugural Seed of Hope benefit at the Springhill Suites Las Vegas Convention Center. The benefit will offer a night of celebration & inspiration with dinner, entertainment, honored guests, awards, and silent auctions featuring products and gift certificates from local businesses as well as trips and sporting packages.
For more information, to RSVP, or to donate silent auction items:
Cynthia@lvacrc.org
702.763.0601
The proceeds will benefit the Asian Community Resource Center, a non-profit organization (pending 501(c)(3) status) serving families in Southern Nevada and enriching the human experience with hope, strength, and dignity. The assistance families receive from ACRC makes all the difference in their lives.
"A Framework for Developing Trading Models Based on Machine Learning" by Kris... (Quantopian)
Presented at QuantCon Singapore 2016, Quantopian's quantitative finance and algorithmic trading conference, November 11th.
Machine learning is improving facets of our lives as diverse as health screening, transportation, and even our entertainment choices. It stands to reason that machine learning can also improve trading performance; however, the practical application is fraught with pitfalls and obstacles that nullify the benefits and present a high barrier to entry. Building on background information and introductory material, Kris will propose a framework for efficient and robust experimentation with machine learning methods for algorithmic trading. The framework's objective is to arrive at parsimonious models whose positive past performance is unlikely to be due to chance. The framework is demonstrated via practical examples of various machine learning models for algorithmic trading.
Bootstrapping of PySpark Models for Factorial A/B Tests (Databricks)
A/B testing, i.e., measuring the impact of proposed variants of, e.g., e-commerce websites, is fundamental for increasing conversion rates and other key business metrics.
We have developed a solution that makes it possible to run dozens of simultaneous A/B tests, obtain conclusive results sooner, and get results that are more interpretable than bare statistical significance: the probability that a change has a positive effect, how much revenue is at risk, and so on.
To compute those metrics, we need to estimate the posterior distributions of the metrics, which are computed using Generalized Linear Models (GLMs). Since we process gigabytes of data, we use a PySpark implementation, which, however, does not provide standard errors of coefficients. We therefore use bootstrapping to estimate the distributions.
In this talk, I’ll describe how we’ve implemented parallelization of an already parallelized GLM computation to be able to scale this computation horizontally over a large cluster in Databricks and describe various tweaks and how they’ve improved the performance.
Monte Carlo Schedule Risk Analysis: The Concept, Benefits, and Limitations. How Monte Carlo schedule risk analysis works; how to perform Monte Carlo simulations of project schedules.
For more information on how to perform schedule risk analysis using RiskyProject software, please visit the Intaver Institute web site: http://www.intaver.com.
About Intaver Institute.
Intaver Institute Inc. develops project risk management and project risk analysis software. Intaver's flagship product is RiskyProject, project risk management software. RiskyProject integrates with Microsoft Project, Oracle Primavera, and other project management software, or can run standalone. RiskyProject comes in three configurations: RiskyProject Lite, RiskyProject Professional, and RiskyProject Enterprise.
This presentation on Overall Equipment Effectiveness, Down Time Analytics, and Asset Utilization was developed by a colleague and me during my tenure at ISS. It was presented to the Chattanooga, TN Chapter of the SME.
Stop Flying Blind! Quantifying Risk with Monte Carlo Simulation (Sam McAfee)
Product development is inherently risky. While lean and agile methods are praised for supporting rapid feedback from customers through experiments and continuous iteration, teams could do a lot better at prioritizing using basic modeling techniques from finance. This talk will focus on quantitative risk modeling when developing new products or services that do not have a well understood product/market fit scenario. Using modeling approaches like Monte Carlo simulations and Cost of Delay scenarios, combined with qualitative tools like the Lean Canvas and Value Dynamics, we will explore how lean innovation teams can bring scientific rigor back into their process.
Have your Agile practices become stale or redundant? Does it feel like your team is just going through the motions? Have team members asked to discontinue “critical Agile practices” and ceremonies?
In Lean product development, the minimum viable product, or MVP, is defined as the product with the highest return on investment versus risk. It’s a strategy to avoid building products that customers don’t need or want by maximizing our learning of what is valuable to the customer.
Agile is typically learned through exposure to a series of Agile practices, a recipe of sorts. But what if that recipe goes beyond minimal? Have we replaced heavy waterfall process with heavy Agile process?
This session will interrogate the thinking behind some of the Agile sacred cows like detailed sprint planning, detailed release planning, and even some popular estimation techniques. We will try to identify what is truly needed to be Agile, based on needs instead of prescribed recipes. What is minimally sufficient to start realizing the benefits of Agile?
What is your MVA? It might be different than you think!
A department, somewhere in the EU, depends on a steady input of 3000 new textual documents per day, 365 days a year. Documents come from 10 different sources, and each document comes pre-classified into a single category of a large taxonomy. The department is unhappy: the accuracy of incoming document classifications seems to be low. Even after the department puts in an additional 800% FTE throughout the year to manually repair or discard wrongly classified documents, the accuracy still lags behind their targets. NIRI was hired to conduct research and develop an accurate document classifier. The plan was to use NIRI’s classifier to replace the unreliable classes coming with documents, and thus solve the problem of low accuracy as well as reduce the high cost of 800% FTE. In this talk we will share our experiences: the classification approach used to meet the needs of our client, challenges in demonstrating progress during the project, and the approach used for the acceptance-validation of our classifier.
OSMC 2023 | Know Your Data: The stats behind your alerts by Dave McAllister (NETWAYS)
Quick, what’s the difference between the mean, the mode, and the median? Do you need a Gaussian or a normal distribution? And does your choice impact the alerts and observations you get from your observability tools?
Come get refreshed on the impact some basic choices in statistical behavior can have on what gets triggered. Learn why a median might be the choice for historical anomaly or sudden change. Jump into Gaussian distributions, data alignment challenges, and the trouble with sampling. Walk out with a deeper understanding of your metrics and what they might tell you.
On how the current top-down (command-and-)control approach, and the 'middle-out' modelling approach, will not and cannot work in the end. A new paradigm, bottom-up KISS risk management, will be needed.
We are doing Agile well… We have been Agile for a while now… Is it just an assumption, or do we have data to support it? Do metrics add any value, or are they just a fad? Good metrics affirm and reinforce Agile principles. They open up the conversation and help teams improve. They are not only for management; they are for everyone who wants to inspect and adapt.
This presentation is about how metrics can be used effectively in Agile to enable transparency and improve overall efficiency at the team, program, and portfolio levels.
This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit.
MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme.
For the latest updates, follow us on Twitter - #MosesCore
Presented at All Things Open 2023
Presented by Dave McAllister - nginx
Title: Know Your Data: The stats behind your alerts
Abstract: Quick, what's the difference between the mean, the mode, and the median? Which mean do you mean? Do you need a Gaussian or a normal distribution? And does your choice impact the alerts and observations you get from your observability tools?
Come get refreshed on the impact some basic choices in statistical behavior can have on what gets triggered. Learn why a median might be the choice for historical anomaly or sudden change. Jump into Gaussian distributions, data alignment challenges and the trouble with sampling. Walk out with a deeper understanding of your metrics and what they might be telling you.
Find more info about All Things Open:
On the web: https://www.allthingsopen.org/
Twitter: https://twitter.com/AllThingsOpen
LinkedIn: https://www.linkedin.com/company/all-things-open/
Instagram: https://www.instagram.com/allthingsopen/
Facebook: https://www.facebook.com/AllThingsOpen
Mastodon: https://mastodon.social/@allthingsopen
Threads: https://www.threads.net/@allthingsopen
2023 conference: https://2023.allthingsopen.org/
Similar to Modeling, Simulation & Data Mining – Agile 2012 (Magennis & Maccherone)
5. My Mission
Arm my teams (and yours) with the tools and techniques to solve these problems.
7. 2 Minutes About Larry
• Larry is a Pisces who enjoys skiing, reading, and wine (red, or white in an outdoor setting)
• We have a lot in common… over to Larry!
9. Why measure?
• Feedback
• Diagnostics
• Forecasting
• Lever
10. When to NOT take a shot
Good players?
• Monta Ellis – 9th highest scorer (8th last season)
• Carmelo Anthony (Melo) – 8th highest scorer (3rd last season)
15. You will be wrong by…
• 3x–10x when assuming a Normal distribution
• 2.5x–5x when assuming a Poisson distribution
• 7x–20x if you use Shewhart’s method
“Heavy tail phenomena are not incomprehensible… but they cannot be understood with traditional statistical tools. Using the wrong tools is incomprehensible.”
~ Roger Cooke and Daan Nieboer
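The "wrong by 3x–10x" warning can be illustrated in a few lines of code. This is a hedged sketch, not the talk's dataset: the data is a synthetic lognormal (heavy-tailed) sample, and it shows how a Normal assumption badly understates the extreme percentiles.

```python
import random

# Hedged illustration of the heavy-tail warning: the data is a synthetic
# lognormal sample (an assumption for illustration, not the talk's data).
rng = random.Random(5)
data = [rng.lognormvariate(0, 1) for _ in range(10_000)]

# Pretend the data is Normal and predict the ~99.9th percentile.
mean = sum(data) / len(data)
std = (sum((x - mean) ** 2 for x in data) / len(data)) ** 0.5
normal_p999 = mean + 3.09 * std  # z ~ 3.09 at the 99.9th percentile

# The observed 99.9th percentile of the actual heavy-tailed sample.
actual_p999 = sorted(data)[int(0.999 * len(data))]
# actual_p999 far exceeds normal_p999: the Normal assumption hides the tail.
```

The gap between the two numbers is the kind of multiple the slide is warning about.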
16. Bad application of control chart
“Control is an illusion, you infantile egomaniac. Nobody knows what's gonna happen next: not on a freeway, not in an airplane, not inside our own bodies and certainly not on a racetrack with 40 other infantile egomaniacs.”
~ Days of Thunder
17. Time in Process (TIP) Chart
A good alternative to the control chart.
18. Collection
• Perceived cost is high
• Little need for explicit collection activities
• Use a 1-question NPS survey for customer and employee satisfaction
• Plenty to learn in passive data from ALM and other tools
• How you use the tools will drive your use of metrics from them
19. Summary of how to make good metric choices
• Start with outcomes and use ODIM to make metrics choices.
• Make sure your metrics are balanced so you don’t over-emphasize one at the cost of others.
• Be careful in your analysis. The TIP chart is a good alternative to the control chart. Troy’s approach is excellent for forecasting. We’ve shown that there are many out there that are not so good.
• Consider collection costs. Get maximal value out of passively gathered data.
“Data visualization is like photography. Impact is a function of perspective, illumination, and focus.”
~ Larry Maccherone
22. A model is a tool used to mimic a real-world process. A tool for low-cost experimentation.
23. Monte Carlo Simulation?
Performing a simulation of a model multiple times using random input conditions and recording the frequency of each result occurrence.
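That definition translates directly into code. A minimal sketch, assuming a hypothetical stand-in model (20 backlog items, 2–5 completed per iteration; these numbers are not from the deck): run the model many times with random inputs and count how often each result occurs.

```python
import random

# Minimal Monte Carlo sketch: simulate a model many times with random
# inputs and record the frequency of each result. The model is a
# hypothetical stand-in (20 backlog items, 2-5 completed per iteration).
def iterations_to_finish(rng, backlog=20):
    iterations = 0
    while backlog > 0:
        backlog -= rng.randint(2, 5)  # random throughput this iteration
        iterations += 1
    return iterations

rng = random.Random(42)
frequency = {}
for _ in range(1000):
    result = iterations_to_finish(rng)
    frequency[result] = frequency.get(result, 0) + 1
# frequency maps each possible outcome (4..10 iterations) to its count
```

The `frequency` dictionary is exactly the result-versus-frequency chart the later slides draw.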
24. Scrum
Model: Backlog → This Iteration → Deployed
Run:              1  2  3  4  5  6  …
Total Iterations: 3  2  5  3  4  2  …
25. Kanban
Model: Backlog → Design (1–2 days) → Develop (1–5 days) → Test (1–2 days) → Deployed
Run:               1  2  3  4  5  6  …
Total Time (days): 5  4  3  9  5  6  …
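The Kanban model above can be simulated the same way. A sketch using the stage ranges from the slide; treating each stage as an independent uniform draw is my simplifying assumption, not something the deck specifies.

```python
import random

# Kanban lead-time sketch: Backlog -> Design (1-2 d) -> Develop (1-5 d)
# -> Test (1-2 d) -> Deployed. Drawing each stage independently and
# uniformly is a simplifying assumption.
def lead_time_days(rng):
    return rng.randint(1, 2) + rng.randint(1, 5) + rng.randint(1, 2)

rng = random.Random(1)
times = [lead_time_days(rng) for _ in range(1000)]
# Every run lands between 3 days (all minimums) and 9 days (all maximums).
```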
26. Result versus Frequency (50 runs)
[Histogram: frequency of each result value (e.g., days, roughly 10–20) after 50 simulation runs; the shape is still noisy.]
27. Result versus Frequency (250 runs)
[The same histogram after 250 runs; the shape is smoother.]
28. Result versus Frequency (1000+ runs)
[The same histogram after 1000+ runs; the shape of the distribution has stabilized.]
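The 50/250/1000-run progression is the Law of Large Numbers at work: with more runs, summaries of the simulated results settle down. A small sketch (the Normal result distribution with mean 15 days is my assumption for illustration):

```python
import random

# With more runs per study, the estimated mean of the simulated results
# stabilizes. The result distribution (Normal, mean 15, sd 2 days) is an
# assumed stand-in for a real model's output.
def mean_of_runs(n_runs, seed):
    rng = random.Random(seed)
    return sum(rng.gauss(15, 2) for _ in range(n_runs)) / n_runs

spread = lambda xs: max(xs) - min(xs)
small = [mean_of_runs(50, seed) for seed in range(20)]    # 20 small studies
large = [mean_of_runs(1000, seed) for seed in range(20)]  # 20 large studies
# The small-study means scatter far more than the large-study means.
```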
29. Key Point
There is NO single forecast result. There will always be many possible results, some more likely than others.
30. [Chart: likelihood versus time to complete the backlog, with the possible outcomes split 50% / 50% on either side of the midpoint.]
When pressed for a single number, we often give the average.
31. [Chart: the same likelihood curve split at the 95th percentile: 95% of outcomes to the left, 5% to the right.]
Monte Carlo simulation yields more information – quoting the 95% level is common.
32. Key Point
“Average” is NEVER an option.
WARNING: Regression lines are most often “average”.
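In code, "never quote the average" means reading a confidence level off the sorted simulation results instead. A sketch with made-up data (a uniform sample of completion times in days) and a simple nearest-rank percentile:

```python
import random

# Quote a confidence level, not the average. The results are a made-up
# sample of simulated completion times in days.
rng = random.Random(7)
results = sorted(rng.randint(10, 20) for _ in range(1000))

def percentile(sorted_results, p):
    # Nearest-rank percentile: value below which ~p% of runs fall.
    index = min(len(sorted_results) - 1, int(p / 100 * len(sorted_results)))
    return sorted_results[index]

p50 = percentile(results, 50)  # the midpoint answer
p95 = percentile(results, 95)  # "we are 95% certain of hitting this date"
# p95 >= p50: the quoted 95% date carries a built-in risk buffer.
```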
40. In this demo
• Basic Scrum and Kanban modeling
• How to build a simple model
– SimML Modeling Language
– Visual checking of models
– Forecasting date and cost
– The “Law of Large Numbers”
43. Staff Skill Impact Report
Explore which staff changes have the greatest impact.
44. Key Point
Modeling helps find what matters. Fewer estimates required.
45. In this demo
• Finding what matters most
– Manual experiments
– Sensitivity testing
• Finding the next best 3 staff skill hires
• Minimizing and simplifying estimation
– Grouping backlog
– Range estimates
– Deleting unimportant model elements
47. Outsourcing Cost & Benefits
• Outsourcing is often controversial
– Often fails when pursued for cost savings alone
– Doesn’t always reduce local employment
– An important tool to remain competitive
– I.Q. has no geographic boundaries
• Many models
– Entire project
– Augmentation of local team
48. Build Date & Cost Matrix

              1x Estimates    1.5x Estimates    2x Estimates
1x Staff      Best Case
1.5x Staff                    Midpoint
2x Staff                                        Worst Case

Benefit = (Baseline Dev Cost – New Dev Cost) – Cost of Delay + Local Staff Cost Savings
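The benefit formula on the slide translates directly into code. The dollar figures in the example call are hypothetical, chosen only to show one cell of the matrix being evaluated.

```python
# The slide's benefit formula, as a function. All figures in the example
# call below are hypothetical.
def outsourcing_benefit(baseline_dev_cost, new_dev_cost,
                        cost_of_delay, local_staff_savings):
    return (baseline_dev_cost - new_dev_cost) - cost_of_delay + local_staff_savings

# One hypothetical cell of the matrix (e.g., 1.5x estimates, 1.5x staff):
benefit = outsourcing_benefit(
    baseline_dev_cost=500_000,
    new_dev_cost=400_000,
    cost_of_delay=60_000,
    local_staff_savings=30_000,
)
# 100_000 saved, minus 60_000 delay cost, plus 30_000 savings = 70_000.
```

A negative result means the outsourcing scenario destroys value once delay is priced in.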
49. NOT LINEAR & NOT YOUR PROJECT
[Chart: benefit in dollars, roughly –$150,000 to $150,000, versus multiplier (1, 1.5, 2), with one line per multiplier series (1x, 1.5x, 2x); the relationship is not linear.]
50. In this demo
• Model the impact of various outsourcing models
51. New Project Rules of Thumb…
• Cost of Delay plays a significant role
– High cost-of-delay projects are poor candidates
– Increased staffing compensates somewhat
• Knowledge transfer and ramp-up time are critical
– Complex products are poor candidates
– Captive teams are better choices for these projects
• NEVER as simple as direct lower costs!
53. Speaking Risk To Executives
• Buy them a copy of “The Flaw of Averages”
• Show them you are tracking & managing risk
• Do
– “We are 95% certain of hitting date x”
– “With 1 week of analysis, that may drop to date y”
– “We identified risks x, y & z that we will track weekly”
• Don’t
– Give them a date without likelihood
• “February 29th, 2013”
– Give them a date without the risk factors considered
• “To do the backlog of features, February 29th, 2013”
54. **Major risk events have the predominant role in deciding when delivery actually occurs**
[Timeline: 1. Plan (“We spend all our time estimating here”) → 2. Performance Issues → 3. External Vendor Delay]
59. Key Points
• There is no single release date forecast
• Never use the average as a quoted forecast
• Risk factors play a major role (not just backlog)
• Data has shape: beware of non-Normal data
• Measurement → Insight → Decisions → Outcomes: work backwards!
• Communicate risk early with executive peers
60. Call to action
• Read these books
• Download the software: FocusedObjective.com
• Follow @AgileSimulation
• Follow @LMaccherone
64. The Model Creation Cycle
[Cycle diagram: Model (a little) → Visually Test → Monte-Carlo Test → Sensitivity Test → repeat]
65. The Experiment Cycle
[Cycle diagram: Baseline → Make Single Change → Compare Results → Make Informed Decision(s) → repeat]
66. Best Practice 1
Start simple and add ONE input condition at a time. Visually / Monte-Carlo test each input to verify it works.
67. Best Practice 2
Find the likelihood of major events and estimate the delay. E.g., vendor dependencies, performance/memory issues, third-party component failures.
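Best Practice 2 can be sketched as a list of (likelihood, delay) pairs layered on top of a base forecast. The specific probabilities, delays, and the 40-day base below are my assumptions for illustration, not figures from the deck.

```python
import random

# Each major risk event is a probability plus an estimated delay; on a
# given run, the delays of the risks that fire are added to the base
# forecast. All numbers here are assumed for illustration.
RISKS = [
    ("vendor dependency slips", 0.30, 10),   # (name, probability, delay days)
    ("performance/memory issue", 0.20, 15),
    ("third-party component fails", 0.10, 20),
]

def total_days(rng, base_days=40):
    delay = sum(days for _name, p, days in RISKS if rng.random() < p)
    return base_days + delay

rng = random.Random(3)
totals = [total_days(rng) for _ in range(1000)]
# Many runs finish at the base 40 days; the risk tail stretches toward 85.
```

This is how risk events, rather than backlog estimates alone, come to dominate the shape of the forecast.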
68. Best Practice 3
Only obtain and add detailed estimates and opinion to a model if sensitivity analysis says that input is material.
69. Best Practice 4
Use a uniform random input distribution UNTIL sensitivity analysis says that input is influencing the output.
70. Best Practice 5
Educate your managers about risk. They will still want a “single” date for planning, but let them decide: 75th or 95th confidence level (average is NEVER an option).
77. Focused Objective
• Risk Tools for Software Dev
• Scrum/Agile Simulation
• Kanban/Lean Simulation
• Forecasting Staff, Date & Cost
• Automated Sensitivity Analysis
• Data Reverse Engineering
• Consulting / Training
• Book
78. We Use & Recommend: EasyFit
• MathWave.com
• Invaluable for
– Analyzing data
– Fitting distributions
– Generating random numbers
– Determining percentiles
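As a tiny stdlib-only stand-in for what a tool like EasyFit automates across many candidate distributions, the sketch below fits a Normal by the method of moments; the sample data is invented, not from the deck.

```python
import statistics

# Method-of-moments Normal fit: a minimal stand-in for what EasyFit
# automates. The sample data (hypothetical cycle times) is invented.
data = [12, 14, 15, 15, 16, 17, 18, 21, 22, 30]

mu = statistics.mean(data)      # fitted Normal mean
sigma = statistics.stdev(data)  # fitted Normal standard deviation
# The lone 30 hints at a heavy right tail, exactly where a Normal fit
# will understate risk, echoing the earlier "you will be wrong by" slide.
```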