Agenda
01 Introduction: Introduction to Statistics and Probability
02 Getting Started: Getting Started With Python for Probability
03 Concepts: Overview of the simple concepts involved
04 Use Case: A practical Python use-case to understand Python faster!
Why Python For Statistics?
R is a language dedicated to statistics, so why Python? Python shines when you build complex analysis pipelines that mix statistics with image analysis, text mining, and so on. Here, the richness of Python is an invaluable asset!
What is Probability?
What is the chance of an event happening? How do you answer this? We need to consider all the other events that can occur before coming to a conclusion!
The Coin Toss
What are the outcomes of a coin toss? Flipping heads or flipping tails. Any other outcome? No! We call this set of possible outcomes the sample space!
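The sample space idea can be sketched in a few lines of Python. This is a minimal illustration, assuming a fair coin so that both outcomes are equally likely; the helper name `probability` is made up for this example.

```python
from fractions import Fraction

# Sample space for one toss of a coin (assumed fair, so outcomes are equally likely).
sample_space = {"heads", "tails"}

def probability(event, space):
    """P(event) = favorable outcomes / all outcomes, for equally likely outcomes."""
    return Fraction(len(event & space), len(space))

p_heads = probability({"heads"}, sample_space)
p_any = probability({"heads", "tails"}, sample_space)
print(p_heads)  # 1/2
print(p_any)    # 1
```

The probability of the whole sample space is 1, because one of its outcomes must occur.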
Suppose we observe 100 heads and 10 tails. Is the coin fair? To find out, we gather data, use statistics to make predictions, and compare the predictions with what we observed!
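One way to compare a prediction with the observation is to ask how likely 100 or more heads in 110 flips would be if the coin really were fair. The sketch below uses the binomial distribution for that; the function name `prob_at_least` is an assumption made for this example, not something from the slides.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more heads in n flips."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 100 heads out of 110 flips, under the assumption that the coin is fair.
p_value = prob_at_least(100, 110)
print(p_value)
```

The result is vanishingly small, which suggests that a fair coin is a poor explanation for such a lopsided result.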
The Coin Toss – Data Generation
Too early for code?
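The Coin Toss – Code

    import random

    def coin_trial(flips=10):
        """Flip a fair coin `flips` times and return the number of heads."""
        heads = 0
        for _ in range(flips):
            if random.random() < 0.5:
                heads += 1
        return heads

    def simulate(n):
        """Average number of heads over n trials of coin_trial()."""
        trials = [coin_trial() for _ in range(n)]
        return sum(trials) / n

    # Each trial has 10 flips, so the expected number of heads is 5.
    # Sample runs (results vary because the flips are random):
    simulate(10)       # 5.4
    simulate(100)      # 4.83
    simulate(1000)     # 5.055
    simulate(1000000)  # 4.999781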
The Coin Toss – The Theory Given enough data, statistics enables us to calculate probabilities using real-world observations
The Coin Toss – Python What are the chances of someone developing a disease over time? What is the probability that a critical car component will fail while you are driving? Python makes questions like these simpler to answer!
Data And Distribution
Let's tackle the question "Which wine is better than average?" To answer it, you need to know the nature of the data! Normal Distribution: the normal distribution is a particularly important phenomenon in the realm of probability and statistics.
The high point of a normal distribution, located at its mean, marks the value that is most likely to occur!
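To see the high point in code, the sketch below evaluates a normal density over a grid of values and picks the largest. The wine-rating numbers (mean 90, standard deviation 3) are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical wine ratings: mean 90 points, standard deviation 3 points.
ratings = NormalDist(mu=90, sigma=3)

# Evaluate the density on a grid of scores from 80.0 to 100.0 and find the peak.
grid = [80 + 0.5 * i for i in range(41)]
peak = max(grid, key=ratings.pdf)
print(peak)  # 90.0
```

The peak sits at the mean, and the density falls off symmetrically on either side.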
Revisiting The Normal
Two major factors: the Central Limit Theorem and the Three Sigma Rule. The Central Limit Theorem states that the distribution of sample estimates (such as sample means) approaches a normal distribution as the sample size grows. The Three Sigma Rule states that, given a normal distribution, about 68% of your observations will fall within one standard deviation of the mean, about 95% within two, and about 99.7% within three.
Learning Python
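Both ideas can be checked numerically. The sketch below first computes the Three Sigma percentages from the standard normal distribution, then illustrates the Central Limit Theorem by averaging samples drawn from a uniform (non-normal) distribution. The sample size, number of samples, and seed are arbitrary choices for this example.

```python
import random
from statistics import NormalDist, mean, stdev

# Three Sigma Rule: probability mass within k standard deviations of the mean.
standard_normal = NormalDist(mu=0, sigma=1)

def within(k):
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within(k), 4))  # 0.6827, 0.9545, 0.9973

# Central Limit Theorem: means of samples drawn from a non-normal
# (uniform) distribution cluster around the true mean of 0.5.
rng = random.Random(0)  # fixed seed so the run is repeatable
sample_size, num_samples = 30, 5000
sample_means = [
    mean(rng.random() for _ in range(sample_size))
    for _ in range(num_samples)
]
print(round(mean(sample_means), 3), round(stdev(sample_means), 3))
```

The spread of the sample means should be close to the theoretical value, the standard deviation of the uniform distribution divided by the square root of the sample size.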
Z-Score
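A z-score expresses how many standard deviations a value lies above or below the mean: z = (x - mean) / standard deviation. The sketch below is one minimal way to compute it; the wine scores are made up for illustration, and using the population standard deviation (`pstdev`) is an assumption.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return (x - mean) / standard deviation for each value."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(x - mu) / sigma for x in values]

# Hypothetical wine scores, made up for illustration.
scores = [82, 85, 88, 90, 95]
for score, z in zip(scores, z_scores(scores)):
    print(score, round(z, 2))
```

A wine with a positive z-score is better than average, and the size of the score says by how much relative to the spread of the data.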
Use-Case: Poker Probability
Use-Case: Poker Prediction Can we predict the probability of a given poker hand occurring?
Let's look at the basics: there are 52 cards in a standard deck, with 4 suits and therefore 4 cards of each rank. For an Ace: P(A) = 4/52 = 1/13
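The ace calculation can be reproduced by building the deck and counting. This is a small sketch; the card representation as (rank, suit) tuples is just one convenient choice.

```python
from fractions import Fraction

ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

aces = [card for card in deck if card[0] == "A"]
p_ace = Fraction(len(aces), len(deck))
print(len(deck), p_ace)  # 52 1/13

# Dependent events: after one ace is dealt, only 3 aces remain among 51 cards.
p_two_aces = p_ace * Fraction(3, 51)
print(p_two_aces)  # 1/221
```

The second calculation shows why card draws are dependent events: each card dealt changes the odds for the next one.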
Poker Without Python / Poker With Python
Texas Hold'em
Pre-Flop: Each player is dealt two cards, known as "hole cards"
Flop: Three community cards are dealt
Turn: One community card is dealt
River: Final community card is dealt
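The four stages can be sketched as a simple dealing routine. This is a simplified illustration: it ignores burn cards and the real dealing order around the table, and the function name `deal_holdem` and the two-character card codes are assumptions made for this example.

```python
import random

RANKS = "23456789TJQKA"
SUITS = "cdhs"  # clubs, diamonds, hearts, spades

def deal_holdem(num_players=2, seed=None):
    """Deal one Texas Hold'em hand: hole cards, then flop, turn, and river."""
    rng = random.Random(seed)
    deck = [rank + suit for rank in RANKS for suit in SUITS]
    rng.shuffle(deck)
    hole_cards = [[deck.pop(), deck.pop()] for _ in range(num_players)]  # pre-flop
    flop = [deck.pop() for _ in range(3)]
    turn = deck.pop()
    river = deck.pop()
    return hole_cards, flop, turn, river

hole_cards, flop, turn, river = deal_holdem(num_players=2, seed=1)
print(hole_cards, flop, turn, river)
```

With two players, nine cards leave the deck in total: four hole cards and five community cards.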
Dependent Events: Flush Draw [Figure: your hand and the community cards]
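The cards on this slide are not reproduced here, so the sketch below assumes the standard flush-draw situation: after the flop you can see four cards of one suit (between your hand and the community cards), leaving 9 cards of that suit among the 47 unseen cards. The function name `hit_probabilities` is made up for this example.

```python
from fractions import Fraction

def hit_probabilities(outs, unseen=47):
    """Chance of hitting one of `outs` cards on the turn, and by the river."""
    on_turn = Fraction(outs, unseen)
    miss_both = Fraction(unseen - outs, unseen) * Fraction(unseen - outs - 1, unseen - 1)
    return on_turn, 1 - miss_both

# Flush draw after the flop: 13 cards of the suit, 4 already visible, so 9 outs.
on_turn, by_river = hit_probabilities(outs=13 - 4)
print(float(on_turn), float(by_river))
```

The second factor in `miss_both` uses 46 unseen cards rather than 47, because the turn card has already been removed from the deck. That is what makes the two draws dependent.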
Dependent Events: Open-Ended Straight Draw [Figure: your hand and the community cards]
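An open-ended straight draw can also be worked out by brute force, listing every possible turn and river pair. The specific cards below are hypothetical (the slide's cards are not reproduced here): 8 and 9 in hand, with 10 and jack on the flop, so any 7 or any queen completes the straight.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
deck = [rank + suit for rank in RANKS for suit in SUITS]

# Hypothetical open-ended straight draw: 8-9 in hand, T-J on the flop.
hole = ["8s", "9d"]
flop = ["Tc", "Jh", "2s"]
unseen = [card for card in deck if card not in hole + flop]

# Any 7 or any queen completes the straight.
outs = [card for card in unseen if card[0] in "7Q"]

# Enumerate every possible turn/river pair and count those containing an out.
runouts = list(combinations(unseen, 2))
hits = sum(1 for pair in runouts if any(card in outs for card in pair))
print(len(outs), len(runouts), hits / len(runouts))
```

This counts the chance of completing the straight by the river, not the chance of winning the hand, since an opponent could still hold something better.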
Now involve opponents! [Figure: your hand, the community cards, and the opponent's hand] Total Pot = $60, Opponent's Bet = $20
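One common way to use these numbers is pot odds: compare the cost of calling with the size of the pot you could win. The sketch below assumes the $60 pot does not yet include the opponent's $20 bet; the slide does not say which reading is intended, and the function name `break_even_equity` is made up for this example.

```python
def break_even_equity(pot_before_bet, bet):
    """Minimum win probability for which calling `bet` does not lose money on average."""
    pot_after_call = pot_before_bet + bet + bet  # pot + opponent's bet + our call
    return bet / pot_after_call

# Assumption: the $60 pot does not yet include the opponent's $20 bet.
required = break_even_equity(pot_before_bet=60, bet=20)
print(required)  # 0.2
```

If the $60 already includes the bet, call `break_even_equity(pot_before_bet=40, bet=20)` instead, which gives 0.25. Either way, calling makes sense on average only when your chance of winning is above this threshold.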
Conclusion