How to Calculate Conditional Probability in Python

by Tutor Aspire

The conditional probability that event A occurs, given that event B has occurred, is calculated as follows:

P(A|B) = P(A∩B) / P(B)

where:

P(A∩B) = the probability that event A and event B both occur. 

P(B) = the probability that event B occurs, which must be greater than 0.
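As a quick illustration, the formula can be wrapped in a small helper function. This is only a minimal sketch: the function name `conditional_probability` and the die-roll example are assumptions added for illustration and are not part of the survey example in this tutorial.

```python
from fractions import Fraction

def conditional_probability(p_a_and_b, p_b):
    """Return P(A|B) = P(A∩B) / P(B). Undefined when P(B) is 0."""
    if p_b == 0:
        raise ValueError("P(B) must be greater than 0")
    return p_a_and_b / p_b

# hypothetical example: roll a fair six-sided die
# A = the roll is a 2, B = the roll is even
p_a_and_b = Fraction(1, 6)  # only a 2 is both "a 2" and "even"
p_b = Fraction(3, 6)        # 2, 4, and 6 are even

result = conditional_probability(p_a_and_b, p_b)
print(result)  # 1/3
```

Using `Fraction` keeps the arithmetic exact, but ordinary floats work the same way.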

The following example shows how to use this formula to calculate conditional probabilities in Python.

Example: Calculate Conditional Probability in Python

Suppose we send out a survey to 300 individuals asking them which sport they like best: baseball, basketball, football, or soccer.

We can create the following table in Python to hold the survey responses:

import pandas as pd
import numpy as np

#create pandas DataFrame with raw data
df = pd.DataFrame({'gender': np.repeat(np.array(['Male', 'Female']), 150),
                   'sport': np.repeat(np.array(['Baseball', 'Basketball', 'Football',
                                                'Soccer', 'Baseball', 'Basketball',
                                                'Football', 'Soccer']), 
                                    (34, 40, 58, 18, 34, 52, 20, 44))})

#produce contingency table to summarize raw data
survey_data = pd.crosstab(index=df['gender'], columns=df['sport'], margins=True)

#view contingency table
survey_data

sport	Baseball	Basketball	Football	Soccer	 All
gender					
Female	      34	        52	      20	    44	 150
Male	      34	        40	      58	    18	 150
All	      68	        92	      78	    62	 300

Related: How to Use pd.crosstab() to Create Contingency Tables in Python

We can use the following syntax to extract values from the table:

#extract value in second row and first column (males who prefer baseball)
survey_data.iloc[1, 0]

34
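Position-based indexing depends on the order of the rows and columns, so it can help to look up the same cell by label instead. The sketch below rebuilds the survey table so that it runs on its own, then uses `.loc` with the row and column labels. The variable name `male_baseball` is an assumption added for illustration.

```python
import numpy as np
import pandas as pd

# rebuild the survey data from the example above
df = pd.DataFrame({
    'gender': np.repeat(['Male', 'Female'], 150),
    'sport': np.repeat(['Baseball', 'Basketball', 'Football', 'Soccer'] * 2,
                       (34, 40, 58, 18, 34, 52, 20, 44)),
})
survey_data = pd.crosstab(index=df['gender'], columns=df['sport'], margins=True)

# look up the cell by row label and column label instead of by position
male_baseball = survey_data.loc['Male', 'Baseball']
print(male_baseball)  # 34
```

The label-based lookup returns the same count as `survey_data.iloc[1, 0]`.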

We can use the following syntax to calculate the probability that an individual is male, given that baseball is their favorite sport. We divide the number of males who prefer baseball by the total number of individuals who prefer baseball:

#calculate probability of being male, given that individual prefers baseball
survey_data.iloc[1, 0] / survey_data.iloc[2, 0]

0.5
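To see how this ratio of counts relates to the formula P(A|B) = P(A∩B) / P(B), we can compute the joint and marginal probabilities explicitly. Here A is "the individual is male" and B is "the individual prefers baseball". This is a minimal sketch that rebuilds the table so it runs on its own. The variable names are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# rebuild the survey data from the example above
df = pd.DataFrame({
    'gender': np.repeat(['Male', 'Female'], 150),
    'sport': np.repeat(['Baseball', 'Basketball', 'Football', 'Soccer'] * 2,
                       (34, 40, 58, 18, 34, 52, 20, 44)),
})
survey_data = pd.crosstab(index=df['gender'], columns=df['sport'], margins=True)

n = survey_data.loc['All', 'All']  # total respondents (300)

# P(A∩B): male AND prefers baseball = 34 / 300
p_male_and_baseball = survey_data.loc['Male', 'Baseball'] / n

# P(B): prefers baseball = 68 / 300
p_baseball = survey_data.loc['All', 'Baseball'] / n

# P(A|B) = P(A∩B) / P(B)
p_male_given_baseball = p_male_and_baseball / p_baseball
print(p_male_given_baseball)  # 0.5
```

The total of 300 cancels in the division, which is why dividing the two counts directly gives the same answer.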

Similarly, we can calculate the probability that an individual’s favorite sport is basketball, given that they’re female, by dividing the number of females who prefer basketball by the total number of females:

#calculate probability of preferring basketball, given that individual is female
survey_data.iloc[0, 1] / survey_data.iloc[0, 4]

0.3466666666666667

We can use this basic approach to calculate any conditional probability we’d like from the contingency table.
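If we want every conditional probability at once, one option is the `normalize` argument of `pd.crosstab()`, which this tutorial does not otherwise use. The sketch below rebuilds the survey data so it runs on its own. The names `p_sport_given_gender` and `p_gender_given_sport` are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# rebuild the survey data from the example above
df = pd.DataFrame({
    'gender': np.repeat(['Male', 'Female'], 150),
    'sport': np.repeat(['Baseball', 'Basketball', 'Football', 'Soccer'] * 2,
                       (34, 40, 58, 18, 34, 52, 20, 44)),
})

# P(sport | gender): each row is divided by its row total
p_sport_given_gender = pd.crosstab(df['gender'], df['sport'], normalize='index')

# P(gender | sport): each column is divided by its column total
p_gender_given_sport = pd.crosstab(df['gender'], df['sport'], normalize='columns')

# same two probabilities as calculated earlier
print(p_gender_given_sport.loc['Male', 'Baseball'])
print(p_sport_given_gender.loc['Female', 'Basketball'])
```

Each row of `p_sport_given_gender` sums to 1, and each column of `p_gender_given_sport` sums to 1, so every cell can be read directly as a conditional probability.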

Additional Resources

The following tutorials provide additional information on dealing with probability:

Law of Total Probability
How to Find the Mean of a Probability Distribution
How to Find the Standard Deviation of a Probability Distribution
