The conditional probability that event A occurs, given that event B has occurred, is calculated as follows:
P(A|B) = P(A∩B) / P(B)
where:
P(A∩B) = the probability that event A and event B both occur.
P(B) = the probability that event B occurs.
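The formula can also be wrapped in a small helper function. This is only a minimal sketch: the name cond_prob and its argument names are hypothetical, the input values are purely illustrative, and the check on P(B) is there because the formula is undefined when P(B) is zero.

```r
# Hypothetical helper: P(A|B) = P(A and B) / P(B)
cond_prob <- function(p_a_and_b, p_b) {
  # The conditional probability is undefined when P(B) is zero
  if (p_b <= 0) stop("P(B) must be greater than 0")
  p_a_and_b / p_b
}

# Illustrative values only: P(A and B) = 0.2, P(B) = 0.4
cond_prob(0.2, 0.4)
```

Stopping on a non-positive P(B) is one possible design choice; returning NA instead would also be reasonable.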
The following examples show how to use this formula to calculate conditional probabilities in R.
Example 1: Calculate Conditional Probability Using Values
Suppose we send out a survey to 300 individuals asking them which sport they like best: baseball, basketball, football, or soccer.
Suppose we know that the probability that an individual is male and prefers baseball as their favorite sport is 0.113.
Suppose we also know that the probability that any individual prefers baseball as their favorite sport is 0.226.
Given that an individual prefers baseball, we could calculate the probability that they’re male to be:
- P(Male|Prefers Baseball) = P(Male∩Prefers Baseball) / P(Prefers Baseball)
- P(Male|Prefers Baseball) = 0.113 / 0.226
- P(Male|Prefers Baseball) = 0.5
Given that an individual prefers baseball, the probability that they’re male is 0.5.
Here’s how we can calculate this probability in R:
#define probability of being male and preferring baseball
p_male_baseball <- 0.113
#define probability of preferring baseball
p_baseball <- 0.226
#calculate probability of being male, given that individual prefers baseball
p_male_baseball / p_baseball
[1] 0.5
Example 2: Calculate Conditional Probability Using a Table
Suppose we send out a survey to 300 individuals asking them which sport they like best: baseball, basketball, football, or soccer.
We can create the following table in R to hold the survey responses:
#create data frame to hold survey responses
df <- data.frame(gender=rep(c('Male', 'Female'), each=150),
                 sport=rep(c('Baseball', 'Basketball', 'Football', 'Soccer',
                             'Baseball', 'Basketball', 'Football', 'Soccer'),
                           times=c(34, 40, 58, 18, 34, 52, 20, 44)))
#create two-way table from data frame, with row and column totals
survey_data <- addmargins(table(df$gender, df$sport))
#view table
survey_data
         Baseball Basketball Football Soccer Sum
  Female       34         52       20     44 150
  Male         34         40       58     18 150
  Sum          68         92       78     62 300
We can use the following syntax to extract values from the table:
#extract value in second row and first column
survey_data[2, 1]
[1] 34
We can use the following syntax to calculate the probability that an individual is male, given that they prefer baseball as their favorite sport:
#calculate probability of being male, given that individual prefers baseball
survey_data[2, 1] / survey_data[3, 1]
[1] 0.5
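Positional indices such as [2, 1] are easy to misread, so the same calculation can be written with row and column names. This is a sketch under stated assumptions: it rebuilds the survey table with data.frame(), table() and addmargins() so that it runs on its own, and the variable name p_male_given_baseball is introduced here only for illustration.

```r
# Rebuild the survey table so this snippet runs on its own
df <- data.frame(gender = rep(c("Male", "Female"), each = 150),
                 sport = rep(c("Baseball", "Basketball", "Football", "Soccer",
                               "Baseball", "Basketball", "Football", "Soccer"),
                             times = c(34, 40, 58, 18, 34, 52, 20, 44)))
survey_data <- addmargins(table(df$gender, df$sport))

# Index by row and column names instead of positions:
# males who prefer baseball divided by everyone who prefers baseball
p_male_given_baseball <- survey_data["Male", "Baseball"] / survey_data["Sum", "Baseball"]
p_male_given_baseball
```

The result should match the value obtained with positional indexing, since "Male" is the second row and "Baseball" the first column of the table.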
And we can use the following syntax to calculate the probability that an individual prefers basketball as their favorite sport, given that they’re female:
#calculate probability of preferring basketball, given that individual is female
survey_data[1, 2] / survey_data[1, 5]
[1] 0.3466667
We can use this basic approach to calculate any conditional probability we’d like from the table.
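To obtain every conditional probability at once rather than one cell at a time, base R's prop.table() can divide each cell by its row or column total. The sketch below assumes the same survey data as above and rebuilds it so that it is self-contained; the names counts, p_sport_given_gender and p_gender_given_sport are hypothetical.

```r
# Rebuild the raw two-way table of counts (without the Sum margins)
df <- data.frame(gender = rep(c("Male", "Female"), each = 150),
                 sport = rep(c("Baseball", "Basketball", "Football", "Soccer",
                               "Baseball", "Basketball", "Football", "Soccer"),
                             times = c(34, 40, 58, 18, 34, 52, 20, 44)))
counts <- table(df$gender, df$sport)

# margin = 1 divides by row totals: each row holds P(sport | gender)
p_sport_given_gender <- prop.table(counts, margin = 1)

# margin = 2 divides by column totals: each column holds P(gender | sport)
p_gender_given_sport <- prop.table(counts, margin = 2)

# View both tables, rounded for readability
round(p_sport_given_gender, 3)
round(p_gender_given_sport, 3)
```

Individual values can then be read off by name, for example p_sport_given_gender["Female", "Basketball"] for the probability of preferring basketball given that the individual is female.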