The field of statistics is concerned with collecting, analyzing, interpreting, and presenting data.
As technology becomes more present in our daily lives, more data is being generated and collected now than ever before in human history.
Statistics is the field that can help us understand how to use this data to do the following things:
- Gain a better understanding of the world around us.
- Make decisions using data.
- Make predictions about the future using data.
In this article we share 10 reasons why the field of statistics is so important in modern life.
Reason 1: To Use Descriptive Statistics to Understand the World
Descriptive statistics are used to describe a set of raw data. There are three main types of descriptive statistics:
- Summary statistics
- Charts
- Tables
Each of these can help us gain a better understanding of existing data.
For example, suppose we have a set of raw data that shows the test scores of 10,000 students in a certain city. We can use descriptive statistics to:
- Calculate the average test score and the standard deviation of test scores.
- Generate a histogram or boxplot to visualize the distribution of test scores.
- Create a frequency table to understand the distribution of test scores.
Using descriptive statistics, we can understand the test scores of the students much more easily than by just staring at the raw data.
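As a minimal sketch of the first and third bullets above, the following Python snippet computes summary statistics and a simple frequency table. The ten scores are made up for illustration (a real data set would have thousands of rows), and the 10-point bands are an arbitrary choice.

```python
import statistics
from collections import Counter

# Hypothetical test scores, invented for this example.
scores = [72, 85, 90, 66, 85, 78, 92, 85, 70, 77]

# Summary statistics: the average and the sample standard deviation.
mean_score = statistics.mean(scores)
std_score = statistics.stdev(scores)

# Frequency table: count how many scores fall in each 10-point band.
frequency = Counter((score // 10) * 10 for score in scores)

print(f"Mean: {mean_score}, standard deviation: {std_score:.2f}")
for band in sorted(frequency):
    print(f"{band}-{band + 9}: {frequency[band]}")
```

A histogram or boxplot would show the same distribution visually; a plotting library is left out here to keep the sketch dependency-free.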
Reason 2: To Be Wary of Misleading Charts
There are more charts being generated in journals, news outlets, online articles, and magazines than ever before. Unfortunately, charts can often be misleading if you don’t understand the underlying data.
For example, suppose a journal publishes a study that finds a negative correlation between GPA and ACT scores for students at a certain university.
However, this negative correlation arises only because of who ends up in the sample: students with both a high GPA and a high ACT score may go to an elite university instead, while students with both a low GPA and a low ACT score are not admitted at all.
Although the correlation between ACT and GPA is positive in the population, the correlation appears to be negative in the sample.
This particular bias is known as Berkson’s bias. By being aware of it, you can avoid being misled by certain charts.
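The effect can be reproduced with a small simulation. The sketch below is illustrative only: it assumes standardized GPA and ACT scores with a population correlation of about 0.5, and a hypothetical selection rule in which only students whose combined score falls in a middle band enroll at this university.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated applicants: standardized GPA and ACT scores that are
# positively correlated (about 0.5) in the full population.
gpa, act = [], []
for _ in range(20_000):
    g = rng.gauss(0, 1)
    a = 0.5 * g + math.sqrt(0.75) * rng.gauss(0, 1)
    gpa.append(g)
    act.append(a)

# Hypothetical selection rule: students with a very high combined score
# go elsewhere, and students with a very low combined score are rejected.
enrolled = [(g, a) for g, a in zip(gpa, act) if -1 < g + a < 1]
gpa_enrolled = [g for g, _ in enrolled]
act_enrolled = [a for _, a in enrolled]

population_r = pearson_r(gpa, act)
sample_r = pearson_r(gpa_enrolled, act_enrolled)
print(f"Correlation among all applicants: {population_r:.2f}")
print(f"Correlation among enrolled students: {sample_r:.2f}")
```

Under these assumptions the correlation is positive among all applicants but negative among the enrolled students, even though nothing about the individual students has changed.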
Reason 3: To Be Wary of Confounding Variables
One important concept that you’ll learn about in statistics is the concept of confounding variables.
These are variables that are unaccounted for and can confound the results of an experiment and lead to unreliable findings.
For example, suppose a researcher collects data on ice cream sales and shark attacks and finds that the two variables are highly correlated. Does this mean that increased ice cream sales cause more shark attacks?
That’s unlikely. The more likely explanation is the confounding variable of temperature. When it is warmer outside, more people buy ice cream and more people swim in the ocean.
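A short simulation can make this concrete. In the sketch below, all numbers are invented: ice cream sales and a shark-attack index each depend only on temperature plus random noise, never on each other. Removing the part of each variable that temperature explains (by taking residuals from a fitted line) is one simple way to check whether any association is left.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def residuals(ys, xs):
    """What is left of ys after subtracting a least-squares line in xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


rng = random.Random(1)  # fixed seed so the run is repeatable

temperature, ice_cream, sharks = [], [], []
for _ in range(5_000):
    t = rng.uniform(10, 35)                     # daily temperature
    temperature.append(t)
    ice_cream.append(5 * t + rng.gauss(0, 20))  # driven by temperature only
    sharks.append(0.2 * t + rng.gauss(0, 1))    # driven by temperature only

raw_r = pearson_r(ice_cream, sharks)
adjusted_r = pearson_r(residuals(ice_cream, temperature),
                       residuals(sharks, temperature))
print(f"Raw correlation: {raw_r:.2f}")
print(f"Correlation after accounting for temperature: {adjusted_r:.2f}")
```

In this setup the raw correlation is strongly positive, while the correlation after accounting for temperature is close to zero.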
Reason 4: To Make Better Decisions Using Probability
One of the most important sub-fields of statistics is probability. This is the field that studies how likely events are to happen.
By having a basic understanding of probability, you can make more informed decisions in the real world.
For example, suppose a high school student knows that they have a 10% chance of being accepted to any given university. Using the formula for the probability of “at least one” success, this student can find the probability that they’ll be accepted to at least one university they apply to and can adjust the number of applications accordingly.
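The “at least one” formula is P(at least one success) = 1 - (1 - p)^n, where p is the chance of success on each try and n is the number of tries. The sketch below applies it to the example, under the simplifying assumption that admission decisions are independent and that the chance is the same 10% at every university. The function names and the 50% target are illustrative choices.

```python
def prob_at_least_one(p, n):
    """Probability of at least one success in n independent tries."""
    return 1 - (1 - p) ** n


def applications_needed(p, target):
    """Smallest n for which the chance of at least one success reaches target."""
    if not (0 < p <= 1 and 0 < target < 1):
        raise ValueError("need 0 < p <= 1 and 0 < target < 1")
    n = 1
    while prob_at_least_one(p, n) < target:
        n += 1
    return n


# Chance of at least one acceptance for different numbers of applications.
for n in (1, 5, 10):
    print(f"{n} applications: {prob_at_least_one(0.10, n):.1%}")

# Applications needed for at least a 50% chance of one acceptance.
print(applications_needed(0.10, 0.50))
```

With five applications the chance of at least one acceptance is roughly 41%, and it takes seven applications to pass the 50% mark.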
Reason 5: To Understand P-Values in Research
Another important concept that you’ll learn about in statistics is p-values.
The textbook definition of a p-value is:
A p-value is the probability of observing a sample statistic that is at least as extreme as your sample statistic, given that the null hypothesis is true.
For example, suppose a factory claims that it produces tires with a mean weight of 200 pounds. An auditor hypothesizes that the true mean weight of tires produced at this factory is different from 200 pounds, so he runs a hypothesis test and finds that the p-value of the test is 0.04.
Here is how to interpret this p-value:
If the factory does indeed produce tires that have a mean weight of 200 pounds, then 4% of all audits will obtain the effect observed in the sample, or larger, because of random sampling error. This tells us that obtaining the sample data that the auditor did would be pretty rare if the factory really produced tires with a mean weight of 200 pounds.
Thus, at the commonly used significance level of 0.05, the auditor would reject the null hypothesis that the true mean weight of tires produced at this factory is 200 pounds.
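To show where such a p-value could come from, here is a minimal sketch of a two-sided z-test. The article does not say which test the auditor ran or what data he collected, so the sample mean (202.05 pounds), the standard deviation (10 pounds), and the sample size (100 tires) are all hypothetical values chosen to give a p-value of roughly 0.04. A z-test also assumes the population standard deviation is known; with an estimated standard deviation, a t-test would be the usual choice.

```python
import math
from statistics import NormalDist


def two_sided_p_value(sample_mean, null_mean, std_dev, n):
    """Two-sided p-value for a one-sample z-test of the mean."""
    standard_error = std_dev / math.sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    # Probability of a z-score at least this far from 0 in either direction.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical audit: 100 tires with a sample mean of 202.05 pounds.
p_value = two_sided_p_value(sample_mean=202.05, null_mean=200,
                            std_dev=10, n=100)
print(f"p-value: {p_value:.3f}")
print("Reject the null at 0.05" if p_value < 0.05 else "Fail to reject the null")
```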
Reason 6: To Understand Correlation
Another important concept that you’ll learn about in statistics is correlation, which measures the strength and direction of the linear association between two variables.
The value for a correlation coefficient always ranges between -1 and 1 where:
- -1 indicates a perfectly negative linear correlation between two variables
- 0 indicates no linear correlation between two variables
- 1 indicates a perfectly positive linear correlation between two variables
By understanding these values, you can understand the relationship between variables in the real world.
For example, if the correlation between advertisement spending and revenue is 0.87, then you can understand that there is a strong positive relationship between the two variables. As advertising spending increases, revenue tends to increase in a predictable way.
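For readers who want to see how a correlation coefficient is computed, the sketch below implements the Pearson formula directly. The advertising and revenue figures are made up for illustration and are not meant to reproduce the 0.87 in the example above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical monthly figures, in thousands of dollars.
ad_spend = [10, 20, 30, 40, 50]
revenue = [25, 41, 48, 70, 82]

r = pearson_r(ad_spend, revenue)
print(f"Correlation between ad spend and revenue: {r:.2f}")
```

A value close to 1 means the points lie near an upward-sloping line, and a value close to -1 means they lie near a downward-sloping line.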
Reason 7: To Make Predictions About the Future
Another important reason to learn statistics is to understand basic regression models.
These models allow you to make predictions about the future value of some response variable based on the value of certain predictor variables in the model.
For example, multiple linear regression models are used all the time in the real world by businesses when they use predictor variables such as age, income, ethnicity, etc. to predict how much customers will spend at their stores.
Similarly, logistics companies use predictor variables like total demand, population size, etc. to forecast future sales.
No matter which field you’re employed in, the odds are good that regression models will be used to predict some future phenomenon.
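As a minimal sketch of the idea, the snippet below fits the simplest regression model, a straight line with a single predictor, using the ordinary least-squares formulas and then uses it to make a prediction. The income and spending figures are hypothetical, and real applications typically use several predictors and a statistics library rather than hand-written formulas.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical customers: annual income (thousands of dollars) and
# annual spending at a store (dollars).
income = [30, 45, 60, 75, 90]
spending = [310, 420, 580, 690, 800]

slope, intercept = fit_line(income, spending)

# Predicted spending for a new customer with an income of 50 thousand.
predicted = intercept + slope * 50
print(f"Predicted spending: {predicted:.0f} dollars")
```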
Reason 8: To Understand Potential Bias in Studies
Another reason to study statistics is to be aware of all the different types of bias that can occur in real-world studies.
Some examples include:
- Observer Bias
- Self-Selection Bias
- Referral Bias
- Omitted Variable Bias
- Undercoverage Bias
- Nonresponse Bias
By having a basic understanding of these types of biases, you can avoid committing them when performing research or be aware of them when reading through other research papers or studies.
Reason 9: To Understand the Assumptions Made by Statistical Tests
Many statistical tests make assumptions about the underlying data under study.
When reading the results of a study or even performing your own study, it’s important to understand what assumptions need to be made in order for the results to be reliable.
The following articles share the assumptions made in many commonly used statistical tests and procedures:
- What is the Assumption of Equal Variance in Statistics?
- What is the Assumption of Normality in Statistics?
- What is the Assumption of Independence in Statistics?
Reason 10: To Avoid Overgeneralization
Another reason to study statistics is to understand the concept of overgeneralization.
This occurs when the individuals in a study are not representative of the individuals in the overall population and therefore it’s inappropriate to generalize the conclusions from a study to the larger population.
For example, suppose we want to know what percentage of students at a certain school prefer “drama” as their favorite movie genre. If the total student population is a mix of 50% boys and 50% girls, then a sample with a mix of 90% boys and 10% girls might lead to biased results if far fewer boys prefer drama as their favorite genre.
Ideally, we want our sample to be like a “mini version” of our population, with boys and girls represented in roughly the same 50/50 proportion as in the overall student body.
Thus, whether you’re conducting your own survey or you’re reading about the results of a survey, it’s important to understand whether the sample data is representative of the total population and whether the findings of the survey can be generalized to the population with confidence.
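A little arithmetic shows how large the distortion can be in the drama example. The preference rates below (20% of boys and 60% of girls) are hypothetical, chosen only to illustrate the effect; the 50/50 and 90/10 mixes come from the example above.

```python
def overall_share(share_boys, pref_boys, pref_girls):
    """Overall share who prefer drama, given the mix of boys and girls."""
    return share_boys * pref_boys + (1 - share_boys) * pref_girls


# Hypothetical preference rates: 20% of boys and 60% of girls prefer drama.
PREF_BOYS, PREF_GIRLS = 0.20, 0.60

population_value = overall_share(0.5, PREF_BOYS, PREF_GIRLS)  # 50% boys
sample_estimate = overall_share(0.9, PREF_BOYS, PREF_GIRLS)   # 90% boys

print(f"True share in the population: {population_value:.0%}")
print(f"Share estimated from the unrepresentative sample: {sample_estimate:.0%}")
```

Under these assumed rates, the true share is 40% but the unrepresentative sample suggests only 24%, purely because boys are overrepresented.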
Additional Resources
Check out the following articles to gain a basic understanding of the most important concepts in introductory statistics:
- Descriptive vs. Inferential Statistics
- Population vs. Sample
- Statistic vs. Parameter
- Qualitative vs. Quantitative Variables
- Levels of Measurement: Nominal, Ordinal, Interval and Ratio