You can use the following basic syntax to randomly sample rows from a pandas DataFrame:
#randomly select one row
df.sample()

#randomly select n rows
df.sample(n=5)

#randomly select n rows with repeats allowed
df.sample(n=5, replace=True)

#randomly select a fraction of the total rows
df.sample(frac=0.3)

#randomly select n rows by group
df.groupby('team', group_keys=False).apply(lambda x: x.sample(2))
The following examples show how to use this syntax in practice with the following pandas DataFrame:
import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

#view DataFrame
df

  team  points  assists  rebounds
0    A      25        5        11
1    A      12        7         8
2    A      15        7        10
3    A      14        9         6
4    B      19       12         6
5    B      23        9         5
6    B      25        9         9
7    B      29        4        12
Example 1: Randomly Select One Row
The following code shows how to randomly select one row from the DataFrame:
#randomly select one row
df.sample()

  team  points  assists  rebounds
5    B      23        9         5
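Because the selection is random, running the code above will usually return a different row each time. If you need a repeatable result, one option is the random_state argument of sample(). The sketch below rebuilds the same DataFrame and assumes an arbitrary seed of 0; the variable names first and second are only for illustration.

```python
import pandas as pd

# same DataFrame as in the tutorial
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

# the seed value 0 is arbitrary; any fixed integer makes the draw repeatable
first = df.sample(random_state=0)
second = df.sample(random_state=0)

# both calls return the same single row
print(first.equals(second))
```

Using the same seed in every call should print True, since both samples contain the same row.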
Example 2: Randomly Select n Rows
The following code shows how to randomly select n rows from the DataFrame:
#randomly select n rows
df.sample(n=5)

  team  points  assists  rebounds
5    B      23        9         5
2    A      15        7        10
4    B      19       12         6
6    B      25        9         9
1    A      12        7         8
Example 3: Randomly Select n Rows with Repeats Allowed
The following code shows how to randomly select n rows from the DataFrame, with repeat rows allowed:
#randomly select 5 rows with repeats allowed
df.sample(n=5, replace=True)

  team  points  assists  rebounds
6    B      25        9         9
7    B      29        4        12
5    B      23        9         5
1    A      12        7         8
5    B      23        9         5
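Sampling with repeats also makes it possible to draw more rows than the DataFrame contains. The following sketch illustrates this with n=12 on the 8-row DataFrame; the seed and the variable names oversample and raised are assumptions added for the example.

```python
import pandas as pd

# same DataFrame as in the tutorial
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

# with replace=True, n may exceed the number of rows (8 here)
oversample = df.sample(n=12, replace=True, random_state=1)
print(len(oversample))

# without replacement, asking for more rows than exist raises a ValueError
raised = False
try:
    df.sample(n=12)
except ValueError as err:
    raised = True
    print(err)
```

Since 12 rows are drawn from only 8, at least one index label must appear more than once in oversample.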
Example 4: Randomly Select a Fraction of the Total Rows
The following code shows how to randomly select a fraction of the total rows from the DataFrame:
#randomly select 25% of rows
df.sample(frac=0.25)

  team  points  assists  rebounds
2    A      15        7        10
1    A      12        7         8
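The frac argument is multiplied by the total number of rows and rounded to a whole number, so frac=0.25 on 8 rows returns 2 rows. A common use of the same argument is frac=1, which returns every row in random order. The sketch below shows both; the seed, the reset_index() step, and the variable names quarter and shuffled are assumptions added for illustration.

```python
import pandas as pd

# same DataFrame as in the tutorial
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

# 25% of 8 rows is 2 rows
quarter = df.sample(frac=0.25, random_state=42)
print(len(quarter))

# frac=1 samples all rows without replacement, i.e. a shuffle;
# reset_index(drop=True) replaces the shuffled labels with 0..7
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)
print(shuffled)
```

The shuffled DataFrame holds the same 8 rows as the original, only in a different order.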
Example 5: Randomly Select n Rows by Group
The following code shows how to randomly select n rows by group from the DataFrame:
#randomly select 2 rows from each team
df.groupby('team', group_keys=False).apply(lambda x: x.sample(2))

  team  points  assists  rebounds
0    A      25        5        11
2    A      15        7        10
7    B      29        4        12
4    B      19       12         6
Notice that 2 rows from team ‘A’ and 2 rows from team ‘B’ were randomly sampled.
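As an alternative to apply() with a lambda, grouped objects have their own sample() method in pandas 1.1 and later. The sketch below assumes such a version is installed; the seed and the variable name sampled are only for illustration.

```python
import pandas as pd

# same DataFrame as in the tutorial
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

# randomly select 2 rows from each team (assumes pandas >= 1.1)
sampled = df.groupby('team').sample(n=2, random_state=0)
print(sampled)
```

The result should again contain 2 rows for each team, with the original index labels preserved.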
Note: You can find the complete documentation for the pandas sample() function in the official pandas documentation for DataFrame.sample.
Additional Resources
The following tutorials explain how to perform other common sampling methods in pandas:
How to Perform Stratified Sampling in Pandas
How to Perform Cluster Sampling in Pandas