Sub-setting of Data Set
In the previous topic, we learned to concatenate multiple data sets into a single data set. In this topic, we are going to learn how to sub-set a data set.
In the SAS programming language, sub-setting means extracting and holding only selected variables, selected observations, or both from a data set. SAS provides three statements for sub-setting:
- KEEP Statement
- DROP Statement
- DELETE Statement
Variables are sub-set with the KEEP and DROP statements, while observations are sub-set with the DELETE statement.
The result of a sub-setting operation is stored in a new data set, which can be used for further analysis. Sub-setting is primarily used to analyze a portion of a data set without the variables or observations that are not relevant to the analysis.
Sub-setting of Variables
Sub-setting of variables means extracting only selected variables from the entire data set and writing them to a new data set.
Syntax:
Where,
Var1 and var2: These are the variables from the data set that need to be kept or dropped.
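The general form of both statements can be sketched as follows. This is a minimal template rather than a complete program: new_data_set, old_data_set, var1 and var2 are placeholder names, and a single DATA step would normally use either KEEP or DROP, not both.

```sas
/* Template only: every name below is a placeholder. */
DATA new_data_set;
    SET old_data_set;
    KEEP var1 var2;      /* write only var1 and var2 to new_data_set */
    /* DROP var1 var2;      alternative: write everything except var1 and var2 */
RUN;
```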
KEEP Statement
We can use the KEEP statement to retain only the required variables in the output data set.
Example:
Consider the data set below, which contains details of the students of an institute. If we want to use only the values of studyid and age from the data set student, we can use the KEEP statement.
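A minimal sketch of such a program is shown here. Only the data set name student and the variables studyid and age come from the text; the variables name and gender, all data values, and the output data set name student_keep are assumptions made for illustration.

```sas
/* Hypothetical sample data: name, gender and all values are assumed. */
DATA student;
    INPUT studyid name $ age gender $;
    DATALINES;
1 Asha 21 F
2 Ravi 23 M
3 Meena 22 F
4 John 24 M
5 Sara 20 F
6 Amit 25 M
;
RUN;

/* Create a new data set that holds only studyid and age. */
DATA student_keep;
    SET student;
    KEEP studyid age;
RUN;

/* Print the sub-set to check which variables were kept. */
PROC PRINT DATA=student_keep;
RUN;
```

The original data set student is left unchanged; only student_keep is restricted to the two listed variables.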
Execute the above code in SAS Studio:
Output:
As you can see in the output, SAS holds only the variables that are listed in the KEEP statement.
DROP Statement
We can use the DROP statement to exclude variables that are not needed from the output data set.
Example:
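The following sketch drops studyid and age from the same kind of student data set. As before, the variables name and gender, the data values, and the output data set name student_drop are assumptions made for illustration.

```sas
/* Hypothetical sample data: name, gender and all values are assumed. */
DATA student;
    INPUT studyid name $ age gender $;
    DATALINES;
1 Asha 21 F
2 Ravi 23 M
3 Meena 22 F
4 John 24 M
5 Sara 20 F
6 Amit 25 M
;
RUN;

/* Create a new data set that contains every variable except studyid and age. */
DATA student_drop;
    SET student;
    DROP studyid age;
RUN;

/* Print the sub-set to check which variables remain. */
PROC PRINT DATA=student_drop;
RUN;
```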
Execute the above code in SAS Studio:
Output:
As you can see in the output, SAS has removed both variables (studyid and age) that are listed in the DROP statement.
DELETE Statement (Sub-setting of Observations)
When sub-setting observations, we filter the data set on a condition, usually one that tests the value of a variable. The DELETE statement removes the observations that satisfy the given condition, and the remaining observations are written to the output data set.
Syntax:
Where,
Var: This is the name of the variable on the basis of which observations will be deleted.
Condition: This is the Boolean condition, which evaluates to either true or false. Observations for which the condition is true are deleted, and observations for which it is false are kept.
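The usual way to apply DELETE conditionally is inside an IF-THEN statement. The template below is a sketch of that form: new_data_set, old_data_set and var are placeholder names, and the comparison against 5 is only an example of a condition.

```sas
/* Template only: every name and the comparison value are placeholders. */
DATA new_data_set;
    SET old_data_set;
    IF var < 5 THEN DELETE;   /* observations where the condition is true are removed */
RUN;
```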
Example:
Consider the data set below, which contains details of the students of an institute. If we want to exclude the students whose id value is less than 5, we can use the following code:
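A minimal sketch of such a program follows. It assumes the id variable is called studyid, as in the earlier examples; the other variables, the data values, and the output data set name student_subset are illustrative assumptions.

```sas
/* Hypothetical sample data: variable names other than studyid and all values are assumed. */
DATA student;
    INPUT studyid name $ age gender $;
    DATALINES;
1 Asha 21 F
2 Ravi 23 M
3 Meena 22 F
4 John 24 M
5 Sara 20 F
6 Amit 25 M
7 Lina 22 F
8 Omar 23 M
;
RUN;

/* Remove every observation whose studyid is less than 5. */
DATA student_subset;
    SET student;
    IF studyid < 5 THEN DELETE;
RUN;

/* Print the sub-set; only observations with studyid of 5 or more remain. */
PROC PRINT DATA=student_subset;
RUN;
```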
Execute the above code in SAS Studio:
Output:
As you can see in the output, SAS has removed all the observations whose id value is less than 5.