Suppose, we are provided a data set from a hospital wherein they are comparing two medicines for same disease. They have measured patients response on a scale of 0-2 as:
0= no effect, 1=good effect, 2 = excellent.
DATA patients; INPUT treatment impact @@; CARDS; 1 1 1 . 1 0 2 1 2 2 2 0 2 1 1 2 2 2 2 . 1 1 ;RUN;
The above is a small data set and we can manually see how many times what impact each treatment has. What if data is huge?
PROC FREQ DATA=patients; TABLES impact; RUN;
Here tables means that for which all variables we want to see tabular freq dist.
It shows how many times each value appears, and at bottom shows no. of missing values.
By default, PROC FREQ creates a table of frequencies and percentages for both character and numeric variables
Frequency distributions work best with variables that contain Categorical Values
Now, suppose we also want to see what is the Frequency Distribution for treatment variable also, then we can include it as well with TABLES:
PROC FREQ DATA=patients; TABLES impact treatment; RUN;
It shows freq dist for each variable entered against Tables.
We can repeat the TABLES keyword, in case we need multliple tables.
Example, below gives exactly the same output as last one above:
PROC FREQ DATA=patients; TABLES treatment; TABLES impact; RUN;
If we are not interested in Cumulative Freq and Cumulative Percent we use “/ NOCUM“:
PROC FREQ DATA=patients; LABEL treatment = 'Kind of Treatment provided' impact = 'Patient Feedback for Treatment'; TABLES impact treatment / NOCUM; RUN;
NOCUM = No Cumulative Frequency or Percentage This is ONLY Applicable for 1-way tables
Using WEIGHT in PROC FREQ
In the previous case we were given each outcome once. That is response of each patient once.
What if we are given total count?
If we ask 100 people from 3 companies if they like their boss (0) or dislike(1)
And summarize their responses below:
Company Response Count 1 0 35 1 1 65 2 0 40 2 1 60 3 0 20 3 1 80
Here we don’t have individual responses but a summary of responses, we call the counts here as WEIGHT
PROC FORMAT; VALUE resp_code 0='like my boss' 1='Hate the boss'; RUN;
PROC FREQ DATA=empresp; TITLE1 'DO EMPLOYEES HATE THEIR BOSSES'; WEIGHT counts; TABLES response; FORMAT response resp_code.; RUN;
Cross-Tabulate using PROC FREQ
This gets easier to understand by example, lets take a 2-way table:
PROC FREQ DATA=patients; TABLE treatment*impact; RUN;
As you see, we get a table of Treatment vs Impact.
Please look at the output again below:
In the above 2-way table (called cross tabulation), for each value of treatment variable, on vertical bar, we have different values of impact on horizontal bar.
Then for each treatment, we have 4 types of data for each value of impact, FREQUENCY, PERCENT, ROW PERCENT and COLUMN PERCENT.
In cross-tabulation we can easily suppress any of these bye:
NOFREQ = Suppresses the display of cell frequencies in crosstabulation tables
NOPERCENT suppresses cell percentages
NOROW suppresses row percentages
NOCOL suppresses column percentages.
Please remember, in cross tabulation tables, we get:
FREQUENCY, PERCENT, ROW Pct and Col Pct
These are shown Vertically
Creating N-Way Tables
In the same lines of Cross-Tabulation, we have N-way tables, Lets understand by example:
Suppose we wanna compare Treatment vs “Impact – Sex of patients”.
Our data is :
PROC FREQ DATA=patients; TABLE treatment*impact*sex; RUN;
Here its simple, mutiple tables are created for each treatment value. Inside each table,
we have impact on left vertical and sex on right horizontal.
To summarize above display, its like:
levels ↓ tables treatment * impact * sex; ↑ ↑ rows + columns = two-way tables
Creating LIST Type Results
In case we want to see the Cross-Tabulation in the form of a list, we can use “/LIST“. This will remove ROW Pct and Col Pct, as List wise these have no meaning.
It brings back Cumulative Freq and Cum. Pct:
PROC FREQ DATA=patients; TABLE treatment*impact*sex/LIST; RUN;
“/LIST” lists each variable in the TABLE statement one after the other on Left side vertically.
MIXING LIST and CROSS TABULATION RESULTS
In case we want to have a summary after every group of outer most variable value, and we want to see it as a list, we can use “/CROSSLIST“.
This brings back the cross tabulation results for Row Pct and Col Pct. Cum Freq and Cum Pct again go off, but we get totals after every group:
PROC FREQ DATA=patients; TABLE treatment*impact/CROSSLIST; RUN;
As you see an extra Total detail appear at end of every group and a Total value also appears at end.