Proc print is one of the most widely used Procedures in SAS and reasons we shall soon find out.
In a simple form proc print allows to print all obs in a dataset:
PROC PRINT DATA=emp; RUN;
Also we have seen that we can optionally select only certain obs to be printed by:
OPTIONS FIRSTOBS = 3 OBS = MAX; ==> From 3rd to last obs or PROC PRINT DATA=emp (FIRSTOBS = 3 OBS = 6);
By default the PROC PRINT shows the first variable as “OBS” for observation number. Suppose we print from 3rd to last obs:
PROC PRINT DATA=emp (FIRSTOBS = 3); RUN;
It shows obs from 3-4.
In case we don’t want to show the obs variable, we use NOOBS
PROC PRINT DATA=emp (FIRSTOBS = 3) NOOBS; RUN;
The first OBS variable is now gone….
SELECT VARIABLES TO PRINT
It is possible to not show all variables in output using “VAR” statement:
PROC PRINT DATA= emp NOOBS; VAR name salary; RUN;
INDEX OUTPUT WITH PROC PRINT
Though not what its called officially, but I call it this way. If we don’t want OBS variable to be listed, then also we might like some way to be able to refer individual obs in the report generated by PROC PRINT.
We use “ID” statement.
Suppose we want the output to be indexed by employee_id variable:
PROC PRINT DATA=emp; ID emp_id; RUN;
The emp_id variable now appears as the index, but the WORST PART, THE VALUES APPEAR AS IT IS. They are NOT SORTED BY THE ID VARIABLE.
We can have multiple variables as By Variables and they would all appear in GREY COLOUR.
CAUTION: if the index variables are also selected with VAR statement, they appear twice
Example: Below we have id variables as emp_id and name, later using Var statement we have selected emp_id and name again, they would appear twice.
PROC PRINT DATA=emp; ID emp_id name; VAR salary age name emp_id; RUN;
Therefore, unless we really want to see these variables again, we should not select them in VAR statement.
The major advantage of ID statement is that we don’t need to use NOOBS to avoid obs number to be printed.
Moreover, if we don’t use VAR statement with ID, then the ID variables are not duplicated.
DRK – PROC PRINT
As mentioned before, Drop Rename Keep, just hates all the PROC steps. We can’t use DRK in PROC, including PROC PRINT.
DRK statements can only we used for Data Set creation and manipulations.
In case we become shameless and try below code:
PROC PRINT DATA=emp; DROP name; RUN;
We get a warning:
The DROP and KEEP statements are not supported in procedure steps. Therefore, these statements are ignored.
The step doesn’t error, the data is printed but the DRK are ignored.
In case we use RENAME, then we do get error.
WORKAROUND FOR DRK IN PROC
Ok, so PROC step hates DRK, but we have a way around:
PROC PRINT DATA=emp (DROP = name); RUN;
Let us run this:
So not really that PROC hate DROP, they just allow them hidden in curly brackets at the top 😉
Similarly, we can use rename and keep:
PROC PRINT DATA=emp (rename = (name = jack))
RENAMING VARIABLES IN PROC PRINT
The only purpose of renaming variables in proc print would be to show more meaningful names in output. Therefore, we use LABELS:
PROC PRINT DATA=emp; LABEL name = 'EMPLOYEE NAME'; LABEL ID = 'EMPLOYEE IDENTITY' AGE = 'AGE IN YEARS' SALARY = 'MONTHLY TAKEHOME'; RUN;
Here we have defined two label statements, just to show that any number of label statements are fine.
In case of repeating labels for same variable, the last one stands.
Also, if dataset itself has any labels for variables, those would show unless those are overridden in proc print
Now lets check the output:
WTF! Nothing happened, and that is because first we need to SWITCH ON LABELS
PROC PRINT DATA=emp LABEL; LABEL name = 'EMPLOYEE NAME'; LABEL emp_id = 'EMPLOYEE IDENTITY' age = 'AGE IN YEARS' salary = 'MONTHLY TAKEHOME'; RUN;
The labels now appear, please note that we can’t label “OBS” using label statement.
Labels can be up to 256 characters long
PROC PRINT and WHERE clause
We can use WHERE statement in PROC PRINT to optionally select certain obs.
PROC PRINT DATA= emp NOOBS; WHERE salary > 1000; RUN;
Similarly other clauses can be:
WHERE name CONTAINS 'IT'; or WHERE name ? 'IT'; also, we can use "LIKE" : WHERE name LIKE '%it%'; That is we use % sign with Like operator
The “contains” keyword has a short name as “?” and it works the same way.
For char variables, we can check “contains” using single quotes or double quotes, both are fine.
Other where options:
AND & OR | NOT ^ WHERE name IN (‘SUMIT’,’JACK’); Complex ones like: WHERE (age<=55 and salary > 75000) or name=’SUMIT’;
Match a character at fixed position
We can optionally use LIKE condition to match a specific character at specific position. For this we use “_” that is underscore. example
NAME sumit tim nick
From above we want all name which have “i” as second character.
WHERE name LIKE ‘_i%‘
The underscore skips over one character. To skip over 2, use two underscores and so on. The above like ‘_i%’ gives tim and nick.
FORMATS and PROC PRINT
SAS Formats provide a nice way to view certain variables which otherwise might not make much sense.
PROC PRINT allows using the formats, which can be inbuilt ones or user defined by PROC FORMAT.
PROC PRINT DATA= emp NOOBS; FORMAT salary COMMA.; RUN;
Likewise we can even use our custom made formats
TITLES and FOOTNOTES
Titles and Footnotes can add a lot of meaningful insight into the output to an outside person.
We can have from 1-10 Titles in a report and 1-10 Footnotes as well.
TITLE 'MY EMPLOYEE REPORT'; TITLE2 'ALL EMPLOYEE'; PROC PRINT DATA=emp; RUN;
The first and second title appear correctly.
By the way Title is same as Title1
We could also modify the above and include the above inside proc print:
TITLE 'MY EMPLOYEE REPORT'; TITLE2 'ALL EMPLOYEE'; PROC PRINT DATA=emp; TITLE 'NEW TITLE1'; TITLE2 'NEW TITLE2'; RUN;
Here we have re-used Title and Title2 inside proc print, lets see what gets printed:
The titles inside the proc print get printed.
- Lower Title Cuts the Higher Title if it comes later
Let us validate this:
TITLE 'MY EMPLOYEE REPORT'; TITLE8 'ALL EMPLOYEE'; PROC PRINT DATA=emp FIRSTOBS=3; TITLE2 'NEW TITLE2'; TITLE6 'NEW TITLE6'; RUN; PROC PRINT DATA=MYLIB.finance; TITLE4 'TITLE4 IS HERE'; RUN;
Here first we have Title (or Title1), then Title8.
Inside First Proc Print we have used Title2 and 6. Inside 2nd proc print its Title4.
As at the top we have Title8, it was removed because we are referring to a lower title in both the reports.
Also, even though we had TITLE2 inside the first procedure, it still gets listed in 2nd procedure. But Title6 of first procedure doesn’t come over 2nd procedure, because Title4 being lower cuts it off.
Moreover, if we have 2 titles, say title and title10, then there will be a gap of 8 titles between them.
Similarly we have FOOTNOTES from 1-10, and it behaves in same way.
TITLE IS SAME AS TITLE1
FOOTNOTE IS SAME AS FOOTNOTE1
Redefining a title or footnote line cancels any higher-numbered title or footnote.
To cancel all footnotes or Titles
Just use keyword Title or Footnote without anything, all titles and footnote go away.
TITLE 'MY EMPLOYEE REPORT'; TITLE8 'ALL EMPLOYEE'; PROC PRINT DATA=emp; TITLE; RUN;
In the above case no title appears and Title1 being lowest and specified as empty shows nothing and kills all remaining titles.
Q) What if in above we wrote an empty Title6 in place of just an empty title?
Ans: Title6 being lower than title8, kills it, but is unable to kill Title1, so Title1 still appears
Please don’t use = sign with Titles and Footnotes, like: TITLE1 = “HELLO”…. This results in ERROR, not output will be produced, we can’t use = sign.
Remember with Labels we use “=” signs
Using ODS we can generate different types of outputs from proc print.
ods html file='c:/data/emp.html'; ods pdf file='c:/data/emp.pdf'; ods listing file='c:/data/emp.txt'; PROC PRINT DATA=emp; RUN; ODS _ALL_ CLOSE;
Now lets check all the 3 outputs:
We have only shown the Listing and PDF output as HTML is also very similar.
There is an option of specifying double line spacing, which leave ONE SINGLE line empty after each obs.
The “DOUBLE” spacing option only applies to LISTING output
ods listing file='c:/data/emp.txt'; PROC PRINT DATA=emp DOUBLE; RUN; ODS _ALL_ CLOSE;
SUBTOTALS IN PROC PRINT
What is the purpose of a report if we can’t do subtotals, and Proc Print allows this in an easy way.
Consider below dataset:
data test; input name $ age salary department $; datalines; sumit 35 1000 IT jack 44 2500 HR tim 22 5500 IT eric 34 3500 HR hulk 27 2000 IT mak 48 4000 IT ;run;
Now suppose we want to segregate the data into groups based on Department, we use BY variable statement.
For using BY statement, the dataset must be sorted in ascending order by the variables used in the by statement.
PROC SORT DATA=test; BY department; RUN; PROC PRINT DATA=test; BY department; RUN;
By statement (if sorted by the By statement variables), works perfectly.
We can create further sub-groups:
PROC SORT DATA=test; BY department age; RUN; PROC PRINT DATA=test; BY department age; RUN;
Suppose we want each sub-group on different page?
We use PAGEBY clause
PROC PRINT DATA=test; BY department; PAGEBY department; RUN;
As we see it draws a line after each department in HTML output, whereas in PDF and Listing, it will be on separate page.
Please note PAGEBY works only with BY statement. Else we get error:
ERROR: PAGEBY or SUMBY statement used but no BY statement present
If the By Variable is not sorted in Ascending order, we can use “NOTSORTED” keyword:
PROC PRINT DATA=test; BY NOTSORTED department; RUN;
Here data is NOTSORTED and we use NOTSORTED keyword, it generated groups for each department and we see duplicate departments.
Also, if we have data sorted by By variables in DESCENDING, we can use DESCENDING keyword.
“SUM” statement with PROC PRINT “BY”
If we segregate the data in PROC PRINT, based on some BY variables, then we can optionally generate SUM on numeric variables.
Assuming Data is sorted by the BY variable:
PROC PRINT DATA=test; SUM salary; BY department; RUN;
PROC PRINT DATA =emp; SUM SALARY; RUN;