Learn SAS for FREE Day12

PROC PRINT

Proc print is one of the most widely used Procedures in SAS and reasons we shall soon find out.

In a simple form proc print allows to print all obs in a dataset:

PROC PRINT DATA=emp;   RUN;

Also we have seen that we can optionally select only certain obs to be printed by:

OPTIONS  FIRSTOBS = 3     OBS = MAX;     ==> From 3rd to last obs

    or

PROC PRINT DATA=emp (FIRSTOBS = 3 OBS = 6); 

 

OBS VARIABLE

By default the PROC PRINT shows the first variable as “OBS” for observation number. Suppose we print from 3rd to last obs:

PROC PRINT DATA=emp (FIRSTOBS = 3); RUN;

It shows obs from 3-4.

In case we don’t want to show the obs variable, we use NOOBS

PROC PRINT DATA=emp (FIRSTOBS = 3) NOOBS; 
RUN;

The first OBS variable is now gone….



SELECT VARIABLES TO PRINT

It is possible to not show all variables in output using “VAR” statement:

PROC PRINT DATA= emp NOOBS; 
VAR name salary;
RUN;

 




INDEX OUTPUT WITH PROC PRINT

Though not what its called officially, but I call it this way. If we don’t want OBS variable to be listed, then also we might like some way to be able to refer individual obs in the report generated by PROC PRINT.

We use “ID” statement.

Suppose we want the output to be indexed by employee_id variable:

PROC PRINT DATA=emp;
ID emp_id; 
RUN;

The emp_id variable now appears as the index, but the WORST PART, THE VALUES APPEAR AS IT IS. They are NOT SORTED BY THE ID VARIABLE.

We can have multiple variables as By Variables and they would all appear in GREY COLOUR.

CAUTION:   if the index variables are also selected with VAR statement, they appear twice

 

Example: Below we have id variables as  emp_id and name, later using Var statement we have selected emp_id and name again, they would appear twice.

PROC PRINT DATA=emp; 
ID  emp_id  name;
VAR  salary  age  name  emp_id;
RUN;

Therefore, unless we really want to see these variables again, we should not select them in VAR statement.

The major advantage of ID statement is that we don’t need to use NOOBS to avoid obs number to be printed.

Moreover, if we don’t use VAR statement with ID, then the ID variables are not duplicated.



DRK – PROC PRINT

As mentioned before, Drop Rename Keep, just hates all the PROC steps. We can’t use DRK in PROC, including PROC PRINT.

DRK statements can only we used for Data Set creation and manipulations.

In case we become shameless and try below code:

PROC PRINT DATA=emp;
DROP name;
RUN;

We get a warning:

The DROP and KEEP statements are not supported in procedure steps. Therefore, these statements are ignored.

The step doesn’t error, the data is printed but the DRK are ignored.

In case we use RENAME, then we do get error.

 

WORKAROUND FOR DRK IN PROC

Ok, so PROC step hates DRK, but we have a way around:

PROC PRINT DATA=emp (DROP = name);
RUN;

Let us run this:

So not really that PROC hate DROP, they just allow them hidden in curly brackets at the top 😉

Similarly, we can use rename and keep:

PROC PRINT DATA=emp  (rename = (name = jack))




RENAMING VARIABLES IN PROC PRINT

The only purpose of renaming variables in proc print would be to show more meaningful names in output. Therefore, we use LABELS:

PROC PRINT DATA=emp; 
LABEL   name = 'EMPLOYEE NAME'; 
LABEL   ID = 'EMPLOYEE IDENTITY'    
        AGE = 'AGE IN YEARS'  
        SALARY = 'MONTHLY TAKEHOME';
RUN;

Here we have defined two label statements, just to show that any number of label statements are fine.

In case of repeating labels for same variable, the last one stands.

Also, if dataset itself has any labels for variables, those would show unless those are overridden in proc print

Now lets check the output:

WTF! Nothing happened, and that is because first we need to SWITCH ON LABELS

PROC PRINT DATA=emp  LABEL; 
LABEL   name   = 'EMPLOYEE NAME'; 
LABEL   emp_id = 'EMPLOYEE IDENTITY' 
        age    = 'AGE IN YEARS' 
        salary = 'MONTHLY TAKEHOME';
RUN;

The labels now appear, please note that we can’t label “OBS” using label statement.

Please note:

Labels can be up to 256 characters long

 



PROC PRINT and WHERE clause

We can use WHERE statement in PROC PRINT to optionally select certain obs.

PROC PRINT DATA= emp NOOBS; 
WHERE salary > 1000;
RUN;

Similarly other clauses can be:

WHERE  name CONTAINS  'IT';
 or
WHERE name ?  'IT';

also, we can use "LIKE" :   WHERE name LIKE '%it%';
That is we use % sign with Like operator

The “contains” keyword has a short name as “?” and it works the same way.

For char variables, we can check “contains” using single quotes or double quotes, both are fine.

Other where options:

AND    &

OR     |

NOT    ^

WHERE name   IN   (‘SUMIT’,’JACK’);

Complex ones like:
  WHERE (age<=55 and salary > 75000) or name=’SUMIT’;

 

Match a character at fixed position

We can optionally use LIKE condition to match a specific character at specific position. For this we use “_” that is underscore. example

NAME
sumit
tim
nick

From above we want all name which have “i” as second character.

WHERE name   LIKE  ‘_i%

The underscore skips over one character. To skip over 2, use two underscores and so on. The above like ‘_i%’ gives tim and nick.




FORMATS and PROC PRINT

SAS Formats provide a nice way to view certain variables which otherwise might not make much sense.

PROC PRINT allows using the formats, which can be inbuilt ones or user defined by PROC FORMAT.

PROC PRINT DATA= emp NOOBS; 
FORMAT salary  COMMA.;
RUN;

Likewise we can even use our custom made formats



TITLES and FOOTNOTES

Titles and Footnotes can add a lot of meaningful insight into the output to an outside person.

We can have from 1-10 Titles in a report and 1-10 Footnotes as well.

TITLE 'MY EMPLOYEE REPORT';
TITLE2 'ALL EMPLOYEE';
PROC PRINT DATA=emp; RUN;

The first and second title appear correctly.

By the way Title is same as Title1

We could also modify the above and include the above inside proc print:

TITLE 'MY EMPLOYEE REPORT';
TITLE2 'ALL EMPLOYEE';

PROC PRINT DATA=emp; 
  TITLE 'NEW TITLE1'; 
  TITLE2 'NEW TITLE2'; 
RUN;

Here we have re-used Title and Title2 inside proc print, lets see what gets printed:

The titles inside the proc print get printed.

 

  • Lower Title Cuts the Higher Title if it comes later 

Let us validate this:

TITLE 'MY EMPLOYEE REPORT';
TITLE8 'ALL EMPLOYEE';

PROC PRINT DATA=emp FIRSTOBS=3; 
 TITLE2 'NEW TITLE2'; 
 TITLE6 'NEW TITLE6'; 
RUN;

PROC PRINT DATA=MYLIB.finance;
TITLE4 'TITLE4 IS HERE';
RUN;

Here first we have Title (or Title1), then Title8.

Inside First Proc Print we have used Title2 and 6. Inside 2nd proc print its Title4.

As at the top we have Title8, it was removed because we are referring to a lower title in both the reports.

Also, even though we had TITLE2 inside the first procedure, it still gets listed in 2nd procedure. But Title6 of first procedure doesn’t come over 2nd procedure, because Title4 being lower cuts it off.

Moreover, if we have 2 titles, say title and title10, then there will be a gap of 8 titles between them.

Similarly we have FOOTNOTES from 1-10, and it behaves in same way.

TITLE    IS SAME AS   TITLE1

FOOTNOTE    IS SAME  AS   FOOTNOTE1

Redefining a title or footnote line cancels any higher-numbered title or footnote.

 

To cancel all footnotes or Titles

Just use keyword Title or Footnote without anything, all titles and footnote go away.

TITLE 'MY EMPLOYEE REPORT';
TITLE8 'ALL EMPLOYEE';

PROC PRINT DATA=emp; 
TITLE;
RUN;

In the above case no title appears and Title1 being lowest and specified as empty shows nothing and kills all remaining titles.

Q) What if in above we wrote an empty Title6 in place of just an empty title?

Ans: Title6 being lower than title8, kills it, but is unable to kill Title1, so Title1 still appears

 

Please don’t use = sign with Titles and Footnotes, like:  TITLE1 = “HELLO”…. This results in ERROR, not output will be produced, we can’t use = sign.

Remember with Labels we use “=” signs



GENERATING REPORTS

Using ODS we can generate different types of outputs from proc print.

ods html file='c:/data/emp.html';
ods pdf file='c:/data/emp.pdf';
ods listing file='c:/data/emp.txt';
PROC PRINT DATA=emp; 
RUN;
ODS _ALL_ CLOSE;

Now lets check all the 3 outputs:

We have only shown the Listing and PDF output as HTML is also very similar.

There is an option of specifying double line spacing, which leave ONE SINGLE line empty after each obs.

The “DOUBLE” spacing option only applies to LISTING output

ods listing file='c:/data/emp.txt';
PROC PRINT DATA=emp DOUBLE; 
RUN;
ODS _ALL_ CLOSE;




SUBTOTALS IN PROC PRINT

What is the purpose of a report if we can’t do subtotals, and Proc Print allows this in an easy way.

Consider below dataset:

data test;
input name $ age salary department $;
datalines;
sumit 35 1000 IT
jack 44 2500 HR
tim 22 5500 IT
eric 34 3500 HR
hulk 27 2000 IT
mak 48 4000 IT
;run;

 

Now suppose we want to segregate the data into groups based on Department, we use BY variable statement.

For using BY statement, the dataset must be sorted in ascending order by the variables used in the by statement.

PROC SORT DATA=test;
BY department; 
RUN;

PROC PRINT DATA=test;
BY department;
RUN;

By statement (if sorted by the By statement variables), works perfectly.

 

We can create further sub-groups:

PROC SORT DATA=test; 
BY department   age;  
RUN; 

PROC PRINT DATA=test; 
BY department   age; 
RUN;

 

Suppose we want each sub-group on different page?

We use PAGEBY clause

PROC PRINT DATA=test;
 BY  department;
 PAGEBY  department;
RUN;

As we see it draws a line after each department in HTML output, whereas in PDF and Listing, it will be on separate page.

Please note PAGEBY works only with BY statement. Else we get error:

ERROR: PAGEBY or SUMBY statement used but no BY statement present

If the By Variable is not sorted in Ascending order, we can use “NOTSORTED” keyword:

PROC PRINT DATA=test;
  BY  NOTSORTED  department;
RUN;

Here data is NOTSORTED and we use NOTSORTED keyword, it generated groups for each department and we see duplicate departments.

Also, if we have data sorted by By variables in DESCENDING, we can use DESCENDING keyword.




“SUM” statement with PROC PRINT  “BY”

If we segregate the data in PROC PRINT, based on some BY variables, then we can optionally generate SUM on numeric variables.

Assuming Data is sorted by the BY variable:

PROC PRINT DATA=test;
 SUM salary;
 BY  department;
RUN;
 
It shows sum of salary after each department and then an overall sum at the end.
PLEASE NOTE CAREFULLY:
TO GENERATE AN AGREEGATE TOTAL SUM, WE CAN SUM WITHOUT BY GROUPING:
PROC PRINT DATA =emp;
SUM SALARY;
RUN;
Here sum produced a GRAND TOTAL. and that too WITHOUT A BY group.
Posted in: SAS Filed under:

Leave a Reply

Your email address will not be published. Required fields are marked *