In above we defined a macro variable. The main difference between the macro variable like the one defined above and the variables we defined in previous tutorials, is that these macro variables are available from one data step to another.
There are two types of Macro Variables:
- User Defined: example through %LET
- System Defined: example, _N_ _ERROR_ SYSDATE etc
To find all available macro variables: %PUT _ALL_;
The above is not the complete list, however, we can see the AGE macro variable we defined as been defined as GLOBAL. This we will discuss a bit later.
USER DEFINED MACRO VARIABLES
We can quickly defined as many macro variables as need be using %LET:
%LET name = SUMIT;
If we define the same variable name again with a new value, the new value will replace the existing.
As we see above, the name seems char type, but its been assigned a value without quotes. This is because macro variables have NO TYPE.
Let us also see a quick usage of macro variable. Assume, employee dataset exists as shown below:
EMP Dataset: Name age salary sumit 35 1000 jack 40 2000 tim 50 5000
Now, let us define a macro variable:
%LET column_name = ge;
Now, we can use this macro to query the emp dataset:
DATA test; SET emp; WHERE a&column_name. > 45;
We get test dataset with just one last obs, whose age = 50
As you saw, we can refer to macro variable use “&var_name.“. The & at beginning and a period at the end of name is important.
WHY NOT and WHEN TO USE QUOTES
We can assign char values to a macro variable as:
%LET name = sumit bharti;
If we refer to this macro variable:
As we didn’t use quotes against the name, we still get the name as it is even though it contains spaces.
What if the name was put in quotes:
%LET name = 'sumit bharti'; %LET name1 = "sumit bharti"; %PUT &name.; %PUT &name1.;
The quotes, whether single of double appear in the value itself. Therefore, we should not put the macro variable’s value in any kind of quotes.
What if we tried to put the value using single of double quotes?
%LET name = sumit bharti; %PUT &name.; %PUT '&name.'; %PUT "&name.";
The one in single quotes, just doesn’t work, and returns &name. itself. Whereas the one in double quotes at least converted the value and returned it in double quotes.
Therefore, if you need output in double quotes use either of below:
- %LET &name. = “SUMIT BHARTI”
2. %LET &name = SUMIT BHARTI;
That is either text be in double quotes or you use double quotes with the variable name before ampersand.
WHY USE PERIOD AFTER MACRO NAME WHEN CALLED
This is easier to understand by example.
Suppose we need to configure email addresses as firstname (period) lastname @gmail.com
%LET fname = SUMIT; %LET lname = BHARTI; %PUT &firstname.lastname@example.org;
We get below output:
It printed the full name without a dot. This is because SAS macro already assumes that each macro name would have a period at end. Therefore, in order to get extra period, we need to use two of them together. And in order to completely end the chaos, we deliberately use a period at the end of variable names.
Now we get: SUMIT.BHARTI@gmail.com
Plus an additional reason is to avoid SAS having confusion:
%PUT &fnameis a good boy;
Here we want to print: SUMITis a good boy, what we get it error:
The error is because SAS tries to result macro name: fnameis, which doesn’t exist, in order for SAS to know that we are trying to join macro variable &fname and is, we need to use a period between them:
%PUT fname.is a good boy;
This time it resolves the name correctly. So, we should always use a period after macro variable name in order for sas to be able to resolve it.
MACRO VARIABLE PASSED BETWEEN DATA STEPS
The main usage of a macro variable is to get it some value from a dataset and pass it to another PROC or DATA step. For this we use : “CALL SYMPUT” function. Example:
DATA _NULL_; SET emp; IF _N_ = 1 THEN CALL SYMPUT("m_age", age); RUN;
In above, we assign just one obs value for age into macro variable m_age.
Let us understand above one with a clear example.
Assume, we have a dataset of employees, and we find average salary for all employee. Then our requirement is to create another dataset, consisting of all employees whose salary is above mean.
DATA employee; INPUT name $ age salary; DATALINES; sumit 35 100 jack 40 200 tina 31 250 mac 51 400 rahul 22 50 ;RUN
In above we first create the data set for employee records.
Next, we create a new dataset, which only has mean salary:
PROC MEANS DATA=employee MEAN; VAR salary; OUTPUT OUT = avgsalemp mean = sal_mean; RUN;
Next, we create a Null dataset to transport the mean salary into a macro variable:
DATA _NULL_; SET avgsalemp; CALL SYMPUT("m_avgsal", sal_mean); RUN;
Let us check what is available in the macro variable:
Next we can create a new dataset for subset of employee whose salary is above mean:
DATA emp_high_sal; SET employee; WHERE salary > &m_avgsal.; RUN; PROC PRINT DATA= emp_high_sal; RUN;
This works out perfectly. Macros make it so easy to do this kind of task, creating more datasets from some summary information in one dataset, that is passing information around
So far we have written short code to understand and use macros in SAS. Sometimes, we get repetitive tasks, like we have 2 datasets, and we need to join them to create another dataset. Macro programs can help us in writing one macro code and reusing it every time for similar task.
%MACRO concatdata(A,B,C); DATA &C.; SET &A. &B.; RUN; %MEND concatdata;
Now, let us test this macro. We create two simple datasets and then call this macro to append those:
DATA test1; a=1; b=2; DATA test2; a=3; b=4; %concatdata(test1, test2, test3);
This will create test3, which should be a concatenation of test1 and test2:
As you see, we create a macro function using “%MACRO” keyword. and we end the same using %MEND keyword.
In order to call such a macro function, we can directly by using %macro_name
DO LOOPS IN MACRO PROGRAMS
The syntax changes slightly for do loops when we use them inside macro programs:
%MACRO macro_name; %DO i = 1 %TO 10; :::: :::: %END; %MEND;
Note that loop variable “i” doesn’t have % sign.
Let us take an example. Suppose we are merging 3 companies, and we need to group all IT department employees into one dataset. Rather than repeating this step for each department, we create a macro code and reuse the same.
DATA company1; DATA company2; DATA company3; INPUT name $ id salary dept$; INPUT name $ id salary dept$; INPUT name $ id salary dept $ DATALINES; DATALINES; DATALINES; A 1 100 IT M 1 300 IT U 10 100 IT B 2 200 IT N 2 400 RND V 12 200 IT C 3 300 HR O 3 1000 HR W 17 300 IT D 4 500 VP P 4 500 IT X 18 400 HR E 5 350 RND Q 5 300 HR Y 21 500 HR ;RUN; ;RUN; ;RUN;
Now, lets write a loop inside a macro that will loop over each of the above datasets, create 3 new datasets for employees of IT department, and then we merge the three new datasets.
%MACRO merge_dept(In,Deptt, Out); %DO i = 1 %TO 3; DATA comp&i._&Deptt.; SET &In.&i.; WHERE dept = "&Deptt."; RUN; %END; DATA &Out.; SET comp1_&Deptt. comp2_&Deptt. comp3_&Deptt.; RUN; %MEND;
We have 3 input variables, input datasets, department and out dataset. Then we loop over each input dataset, selecting only those obs where department matches and we also create new datasets for those obs.
Now lets call this macro:
IF THEN ELSE & MACRO FUNCTION
Let us understand the usage of If then else inside a macro procedure with an example. We will use the same companies dataset as above.
%MACRO merge_datasets(In1, In2, ByVar, Out, Type); /* First we sort the 2 input datasets using the BY variable passed as input */ PROC SORT DATA = &In1.; BY &ByVar.; RUN: PROC SORT DATA = &In2.; BY &ByVar.; RUN: /* Next we join the 2 datasets using the technique selected by Type Input */ DATA &Out.; MERGE &In1. (In = dummy1) &In2. (In = dummy2) ; BY &ByVar.; %IF &Type. = 1 %THEN %DO; IF dummy1 and dummy2 THEN OUTPUT; %END; %IF &Type. = 2 %THEN %DO; IF dummy1 THEN OUTPUT; %END; %IF &Type. = 3 %THEN %DO; IF dummy2 THEN OUTPUT; %END; %IF &Type. = 4 %THEN %DO; IF NOT dummy1 AND NOT DUMMY2 THEN OUTPUT; %END; RUN; %MEND;
Now, we can create an inner join, using Type=1 :
%merge_datasets(company1, company2, dept, Out, 1);
or Get no common obs using Type=4:
%merge_datasets(company1, company2, dept, Out, 4);
SCOPE OF MACRO VARIABLES
It is possible that we define a variable inside a Macro function, in that case that variable will be a LOCAL MACRO VARIABLE. Because the scope is local to that function and its value is not available outside. E.g
%MACRO Test; %LET("m_name", SUMIT); %PUT the name is: &m_name; %MEND; %Test; %PUT Again the name is: &m_name;
Here we are referring to macro variable m_name which has been defined inside the macro function Test.
As we see the the macro variable’s value is unresolved outside the macro function where it has been defined. This variable is a local macro variable.
Now, let us define a global macro variable and check its values inside and outside a macro function.
%LET m_name = sumit; %MACRO Test; %PUT my name is &m_name.; %LET m_name = jack; %MEND; %PUT again my name is &m_name.;
This shows that the macro variable’s value which has been set globally can be referenced locally. But if its value is updated locally, it gets updated globally as well. This is an important SAS interview question.
DELIBERATELY MAKING A MACRO VARIABLE LOCAL/GLOBAL
As we have seen any macro variable defined inside a macro function using Let or Call Symput is by default local. But we can make it global also.
Similarly, any macro variable defined locally can also be made specifically local. example:
%MACRO Test; %GLOBAL m_name; %LET m_name = sumit; %PUT my name is &m_name.; %MEND;
We can pass parameters to SAS Macro Functions in two ways. Assuming function defined as:
%MACRO test(A, B, C, D);
We can pass parameters in two ways:
1. %test(A= company1, B= company2, C = out, D = 1);
2. %test(company1, company2, out, 1)
Both the options are fine.
DO LOOP INSIDE DATA STEP INSIDE MACRO
We have already seen that in case we need to use DO Loop inside a macro, we need to add % sign before DO, TO and END keywords.
But what if we are creating a data set inside a macro and we need a do loop inside that data step?
In such case we don’t need to use %sign inside the data step. example
%MACRO test; /* For a do loop in a macro we need % signs */ %DO i = 1 %TO 3; DATA emp_&i.; do j = 1 to 3; /* In this do loop we don't need % sign */ salary+ j; end; %END; %MEND; /* Now lets call this macro */ %test;
This results in two datasets as shown:
SAS EVAL Macro Function
We know that SAS macro functions return values that are neither numeric or character. They are TEXT type variables. Using %EVAL we can do some calculation on such macro variables. Example:
%LET age = 34+1;
The above returns:
It shows 34=+1, even though we expected 35. This is because SAS Macro variables are text, so it remains as 34+1 text value. In order to make it calculate, and assign result to the macro variable, we use %EVAL
%LET age = %EVAL(34+1);
This time it calculates the value correctly as 35.
If we need decimal calculations, %EVAL will return only Integer part:
%LET age = %EVAL(34/3);
This results in 11 and not 11.333.
For decimal calculations we use %SYSEVALF
%LET age = %SYSEVALF(34/3);
This results in 11.33
Let us take an example to better understand the use. In this example, we will input age as number of years of a person and print it as number of months.
%MACRO bdaydiff(Age); %LET mnths = %EVAL(&Age. *12); %LET decimal_mnths = %SYSEVALF(&Age. *12); %PUT &mnths.; %MEND;
Calling this function: %bdaydiff(34); This results:
MACRO CHARACTER FUNCTIONS
We have seen how we can utilize %EVAL and %SYSEVALF function for numerical calculations on Macro Variables. Now, we look at some of the functions which we can perform on character values.
%SCAN Function %LET address = 876 Dollard Road; %LET street = %SCAN(&address.,2); %PUT &street.;
%SUBSTR Function %LET address = 876 Dollard Road; %LET houseno= %SUBSTR(&address.,1,3); %PUT &houseno.;
Similarly, we have %LENGTH, %INDEX functions.
MASKING OF SAS FUNCTIONS
In SAS when we write = sign, we mean to check or equate something to something else. Like a=2. However, in SAS macro functions, often we need to pass a strings like “a=2” and don’t want SAS to evaluate or make the value of “a” as 2. Lets check by example:
Assume that we have a macro function which takes in two parameters as input, procedure statement, and an input data set:
%MACRO proc_macro(in_proc, in_data); DATA &in_data.; SET &in_data.; RUN; &in_proc.; %MEND;
Here suppose we want to pass in procedure as: “PROC PRINT; RUN;”.
%LET abc = PROC PRINT; RUN;
So, when we run the above, we get abc as “PROC PRINT”.
If however, we put it all in double quotes, then we will get the string in double quotes as output. To resolve the same we use %STR function:
%LET abc = %STR(PROC PRINT; RUN; );
Now, let us test this by calling the macro:
This time proc print works perfectly.
If you want to group a set of characters into a macro variable that have any of the following characters:
* ^ @ ( ) / ‘ ” + = – < NE LE GT LT AND NOT OR NOT EQ | \
Then use %STR, with that set of characters.
If the string contains % and & then use %NRSTR
%LET name = sumit; %PUT ("I am human &name is: " &name. );
Here our intention is to print it as: “I am human &name is :sumit”. However, SAS will evaluate both &name to sumit.
The above can be resolved by %NRSTR function.
%PUT %NRSTR(I am human &name is: ) &name.;
Similarly, when we need to forward a list of values to a macro function, we need to use %STR. Example:
%MACRO fruits_I_Like(input); %PUT My favorite fruits are : &input.; %MEND; %fruits_I_Like(apple, mango, banana);
Here the macro expects only one input but assumed we are passing 3 parameters and gives error:
To resolve we can try the below option, but thats wrong too:
%MACRO fruits_I_Like(input); %PUT My favorite fruits are : &input.; %MEND; %LET fruit_list = apple, mango, banana; %fruits_I_Like(&fruit_list.);
Here even thought the macro variable fruit_list comprises of entire comma delimited text of 3 fruits, but when we pass it to the macro function, the function still assumes it to be 3 input parameters and errors:
The error remains the same, the only way to resolve is using %STR:
%fruits_I_Like(%STR(apple, mango, banana)); OR %LET fruit_list = %STR(apple, mango, banana); %fruits_I_Like(&fruit_list.);
MASKING THE MACRO VARIABLES
In the last case, we defined a macro variable as:
%LET fruit_list = %STR(apple, mango, banana);
However, if we would have defined it as:
%LET fruit_list = apple, mango, banana;
That is without using %STR function.
Now, when we call the macro function, we get error:
%LET fruit_list = apple, mango, banana; %fruits_I_Like(%STR(&fruit_list.));
The reason it gives error that using %STR on a macro variable doesn’t work. So below is wrong
In the above case, we need to use %BQUOTE:
NEED FOR MORE FUNCTIONS
We saw that SAS macro has plenty of functions like %SCAN, %SUBSTR, %INDEX etc. However, the list is way tooo small. What if we need to MIN , MAX etc, like %min or %max, they ain’t available.
But then there is a GOD function in SAS Macros called %SYSFUNC. This allows to use any of the SAS Data step functions. Lets see:
%LET age1 = 10; %LET age2 = 20; %LET higher = MAX(&age1., &age2.); %PUT &higher.;
In the above case, we don’t get 20 as a higher value, instead its text “MAX(10,20)”. If we try %MAX :
%LET higher = %MAX(&age1., &age2.);
The above gives error:
WARNING: Apparent invocation of macro MAX not resolved.
This is because %MAX macro function is not provided. In order to make MAX function available, we use the GOD function called %SYSFUNC:
%LET higher = %SYSFUNC( MAX(&age1., &age2.));
This returns 20.
Similarly, if through a macro we are printing a report, and need to include DATE() function in Title of the report, we can use %SYSFUNC:
TITLE “My Report on date %SYSFUNC(date() );
We can even check if a data set exists or not or even if a Libref exists or not using %SYSFUNC. Example:
/* The below macro checks if a data set exists */ %MACRO test(A); %IF %SYSFUNC(EXIST(&A.)) = 1 %THEN %PUT &A. EXISTS; %ELSE %PUT &A. DOESN'T EXIST; %MEND;
Now, lets call this macro for Emp dataset in work libref.
Similarly, we can test for whether libref or fileref exists:
%IF %SYSFUNC(LIBREF(&mylib.)) EQ 0 %THEN.. /* libref/File we check = 0 for existence %IF %SYSFUNC(FILEREF(&myfile.)) = 0 %THEN....
File Delete Example Using SYSFUNC
Now, let us look at an example where in we will check if a file exists, if yes, then we will delete the same. First, let us create a sample.csv file at c:\data path.
%MACRO file_remove(infile); /* This checks if specified PATH exists */ %LET checkpath = %SYSFUNC(FILEEXIST(&infile.)); /* if path is valid check that file exists */ %IF &checkpath.= 1 %THEN %DO; /* Load that file in to a fileref act_file */ %LET getfile = %SYSFUNC(FILENAME(act_file,&infile.)); %LET del = %SYSFUNC(FDELETE(&act_file.)); %IF &del=0 %THEN %LET log = File has been deleted; %ELSE %LET log = The file could not be deleted; %END; %ELSE %LET log= The file does not exist; %PUT &log; %MEND;
Now, in Order to delete any file:
Checking a Data Set
We can open any dataset, check number of variables, number of obs, add or remove a variable etc, and all this using %SYSFUNC.
%MACRO get_dataset(in_data); /* Open data set passed as the macro variable */ %LET dsid = %SYSFUNC(OPEN(&in_data)); /* If the data set exists */ %IF &dsid. %THEN %DO; /* Get number of obs */ %LET num_obs =%SYSFUNC(ATTRN(&dsid,NOBS)); /* Get number of variables */ %LET num_vars= %SYSFUNC(ATTRN(&dsid,NVARS)); /* Close the data set */ %LET closure = %SYSFUNC(CLOSE(&dsid)); %END; /* Otherwise, write a message that the data set could not be opened */ %ELSE %PUT data set &in_dset. could not be opened - %SYSFUNC(SYSMSG()); %put "number of observation = " &nobs; %put "number of variable = " &num_vars; %MEND;
Now, let us call this macro for any dataset:
Changing Variable Names & Values
Suppose our requirement is to change variable names, or update all the numerical variables, say multiply all numerical values by 100, create new variables based on existing variables etc. All this can easily be achieved by a macro:
%MACRO get_dataset(in_data); /* Open data set passed as the macro variable */ %LET dsid = %SYSFUNC(OPEN(&in_data)); /* If the data set exists */ %IF &dsid. %THEN %DO; /* create new variables to hold new and to be dropped variables */ %let var_drop =; %let var_add =; /* Get variables from existing data set*/ %DO i = 1 %TO %SYSFUNC(ATTRN(&dsid,NVARS)); %LET var_name = %SYSFUNC(VARNAME(&dsid, &i.)); %LET var_type = %SYSFUNC(VARTYPE(&dsid, &i.)); %IF &var_type. eq C %THEN %DO; /* Rename char variables */ %LET var_drop = &var_drop. &var_name.; %LET var_add = &var_add. &var_name._same=&var_name. %str(;); %END; %IF &var_type. eq N %THEN %DO; /* Create new Numeric variables */ %LET var_add = &var_add &var_name._mul=&var_name.*100 %str(;); %END; %END; /* Close the data set */ %LET closure = %SYSFUNC(CLOSE(&dsid)); %END; /* Otherwise, write a message that the data set could not be opened */ %ELSE %PUT data set &in_dset. could not be opened - %SYSFUNC(SYSMSG()); DATA class_new; SET &in_data.; &var_add.; DROP &var_drop.; %RUN; %MEND;
Now, lets call this macro:
Sometimes we need to run a PROC step inside a datastep based on some values in the data set. Or, a more complex case, when we need to call different PROC steps based on data.
For such cases, the simple solution is CALL EXECUTE.
Assume, that we have a requirement wherein we have a text file which has a list of datasets, and we want to print each of those datasets.
DATA _NULL_ ; INPUT data_sets $20.; CALL EXECUTE ("PROC PRINT" || " DATA= " || data_sets ||"; RUN;" ); DATALINES; mylib.finance mylib.mggarlic ; RUN;
Here, the input to the datastep is through datalines, but we can even use infile. The input is individual datasets which we want to print.
A better use case is to call some PROC step based on variable value for a particular observation.
Suppose, in a datastep we do a sum of all students heights, and then check END=k option to print just the last obs with sum of heights. If there is a requirement to run proc means based on the total age:
DATA _NULL_; SET classes END=k; TOTAL + height; IF k; IF TOTAL < 1000 THEN DO; CALL EXECUTE ('PROC MEANS DATA=myclass; CLASS height;'); END; ELSE DO; CALL EXECUTE ('PROC MEANS DATA=myclass; CLASS age;'); END; RUN;
Now, lets take another example, in this case suppose we have a global macro variable, city set as “NEW YORK”
%LET city= NEW YORK
Now, through a datastep, using CALL EXECUTE, we will run PROC STEP
DATA _NULL_; IF ("&city") = "NEW YORK" THEN CALL EXECUTE ('PROC CONTENTS DATA=SASHELP.AIR;' ); ELSE CALL EXECUTE ('PROC PRINT DATA=SASHELP.CLASS;' ); RUN;
To better understand the power of Call Execute on how it can work with dataset values, consider below example. Here we will PUT each name variable out in log:
DATA _NULL_; SET sashelp.class; CALL EXECUTE('%PUT Students name is '||(name)||'; '); run;
Within this DATA step a Call Execute will execute the PUT statement for every observation read from the SASHELP.CLASS dataset.
Let us take another case, wherein using Call Execute we will call a data step inside another dataset. And using the existing dataset, for each obs, we will create a new dataset.
DATA _NULL_; SET sashelp.class; CALL EXECUTE('DATA work.'||name||'; SET sashelp.class; WHERE name="'||name||'";run;'); RUN;
For each name read from external sashelp.class, it will read obs from internal sashelp.class, and create one dataset for each name.
PROC SQL & MACROS
It is possible that we run a PROC SQL statement and pass the retrieved info in a macro variable. Again, we learn by example:
PROC SQL; SELECT AVG(invoice) INTO :avg_price FROM SASHELP.CARS; QUIT; %PUT &avg_price.;
Using the INTO keyword, we can assign the value of average invoice into macro variable avg_price. Later we can check the value assigned using PUT statement.
But what if there are more rows returned by the SQL?
In that case, we can optionally choose a field delimiter as shown:
PROC SQL; SELECT INT(AVG(INVOICE)) as Average_Price , TYPE INTO :avg_price SEPARATED BY ',' , :car_type SEPARATED BY ' ' FROM SASHELP.CARS GROUP BY TYPE ORDER BY Average_Price; QUIT; %put &car_type.; %put &avg_price.;
This results as:
Suppose from a data set, we want to create a new dataset which provides means for all numeric variables only and the other variables are not listed?
PROC CONTENTS DATA=sashelp.class OUT = test; (KEEP = name type); RUN; PROC SQL; SELECT name INTO :numeric_variables SEPARATED BY ' ' FROM test WHERE type = 1; QUIT; PROC MEANS DATA=sashelp.class; VAR &numeric_variables.; RUN;
This results in