Learn SAS for FREE Day9

DO Loops in SAS

Do loops are widely used in SAS programming, just like other languages like Java, C etc, which have loops for repeating certain steps in a program, SAS has DO LOOPS.

We have many types of DO loops:

1. Simple Do Loop, that just runs once:

IF age > 50 THEN DO;
       a=1;
       b=2;
END;
ELSE DO; ..............

Can you guess the reason why we use these type of simple loops, which ideally are not even a loop, as it executes just ones. The answer:

Using IF statement, we can’t set more than a single variable

That is, below will work:

IF age > 50 then a=1;

But below results in error

IF age > 50 then a=1 b=2;

This gives below error:

ERROR: No matching DO/SELECT statement.

We can’t set more than one variable using IF statement, hence we use DO with IF.

Therefore, we use DO with IF for setting multiple variables.



Iterative DO Loop

Often we would like to repeat certain steps, then we use Iterative DO Loop:

2. Iterative Do Loop, that just runs once:

DO i = 1 to 10;
   a=100;
   b=20;
END;

Let us try this:

DATA annualsales;
mnthly_invest = 100;
mnthly_return = 20;

DO i = 1 to 12;
 totreturn + (mnthly_invest+mnthly_return);
END;
RUN;

Let us check the output:

The loop works correctly, and it loops 12 times. The iterative variable ‘i’ also comes out in the output, and please note, then it increments ‘i’ even after it reaches end of interval. But after final iteration, its value is 12, it will increment to 13, go back to top and find it doesn’t have to run the 13th time. Do loop ends, but ‘i’ remains at 13.

But really loop runs for 12 times and not 13? Let us confirm:

DATA annualsales;
mnthly_invest = 100;
mnthly_return = 20;

DO i = 1 to 12;
  totreturn + (mnthly_invest+mnthly_return);
  check+1;
END;
RUN;

We have put the condition “check+1”. This will provide a final value of check variable to confirm if its 12 or 13.

This shows the loop actually runs 12 times but “i” increments one extra.

If we don’t want to see “i” the iterator, then better use drop statement.

 

What if we are still not convinced that loop runs for 12 times and not 13?

We use OUTPUT statement inside the DO Loop to check values at each iteration

 

DATA annualsales;
mnthly_invest = 100;
mnthly_return = 20;

DO i = 1 to 12;
  totreturn + (mnthly_invest+mnthly_return);
  OUTPUT;
END;
RUN;

This results in 12 observations and i has maximum value 12 and not 13.

That is, after we use OUTPUT statement inside the DO Loop, ‘i’ cleverly gets value 12 as maximum and not 13. 🙂

 

We can even have decrementing Do Loops:

DATA annualsales;
mnthly_invest = 100;
mnthly_return = 20;

DO i = 12 to 1  BY -1;
 totreturn + (mnthly_invest+mnthly_return);
END;
RUN;

This will show the same result as our first output. But can you guess the value of ‘i’?

Will ‘i’ be 1, 0 or -1? Well as last value of i is 1, it further goes one step lower and the last value of i is 0.


We have seen that by using “BY” condition in DO Loop, we can reverse the order of loop. Similarly using “BY” we can do the do loop in other orders too.

Say, we want to do loop by ODD numbers.. that is 1,3,5,7,9 and 11..

DATA annualsales;
mnthly_invest = 100;
mnthly_return = 20;

DO i = 1 to 12  BY 2;
 totreturn + (mnthly_invest+mnthly_return);
 OUTPUT;
END;
RUN;

We have used OUTPUT statement, therefore i behaves well and remains within limit. But what if we don’t use the OUTPUT will the value of i be 11, 12 or 13?

It will be 13, after 11 it will increment by 2.

To loop for even numbers:

DO i = 2 to 12 BY 2;

In this case the final value of i will be 14.



Customized Looping

What if we want to loop for specific values, say i=2, i=9, i = 23?

DO i = 2, 9, 23;

In this case, we loop by these values, mostly because we want to use these values inside the do loop.

DATA annualsales;
mnthly_invest = 100;
mnthly_return = 20;

DO i = 2, 9, 23;
 totreturn + (mnthly_invest+mnthly_return);
 OUTPUT;
END;
RUN;

In the above case the loop runs 3 times only once for each comma separated value.

Can you guess what will be final value if i, if we didn’t use “OUTPUT”? Well, it will be 23, it will not increment since it doesn’t know how to.



DO LOOPS Based on a Dataset Variable

Consider below dataset:

DATA emp;
INPUT name $ id sal;
DATALINES;
sumit 2 1000
jack 3 2000
;
RUN;

Now, suppose we create a new data set, and use a DO Loop using an iterator over the ‘id’ variable:

DATA annualsales;
SET emp;

DO i = 1 TO id;
 sal+salary;
 OUTPUT;
END;
RUN;

Here we have id values as 2 and 3. In this case for first obs, it will iterate TWO TIMES, and for the 2nd obs, it will iterate 3 times. Lets check:

The above has resulted in 5 obs (2+3). If we didn’t use output, then i for first obs, will show as 3 and for 2nd obs will be 4. Please note that it retains the value of “sal” when it goes over the 2nd observation from emp.

The above is something which we might not want to use, I mean if someones id is 100, or salary is 1000, we don’t want to loop from 1 to 100 or 1000. Instead generally we would like “i” to have the exact value which that variable has.

To achieve the same, we don’t use “1 to” keywords. Just like we did “i = 2,9, 23”.

DATA annualsales;
SET emp;

DO i = id, sal;
 salary+sal;
 OUTPUT;
END;
RUN;

Let us check the output:

That means i took the value of id and salary for each obs in original dataset.



MULTIPLE SEQUENTIAL DO LOOPS

It is possible to run multiple do loops one after another, however, there needs to be a solid reason to do it.

DATA test;
DO i = 1 TO 5, 16 TO 20;
 OUTPUT;
END;
RUN;

In the above case, first the do loop runs from 1 to 5, and then from 16-20. Lets check:

In the same way we can run a sequential do loop over different variables in a data set:

DATA annualsales;
SET emp;
DO i = 1 TO id,  500 TO sal BY 250;
 salary+sal;
 OUTPUT;
END;
RUN;

In this case, the do loop runs from 1 to the id value for each obs, and then from 50 to sal value for each obs in the multiples of 10. Each obs is handled first by each loop and then it goes to next obs.



Looping Over Char Values

We can have do loops as :

DO     i = ‘Jan’, ‘Feb’, ‘Aug’;

Or, for dates:

DO   i=’01JAN1960’d,  ’14AUG1947’d;



STOPPING   DO LOOP   PREMATURELY

Sometimes, based on a condition, we would like to stop the do loop.

Simply set the value of “i” the iterator such that it ends the loop:

DATA annualsales;
SET emp;

DO i = 1 to 12;
 salary+sal;
 OUTPUT;
IF (salary > 2000) THEN i=13;
END;
RUN;

Please note that in this case, we have the IF statement after the salary is being set. Therefore, if the salary exceeds 2000, even then that amount will appear. After that it will exit the loop as i is set to 13.

Can you guess what will be salary and the values of i for both the employees if we didn’t have “output”

It still increments i to 14 after we set it to 13. So remember it.



Q) How many times do the below two loops execute?

a)
DO i = 1 to 12 by 3;
  :
  :
END;

b) 
DO i = 1 to 4;
 :
 :
END;

Ans: Both will execute 4 times only, except that “i” will take on values 1,2,3 and 4 in loop b, and at end its value will be 5. In loop a, it will take on values, 1,4,7 and 10, at end its value will be 13.



NESTED DO LOOPS

Lets understand by example:

DATA test;
DO i = 1 to 3;
  DO j = 1 to i;
    OUTPUT;
  END;
END;
RUN;

Lets check output:




DO LOOP OVER SET STATEMENT

Consider below dataset with do loop:

DATA test;
SET emp;

 DO i = 1 to 10; 
  m+1;
 END;
RUN;

In the above, the DO loop runs 10 times for each observation in the dataset emp. That is if emp has 3 obs, do loop overall will run 3*10 = 30 times.

Now consider below dataset assuming emp has 3 obs:

DATA test;
 DO i = 1 to 10; 
  SET emp;
   m+1;
 END;
RUN;

can you answer what happens in the above case?

Ans:

1. When the first iteration starts, i =1. 
     Then it reads first obs (_N_ = 1) from emp data set. Increments m to 1.

2. On 2nd iteration, i=2. SAS has now already advanced to next obs in emp.
   It reads the 2nd obs, overwrites the values in pdv to those from 2nd obs
   do loop here sets _N_ = 1.
   Increments m = 2

3. On 3rd iteration i = 3, SAS now already moves to 3rd obs in emp. 
   Do loop writes this obs over the existing 2nd obs and sets _N_ = 1. 
   Now m= 3

4. On 4th iteration, i=4, SAS now moves past end of data record. 
   Do loop on not finding anything, clears PDV.
   Now m=4

5,6 7, 8, 9 and 10 are same as 4th, PDV is empty

After 10th iteration, it reaches to top of do loop, has nothing in pdv,
and as such we get no output.

 

To confirm our analysis, we will output _N_ variable on each iteration using Explicit Output:

DATA test;
DO i = 1 to 10;
 SET emp;
  m+1;
  row = _N_;
  OUTPUT;
END;
RUN;

here we get:

 

Q) What will be the output here, assuming emp has 3 obs. Tell number of obs and value of i.

DATA test;
DO i = 1 to 3;
 SET emp;
END;
RUN;

Ans: There will be just one obs, with the last obs from emp dataset, and i will be 3+1 = 4




DO UNTIL Statement

In this we specify a condition like:

Do the loop  UNTIL this condition gets true

As long as its false it will execute. It stops when condition is true +1 more time.

The Do Until is evaluated at the end of loop

That means the Do Until loop is at least evaluated once, no matter what.

Consider below:

DATA test;
age = 50;
DO until (age > 60);
 employment ='YES';
 age+1;
 OUTPUT;
END;
RUN;

Here, we expect to output age as 50, 51, 52 to 60, with employment = yes.

Lets check:

Actually age starts from 50, but before we output we increment it by 1. Thats why first one is 51. Then when age =60, the condition age>60 is false. It executes and age becomes 61.

Then 61 > 60 is true but it executes one more time that is +1 time

If we don’t include output statement, can you guess what will be the output?

Ans: It will be just one obs, with age=61 and employment =Yes

If we want it to run only till 60, and not have a record for 61, then use “GE” or “>=”, that is greater than equal to operator.

Its like:

DO UNTIL THIS GETS TRUE, IF ALREADY TRUE? THEN DO IT ATLEAST ONCE .

ONE TIME AT LEAST: IF THE CONDITION IS TRUE ALREADY, THEN DO UNTIL IS VERY FUSSY, ITS SAYS RUN ME ONCE AT LEAST.

E.g.1 

a=20;
DO UNTIL (A > 10);
A+1;
END;

For above A is already greater than 10, so if this was a while loop, we will get 20 as output. Whereas the fussy nature of do until causes it to run once and output is 21.

E.g2

DO UNTIL (A > 6);
A+1;
OUTPUT;
END;

Here a is missing, and less than 6, so we get 1,2,3,4,5,6,7. Because at end 6 is not > 6. So it goes inside loop, sets a =7, and 7 comes as output, then it goes at top and finds yes 7>6, then do loop does not run.

Some people feel that do until will run extra one step if the condition becomes true, but its not the case. It has to run once at least once even if the condition is true already. But later if loop as run at least once, then the condition gets true, it will not run again one extra step.




DO  WHILE

The DO WHILE is very similar to DO UNTIL, with some differences:

Do While: Evaluates at the beginning

Do While: Executes as long as the condition is true.

It DOESN’T RUN 1 EXTRA TIME 

DATA test;
age = 50;
DO WHILE (age > 60);
  employment ='YES';
  age+1;
 OUTPUT;
END;
RUN;

In this case, Do While will not even run once as condition is false. Although, we have one variable age, set to 50 outside the loop, we expect it to come in the output….

But as we have written an explicit “OUTPUT” statement, the SAS will not use implicit output statement.

Therefore, for above we get an empty dataset, with 2 variables, age and employment and no obs.

DATA test;
age = 50;
DO WHILE (age < 60);
 employment ='YES';
 age+1;
 OUTPUT;
END;
RUN;

Can you tell what will be output? Will we get from 50-59, 51-60 or 51-61?

We get 51-60. This is because, it starts at 50, but we add 1 before “OUTPUT”. Therefore, first obs has 51. Then it runs till 59, that is 50-59, total 10, but we keep adding 1 to it, so last is 59+1 = 60.

DO WHILE  IS EVALUATED IN BEGINNING




Q1) How many times the below loop runs and what is the output dataset:

DATA test;
x=15;
DO UNTIL (x>12); 
 x+1;
END;
RUN;

Ans: The loop runs just once, and then it finds out the condition is already true. The output is x=15+1 = 16

Q2) How many times the below loop runs and what is the output dataset:

DATA test;
x=15;
DO WHILE (x<12); 
 x+1;
END;
RUN;

Ans: As the condition is false, the do while loop not even runs once. The explicit OUTPUT is not specified, therefore, it used implicit one and we get x=15 as output.



JOINING CONDITIONS IN  DO LOOPS

Sometimes we would like to loop say from 1 to 20, but would like to stop in case a condition is met.

For this we can use a simple do loop and join with a DO WHILE or DO UNTIL:

DATA annualsales;
invest = 5000;

DO i=1 to 20 until (amount >=50000); 
 amount+ (invest+ (invest + amount)*0.10) ;
 OUTPUT;
END;
RUN;

In the same way we could have written:      DO i=1 to 20 WHILE (amount <=50000);

But what if the condition is not met after 20th iteration, that is amount doesn’t reach 50000?

In that case loop will still end.

The loop stops in between if the condition is reached.



SAMPLING BASED ON DO LOOPS

Using do loop we can select only certain obs from an existing data set. However, this is plainly for learning the do loops and otherwise, we don’t use this in the industries.

DATA samplecars; 
  DO i=10 TO 5000 BY 10; 
     SET sasuser.cars  POINT=i; 
     OUTPUT; 
  END; 
STOP;
RUN;

Here we use the POINT=num concept to select only certain obs based on value of iterator. And as we discussed, with POINT=num, we need the STOP condition and an EXPLICIT output.



Q) What happens in below case:

DATA test;
age = 7;

DO WHILE (age ge 8);
END;

RUN;

Here age=7 is NOT  GE 8, hence loop doesn’t run and we get age=7 as output.

Q) What happens below:

DATA test;

age = 7;
DO WHILE (age le 8);
END;

RUN;

Ans: Here age is less than 8, it goes inside loop, but there is no condition in loop, age remains at 7 and it keeps on INFINITELY looping over and over again.

This is a case of infinite loop. This must be avoided.

Similarly using DO Until, we get INFINITE LOOP:

DATA test;

age = 7;
DO UNTIL (age ge 8);
END;

RUN;



IMPORTANT SAS FUNCTIONS

For basic maths calculation we have several inbuilt functions like:

 SUM,  MEAN,   MEDIAN,  VAR,   STD/STDEV   and so on....

Many a times we will read below type of data file into SAS:

1234 423423 34234 987989..........

Although the data dictionary would provide basic idea on what these fields are, but in reality we might have to combine/manipulate some fields to generate meaningful information. As a result often we will read the fields has a,b,c… etc or  y1, y2,y3.. and so on. This saves the effort.

On such numerical variables we can easily perform these maths functions like:

MEAN(x1,x2,x3, x4)….

Lets give it a try:

DATA test;
INPUT Y1 Y2 Y3;
DATALINES;
1 2 3
4 5 6
7 8 9
;

If we need to create a new dataset, which has a variable averageY, which basically has mean from Y1-Y3:

DATA test1;
SET test;
averageY = MEAN(Y1, Y2, Y3);

We get:

Prett good, but what if the fields were Y1 to Y25 or Y100 and we need to sum them? We could do:

SUM(Y1 – Y3)

But the above is wrong, it will take Y1 minus Y3

The right way is as below:

DATA test1;
SET test;
averageY = MEAN( OF Y1 - Y3);
sumY = SUM(OF Y1-Y3);

On the same lines, this is wrong:   SUM( Y1 Y2 Y3)

But this is correct: SUM(OF  Y1 Y2 Y3)

 

Q) a = Sum(12345678912345,  1, 3009). For this function, what will be length of variable ‘a’?

Let us actually try this:

DATA test1;
SET test;
sumY = SUM(12345678123456 ,1, 3009);

As we see it gets a default length of 8 Bytes. Which in fact is the maximum length of Numeric variables. Even if we try Length statement, it will fail:

DATA test1;
LENGTH sumY 20.;
SET test;
sumY = SUM(12345678123456 ,1, 3009);
RUN;

This gives error in log as:

That is: The length of numeric variables is 3-8.

No matter what you do, you can’t go beyond length of 8 bytes on numerics. You can read in as much size as you need, you can format to appropriate value display as shown:

DATA test1;
SET test;
FORMAT sumY 20.;
sumY = SUM(12345678123456 ,1, 3009);
RUN;

Format helps in viewing the actual value.


 


SUBSTR

Using SUBSTR, we can get a certain string cut out from a larger string:

SUBSTR(‘ABCDEFG’, 2, 1) = B

That is start at 2nd position and give out 1 character.

Length of the variable created by Substr is same as input variable



Another use usage of SUBSTR is to replace certain parts of a string.

Example, suppose our company called ‘alpha’ has taken over by ‘beta’ and they want all employee ids should now be replacing alpha by beta.

Earlier employee ids were like alpha001, alpha002. Now we want to replace alpha by beta.

DATA emp;
INPUT name $ id $;
DATALINES;
sumit alpha001
jack alpha002
;

DATA emp1;
SET emp;
SUBSTR(id,1,5) = 'beta';

Please note that alpha is 5 chars and beta is 4, lets check what happens:

As you see it puts a blank space after beta and keeps the length same.

What if we try to replace alpha by betabeta, that is 5 chars by 8 chars?

DATA emp1;
SET emp;
SUBSTR(id,1,5) = 'betabeta';

As you see it only picks up 5 chars. If chars are less it puts up blanks, if chars are more it selects only the needed ones.

 

LENGTH OF VARIABLE CREATED BY SUBSTR

What will be the length of a variable created by substr? Please note that in SAS except TRANWRD and CATX, no other function is SAS sets its own length, but it depends of INPUT VARIABE.

Example:

DATA test;
a = 'sumit';
b = SUBSTR(a, 1,3);
c = SUBSTR(a, 1,3)||'-'||'bharti';
RUN;

Let us check the lengths of variable a,b,c.

As you see, the variable b, which has 3 chars, has length as 5, which is same as input variable.

You can see that || the concatenation, allows to extend the length as that makes sense.

 

Q1) What if we miss to specify the characters to replace? example:  SUBSTR(id, 3) = ‘betagamma’. What happens, does it give error?

Ans: When you don’t specify the character limit, it assumes you wanna read/replace till the end. earlier values were alpha001 => 8 characters. Now it will keep in place 2 initial characters of existing string and from 3rd position, it will pick 5 more characters from betagamma= betag, and replace.. Lets try.

DATA emp1;
SET emp;
SUBSTR(id,3) = 'betagamma';

 

Q2) What will be length of b here:

DATA test;
a = 'SUMIT BHARTI';
b = SUBSTR(a,1,3) || "it is here";
RUN;

Ans: We know substr keeps the length of input variable. Here its 12. And then we know Concatenation adds to length. So here b has length of 12+10 = 22.

Also, although substr keeps original variable length, it doesn’t pad extra spaces to end. As a result we get “SUMit is here”.

 

Q3) Consider below data set:

DATA test;
a = 'SUMIT';
b = '1';
a = a||b;
RUN;

What will be output of above how many obs and what values?

Ans: This is an easy looking question and a tricky one. It confuses us with Length of variable and Concatenation. We know concatenation can increase length, but that is true only for new variables created by concatenation

For above, length of a = 2, and even concatenation can’t change this. So a remains = ‘sumit’ with length 5 and b is ‘1’. Infact even catx and tranwrd can’t change length of existing variable.



SCAN

The scan function can scan a string for any special characters and then use them as separators for individual words. For example  ‘MADE IN CHINA’, using scan we can use space as separator, and get ‘made’, ‘in’ and ‘china’ as 3 separate words. Lets try:

Consider below dataset:

DATA company;
INPUT product $
DATALINES;
valve:inline a001
whistle:offline b897
;

Suppose in variable product we have first a product type, application and then product number. Using scan we shall extract them.

DATA test;
SET company;
type = SCAN(product, 1 , ': ' );
appl = SCAN(product, 2 , ': ' );
num  = SCAN(product, 3 , ': ' );

Here the field separators are “:” and space.

In this way we can bundle in multiple separators

 

By default the following separators are included in SAS:

blank . < ( + | & ! $ * ) ; ^ – / , %

That means if we don’t specify any separators, then the above are included anyhow

 

So to separate first name and last name from ‘SUMIT BHARTI’, which is separated by space, we could do:

fname = SCAN(name, 1)

lname = SCAN(name, 2)

Its not mandatory to specify separators, like above and then it will use defaults separators all of them, however, if say we specfify ‘,’ that is a comma, then it will only search for a comma.

 

Q) What is the length of variables fname and lname in this case?

Ans: When scan is used to separate words from a large string, the each extracted word would have same length as the original string.


More functions:

TRIM – Trims trailing spaces only.  If suppose we have name = ”  SUMIT     “, if we use TRIM on name, the front spaces will not be removed.

Moreover, if we trim name and assign to a new variable, then new variable will have same length as original name, that is it will also pad up so many spaces, and it will be waste of our effort.

DATA test;
name  = " sumit ";
name1 = trim(name);
RUN;

We may assume that name1 will only be ” sumit”, but no it will have same length and data as ” sumit    “. To avoid, use Length statement with name1 and then it will be truncated accordingly.


UPCASE / Lowcase – To convert chars to up/low case


INDEX

Index searches a given char variable (or string) for a set of characters and returns the position at which it was found. if not found it returns 0. example;

DATA emp;
INPUT name $ salary;
DATALINES;
sumit 100
jack 200;
kimi 300;

Now, lets search for characters ‘mi’:

DATA test;
SET emp;
ind = INDEX(name, 'mi');

The best use of INDEX is to remove certain obs from data. Suppose we don’t want to keep records of users which have ‘mi’ in their names, then we do:

DATA test;
SET emp;
IF INDEX(name, 'mi') = 0;

 



FIND

As name suggests its used to search for a substring. Find is an Elder brother of Index.

pos = FIND(“my name is sumit bharti”, ‘sumit’);  ==> This gives 12, and will be 0 if not found.

What if we are not sure about case of string or search character?

pos = FIND(“my name is sumit bharti”, ‘SuMit’, ‘i’);

This still returns 12, as we have used a modifier “i” to ignore casing.

We can also specify a start position to start searching from that position, which defaults to 1.



FIND AND REPLACE CHARACTERS

To replace some particular set of characters from an existing char variable, we use TRANWRD. Which means TRANsfer WoRD.

DATA test;
SET emp;
name = TRANWRD(name, 'mi', 'bommo');

Again, note that TRANWRD allows you to specify more or less than the number of chars you are replacing, but you can’t exceed the original variable’s length. Therefore, if you specify more characters that will cross existing var length, it will pick only those many as needed.

TRANWRD is perhaps the only function is SAS character function, which sets a fixed length of 200.

example:

DATA test;
name = 'SUMIT BHARTI';
newname = TRANWRD(name, 'MIT', 'JEET');
RUN;

Here length of name is 12, but TRANWRD not only replaces given characters within a string, it creates space to add more characters if need be. Here, MIT is replaced by JEET. The result is “SUJEET BHARTI”. It replaced 3 chars with 4 chars but didn’t eat into existing space.

If you see, the length is 200. If we take a substr of chars after the name, they will be empty, so always use TRIM with TRANWRD, but unlike TRANWRD, trim take length based on input variable, which here would be from TRANWRD and of 200 length. So even after TRIM length would be 200.



CATX Function

It allows to concatenate two or more strings and remove leading and trailing blanks, and insert separators. Example:

DATA test;
name = " sumit  ";
age = 35;
dept = "  IT Team   ";
full = CATX(" *** ", name, age, dept);
RUN;

Here we specify the field separator first as ” *** “, let us check the output:

The catx does a wonderful job, removes leading and trailing spaces from each variable and also allows to use numeric age to concatenate. But there is a reason why SAS calls it a CATx function. We know cats are sweet but naughty. This CATx is also naughty, and sets the length of the new variable as 200. How bad!

To teach CATx a lesson, use Length statement.

Example:  CATX(“”, ‘abcd    ‘,     ‘      efgh     ‘)

Here we are joining abcd which has spaces to right and efgh which has spaces on either side. CATX here used no delimiter, what CATX does is trim spaces from abcd and efgh on both sides on these. BUT DOES NOT RETURN  “abcdefgh”   BUT RETURNS

abcd efgh

That is abcd followed by space then efgh. This is because CATX is a concat function to add delimiter, if you remove delimiter it takes default delimiter of space. Remember a CAT is sharp, with extra X, its smarter. 🙂

Please note, there is CAT function in SAS, to join two or more char values:    

c = “sum” || “it”   , or

c = CAT(‘sum’,’i”,’t’);

That is use double pipe for concat or use CAT function.



Concatenation of Strings in SAS

After understanding this concept you would also understand why SAS is one of the most trickiest systems available in the market. Believe me… Consider below example:

DATA test;
name = 'SUMIT BHARTI';
fname = SUBSTR(name, 1,5);
fullname1 = fname || 'BHARTI';
fullname2 = SUBSTR(name, 1,5) || 'BHARTI';
RUN;

Can you answer what will be length of all variables and values in fullname1 and 2?

Ans: Length of name = 12. Length of fname = 12 as well.

Length of fullname 1 and 2 = 12+6 = 18.

But what about the values in fullname1 and 2? Are they same, lets check:

The above is because:

When we assign SUBSTR value to another variable, it pads up spaces at end

But if we include SUBSTR in a calculation, it doesn’t pad up spaces.

Similarly, we have:

DATA test;
name = 'SUMIT';
fname = TRANWRD(name, 'IT','JEET');
fullname1 = fname || 'BHARTI';
fullname2 = TRANWRD(name, 'IT','JEET') || 'BHARTI';
RUN;

Here we have length of fname = 200, lengths of fullname1 and 2 = 200+6 = 206. But output:

The problem with concatenation remains the same, even if we use TRANWRD or any other function.

This concept is quite important in practical aspect as well as for Base Certification exam.


REVERSE FUNCTION

DATA test;
a = 'sumit';
b = REVERSE(a);
RUN;

This results in b = ‘timus’





GETTING NUMBER FROM CHARACTERS.

Assume the number ‘323.98’, has been stored as a char variable.

To get Integer part out of it we use:

INT(‘323.98’), This gives 323, no rounding happens, just the decimals are ignored.

DATA test;
a = 323.5876545;
b = ROUND(323.5876545, 0.1);

Here b, will be rounded to first place of decimal and becomes 323.6

To round to two places of decimal we use: 0.01

b = ROUND(323.5876545, 0.01) =>    323.59… and so on

To round to nearest 100, ROUND(323.5876545, 100) => gives 300.

If no decimal is specified its rounded to nearest integer.



RAISING A NUMBER TO A POWER

We can use “**” that is double multiplier to raise a number to a power. Example, suppose we want to raise 2 to the power 3, which should return 8, we use 2**3:

DATA test;
age = 2;
newage = age**3;
RUN;

This gives:

age   newage
2      8

Quotient and Remainder in SAS

divisor = 11/3 ==> returns  3.6667

quotient = MOD(11,3) ==> returns 2

 

Posted in: SAS Filed under:

Leave a Reply

Your email address will not be published. Required fields are marked *