This is a collection of answers to SAS interview questions. Most of these questions are commonly asked in interviews for SAS Developer/Programmer positions. Each post covers 10 related questions and answers.
Frequently used character functions are:
Frequently used numeric functions are:
Some SAS date functions are:
In a DATA step, the INPUT function converts a character variable to a numeric variable, which must be given a different name. The general form of the conversion is:
Numvar=input(charvar,Informat.);
Sample dataset creation:
DATA CHAR_NUM_1;
INPUT X $10.;
DATALINES;
2323456789
5433333
66
;
RUN;
Create a new dataset with a new variable of numeric data type:
DATA CHAR_NUM_2;
SET CHAR_NUM_1;
NEW_VAR=(INPUT(X,10.));
RUN;
Validating New_Var data type:
PROC CONTENTS DATA = CHAR_NUM_2;
RUN;
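The informat should match how the character value is written. As a minimal sketch, assuming hypothetical variables AMT_CHAR and DATE_CHAR that are not part of the example above, COMMA12. reads values with thousands separators and DATE9. reads dates such as 15JAN2020:

```sas
DATA CHAR_NUM_INFORMATS;
   /* Hypothetical character values, for illustration only */
   AMT_CHAR  = '1,234,567';
   DATE_CHAR = '15JAN2020';
   /* COMMA12. drops the commas while reading the number */
   AMT_NUM  = INPUT(AMT_CHAR, COMMA12.);
   /* DATE9. converts the text into a SAS date value */
   DATE_NUM = INPUT(DATE_CHAR, DATE9.);
   /* Display the numeric date in a readable form */
   FORMAT DATE_NUM DATE9.;
RUN;
```

Running PROC CONTENTS on CHAR_NUM_INFORMATS should show AMT_NUM and DATE_NUM as numeric.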
In a DATA step, the PUT function converts a numeric variable to a character variable, which must be given a different name. The general form of the conversion is:
Charvar=put(Numvar,Format.);
Sample dataset creation:
DATA NUM_CHAR_1;
INPUT Y;
DATALINES;
2323456789
5433333
66
;
RUN;
Create a new dataset with a new variable of character data type:
DATA NUM_CHAR_2;
SET NUM_CHAR_1;
NEW_VAR=PUT(Y,10.);
RUN;
Validating New_Var data type:
PROC CONTENTS DATA = NUM_CHAR_2;
RUN;
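The format passed to PUT controls how the resulting text looks. The sketch below assumes a hypothetical dataset NUM_CHAR_FORMATS and illustrative variable names. It shows three common choices: a plain width, zero padding, and thousands separators.

```sas
DATA NUM_CHAR_FORMATS;
   /* Hypothetical numeric value, for illustration only */
   Y = 5433333;
   /* 10. right-aligns the digits; LEFT moves them to the start */
   CHAR_PLAIN = LEFT(PUT(Y, 10.));
   /* Z10. pads with leading zeros to a width of 10 */
   CHAR_ZERO  = PUT(Y, Z10.);
   /* COMMA12. inserts thousands separators */
   CHAR_COMMA = PUT(Y, COMMA12.);
RUN;
```

Because PUT always returns a character value, all three new variables are character regardless of the format used.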
The default length of a numeric variable is 8 bytes. The default length of a character variable read with list input is also 8 bytes.
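The 8-byte default matters for character variables read with list input, because longer values are truncated. A minimal sketch, assuming a hypothetical dataset LENGTH_DEMO with made-up rows, places a LENGTH statement before INPUT to reserve more room:

```sas
DATA LENGTH_DEMO;
   /* Declare NAME before INPUT so it can hold up to 20 characters */
   LENGTH NAME $20;
   /* Without the LENGTH statement, NAME would be limited to 8 characters */
   INPUT ID NAME $;
   DATALINES;
101 CHRISTOPHER
102 ANNA
;
RUN;
```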
Unwanted special characters can be removed using the SAS character function COMPRESS, either by listing the characters to remove or by using a modifier.
Sample dataset:
DATA JUNK_DATA;
INPUT ID NAME :$20. SALARY;
DATALINES;
101 KRI%%%SH 66669
102 $$$MA@GAN# 89000
103 JAFFERY&! 90000
105 A#NNAYA*** 75900
;
RUN;
Clean data after applying the COMPRESS function:
DATA CLEAN_DATA;
SET JUNK_DATA;
NAME=COMPRESS(NAME,'$@%#!*&');
RUN;
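Listing every unwanted character works only when they are known in advance. As an alternative sketch, assuming a hypothetical dataset CLEAN_DATA_MOD and a single made-up value, the third argument of COMPRESS accepts modifiers: 'K' keeps characters instead of removing them, and 'A' selects alphabetic characters.

```sas
DATA CLEAN_DATA_MOD;
   LENGTH NAME $20;
   /* Hypothetical value, for illustration only */
   NAME = '$$$MA@GAN#';
   /* Second argument left empty; 'KA' keeps only alphabetic characters */
   NAME_CLEAN = COMPRESS(NAME, , 'KA');
RUN;
```

This approach also drops digits and blanks, so it suits single-word names better than free text.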
There are several ways to change an area code. One of the simplest is to use the SUBSTR function on the left side of an assignment.
DATA OLD_AREA_CODE;
INPUT ID NAME$ PHONE_NUM $12.;
DATALINES;
101 KRISH 778-324-6767
102 MAGHI 898-332-8908
103 KAFLY 778-312-1234
104 GAINT 778-231-6543
;
RUN;
Area code changed to 757:
DATA UPDATED_AREA_CODE;
SET OLD_AREA_CODE;
SUBSTR(PHONE_NUM,1,3)='757';
RUN;
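Another way to make the same change is to rebuild the string. This is only a sketch, assuming a hypothetical dataset UPDATED_AREA_CODE_ALT and a single made-up phone number: it keeps everything from the first hyphen onward and prepends the new code with CATS.

```sas
DATA UPDATED_AREA_CODE_ALT;
   LENGTH PHONE_NUM $12;
   /* Hypothetical value, for illustration only */
   PHONE_NUM = '778-324-6767';
   /* SUBSTR from position 4 returns the hyphen and the rest of the number */
   PHONE_NUM = CATS('757', SUBSTR(PHONE_NUM, 4));
RUN;
```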
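When one function is used inside another, it is called a nested function. One example is calculating the number of digits in a numeric variable. Applying LENGTH directly to a numeric variable gives a misleading result, because the number is first converted to a right-aligned character string; nesting PUT and STRIP inside LENGTH avoids this.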
DATA NUMERIC_LENGTH_INCORRECT;
VAR = 564390;
NUM_CNT= LENGTH(VAR);
RUN;
DATA NUMERIC_LENGTH_CORRECT;
SET NUMERIC_LENGTH_INCORRECT;
NUM_CNT = LENGTH(STRIP(PUT(VAR,12.)));
RUN;
Either of the following two nested function methods can be used to get the last 5 characters of the product ID. The distribution of the resulting categories can then be obtained using PROC FREQ.
DATA PRODUCT_ID;
INPUT PRODUCTID $15.;
DATALINES;
ABBSA1234Z1X253
ABBQA1234Q1X253
AEBSA1234Z2X252
ABGSA1234Z1Y253
ABBSA1234Q1X253
ABBSW1234Z1X253
AQBSA1234Z2X252
;
RUN;
DATA CATEGORY;
SET PRODUCT_ID;
CAT_ID = SUBSTR(PRODUCTID,LENGTH(PRODUCTID)-4,5);
CAT_ID1 = REVERSE(SUBSTR(REVERSE(PRODUCTID),1,5));
RUN;
PROC FREQ DATA =CATEGORY;
TABLES CAT_ID/LIST ;
RUN;