Play Dataset Like a CS Pro: Binary Tree
The challenge I posed for myself this week is to iterate a binary tree within one single data step.
To start, we build a binary tree in a SAS data set: The nodes are coded as integers from 1 to 15. The value of the node also serves as a “pseudo-address” for reference. Left and right pointers are defined in the data set variables “L” and “R”. Null pointer is represented as SAS missing value. 5 is used as the root.
data _; input pseudoAddress L R; datalines;
5 2 9
2 4 6
9 7 10
4 1 8
6 3 11
7 15 12
10 14 13
1 . .
8 . .
3 . .
11 . .
15 . .
12 . .
14 . .
13 . .
;run;
Before pursuing the hardcore single data step approach, we point out here there are (at least) easy-going recursive solutions:
TAKE I
%macro It(R);
%local Lprt Rprt;
%put [&R];
proc sql noprint; select L,R into :Lprt,:Rprt from _ where pseudoAddress=&R;quit;
%if %eval(&Lprt>0) %then %It(&Lprt);
%if %eval(&Rprt>0) %then %It(&Rprt);
%mend;
The output is in the format of so-called prefix notation.
option nonotes;
%It(5)
option notes;
TAKE II
%macro It1(R);
%local Lprt Rprt;
%put {&R};
data _null_; i=&R; set _s point=i;
call symput('Lprt',put(L,best.));
call symput('Rprt',put(R,best.));
stop;
run;
%if %eval(&Lprt>0) %then %It1(&Lprt);
%if %eval(&Rprt>0) %then %It1(&Rprt);
%mend;
To use the second recursive approach, as it deploys the random access of SAS data set, the tree has to be sorted by pseudo-address:
proc sort data=_ out=_s; by pseudoAddress;run;
Now the invocation:
option nonotes;
%It1(5)
option notes;
Now the cool part: Essentially we need to manually code the part that is hidden or taken care by SAS Macro facility in the previous two recursive approaches. To do that, we need to think the process to traverse a binary tree very clearly. Structure-wise, a run-time stack must be manually implemented so two pieces of information can be pushed, node and the state of the processing of that node. Three states can be defined in the case of binary tree: to go to the left (state=0), to go to the right (state=1), and to go back (state=2).
data _null_;length stck $100;
c=0; i=5; state=0; stck='';
do until (c=n);
set _s nobs=n point=i;
*put _all_;
if state=0 then do;
put pseudoAddress=;c+1;
if L>.Z then do;stck=strip(put(i,best.))||',1;'||strip(stck);i=L;end;
else if R>.Z then do;stck=strip(put(i,best.))||',2;'||strip(stck);i=R;end;
else do;
i=input(scan(stck,1,','),4.);state=input(scan(stck,2,',;'),4.);
stck=substr(stck,index(stck,';')+1);
end;
end;
else if state=1 then do;
if R>.Z then do;stck=strip(put(i,best.))||',2;'||strip(stck);i=R;state=0;end;
else do;
i=input(scan(stck,1,','),4.);state=input(scan(stck,2,',;'),4.);
stck=substr(stck,index(stck,';')+1);
end;
end;
else if state=2 then do;
i=input(scan(stck,1,','),4.);state=input(scan(stck,2,',;'),4.);
stck=substr(stck,index(stck,';')+1);
end;
end;
stop;
run;
Some background and reference:
William E. Benjamin , Jr. has a pioneer paper on how to build run-time stack manually. Lately we extensively discussed the recursion in SAS. See posts Recursive SAS Macro, Recursive SQL Query, Solve Eight Queens Puzzle by SAS, and Source line counting, as well as my PharmaSUG paper Permutation via Recursive SAS® Macro. A sister post can be found here.
Programming Challenge: Source line counting
The winner of our third programming challenge is Kalyani Chilukuri. She got all 92 solutions to Eight-Queens-Puzzle by using SAS. Congratulation!
This time I want to do more on tree-traversal. And the question is actually already sent to an sas programmer mail list I am maintaining.
- Option I: Take “CDISC Express“. How many lines of SAS source code are there in the package? If you haven’t installed “CDISC Express”, you can download it from here.
- Option II: Take “
C:\Program Files\SAS\” which is the default SAS installation folder. Count the lines of all SAS source code going with your SAS installation.
This challenge is actually quite relevant–from time to time when you market a new SAS package, you may want to tell your prospect the total number of source lines as a good measure of the scale of your product.
Please send your code to me, be it in SAS or not
The deadline is July 1, 2011 and the prize is a $25 gift card from either Macy’s or Fry’s.
Programming Challenge: Solve Eight Queens Puzzle by SAS
Megha has kept her momentum to become the winner of the second programming challenge: Web Crawling (L1). Congratulation, Megha
!
About web crawling, we’ve done with single page mining, and cross-page mining. The next step is to move to a single domain mining. However, to do that, we’ve got to make some preparation. That is where “Eight queens puzzle” comes from.
By definition from wikipedia,
The eight queens puzzle is the problem of placing eight chess queens on an 8×8 chessboard so that no two queens attack each other. Thus, a solution requires that no two queens share the same row, column, or diagonal.
So our challenge is: Write in SAS to solve eight queens puzzle. Please follow an output format: for the example shown in the figure below, the solution is expressed as 24683175. So every solution is written as an eight-digit number: The 1st digit from the left represents the queen’s rank position in file a, the second the queen’s in file b, et cetera.
| a | b | c | d | e | f | g | h | ||
| 8 | 8 | ||||||||
| 7 | 7 | ||||||||
| 6 | 6 | ||||||||
| 5 | 5 | ||||||||
| 4 | 4 | ||||||||
| 3 | 3 | ||||||||
| 2 | 2 | ||||||||
| 1 | 1 | ||||||||
| a | b | c | d | e | f | g | h |
Please send your code to me. The deadline is June 1, 2011 and the prize is a $25 gift card from either Macy’s or Fry’s.
Recursive SQL Query
A family tree is a relation (Father, Mother, Child). The restriction of this relation to subrelation (Father, Son) or (Mother, Daughter) looks more like a tree in the sense of computer data structure.
It is a quite common question in genealogy to trace the ancestry of a given offspring. The algorithm can be specified recursively in a language agnostic way:
Take the paternal branch of the family tree. The ancestry of a male offspring A is the union of the father of A and the ancestry of the father of A.
How could this question be addressed from within SAS? Consider the following three generation family tree where each individual is coded with an id number:
data FamilyTree; input Father Mother Child;
datalines;
1 2 9
1 2 10
1 2 11
3 4 12
3 4 13
3 4 14
3 4 15
5 6 16
5 6 17
7 8 18
7 8 19
7 8 20
9 12 21
9 12 22
9 12 23
10 16 24
10 16 25
15 18 26
15 18 27
;
run;
The following SAS macro prints out the entire paternal ancestry of a given offspring:
%macro Tracing(OffSpring,
Q=%str(select Father from FamilyTree where Child in (&OffSpring))
);
proc sql;
%unquote(&Q)
;quit;
%if &sqlobs %then
%Tracing(&OffSpring,
Q=%str(select Father from FamilyTree where Child in ( &Q ))
)
;
%mend;
To test, try
%Tracing(21)
The following SAS macro searches and puts the earliest forefather of a given offspring into a macro variable:
%macro ForeFather(Id);
%local Tmp;
proc sql noprint;
select Father into :Tmp from FamilyTree
where Child = &Id;
quit;
%if &sqlobs %then %do;
%ForeFather(&Tmp)
%end;
%else %do;
%global FFather;
%let FFather=&Id;
%end;
%mend;
To test, try
%ForeFather(26)
Reference:
http://tech.groups.yahoo.com/group/sas081708/message/325
Recursive SAS Macro
Yes, you can use SAS Macro recursively.
For starters, let’s set out with this super classic example of factorial n! for n=1,2,3,… . Actually, this is my bonus interview project for our candidates
%macro Factorial(n);
%* Assume &n=1,2,3,4,... ;
%if &n=1 %then 1;
%else %eval(&n*%Factorial(%eval(&n-1)));
%mend;
To test, you can do something like this: Read more »
Categories
- Best Practices (3)
- Best-Practices (16)
- BioNews (3)
- Business Best Practices (5)
- Case studies (2)
- CDISC (11)
- Clinical Data Management (6)
- Clinical Stories (1)
- Code (13)
- EDC (7)
- Event (3)
- Events (7)
- Menu (3)
- Monthly Contest (12)
- New Technologies (15)
- OpenClinica (2)
- SAS Library (4)
- Scripting (2)
- Tips & Techniques (14)
- Trends (11)




Posted under: 