Programming Challenge: Infinitesimal in the SAS universe of discourse
First of all, thanks to Jonathan Lee, Jiangtang Hu, Jacques Thibault, Neha Mohan and Na Li for their feedback on the question about the largest 3-byte integer in a SAS data set on our Code Jeet Kune Do email list. Code Jeet Kune Do grew out of our internal Q&A mailing list for sharing techniques and knowledge. If you are interested in subscribing, please send me a message at jian.dai@clinovo.com and I will add your email.
Second, the term “infinitesimal” in the title is a bit misleading. Here is how the game is specified: Please use a program to find the minimal positive number that SAS can represent and send your code to me by August 1, 2011. The first one to get it right wins.
Hint: The following code is a brute-force approach to pinning down the maximal integer that SAS can represent exactly, in the sense that the loop stops at the first x for which x and x+1 are no longer distinguishable.
data _null_;do until(x=x+1);x+1;end;put x= binary64.;run;
Don’t run it: I estimate it would take about 456 days to exit the loop on my laptop (unless you have access to a super, super fast machine).
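For comparison, the same stopping condition can be reached in a few dozen iterations by doubling x instead of incrementing it. This is only a sketch, not part of the original hint: it assumes SAS stores numbers as IEEE 754 doubles (as on Windows and Unix hosts), in which case the loop should stop at 2**53. The BEST32. and HEX16. formats are my own choices for display.

```sas
data _null_;
  x = 1;
  /* Assumption: x is an IEEE 754 double, so doubling keeps x an exact power of 2. */
  do while (x + 1 ne x);   /* stop once x and x+1 are indistinguishable */
    x = x * 2;
  end;
  put x= best32.;          /* decimal value */
  put x= hex16.;           /* floating-point representation in hex */
run;
```

Doubling only probes powers of two, so it relies on integer precision breaking down exactly at a power of two, which is the case for binary floating point but would need rechecking on hosts with a different number format.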
Reference:
Jacques and I discovered an excellent paper on this subject that you can use: Numeric Length: Concepts and Consequences. This topic is along the lines of the post Play Dataset Like a CS Pro: Binary Tree, as it touches on some fundamental issues in the field of computation.
Megha becomes a three-time winner: the June Programming Challenge is now finished
The top three contenders are Kalyani Chilukuri from Clinovo, Jiangtang Hu from Sanofi Pasteur, and Megha Agarwal from Clinovo. Thank you all for your excellent work! The winner is Megha: her code came closest to beating the benchmark when tested on a version of CDISC Express. Congratulations!
Benchmark code
%let _=%sysfunc(time());
filename _ temp;
data _null_;
infile 'dir/s/b "C:\CDISC Express\*.sas"|findstr/v "sas7 sas~"' pipe end=EOF;input;
if _n_=1 then call execute('proc printto log=_ new;');
call execute('data _null_; infile "'||_infile_||'"; input;');
if EOF then call execute('proc printto log=log;');
data _null_;retain s;
x=prxparse('/(\d+) records were read from the infile/');
infile _ end=EOF;input;
if prxmatch(x,_infile_) then s+input(prxposn(x,1,_infile_),best.);
if EOF then put s=;
run;
%put %sysevalf(%sysfunc(time())-&_);
Nuance
The next two solutions:
i) A terser and faster SAS approach that deploys a more sophisticated shell command:
data _null_; infile 'for /f "tokens=* usebackq" %f in (`dir/s/b "C:\CDISC Express\*.sas"^|findstr /i /v "sas7 sas~"`) do @type "%f"' pipe;input;run;
The total number of lines can be found in the log file.
ii) If you have Cygwin or the MinGW minimal system installed on your Windows machine, you can pipe the Unix command “wc -l” into SAS to get the line count of each program.
However, both approaches run into a systematic error rooted in the fact that CDISC Express was originally developed on Solaris, a Unix platform. For example, apply the Unix command “cat -v” to the macro “attrn.sas” (the “v” switch makes this utility print nonprinting characters) and you will get something like this:
/*******************************************************************************^M
* PROGRAM NAME: attrn.sas^M
* DESCRIPTION: ^M
* - Open a dataset and get one of its attributes^M
*^M
* PROGRAMMER: Ale Gicqueau^M
*******************************************************************************/^M
^M
%macro attrn(ds,attrib);^M
^M
%local dsid rc;^M
^M
%let dsid=%sysfunc(open(&ds,is));^M
%if &dsid EQ 0 %then %do;^M
%put ERROR: (attrn) Dataset &ds not opened due to the following reason:;^M
%put %sysfunc(sysmsg());^M
%end;^M
%else %do;^M
%sysfunc(attrn(&dsid,&attrib))^M
%let rc=%sysfunc(close(&dsid));^M
%end;^M
^M
%mend;
“^M” is the carriage return, which is missing from the last line. The two approaches above do not count the last line in this and similar cases.
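One way to check the count for a single file is to let SAS read it directly, since, as far as I know, a DATA step INFILE also returns a final record that has no line terminator. A minimal sketch, assuming a hypothetical file path stored in the macro variable f (the path is not from the original post):

```sas
%let f = C:\CDISC Express\macros\attrn.sas;  /* hypothetical path, adjust to your installation */

data _null_;
  infile "&f" end=eof;
  input;                                  /* read one record into _infile_ */
  if eof then put "records read: " _n_;   /* _n_ equals the number of records read */
run;
```

The same figure appears in the log note on records read from the infile, which is what the benchmark code above sums up across files.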
PowerShell
We discussed the issue of “zero installation programming” before (see here and here). If you are using Windows 7, the following one-liner script is ready to go:
$c=0;foreach ($x in ls -r | where {$_.extension -eq ".sas"} ){
$c=$c+$(get-content $x.FullName|measure).Count
}; $c
Play Dataset Like a CS Pro: Binary Tree
The challenge I posed for myself this week is to iterate a binary tree within one single data step.
To start, we build a binary tree in a SAS data set: the nodes are coded as integers from 1 to 15. The value of a node also serves as a “pseudo-address” for reference. The left and right pointers are stored in the data set variables “L” and “R”. A null pointer is represented as a SAS missing value. Node 5 is used as the root.
data _; input pseudoAddress L R; datalines;
5 2 9
2 4 6
9 7 10
4 1 8
6 3 11
7 15 12
10 14 13
1 . .
8 . .
3 . .
11 . .
15 . .
12 . .
14 . .
13 . .
;run;
Before pursuing the hardcore single data step approach, we point out that there are (at least) two easy-going recursive solutions:
TAKE I
%macro It(R);
%local Lprt Rprt;
%put [&R];
proc sql noprint; select L,R into :Lprt,:Rprt from _ where pseudoAddress=&R;quit;
%if %eval(&Lprt>0) %then %It(&Lprt);
%if %eval(&Rprt>0) %then %It(&Rprt);
%mend;
The output is in so-called prefix notation (each node is printed before its left and right subtrees).
option nonotes;
%It(5)
option notes;
TAKE II
%macro It1(R);
%local Lprt Rprt;
%put {&R};
data _null_; i=&R; set _s point=i;
call symput('Lprt',put(L,best.));
call symput('Rprt',put(R,best.));
stop;
run;
%if %eval(&Lprt>0) %then %It1(&Lprt);
%if %eval(&Rprt>0) %then %It1(&Rprt);
%mend;
The second recursive approach relies on random access to the SAS data set (the POINT= option), so the tree first has to be sorted by pseudo-address to make the observation number coincide with the pseudo-address:
proc sort data=_ out=_s; by pseudoAddress;run;
Now the invocation:
option nonotes;
%It1(5)
option notes;
Now the cool part: essentially, we need to code by hand the part that is hidden or taken care of by the SAS Macro facility in the previous two recursive approaches. To do that, we need to think through the process of traversing a binary tree very clearly. Structure-wise, a run-time stack must be implemented manually so that two pieces of information can be pushed: the node and the state of processing of that node. Three states can be defined in the case of a binary tree: go to the left (state=0), go to the right (state=1), and go back (state=2).
data _null_;length stck $100;
c=0; i=5; state=0; stck='';
do until (c=n);
set _s nobs=n point=i;
*put _all_;
if state=0 then do;
put pseudoAddress=;c+1;
if L>.Z then do;stck=strip(put(i,best.))||',1;'||strip(stck);i=L;end;
else if R>.Z then do;stck=strip(put(i,best.))||',2;'||strip(stck);i=R;end;
else do;
i=input(scan(stck,1,','),4.);state=input(scan(stck,2,',;'),4.);
stck=substr(stck,index(stck,';')+1);
end;
end;
else if state=1 then do;
if R>.Z then do;stck=strip(put(i,best.))||',2;'||strip(stck);i=R;state=0;end;
else do;
i=input(scan(stck,1,','),4.);state=input(scan(stck,2,',;'),4.);
stck=substr(stck,index(stck,';')+1);
end;
end;
else if state=2 then do;
i=input(scan(stck,1,','),4.);state=input(scan(stck,2,',;'),4.);
stck=substr(stck,index(stck,';')+1);
end;
end;
stop;
run;
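An alternative sketch, not the approach above: if the tree fits in memory, the pointers can be loaded into temporary arrays and the stack kept as a numeric array, which avoids parsing a character stack. It assumes the data set _ built earlier, node ids running from 1 to 15 (hence the hard-coded array size) and root 5; the array and variable names are my own.

```sas
data _null_;
  array lft[15]   _temporary_;   /* left pointer, indexed by pseudo-address  */
  array rgt[15]   _temporary_;   /* right pointer, indexed by pseudo-address */
  array stack[15] _temporary_;   /* explicit stack of pseudo-addresses       */

  /* load the whole tree into memory (the order of observations does not matter) */
  do until (eof);
    set _ end=eof;
    lft[pseudoAddress] = L;
    rgt[pseudoAddress] = R;
  end;

  top = 1; stack[1] = 5;               /* push the root (assumed to be node 5) */
  do while (top > 0);
    node = stack[top]; top = top - 1;  /* pop */
    put node=;
    /* push the right child first so that the left subtree is visited first */
    if rgt[node] > .Z then do; top = top + 1; stack[top] = rgt[node]; end;
    if lft[node] > .Z then do; top = top + 1; stack[top] = lft[node]; end;
  end;
  stop;
run;
```

Because each node is pushed at most once, a stack as large as the number of nodes is always sufficient. For the tree above this should print the nodes in the same prefix order as the recursive macros, starting 5, 2, 4, 1, 8.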
Some background and reference:
William E. Benjamin, Jr. has a pioneering paper on how to build a run-time stack manually. Lately we have discussed recursion in SAS extensively. See the posts Recursive SAS Macro, Recursive SQL Query, Solve Eight Queens Puzzle by SAS, and Source line counting, as well as my PharmaSUG paper Permutation via Recursive SAS® Macro. A sister post can be found here.