
# The Winner is Jichun Wang

Dec 4, 2011 by     1 Comment     Posted under: Code, Monthly Contest, New Technologies, Tips & Techniques

The winner of our last programming challenge is Jichun Wang. Jichun is a Sun Microsystems alumnus and is now a senior software engineer at Synopsis. He used Perl to call a Google RESTful API and parse the JSON response to get the Google search result count for each hero. You can find his program on my SAS_ACADEMY group.

Because the API he used is deprecated, the counts it returns differ significantly from those you get by googling directly in a browser; however, he got the order correct!

| American Hero | Count by API | Count by Browser |
|---|---|---|
| Batman | 30,600,000 | 332 M |
| Iron Man | 18,700,000 | 266 M |
| Superman | 13,900,000 | 184 M |
| Spiderman | 13,000,000 | 122 M |
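
As a quick check of the claim that the order agrees, here is a minimal Python sketch (not Jichun's Perl program) that ranks the heroes by each column of the table above and compares the two rankings. The dictionary names are made up for illustration; the numbers are copied from the table.

```python
# Counts copied from the table above; browser counts are in millions.
api_counts = {
    "Batman": 30_600_000,
    "Iron Man": 18_700_000,
    "Superman": 13_900_000,
    "Spiderman": 13_000_000,
}
browser_counts_m = {"Batman": 332, "Iron Man": 266, "Superman": 184, "Spiderman": 122}


def ranking(counts):
    """Return hero names sorted from most to fewest hits."""
    return sorted(counts, key=counts.get, reverse=True)


api_rank = ranking(api_counts)
browser_rank = ranking(browser_counts_m)

print(api_rank)
print("same order:", api_rank == browser_rank)
```

The absolute counts differ by roughly an order of magnitude, but only the relative order matters for a ranking.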

# October Programming Challenge: Ranking American Heroes

Oct 5, 2011 by     2 Comments    Posted under: Code, Monthly Contest

At the beginning of each month, I follow the update of the TIOBE Index very closely. The TIOBE Index measures the popularity of programming languages based on search engine results. By the way, SAS ranked 24th there last month. The way the TIOBE Index is defined inspired me to come up with the following game for this month:

Use any programming language you feel comfortable with to rank the popularity of the following American heroes (each picture contains a link pointing to its Wikipedia entry):

Send me your source code by 31/10 to win a \$25 gift card.

As for our previous challenge, Motto of Hogwarts School, the winner is Jiangtang Hu. This is the second time Jiangtang has won our Programming Challenge. Congratulations! His method is to tap into an online Latin-English dictionary. Other contenders approached the problem by using Google Translate or by digging into search results. As I said last time, Google Translate does not give a correct answer. For example, the following Perl one-liner
```
use LWP::UserAgent;$ua=LWP::UserAgent->new;$ua->agent('Mozilla/6.0'); print $ua->request(HTTP::Request->new('GET','http://translate.google.com/translate_a/t?client=xxxxxxx&text=draco%20dormiens%20nunquam%20titilandus&sl=la&tl=en'))->content;
```
returns “the dragon never sleeping titilandus” in JSON format. To me, the most Q&D solution is the following Perl one-liner:
```
use LWP::UserAgent;$ua=LWP::UserAgent->new;$ua->agent('Mozilla/6.0'); foreach(split /\n/,($ua->request(HTTP::Request->new('GET','http://www.google.com/search?&q=draco+dormiens+nunquam+titillandus')))->content){ if(/mean/i){s/<[^>]*>//g,s/^.*means\s+&quot\;([^&]*)&quot\;.*$/$1/;print;}}
```

# Dive into CDISC Express (5) : Generate and validate SDTM domains and define.xml

Here is the fifth part of Dive into CDISC Express.

The following tasks, such as generating SDTM domains and define.xml, need only a few button clicks in CDISC Express once you have a well-designed mapping file, so few words are needed here.

Step 3 of 6: Validate mapping file (Validate_Mapping_File.sas)

Designing a mapping file is an iterative process: design, validate, modify and re-validate. Eventually you will get all the work done, at least with no syntax errors (avoiding semantic errors depends on your domain knowledge). The validated mapping file, named mapping.xls, is copied to …\ doc\Mapping file – validated version\ from the working file, tmpmaping.xls. You will see:

The corresponding log file in folder …\ log\

A report in …\results\Mapping Validation\, named Mapping_validation.html

Also the temporary datasets in …\tempdata\ and …\temp\:

Step 4 of 6: Generate SDTM datasets (generate_SDTM.sas)

If the mapping file is OK, generating SDTM domains is just a click of a button. After submitting the code, you will see the log file, reports, SDTM datasets and temporary datasets in the corresponding folders:

Step 5 of 6: Validate SDTM datasets (Validate_SDTM_Domains.sas)

The output files from validating the SDTM datasets are all located in C:\Program Files\CDISC Express\SDTM Validation\:

Step 6 of 6: Generate Define.xml and xpt (generate_Definexml.sas)

Get the final define.xml file and SAS transport files (.xpt):

For a quick start and a deeper understanding, you can read the official documentation in the following sequence:

C:\Program Files\CDISC Express\documentation\FAQ.htm

C:\Program Files\CDISC Express\documentation\Quick Start.htm

C:\Program Files\CDISC Express\documentation\User guide.htm

A video tutorial is also helpful:

C:\Program Files\CDISC Express\documentation\videotutorial.htm

A must-read conference paper, An Excel Framework to Convert Clinical Data to CDISC SDTM Leveraging SAS Technology by Sophie McCallum and Stephen Chan of Clinovo, gives a wonderful discussion of the architecture of CDISC Express.

# Dive into CDISC Express (4) : Data manipulation techniques

Here is the fourth part of Dive into CDISC Express.

3. Data manipulation techniques in CDISC Express

CDISC Express supplies a relatively rich set of data manipulation techniques for data mapping, closely resembling the SAS language. The following list is not exhaustive, and I will keep it updated.

3.1 Reference one dataset

A raw dataset name appearing in the “Dataset” column indicates a “set” operation in SAS.

All dataset options can be used when referencing a dataset, such as

siteinv(drop=invcode)

siteinv(rename=(invcode=inv))

siteinv(where=(invcode ne ""))

You can also reference an external dataset. Put the external file in a spreadsheet whose name begins with an underscore, “_”; it is “_visits” in this case:

Then you can use it in any domain that needs it, e.g., the TV domain:

The macro %cpd_importlist is used to import the external dataset, “_visits”. Again, this macro lives in C:\Program Files\CDISC Express\macros\function_library\.

Using a macro call to reshape or modify an input dataset offers great flexibility in referencing data. We will discuss the benefits later on.

3.2 Assignment

You can assign a number, a string, or a dataset variable, combined with any valid SAS functions, to an SDTM domain variable in the “Expression” column.

Sometimes a temporary variable is needed for a later calculation. You can create such a temporary variable in the “Dataset” column with an assignment in the “Expression” column, just as for any other domain variable. There are two differences: first, the name of a temporary variable begins with an asterisk, “*”; second, temporary variables are not included in the final domain. Once created, a temporary variable can be used in any other expression.

There are three special symbols used in the “Dataset” column of CDISC Express. The asterisk, “*”, indicates a temporary variable, while the other two are

Tilde, “~”: indicates a variable used for the supplemental domain (SUPPQUAL).

Number sign, “#”: indicates a variable used for the comments domain (CO).

Another symbol, the at sign, “@”, is used in the “Expression” column to reference a variable produced earlier:

In this case, “AGEU” uses “AGE” as input, while “AGE” is calculated earlier. “@AGE” just indicates the dependency. Conceptually, it looks like the “calculated” keyword in SAS PROC SQL:

proc sql ;

select (AvgHigh - 32) * 5/9 as HighC ,

(AvgLow - 32) * 5/9 as LowC ,

(calculated HighC - calculated LowC)

as Range

from temps;

quit;
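
The same dependency idea can be sketched outside SAS. The Python snippet below is only an illustration, not how CDISC Express evaluates “@”: the sample rows in `temps` and the helper `to_celsius` are hypothetical, and `Range` simply reuses the two values derived just before it, which is what “calculated” (and “@”) expresses.

```python
# Hypothetical sample rows standing in for the "temps" table.
temps = [
    {"City": "A", "AvgHigh": 86.0, "AvgLow": 68.0},
    {"City": "B", "AvgHigh": 50.0, "AvgLow": 32.0},
]


def to_celsius(fahrenheit):
    """Convert a Fahrenheit temperature to Celsius."""
    return (fahrenheit - 32) * 5 / 9


result = []
for row in temps:
    high_c = to_celsius(row["AvgHigh"])
    low_c = to_celsius(row["AvgLow"])
    # Range depends on the two derived values, like "calculated HighC".
    result.append({"HighC": high_c, "LowC": low_c, "Range": high_c - low_c})

print(result)
```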

3.3 Match-merging

We already saw a match-merging example before. If “all” appears as a dataset in the “Dataset” column, all the previous datasets are first merged, by the common key specified in the “Merge Key” column, for later processing. If no key is assigned, the patient ID is used by the system.

CDISC Express also supports two types of join, inner join and outer join (left, right, full), using data steps. The implementation differs slightly from standard SQL, but the ideas are the same.

We add a new column, “Join”, usually beside the “Merge Key” column.

There are two values for “Join”, “O” and “I”: “O” stands for “outer join” and “I” for “inner join”. A join indicator of “I” means the dataset’s “in=” flag is used to subset the merged rows, while “O” means it is not. Using the above as an illustration, the corresponding SAS code behind the scenes looks like

data temp;

merge demog(in=a) siteinv(in=b);

by sitecode;

if b;

run;

This is the so-called “right outer join”. The combinations of “I” and “O” on these two datasets can perform all four types of join, one inner join and three outer joins:

As we can see, if no “Join” column is specified, CDISC Express performs an inner join by default.

So far CDISC Express does not support multiple merge keys. For example, the following mapping is currently illegal:
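
To make the four combinations concrete, here is a minimal Python sketch of the idea. It is my reading of the “I”/“O” indicators, not CDISC Express code: the function `merge` and the sample rows are hypothetical, and it assumes at most one row per key in each dataset.

```python
def merge(left, right, key, left_join="I", right_join="I"):
    """Match-merge two lists of dicts on `key` (one row per key assumed).

    "I" means a key must be present in that dataset (like `if a;`),
    "O" means it need not be.
    """
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}
    merged = []
    for k in sorted(set(left_by_key) | set(right_by_key)):
        if left_join == "I" and k not in left_by_key:
            continue
        if right_join == "I" and k not in right_by_key:
            continue
        merged.append({**left_by_key.get(k, {}), **right_by_key.get(k, {})})
    return merged


# Hypothetical sample data.
demog = [{"sitecode": 1, "patid": "P1"}, {"sitecode": 2, "patid": "P2"}]
siteinv = [{"sitecode": 2, "invcode": "X"}, {"sitecode": 3, "invcode": "Y"}]

inner = merge(demog, siteinv, "sitecode", "I", "I")
right_outer = merge(demog, siteinv, "sitecode", "O", "I")  # like `if b;`
left_outer = merge(demog, siteinv, "sitecode", "I", "O")
full_outer = merge(demog, siteinv, "sitecode", "O", "O")

for name, rows in [("inner", inner), ("right", right_outer),
                   ("left", left_outer), ("full", full_outer)]:
    print(name, [r["sitecode"] for r in rows])
```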

| Dataset | Merge Key |
|---|---|
| arm | siteid, grpno |
| armdescri | siteid, grpno |

The developer, Romain, indicated that this enhancement would be considered for the next round of the product road map, and he also proposed a workaround: to merge on multiple keys, create a temporary variable that holds the concatenation of the keys, then use this temporary variable as a single merge key.
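
A minimal Python sketch of this workaround follows. The helper `composite_key`, the `|` separator and the sample rows are assumptions made for illustration; the point is that a separator keeps key pairs such as (1, 11) and (11, 1) from collapsing into the same concatenated value.

```python
def composite_key(row, keys, sep="|"):
    """Build a single merge key by concatenating several key values."""
    return sep.join(str(row[k]) for k in keys)


# Hypothetical sample data for the two datasets in the mapping above.
arm = [
    {"siteid": 1, "grpno": 11, "arm": "A"},
    {"siteid": 11, "grpno": 1, "arm": "B"},
]
armdescri = [
    {"siteid": 1, "grpno": 11, "descr": "Placebo"},
    {"siteid": 11, "grpno": 1, "descr": "Active"},
]

keys = ["siteid", "grpno"]
descr_by_key = {composite_key(r, keys): r["descr"] for r in armdescri}
merged = [{**r, "descr": descr_by_key.get(composite_key(r, keys))} for r in arm]

print(merged)
# Without a separator the two different key pairs would collide:
print(composite_key(arm[0], keys, sep="") == composite_key(arm[1], keys, sep=""))
```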

3.4 Concatenating

Above we discussed the “merge” operation in CDISC Express at length. This section is dedicated to the “set” operation. We already know how to “set” one dataset for referencing, but how do we “set” multiple datasets, i.e., concatenate them?

Symmetrically, just as “all” in the “Dataset” column indicates a merge operation, “all (stack)” indicates a concatenation operation:

The above file can also be translated to SAS code for better understanding:

data height;

set vtsigns(where=(height ne .));

VSTESTCD="HEIGHT";

VSTEST ="Height";

VSORRES =put(height,best12.);

VSORRESU="cm";

VSSTRESC=put(height,best12.);

VSSTRESN=height;

VSSTRESU="cm";

run;

data weight;

set vtsigns(where=(weight ne .));

VSTESTCD="WEIGHT";

VSTEST ="Weight";

VSORRES =put(weight,best12.);

VSORRESU="kg";

VSSTRESC=put(weight,best12.);

VSSTRESN=weight;

VSSTRESU="kg";

run;

data vs;

set height weight;

STUDYID =study;

DOMAIN =&domain;

USUBJID =%CONCATENATE(_variables=study sitecode patid);

VSSEQ =%SEQUENCE();

. . .

run;

3.5 Transpose

Clinical SAS programmers do a lot of transposing to reshape raw data to fit the CDISC standards. Currently there is no explicit guide in CDISC Express on how to transpose, but this is not the end of the story.

There are two types of transpose:

Type I: from a wide dataset (more variables, fewer observations) to a long dataset (fewer variables, more observations), e.g. transposing a one-row-per-subject dataset to a multiple-row-per-subject dataset

Type II: from a long dataset (fewer variables, more observations) to a wide dataset (more variables, fewer observations), e.g. transposing a multiple-row-per-subject dataset to a one-row-per-subject dataset

As good practice, in SAS we use data steps with the “output” statement to perform a type I transpose and PROC TRANSPOSE for type II. Although CDISC Express doesn’t support transposing explicitly, you can at least perform a type I transpose and, surprisingly, we already saw it before!

Just go back to the section on concatenating. The example is taken from C:\Program Files\CDISC Express\studies\example2\.

We can see that the input dataset vtsigns is a typical wide table (more variables, fewer observations):

And the final domain VS is a typical long table (fewer variables, more observations):

So obviously, this concatenating operation just did a wonderful type I transpose, from a wide table to a long table! More often, the compact SAS code for a type I transpose looks like:

data vs;

set vtsigns;

if height ne . then do;

VSTESTCD="HEIGHT";

VSTEST ="Height";

VSORRES =put(height,best12.);

VSORRESU="cm";

VSSTRESC=put(height,best12.);

VSSTRESN=height;

VSSTRESU="cm";

output;

end;

if weight ne . then do;

VSTESTCD="WEIGHT";

VSTEST ="Weight";

VSORRES =put(weight,best12.);

VSORRESU="kg";

VSSTRESC=put(weight,best12.);

VSSTRESN=weight;

VSSTRESU="kg";

output;

end;

. . .

run;
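
For readers who want to try the wide-to-long idea without a SAS session, here is a minimal Python sketch of the same type I transpose. The sample rows in `vtsigns` and the `TESTS` lookup are hypothetical; the loop plays the role of the repeated “if ... then do; ... output; end;” blocks.

```python
# Hypothetical wide input: one row per subject, one column per measurement.
vtsigns = [
    {"patid": "P1", "height": 170.0, "weight": 65.0},
    {"patid": "P2", "height": None, "weight": 80.0},  # None stands for SAS missing
]

# (source variable, VSTESTCD, VSTEST, unit)
TESTS = [
    ("height", "HEIGHT", "Height", "cm"),
    ("weight", "WEIGHT", "Weight", "kg"),
]

vs = []
for row in vtsigns:
    for var, testcd, test, unit in TESTS:
        value = row[var]
        if value is None:
            continue  # same as "if height ne . then do"
        vs.append({
            "patid": row["patid"],
            "VSTESTCD": testcd,
            "VSTEST": test,
            "VSORRES": str(value),
            "VSORRESU": unit,
            "VSSTRESN": value,
            "VSSTRESU": unit,
        })

for record in vs:
    print(record)
```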

3.6 All others: use macro!

We have now discussed almost all the common data derivation techniques in a programmer’s daily life and their implementation in CDISC Express. At least one question remains unsolved: how to perform a type II transpose, i.e., from a long table to a wide table?

It remains an open question for the developers of the application. But we can also solve this problem within the current framework: use a macro, a customized macro. You can use macros in the “Expression” and “Dataset” columns. A macro used in the “Dataset” column returns a dataset, while a macro in the “Expression” column returns a string: that’s the basic structure to keep in mind when you customize your own macros. For more, see the macros in C:\Program Files\CDISC Express\macros\function_library\. For example, %concatenate is used in the “Expression” column and %cpd_importlist in the “Dataset” column.

So it is convenient to create temporary datasets in the “Dataset” column using macros that embed a type II transpose. Everything SAS can do, you can also implement in CDISC Express. Just use macros in the “Expression” and “Dataset” columns accordingly.

Raw data varies with the trial design and with the clinical data capture system and procedures. It is impractical to expect a CDISC SDTM converter such as CDISC Express to map all the data at the click of a button. The introduction of CDISC Express doesn’t make programmers unnecessary. It just takes most of the trivial work out of programmers’ daily life and lets them concentrate on creative work and be more productive and efficient.

The next part will close this series.
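
As an illustration of what such a macro would have to do, here is a minimal Python sketch of a type II (long-to-wide) transpose. It is not a CDISC Express macro; the sample rows and variable names are hypothetical.

```python
# Hypothetical long input: several rows per subject, one per test.
long_rows = [
    {"patid": "P1", "VSTESTCD": "HEIGHT", "VSSTRESN": 170.0},
    {"patid": "P1", "VSTESTCD": "WEIGHT", "VSSTRESN": 65.0},
    {"patid": "P2", "VSTESTCD": "WEIGHT", "VSSTRESN": 80.0},
]

wide = {}
for row in long_rows:
    # One output row per subject; each test code becomes its own column.
    subject = wide.setdefault(row["patid"], {"patid": row["patid"]})
    subject[row["VSTESTCD"].lower()] = row["VSSTRESN"]

wide_rows = list(wide.values())
print(wide_rows)
```

A subject with no record for a given test simply has no such column here; in SAS the corresponding variable would be missing.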

# Programming Challenge (6): Motto of Hogwarts School

Aug 25, 2011 by     3 Comments    Posted under: Code, Monthly Contest

I am no big Harry Potter fan, so I can’t answer questions like when Ron fell in love with Hermione. But that doesn’t stop me from playing this game:

Find the English translation of the motto of Hogwarts School of Witchcraft and Wizardry, “draco dormiens nunquam titillandus”, by using a program written in whatever programming language you feel comfortable with.

A few approaches may come to mind immediately:

1. Q(uick) approach: Google translate API
2. Q(uick)&D(irty) approach: web text/knowledge mining
3. Diehard approach: write something equivalent to Google translate
4. et cetera

For the first approach, unfortunately, this is actually a failure case for Google Translate (titillandus is the gerundive of titillo, titillare). For the second, I like Q&D, but it may not be scalable, i.e., you can’t apply the same method to translate, say, the book of Genesis in the Vulgate into English. For the third one, well, I don’t think so.

Send the source code to me by 30/9 to win a \$25 gift card from Fry’s. Well, it’s actually negotiable if you prefer Macy’s.

BTW, I had considered another topic:

Consider the subset of SAS programs that do not use macros. Write a program, in any language you feel comfortable with, that strips the comments from them.

Some contenders in our June challenge, like Kalyani, actually touched on this topic. Then I chatted with Jiangtang, and we both agreed it may be less interesting to the SAS community. So I ended up leveraging the Deathly Hallows. But if you want to try this code scanner topic, you are more than welcome!

A summary of our previous challenges and the winners:

March challenge: Web crawling (L0) (winner: Megha)
April challenge: Web crawling (L1) (winner: Megha)
May challenge: Eight Queens Puzzle (winner: Kalyani)
June challenge: Source line counting (winner: Megha)
July challenge: Infinitesimal in sas (winner: Jiangtang)

# The Winner Of July Programming Challenge Is Jiangtang Hu

Aug 15, 2011 by     No Comments    Posted under: Code, Monthly Contest, New Technologies, Tips & Techniques

Our last programming challenge was Infinitesimal in the SAS universe of discourse. The following code, and code like it, provides a predicate to approach that “infinitesimal”:

``` data _null_; x=1; do until (0=x/2);x=x/2;c+1;end; put c= x= binary64.; run; ```

The put statement writes the following output to the log:
``` c=1074 x=0000000000000000000000000000000000000000000000000000000000000001 ```

In other words, the SAS infinitesimal is $2^{-1074}$.

Some programmers provided a solution that uses the SAS constant “SMALL”. Surprisingly, on closer inspection, it is not the smallest number in the SAS universe:
``` data _null_;x=constant('SMALL'); y=x/2; z=x>y; put x= binary64. y= binary64. z=;run; ```
Log:
``` x=0000000000010000000000000000000000000000000000000000000000000000 y=0000000000001000000000000000000000000000000000000000000000000000 z=1 ```

The rationale behind the definition of “SMALL”, as I understand it, is that “SMALL” is the smallest number to which the implicit-bit encoding of floating point numbers (see IEEE 754) is applicable.

More code to play:
``` data _null_; x=log2(constant('SMALL')); put x=;run; ```
Log:
``` x=-1022 ```
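
The same two numbers can be reproduced outside SAS. The following Python sketch assumes that Python floats are IEEE 754 doubles (true on common platforms); it repeats the halving loop and compares the result with the smallest normalized double, which plays the role of “SMALL” here.

```python
import math
import sys

# Halve x until the next halving would underflow to zero.
x, c = 1.0, 0
while x / 2 != 0:
    x /= 2
    c += 1

print(c, x)                            # number of halvings and the smallest positive double
print(sys.float_info.min)              # smallest *normalized* positive double
print(math.log2(sys.float_info.min))   # its base-2 exponent
print(x < sys.float_info.min)          # denormals lie below the smallest normal number
```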

# Dive into CDISC Express (2) : Create a new study

Jul 21, 2011 by     No Comments    Posted under: Best-Practices, Case studies, CDISC, Code, Tips & Techniques

Here is the second part of ‘Dive into CDISC Express’, written by Jiangtang Hu. You can read the first part here.

Step 1 of 6: Create a new study (create_new_study.sas)

Open create_new_study.sas in C:\Program Files\CDISC Express\programs\ and you will see only one line, a macro call:

Just assign a study name to the macro variable &studyname, e.g., “CLINCAP”:

Submit the code, and you will find a folder named “CLINCAP”, with the same structure as the two demo studies embedded in this application (example1 and example2), in C:\Program Files\CDISC Express\studies\ (the left and right panels show the folders and files before and after the execution of create_new_study.sas; the same convention applies below):

Folder ‘doc’ is used to hold the mapping files;

Folder ‘log’ is used to hold log files generated by subsequent macro calls, such as generating SDTM domains;

Folder ‘results’ and its subfolders hold all the outputs, such as define.xml, SAS transport files, validation reports and SDTM datasets;

Folder ‘source’ holds all the clinical raw data used as inputs for SDTM domains;

Folder ‘tempdata’ holds all the temporary datasets generated by subsequent macro calls.

Also, a configuration file named CLINCAP_configuration.sas is put in C:\Program Files\CDISC Express\programs\study configuration\. This file is used to set some study-level parameters, such as lab and toxicity specifications (details in C:\Program Files\CDISC Express\specs\Lab specs\).

Two versions of SDTM implementation guides are supported by CDISC Express, CDISC SDTM Implementation Guide Version 3.1.1 and Version 3.1.2. You can find the corresponding specification files in C:\Program Files\CDISC Express\specs\SDTM specs\:

SDTM_Specs_3_1_1.xls

SDTM_Specs_3_1_2.xls

The choice of SDTM implementation version is also coded in the configuration file, in Line 41:

%LET SDTMSPECFILE=SDTM_Specs_3_1_1.xls;

Version 3.1.1 is used by default. You can also choose Version 3.1.2 if needed:

%LET SDTMSPECFILE=SDTM_Specs_3_1_2.xls;

Assign a study name and choose an SDTM implementation version. That’s all that is needed in step 1. Let’s take a few minutes to navigate the software. CDISC Express is a set of macros and Excel files. It is important to know the file structure.

C:\Program Files\CDISC Express\

├─documentation : FAQ, Quick Start, User Guide

├─macros

│  ├─ClinMap : system level macros

│  └─function_library : study level macros

├─programs : “action taken” macros

│  ├─study configuration : study parameters configuration, e.g, choose SDTM version

├─SDTM Validation : For validation of SDTM domains

│  └─study1

├─specs : specification files

│  ├─Excel engine : ExcelXP tagset file

│  ├─Lab specs : lab and toxicity

│  ├─Mapping validation : validation rules

│  ├─SDTM specs : hold two versions of SDTM implementation

│  └─SDTM Terminology : SDTM codelist(including NCI terminology)

├─studies

│  ├─example1

└─temp : hold temporary data not specified to any studies

As we have already seen, all the “action taken” programs such as create_new_study.sas are located in C:\Program Files\CDISC Express\programs\. In create_new_study.sas, one macro is called, %addnewstudy, which is in C:\Program Files\CDISC Express\macros\ClinMap\.

Note that in C:\Program Files\CDISC Express\macros\, there are two sets of macros in different folders:

C:\Program Files\CDISC Express\macros\ClinMap\: this folder holds all “system” level macros used by the application only. Modification is not encouraged.

C:\Program Files\CDISC Express\macros\function_library\: macros used for mapping across studies. You can also create your own macros in this folder. The macros embedded in the application are also documented in the user guide.

Next part on the mapping file next week!

# Programming Challenge: Infinitesimal in the SAS universe of discourse

Jul 8, 2011 by     1 Comment     Posted under: Code, Monthly Contest, New Technologies, Tips & Techniques

First of all, thanks to Jonathan Lee, Jiangtang Hu, Jacques Thibault, Neha Mohan and Na Li for their feedback on the question about the largest 3-byte integer in a SAS dataset on our Code Jeet Kune Do email list. Code Jeet Kune Do grew out of our internal Q&A mail list for technique and knowledge sharing. If you are interested in subscribing, please shoot me a message at jian.dai@clinovo.com to add your email there.

Second, the term “infinitesimal” in the title is a bit misleading. Here is how the game is specified: Please use a program to find the minimal positive number that SAS can represent and send your code to me by August 1, 2011. The one who gets it right first wins.

Hint: The following code is a brute-force approach to pin down the maximal integer that SAS can represent.
``` data _null_;do until(x=x+1);x+1;end;put x= binary64.;run; ```
Don’t run it: I estimate it would take about 456 days to exit the loop on my laptop (unless you have access to a super, super fast machine).
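
The exit point of that loop can be located without waiting. The Python sketch below is a shortcut, not a translation of the SAS code: it assumes IEEE 754 doubles and relies on the fact that the first value for which adding 1 has no effect is a power of two, so it doubles instead of incrementing.

```python
# Double x until adding 1 no longer changes it.
x = 1.0
while x + 1 != x:
    x *= 2

print(x)              # first value where x + 1 == x
print(x == 2.0 ** 53)
print(x - 1 != x)     # every integer below it is still exactly representable
```

Counting up to this value one step at a time is what makes the brute-force loop so slow.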

Reference:
Jacques and I discovered an excellent paper on this subject that you can use: Numeric Length: Concepts and Consequences. This topic is along the lines of the post Play Dataset Like a CS Pro: Binary Tree, as it touches on some fundamental issues in the field of computation.

# Megha Becomes a Three-Time Winner: June Programming Challenge Is Now Finished

Jul 8, 2011 by     2 Comments    Posted under: Code, Monthly Contest, New Technologies, Scripting, Tips & Techniques

The top three contenders are Kalyani Chilukuri from Clinovo, Jiangtang Hu from Sanofi Pasteur, and Megha Agarwal from Clinovo. Thank you all for your excellent work! The winner is Megha: her code came closest to beating the benchmark when tested on a version of CDISC Express. Congratulations!

Benchmark code
```
%let _=%sysfunc(time());
filename _ temp;
data _null_;
  infile 'dir/s/b "C:\CDISC Express\*.sas"|findstr/v "sas7 sas~"' pipe end=EOF;
  input;
  if _n_=1 then call execute('proc printto log=_ new;');
  call execute('data _null_; infile "'||_infile_||'"; input;');
  if EOF then call execute('proc printto log=log;');
data _null_;
  retain s;
  x=prxparse('/(\d+) records were read from the infile/');
  infile _ end=EOF;
  input;
  if prxmatch(x,_infile_) then s+input(prxposn(x,1,_infile_),best.);
  if EOF then put s=;
run;
%put %sysevalf(%sysfunc(time())-&_);
```

Nuance
The next two solutions:
i) A terser and faster SAS approach that deploys a more sophisticated shell command:
``` data _null_; infile 'for /f "tokens=* usebackq" %f in (`dir/s/b "C:\CDISC Express\*.sas"^|findstr /i /v "sas7 sas~"`) do @type "%f"' pipe;input;run; ```
The total number of lines can be found in the log file.
ii) If you have Cygwin or a MinGW minimal system installed on your Windows machine, then you can pipe the Unix command “wc -l” into SAS to get the line count of each program.

However, both approaches suffer from a systematic error rooted in the fact that CDISC Express was originally developed on Solaris, a Unix platform. For example, apply the Unix command “cat -v” to the macro “attrn.sas” (the “v” switch makes this utility display nonprinting characters) and you will get something like this:
```
/*******************************************************************************^M
 * PROGRAM NAME: attrn.sas^M
 * DESCRIPTION: ^M
 * - Open a dataset and get one of its attributes^M
 *^M
 * PROGRAMMER: Ale Gicqueau^M
 *******************************************************************************/^M
^M
%macro attrn(ds,attrib);^M
^M
%local dsid rc;^M
^M
%let dsid=%sysfunc(open(&ds,is));^M
%if &dsid EQ 0 %then %do;^M
%put ERROR: (attrn) Dataset &ds not opened due to the following reason:;^M
%put %sysfunc(sysmsg());^M
%end;^M
%else %do;^M
%sysfunc(attrn(&dsid,&attrib))^M
%let rc=%sysfunc(close(&dsid));^M
%end;^M
^M
%mend;
```
“^M” is the carriage return, which is missing from the last line. The two approaches mentioned above do not count the last line in this and similar cases.
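
The off-by-one can be shown in a few lines. The Python sketch below is an illustration only: the two helper functions and the three-line `sample` are hypothetical, with `count_newlines` mimicking what “wc -l” reports and `count_lines` also counting a final line that has no terminator.

```python
def count_newlines(data: bytes) -> int:
    """What `wc -l` reports: the number of line-feed characters."""
    return data.count(b"\n")


def count_lines(data: bytes) -> int:
    """Count lines, including a last line that lacks a terminator."""
    return len(data.splitlines())


# Three lines with CRLF endings; the last line has no terminator at all.
sample = b"%macro attrn(ds,attrib);\r\n%local dsid rc;\r\n%mend;"

print(count_newlines(sample))  # misses the unterminated last line
print(count_lines(sample))
```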

PowerShell
We discussed the issue of “zero installation programming” before (see here and here). If you are using Windows 7, the following one-liner script is ready to go:
```
$c=0;foreach ($x in ls -r | where {$_.extension -eq ".sas"} ){ $c=$c+$(get-content $x.FullName|measure).Count }; $c
```

# Play Dataset Like a CS Pro: Binary Tree

Jun 28, 2011 by     2 Comments    Posted under: Code, New Technologies

The challenge I posed for myself this week is to traverse a binary tree within one single data step.

To start, we build a binary tree in a SAS data set: The nodes are coded as integers from 1 to 15. The value of a node also serves as a “pseudo-address” for reference. Left and right pointers are stored in the data set variables “L” and “R”. A null pointer is represented as a SAS missing value. Node 5 is used as the root.
```
data _;
input pseudoAddress L R;
datalines;
5 2 9
2 4 6
9 7 10
4 1 8
6 3 11
7 15 12
10 14 13
1 . .
8 . .
3 . .
11 . .
15 . .
12 . .
14 . .
13 . .
;
run;
```

Before pursuing the hardcore single data step approach, we point out that there are (at least two) easy-going recursive solutions:
TAKE I
```
%macro It(R);
  %local Lprt Rprt;
  %put [&R];
  proc sql noprint;
    select L,R into :Lprt,:Rprt from _ where pseudoAddress=&R;
  quit;
  %if %eval(&Lprt>0) %then %It(&Lprt);
  %if %eval(&Rprt>0) %then %It(&Rprt);
%mend;
```
The output is in so-called prefix (pre-order) notation.
``` option nonotes; %It(5) option notes; ```

TAKE II

```
%macro It1(R);
  %local Lprt Rprt;
  %put {&R};
  data _null_;
    i=&R;
    set _s point=i;
    call symput('Lprt',put(L,best.));
    call symput('Rprt',put(R,best.));
    stop;
  run;
  %if %eval(&Lprt>0) %then %It1(&Lprt);
  %if %eval(&Rprt>0) %then %It1(&Rprt);
%mend;
```

To use the second recursive approach, which relies on random access to the SAS data set, the tree has to be sorted by pseudo-address:
`proc sort data=_ out=_s; by pseudoAddress;run;`

Now the invocation:
```option nonotes; %It1(5) option notes; ```

Now the cool part: Essentially, we need to manually code the part that is hidden, or taken care of, by the SAS Macro facility in the previous two recursive approaches. To do that, we need to think through the process of traversing a binary tree very clearly. Structure-wise, a run-time stack must be implemented manually so that two pieces of information can be pushed: the node and the state of processing of that node. Three states can be defined in the case of a binary tree: to go to the left (`state=0`), to go to the right (`state=1`), and to go back (`state=2`).

```
data _null_;
  length stck $100;
  c=0; i=5; state=0; stck='';
  do until (c=n);
    set _s nobs=n point=i;
    *put _all_;
    if state=0 then do;
      put pseudoAddress=; c+1;
      if L>.Z then do; stck=strip(put(i,best.))||',1;'||strip(stck); i=L; end;
      else if R>.Z then do; stck=strip(put(i,best.))||',2;'||strip(stck); i=R; end;
      else do;
        i=input(scan(stck,1,','),4.); state=input(scan(stck,2,',;'),4.);
        stck=substr(stck,index(stck,';')+1);
      end;
    end;
    else if state=1 then do;
      if R>.Z then do; stck=strip(put(i,best.))||',2;'||strip(stck); i=R; state=0; end;
      else do;
        i=input(scan(stck,1,','),4.); state=input(scan(stck,2,',;'),4.);
        stck=substr(stck,index(stck,';')+1);
      end;
    end;
    else if state=2 then do;
      i=input(scan(stck,1,','),4.); state=input(scan(stck,2,',;'),4.);
      stck=substr(stck,index(stck,';')+1);
    end;
  end;
  stop;
run;
```
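
For readers without a SAS session, the same idea can be sketched in Python. This is a simplified illustration under my own naming, not a line-by-line port of the data step: the tree is the one defined above, each stack entry holds a node together with its state, and a recursive version is included only to check that both give the same pre-order.

```python
# The tree from the data set above: node -> (left, right); None is a null pointer.
tree = {
    5: (2, 9), 2: (4, 6), 9: (7, 10), 4: (1, 8), 6: (3, 11),
    7: (15, 12), 10: (14, 13),
    1: (None, None), 8: (None, None), 3: (None, None), 11: (None, None),
    15: (None, None), 12: (None, None), 14: (None, None), 13: (None, None),
}

GO_LEFT, GO_RIGHT, GO_BACK = 0, 1, 2


def preorder_recursive(node):
    """Reference implementation that leans on the language's call stack."""
    if node is None:
        return []
    left, right = tree[node]
    return [node] + preorder_recursive(left) + preorder_recursive(right)


def preorder_with_stack(root):
    """Pre-order traversal with a hand-made stack of (node, state) pairs."""
    visited = []
    stack = [(root, GO_LEFT)]
    while stack:
        node, state = stack.pop()
        left, right = tree[node]
        if state == GO_LEFT:
            visited.append(node)              # visit on first arrival
            stack.append((node, GO_RIGHT))    # come back later for the right side
            if left is not None:
                stack.append((left, GO_LEFT))
        elif state == GO_RIGHT:
            stack.append((node, GO_BACK))
            if right is not None:
                stack.append((right, GO_LEFT))
        # GO_BACK: both sides are done, so the entry is simply dropped.
    return visited


print(preorder_with_stack(5))
```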

Some background and reference:
William E. Benjamin, Jr. has a pioneering paper on how to build a run-time stack manually. Lately we have discussed recursion in SAS extensively. See the posts Recursive SAS Macro, Recursive SQL Query, Solve Eight Queens Puzzle by SAS, and Source line counting, as well as my PharmaSUG paper Permutation via Recursive SAS® Macro. A sister post can be found here.
