Clinical information is growing more complex and comes from multiple sources in different formats. As a result, clinical data submission has become time-consuming, costly and error-prone. CDISC® (Clinical Data Interchange Standards Consortium) established new data standards to speed up data review and improve clinical data exchange, storage and archival. Our technology edge, combined with our experience in standards implementation, allows us to develop tailored CDISC solutions to accelerate your FDA review. Clinovo is introducing a new opportunity to learn these recognized clinical data standards!
Clinovo’s new “CDISC Standards: Theory and Application” class is an 8-week training program starting on June 11th, 2013. The TechTrainings are hands-on technical classes for entry-level or experienced clinical trial professionals, designed to help them reach the next step in their professional career. The class will be held in Palo Alto at Dentons Offices or remotely.
Taught by Sy Truong, President at Meta-Xceed and author of award-winning papers, this new course will give an overview of CDISC standards: ODM, SDTM, ADaM and Define.XML. Students will learn how to transform legacy data into these clinical standards through real-life examples. Case studies will include data exchange, archival, and electronic submission to regulatory agencies such as the FDA.
Clinovo will continue to offer the “Base Clinical SAS Programming” class to help entry-level programmers prepare for the Base SAS certification, as well as the “Advanced Clinical SAS Programming” class to tackle advanced real-world SAS programming challenges. Clinovo offers $50 gift cards for referrals.
More information on the class can be found on clinovo.com/techtrainings.
Olivier Roth launched the TechTrainings by Clinovo in 2012, a series of hands-on courses for clinical trial professionals, leveraging his company’s years of field experience and industry expertise. He is the Marketing & Communication Coordinator at Clinovo, a CRO based in Sunnyvale, focused on streamlining clinical trials for life science companies through technology solutions. Olivier helps manage Clinovo’s marketing and communication, from marketing strategy to partnership management, lead generation, event planning and new business opportunities. Prior to Clinovo, Olivier worked as a Strategic Marketing Consultant at VivaSante, an international consumer healthcare company based in Paris.
CDISC® (Clinical Data Interchange Standards Consortium) is establishing data standards to speed up data review and improve clinical data exchange, storage and archival. Today, 60% of FDA submissions are already made in CDISC standards. The FDA is getting more and more involved in CDISC standards, a meaningful signal for the industry. Theresa Mullin, Director of the Office of Planning and Informatics within CDER, stated that “the FDA is committed to using CDISC standards for the foreseeable future”. These data standards are expected to be mandatory by 2016 for every drug submission.
CDISC standards hold clinical data to a higher level of readability and compliance with FDA requirements. Carey Smoak, Senior Manager of SAS Programming at Roche Molecular and CDISC Device Team Leader, points out that “a submission without CDISC standards can have a review period twice as long as one under standards”. Indeed, the standards facilitate the FDA review process since they are known and understood by reviewers.
A 2009 study conducted by Gartner in collaboration with the CDISC organization shows that the overall clinical trial duration is halved when using CDISC standards. Thus CDISC standards ultimately speed up time to market.
So if the benefits of using CDISC standards are so obvious, how can we explain that so many sponsor companies are still not adopting them?
Converting legacy data to CDISC standards is expensive
Clinical data standardization is no simple process: it is time-consuming and tedious. However, a few open source CDISC conversion tools have been launched to address this problem. One successful example is the OpenCDISC Validator software, recognized by the FDA and freely available. CDISC Express, Clinovo’s free SAS-based SDTM mapping tool, has been downloaded 600 times.
In the future, standards can be adopted smoothly if the industry works harder at incorporating them earlier in the process. Indeed, the next challenge is to push CDISC standards upfront in the clinical trial process. CDISC experts agree that the best time to implement CDISC standards is at the database build.
CDISC standards are still evolving
Standards are still being built and are in constant evolution. The CDISC organization is still releasing new versions of its clinical standards. Sponsor companies are often afraid that if they convert their clinical data to a given format, it will be obsolete a year later. Clinical trial experts nevertheless maintain that sponsor companies should shift to CDISC standards as soon as possible.
Companies often lack the internal expertise
For the standards to be used and maintained efficiently, Carey Smoak points out that “the wiser choice is to hire people with expertise on CDISC standards”. Companies should educate themselves on this topic and hire experts exclusively from CDISC Registered Solutions Provider organizations.
Ale Gicqueau, CEO at Clinovo
There is a general consensus that the old paper-based data management tools and processes were inefficient and should be optimized. Electronic Data Capture (EDC) has transformed clinical trial data collection from a paper-based Case Report Form (CRF) process to an electronic CRF process.
In an attempt to optimize the process of collecting and cleaning clinical data, the Clinical Data Interchange Standards Consortium (CDISC) has developed standards that span the research spectrum from preclinical through postmarketing studies, including regulatory submission. These standards primarily focus on definitions of electronic data, the mechanisms for transmitting them, and, to a limited degree, related documents, such as the protocol.
Austin, TX – 18 April 2012 – The Clinical Data Interchange Standards Consortium (CDISC) is pleased to announce today at the CDISC European Interchange Conference in Stockholm, Sweden, the release of the first iteration of a Protocol Representation “Toolkit” for clinical research. The purpose is to make it easy for authors of the research plan or protocol to reap the benefits of the Protocol Representation Model (PRM), which has been developed over the past decade by global clinical research experts from academia, industry and government. Using such a model can save time and resources for research studies by enabling electronic re-use of protocol information for other purposes such as clinical trial registration, study tracking, regulatory information and study reports. The current release of the “Toolkit” includes a standard Study Outline Template in MS Word format, a standard list of Study Outline Concepts, and a complete mapping of the Study Outline Concepts to both the Biomedical Research Integrated Domain Group (BRIDG) model and the CDISC Study Data Tabulation Model (SDTM) Trial Summary (TS) Domain.
Here is the fifth part of Dive into CDISC Express.
The remaining tasks, such as generating SDTM domains and define.xml, take only a few button clicks in CDISC Express once you have a well-designed mapping file. The software does the work, so few words are needed.
Step 3 of 6: Validate mapping file (Validate_Mapping_File.sas)
Designing the mapping file is an iterative process: design, validate, modify and re-validate. In the end you will have a mapping file that is at least free of syntax errors (avoiding semantic errors depends on your domain knowledge). A validated mapping file, named mapping.xls, will be copied to …\ doc\Mapping file – validated version\ from the working file, tmpmaping.xls. You will see
The corresponding log file in folder …\ log\
A report in …\results\Mapping Validation\, named Mapping_validation.html
Also the temporary datasets in …\tempdata\ and …\temp\:
Step 4 of 6: Generate SDTM datasets (generate_SDTM.sas)
If the mapping file is OK, generating SDTM domains takes just one click. After submitting the code, you will see the log file, reports, SDTM datasets and temporary datasets in the corresponding folders:
Step 5 of 6: Validate SDTM datasets (Validate_SDTM_Domains.sas)
The output files from validating SDTM datasets are all located in C:\Program Files\CDISC Express\SDTM Validation\:
Step 6 of 6: Generate Define.xml and xpt (generate_Definexml.sas)
Get the final define.xml file and SAS transport files (.xpt):
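For readers who have not produced transport files by hand, here is a minimal sketch of how a SAS transport file is commonly written with the XPORT engine. This is not the code CDISC Express runs internally; the toy DM dataset, its values and the output location (dm.xpt in the WORK directory) are all assumptions made for the example.

```sas
/* Toy DM dataset; variable values are made up for the example */
data work.dm;
    length studyid $12 usubjid $20;
    studyid = 'CLINCAP';
    usubjid = 'CLINCAP-001';
    age     = 34;
run;

/* The XPORT engine writes a version 5 transport file.
   Assumed target: dm.xpt inside the WORK directory. */
libname xptout xport "%sysfunc(pathname(work))/dm.xpt";

/* Copy the dataset into the transport file */
proc copy in=work out=xptout memtype=data;
    select dm;
run;

libname xptout clear;
```

The version 5 transport format limits dataset and variable names to 8 characters, which the names used here respect.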
Recommended reading and action taken
For a quick start and a deeper understanding, you could read the official documentation in the following sequence:
C:\Program Files\CDISC Express\documentation\FAQ.htm
C:\Program Files\CDISC Express\documentation\Quick Start.htm
C:\Program Files\CDISC Express\documentation\User guide.htm
A video tutorial would also be helpful:
C:\Program Files\CDISC Express\documentation\videotutorial.htm
A must-read conference paper, An Excel Framework to Convert Clinical Data to CDISC SDTM Leveraging SAS Technology by Sophie McCallum and Stephen Chan of Clinovo, offers an excellent discussion of the architecture of CDISC Express:
Here is the fourth part of Dive into CDISC Express.
3. Data manipulation techniques in CDISC Express
CDISC Express supplies a relatively rich set of data manipulation techniques that combine with the SAS language for data mapping. The following list is not exhaustive, and I will keep it updated.
3.1 Reference one dataset
A raw dataset name appearing in the “Dataset” column indicates a “set” operation in SAS.
All dataset options can be used when referencing a dataset, such as
siteinv(where=(invcode ne ""))
You can also reference an external dataset. You should incorporate the external file in a spreadsheet whose name begins with an underscore, “_”; it is “_visits” in this case:
Then you can use it in any domains needed, e.g., TV domain:
There is a macro, %cpd_importlist, used to import the external dataset, “_visits”. Again, this macro is located in C:\Program Files\CDISC Express\macros\function_library\.
Using a macro call to reshape or modify an input dataset offers great flexibility when referencing data. We will also discuss the benefits later on.
You can assign a number, a string or a dataset variable, combined with any valid SAS functions, to an SDTM domain variable in the “Expression” column.
Sometimes a temporary variable is needed for a later calculation. You can declare such a temporary variable in the “Dataset” column with an assignment in the “Expression” column, just as for any other domain variable. There are two differences: first, the names of temporary variables begin with an asterisk, “*”; second, temporary variables are not included in the final domain. Once created, a temporary variable can be used in any other expression.
There are three special symbols used in the “Dataset” column of CDISC Express. The asterisk, “*”, indicates a temporary variable, while the other two are
Tilde, “~”: indicates a variable used for a supplemental domain (SUPPQUAL).
Number sign, “#”: indicates a variable used for the comments domain (CO).
Another symbol, the at sign, “@”, used in the “Expression” column, references a variable produced earlier:
In this case, “AGEU” uses “AGE” as input, while “AGE” is calculated earlier. “@AGE” just indicates the dependency. Conceptually, it resembles the “calculated” keyword in SAS PROC SQL:
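proc sql;
    /* worldtemps is an assumed input table with AvgHigh and AvgLow in Fahrenheit */
    select (AvgHigh - 32) * 5/9 as HighC,
           (AvgLow - 32) * 5/9 as LowC,
           (calculated HighC - calculated LowC) as Range
    from worldtemps;
quit;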
We already saw a match-merging example. If “all” appears as a dataset in the “Dataset” column, all the previous datasets are merged first, by the common key specified in the “Merge Key” column, for later processing. If no key is assigned, the patient ID is used by the system.
CDISC Express also supports two types of join, inner join and outer join (left, right, full), using data steps. The implementation differs slightly from standard SQL, but the ideas are the same.
We add a new column, “Join”, usually beside the “Merge Key” column.
There are two values for “Join”, “O” or “I”, where “O” stands for “outer join” and “I” for “inner join”. A join indicator “I” acts like the dataset option “in=”, while “O” means no such option. Using the above as an illustration, the corresponding SAS code behind the scenes looks like
merge demog(in=a) siteinv(in=b);
This is a so-called “right outer join”. Combining “I” and “O” across these two datasets can perform all four types of join, one inner join and three outer joins:
As we can see, if no “Join” column is specified, CDISC Express performs an inner join by default.
So far CDISC Express does not support multiple merge keys. For example, the following file is currently illegal:
The developer, Romain, indicated that this enhancement would be raised for the next round of the product roadmap, and he also proposed a workaround. To use multiple keys for merging, we can create a temporary variable holding the concatenation of those keys; this temporary variable can then be used as a single merge key.
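As a rough sketch of the idea in plain SAS (not the code CDISC Express actually generates), the four join types can be written with “in=” flags and conditional output in a single data step. The toy datasets, the key variable patid and the output dataset names are assumptions made for illustration.

```sas
/* Toy inputs; patid is an assumed merge key */
data demog;
    input patid age;
    datalines;
1 34
2 51
3 47
;
run;

data siteinv;
    input patid invcode $;
    datalines;
2 A01
3 B02
4 C03
;
run;

/* MERGE with BY requires both inputs sorted by the key */
proc sort data=demog;   by patid; run;
proc sort data=siteinv; by patid; run;

data inner_join left_join right_join full_join;
    merge demog(in=a) siteinv(in=b);
    by patid;
    if a and b then output inner_join;  /* key present in both inputs */
    if a       then output left_join;   /* every row of demog         */
    if b       then output right_join;  /* every row of siteinv       */
    output full_join;                   /* every key from either side */
run;
```

With these inputs, inner_join keeps patid 2 and 3, left_join keeps 1 to 3, right_join keeps 2 to 4, and full_join keeps all four patients.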
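In plain SAS terms, the workaround could be sketched as follows. The datasets, the two key variables (sitecode and patid) and the name mergekey are assumptions for illustration; in a mapping file the combined key would instead be declared as a temporary variable.

```sas
/* Assumed toy inputs, keyed by sitecode + patid */
data demog;
    input sitecode $ patid $ age;
    datalines;
S01 001 34
S01 002 51
S02 001 47
;
run;

data vtsigns;
    input sitecode $ patid $ height;
    datalines;
S01 001 170
S02 001 182
;
run;

/* Build one key from the two variables. The delimiter keeps
   different key pairs from collapsing into the same string. */
data demog_k;
    set demog;
    length mergekey $40;
    mergekey = catx('|', sitecode, patid);
run;

data vtsigns_k;
    set vtsigns;
    length mergekey $40;
    mergekey = catx('|', sitecode, patid);
run;

proc sort data=demog_k;   by mergekey; run;
proc sort data=vtsigns_k; by mergekey; run;

/* Inner join on the single combined key */
data merged;
    merge demog_k(in=a) vtsigns_k(in=b);
    by mergekey;
    if a and b;
run;
```

Using a delimiter in catx is a deliberate choice: without it, keys such as “S0” + “1001” and “S01” + “001” would produce the same concatenated value.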
Above we discussed the “merge” operation in CDISC Express at length. This section is dedicated to the “set” operation. We already know how to “set” one dataset for referencing, but how do we “set” multiple datasets, i.e., concatenate them?
Symmetrically, just as “all” in the “Dataset” column indicates a merge operation, “all (stack)” indicates a concatenation:
The above file can also be translated to SAS code for better understanding:
data height;
    set vtsigns(where=(height ne .));
run;
data weight;
    set vtsigns(where=(weight ne .));
run;
data vs;
    set height weight;
    USUBJID = %CONCATENATE(_variables=study sitecode patid);
    . . .
Clinical SAS programmers do a lot of transpose operations to reshape raw data to fit the CDISC standards. Currently there is no explicit guide in CDISC Express on how to transpose, but this is not the end of the story.
There are two types of transpose:
Type I: from a wide dataset (more variables, fewer observations) to a long dataset (fewer variables, more observations), e.g., transposing a one-row-per-subject dataset to a multiple-row-per-subject dataset
Type II: from a long dataset (fewer variables, more observations) to a wide dataset (more variables, fewer observations), e.g., transposing a multiple-row-per-subject dataset to a one-row-per-subject dataset
As good practice, in SAS we always use data steps with the “output” statement to perform a type I transpose and PROC TRANSPOSE for type II. Although CDISC Express doesn’t support the transpose operation explicitly, you can at least perform a type I transpose and, surprisingly, we already saw it before!
Just go back to the section on concatenating. The example is taken from C:\Program Files\CDISC Express\studies\example2\.
We can see the input data vtsigns is a typical wide table (more variables, fewer observations):
And the final domain VS is a typical long table (fewer variables, more observations):
So obviously, this concatenating operation performs a type I transpose, from a wide table to a long table! More often, the compact SAS code for a type I transpose looks like:
if height ne . then do;
if weight ne . then do;
. . .
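A fuller version of the fragment above might look like the following sketch. The toy vtsigns data and the output variable names (testcd, result) are assumptions for illustration, not the variables of the example2 study.

```sas
/* Assumed wide input: one row per subject */
data vtsigns;
    input patid $ height weight;
    datalines;
001 170 65
002 . 80
003 182 .
;
run;

/* Type I transpose: one output row per non-missing measurement */
data vs_long;
    set vtsigns;
    length testcd $8;
    if height ne . then do;
        testcd = 'HEIGHT';
        result = height;
        output;
    end;
    if weight ne . then do;
        testcd = 'WEIGHT';
        result = weight;
        output;
    end;
    keep patid testcd result;
run;
```

Each explicit “output” statement writes one row, so a subject with both measurements contributes two rows and a subject with one measurement contributes one.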
3.6 All others: use macro!
We have now discussed almost all the common data derivation techniques in a programmer’s daily life and their implementation in CDISC Express. At least one question remains unsolved: how do we perform a type II transpose, i.e., from a long table to a wide table?
This remains an open question for the developers of the application. But we can also solve the problem in the current framework: use a macro, a customized macro. You can use macros in the “Expression” and “Dataset” columns. A macro used in the “Dataset” column returns a dataset, while a macro in the “Expression” column returns a series of strings: that is the basic structure to keep in mind when you customize your own macros. For more, you can refer to the macros in C:\Program Files\CDISC Express\macros\function_library\. For example, %concatenate is used in the “Expression” column and %cpd_importlist in the “Dataset” column.
So it would be convenient to create temporary datasets in the “Dataset” column using macros with an embedded type II transpose operation. Everything SAS can do, you can also implement in CDISC Express. Just use macros in the “Expression” and “Dataset” columns accordingly.
Raw data varies according to trial design, clinical data capture system and procedures. It is impractical to expect a CDISC SDTM converter such as CDISC Express to map all the data at the click of a button. The introduction of CDISC Express does not make programmers unnecessary. It just takes most of the trivial work out of programmers’ daily life and lets them concentrate on creative work and be more productive and efficient.
The next part will close this series.
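The transpose step that such a custom macro would need to wrap can be sketched with PROC TRANSPOSE. The dataset and variable names below are assumptions for illustration, and how the macro must hand its result back to CDISC Express is not shown here; the macros shipped in function_library are the reference for that.

```sas
/* Assumed long input: several rows per subject */
data vs_long;
    input patid $ testcd $ result;
    datalines;
001 HEIGHT 170
001 WEIGHT 65
002 WEIGHT 80
003 HEIGHT 182
;
run;

/* BY-group processing needs the data sorted by subject */
proc sort data=vs_long; by patid; run;

/* Type II transpose: one row per subject, one column per test code */
proc transpose data=vs_long out=vs_wide(drop=_name_);
    by patid;
    id testcd;
    var result;
run;
```

The ID statement turns each distinct testcd value into a column name, so vs_wide ends up with the variables patid, HEIGHT and WEIGHT, with missing values where a subject has no measurement.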
Here is the second part of ‘Dive into CDISC Express’, written by Jiangtang Hu. You can read the first part here.
Step 1 of 6: Create a new study (create_new_study.sas)
Open create_new_study.sas in C:\Program Files\CDISC Express\programs\, and you will see only one line, a macro call:
%addnewstudy(studyname=my new study);
Just assign a study name to the macro parameter studyname, e.g., “CLINCAP”:
%addnewstudy(studyname=CLINCAP);
Submit the code, and you will find a folder named “CLINCAP”, with the same structure as the two demo studies embedded in this application (example1 and example2), in C:\Program Files\CDISC Express\studies\ (the left and right panels show the folders and files before and after the execution of create_new_study.sas; the same convention applies below):
Folder ‘doc’ is used to hold the mapping files;
Folder ‘log’ is used to hold log files generated by subsequent macro calls, such as generating SDTM domains;
Folder ‘results’ and its subfolders will hold all the outputs, such as define.xml, SAS transport files, validation reports and SDTM datasets;
Folder ‘source’ holds all the clinical raw data used as inputs for SDTM domains;
Folder ‘tempdata’ holds all the temporary datasets generated by subsequent macro calls.
Also, a configuration file named CLINCAP_configuration.sas is put in C:\Program Files\CDISC Express\programs\study configuration\. This file is used to set some study-level parameters, such as lab and toxicity specifications (details in C:\Program Files\CDISC Express\specs\Lab specs\).
Two versions of the SDTM implementation guide are supported by CDISC Express, CDISC SDTM Implementation Guide Version 3.1.1 and Version 3.1.2. You can find the corresponding specification files in C:\Program Files\CDISC Express\specs\SDTM specs\:
The choice of SDTM implementation version is also coded in the configuration file, on line 41:
Version 3.1.1 is used by default. You can also choose Version 3.1.2 if needed:
Assign a study name and choose an SDTM implementation version. That’s all that is needed in step 1. Let’s take a few minutes to navigate the software. CDISC Express is a set of macros and Excel files. It is important to know the file structure.
C:\Program Files\CDISC Express\
├─documentation : FAQ, Quick Start, User Guide
├─macros
│  ├─ClinMap : system level macros
│  └─function_library : study level macros
├─programs : “action taken” macros
│  └─study configuration : study parameters configuration, e.g., choose SDTM version
├─SDTM Validation : for validation of SDTM domains
├─specs : specification files
│  ├─Excel engine : ExcelXP tagset file
│  ├─Lab specs : lab and toxicity
│  ├─Mapping validation : validation rules
│  ├─SDTM specs : holds two versions of SDTM implementation
│  └─SDTM Terminology : SDTM codelist (including NCI terminology)
└─temp : holds temporary data not specific to any study
As we have seen, all the “action taken” programs such as create_new_study.sas are located in C:\Program Files\CDISC Express\programs\. In create_new_study.sas, one macro is called, %addnewstudy, which is in C:\Program Files\CDISC Express\macros\ClinMap\.
Note that in C:\Program Files\CDISC Express\macros\, there are two sets of macros in different folders:
C:\Program Files\CDISC Express\macros\ClinMap\: this folder holds all “system” level macros used by the application only. Modifying them is not encouraged.
C:\Program Files\CDISC Express\macros\function_library\: macros used for mapping across studies. You can also create your own macros in this folder. The macros embedded in the application are also documented in the user guide.
Next part on the mapping file next week!
CDISC Express Mapping contest comes to an end today!
The challenge was to create a mapping file to map the source data set provided on this page to the SDTM DM domain using CDISC Express. Learn more about CDISC Express.
The winner is Jiangtang Hu! He is a reader and a blogger who lives in Beijing, China. He is a statistical SAS programmer at Sanofi Pasteur and, in his personal life, a new member of the “Elite Fathers’ Club”. He wins an iPad2!
Jiangtang is one of the early testers and adopters of CDISC Express. He wrote a paper to help users like him ‘Dive into CDISC Express’. We posted the first part of this paper on our blog last week. There are 4 parts, each guiding the user through different features of the application. The next part will be published next week.
Thank you to all the participants and congratulations to Jiangtang!
Here is the first part of a post written by Jiangtang Hu, statistical SAS programmer in the Biostatistics department at Sanofi Pasteur Beijing. Jiangtang was one of the first to download CDISC Express, and I have been interacting a lot with him. On his personal blog he shares his experience as a tester and user and offers the community very practical guidance on using CDISC Express.
Thank you Jiangtang for this valuable input! This dive into CDISC Express is structured in four parts. I will be posting one every week. Don’t hesitate to comment on this post and send your questions to Jiangtang or myself.
Recently, for a personal project, I did some research on Clinovo’s open source application, CDISC Express, a SAS application built on an Excel framework and designed to map clinical data to CDISC SDTM domains automatically. It is not perfect yet, but it is easily understandable and practically usable after a few hours of exploring the user guide. Most importantly, it is on the right track: an automatic CDISC converter is the magic weapon in almost every clinical programmer’s dream.
CDISC Express is the first and only practically usable open source CDISC converter I have ever met. I wrote a post a month ago when I first tested it with great interest and reported some issues to its issue tracker. Then I also had the great opportunity to discuss the software via email with its core developer, Romain Miralles. This post is just my personal notes on how to use and dig into the software, and it will best serve as working documentation. You can come back to me with any questions and comments.
By the way, there is an opportunity to practice, and you will also have a chance to win an iPad2 from Clinovo’s CDISC Express Contest:
The due date is July 15th and I have already submitted my work. It’s fun.
1. Download and Installation
You can get CDISC Express for free in
It is a Windows application and will be installed by default in
C:\Program Files\CDISC Express\
After installation, this path will be coded as a macro variable &CDISCPATH in the following six SAS files which are all located in C:\Program Files\CDISC Express\programs\:
The macro variable reads as
%LET CDISCPATH = C:\Program Files\CDISC Express;
If you change the destination folder at the installation stage, e.g., to D:\CDISC Express\, the value of the macro variable &CDISCPATH will be changed accordingly in the six files mentioned before:
%LET CDISCPATH = D:\CDISC Express;
Note that if you want to copy the whole folder of files to another destination, you should at least manually change the value of &CDISCPATH in those six files, or add some code to capture the path accordingly. From this point of view, the path setting of CDISC Express is not completely portable. If you have such a need, I recommend simply re-installing the software in whatever destination you want. It does not write any records to the registry, so you can have many copies on one machine.
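One possible way to capture the path in code, offered only as a sketch and not as part of CDISC Express, is to derive it from the location of the running program. It assumes the program is opened and submitted from the Enhanced Editor in the Windows display manager, where SAS sets the SAS_EXECFILEPATH environment variable, that the program sits in the programs\ subfolder, and that the path contains no commas or unbalanced parentheses. The macro variable names execpath and progdir are made up for the example.

```sas
/* Full path of the program currently open in the Enhanced Editor
   (assumption: Windows display manager session) */
%let execpath = %sysget(SAS_EXECFILEPATH);

/* Strip the file name, leaving the ...\programs folder */
%let progdir = %substr(&execpath,1,%eval(%length(&execpath)-%length(%scan(&execpath,-1,\))-1));

/* Strip the last folder name, leaving the installation root */
%let CDISCPATH = %substr(&progdir,1,%eval(%length(&progdir)-%length(%scan(&progdir,-1,\))-1));

%put NOTE: CDISCPATH resolved to &CDISCPATH;
```

In batch mode or other editors SAS_EXECFILEPATH is not set, so the manual %LET remains the fallback.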
The following discussion assumes the software is installed in C:\Program Files\CDISC Express\.
2. Workflow
You can follow all 6 action steps one by one; they are coded in
C:\Program Files\CDISC Express\programs\
1) Create a new study (create_new_study.sas)
Simple and easy. Just assign a new study name in a macro call and run.
2) Generate mapping file (generate_mapping_template.sas)
This is the critical and most time-consuming part. You should design mapping rules for every domain needed in Excel spreadsheets (the MAPPING FILE). Once that is done, all other tasks, such as generating SDTM datasets, SAS transport files and define.xml, and validation, can be done by just clicking buttons.
3) Validate mapping file (Validate_Mapping_File.sas)
To validate the mapping file, just click the button. As mentioned, the most important work is designing the mapping file. Expect to go back and forth between designing the mapping file and validating it.
4) Generate SDTM datasets (generate_SDTM.sas)
If the mapping file is OK, click the button.
5) Validate SDTM datasets (Validate_SDTM_Domains.sas)
Click the button.
6) Generate Define.xml (generate_Definexml.sas)
Click the button.
The following parts will dig into the software step by step.
You still have a chance to participate and win an iPad2! Your challenge is to create a mapping file to map the source data set provided on this page to the SDTM DM domain using CDISC Express.
Here are the rules http://www.clinovo.com/cdisc/game
2. Jian’s challenge is still running. You have until July 1st to count the lines of all SAS source code of CDISC Express. Jian is waiting for your answers!
Send us your mapping file and get a chance to win a Fry’s gift card.