Bringing OpenClinica Data into SAS

OpenClinica Global Forum 2010. A Java tool to create 'SAS friendly' XML from OpenClinica.


  1. Bringing OpenClinica Data into SAS
     rick.watts@ualberta.ca
     780-248-1170
  2. CRIC and OpenClinica
     • CRIC supports a wide variety of studies
         • ‘Regulatory’ clinical trials
         • Many different types of academic study
         • Variable size and complexity
     • Investigators design their own CRFs
         • CRIC has limited control over design strategies and CRF consistency.
     • Analysis requirements and data formats vary
         • SPSS, Stata, SAS, Excel.
     • CRIC’s preferred data handling tool is SAS
  3. OpenClinica Export
     • OpenClinica exports seem difficult for our users to work with.
     • Data structures vary depending on the data content.
         • CRF versions (repeat as extra columns)
         • Group contents (number of repeats)
     • Multi-select objects are difficult to handle.
         • Must be ‘broken’ into separate variables for analysis.
     • Null values are represented as text in otherwise numeric variables.
  4. The Challenge
     We wanted to:
     • Produce consistently usable data for minimal up-front effort.
     • Get data that could easily be transferred into different formats.
     • Produce tall, thin, de-normalized data sets suitable for data management purposes.
     • Leverage CRF metadata to add value:
         • Dataset labels
         • Variable labels
         • SAS formats and informats
         • SAS special missing values.
  5. The Solution
     • Create ‘SAS friendly’ XML to be read by the XML LIBNAME engine.
     • Create a SAS XML Map file to assign labels, data types, informats and formats.
     • Generate a CNTLIN data set in the XML suitable for use by PROC FORMAT.
     • Note: The XML file can also be imported directly into MS Access.
  6. Development Approach
     • SAS macros or external utility?
     • High complexity
         • Ensure OpenClinica metadata is translated into legal SAS names.
         • Map the OC hierarchy to SAS data sets.
             • CRFs, sections, groups and data items to tables, rows and columns.
         • De-duplicate object names
     • No resource to develop complex macros
  7. The Choice
     • Command Line Java Utility
         • Programmer available (I would have to write SAS code myself!)
         • Capable development environment
         • Portable (Windows / Linux)
         • Callable from within SAS
  8. Data Processing
     • Enter connection parameters and study identifier (interactively or on the command line)
     • Connect to Postgres via JDBC
     • Read study metadata
     • Manipulate the metadata
     • Write map file
     • Read study data
     • Write data file
  9. Metadata Manipulations
     • Legalize Names
         • SAS names <= 32 characters
         • Must start with a letter or underscore
         • Format names cannot end in a number
     • De-duplicate names
         • Multiple CRFs may contain the same section and response option names.
         • Duplicate names have numbers and underscores appended.
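
The naming rules on this slide could be sketched in Java, the utility's own language, roughly as follows. This is an illustration only, not the utility's actual code: the class `SasNames` and its methods are hypothetical, replacing illegal characters with underscores is an assumption, and the suffix scheme for duplicates is a guess at what "numbers and underscores appended" means.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper illustrating the slide's naming rules. */
final class SasNames {
    private static final int MAX_LEN = 32;

    // Names already handed out, upper-cased because SAS names are case-insensitive.
    // Use one instance per namespace (data sets, formats, ...).
    private final Set<String> used = new HashSet<>();

    /** Replace illegal characters (assumed: with '_'), force a legal first character, truncate to 32. */
    static String legalize(String raw) {
        String s = raw.trim().replaceAll("[^A-Za-z0-9_]", "_");
        if (s.isEmpty() || !s.matches("[A-Za-z_].*")) {
            s = "_" + s;
        }
        return s.length() > MAX_LEN ? s.substring(0, MAX_LEN) : s;
    }

    /** Format names cannot end in a digit, so a trailing underscore is added when needed. */
    static String legalizeFormat(String raw) {
        String s = legalize(raw);
        if (Character.isDigit(s.charAt(s.length() - 1))) {
            s = (s.length() >= MAX_LEN ? s.substring(0, MAX_LEN - 1) : s) + "_";
        }
        return s;
    }

    /** Return the name unchanged if unused, otherwise append _1, _2, ... (or _1_, _2_, ... for formats). */
    String unique(String legal, boolean formatName) {
        String candidate = legal;
        int n = 1;
        while (!used.add(candidate.toUpperCase(Locale.ROOT))) {
            String suffix = "_" + n + (formatName ? "_" : "");
            n++;
            int keep = Math.min(legal.length(), MAX_LEN - suffix.length());
            candidate = legal.substring(0, keep) + suffix;
        }
        return candidate;
    }
}
```

Under these assumptions a section labelled `1st visit` would become `_1st_visit`, and a second section with an already used name would get a numeric suffix.
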
  10. Metadata Manipulations
      • CRFs
          • No ‘top level’ mapping between CRFs and data sets.
      • CRF Section -> SAS data set
          • CRF sections contain logically grouped data; CRFs may not!
          • CRFs containing multiple sections result in multiple output data sets.
          • Every data item contained within a section is output to the same data set.
          • Section label -> dataset name
          • Section title -> dataset label
  11. Metadata Manipulations
      • Groups -> Rows
          • Ungrouped section data is repeated in each row.
          • Each repeat becomes a separate row in the data set.
          • Rows are numbered to provide a unique key based on their order within the group.
          • Multiple groups contained within the same section are merged based on order within the groups.
          • Where groups contain unequal numbers of rows, missing values result.
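
The group-to-row rules might look something like the following Java sketch. It is not the utility's code: `GroupMerge`, `toRows` and the `row_number` column name are hypothetical, values are kept as strings for simplicity, and it assumes every repeat of a group carries the same item names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of merging repeating groups into data set rows. */
final class GroupMerge {
    /**
     * @param ungrouped item name -> value for items outside any group (repeated on every row)
     * @param groups    one list per group; each element is one repeat (item name -> value)
     */
    static List<Map<String, String>> toRows(Map<String, String> ungrouped,
                                            List<List<Map<String, String>>> groups) {
        // The longest group decides how many rows the section produces.
        // Assumption: a section without repeats still yields one row.
        int rowCount = 1;
        for (List<Map<String, String>> group : groups) {
            rowCount = Math.max(rowCount, group.size());
        }

        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            Map<String, String> row = new LinkedHashMap<>();
            row.put("row_number", String.valueOf(i + 1)); // key based on order within the group
            row.putAll(ungrouped);
            for (List<Map<String, String>> group : groups) {
                if (i < group.size()) {
                    row.putAll(group.get(i));            // i-th repeat of this group
                } else if (!group.isEmpty()) {
                    for (String item : group.get(0).keySet()) {
                        row.put(item, null);             // shorter group: missing values
                    }
                }
            }
            rows.add(row);
        }
        return rows;
    }
}
```

With one group of two repeats and another of one repeat, the second row carries the ungrouped values, the second repeat of the first group, and missing values for the items of the shorter group.
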
  12. Metadata Manipulations
      • CRF items -> dataset variables
          • Item_name -> variable name
          • Description_label -> variable label
      • Calculate length of character variables
          • SAS has no support for VARCHARs. Explicitly specifying variable length saves considerable space on disk.
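
One plausible way to calculate the length of a character variable is to take the longest value actually present in the data. The sketch below assumes exactly that; `CharLength` and `maxBytes` are hypothetical names, and measuring in bytes of a chosen encoding (rather than characters) is an assumption made because SAS character lengths are byte counts.

```java
import java.nio.charset.Charset;
import java.util.List;

/** Hypothetical illustration: derive a character variable length from its values. */
final class CharLength {
    static int maxBytes(List<String> values, Charset encoding) {
        int max = 1; // a SAS character variable is at least 1 byte long
        for (String value : values) {
            if (value != null) {
                max = Math.max(max, value.getBytes(encoding).length);
            }
        }
        return max;
    }
}
```
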
  13. Multi-select and Checkbox items
      • A new column is created for each response value.
      • Column names are based on item_name.
      • Columns are labeled based on item_label and response option value.
      • Columns contain 1 or 0 to indicate selected or unselected.
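
As a rough Java sketch of this expansion: the class `MultiSelect`, the `item_value` column naming, and the assumption that the selected values arrive as one comma-separated string are all illustrative and not taken from the utility.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration: one 0/1 column per response option. */
final class MultiSelect {
    /**
     * @param itemName     CRF item name, used as the column name prefix
     * @param optionValues all response option values defined for the item
     * @param stored       selected values as a comma-separated string (assumed storage form)
     */
    static Map<String, Integer> expand(String itemName, List<String> optionValues, String stored) {
        Set<String> selected = new HashSet<>();
        if (stored != null) {
            for (String part : stored.split(",")) {
                if (!part.trim().isEmpty()) {
                    selected.add(part.trim());
                }
            }
        }
        Map<String, Integer> columns = new LinkedHashMap<>();
        for (String option : optionValues) {
            // Hypothetical naming scheme: <item_name>_<option value>
            columns.put(itemName + "_" + option, selected.contains(option) ? 1 : 0);
        }
        return columns;
    }
}
```

In practice the generated column names would also need to pass through the name legalization and de-duplication step.
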
  14. Response Options
      • Response option lists become SAS formats and informats.
      • Format names are created from the CRF item’s response_label.
      • Format names are legalized and de-duplicated.
      • If separate CRFs contain identical response option lists, only one format results.
      • Formats and informats are written to the XML as a new data table.
          • This is used as a CNTLIN data set for PROC FORMAT.
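
The "identical lists share one format" rule could be implemented by keying formats on the content of the option list, as in this sketch. `FormatRegistry` is a hypothetical name, the response label is assumed to be legalized already, and treating lists as identical regardless of option order or response label is an assumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical illustration: one format per distinct response option list. */
final class FormatRegistry {
    private final Map<Map<String, String>, String> nameByOptions = new HashMap<>();
    private final Set<String> usedNames = new HashSet<>();

    /**
     * @param responseLabel already-legalized format name candidate
     * @param options       response option value -> display text
     * @return the format name to use for this option list
     */
    String formatFor(String responseLabel, Map<String, String> options) {
        Map<String, String> key = new TreeMap<>(options); // sorted copy: option order is ignored
        String existing = nameByOptions.get(key);
        if (existing != null) {
            return existing; // identical list seen before: reuse its format
        }
        String name = responseLabel;
        int n = 1;
        // Same label, different options: de-duplicate. The trailing underscore
        // keeps the format name from ending in a digit.
        while (!usedNames.add(name.toUpperCase(Locale.ROOT))) {
            name = responseLabel + "_" + n + "_";
            n++;
        }
        nameByOptions.put(key, name);
        return name;
    }
}
```
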
  15. Missing Values
      Informats are created to read numeric data and handle OpenClinica null values.
      CRF Dates:
          proc format;
            invalue crfdate
              'ASKU' = .k
              'NA'   = .a
              'NASK' = .d
              'NI'   = .i
              'NP'   = .p
              'OTH'  = .o
              'UNK'  = .u
              other  = [mmddyy10.];
          run;
  16. Missing Values
      Numeric Response Options:
          proc format;
            invalue bestnull
              'ASKU' = .k
              'NA'   = .a
              'NASK' = .d
              'NI'   = .i
              'NP'   = .p
              'OTH'  = .o
              'UNK'  = .u
              other  = [best10.];
          run;
  17. Missing Values
      Formats are created for CRF data.
      Response options:
          proc format;
            value yesno
              0  = 'No'
              1  = 'Yes'
              .k = 'ASKU'
              .a = 'NA'
              .d = 'NASK'
              .i = 'NI'
              .p = 'NP'
              .o = 'OTH'
              .u = 'UNK';
          run;
  18. Missing Values
      Dates:
          proc format;
            value crfdate
              .k    = 'ASKU'
              .a    = 'NA'
              .d    = 'NASK'
              .i    = 'NI'
              .p    = 'NP'
              .o    = 'OTH'
              .u    = 'UNK'
              other = [date9.];
          run;
  19. Missing Values
      Numeric Data:
          proc format;
            value bestnull
              .k    = 'ASKU'
              .a    = 'NA'
              .d    = 'NASK'
              .i    = 'NI'
              .p    = 'NP'
              .o    = 'OTH'
              .u    = 'UNK'
              other = [best10.];
          run;
  20. Data Set Output
      • CRF Data
          • One data set per CRF section
          • Each row contains:
              • Study ID
              • Site ID
              • Subject ID
              • Study event name
              • Event start and end date
              • CRF Name
              • CRF Version
  21. Output Data Sets
      • Subject Data
          • List of subjects including site, secondary ID, group, etc.
      • Event Data
          • List of subjects' study events including start date, end date and status.
      • CRF Status
          • List of subject CRFs including event details, CRF version, creation date, completion date and status.
      • Discrepancies
  22. Output Data Sets
      • Data for removed subjects is not exported.
      • PHI data remains encrypted.
  23. Interactive Execution
          C:\> java -jar export.jar
          ----------------------------------------
                       Export Output:
          ----------------------------------------
           MAP FILE: export.map.xml
           EXPORT FILE: export.xml
          ----------------------------------------
          Postgresql driver loaded

          Enter Database url (default: localhost):
          Database port (default: 5432):
          Database name (default: openclinica):
          username (default: clinica):
          password:

          Enter Export file name (default: derived from study):
          Enter Map file name (default: derived from study):
  24. Interactive Execution
          Successful connection to database openclinica on jdbc:postgresql://localhost:5432/

          Please choose a study:
          ----------------------
           1) Study1
           2) Study2
           3) Study3
           4) Study4
          ==> 1

          Retrieving study metadata
          Creating subject table
          Writing formats to .xml file
          Writing subjects to .xml file
          Retrieving study item data
          Writing study item data to file
          Complete
          Files generated: study1.map.xml
           Study1.xml
  25. Command Line Options
      • Command line options may be used rather than prompts. Options include:
          • Host, database, ID and password
          • Study OID
          • File names
          • Suppression of map file
          • Creation of ‘SPSS friendly’ SAS data sets
              • Minimal formatting allows data sets to be exported to SPSS using PROC EXPORT.
      • Command line options allow the utility to be executed from within SAS.
  26. SAS Code
      Define libraries:
          libname ocdata92 xml92 "data_file.xml" xmlmap="map_file.map" access=readonly;
          libname library "c:\project\fmt";
          libname studylib "c:\project\data";
  27. SAS Code
      Execute the Import:
          %let scommand  = java -Xmx256m -jar c:\export\export.jar;
          %let shost     = -h 10.11.12.13;
          %let sport     = -p 5432;
          %let sstudy    = -soid S_STDY1234;
          %let sdatabase = -D openclinica;
          %let suser     = -U dbuserid;
          %let spswd     = -P password;
          %let spss      = ;
          /* &smapFile and &sdataFile are not defined on this slide; they must be set before the X statement runs. */
          X "&scommand &shost &sport &sstudy &sdatabase &suser &spswd &smapFile &sdataFile &spss";
  28. SAS Code
      Create the Format Catalog from the XML:
          proc sort data=ocdata92.fmtlib out=work.fmtlib;
            by fmtname type start;
          run;
          proc format cntlin=work.fmtlib library=library fmtlib;
          run;
  29. SAS Code
      Copy the Data Sets:
          proc datasets library=ocdata92;
            copy out=studylib;
            exclude fmtlib;
          quit;
  30. Do It!
      • Import into SAS
      • If we have time:
          • XML Structures
          • Import into Access
          • Import into Excel
  31. Contact
      Rick Watts
      rick.watts@ualberta.ca
      780-248-1170
