The New York State University at Brockport
             Department of Computational Science




Standardization of “Drug S...
Abstract:

The purpose of this thesis is the development of an application process for
preparing reports on drug safety. T...
The hiring of qualified staff and carefully selecting software increases the quality
and reduces costs. A two-hour job may...
gathered to develop the system as a whole. It can accept data from both papers
and electronic databases. Databases such as...
I. Use the SAS ODBC driver to access by communicating with
         either local or remote SAS servers using TCP/IP protoc...
machine or Teradata, MSSQL Server or any other
                       machine.
                    f. Baan or PeopleSoft f...
PostScript/ PDF/ PCL files, RTF or even color graphs that can be made
   interactive using ActiveX controls or Java applet...
critical body reactions. Another resource is the company’s surveys on products
completed by patients or clients who are vo...
 In the pharmaceutical field and bio-informatics, SAS software is generally
   thought of for statistical analysis progra...
does encoding that is part of clinical data entry. All data entries are
standards-based and approved by the FDA.


Terminologie...
The FDA uses MedDRA as a part of the proposed rule for post-marketing
reporting. MedDRA is the abbreviation for Medical Di...
MedDRA classifications have an Object Oriented data structure as shown in the
following screens.



                      ...
Each MedDRA term has a unique code that can be used as a search key.



                                                     ...
A query links the collected data to terms in MedDRA. A
query can create a selection on a description of medical...
Control Code:

SAS and MedDRA both have code control utilities that do the following:

   Debugging system and maintenanc...
SCM includes a friendly GUI that has SAS file check-in/check-out capabilities.
This GUI lists all libraries, data sets, ca...
pieces of information which are data values, and the formats determine how
these values are displayed or used in calculati...
o illegal mathematical operations
 o observations out of order for BY-group processing
 o Incorrect reference in an INFILE...
Data mining is a critical aspect of these reporting systems. Occasionally, the
predictions may be even more important than...
(panel) data analysis), whereas, data is usually analyzed by regression (one
observation for each patient). Sometimes it i...
operational data from internal systems such as the homegrown applications
     of clinics or hospitals, the manual data co...
the study medications. These data may be joined to MedDRA
            information to build a larger directory that is used...
2. HLGT
      MedDRA CODE             Numeric
      MedDRA Term             String
3. PT
      MedDRA CODE             Num...
o Other relevant history including preexisting medical condition (e.g.
               allergies, race, pregnancy, smoking ...
The specification of required information for an adverse event serves as a
starting point for constructing a conceptual sc...
ID
                  Reason
                  date_of_event
                  date_of_report
                  therapy_sta...
may also be saved separately in a data source. This designed E_R model gives
substantial flexibility in the designing of th...
Many data might come as raw data. This raw data must be entered into a SAS
data set. As an example, one of the clients mig...
1- Identify the file directly in the INFILE, FILE, or other SAS statement that

         uses the file.
   2- Set up a fil...
environment.    ‘libref’ makes a shortcut to the metadata on the SAS
     Metadata Server. Any metadata in the SAS metadat...
The goal of SQL scripting is to derive the available data from any possible data source.
Most vendor applications have SQL backb...
MedDRACode         INT,
drug_id            CHAR(12),
INDEX drug_ind (drug_id),
FOREIGN KEY (drug_id) REFERENCES drug(id)
O...
PRIMARY KEY(Info_id)
) ENGINE=INNODB;

/* transforming to a tabular form of this E_R model includes aggregation
is streightf...
indexes, and data values in PROC SQL tables can be updated. It is also
possible to update and retrieve data f...
Country or Origin for Patients Receiving the suspected drug in a Postmarketing Setting
                                   ...
proc sql;
create view temp2 as
   select region, count(region) as Exposures
   from paitient
    where paitient_Id in (se...
second time. One can use the SAS utility to convert data from one form to
another or copy between machines. A free trial o...
The above script retrieves MedDRA Classification from a data source. Often
these data may not represent all MedDRA data. U...
From the parameter list created, values can be individually highlighted and
chosen for processing. These required paramete...
values(   'NDA','International Conference on Harmonisation ')
values(   'PSUR','Medical Dictionary for Regulatory Activiti...
(1)         m (lower case) = meter
    (2)                     kg = kilogram
    (3)                      g = gram
    (4)...
'3' = 'week'
          '4' = 'month'
          '5' = 'year'

  run;
  proc format;
     value $age_range _form
        ‘1’...
proc format library=proclib;

        value $sex
              '1'='male'
              '2'='female'
              '3'='un...
can come from an ODBS. These data may have dynamic data values that get
up-dated by end-users through the web. Normally, t...
options comamid=tcp;
filename rlink    '!sasroot\connect\saslink\tcptso.scr';
signon os390host;

/***************************...
from oracdat group by gender,country;)
      union
    (select gender,country, count(*) into population
    from paitient1...
modeling used by the programmer. The degree to which an implementation is
standardized is in direct proportion to the corr...
References
     SAS Publishing, the Analyst Application, Second Edition (July 2002)


     Adriaans, P., and D.Zantings....
   MedDRA http://www.meddrahelp.com/




Standardization of “Drug Safety” Reporting Applications-doc file

  1. 1. The New York State University at Brockport Department of Computational Science Standardization of “Drug Safety” Reporting Applications Halley M. Zand Winter, 2005 Thesis advisor: Dr. Robert Tuzun 1
  2. 2. Abstract: The purpose of this thesis is to develop an application process for preparing reports on drug safety. The FDA is responsible for protecting public health by assuring the safety and security of human and veterinary drugs. Each year, companies that provide medications are required to generate reports that assure the FDA of their drugs' safety. This thesis proposes an Information Technology infrastructure model that gives the IT organizations of drug providers a strategic perspective on how to computerize their Drug Safety reporting activity. It introduces software development concepts, methods, techniques, and tools for collecting data from multiple platforms and generating reports from them by scripting queries. Introduction: According to Guidance Documents for Drug Evaluations and Research from the U.S. Food and Drug Administration, all prescription drugs, both new and generic, need to be approved by the FDA. To obtain these approvals, drug providers are required to generate annual reports on product safety and attach them to their application letter. In addition, any person can report a reaction or problem with a drug to the FDA. The FDA reviews applications and all reported clinical outcomes to determine whether the reported events were caused by the suspected drug or by other factors. Manual reporting is not practical because of the large volume of data and the differing platforms and formats in which the data are stored. Unfortunately, tools and standards are often poorly used because of a lack of skills in database application modeling, programming, and software engineering. User applications are often cobbled together with little more efficiency than manual processing, and tools for automation and large-scale data processing are not utilized.
  3. 3. The hiring of qualified staff and carefully selecting software increases the quality and reduces costs. A two-hour job may take a week due to poor technical skills, and the cost of software licensing may increase by as much as 5,000 USD from 10,000 USD because of the lack of attention paid to the productivity of the software tool. A standardized IT infrastructure provides higher computational quality at lower cost. In addition, professional developers with computational science backgrounds are the only group that has the sufficient computational knowledge and bookkeeping skills for software application design and the ability to apply technical concepts. Merging Computational Science and Drug Development Science for Drug Safety Evaluation can be evolved within a modern computer environment; and because Computational Technology grows quickly, designers would need an advanced vision for the future. A strong knowledge in computational science and bookkeeping helps developers use what is available and progress forward from it. This thesis explains a modern computational architecture for implementing Drug Safety Reporting Applications. This architecture uses advanced IT concepts to increase the quality of work on a large volume of data that may be dynamic rather than static and comes from distributed computer networks. This thesis aids in the study of Drug Safety in obtaining the best software solution advantages possible. Objectives: SAS is the software application that developers use to provide high-quality reporting applications for Drug Safety. The collection of concepts that work together is required in order to achieve a computer-based method for Drug Safety evaluation. This paper proposes an infrastructure that uses the optimal solutions for this process. The abstract is intended to use the information 3
  4. 4. gathered to develop the system as a whole. It can accept data from both paper records and electronic databases. Databases such as Oracle and Microsoft Access can be considered the backbones of the system. All computational terminology recommended for this proposed infrastructure must be explained. For example, in some cases data mining might be used to find a pattern and help estimate descriptions of a data field; this data mining ability of the proposed architecture should be illustrated. In this thesis, entity-relationship database modeling, as well as data access, formatting, classification, and scripting, is illustrated by giving examples and by creating descriptions of longitudinal data. The thesis also addresses code consistency, together with the essential attributes of the code and their efficiency in the proposed infrastructure. The proposed software should support maintainability, but techniques that focus on data errors are outside the scope of this paper. To achieve the best result, we need to use all available pieces of accurate data and perform the correct programming process. These data can come from health care providers, consumers, literature, and other relevant databases. It is important to find ordinary errors during scripting. A missing part or step in the code for data processing (extracting and retrieving data, manipulating data, or making narrative data from queries and assessing them) may cause a large difference between the expected and actual results and reduce the accuracy of the reports. Technical Specifications: • Data accessing: SAS data might come from other application platforms. These data might be formatted or unformatted and therefore filed differently in different environments. Accessing these data from several servers is done in the following steps.
  5. 5. I. Use the SAS ODBC driver to access the data by communicating with either local or remote SAS servers over the TCP/IP protocol. Data can come from a local or remote server of any database type. It can be in any format, including raw data or any vendor's software data set. The ability to read raw data in any format, from any kind of file (including variable-length records, binary files, and free-formatted data, even files with messy or missing data), is required. II. Combine and manipulate these data on the client side, analyze the resulting data, and distribute it by creating an executable file that is sent from the server to multiple clients. The following are examples of possible cases in data accessing: a. Data may exist on a mainframe computer or PC network. These data might be joined to an existing data set, used to create new variables (columns), and used to produce tables and interactive graphs. b. Raw data may exist on a UNIX server. Other data values can be computed from them, statistics formed, and an HTML report created for use in web application systems and then stored on a web server on an intranet or Internet platform. c. Direct access may be needed to BMDP, SPSS, and OSIRIS files, as well as to files such as Microsoft Excel spreadsheets, Microsoft Access tables, dBase files, ORACLE forms, and any other DBMS. In addition, both relational and non-relational databases, including any PC data source, can be considered data files. d. Relational databases in DB2 format exist in OS/390, VM, UNIX, or PC environments. e. ODBC, Informix, ORACLE, and OLE DB data may come from any platform. They may also come from a SYBASE
  6. 6. machine or Teradata, MSSQL Server or any other machine. f. Baan or PeopleSoft files may come from ERP systems such as R/3 and SAP BW. Thus global data may be received and processed for creating an enterprise report.  Data Management: After accessing data, it is necessary to manage them, by creating, retrieving, and updating database information. This may require advanced programming skills because the information comes from a wide range of data sources and it is necessary to merge them together and then evaluate. Data with the same attributes need generic formatting that requires a manipulation process. Evaluating values of data requires computational operations that may be defined as functions. Saved sets of data in the data forms may have been extracted from subsets data. Complex conditional processes during data manipulation may be needed when a wide range of data source is merged.  After gathering and shaping information we need Statistical Analyzing to produce reports. These reports are customized and they may be complex. Tables, frequency counts, and cross-tabulation tables may be produced to create a variety of charts and plots. Also, the computation of a variety of descriptive statistics including linear regression analysis, standard deviation, correlations and other measures of association, as well as multi-way cross- tabulations and inferential statistics may be necessary.  These representations should be able to be reported to a wide variety of locations and platforms in order to suit client needs. Results may be required to be presented in many formats, such as an array of markup languages including HTML4 and XML, or formatted for a high-resolution printer such 6
  7. 7. PostScript, PDF, or PCL files, RTF, or even color graphs that can be made interactive using ActiveX controls or Java applets. System architecture modeling [Figure: system architecture diagram. Recoverable labels: reporting data by investigators; clinical trials; hospital labs; data dictionary (MedDRA/PubMed); clinical studies; archive data (Oracle); modification; verifications; post-marketing; data warehouse; adverse event reporting; individual clinical trials; user; data analyses (SAS).] Information must be gathered by drug providers. These data come from clinical studies by the FDA and other professional investigators. Other information comes from the medical records of patients who were treated with the specific drug. Usually, drug providers study their product before moving on to the evaluation step. The first step is collecting data to generate reports such as the country of origin of patients receiving the drug, worldwide patient exposure, demographic characteristics, the most commonly reported body-system reactions (ordered by gender and/or age of patients), and the summary of deaths or other
  8. 8. critical body reactions. Another resource is the company's surveys on products, completed by patients or clients who volunteer in the U.S. or other countries. These surveys include matching data from MedWatch forms, which the U.S. Department of Health and Human Services accepts for the voluntary reporting of adverse events and product problems. These manufacturing companies may also be able to receive FDA reports about the product that are generated on the basis of MedWatch reports. Furthermore, many of the surveys are answered by physicians and other doctors who have an EMR system and are able to answer detailed questions regarding medical conditions and other related medical issues. Any tool that is recommended here should be consistent with FDA standards and the objectives that follow. • In any Adverse Event Reporting System, the basic calculations and data analysis are statistical and operate on data sets that are frequently ordered by one or more variables and come from a variety of data sources. Thus an Adverse Event Reporting System should work on any possible platform. For example, if it uses E2B data element structures, it should be capable of any interactive query or data flow transaction on shared data. SAS runs on a wide range of computer platforms and operating systems. It supports data sharing. It supports submission through the Web or any other network that includes Oracle, UNIX, NT servers, or mainframe machines. This means that, regardless of the backbone, SAS can support it. • Data sources may need to be summarized or checked before being reported. Scripting and programming are among the major necessities in development. SAS has a powerful scripting language that can do the required summarizing, verification, and validation.
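The summarizing and verification step described above can be sketched in a few lines of SAS. This is a minimal illustration, not the thesis's own program: the data set names (work.ae_raw, work.ae_checked), the variables, the sample records, and the plausibility limits for age are all assumptions made up for the example.

```sas
/* Hypothetical adverse-event records; names and values are illustrative only */
data work.ae_raw;
   infile datalines dsd truncover;
   input patient_id :$8. gender :$1. age event_date :yymmdd10.;
   format event_date yymmdd10.;
   datalines;
P001,1,34,2004-03-15
P002,2,,2004-04-02
P003,3,210,2004-05-20
;
run;

/* Verification: flag records with a missing or implausible age */
data work.ae_checked;
   set work.ae_raw;
   length issue $20;
   if missing(age) then issue = 'age missing';
   else if age < 0 or age > 120 then issue = 'age out of range';
run;

/* Summary: frequency of each gender code and of each detected issue */
proc freq data=work.ae_checked;
   tables gender issue / missing;
run;
```

Keeping the check in its own DATA step leaves the raw data untouched, so the flagged records can be reviewed before anything is corrected or reported.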
  9. 9. • In the pharmaceutical field and bioinformatics, SAS software is generally thought of as a tool for statistical analysis programming, but its many other features are a largely untapped resource. Its screen-building and object-oriented development abilities are needed to keep up with the latest advances in information technology. • SAS is a stand-alone system produced by SAS Institute Inc. and sold on the open market. It exceeds all technical objectives specified here. The FDA has proposed MedDRA as a standardized dictionary of medical terminology. MedDRA has been used internationally to discuss the regulation of medical products. MedDRA provides information on symptoms, signs, diseases, and diagnoses. It also includes other information such as: • Names of investigations (e.g. liver function analyses, metabolism tests) • Sites (e.g. application site reactions, implant site reactions, and injection site reactions) • Therapeutic indications • Surgical and medical procedures • Social and family history terms SAS and MedDRA are FDA standards. Both are designed to a high standard, and their vendors continue to look for weaknesses and improve their products. Their documentation and user interfaces are user friendly. SAS and MedDRA are generic software, and any specific needs, such as data security or operational reliability, can be negotiated in a service level agreement. EMR Database: These data come from hospital laboratories and clinical data entry systems. They are documented before and after verification. All documentation is electronic and all reporting is submitted electronically. MedDRA
  10. 10. does encoding that is part of clinical data entry. All data entries are standard based approved by the FDA. Terminologies: A computerized Drug Safety Evaluations requires the following informatics terminologies:  Data classifications  Control Code  Formatting  Quality Control  Data Mining  Gathering information  Accessing and manipulating data  Scripting Each of these terminologies carries a process or methodology that will be discussed in the following. Data Classification: Any Structured Analysis of information needs classification. Data Classification is the first best-known task in data flow modeling. The data model of a Drug Adverse Event Reporting System is derived from conceptual information such as entities and their interrelationships. A mechanism serves as a store of all drug information which can link analysis, design, implementation and evolutions applied in most medical applications. This classification should be consistent and not clash. It is integrated in all parts that require maintainability. The outcome attributed to adverse events is the most important information that needs to be classified. The data classification for this attribute should be a standard classification that is matched by the FDA reporting program. 10
  11. 11. The FDA uses MedDRA as a part of the proposed rule for post-marketing reporting. MedDRA is the abbreviation for Medical Dictionary for Regulatory Activities and it is an international terminology designed to support the classification, retrieval, presentation, and communication of medical information throughout the medical product regulatory cycle. Originally, MedDRA was written in English and distributed in ASCII file format; but it is now available in several other languages such as Dutch, French, German, Italian, Portuguese, Spanish, and Japanese. This on-line dictionary is intended to become the global medical terminology standard for use by every bio-pharmaceutical company in the world and has the best-known classification with an integrated platform in updating that can be used by all standard systems. In the majority of homegrown medical applications, the patient medical recording systems use this classification and it is valid for all phases of drugs and subscribing Pharmaceutical companies. MedDRA works as a catalog of medical disorders. It has a hierarchical data structure that has five terms. Developing queries or retrieving information about medical diagnoses need hierarchical searching on these terms, and other queries might be selected by grouping them thusly. The next page picture shows the SOC view of Cardiac and Vascular investigations (excl enzyme test): 11
  12. 12. MedDRA classifications have an Object Oriented data structure as shown in the following screens. 12
  13. 13. 13
  14. 14. Each MedDRA term has a unique code that can be used as a search key.
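As a minimal sketch of using a code as a search key, the SAS steps below index a small lookup table on its code column and retrieve one term by code. The table name work.meddra_pt, the numeric codes, and the term texts are placeholders invented for the example; they are not real MedDRA content.

```sas
/* Hypothetical subset of a MedDRA-style term table; codes and terms are placeholders */
data work.meddra_pt;
   infile datalines dsd truncover;
   input meddra_code term :$40.;
   datalines;
1001,Placeholder term A
1002,Placeholder term B
1003,Placeholder term C
;
run;

/* Index the code column; UNIQUE also rejects duplicate codes */
proc datasets library=work nolist;
   modify meddra_pt;
   index create meddra_code / unique;
quit;

/* Retrieve a single term by its code */
proc sql;
   select meddra_code, term
      from work.meddra_pt
      where meddra_code = 1002;
quit;
```

The UNIQUE option is a design choice for this sketch: it makes the data set itself enforce the one-code-per-term property that the search relies on.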
  15. 15. A query links the collected data to terms in MedDRA. A query can create a selection based on a description of medical data. To make this selection, the user enters the term to be sought in the 'Search for Value' field and runs the search. The user then selects one of the records returned, which identifies the information about patients. After that, the codes in the database are ready for statistical evaluation. Other advantages of using MedDRA are: • MedDRA is online (it does not require installation or periodic updates on the client system). The application has a standardized interface, is well supported, and requires little effort to interface with any client computing environment. A good designer can take full advantage of this classified information by using it as a shared data set. Updating this shared information keeps all related outcomes that reference the data set current. • Informatics terminologies such as encoding are already included in MedDRA for its own data sets. • MedDRA includes high standards and can be updated through queries or data imports; however, such updates require quality control because an incorrect update can disrupt every dependent result. • The current MedDRA version has MediMiner for managing and analyzing the coded data, including data mining. This tool allows analysis of the impact of recoding data sets from one MedDRA version to another, where MedDRA is a standalone product used as an integral component of a range of coding tools. The MedDRA classification can be browsed as a tree that can be collapsed and viewed at every level of detail for all occurrences in every search category, such as legend, terms, and coding.
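The link between collected data and dictionary terms can be illustrated with a PROC SQL join. This is only a sketch under stated assumptions: the tables work.ae_reports and work.meddra_terms, their columns, and all values are hypothetical, and the codes are placeholders rather than real MedDRA codes.

```sas
/* Hypothetical coded adverse-event reports */
data work.ae_reports;
   infile datalines dsd truncover;
   input patient_id :$8. meddra_code;
   datalines;
P001,1001
P002,1003
P003,1001
;
run;

/* Hypothetical MedDRA-style dictionary */
data work.meddra_terms;
   infile datalines dsd truncover;
   input meddra_code term :$40.;
   datalines;
1001,Placeholder term A
1003,Placeholder term C
;
run;

proc sql;
   /* Join each report to its dictionary term, then count reports per term */
   create table work.ae_by_term as
   select t.meddra_code, t.term, count(*) as n_reports
      from work.ae_reports as r
           inner join work.meddra_terms as t
           on r.meddra_code = t.meddra_code
      group by t.meddra_code, t.term
      order by n_reports desc;
quit;
```

Because the reports store only the code, a dictionary update changes the term text in every later report without touching the patient data, which is the shared data set advantage the text describes.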
  16. 16. Control Code: SAS and MedDRA both have code control utilities that do the following: • Debug the system and maintain any branch of code by producing a cross-reference listing of all program names that have been declared and used. • Analyze the code to discover uninitialized variables, unreachable code, uncalled functions and procedures, and the number of times each statement is executed. MedDRA has MediMiner as its version control utility. During any update in MedDRA 3.1, MediMiner controls all changes by analyzing the coding sets. In MedDRA 4.1, it also assesses the impact of recoding the data by identifying all codes that remain unchanged and those that may require recoding. It is also possible to identify codes that no longer exist, those that have been changed in some way, and those that have a related change or where a multiaxial change (inherited from multiple phases of the original codes) has had an impact. Primary and secondary changes are identified, as well as changes in the current status of the code. SAS software includes the Source Control Manager (SCM) utility as one of the options in the Desktop selection of the Solutions menu: SAS->Solutions->Desktop->Development and Programming->Source Control Manager
  17. 17. SCM includes a friendly GUI that has SAS file check-in/check-out capabilities. This GUI lists all libraries, data sets, catalogs, and catalog entries in hierarchical order. SCM has flexible testing, revision control, and version labeling, with an easy application distribution utility. With a version label, it is easy to create a copy of an application and place it in other locations on the network. The SAS/CONNECT utility can also place the application on remote machines. Formatting: Usability of information is one of the most important components of any application implementation. Usability requires readability, and the readability of any data set is facilitated by standardized formatting. Each line represents many separate
  18. 18. pieces of information, which are data values, and the formats determine how these values are displayed or used in calculations. Formats set the width of displayed values, the number of decimal places displayed, the handling of blanks, zeroes, and commas, and other details. SAS supports its own standard formats as well as user-defined formats. Standard formats might be used for numeric, character, or picture data. Users can also define custom formats and use them in DATA and PROC steps. User-defined formats are reusable and can be saved in format catalogs. If saved in a catalog in a permanent SAS library, they remain there permanently. If saved in the catalog WORK.FORMATS, they are temporary and available only in the SAS session or job in which they were created. Because catalogs are a type of SAS file that resides in a SAS data library, SAS can look up formats at run time and raise a run-time error when a format is undefined. In this way, type checking is supported, which improves the readability of information. If the SAS system option NOFMTERR is in effect, SAS substitutes a default format when an undefined format is referenced, so in some cases these errors can be ignored and execution can continue. Quality Control: Delivering the correct result requires quality control. SAS recognizes common errors such as syntax, execution-time, data, and semantic errors; however, users should also check for common mistakes such as the following: • Check for syntax errors: o every statement ends with a semicolon o quotation marks are opened and closed o keywords are spelled correctly o every DO and SELECT statement is closed by an END statement • Check for execution errors:
  19. 19. o illegal mathematical operations o observations out of order for BY-group processing o Incorrect reference in an INFILE statement such as misspelling or otherwise incorrectly stating the external files are recognized. o A program may run, yet give an incorrect result. These errors are often detectable by checking self-consistency and should always be reported, certainly in the debugging stage, and often during production runs.  SAS usually executes the statements in a DATA step one by one, in the order they appear. After executing the DATA step, SAS moves to the next step and continues in the same fashion. It must be certain that all the SAS statements appear in order so that SAS can execute them properly.  Check input statements and data. SAS can detect data errors during the execution; but this won’t terminate the processing. After executing, it prints a note describing the error. In that note SAS lists the related values that are stored in the input buffer and the program data vector. o The corresponding values with actual variable values in INPUT statements must be checked. o Any corresponding arrangement such as formats, lists and columns for input statements must be checked too. Data mining: Data mining is a class of database applications that look for hidden patterns in a group of data. Statistical analysis is the data analyzing method that is matched with the nature of data mining. Statistical analysis might uncover the hidden pattern of data for a large volume of information coming from Adverse Events Reports or survey systems. A data mining process might combine variables that occur more than expected. By applying statistical options, an optimal guess can be made about the best match behavior that may have occurred frequently. 19
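Two of the checks listed above, sort order for BY-group processing and data errors that do not stop execution, can be seen in a small SAS sketch. The data set names, the variables, and the sample values are assumptions made for illustration; the non-numeric dose value is included on purpose to trigger a data error.

```sas
/* Hypothetical visit records, deliberately unsorted; the last dose is invalid */
data work.visits;
   infile datalines dsd truncover;
   input patient_id :$8. dose;
   datalines;
P002,10
P001,5
P002,15
P001,x
;
run;
/* SAS writes an invalid-data note for the value x to the log, sets dose */
/* to missing for that record, and continues processing.                 */

/* Sort first: BY-group processing in a DATA step expects sorted input */
proc sort data=work.visits out=work.visits_sorted;
   by patient_id;
run;

/* Total dose per patient; the sum statement skips the missing dose */
data work.dose_totals;
   set work.visits_sorted;
   by patient_id;
   if first.patient_id then total_dose = 0;
   total_dose + dose;
   if last.patient_id then output;
   keep patient_id total_dose;
run;
```

Running the last step on work.visits instead of the sorted copy would stop with a BY-variable order error, which is the execution-time mistake named in the list. The program also shows why the log should be read after every run: the job completes, yet one dose was silently treated as missing.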
  20. 20. Data mining is a critical aspect of these reporting systems. Occasionally, predictions may be even more important than detections in drug safety evaluation. In the United States, patients can file lawsuits against drug providers for severe adverse reactions. These legal actions often make American drug companies fearful of introducing drugs into the U.S. market. However, data mining on data from other parts of the world offers a way to move drug safety evaluation from a reactive process of detection to a proactive one of prediction. In effect, it helps drug providers adopt a safer marketing strategy rather than take risks. If MedDRA System Organ Class terms are adopted as a class of events, then one can select the related data from patient records for an event, discover statistical rules or patterns automatically from the data, and later create a hypothesis and run tests on the patient record database to verify or refute it. Data mining can help protect drug providers against lawsuits. This process uses data from other countries and from clinical studies. SAS guides data analysis in an instructional way, so that even people with no statistical knowledge are able to run the required processes on selected data sources (the basic options include counts of missing and non-missing values, minimum, maximum, range, sum, mean, variance, standard deviation, standard error of the mean, coefficient of variation, skewness, and kurtosis). In addition, access to data sources can be secured to prevent unauthorized access. SAS also allows different reports and presentations of results to be created (including tables and frequency reports with graphical presentations to visualize the results).
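The basic descriptive statistics listed above map directly onto statistic keywords of PROC MEANS. The sketch below requests all of them for two variables; the data set work.patients, its variables, and the sample values are assumptions invented for the example.

```sas
/* Hypothetical patient data; a period marks a missing value */
data work.patients;
   input patient_id $ age weight_kg;
   datalines;
P001 34 70.5
P002 51 .
P003 47 82.0
P004 29 64.3
P005 62 90.1
;
run;

/* One keyword per statistic named in the text */
proc means data=work.patients
           n nmiss min max range sum mean var std stderr cv skewness kurtosis;
   var age weight_kg;
run;
```

N and NMISS report the non-missing and missing counts separately, so the one missing weight shows up in the output instead of being hidden inside the other statistics.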
SAS supports data mining with a large number of statistical procedures (regression, association discovery, time series, and time series cross-sectional
  21. 21. (panel) data analysis), whereas, data is usually analyzed by regression (one observation for each patient). Sometimes it is required to correlate with cross- sectional data such as geographic region, gender, smoking, alcohol use, and so on. Gathering information and documenting system specifications: The available information (such as the toxicological and pharmacokinetic profiles of the individual drug, the treatment indication or indications, the intended populations, etc.) might have been defined by relational databases. The backbone of this system might be SQL, Access or even Excel; but the data query may not be suited to the performance of detailed statistical analyses of data in this stage. It is then that SAS helps in statistical analysis. SAS has been interfaced with databases to allow large volumes of data to be retrieved efficiently for analysis. All engines can be assigned to a SAS library. This library is a place that saves all access to the stored files. These files might come from a variety of engines such as ODBC, SPSS, SYMBAS, REMOTE, META, MYSQL, ACEESS, ORACLE, DB2, MySQL, ACCESS, etc. For the processing of data, it is required to define all connections that might be created between the different sets of data records. The first link can illustrate correspondence of the MedDRA classifications to the patient records. In concentrating on the relevance of available data, medical information of patient works in tandem with MedDRA classifications to build queries and analysis information. As a part of application developing process, specifying the following information is required: 1. Source data: Miscellaneous data sources may exist and in order to get the correct results, the prescription drug information provided by drug firms should be truthful, balanced, and accurately communicated. The same applies to data coming from clinical and post-marketing trials, or spontaneous reports (submitted individually by doctors or patients). 
Dynamic data are operational data from internal systems such as the homegrown applications of clinics or hospitals, manual data from paper-chart patient histories, EMR (Electronic Medical Records), and Adverse Event Reporting (MedWatch).

2. Data Staging: This area covers the storage and processing of data extracted from the internal and external systems before they are loaded into a SAS data bank. The following is a list of cases.

• Information may be located in multiple SQL tables on a local computer or on external servers. If required, one may connect to the database server and use the data dynamically. For example, the Adverse Events Database includes side effects that are serious (such as death or risk of dying, hospitalization, disability, congenital anomaly, or required intervention to prevent permanent impairment or damage). These data are required for generating some particular reports.

• Part of the information is held in Aventis Reports or ClinTrace. When data from these two areas are needed together to complete an assignment, one can create an executable program that connects to the backbone databases of these two licensed vendor applications and uses the data.

Note: A basic knowledge of these databases helps programmers to write standard code. For example, an Aventis or ClinTrace Case ID (Manufacturer Control #) is assigned on an “Episode” basis for each patient. Adverse Events (reported side effects) that are temporally linked to the same episode are entered under the same Case ID. For drugs that are given intermittently, additional episodes (Case IDs) are created for events that occur after different treatment cycles.

• Side effects are stored in the company's Core Safety Data Sheets. These sheets are used for global labeling of reports and are based on the diagnoses, which are in turn assessed by seriousness.
All diagnoses reported from intensified monitoring (such as a clinical trial or post-marketing surveillance study) are assessed as associated or not associated with the study medications. These data may be joined to MedDRA information to build a larger directory that is used in SQL scripts.

• Drug providers use certain information, such as whether a side effect is caused by an internal or natural body process, in a causality algorithm for internal clinical interpretation or signal evaluation purposes. In some particular cases, this algorithm must be applied as part of the script logic in the SAS code. If a company has a computerized analysis application, then depending on the software it may be possible to open a connection and use this application from inside the SAS script.

• Data mining by diagnosis requires MedDRA information. It is recommended to use SAS scripting to create a remote connection that reads the MedDRA ASCII files and imports the data into temporary tables. These tables are deleted at the end of the scripting process.

Note: All transactions, such as queries, statistical analyses, or visualizations, that draw on these sources should be consistent. Sometimes the data are not sufficient to be consistent. To solve this problem, all “no match” data need appropriate transformation or conversion from their original form to the MedDRA representation.

3. Metadata: A term used to describe or specify the data. It is used to define all of the characteristics of data required to build databases and applications, and to support knowledge workers and information producers. This includes data element name, meaning, format, domain values, business integrity rules, relationships, owner, etc. For example, the following classification shows the analogy of data concepts in MedDRA:

    1. SOC
       MedDRA Code        Numeric
       MedDRA Term        String
    2. HLGT
       MedDRA Code        Numeric
       MedDRA Term        String

    3. PT
       MedDRA Code        Numeric
       MedDRA Term        String
       COSTART Symbol     AlphaNumeric
       WHO_ART Code       Numeric
       ICD9 Code          Numeric
       ICD-10 Code        Numeric
       HARTS Code         Numeric
       ICD9_CM Code       Numeric
       JART Code          Numeric
       * SOC Code         Numeric
       * SOC Name         Numeric

    4. LLT (Lowest Level Term)
       MedDRA Code        Numeric
       MedDRA Term        String
       WHO_ART Code       Numeric
       COSTART Symbol     AlphaNumeric
       ICD9_CM Code       Numeric
       CURRENCY           Character/Boolean
       HARTS Code         Numeric
       ICD9 Code          Numeric
       JART Code          Numeric

    * Multi-valued attribute

Defining metadata for the adverse event reporting data is also required. These data are:

o Patient identifier and patient information: age at time of event or date of birth, sex, weight, etc.
o Outcomes attributed to adverse events, such as death, life-threatening occurrences, hospitalization (initial or prolonged), disability, congenital anomaly, required intervention to prevent impairment/damage, or other.
o Date of event and date of report in mo/day/yr format.
o Description of problem.
o Relevant tests/laboratory data, including dates.
o Other relevant history, including preexisting medical conditions (e.g. allergies, race, pregnancy, smoking or alcohol use, hepatic/renal dysfunction, etc.)

Most medical clinics still use Paper Medical Records (PMRs), but many others have begun to use Electronic Medical Records (EMRs). No standard form has yet been defined for EMRs, but all provide the same information, which requires metadata definitions. These data are:

o Patient's primary reason for the medical visit
o History of onset of clinical signs and symptoms
o Current list of medications the patient is using
o Relevant past medical history, including hospital admissions, surgeries, and diagnoses
o History of family disease, such as diabetes, cancer, heart disease, and medical illness
o Social history: use of drugs, smoking, job stability, housing, living conditions, incarceration
o Review of systems: the patient's recollection of symptoms and current medical problems, such as trouble sleeping at night, panic episodes, and results of tests
o Physical examination: the clinician’s hands-on examination of the patient, including head, eyes, ears, nose, throat, chest, and extremities
o Labs, including blood glucose, cholesterol, and drug levels
o Studies such as X-ray, MRI, CT, and EKG
o Progress notes, such as a record of the temporal progression of signs and symptoms, labs, and studies for the length of the study or admission

4. The entity-relationship model
The specification of the required information for an adverse event serves as a starting point for constructing a conceptual schema (the overall design of the database) for the suggested database. The entity sets targeted here are the drug and patient entity sets. These entity sets are linked by a relationship that has attributes of its own, and this relationship is “many to many”. Other relationships might be designed between subsets of an entity set. The relationships between the drug entity set and the ingredient or side-effect entity sets are examples; these are “many to one” relationships. This method of design helps to save memory. In other cases, such as the patient-drug relationship, the participants are limited to at most two relations, which allows the relationship to be represented in one general set.

In the following diagram, small rectangles show the entity sets; large rectangles specify attributes; diamonds represent relationship sets; lines link attributes to entity sets and entity sets to relationship sets; arrows indicate that an entity falls exclusively into another entity; double lines indicate “many” relationship sets; bold diamonds show “many to one” relationship sets; and rectangles with non-indexed information indicate information about a relationship set.
[Figure: E-R diagram. The labels recoverable from the diagram are listed below.]

    Patient (entity set):
        1. id   2. First Name   3. Middle Name   4. Last Name   5. Date Of Birth
        6. Sex   7. Weight   8. race   9. country
        relevant information: 10. allergies   11. smoking   12. alcohol
        13. pregnancies   14. dysfunction   15. Lab results

    Drug (entity set):
        1. ID   2. generic name   3. trade name   4. the dosage range
        5. metric unit   6. category   7. the form of

    Patient-Drugs (relationship set):
        ID, Reason, date_of_event, date_of_report, therapy_start_date,
        therapy_end_date, diagnose information, Lot_number, Exp_Date,
        NDC Num, adverse_desc, route and dosage

    Drug-ingredient (relationship set):
        1. ID   2. Name   3. Value   4. Unit

    Adverse reactions and side effects:
        1. MedDRACode

The above E-R model is a sample of what can be considered; the attributes can be designed in more detail. For example, ‘route and dosage’ could be designed as a separate entity because it includes many optional attributes that may be concatenated together as a descriptive text. They
may also be saved separately in a data source. This E-R model gives substantial flexibility in designing the basic database schema.

Accessing and Manipulating Data:

The first step in accessing and manipulating data is the DATA step. The DATA step is used for accessing and reading the data and for programming the data processing. As explained before, one of the strengths of SAS is fast and easy access to many different sources. In addition to the programming components, SAS has many other features in the DATA step that help to develop a standard application. The SAS language has all the statements required for typical data processing. Among these are reading and adding raw data files and SAS data sets and writing the results. Sub-setting data, combining multiple SAS files, creating SAS variables, recoding data values, and creating listing and summary reports that include advanced analysis features such as web analytical solutions are also possible. Special focus should be placed on the management of SAS data set input and output, working with different data types, and the manipulation of data. It may also be necessary to control the SAS data set input and output, to combine and summarize data, and then to process them iteratively in order to perform data manipulations and transformations.

Access to the data is needed first. Sometimes the required data file is saved on another server. With an FTP server running, SAS can make an FTP connection and use the external data source remotely, without leaving any copy of the downloaded data on the machine unless SAS writes it out. As an example, assume the data belong to cps-user and are located at ~/halley/thesis/main.data.

    filename fromrcr ftp 'main.data' cd='halley/thesis'
             user='cps-user' host='cps.brockport.edu'
             recfm=v prompt;
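As a minimal sketch of the DATA step operations named in this section (sub-setting, creating variables, and recoding values), the following step may help. The data set names, variables, sample values, the coding 1 = male and 2 = female, and the assumption that weight is recorded in pounds are all hypothetical.

```sas
/* Hypothetical input data; values are for illustration only. */
data work.patients;
    input PatientId $ age sex weight;
    datalines;
P001 30 1 200
P002 40 2 180
P003 12 1 100
P004 70 2 150
;
run;

data work.adults;
    set work.patients;
    where age >= 18;                         /* sub-setting: keep adult patients */

    length sex_label $ 7 age_group $ 6;
    if sex = 1 then sex_label = 'male';      /* recoding a numeric code */
    else if sex = 2 then sex_label = 'female';
    else sex_label = 'unknown';

    if age < 65 then age_group = 'adult';    /* creating a grouping variable */
    else age_group = 'senior';

    weight_kg = weight * 0.45359237;         /* assumes weight is in pounds */
run;

proc print data=work.adults;
run;
```

The WHERE statement filters observations as they are read, so the recoding logic only runs on the rows that are kept.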
Much of the data might arrive as raw data, which must be entered into a SAS data set. As an example, one of the clients might send a letter or a txt file that includes parts of a patient’s information. The following script shows how to input these data into a SAS data set.

    data PatientInfo;
       infile 'c:\thesis\data1.txt';
       input PatientId $ 1-13 age 14-17 sex $ 18-23 weight 24-30 +2 country;
    run;

    proc print data=PatientInfo;
    run;

    The SAS System            05:25 Thursday, December 15, 2005   5

    PatientId      age   sex   weight   country
    Hzan0616341     30   1       200      11
    Amir5666892     40   2       180      12
    J675bhgfdql     56   2         .      45   ***
    Nmjhg567908     12   1       100      23
    Iu6-567-567     99   1       170      01

    *** A missing value for a numeric variable is represented by a period (.)

Processing Examples:

• To use external files, SAS must be told where to find them. There are the following choices:
1- Identify the file directly in the INFILE, FILE, or other SAS statement that uses the file.
2- Set up a fileref for the file by using the FILENAME statement, and then use the fileref in the INFILE, FILE, or other SAS statement.
3- Use operating environment commands to set up a fileref, and then use the fileref in the INFILE, FILE, or other SAS statement.

Note: To use several files or members from the same directory, partitioned data set (PDS), or MACLIB, use the FILENAME statement to create a fileref that identifies the directory, PDS, or MACLIB. Then use the fileref in the INFILE statement and enclose the name of the file, PDS member, or MACLIB member in parentheses immediately after the fileref, as shown in the example below:

    /* filename data 'directory-or-PDS-or-MACLIB'; */
    /* data1.txt and data2.txt are located in directory c:\thesis */
    filename data 'c:\thesis';

    data paitientdata1;
       infile data('data1.txt');
       input PatientId $11. +2 age 2. +1 weight 3. +2 sex 1.
             +2 date1 mmddyy10. +2 date2 mmddyy10. +2 country 12.;
    run;

    data paitientdata2;
       infile data('data2.txt');
       input PatientId $11. +2 age 2. +1 weight 3. +2 sex 1.
             +2 date1 mmddyy10. +2 date2 mmddyy10. +2 country 12.;
    run;

• Also, from the File menu, ADX can import data from a SAS data set, an Access database, an Excel spreadsheet, a dBase database, a delimited text file, and files in other common formats. This is helpful when information has been saved in a variety of formats.

• In SAS one can gain access to data sources by defining a ‘libref’ and assigning access to them without copying them inside the SAS
environment. A ‘libref’ makes a shortcut to the metadata on the SAS Metadata Server. Any metadata in the SAS Metadata Server can be read through the Meta engine, which has options for controlling its output. With the metadata-only setting, the Meta engine creates just the metadata in the repository and does not affect the data source: if the table does not exist in the data source, the engine creates the metadata based on the information specified in the application for the output table, and deleting a table deletes the metadata from the repository but not the table from the data source. With the data-only setting, deleting a table deletes the table from the data source but does not delete the metadata from the repository.

A SAS library includes metadata objects that are defined by a ‘libref’. These objects define the engines that are used to process the data. The library is addressed with a URI (Uniform Resource Identifier). To get access to a SAS Metadata Server, define the host address. If working in a TCP network, define the port number. If the protocol is bridge rather than com, define a user ID and password; otherwise it will not be possible to log into the SAS Metadata Server. In addition, any metadata repository may be selected by its repository ID or name. To access these tables, one can use SAS/Warehouse Administrator as a tool. To locate the metadata, the objects must be identified and searched by their name, URI, and other identifiers such as their ID. The following script displays this process.

    libname upcase meta liburi="SASLibrary?@name='oralib'"
            ipaddr=d6292.us.GCS.com;

Scripting:
The goal of SQL scripting is to derive the available data from any possible data source. Most vendor applications have an SQL backbone, so with SQL scripting it is possible to perform queries on original or manipulated data (retrieving data from multiple tables; creating views, indexes, and tables; updating or deleting values in existing tables and views; and summarizing them). SQL scripting can be done in the SAS or the SQL environment. In the following example, the earlier E-R schema is reduced to tables from inside the SQL environment:

    /*------------------------------------------------------------*/
    /* create a higher-level entity set for drug information */
    CREATE TABLE drug (
        id CHAR(12) NOT NULL,
        generic_name CHAR(25),
        trade_name CHAR(25),
        dosage INT,
        unit INT,
        category INT,
        FOREIGN KEY (category) REFERENCES drug_category(category_id)
            ON DELETE CASCADE,
        FOREIGN KEY (unit) REFERENCES unit(unit_id)
            ON DELETE CASCADE,
        PRIMARY KEY (id)
    ) ENGINE=INNODB;

    /* create the lower-level entity sets for drug information */
    CREATE TABLE ingredient (
        id INT,
        drug_id CHAR(12),
        ingredient_name CHAR(25),
        ingredient_value INT,
        unit INT,
        INDEX drug_ind (drug_id),
        FOREIGN KEY (drug_id) REFERENCES drug(id)
            ON DELETE CASCADE,
        FOREIGN KEY (unit) REFERENCES unit(unit_id)
            ON DELETE CASCADE
    ) ENGINE=INNODB;

    /* the side effects of each drug have a description that should be
       compatible with the MedDRA classification */
    CREATE TABLE sideeffects (
        MedDRACode INT,
        drug_id CHAR(12),
        INDEX drug_ind (drug_id),
        FOREIGN KEY (drug_id) REFERENCES drug(id)
            ON DELETE CASCADE
    ) ENGINE=INNODB;

    /* create a general entity set for patient information; this entity set
       can be expanded by other entity subsets such as patient laboratory
       information or more information about the history of that patient */
    CREATE TABLE paitient (
        id CHAR(12) NOT NULL,
        first_name CHAR(25),
        middle_name CHAR(25),
        last_name CHAR(25),
        DateOfBirth DATE,
        Sex INT,
        weight INT,
        race INT,
        country INT,
        /* the original referenced drug(race_id) and drug(country_id), which
           do not exist; the races and countries lookup tables used elsewhere
           in this section are assumed here */
        FOREIGN KEY (race) REFERENCES races(races_id)
            ON DELETE CASCADE,
        FOREIGN KEY (country) REFERENCES countries(ipcode)
            ON DELETE CASCADE,
        PRIMARY KEY (id)
    ) ENGINE=INNODB;

    /* some relevant patient information might come from the following
       suggested sub entity set */
    CREATE TABLE Relevant_Patients_Info (
        Info_id INT NOT NULL AUTO_INCREMENT,
        paitient_id CHAR(25) NOT NULL,
        allergies_id INT,
        races_id INT,
        Num_pregnancies INT,
        smoking INT,
        alcohol_use INT,
        hepatic_id INT,
        dysfunctions_id INT,
        INDEX (allergies_id),
        FOREIGN KEY (allergies_id) REFERENCES allergies(allergies_id)
            ON UPDATE CASCADE ON DELETE RESTRICT,
        INDEX (races_id),
        FOREIGN KEY (races_id) REFERENCES races(races_id)
            ON UPDATE CASCADE ON DELETE RESTRICT,
        INDEX (hepatic_id),
        FOREIGN KEY (hepatic_id) REFERENCES hepatic(hepatic_id)
            ON UPDATE CASCADE ON DELETE RESTRICT,
        INDEX (dysfunctions_id),
        FOREIGN KEY (dysfunctions_id) REFERENCES dysfunctions(dysfunctions_id)
            ON UPDATE CASCADE ON DELETE RESTRICT,
        INDEX (paitient_id),
        FOREIGN KEY (paitient_id) REFERENCES paitient(id)
            ON UPDATE CASCADE ON DELETE RESTRICT,
        PRIMARY KEY (Info_id)
    ) ENGINE=INNODB;

    /* Transforming an E-R model that includes aggregation to tabular form is
       straightforward. The Patient-Drug relationship includes a column for
       each attribute in the primary key of the entity sets in this
       relationship (any concomitant medical products that the patient uses
       and the therapy dates might come from related tables through the drug
       id and patient id). Any available adverse event information that shows
       the problem with using that drug should also be included. */
    CREATE TABLE Patients_drugs (
        Info_id INT NOT NULL AUTO_INCREMENT,
        paitient_id CHAR(25) NOT NULL,
        drug_id CHAR(12) NOT NULL,
        therapy_start_date DATE,
        therapy_end_date DATE,
        MedDRACode_DiagnoseForUse INT,
        /* 1 == yes, 2 == no, 3 == doesn't apply */
        /* event abated after use stopped or dose reduced */
        Quest1 INT,
        /* event reappeared after reintroduction */
        Quest2 INT,
        Lot_number INT,
        Exp_Date DATE,
        NDCno INT,
        reason INT NOT NULL,
        date_of_event DATE,
        date_of_report DATE,
        adverse_desc TEXT,
        /* ----- */
        INDEX (paitient_id),
        FOREIGN KEY (paitient_id) REFERENCES paitient(id)
            ON UPDATE CASCADE ON DELETE RESTRICT,
        INDEX (drug_id),
        FOREIGN KEY (drug_id) REFERENCES drug(id)
            ON UPDATE CASCADE ON DELETE RESTRICT,
        PRIMARY KEY (Info_id)
    ) ENGINE=INNODB;

SQL scripting is required to generate reports on summary statistics. The SQL procedure, together with the macro language, makes it possible to write SQL inside the SAS environment. SQL scripting therefore extends SAS coding to the retrieval and combination of data from tables or views. New ones can be created along with
indexes, and data values in PROC SQL tables can be updated. It is also possible to update and retrieve data from Database Management System tables, or to modify a PROC SQL table by adding, modifying, or dropping columns.

Example: Assume the adverse event information from clinical studies, post-marketing trials, spontaneous reports, and miscellaneous sources (including independent drug identification numbers and retrospective data collection) is saved in the above SQL tables. The following script generates a report that shows the country of origin for patients receiving a drug in a post-marketing setting.

    proc sql;
       /* Extracts and manipulates grouped and ordered data from the patient
          records to create a temporary view that holds the patient
          population of each country. The country field is stored as an id
          number; to show the country name, the view is later joined to the
          countries table. After the report is produced, the temporary view
          is dropped. */
       create view temp as
       select country,
              count(country) as count,
              calculated Count/Subtotal as Percent format=percent8.2
       from paitient,
            (select count(*) as Subtotal from paitient) as survey2
       group by country
       order by count;
    quit;

    proc sql;
       /* extracts the required data from the temporary view */
       title1 'Country or Origin for Patients Receiving the suspected drug in a Postmarketing Setting';
       select c.countryname, t.count as cc, "(", t.Percent, ")"
       from countries c, temp t
       where c.ipcode = t.country;
    quit;

    proc sql;
       /* drop the temporary view */
       drop view temp;
    quit;
    Country or Origin for Patients Receiving the suspected drug in a Postmarketing Setting
                                            22:04 Monday, January 16, 2006

    CountryName                 Percent
    ------------------------------------------
    Greece                  1   (0.1%)
    Uruguay                 2   (0.2%)
    Taiwan                  2   (0.2%)
    French Polynesia        2   (0.2%)
    Peru                    2   (0.2%)
    Korea                   2   (0.2%)
    South Africa            3   (0.2%)
    Portugal                3   (0.2%)
    Turkey                  4   (0.3%)
    Hungary                 4   (0.3%)
    Austria                 4   (0.3%)
    New Zealand             7   (0.5%)
    Brazil                  7   (0.5%)
    Norway                 10   (0.8%)
    Israel                 11   (0.8%)
    Chile                  15   (1.1%)
    Netherlands            26   (2.0%)
    Italy                  39   (3.0%)
    Spain                  38   (2.9%)
    Belgium                38   (2.9%)
    United States          42   (3.2%)
    Finland                44   (3.4%)
    Germany                50   (3.8%)
    Sweden                 69   (5.3%)
    Denmark                91   (7.0%)
    Canada                 97   (7.4%)
    Australia             107   (8.2%)
    Great Britain         271   (20.8%)
    France                313   (24.0%)

Patient exposure to the drug can be calculated and presented in different ways. Although exposure data are available for a whole period of time, the primary focus of a submitted report may be the number of exposures and cases that occurred in a specific period. In the following report, global patient exposures from 1989 to 1994 are provided:

    proc sql;
       create view temp1 as
       select region, count(region) as SachetSales
       from paitient
       group by region
       order by SachetSales;
    quit;
    proc sql;
       create view temp2 as
       select region, count(region) as Exposures
       from paitient
       /* assumes the therapy dates are stored as mm/dd/yyyy text */
       where id in (select paitient_id
                    from Patients_drugs
                    where substr(therapy_start_date,7,4) > '1983'
                      and substr(therapy_end_date,7,4) < '2001')
       group by region
       order by Exposures;
    quit;

    proc sql;
       title1 'Worldwide Patient Exposure to the suspected drug 1989 to 1994';
       select c.region, t1.SachetSales, t2.Exposures
       from countries c, temp1 t1, temp2 t2
       where c.ipcode = t1.region and c.ipcode = t2.region;
    quit;

    proc sql;
       /* join on region so that each region is counted once in the totals */
       select sum(t1.SachetSales) as SumSachetSales,
              sum(t2.Exposures) as SumExposures
       from temp1 t1, temp2 t2
       where t1.region = t2.region;
    quit;

    proc sql;
       drop view temp1, temp2;
    quit;

    Worldwide Patient Exposure to the suspected drug 1989 to 1994
                                            20:55 Saturday, January 21, 2006

    Region            SachetSales      Exposures
    -----------------------------------------------
    Europe            230,649,500      1,895,749
    Australia           5,292,542         43,500
    Korea               3,067,300         25,211
    Canada              1,497,100         12,305
    Rest of World       2,405,064         19,768

    SumSachet         SumExposures
    --------------------------------
    242,911,506       1,996,533

Inside the SQL scripting, one may occasionally work with data that were imported from the MedDRA application. These data may already exist on a machine, in which case it is not necessary to access the MedDRA environment a
second time. One can use the SAS utilities to convert data from one form to another or to copy them between machines. A free trial of MedDRA is available on the MSSO website. It contains a sample copy of MedDRA data saved in an Access database, which could also be imported into an Excel file if needed. If the data set is standard and complete, it is better to use it as a shared data source. This shared data source may be stored in a Relational Database Management System (RDBMS), in an Excel spreadsheet, or even in a flat file. If it is stored on an external machine, it becomes an external data source and a SAS connection is required for access.

The following SAS script retrieves the MedDRA classification from a data source. It imports data from an external file (a spreadsheet) into a SAS table. This code was generated and saved by the import wizard. Saving this type of script avoids redoing the work when the information is needed again.

    PROC IMPORT OUT=WORK.MEDDRAInfo
                DATAFILE="C:\thesis\CTCAEv3.xls"
                DBMS=EXCEL REPLACE;
       SHEET="'CTCAE v3#0 MedDRA Codes$'";
       GETNAMES=YES;
       MIXED=NO;
       SCANTEXT=YES;
       USEDATE=YES;
       SCANTIME=YES;
    RUN;

The following script works as well:

    filename xclfil 'C:\thesis\CTCAEv3.xls';

    proc import datafile=xclfil out=WORK.MEDDRAInfo dbms=excel97 replace;
       getnames=yes;
    run;
The above script retrieves the MedDRA classification from a data source. Often these data do not represent all of the MedDRA data. Usually only a subset is required, and it is stored in an external file. Assume MedDRAClassifications.xls includes only the MedDRA classification data. To generate reports related to side effects, importing this file is enough to retrieve the appropriate information on symptoms or signs listed by outcome.

    PROC IMPORT DBMS=EXCEL OUT=work.MedDRA
                DATAFILE="c:\thesis\MedDRAClassifications.xls" REPLACE;
    RUN;

    /* For a comma-delimited copy of the file, an INFILE statement inside a
       DATA step can be used instead:
       infile 'c:\thesis\MedDRAClassifications.csv' delimiter=',' dsd; */

    proc print data=MedDRA;
    run;

    The SAS System            05:25 Thursday, December 15, 2005   1

    Obs   MedDRATermLevel1            MedDRATermLevel2
      1   Nervous system disorders
      2                               Balance disorder
      3                               Convulsion
      4                               Lethargy
      5                               Optic neuritis
      6                               Paraesthesia
      7                               Speech disorder
      8                               Tunnel vision
      9                               Visual field defect
     10
     11   Eye disorders
     12                               Astigmatism
     13                               Blindness
     ...  ...                         ...

* Sometimes the information that comes from an adverse event report, clinical trials, or any other post-marketing or pharmacovigilance application has a provisional order number assigned to outcome data that cannot be correctly mapped to MedDRA. These order numbers alone can be used when electronic reports or data are submitted, and they are automatically converted to the MedDRA codes.
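One way to picture the conversion of provisional order numbers to MedDRA codes is a lookup join. The following is a sketch only: the table names, column names, and code values are hypothetical placeholders and are not real MedDRA codes.

```sas
/* Hypothetical lookup table: provisional order number -> MedDRA code. */
data work.order_map;
    input OrderNo MedDRACode;
    datalines;
1 90000001
2 90000002
3 90000003
;
run;

/* Hypothetical incoming reports that carry only the order number. */
data work.reports;
    input ReportId $ OrderNo;
    datalines;
R01 1
R02 3
R03 7
;
run;

proc sql;
    /* attach the MedDRA code where a mapping exists */
    create table work.coded as
    select r.ReportId, r.OrderNo, m.MedDRACode
    from work.reports as r
         left join work.order_map as m
         on r.OrderNo = m.OrderNo;

    /* list the "no match" rows that still need conversion or review */
    select ReportId, OrderNo
    from work.coded
    where MedDRACode is missing;
quit;
```

A left join is used so that reports without a mapping are kept rather than silently dropped; they surface in the second query as the “no match” data discussed earlier.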
From the parameter list created, values can be individually highlighted and chosen for processing. These required parameter values may be retrieved from tables that have been created by scripts such as the following:

    proc sql;
       create table reasonlist1
          (Description char(60));
       insert into reasonlist1
          values('Patient Died')
          values('Life threatening illness')
          values('Required emergency room/doctor visit')
          values('Required hospitalization')
          values('Resulted in permanent disability')
          values('Resulted in prolongation of hospitalization')
          values('others');
    quit;

The ordering of the above parameter values is important for selecting the rows by their order number, and the descriptions of these values must be the same as those found on the FDA forms.

The following script creates a parameter table for the abbreviations used in drug safety reporting. The ordering and description of these abbreviations are also consistent with FDA standards. (In the original listing the abbreviation column was truncated and the descriptions from ICD9-10 onward were shifted against their abbreviations; the pairs are realigned here and the columns widened.)

    proc sql;
       create table abbreviations
          (abb char(8), Description char(100));
       insert into abbreviations
          values('ADR','adverse drug reaction')
          values('AE','adverse event')
          values('AERS','Adverse Event Reporting System')
          values('bid','twice daily')
          values('CI','confidence interval')
          values('CIOMS','Council for International Organizations of Medical Sciences')
          values('COSTART','Coding Symbols for Thesaurus of Adverse Reaction Terms')
          values('CSDS','Core Safety Data Sheet')
          values('CV','coefficient of variation')
          values('FDA','Food and Drug Administration')
          values('GABA','Gamma amino butyric acid')
          values('HARTS','')
          values('IBD','International Birth Date')
          values('ICD9-10','International Classification of Diseases, 9th and 10th Editions/Revisions')
          values('ICD9CM','International Classification of Diseases, Ninth Revision, Clinical Modification')
          values('ICH','International Conference on Harmonisation')
          values('MedDRA','Medical Dictionary for Regulatory Activities')
          values('NDA','New Drug Application')
          values('PSUR','Periodic Safety Update Report')
          values('qd','once daily')
          values('qid','four times daily')
          values('SAE','serious adverse event')
          values('SD','standard deviation')
          values('SE','standard error')
          values('US','United States')
          values('WHO-ART','World Health Organization Adverse Reaction Terminology');
    quit;

Formatting may be used for other parameter values. The ATTRIB statement permanently associates a format with a variable. SAS uses the format to write the values of the variables specified.

    attrib sales1-sales3 format=comma10.2;

Because the ATTRIB statement in the above command makes the association permanent, any subsequent DATA step or PROC step uses the COMMA10.2 format to write the values of sales1, sales2, and sales3.

In addition to the default formats supplied by Base SAS software, one can create custom formats with the FORMAT procedure. The following format procedure defines static parameter values that may be required. It expresses weights and measures using USP (United States Pharmacopeia) standard abbreviations for dosage units.

    proc format;
       value $dosage_units
          '1'  = 'm'
          '2'  = 'kg'
          '3'  = 'g'
          '4'  = 'mg'
          '5'  = 'mcg'
          '6'  = 'L'
          '7'  = 'mL'
          '8'  = 'mEq'
          '9'  = 'mmol'
          '10' = ' %';
    run;

    * see legend below for definitions
    (1) m (lower case) = meter
    (2) kg = kilogram
    (3) g = gram
    (4) mg = milligram
    (5) mcg = microgram (do not use the Greek letter mu, which has been misread as mg)
    (6) L (upper case) = liter
    (7) mL (lower/upper case) = milliliter (do not use cc, which has been misread as U or the number 4)
    (8) mEq = milliequivalent
    (9) mmol = millimole

The FORMAT procedure can also be used to define a format for the dosage form of the drug in question (see the procedure below):

    proc format;
       value $dosage_form
          '1'  = 'capsule'
          '2'  = 'cream'
          '3'  = 'ear drop'
          '4'  = 'eye drop'
          '5'  = 'inhaler'
          '6'  = 'injection'
          '7'  = 'oral solution'
          '8'  = 'solution'
          '9'  = 'suspension pediatric drop'
          '10' = 'syrup'
          '11' = 'tablet'
          '12' = 'chewable tablet'
          '13' = 'other';
    run;

Formats for time durations, age ranges, and similar values can be defined in the same way:

    proc format;
       value $time_duration_form
          '1' = 'hour'
          '2' = 'day'
          '3' = 'week'
          '4' = 'month'
          '5' = 'year';
    run;

    proc format;
       value $age_range_form
          '1' = 'children'
          '2' = 'adult';
    run;

    proc format;
       value $eating_format
          '1' = 'with meal'
          '2' = 'without meal'
          '3' = 'before meal'
          '4' = 'after meal'
          '5' = 'with a glass of water'
          '6' = 'other';
    run;

    proc format;
       value $time_format
          '1' = 'morning'
          '2' = 'noon'
          '3' = 'after noon'
          '4' = 'evening'
          '5' = 'midnight';
    run;

Other values are a combination of the formats defined above. For example, drug labels may read: “for adults, every morning, 2 tablets, 2 hours before meals, with a glass of water” or “for children, under 8 years of age, ½ a tablet before meals, with a glass of water….”

In a database, grouping may be based on the “Sex/Gender” field, where the values “Male”, “Female”, and “Unknown” define minor groupings. These values can be stored as numeric codes (1, 2, and 3). The ordering of the numeric levels of classification variables must be chosen with care. If a statistical report requires the data for female patients to appear after the data for males, the “Sex/Gender” field would use “2” for females and “1” for males. The following SAS script describes this formatting.
    proc format library=proclib;
       value $sex
          '1' = 'male'
          '2' = 'female'
          '3' = 'unknown';
       picture pop low-high = '000,000,000';
    run;

Formatting has other uses in scripting. Many data values must be defined by
a format. In SAS, a format can be applied with any of the following:

    1. PUT, PUTC, or PUTN functions
    2. %SYSFUNC macro function
    3. FORMAT/ATTRIB statement in a DATA step or a PROC step

    data _null_;
       num = 15;
       char = put(num, hex2.);      /* char is '0F' */
       population = 1145.32;
       put population comma10.2;    /* result: 1,145.32 */
    run;

A format can also be applied outside a DATA step by wrapping %SYSFUNC in a
user-defined macro, which formats the value passed to it:

    %macro tst(amount);
       %put %sysfunc(putn(&amount,dollar10.2));
    %mend tst;

    %tst(1154.23);

Patient records often come from an Open Database Connectivity (ODBC) data
source. It is quite possible that these data already exist as the backbone
of a medical client-server application. In this case, access to the data
via ODBC is required, and the module "SAS/ACCESS for ODBC" must be
installed on the computer. It may also be necessary to configure the
database connection by referring to the DSN (Data Source Name) and
specifying how it is accessed. Even parameter values
can come from an ODBC source. These data may have dynamic values that are
updated by end users through the web. Normally, these applications have
administration parts that allow the end user to update parameters.

Example: The following script shows how one can use part of the data stored
in another vendor's Database Management System (DBMS) files. This data then
goes into a SAS data set. In the script, a libref is declared that points
to a library containing Oracle data, and SAS reads data from an Oracle
table into a SAS data set:

    libname dblib oracle user=halley password=halley path='hrdept_002';

    data paitient.big;
       set dblib.paitient;
    run;

Space allocation is an important concept in creating or extending a data
library. SAS allows space to be requested as needed. To optimize system
performance and allocate space appropriately, one can pre-allocate the
largest amount of space that may be needed. These methods are used more
often when multivolume access to SAS data libraries is required. The DATA
statement above may then change to the following (ALQ= and DEQ= are
host-specific data set options, available under OpenVMS):

    /* Known to be a big data set: pre-allocate space. */
    data paitient.big (alq=100000 deq=5000);

As explained earlier, data can come from an external data file.
Additionally, one can connect to a remote data source and work on it. The
following script connects to a z/OS server and a UNIX server to use DB2 and
Oracle data:

    /*************************************/
    /* connect to z/OS                   */
    /*************************************/
    options comamid=tcp;
    filename rlink '!sasroot\connect\saslink\tcptso.scr';
    signon os390host;

    /*************************************/
    /* download DB2 data views using     */
    /* SAS/ACCESS engine                 */
    /*************************************/
    rsubmit os390host;
       libname db db2;
       proc download data=db.paitient out=db2dat;
       run;
    endrsubmit;

    /*************************************/
    /* connect to UNIX                   */
    /*************************************/
    options remote=hrunix comamid=tcp;
    filename rlink '!sasroot\connect\saslink\tcpunix.scr';
    signon;

    /*************************************/
    /* download Oracle data using        */
    /* SAS/ACCESS engine                 */
    /*************************************/
    rsubmit hrunix;
       /* libref "oracle", assigned with the oracle engine */
       libname oracle oracle user=hzan password=halley;
       proc download data=oracle.paitient out=oracdat;
       run;
    endrsubmit;

    /*************************************/
    /* sign off both links               */
    /*************************************/
    signoff hrunix;
    signoff os390host cscript='!sasroot\connect\saslink\tcptso.scr';

    /*************************************/
    /* union data into SAS view          */
    /*************************************/
    /* UNION ALL keeps rows from different sources that happen */
    /* to have identical counts; plain UNION would drop them.  */
    proc sql;
       create view temp_joindata as
          (select gender, country, count(*) as population
             from db2dat
             group by gender, country)
          union all
          (select gender, country, count(*) as population
             from oracdat
             group by gender, country)
          union all
          (select gender, country, count(*) as population
             from paitient1
             group by gender, country);

       /* Sum the per-source counts and replace the country code */
       /* with the country name.                                 */
       create view jointdata as
          select t.gender,
                 c.name as country,
                 sum(t.population) as population
             from temp_joindata as t, countries as c
             where c.codeId = t.country
             group by t.gender, c.name
             order by t.gender, c.name;
    quit;

    options fmtsearch=(proclib);

    /* The NOWD option runs the REPORT procedure without the REPORT */
    /* window and sends its output to the open output               */
    /* destination(s).                                              */
    proc report data=jointdata nowd;
       column gender country population;
       format gender $sex. country $50. population pop.;
       title 'Country of Origin for Patients Receiving the Drug in Post-marketing';
    run;

    Country of Origin for Patients Receiving this Drug in Post-marketing
    for 04JAN06

    Gender     Country             Population
    Female     Algeria                743,453
    Male                              235,984
    Unknown                               167
    Female     Denmark            423,457,698
    Male                          546,876,345
    Unknown                               897
    Female     Spain             456,9812,564
    Male                          400,987,564
    Unknown                               234
    Female     United Kingdom     876,234,123
    Male                          564,234,876
    Unknown

Conclusions:

This thesis proposes ways to improve programming practices for
standardizing Drug Safety Reporting Systems. The quality of a Drug Safety
Reporting Application depends on the system architecture, methodologies, and
modeling used by the programmer. The degree to which an implementation is
standardized is in direct proportion to the correctness of the methods used
for accessing, gathering, and manipulating the data, and for its
classification, control code, quality control, formatting, statistical
analysis, and mining.

Classification terms should follow a hierarchical structure that is
consistent with FDA standards and MedDRA. Using the control code with
MedMinder and the SCM is also important; programmers should overlook
neither this nor quality control. Data must be formatted properly and,
again, consistently with FDA standards. Statistical analysis and data
mining in these types of applications must also be done correctly, as they
have a direct effect on the results. Ultimately, data gathering and access
should be handled dynamically, and manual access should not be considered.
Above all, details such as the size of the data should be handled carefully
in the data access stage.

As for the professionals working on the system, an advanced background in
computational, mathematical, and programming methods is obligatory for
applying these methods accurately. SAS programming, knowledge of
object-oriented data structures, database modeling, and SQL are all
necessary skills for implementing a Standard Drug Safety Reporting System.
Knowledge of statistical modeling is particularly desirable in data mining
applications. Finally, a computational science graduate or a professional
software designer with good scripting skills can make the application work
more dynamically and accurately.

The workbench of Drug Safety Reporting Systems is made up of SAS and MedDRA
applications. SAS supports advanced data access technology, and the MedDRA
classification matches the metadata required for designing this
application. These existing components improve the reliability of the
design, and SQL scripting extends it.
