The State University of New York at Brockport
             Department of Computational Science




Standardization of “Drug Safety” Reporting Applications


                        Halley M. Zand

                         Winter, 2005




                Thesis advisor: Dr. Robert Tuzun




Abstract:

The purpose of this thesis is to develop an application process for preparing drug
safety reports. The FDA is responsible for protecting the public health by assuring
the safety and security of human and veterinary drugs. Annually, companies that
provide medications are required to generate reports that assure the FDA of a
drug's safety.


This thesis proposes an Information Technology infrastructure model that gives
drug providers' IT organizations a strategic perspective on how to computerize
their Drug Safety reporting activity. It introduces software development concepts,
methods, techniques, and tools for collecting data from multiple platforms and
generating reports from them with scripted queries.


Introduction:


According to guidance documents from the U.S. Food and Drug Administration's
Center for Drug Evaluation and Research, all prescription drugs, both new and
generic, must be approved by the FDA. To obtain these approvals, drug providers
are required to generate annual reports on product safety and attach them to their
application letter. In addition, any person can report a reaction or problem with a
drug to the FDA. The FDA reviews applications and all reported clinical outcomes
to determine whether the reported events happened for other reasons or because of
use of the suspected drug.


Manual reporting is not practical because of the large volume of data and the
differing platforms and formats in which they are stored. Unfortunately, tools and
standards are often poorly used because of a lack of database application modeling,
programming, and software engineering skills. User applications are often
cobbled together with little more efficiency than manual processing, and tools for
automation and large-scale data processing are not utilized.



Hiring qualified staff and carefully selecting software increase quality and reduce
costs. A two-hour job may take a week because of poor technical skills, and the
cost of software licensing may rise from 10,000 USD by as much as 5,000 USD
when too little attention is paid to the productivity of the software tool. A
standardized IT infrastructure provides higher computational quality at lower cost.
In addition, professional developers with computational science backgrounds are
the only group with sufficient computational knowledge and bookkeeping skills
for software application design and the ability to apply technical concepts.


Merging Computational Science with Drug Development Science for Drug Safety
Evaluation can evolve within a modern computing environment; and because
computational technology grows quickly, designers need an advanced vision for
the future. Strong knowledge of computational science and bookkeeping helps
developers use what is available and progress forward from it.


This thesis explains a modern computational architecture for implementing Drug
Safety Reporting Applications. This architecture uses advanced IT concepts to
increase the quality of work on large volumes of data that may be dynamic rather
than static and come from distributed computer networks. This thesis aids the
study of Drug Safety in obtaining the best possible software solution.

Objectives:


SAS is the software application that developers use to provide high-quality
reporting applications for Drug Safety. A collection of concepts working together
is required to achieve a computer-based method for Drug Safety evaluation. This
paper proposes an infrastructure that uses optimal solutions for this process. The
aim is to use the information gathered to develop the system as a whole. It can
accept data from both paper and electronic databases. Databases such as Oracle
and Microsoft Access can be considered backbones of the system. All
computational terminologies recommended for this proposed infrastructure must
be explained. For example, in some cases data mining might be used to find a
pattern and help estimate descriptions of a data field. This data mining ability of
the proposed architecture should be illustrated.


In this thesis, entity-relational database modeling as well as data accessing,
formatting, classification, and scripting are illustrated through examples and
through creating descriptions of longitudinal data. The discussion focuses on code
consistency, with all essential attributes and their efficiencies in the proposed
infrastructure. The proposed software should support maintainability, but
techniques focused on the data error concept are not within the scope of this
paper. In order to achieve the best result, we need to use all available pieces of
accurate data and perform the correct programming processing. These data can
come from health care providers, consumers, literature, and other relevant
databases. It is important to find the ordinary errors during scripting. A missing
part or step in coding for data processing (extracting and retrieving data,
manipulating data, or making narrative data from queries and assessing them) can
make a large difference in the expected result and the accuracy of the reports.


Technical Specifications:



 Data accessing:


SAS data might come from other application platforms. These data might be
formatted or unformatted and therefore filed differently in varied environments.
Accessing these data from several servers is done in the following steps.



      I. Use the SAS ODBC driver to communicate with either local or remote
         SAS servers using the TCP/IP protocol. Data can come from a local,
         remote, or any type of database server. They can be in any format,
         including raw data or any vendor's software data set. The ability to read
         raw data in any format, from any kind of file (including variable-length
         records, binary files, and free-formatted data, even files with messy or
         missing data) is required.
      II. Combine and manipulate these data on the client side, analyze the
         resulting data, and distribute them by making an executable file from the
         server to multiple clients.
The following are examples of possible cases in data accessing:


              a. Data may exist on a mainframe computer or a PC network. These
                  data might be joined to an existing data set, create new variables
                  (columns), and produce tables and interactive graphs.
              b. Raw data may exist on a UNIX server. Compute other data values
                  from them, form statistics, and create an HTML report for use in
                  web application systems, then store it on a web server on an
                  intranet/internet platform.
              c. Access may be needed to BMDP, SPSS, and OSIRIS files directly,
                  as well as to files such as Microsoft Excel spreadsheets, Microsoft
                  Access tables, dBase, ORACLE forms, and any other DBMS. In
                  addition, both relational and non-relational databases, including
                  any PC data source, can be considered as a data file.
              d. Relational databases in DB2 format may exist in an OS/390, VM,
                  UNIX, or PC environment.
              e. ODBC, Informix, ORACLE, and OLE DB data may come from any
                  platform. They may also come from a Sybase, Teradata, MS SQL
                  Server, or any other machine.
              f. Baan or PeopleSoft files may come from ERP systems, as may data
                  from SAP R/3 and SAP BW. Thus global data may be received and
                  processed for creating an enterprise report.
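
As a minimal sketch of one of these cases (an external DBMS reached through ODBC), the SAS/ACCESS ODBC engine can expose a vendor table as a SAS library member. The DSN, schema, and table names below are hypothetical placeholders, not part of any specific system described in this thesis.

   /* Hypothetical DSN, schema, and table names; a sketch, not a tested configuration. */
   libname clin odbc dsn='clinsrv' schema='safety';  /* DBMS tables appear as clin.<table> */

   data work.ae_local;                /* pull a local working copy into the WORK library */
      set clin.adverse_events;
   run;

   proc contents data=work.ae_local;  /* verify the variables that came across */
   run;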



 Data Management: After accessing data, it is necessary to manage them by
   creating, retrieving, and updating database information. This may require
   advanced programming skills because the information comes from a wide
   range of data sources and must be merged together and then evaluated.
   Data with the same attributes need generic formatting, which requires a
   manipulation process. Evaluating data values requires computational
   operations that may be defined as functions. Saved sets of data in the data
   forms may have been extracted from data subsets. Complex conditional
   processing during data manipulation may be needed when a wide range of
   data sources is merged; a brief merge sketch follows this item.
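
The following is a minimal sketch of such a merge, assuming two hypothetical extracts (work.emr_extract and work.trial_extract) that share a patientid key; the data set and variable names are assumptions for illustration only.

   /* Sort both sources by the common key, then merge them in a DATA step. */
   proc sort data=work.emr_extract;   by patientid; run;
   proc sort data=work.trial_extract; by patientid; run;

   data work.patients_all;
      merge work.emr_extract   (in=in_emr)
            work.trial_extract (in=in_trial);
      by patientid;
      length source $5;
      if in_emr and in_trial then source = 'BOTH';   /* record its provenance */
      else if in_emr         then source = 'EMR';
      else                        source = 'TRIAL';
   run;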


 After gathering and shaping information, we need statistical analysis to
   produce reports. These reports are customized and may be complex. Tables,
   frequency counts, and cross-tabulations may be produced to create a variety
   of charts and plots. The computation of a variety of descriptive statistics,
   including linear regression analysis, standard deviation, correlations and
   other measures of association, as well as multi-way cross-tabulations and
   inferential statistics, may also be necessary; a brief sketch follows this item.
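
A minimal sketch of such summaries, assuming the hypothetical work.patients_all data set and variables from the previous sketch:

   /* Frequency counts and a cross-tabulation of assumed variables. */
   proc freq data=work.patients_all;
      tables sex country sex*country / missing;
   run;

   /* Pairwise correlations as a simple measure of association. */
   proc corr data=work.patients_all;
      var age weight;
   run;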




 These representations should be reportable to a wide variety of locations and
   platforms in order to suit client needs. Results may be required in many
   formats, such as an array of markup languages including HTML4 and XML,
   output formatted for a high-resolution printer such as PostScript, PDF, PCL,
   or RTF files, or even color graphs that can be made interactive using ActiveX
   controls or Java applets; a brief ODS sketch follows this item.
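
A minimal sketch of routing one report to several ODS destinations; the file names are placeholders and the data set is the hypothetical one used above.

   /* Open HTML, RTF, and PDF destinations, run a report, then close them. */
   ods html file='report.html';
   ods rtf  file='report.rtf';
   ods pdf  file='report.pdf';

   proc freq data=work.patients_all;
      tables country;
   run;

   ods pdf close;
   ods rtf close;
   ods html close;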


System architecture modeling




   [Figure: System architecture model. Data sources (reporting data by investigators,
   clinical trials and hospital labs, clinical studies data, post-marketing data, and
   individual clinical trials) feed a data dictionary/archive (MedDRA/PubMed, Oracle)
   and a data warehouse; modification and verification steps lead to adverse event
   reporting, and users perform data analyses with SAS.]

Information must be gathered by drug providers. These data come from clinical
studies by the FDA and other professional investigators. Other information
comes from the medical records of patients who were treated with the specific drug.

Usually, drug providers study their product before moving on to the evaluation
step. The first step is collecting data to generate reports such as the country of
origin for patients receiving the drug, worldwide patient exposure, demographic
characteristics, the most commonly reported body-system reactions (ordered by
gender and/or age of patients), and a summary of deaths or other critical body
reactions. Another resource is the company's surveys on products completed by
patients or clients who are volunteers in the U.S. or other countries. These surveys
include matching data from MedWatch forms, which the U.S. Department of
Health and Human Services accepts as voluntary reports of adverse events and
product problems. Also, these manufacturing companies may be able to receive
FDA reports generated on the basis of MedWatch reports about the product.
Furthermore, many of the surveys are answered by physicians and other clinicians
who have an EMR system and are able to answer detailed questions regarding
medical conditions and other related medical issues.

Any tool that is recommended here should be consistent with FDA Standards
and the objectives that follow.


 In any Adverse Event Reporting System, the basic calculations and data
   analysis have statistical bases on data sets that may frequently be ordered
   according to one or more variables coming from a variety of data sources.
   Thus an Adverse Event Reporting System must be able to work on any
   possible platform. For example, if it uses E2B data element structures, then it
   should be capable of performing any possible interactive query or data-flow
   transaction on shared data. SAS is compatible with all computer platforms. It
   works on any type of operating system. It supports data sharing concepts. It
   supports submission through the web or any other network that includes
   Oracle, UNIX, or NT servers or mainframe machines. This means that
   regardless of the backbone, SAS can support it.


 Data sources may need to be summarized or checked before being reported.
   Scripting and programming concepts are among the major necessities in
   development. SAS has a powerful scripting language that can perform any
   required summarizing, verification, and validation.




 In the pharmaceutical field and bioinformatics, SAS software is generally
   thought of for statistical analysis programming, but it is also a largely untapped
   resource for its many other features. Its screen-building and object-oriented
   development abilities are needed to keep up with the latest information
   technology advances.


 SAS is a stand-alone system produced by SAS Institute Inc. and sold on the
   open market. It exceeds all technical objectives specified here.


The FDA has proposed MedDRA as a standardized dictionary of medical
terminology. MedDRA has been used internationally in the regulation of medical
products. MedDRA provides information on symptoms, signs, diseases, and
diagnoses. It also includes other information such as:


      Names of investigations (e.g. liver function analyses, metabolism tests)
      Sites (e.g. application site reactions, implant site reactions and injection
       site reactions)
      Therapeutic indications
      Surgical and medical procedures
      Social and family history terms


SAS and MedDRA are FDA standards. They are designed to high standards, and
their builders continue to look for weaknesses and improve their products. All
their documentation and user interfaces are user friendly. SAS and MedDRA are
generic software, and any specific needs, such as data security or reliability of
operations, can be negotiated in a service level agreement.


EMR Database: These data come from hospital laboratories and clinical data
entry systems. They are documented before and after verification. All
documentation is electronic and all reporting is submitted electronically. MedDRA
encoding is part of clinical data entry. All data entry is based on standards
approved by the FDA.


Terminologies:

A computerized Drug Safety Evaluation requires the following informatics
terminologies:

 Data classifications
 Control Code
 Formatting
 Quality Control
 Data Mining
 Gathering information
 Accessing and manipulating data
 Scripting

Each of these terminologies carries a process or methodology that is discussed in
the following sections.


Data Classification:

Any structured analysis of information needs classification. Data classification is
the first and best-known task in data flow modeling. The data model of a Drug
Adverse Event Reporting System is derived from conceptual information such as
entities and their interrelationships. A mechanism serves as a store of all drug
information and can link the analysis, design, implementation, and evolution
applied in most medical applications. This classification should be consistent and
free of clashes. It is integrated into all parts that require maintainability.

The outcome attributed to adverse events is the most important information that
needs to be classified. The data classification for this attribute should be a
standard classification that matches the FDA reporting program.


The FDA uses MedDRA as part of the proposed rule for post-marketing
reporting. MedDRA is the abbreviation for the Medical Dictionary for Regulatory
Activities, an international terminology designed to support the classification,
retrieval, presentation, and communication of medical information throughout the
medical product regulatory cycle. Originally, MedDRA was written in English and
distributed in ASCII file format, but it is now available in several other languages
such as Dutch, French, German, Italian, Portuguese, Spanish, and Japanese. This
on-line dictionary is intended to become the global medical terminology standard
for use by every bio-pharmaceutical company in the world and has the best-known
classification, with an integrated platform for updating that can be used by all
standard systems. In the majority of homegrown medical applications, patient
medical recording systems use this classification, and it is valid for all phases of
drugs and for subscribing pharmaceutical companies.

MedDRA works as a catalog of medical disorders. It has a hierarchical data
structure with five levels of terms. Developing queries or retrieving information
about medical diagnoses requires hierarchical searching on these terms, and other
queries might be built by grouping them accordingly.

The following screen shows the SOC view of Cardiac and Vascular
investigations (excl enzyme test):




MedDRA classifications have an Object Oriented data structure as shown in the
following screens.



Each MedDRA term has a unique code that can be used as a search key.



A query makes a link between collected data and terms in MedDRA. A
query can create a selection on a description of medical data. This
selection requires searching: the term to be sought is entered into the
'Search for Value' field. The query then selects one of the records
returned and identifies information about patients. After that, the codes in
the database are ready for any statistical evaluation.
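
A minimal sketch of such a query in PROC SQL, assuming a hypothetical table of coded adverse events (work.ae_coded) and a hypothetical table of MedDRA lowest level terms (work.meddra_llt) keyed by code; both table and column names are illustrative assumptions.

   /* Link coded adverse event records to MedDRA terms by code. */
   proc sql;
      select a.patientid, a.meddracode, m.llt_term
         from work.ae_coded   as a,
              work.meddra_llt as m
         where a.meddracode = m.llt_code
         order by a.patientid;
   quit;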

The other advantages of using MedDRA are:

 MedDRA is on-line (not requiring installation or periodic updates on the client
   system). The application has a standardized interface, is well supported, and
   requires little effort to interface with any client computing environment. A
   good designer can get the best advantage of this classified information by
   using it as a shared data set. Updating this shared information maintains all
   the related outcomes that have referenced this data set.

 Informatics terminologies such as encoding are already included in MedDRA
   for its own data sets.
 MedDRA includes high-standard content that can be updated with queries or
   imported data; however, updating requires quality control because a careless
   update can disrupt everything.
 The current MedDRA version includes MediMiner for managing and analyzing
   the coded data, including data mining. This tool allows analysis of the impact
   of recoding data sets from one MedDRA version to another, whether MedDRA
   is used as a standalone product or as an integral component of a range of
   coding tools. The MedDRA classification can be browsed as a tree that can be
   collapsed and viewed at every level of detail for all occurrences in every
   possible search category, such as legend, terms, and coding.




Control Code:

SAS and MedDRA both have code control utilities that do the following:

 Debugging and maintenance in any branch of code, producing a cross-reference
   listing that shows all the program names that have been declared and used.
 The analyzer discovers uninitialized variables, unreachable code, and uncalled
   functions and procedures, as well as the number of times each statement was
   executed.

MedDRA has MediMiner as its version control utility. During any update in
MedDRA 3.1, MediMiner controls all changes by analyzing the coding sets. In
MedDRA 4.1, it also assesses the impact on the recoding of data by identifying all
codes that remain unchanged and identifying those codes that may require
recoding. It is also possible to identify the codes that no longer exist, those that
have been changed in some way, and those that have a related change or where a
multiaxial change (inherited from multiple original codes) has had an impact.
Primary and secondary changes are identified, as well as changes in the current
status of the code.

SAS software includes the Source Control Manager (SCM) utility as one of the
options under the Desktop selection of the Solutions menu:

SAS -> Solutions -> Desktop -> Development and Programming -> Source Code
Manager




SCM includes a friendly GUI that has SAS file check-in/check-out capabilities.
This GUI lists all libraries, data sets, catalogs, and catalog entries in hierarchical
order. SCM has flexible testing, revision control, and version labeling, with an
easy application distribution utility. With a version label, it is easy to create a copy
of an application and place it in other locations on the network. Also, the
SAS/CONNECT utility can place the application on other remote machines.
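
A minimal sketch of submitting work to a remote machine with SAS/CONNECT; the host name is a placeholder and the exact sign-on options depend on the site configuration.

   /* The remote host name is hypothetical; COMAMID=TCP assumes a TCP/IP connection. */
   %let rmthost=cps.brockport.edu;
   options comamid=tcp remote=rmthost;
   signon;

   rsubmit;                              /* statements below run on the remote machine */
      proc print data=sasuser.patients (obs=5);
      run;
   endrsubmit;

   signoff;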




Formatting:

Usability of information is one of the most important components of any application
implementation. Usability requires readability, and the readability of any data set
is facilitated by standardized formatting. Each line represents many separate
pieces of information, the data values, and formats determine how these values are
displayed or used in calculations. Formats set the width of displayed values, the
number of decimal places displayed, the handling of blanks, zeroes, and commas,
and other details.

SAS supports its own standard formats and user-defined formatting. Standard
formats may be used for numeric, character, or picture data. Users can also write
or define custom-made formats in DATA and PROC steps. User-defined formats
are reusable and can be saved in format catalogs. If saved in a permanent SAS
catalog, they remain there permanently. If saved in the catalog WORK.FORMATS,
they are temporary and retrievable only in the same SAS session or job in which
they were created. Because catalogs are a type of SAS file that resides in a SAS
data library, they act as an execution-time handling facility and intercept run-time
errors caused by undefined formats. In this way, type checking is supported, and
the readability of information improves. If the SAS system option NOFMTERR is
in effect, SAS uses its own default formatting when it calls an undefined format,
so in some cases these errors can be ignored and execution can continue.
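
A minimal sketch of a user-defined format stored in a permanent catalog; the library path, format name, and coded values are placeholders, and the final step assumes a data set with a numerically coded sex variable.

   /* A permanent format catalog in a hypothetical library. */
   libname safety 'c:\thesis\formats';

   proc format library=safety;
      value sexfmt 1 = 'Male'
                   2 = 'Female'
                   9 = 'Unknown';
   run;

   /* Make the permanent catalog visible to later steps, then apply the format. */
   options fmtsearch=(safety work);

   proc freq data=work.patients_all;
      tables sex;
      format sex sexfmt.;
   run;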


Quality Control:

Delivering correct results requires quality control. SAS recognizes common
errors such as syntax, execution-time, data, and semantic errors; however, users
can also check for common mistakes such as the following:

    Check for syntax errors:

 o   statements must end with a semicolon
 o   starting and ending quotation marks must match
 o   keywords must be spelled correctly
 o   every DO and SELECT statement must be followed by an END statement


    Check for execution errors:

 o   illegal mathematical operations
 o   observations out of order for BY-group processing
 o   incorrect references in an INFILE statement, such as misspelling or otherwise
     incorrectly specifying the external file
 o   a program may run yet give an incorrect result; such errors are often
     detectable by checking self-consistency and should always be reported,
     certainly in the debugging stage and often during production runs

    SAS usually executes the statements in a DATA step one by one, in the
     order they appear. After executing the DATA step, SAS moves to the next
     step and continues in the same fashion. All SAS statements must appear in
     order so that SAS can execute them properly.
    Check input statements and data. SAS can detect data errors during
     execution, but this does not terminate processing. After execution, SAS
     prints a note describing the error; in that note it lists the related values
     stored in the input buffer and the program data vector.

 o   the correspondence between actual variable values and the variables in
     INPUT statements must be checked
 o   any corresponding arrangement, such as formats, lists, and columns for
     input statements, must be checked as well
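
A minimal sketch of a self-consistency check of this kind, assuming the hypothetical work.patients_all data set used earlier; suspicious records are written to the SAS log.

   /* Flag out-of-range or missing values in the log during a DATA step. */
   data work.checked;
      set work.patients_all;
      if age < 0 or age > 120 then
         putlog 'WARNING: suspicious age ' age= patientid=;
      if missing(weight) then
         putlog 'NOTE: missing weight ' patientid=;
   run;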

Data mining:

Data mining is a class of database applications that look for hidden patterns in a
group of data. Statistical analysis is the data analysis method that matches the
nature of data mining. Statistical analysis might uncover hidden patterns in the
large volume of information coming from adverse event reports or survey
systems. A data mining process might combine variables that occur together more
often than expected. By applying statistical options, an optimal guess can be made
about the best-matching behavior that may have occurred frequently.




Data mining is a critical aspect of these reporting systems. Occasionally,
predictions may be even more important than detections in drug safety
evaluation. In the United States, patients can file lawsuits against drug providers
for severe adverse reactions. These legal actions often make American drug
companies fearful of introducing drugs into the U.S. market. However, data mining
on data from other parts of the world offers a way to move the drug safety
process from a reactive to a proactive posture. In effect, it helps drug providers
adopt a safer marketing strategy rather than take risks. Data mining on data from
other parts of the world is also a way to move drug safety evaluation from
detection to prediction.

If MedDRA System Organ Class terms are adopted as a class of events, then one
can select related data from patient records for that event and discover statistical
rules or patterns automatically from the data, later creating a hypothesis and
running tests on the patient record database to verify or refute it.

Data mining can thus protect drug providers against lawsuits. This process uses
data from other countries and from clinical studies.

SAS assists data analysis in an instructional way, so that even people without
statistical knowledge are able to run the required processes on selected data
sources (basic options include counting missing and non-missing values,
minimum, maximum, range, sum, mean, variance, standard deviation, standard
error of the mean, coefficient of variation, skewness, and kurtosis). In addition,
access to data sources can be secured to prevent unauthorized access. SAS also
allows the creation of different reports and presentations of results (including
tabular reports and frequency reports with graphical presentations to visualize the
results).
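
A minimal sketch of those basic options requested as PROC MEANS statistic keywords, again on the assumed work.patients_all data set:

   /* The descriptive statistics listed above, computed for two assumed variables. */
   proc means data=work.patients_all
              n nmiss min max range sum mean var std stderr cv skewness kurtosis;
      var age weight;
   run;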

SAS supports data mining with a large number of statistical procedures
(regression, association discovery, time series, and time-series cross-sectional
(panel) data analysis), whereas data are usually analyzed by regression (one
observation for each patient). Sometimes it is necessary to correlate with cross-
sectional data such as geographic region, gender, smoking, alcohol use, and so
on.

Gathering information and documenting system specifications:

The available information (such as the toxicological and pharmacokinetic profiles
of the individual drug, the treatment indication or indications, the intended
populations, etc.) might have been defined in relational databases. The backbone
of this system might be SQL, Access, or even Excel, but data queries there may
not be suited to performing detailed statistical analyses at this stage. It is here
that SAS helps with statistical analysis. SAS can be interfaced with databases to
allow large volumes of data to be retrieved efficiently for analysis. All engines can
be assigned to a SAS library; this library is a place that saves all access to the
stored files. These files might come from a variety of engines such as ODBC,
SPSS, SYBASE, REMOTE, META, MySQL, ACCESS, ORACLE, DB2, and so on.
For the processing of data, it is necessary to define all connections that might be
created between the different sets of data records. The first link can illustrate the
correspondence of the MedDRA classifications to the patient records. In
concentrating on the relevance of available data, patient medical information
works in tandem with MedDRA classifications to build queries and analysis
information.
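
A minimal sketch of assigning such engines to SAS libraries; the connection values, file path, and table name are placeholders and assume that SAS/ACCESS products for the respective engines are licensed.

   /* Oracle tables become oradata.<table>; an Excel workbook becomes xlsdata.<sheet>. */
   libname oradata oracle user='safety' password='XXXX' path='proddb' schema='dsr';
   libname xlsdata excel  'c:\thesis\exposure.xls';

   proc print data=oradata.patients (obs=10);   /* hypothetical table */
   run;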

As part of the application development process, specifying the following
information is required:

1.   Source data: Miscellaneous data sources may exist, and in order to get
     correct results, the prescription drug information provided by drug firms
     should be truthful, balanced, and accurately communicated. The same
     applies to data coming from clinical and post-marketing trials or spontaneous
     reports (submitted individually by doctors or patients). Dynamic data are
     operational data from internal systems such as the homegrown applications
     of clinics or hospitals, the manual data coming from paper-chart patient
     histories, EMRs (Electronic Medical Records), and Adverse Event Reporting
     (MedWatch).
2.   Data Staging: This area includes the storage and processing of extracted
     data from the internal and external systems prior to loading into a SAS data
     bank. The following is a list of cases.
      •     Information may be located in multiple SQL tables on a local computer or
            on external servers. If required, one may make a connection to the
            database server and use the data dynamically. For example, the Adverse
            Events Database includes side effects which are serious (such as death or
            risk of dying, hospitalization, disability, congenital anomaly, or required
            intervention to prevent permanent impairment or damage). These data
            are required for generating some particular reports.
      •     Part of the information may reside in Aventis Reports or ClinTrace. Data
            from these two areas might work together to complete an assignment;
            one can then create an executable program that makes a connection to
            the backbone database of these two licensed vendor applications and
            uses the data.

      Note: Having basic knowledge about these databases helps programmers
        create standard code. For example, an Aventis or ClinTrace Case ID
        (Manufacturer Control #) is assigned on an "Episode" basis for each
        patient. Adverse events (reported side effects) are temporally linked to
        the same episode and are entered under the same Case ID. For drugs that
        are given intermittently, additional episodes (Case IDs) are created for
        events that occur after different treatment cycles.

      •     Side effects are stored in companies' Core Safety Data Sheets. These
            sheets are for the global labeling of reports and are based on the
            diagnoses, which are in turn assessed for seriousness. All diagnoses
            reported from intensified monitoring (such as a clinical trial or
            post-marketing surveillance study) are assessed as associated or not
            associated with the study medications. These data may be joined to
            MedDRA information to build a larger directory that is used in SQL
            scripts.

      •     Drug providers use certain information, such as whether a side effect
            results from an internal or natural body process, in a causality algorithm
            for internal clinical interpretation or signal evaluation purposes. In some
            particular cases, this algorithm must be applied as part of the script
            logic in the SAS code. If a company has a computerized analysis
            application, then, depending on the software, it is possible to execute a
            connection to that application inside the SAS script code.

      •     In data mining related to diagnoses, MedDRA information is required. It
            is recommended to use SAS scripting to create a remote connection to
            read the MedDRA ASCII files, importing the data into temporarily created
            tables. These tables would be deleted at the end of the scripting process.

        Note: All transactions, such as queries, statistical analyses, or visualizations
      coming from the sources, should be consistent. Sometimes these data are not
      consistent enough. To solve this problem, all "no match" data need appropriate
      transformation or conversion from their original form to the MedDRA
      representation.

3.   Metadata: A term used to describe or specify the data. Metadata define all
     the characteristics of data required to build databases and applications, and
     support knowledge workers and information producers. This includes the
     data element name, meaning, format, domain values, business integrity rules,
     relationships, owner, etc.

For example, the following classification shows the analogy of data concepts in
MedDRA:

1. SOC
      MedDRA Code          Numeric
      MedDRA Term          String
2. HLGT
      MedDRA Code          Numeric
      MedDRA Term          String
3. PT
      MedDRA Code          Numeric
      MedDRA Term          String
      COSTART Symbol       AlphaNumeric
      WHO_ART Code         Numeric
      ICDS Code            Numeric
      PT ICD-10 Code       Numeric
      HARTS Code           Numeric
      ICDS_CM Code         Numeric
      JART Code            Numeric
 *    SOC Code             Numeric
 *    SOC Name             String
4. LLT (Lowest Level Term)
      MedDRA Code          Numeric
      MedDRA Term          String
      WHO_ART Code         Numeric
      COSTART Symbol       AlphaNumeric
      ICDS_CM Code         Numeric
      CURRENCY             Character/Boolean
      HARTS Code           Numeric
      ICDS Code            Numeric
      JART Code            Numeric

*   Multi-valued attribute

Defining metadata for the adverse event reporting data is also required. These
data are:

           o   Patient Identifier and patient information: age at time of event or
               date of birth, sex, weight, etc.
           o   Outcomes attributed to adverse events such as death, life-
               threatening occurrences, hospitalization, initial or prolonged,
               disability, congenital anomaly, required intervention to prevent
               impairment/damage, other.
           o   Date of event and report in mo/day/yr format.
           o   Description of problem.
           o   Relevant tests/laboratory data including dates.



o Other relevant history including preexisting medical condition (e.g.
               allergies, race, pregnancy, smoking or alcohol use, hepatic/renal
               dysfunction, etc.)

Most medical clinics still use Paper Medical Records (PMRs), but many others
have begun to use Electronic Medical Records (EMRs). No standard form has yet
been defined for EMRs, but all provide the same information, which requires
metadata definitions. These data are:

           o Patient's primary reason for the medical visit
           o History of onset of clinical signs and symptoms
           o Current list of medications the patient is using
           o Relevant past medical history, including hospital admissions,
               surgeries, and diagnoses
           o History of family disease, such as diabetes, cancer, heart disease,
               and medical illness
           o   Social history: use of drugs, smoking, job stability, housing and
               living conditions, incarceration
           o   Review of systems: the patient's own account of body systems and
               current medical problems, such as trouble sleeping at night, panic
               episodes, and results of tests
           o Physical examination: the clinician's hands-on examination of the
               patient, including head, eyes, ears, nose, throat, chest, and
               extremities
           o Labs, including blood glucose, cholesterol, and drug levels
           o Studies such as X-ray, MRI, CT, and EKG
           o Progress notes, such as a record of the temporal progression of
               signs and symptoms, labs, and studies over the length of the study
               or admission

4.   The entity-relationship model



The specification of required information for an adverse event serves as a
starting point for constructing a conceptual schema (the overall design of the
database) for the suggested database. The entity sets and attributes targeted
here are the drug and patient entity sets. These entity sets have a relationship
that has attributes of its own; this relationship is a "many to many" relationship.
Other relationships might be designed between subsets of an entity set; the
relationships between the drug entity set and the ingredient or side effect entity
sets are examples, and these are "many to one" relationships. This method of
design helps save memory. In some other cases, such as the patient-drug
relationship, the maximum number of participants is limited to two relations,
which leaves the design in one general set.

In the following diagram, small rectangles show entity sets; large rectangles
specify attributes; diamonds represent relationship sets; lines link attributes to
entity sets and entity sets to relationship sets; arrows indicate that an entity falls
exclusively into another entity; double lines indicate "many" relationship sets; bold
diamonds show "many to one" relationship sets; and rectangles with non-indexed
information indicate information about a relationship set.




[Figure: E_R diagram. The Patient entity set (ID, First Name, Middle Name, Last
Name, Date of Birth, Sex, Weight, Race, Country, plus relevant information such
as allergies, smoking, alcohol, pregnancies, dysfunction, and lab results) is linked
through the Patient-Drugs relationship (ID, reason, date_of_event, date_of_report,
therapy_start_date, therapy_end_date, diagnosis information, Lot_number,
Exp_Date, NDC number, adverse_desc, route and dosage) to the Drug entity set
(id, generic name, trade name, dosage range, metric unit, category, form), which in
turn relates to adverse reactions and side effects (MedDRACode) and to drug
ingredients (ID, Name, Value, Unit).]
The above E_R model is a sample of what can be considered, although the attributes
can be designed with more detail in mind. For example, 'route and dosage' could be
designed as a separate entity because it includes many optional attributes that may
be concatenated together as descriptive text. They may also be saved separately in
a data source. This E_R model gives substantial flexibility in designing the basic
database schema.




Accessing and Manipulating Data:

The first step in accessing and manipulating data is the DATA step. The DATA
step is for accessing, reading, and programming the data processing. As
explained before, one of the strengths of SAS is fast and easy access to many
different sources. In addition to the programming components, SAS has many
other features in the DATA step process that help in developing a standard
application. The SAS language has all the statements required for accomplishing
typical data processing. Among these are reading and adding raw data files and
SAS data sets and writing the results. Sub-setting data, combining multiple SAS
files, creating SAS variables, recoding data values, and creating listing and
summary reports that include advanced analysis features such as web analytical
solutions are also possible.

Special focus should be placed on the management of SAS data set input and
output, working with different data types, and the manipulation of data. It may
also be necessary to control SAS data set input and output, combine, summarize,
and then process iteratively with programming to perform data manipulations
and transformations.

Accessing data is needed first. Sometimes the required data file is saved on
another server and location. With an FTP server running, SAS can make an FTP
connection and use the external data source remotely, without any copy of the
downloaded data remaining on the machine unless SAS writes it out. As an
example, assume the data belong to cps-users and are located at
~/halley/thesis/main.data.


 filename fromrcr
    ftp 'main.data'
    cd='halley/thesis'
    user='cps-user'
    host='cps.brockport.edu'
    recfm=v
    prompt;
Much data might come as raw data. This raw data must be entered into a SAS
data set. As an example, one of the clients might send a letter or a txt file that
includes part of the patient's information. The following script shows how to
input these data into a SAS data set.



 data PatientInfo;
   infile 'c:\thesis\data1.txt';
   input PatientId $ 1-13 age 14-17 sex $ 18-23 weight 24-30 +2 country;
 run;
 proc print data=PatientInfo;
 run;




The SAS System       05:25 Thursday, December 15, 2005 5

PatientId            age sex    weight country

Hzan0616341           30 1     200       11
Amir5666892           40 2     180       12
J675bhgfdql 56        2    .    45       ->
Nmjhg567908          12 1      100       23
Iu6-567-567          99 1      170       01
*** A missing value for a numeric variable is represented by a period (.)



Processing Examples:



    •   To use external files, it is required to tell SAS where to find them. To do
        this, there are the following choices:

   1. Identify the file directly in the INFILE, FILE, or other SAS statement that
      uses the file.
   2. Set up a fileref for the file by using the FILENAME statement, and then
      use the fileref in the INFILE, FILE, or other SAS statement.
   3. Use operating environment commands to set up a fileref, and then use the
      fileref in the INFILE, FILE, or other SAS statement.

Note: To use several files or members from the same directory, partitioned data
set (PDS), or MACLIB, use the FILENAME statement to create a fileref that
identifies the directory name. The fileref can then be used in the INFILE statement,
with the name of the file, PDS member, or MACLIB member enclosed in
parentheses immediately after the fileref, as shown in the example below:


  /* filename data 'directory-or-PDS-or-MACLIB'; */
  /* data1.txt and data2.txt are located in directory c:\thesis */

  filename data 'c:\thesis';

  data paitientdata1;
    infile data('data1.txt');
    input PatientId $11. +2 age 2. +1 weight 3. +2 sex 1. +2 date1 mmddyy10. +2 date2
  mmddyy10. +2 country 12.;
  run;

  data paitientdata2;
    infile data('data2.txt');
    input PatientId $11. +2 age 2. +1 weight 3. +2 sex 1. +2 date1 mmddyy10. +2 date2
  mmddyy10. +2 country 12.;
  run;




    •   Also, from the File menu, ADX can import data from a SAS data set or from
        an ACCESS database, an Excel spreadsheet, a dBase database, a delimited
        text file, and files with other common formats. This is helpful when
        information has been saved in a variety of formats.


    •   In SAS, one can gain access to data sources by defining a 'libref' and
        assigning access through it without copying the data into the SAS
        environment. A 'libref' makes a shortcut to the metadata on the SAS
        Metadata Server. Any metadata on the SAS Metadata Server can be read
        through the META engine, which has options for controlling the outputs.
        The META engine creates just the metadata in the repository and does not
        affect the data sources. If a table does not exist in the data source, the
        META engine creates the metadata based on the information specified in
        the application for the output table. Depending on the options in effect,
        deleting a table may delete only the metadata from the repository while
        leaving the table in the data source, or delete the table from the data
        source while leaving the metadata in the repository.


      A SAS library includes metadata objects that are defined by the 'libref.'
      These objects define the engines that are used to process the data. This
      library has a URI (Uniform Resource Identifier) architecture. To get access
      to a SAS Metadata Server, define the host address. If working in a TCP
      network, define the port number. If the protocol is not COM but the bridge
      protocol, define a user ID and password; otherwise it will not be possible to
      log into the SAS Metadata Server. In addition, any metadata repository may
      be referenced by a repository ID or name.


      To access these tables, one can use SAS/Warehouse Administrator as a
      tool. In order to determine the metadata, it needs to identify and search
      the objects by their name, URL, and other identifiers such as their ID. The
      following script displays this process.


        libname upcase meta liburi="SASLibrary?@name='oralib'"
        ipaddr=d6292.us.GCS.com;


Scripting:




The goal of SQL scripting is to retrieve available data from any possible data source.
Most vendor applications have an SQL backbone, so with SQL scripting it is
possible to perform queries on original or manipulated data (retrieving data from
multiple tables; creating views, indexes, and tables; and updating or deleting
values in existing tables and views, as well as summarizing them). SQL scripting
can happen in the SAS or the SQL environment.

In the following example, a reduction of the earlier E_R schema to tables is created
from inside the SQL environment:



/*------------------------------------------------------------------------------------------*/
/* create a higher-level entity set for drug information */
CREATE TABLE drug(
id                                    CHAR(12) NOT NULL,
generic_name                          CHAR(25),
trade_name                            CHAR(25),
dosage                                INT,
unit                                  INT,
category                              INT,
FOREIGN KEY (category) REFERENCES drug_category(category_id)
ON DELETE CASCADE,
FOREIGN KEY (unit) REFERENCES unit(unit_id)
ON DELETE CASCADE,
PRIMARY KEY (id)
) ENGINE=INNODB;

/* create the lower level entity sets for drug information */

CREATE TABLE ingredient (
id                 INT,
drug_id            CHAR(12),
ingredient_name    CHAR(25),
ingredient_value   INT,
unit               INT,
INDEX drug_ind (drug_id),
FOREIGN KEY (drug_id) REFERENCES drug(id)
ON DELETE CASCADE,
FOREIGN KEY (unit) REFERENCES unit(unit_id)
ON DELETE CASCADE
) ENGINE=INNODB;


/* the side effects of each drug have descriptions that should be
compatible with the MedDRA classification */

CREATE TABLE sideeffects (
MedDRACode         INT,
drug_id            CHAR(12),
INDEX drug_ind (drug_id),
FOREIGN KEY (drug_id) REFERENCES drug(id)
ON DELETE CASCADE
) ENGINE=INNODB;

/* create a general entity set for patient information; This entity set
can be expanded by other entity sub sets such as patient laboratory
information or more information about the history of that patient */

CREATE TABLE paitient(
id                 CHAR(12) NOT NULL,
first_name         CHAR(25),
middle_name        CHAR(25),
last_name          CHAR(25),
DateOfBirth        DATE,
Sex                INT,
weight             INT,
race               INT,
country             INT,
/* lookup tables races and countries are assumed, as referenced elsewhere */
FOREIGN KEY (race) REFERENCES races(races_id)
ON DELETE CASCADE,
FOREIGN KEY (country) REFERENCES countries(ipcode)
ON DELETE CASCADE,
PRIMARY KEY (id)
) ENGINE=INNODB;


/* some relevant patient information might come from the following
suggested sub-entity set */

CREATE TABLE Relevant_Patients_Info (
Info_id               INT NOT NULL AUTO_INCREMENT,
paitient_id           CHAR(25) NOT NULL,
allergies_id          INT,
races_id              INT,
Num_pregnancies       INT,
smoking               INT,
alcohol_use           INT,
hepatic_id            INT,
dysfunctions_id       INT,
INDEX (allergies_id),
FOREIGN KEY (allergies_id) REFERENCES allergies(allergies_id) ON UPDATE
CASCADE ON DELETE RESTRICT,
INDEX (races_id),
   FOREIGN KEY (races_id) REFERENCES races(races_id) ON UPDATE CASCADE
ON DELETE RESTRICT,
INDEX (hepatic_id),
    FOREIGN KEY (hepatic_id) REFERENCES hepatic(hepatic_id) ON UPDATE
CASCADE ON DELETE RESTRICT,
INDEX (dysfunctions_id),
FOREIGN KEY (dysfunctions_id) REFERENCES dysfunctions(dysfunctions_id)
ON UPDATE CASCADE ON DELETE RESTRICT,
INDEX (paitient_id),
FOREIGN KEY (paitient_id) REFERENCES paitient(id) ON UPDATE CASCADE ON
DELETE RESTRICT,


PRIMARY KEY(Info_id)
) ENGINE=INNODB;

/* Transforming this E_R model, including aggregation, to a tabular form
is straightforward. The Patient-Drug relationship includes a column for
each attribute in the primary key of the entity sets participating in
this relationship (any concomitant medical products that the patient
uses and therapy dates might come from related tables via the drug id
and patient id; also, any available adverse event information that shows
the problem with using that drug should be included.)
*/


CREATE TABLE Patients_drugs (

Info_id              INT NOT NULL AUTO_INCREMENT,
paitient_id          CHAR(25) NOT NULL,
drug_id              CHAR(12) NOT NULL,
therapy_start_date   DATE,
therapy_end_date     DATE,
MedDRACode_DiagnoseForUse INT,
/* 1 == yes, 2==no, 3==doesn’t apply */
/* Event abated after use stopped or dose reduced */
Quest1               INT,
/* event reappeared after reintroduction */
Quest2               INT,
 Lot_number          INT,
 Exp_Date            DATE,
 NDCno               INT,

reason                 INT NOT NULL,
date_of_event          DATE,
date_of_report         DATE,
adverse_desc           TEXT,

INDEX (paitient_id),
FOREIGN KEY (paitient_id) REFERENCES paitient(id) ON UPDATE CASCADE ON
DELETE RESTRICT,
INDEX (drug_id),
FOREIGN KEY (drug_id) REFERENCES drug(id) ON UPDATE CASCADE ON DELETE
RESTRICT,
PRIMARY KEY(Info_id)
) ENGINE=INNODB;




SQL scripting is required to generate reports of summary statistics. The macro
language provides a facility that allows writing the SQL procedure inside the SAS
environment. Therefore, SQL scripting extends SAS coding to the retrieval and
combination of data from tables or views. New ones can be created, along with
indexes, and data values in PROC SQL tables can be updated. It is also
possible to update and retrieve data from database management system tables
or to modify a PROC SQL table by adding, modifying, or dropping columns.

Example: Assume the adverse events information from clinical studies, post-
marketing trials, spontaneous reports, and miscellaneous sources (including
independent drug identification numbers and retrospective data collection) is
saved in the above SQL tables. The following script generates a report that
shows the country of origin for patients receiving a drug in a post-marketing setting.



 proc sql;


 /* It extracts and manipulates grouped and ordered data from
 patient records to create a new temporary view table that includes
 only patient populations in each country. Country field is defined
 as an id number; to represent it by country name, it joins to the
 columns from countries table. After process is done, the temporary
 view table is dropped*/

 create view temp as
    select country, count(country) as count,
         calculated Count/Subtotal as Percent format=percent8.2
       from paitient,
            (select count(*) as Subtotal from paitient) as survey2
       group by country
       order by count;
 quit;

 proc sql;

 /* extracts required data from created temporary view table and
 then drop it */

 title1 'Country of Origin for Patients Receiving the suspected
 drug in a Postmarketing Setting';

 select c.countryname,t.count as cc,"(", t.Percent ,")"
 from countries c, temp t
 where c.ipcode = t.country;
 quit;

 proc sql;
 drop view temp;
 quit;




Country of Origin for Patients Receiving the suspected drug in a Postmarketing Setting
                                   22:04 Monday, January 16, 2006

CountryName                 Percent
               ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Greece                                           1   (0.1%)
Uruguay                                          2   (0.2%)
Taiwan                                           2   (0.2%)
French Polynesia                                 2   (0.2%)
Peru                                             2   (0.2%)
Korea                                            2   (0.2%)
South Africa                                     3   (0.2%)
Portugal                                         3   (0.2%)
Turkey                                           4   (0.3%)
Hungary                                          4   (0.3%)
Austria                                          4   (0.3%)
New Zealand                                      7   (0.5%)
Brazil                                           7   (0.5%)
Norway                                          10   (0.8%)
Israel                                          11   (0.8%)
Chile                                           15   (1.1%)
Netherlands                                     26   (2.0%)
Italy                                           39   (3.0%)
Spain                                           38   (2.9%)
Belgium                                         38   (2.9%)
United States                                   42   (3.2%)
Finland                                         44   (3.4%)
Germany                                         50   (3.8%)
Sweden                                          69   (5.3%)
Denmark                                         91   (7.0%)
Canada                                          97   (7.4%)
Australia                                      107   (8.2%)
Great Britain                                  271   (20.8%)
France                                         313   (24.0%)

The patient exposure to the drug can be calculated and presented in different
ways. Although the available exposure data cover a long period, the primary
focus of a submitted report may be the number of exposures and cases that
occurred in a specific interval. In the following report, global patient
exposures from 1989 to 2004 are provided:


proc sql;
create view temp1 as
   select region, count(region) as SachetSales
      from paitient
      group by region
      order by SachetSales;
quit;




proc sql;
create view temp2 as
   select region, count(region) as Exposures
   from paitient
   where paitient_Id in (select paitient_Id from Patients_drugs where
substr(therapy_start_date,7,4) > '1983' and substr(therapy_end_date,7,4)
< '2001')
      group by region
      order by Exposures;
quit;

proc sql;
title1 'Worldwide Patient Exposure to the suspected drug 1989 to 1994';
select c.region, t1.SachetSales, t2.Exposures
from countries c, temp1 t1, temp2 t2
where c.ipcode = t1.region and c.ipcode = t2.region;
quit;

proc sql;
select sum(t1.SachetSales) as SumSachetSales, sum(t2.Exposures) as SumExposures
from temp1 t1, temp2 t2;
quit;

proc sql;
drop view temp1, temp2;
quit;




Worldwide Patient Exposure to the suspected drug 1989 to 1994
                                        20:55 Saturday, January 21, 2006

          Region           SachetSales    Exposures
          ------------------------------------------------
          Europe           230,649,500    1,895,749
          Australia          5,292,542       43,500
          Korea              3,067,300       25,211
          Canada             1,497,100       12,305
          Rest of World      2,405,064       19,768

                        SumSachetSales    SumExposures
                        -------------------------------
                           242,911,506       1,996,533




Inside the SQL scripting, one may occasionally work with data that are imported
from the MedDRA application. These data may already exist on a local machine,
so it is not necessary to access the MedDRA environment a second time. SAS
utilities can be used to convert the data from one form to another or to copy
them between machines. A free trial of MedDRA is available on the MSSO website;
it contains a sample copy of MedDRA data saved in an Access database, which can
also be imported into an Excel file if needed. If the data set is standard and
complete, it is better to use it as a shared data source. This shared data
source may be stored in a Relational Database Management System (RDBMS), an
Excel spreadsheet, or even a flat file. If it is stored on an external machine,
it becomes an external data source and a SAS connection is required for access.


The following SAS script retrieves the MedDRA classification from a data
source. It imports data from an external file (a spreadsheet) into a SAS table.
This code was generated and saved by the Import Wizard; saving this type of
script helps to avoid redoing the work when the information is needed again.


 PROC IMPORT OUT= WORK.MEDDRAInfo
             DATAFILE= "C:thesisCTCAEv3.xls"
             DBMS=EXCEL REPLACE;
      SHEET="'CTCAE v3#0 MedDRA Codes$'";
      GETNAMES=YES;
      MIXED=NO;
      SCANTEXT=YES;
      USEDATE=YES;
      SCANTIME=YES;
 RUN;


The following script works as well:



  Filename xclfil
    'C:\thesis\CTCAEv3.xls';
  proc import
    datafile=xclfil
    out= WORK.MEDDRAInfo
    dbms=excel97 replace;
    getnames= yes;
  run;




The above script retrieves the MedDRA classification from a data source. Often
these data do not represent all MedDRA data; usually only a subset is required,
and that subset is stored in an external file.

Assume MedDRAClassifications.xls includes only the MedDRA classification data.
To generate reports related to side effects, importing this file is enough to
retrieve the appropriate symptom and sign information listed by outcome.




 PROC IMPORT DBMS=EXCEL OUT= work.MedDRA
           DATAFILE="c:\thesis\MedDRAClassifications.xls" REPLACE;
 Run;


 /* Alternatively, read the same classifications from a CSV copy of the
    file; the column layout is assumed from the printed output below. */
 data work.MedDRA;
   infile 'c:\thesis\MedDRAClassifications.csv' delimiter=',' dsd;
   input MedDRATermLevel1 :$60. MedDRATermLevel2 :$60.;
 run;

 proc print data=MedDRA;
 run;




The SAS System      05:25 Thursday, December 15, 2005 1

Obs   MedDRATermLevel1       MedDRATermLevel2

  1 Nervous system disorders
  2                       Balance disorder
  3                       Convulsion
  4                       Lethargy
  5                       Optic neuritis
  6                       Paraesthesia
  7                       Speech disorder
  8                       Tunnel vision
  9                       Visual field defect
  10
  11 Eye disorders
  12                      Astigmatism
  13                      Blindness
  ...

* Sometimes the information that comes from an Adverse Event Report, clinical trials, or any other
post-marketing or pharmacovigilance application carries a provisional order number assigned to
outcome data that cannot be correctly mapped to MedDRA. These order numbers alone can be used
when electronic reports or data are submitted and are automatically converted to MedDRA codes.



From the parameter list created, values can be individually highlighted and
chosen for processing. These required parameter values may be retrieved from
tables created by scripts such as the following:

proc sql;
   create table reasonlist1
       ( Description char(60));

    insert into reasonlist1
    values('Patient Died')
    values('Life threatening illness')
    values('Required emergency room/doctor visit')
    values('Required hospitalization')
    values('Resulted in permanent disability')
    values('Resulted in prolongation of hospitalization')
    values('others');
quit;


The ordering of the above parameter values is important for selecting rows by
their order number, and the descriptions of these values must match those found
on the FDA forms. The following script creates a parameter table for the
abbreviations used in Drug Safety Reporting. The ordering and descriptions of
these abbreviations are also consistent with FDA standards.


proc sql;
   create table abbreviations
       ( abb char(12), Description char(80));

    insert into abbreviations

values( 'ADR','adverse drug reaction')
values( 'AE','adverse event')
values( 'AERS','Adverse Event Reporting System')
values( 'bid','twice daily')
values( 'CI','confidence interval')
values( 'CIOMS','Council for International Organizations of Medical Sciences')
values( 'COSTART','Coding Symbols for Thesaurus of Adverse Reaction Terms')
values( 'CSDS','Core Safety Data Sheet')
values( 'CV','coefficient of variation')
values( 'FDA','Food and Drug Administration')
values( 'GABA','Gamma amino butyric acid')
values( 'HARTS','')
values( 'IBD','International Birth Date')
values( 'ICD-9/ICD-10','International Classification of Diseases, 9th and 10th Editions/Revisions')
values( 'ICD-9-CM','International Classification of Diseases, Ninth Revision, Clinical Modification')
values( 'ICH','International Conference on Harmonisation')
values( 'MedDRA','Medical Dictionary for Regulatory Activities')
values( 'NDA','New Drug Application')
values( 'PSUR','Periodic Safety Update Report')
values( 'qd','once daily')
values( 'qid','four times daily')
values( 'SAE','serious adverse event')
values( 'SD','standard deviation')
values( 'SE','standard error')
values( 'US','United States')
values( 'WHO-ART','WHO Adverse Reaction Terminology');
quit;


Formatting may be used for other parameter values. The ATTRIB Statement
permanently associates a format with a variable. SAS uses the format to write
the values of the variables specified.

attrib sales1-sales3 format=comma10.2;

Because the ATTRIB statement makes the association permanent, any subsequent
DATA step or PROC step will use the COMMA10.2 format to write the values of
sales1, sales2, and sales3.
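
For instance, a minimal sketch (the work.sales data set and its values are
hypothetical) shows the permanent association in context:

data work.sales;
   attrib sales1-sales3 format=comma10.2;  /* permanently associate the format */
   input sales1-sales3;
   datalines;
1234567.891 2345.5 99.125
;
run;

proc print data=work.sales;   /* the values print with the COMMA10.2 format */
run;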

In addition to the default formats supplied by Base SAS software, one can
create custom formats with the FORMAT procedure. The following FORMAT procedure
defines the static parameter values that may be required; it expresses weights
and measures using USP (United States Pharmacopeia) standard abbreviations for
dosage units.

proc format;
    value $dosage_units
       '1' = 'm'
       '2' = 'kg'
       '3' = 'g'
       '4' = 'mg'
       '5' = 'mcg'
       '6' = 'L'
       '7' = 'mL'
       '8' = 'mEq'
       '9' = 'mmol'
       '10' = '%';
run;

*see legend below for definitions




    (1)  m (lower case)         = meter
    (2)  kg                     = kilogram
    (3)  g                      = gram
    (4)  mg                     = milligram
    (5)  mcg                    = microgram
         (do not use the Greek letter mu, which has been misread as mg)
    (6)  L (upper case)         = liter
    (7)  mL (lower/upper case)  = milliliter
         (do not use cc, which has been misread as U or the number 4)
    (8)  mEq                    = milliequivalent
    (9)  mmol                   = millimole




The FORMAT procedure can also be used to define a format for the dosage form of
the drug in question (see the procedure below):

proc format;
    value $dosage_form
      '1' = 'capsule'
      '2' = 'cream'
      '3' = 'ear drop'
      '4' = 'eye drop'
      '5' = 'inhaler'
      '6' = 'injection'
      '7' = 'oral solution'
      '8' = 'solution'
      '9' = 'suspension pediatric drop'
      '10' = 'syrup'
      '11' = 'tablet'
      '12' = 'chewable tablet'
      '13' = 'other';

run;


Formats for time durations and age ranges are also available:


proc format;
  value $time_duration_form
      '1' = 'hour'
      '2' = 'day'
      '3' = 'week'
      '4' = 'month'
      '5' = 'year';
run;

proc format;
   value $age_range_form
      '1' = 'children'
      '2' = 'adult';
run;

proc format;
    value $eating_format
      '1' = 'with meal'
      '2' = 'without meal'
      '3' = 'before meal'
      '4' = 'after meal'
      '5' = 'with a glass of water'
      '6' = 'other';
run;

proc format;
     value $time_format
        '1' = 'morning'
        '2' = 'noon'
        '3' = 'afternoon'
        '4' = 'evening'
        '5' = 'midnight';
run;




Other values are combinations of the formats defined above. For example, drug
labels may read: "for adults, every morning, 2 tablets, 2 hours before meals,
with a glass of water" or "for children under 8 years of age, ½ a tablet before
meals, with a glass of water...."
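
As a hedged sketch, the coded values could be combined into label text with the
formats defined above (the variable names and the sample record are
hypothetical, and the formats are assumed to be available in the current format
search path):

data _null_;
   age_grp   = '2';   /* adult       */
   time_cd   = '1';   /* morning     */
   dose_form = '11';  /* tablet      */
   eat_cd    = '3';   /* before meal */
   label_txt = catx(' ',
                    'for', strip(put(age_grp, $age_range_form.)) || ',',
                    'every', strip(put(time_cd, $time_format.)) || ',',
                    '2', strip(put(dose_form, $dosage_form.)) || 's,',
                    '2 hours', put(eat_cd, $eating_format.));
   put label_txt=;   /* for adult, every morning, 2 tablets, 2 hours before meal */
run;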


In a database, grouping may be based on the "Sex/Gender" field, where the
values "Male", "Female", and "unknown" define minor groupings. These values can
be stored as numeric variables (1, 2, and 3). The ordering of numeric levels in
relation to classification variables must be done with care. If, in a
statistical report, the data for female patients must appear after the data for
males, the "Sex/Gender" field would use "2" for females and "1" for males. The
following SAS script describes this formatting.


proc format library=proclib;

        value $sex
              '1'='male'
              '2'='female'
              '3'='unknown';

        picture pop low-high='000,000,000';

run;


Formatting has other uses in scripting. Many data values must be written with a
defined format. In SAS one can use a format with any of the following:
1. the PUT, PUTC, or PUTN functions
2. the %SYSFUNC macro function
3. a FORMAT or ATTRIB statement in a DATA step or a PROC step

   num=15;
   char=put(num,hex2.);

   population=1145.32;
   put population comma10.2;
   result:      1,145.32


One can also use a macro to apply a defined format outside a DATA step. The
%SYSFUNC macro function applies the specified format to the result of the PUTN
function:


  %macro tst(amount);
     %put %sysfunc(putn(&amount,dollar10.2));
  %mend tst;
  %tst (1154.23);



 Usually, patient records are the type of data that come through Open Database
 Connectivity (ODBC). It is very possible that these data already exist as the
 backbone of a medical client-server application. In this case, access to the
 data via ODBC is required, and the module "SAS/ACCESS for ODBC" must be
 installed on the computer. Configuring the database connection by referring to
 the DSN (Data Source Name), and defining how it is accessed, can also be
 required. Even parameter values can come from an ODBC source. These data may
 have dynamic values that are updated by end users through the web; normally,
 such applications have an administration module that allows the end user to
 update parameters.
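
As a minimal sketch, an ODBC connection might be declared and read as follows
(the DSN name "safetydb", the credentials, and the table are hypothetical;
SAS/ACCESS for ODBC is assumed to be installed and the DSN configured on the
machine):

/* point a libref at the ODBC data source named by the DSN */
libname clin odbc dsn=safetydb user=reporter password=XXXXXXXX;

/* copy the patient table through the ODBC connection into SAS */
data work.paitient_odbc;
   set clin.paitient;
run;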


Example:
The following script shows how one can use part of the data stored in another
vendor's Database Management System (DBMS) files; the data are then read into a
SAS data set. In the following script a libref is declared that points to a
library containing Oracle data, and SAS reads the data from an Oracle table
into a SAS data set:


/* point a libref at the Oracle schema */
libname dblib oracle user=halley password=halley path='hrdept_002';

/* copy the Oracle table into a SAS data set */
data paitient.big;
   set dblib.paitient;
run;




Memory allocation is an important consideration when creating or extending a
data library. SAS allows space to be requested as needed; to optimize system
performance and allocate space appropriately, one can pre-allocate the maximum
space that may be needed. These methods are used most often when multivolume
access to SAS data libraries is required.


The above data statement may then change to:

/* We know this is a big data set, so pre-allocate space. */
data paitient.big (alq=100000 deq=5000);


As explained earlier, data can come from an external data file. Additionally,
one can connect to a remote server and work on its data. In the following
script, we connect to z/OS and UNIX servers to use DB2 and Oracle data:

/*************************************/
/* connect to z/OS                   */
/*************************************/



options comamid=tcp;
filename rlink '!sasroot\connect\saslink\tcptso.scr';
signon os390host;

/*************************************/
/* download DB2 data views using     */
/* SAS/ACCESS engine                 */
/*************************************/

rsubmit os390host;
libname db db2;
proc download data=db.paitient
     out=db2dat;
run;
endrsubmit;

/*************************************/
/* connect to UNIX                   */
/*************************************/

options remote=hrunix comamid=tcp;
filename rlink '!sasroot\connect\saslink\tcpunix.scr';
signon;

/*************************************/
/* download Oracle data using        */
/* SAS/ACCESS engine                 */
/*************************************/

rsubmit hrunix;
libname oracle oracle user=hzan password=halley;
proc download
    data=oracle.paitient out=oracdat;

 run;
endrsubmit;

/*************************************/
/* sign off both links               */
/*************************************/
signoff hrunix;
signoff os390host cscript=
   '!sasroot\connect\saslink\tcptso.scr';


/*************************************/
/* union data into SAS view          */
/*************************************/
proc sql;

 create view temp_joindata as
 select gender, country, count(*) as population
    from db2dat group by gender, country
 union
 select gender, country, count(*) as population
    from oracdat group by gender, country
 union
 select gender, country, count(*) as population
    from paitient1 group by gender, country;
quit;

proc sql;
create view jointdata as
select temp_joindata.gender,
       temp_joindata.population,
       countries.name as country
from temp_joindata, countries
      where countries.codeId = temp_joindata.country
order by gender, countries.name;
quit;

options fmtsearch=(proclib);

/* The NOWD option runs the REPORT procedure without the REPORT window
and sends its output to the open output destination(s). */

proc report data=jointdata nowd;
column gender country population;
format gender $sex. country $50. population pop.;
title 'Country of Origin for Patients Receiving the drug in Post marketing';
run;



Country of Origin for Patients Receiving this drug in Post marketing
for 04JAN06

Gender      Country             Population

Female      Algeria                743,453
Male                               235,984
Unknown                                167

Female      Denmark            423,457,698
Male                           546,876,345
Unknown                                897

Female      Spain              456,9812,564
Male                           400,987,564
Unknown                                234

Female      United Kingdom     876,234,123
Male                           564,234,876
Unknown




Conclusions:
This thesis proposes ways to improve programming practices for standardizing
Drug Safety Reporting Systems. The quality of a Drug Safety Reporting
Application depends on the system architecture, methodologies, and modeling
used by the programmer. The degree to which an implementation is standardized
is in direct proportion to the correctness of the methods for accessing,
gathering, and manipulating the data, its classification, control code, quality
control, formatting, statistical analysis, and data mining. Classification
terms should follow a hierarchical structure that is consistent with FDA
standards and MedDRA. Using code control with MedMiner and the SCM is also
important, and neither this nor quality control should be overlooked by
programmers. Formatting of data must be done properly and, again, consistently
with FDA standards. Statistical analysis and data mining in these types of
applications must also be done correctly, as they have a direct effect on the
results. Ultimately, gathering and accessing data should be handled
dynamically, and manual access should not be considered. Above all, details
such as the size of the data at the data-accessing stage should be considered
carefully.


As for the professionals working on the system, an advanced background in
computational, mathematical, and programming methods is required to apply these
terminologies accurately. SAS programming, knowledge of object-oriented
programming and data structures, database modeling, and SQL are all necessary
skills for implementing a standard Drug Safety Reporting System. Knowledge of
statistical modeling is particularly desirable in data mining applications.
Finally, a graduate in computational science or a professional software
designer with good scripting skills can make the application work more
dynamically and accurately. The workbench of a Drug Safety Reporting System is
made up of the SAS and MedDRA applications: SAS supports advanced data-access
technology, and the MedDRA classification matches the metadata required for
designing this application. These existing components improve the reliability
of the design, and SQL scripting expands it.




References
   •  SAS Publishing. The Analyst Application, Second Edition. July 2002.

   •  Adriaans, P., and D. Zantinge. 1996. Data Mining. Edinburgh Gate, England: Addison
      Wesley Longman.

   •  Hand, D. J. 1997. Construction and Assessment of Classification Rules. New York: John
      Wiley & Sons, Inc.

   •  Berry, M. J. A., and G. Linoff. 1996. Data Mining Techniques for Marketing, Sales, and
      Customer Support. New York: John Wiley & Sons, Inc.

   •  Bergeron, Bryan P. 2003. Bioinformatics Computing. New Jersey: Pearson Education, Inc.
      (Prentice Hall Professional Technical Reference).

   •  Pharmacoepidemiology and Drug Safety, Vol. 1 (1992), Vol. 2 (1993), Vol. 6 (1997),
      Vol. 7 (1998).

   •  Agresti, A. 1996. An Introduction to Categorical Data Analysis. New York: Wiley.

   •  Collett, D. 1994. Modelling Survival Data in Medical Research. London: Chapman & Hall/CRC.

   •  Benichou, C. (ed.). 1994. Adverse Drug Reactions: A Practical Guide to Diagnosis and
      Management. John Wiley & Sons.

   •  Fuchi, K. 1981. "Aiming for knowledge information processing systems." Proceedings of the
      International Conference on Fifth Generation Computer Systems, Japan Information
      Processing Development Center, Tokyo; republished 1982 by North-Holland Publishing,
      Amsterdam.

   •  SAS online documentation: http://www.sas.com/service/library/onlinedoc

   •  CDER: http://www.fda.gov/cder/handbook/index.htm

   •  MedWatch: http://www.fda.gov/medwatch/getforms.htm

   •  MedDRA: http://www.meddrahelp.com/




                                        50

More Related Content

What's hot

Traditional vs modern dbms
Traditional vs modern dbmsTraditional vs modern dbms
Traditional vs modern dbmsAYUGUPTA98
 
Data warehouse presentaion
Data warehouse presentaionData warehouse presentaion
Data warehouse presentaionsridhark1981
 
Database management system
Database management systemDatabase management system
Database management systemMidhun Abraham
 
Lecture 01 introduction to database
Lecture 01 introduction to databaseLecture 01 introduction to database
Lecture 01 introduction to databaseemailharmeet
 
File system vs database
File system vs databaseFile system vs database
File system vs databaseSanthiNivas
 
Introduction to Database
Introduction to DatabaseIntroduction to Database
Introduction to DatabaseSiti Ismail
 
Ch1
Ch1Ch1
Ch1CAG
 
Database Systems Introduction (INTD-3535)
Database Systems Introduction (INTD-3535)Database Systems Introduction (INTD-3535)
Database Systems Introduction (INTD-3535)julyprum
 
Business intelligence databases and information management
Business intelligence databases and information managementBusiness intelligence databases and information management
Business intelligence databases and information managementProf. Othman Alsalloum
 
A Software Infrastructure for Multidimensional Data Analysis: A Data Modellin...
A Software Infrastructure for Multidimensional Data Analysis: A Data Modellin...A Software Infrastructure for Multidimensional Data Analysis: A Data Modellin...
A Software Infrastructure for Multidimensional Data Analysis: A Data Modellin...IJCSIS Research Publications
 

What's hot (20)

Traditional vs modern dbms
Traditional vs modern dbmsTraditional vs modern dbms
Traditional vs modern dbms
 
Data dictionary
Data dictionaryData dictionary
Data dictionary
 
Fundamentals of Database Design
Fundamentals of Database DesignFundamentals of Database Design
Fundamentals of Database Design
 
Data warehouse presentaion
Data warehouse presentaionData warehouse presentaion
Data warehouse presentaion
 
Database management system
Database management systemDatabase management system
Database management system
 
Lecture 01 introduction to database
Lecture 01 introduction to databaseLecture 01 introduction to database
Lecture 01 introduction to database
 
File system vs database
File system vs databaseFile system vs database
File system vs database
 
Introduction to Database
Introduction to DatabaseIntroduction to Database
Introduction to Database
 
Db lec 04_new
Db lec 04_newDb lec 04_new
Db lec 04_new
 
Ch1 2
Ch1 2Ch1 2
Ch1 2
 
Ch1
Ch1Ch1
Ch1
 
Ch1
Ch1Ch1
Ch1
 
Types of databases
Types of databases   Types of databases
Types of databases
 
Data mining
Data miningData mining
Data mining
 
Database Systems Introduction (INTD-3535)
Database Systems Introduction (INTD-3535)Database Systems Introduction (INTD-3535)
Database Systems Introduction (INTD-3535)
 
Business intelligence databases and information management
Business intelligence databases and information managementBusiness intelligence databases and information management
Business intelligence databases and information management
 
Data Flow Models part6
Data Flow Models part6Data Flow Models part6
Data Flow Models part6
 
Database Basics
Database BasicsDatabase Basics
Database Basics
 
A Software Infrastructure for Multidimensional Data Analysis: A Data Modellin...
A Software Infrastructure for Multidimensional Data Analysis: A Data Modellin...A Software Infrastructure for Multidimensional Data Analysis: A Data Modellin...
A Software Infrastructure for Multidimensional Data Analysis: A Data Modellin...
 
Database Concepts
Database ConceptsDatabase Concepts
Database Concepts
 

Viewers also liked

مقهى الرشفات
مقهى الرشفاتمقهى الرشفات
مقهى الرشفاتdangermind
 
Europese privacywetgeving_Gerrit-Jan Zwenne_GroenGras
Europese privacywetgeving_Gerrit-Jan Zwenne_GroenGrasEuropese privacywetgeving_Gerrit-Jan Zwenne_GroenGras
Europese privacywetgeving_Gerrit-Jan Zwenne_GroenGrasDAS
 
5 самых известных женщин в сфере геймификации
5 самых известных женщин  в сфере геймификации5 самых известных женщин  в сфере геймификации
5 самых известных женщин в сфере геймификацииMichel Vershinin
 
8-2 Subtr. Fractions/Mixed #s
8-2 Subtr. Fractions/Mixed #s8-2 Subtr. Fractions/Mixed #s
8-2 Subtr. Fractions/Mixed #sRudy Alfonso
 
90/10 Principle
90/10 Principle90/10 Principle
90/10 Principleslidale
 
وحدة الصف قوة وائتلاف
  وحدة الصف قوة وائتلاف   وحدة الصف قوة وائتلاف
وحدة الصف قوة وائتلاف dangermind
 
برنامج البر 4
برنامج البر 4برنامج البر 4
برنامج البر 4dangermind
 
Biologicals Regulation in Australia
Biologicals Regulation in AustraliaBiologicals Regulation in Australia
Biologicals Regulation in AustraliaAlbert Farrugia
 
Als uw incassopartner failliet gaat, dreigt ernstige financiële- en reputatie...
Als uw incassopartner failliet gaat, dreigt ernstige financiële- en reputatie...Als uw incassopartner failliet gaat, dreigt ernstige financiële- en reputatie...
Als uw incassopartner failliet gaat, dreigt ernstige financiële- en reputatie...DAS
 
Yokogawa user conf march 2013 final matt duxbury
Yokogawa user conf march 2013 final   matt duxburyYokogawa user conf march 2013 final   matt duxbury
Yokogawa user conf march 2013 final matt duxburyMatt Duxbury
 
4-7 Multiplying by One-Digit and Two-Digit Numbers
4-7 Multiplying by One-Digit and Two-Digit Numbers4-7 Multiplying by One-Digit and Two-Digit Numbers
4-7 Multiplying by One-Digit and Two-Digit NumbersRudy Alfonso
 
Деструктивные культы: вербовка и эксплуатация
Деструктивные культы: вербовка и эксплуатацияДеструктивные культы: вербовка и эксплуатация
Деструктивные культы: вербовка и эксплуатацияMichel Vershinin
 
The agency model
The agency modelThe agency model
The agency modelLisa Albert
 

Viewers also liked (20)

مقهى الرشفات
مقهى الرشفاتمقهى الرشفات
مقهى الرشفات
 
Europese privacywetgeving_Gerrit-Jan Zwenne_GroenGras
Europese privacywetgeving_Gerrit-Jan Zwenne_GroenGrasEuropese privacywetgeving_Gerrit-Jan Zwenne_GroenGras
Europese privacywetgeving_Gerrit-Jan Zwenne_GroenGras
 
E uniting 2012
E uniting 2012E uniting 2012
E uniting 2012
 
5 самых известных женщин в сфере геймификации
5 самых известных женщин  в сфере геймификации5 самых известных женщин  в сфере геймификации
5 самых известных женщин в сфере геймификации
 
8-2 Subtr. Fractions/Mixed #s
8-2 Subtr. Fractions/Mixed #s8-2 Subtr. Fractions/Mixed #s
8-2 Subtr. Fractions/Mixed #s
 
90/10 Principle
90/10 Principle90/10 Principle
90/10 Principle
 
Government ch. 3 - constitution
Government   ch. 3 - constitutionGovernment   ch. 3 - constitution
Government ch. 3 - constitution
 
وحدة الصف قوة وائتلاف
  وحدة الصف قوة وائتلاف   وحدة الصف قوة وائتلاف
وحدة الصف قوة وائتلاف
 
Dragonsat
DragonsatDragonsat
Dragonsat
 
برنامج البر 4
برنامج البر 4برنامج البر 4
برنامج البر 4
 
Biologicals Regulation in Australia
Biologicals Regulation in AustraliaBiologicals Regulation in Australia
Biologicals Regulation in Australia
 
Als uw incassopartner failliet gaat, dreigt ernstige financiële- en reputatie...
Als uw incassopartner failliet gaat, dreigt ernstige financiële- en reputatie...Als uw incassopartner failliet gaat, dreigt ernstige financiële- en reputatie...
Als uw incassopartner failliet gaat, dreigt ernstige financiële- en reputatie...
 
Yokogawa user conf march 2013 final matt duxbury
Yokogawa user conf march 2013 final   matt duxburyYokogawa user conf march 2013 final   matt duxbury
Yokogawa user conf march 2013 final matt duxbury
 
4-7 Multiplying by One-Digit and Two-Digit Numbers
4-7 Multiplying by One-Digit and Two-Digit Numbers4-7 Multiplying by One-Digit and Two-Digit Numbers
4-7 Multiplying by One-Digit and Two-Digit Numbers
 
Feature satip4
Feature satip4Feature satip4
Feature satip4
 
Деструктивные культы: вербовка и эксплуатация
Деструктивные культы: вербовка и эксплуатацияДеструктивные культы: вербовка и эксплуатация
Деструктивные культы: вербовка и эксплуатация
 
يحكى أن
يحكى أنيحكى أن
يحكى أن
 
The agency model
The agency modelThe agency model
The agency model
 
Woven Bags
Woven BagsWoven Bags
Woven Bags
 
1st Session
1st Session1st Session
1st Session
 

Similar to Standardization of “Drug Safety” Reporting Applications-doc file

Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewBig Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewIRJET Journal
 
Decoding the Role of a Data Engineer.pdf
Decoding the Role of a Data Engineer.pdfDecoding the Role of a Data Engineer.pdf
Decoding the Role of a Data Engineer.pdfDatavalley.ai
 
Advancing life sciences with IBM reference architecture for genomics
Advancing life sciences with IBM reference architecture for genomicsAdvancing life sciences with IBM reference architecture for genomics
Advancing life sciences with IBM reference architecture for genomicsPatrick Berghaeger
 
Project 1Write 400 words that respond to the following questio.docx
Project 1Write 400 words that respond to the following questio.docxProject 1Write 400 words that respond to the following questio.docx
Project 1Write 400 words that respond to the following questio.docxbriancrawford30935
 
Fundamentals of DBMS
Fundamentals of DBMSFundamentals of DBMS
Fundamentals of DBMSAhmed478619
 
DOCUMENT SELECTION USING MAPREDUCE
DOCUMENT SELECTION USING MAPREDUCEDOCUMENT SELECTION USING MAPREDUCE
DOCUMENT SELECTION USING MAPREDUCEijsptm
 
Memory Management in BigData: A Perpective View
Memory Management in BigData: A Perpective ViewMemory Management in BigData: A Perpective View
Memory Management in BigData: A Perpective Viewijtsrd
 
Custom Software Development Checklist by Michael Cordova
Custom Software Development Checklist by Michael CordovaCustom Software Development Checklist by Michael Cordova
Custom Software Development Checklist by Michael Cordovahoolikar77
 
Office automation system report
Office automation system reportOffice automation system report
Office automation system reportAmit Kulkarni
 
Office automation system report
Office automation system reportOffice automation system report
Office automation system reportAmit Kulkarni
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationDenodo
 
Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformSanjay Padhi, Ph.D
 
25 Best Data Mining Tools in 2022
25 Best Data Mining Tools in 202225 Best Data Mining Tools in 2022
25 Best Data Mining Tools in 2022Kavika Roy
 
Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19Prof.Balakrishnan S
 

Similar to Standardization of “Drug Safety” Reporting Applications-doc file (20)

Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewBig Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A Review
 
BIG DATA and USE CASES
BIG DATA and USE CASESBIG DATA and USE CASES
BIG DATA and USE CASES
 
Decoding the Role of a Data Engineer.pdf
Decoding the Role of a Data Engineer.pdfDecoding the Role of a Data Engineer.pdf
Decoding the Role of a Data Engineer.pdf
 
U - 2 Emerging.pptx
U - 2 Emerging.pptxU - 2 Emerging.pptx
U - 2 Emerging.pptx
 
Advancing life sciences with IBM reference architecture for genomics
Advancing life sciences with IBM reference architecture for genomicsAdvancing life sciences with IBM reference architecture for genomics
Advancing life sciences with IBM reference architecture for genomics
 
[IJCT-V3I2P32] Authors: Amarbir Singh, Palwinder Singh
[IJCT-V3I2P32] Authors: Amarbir Singh, Palwinder Singh[IJCT-V3I2P32] Authors: Amarbir Singh, Palwinder Singh
[IJCT-V3I2P32] Authors: Amarbir Singh, Palwinder Singh
 
A CRUD Matrix
A CRUD MatrixA CRUD Matrix
A CRUD Matrix
 
Project 1Write 400 words that respond to the following questio.docx
Project 1Write 400 words that respond to the following questio.docxProject 1Write 400 words that respond to the following questio.docx
Project 1Write 400 words that respond to the following questio.docx
 
Fundamentals of DBMS
Fundamentals of DBMSFundamentals of DBMS
Fundamentals of DBMS
 
DOCUMENT SELECTION USING MAPREDUCE
DOCUMENT SELECTION USING MAPREDUCEDOCUMENT SELECTION USING MAPREDUCE
DOCUMENT SELECTION USING MAPREDUCE
 
Memory Management in BigData: A Perpective View
Memory Management in BigData: A Perpective ViewMemory Management in BigData: A Perpective View
Memory Management in BigData: A Perpective View
 
Custom Software Development Checklist by Michael Cordova
Custom Software Development Checklist by Michael CordovaCustom Software Development Checklist by Michael Cordova
Custom Software Development Checklist by Michael Cordova
 
Office automation system report
Office automation system reportOffice automation system report
Office automation system report
 
Office automation system report
Office automation system reportOffice automation system report
Office automation system report
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data Virtualization
 
2007 REVISED-ACGME-Poster
2007 REVISED-ACGME-Poster2007 REVISED-ACGME-Poster
2007 REVISED-ACGME-Poster
 
Big Data
Big DataBig Data
Big Data
 
Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh Platform
 
25 Best Data Mining Tools in 2022
25 Best Data Mining Tools in 202225 Best Data Mining Tools in 2022
25 Best Data Mining Tools in 2022
 
Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19
 

Standardization of “Drug Safety” Reporting Applications-doc file

  • 1. The New York State University at Brockport Department of Computational Science Standardization of “Drug Safety” Reporting Applications Halley M. Zand Winter, 2005 Thesis advisor: Dr. Robert Tuzun 1
  • 2. Abstract: The purpose of this thesis is the development of an application process for preparing reports on drug safety. The FDA is responsible for protecting the public health by assuring the safety and security of human and veterinary drugs. Annually, companies who provide medications are required to generate reports that assure the FDA of the drug’s safety. This thesis proposes an Information Technology infrastructure model that provides drug providers IT organization with a strategic perspective on how to computerize their Drug Safety reporting activity. It introduces software development concepts, methods, techniques and tools for collecting data from multiple platforms and generates reports from them by scripting queries. Introduction: According to Guidance Documents for Drug Evaluations and Research from the U.S. Food and Drug Administration all prescription drugs, both new and generic need to be approved by the FDA. To obtain these approvals, drug providers are required to generate annual reports on product safety and attach them to their application letter. Also, any person can report to the FDA a reaction or problem with a drug. The FDA reviews applications and all reported clinical outcomes to see if the reported events happened because of other reasons or use the suspected drug. Manually reporting is not practical because of the large volume of data, and the differing platforms and formats in which they are stored. Unfortunately, tools and standards are often poorly used due to lack of Database Application Modeling, Programming and Software Engineering skills. User applications are often cobbled together with little more efficiency than manual processing, and tools for automation and large scale data processing are not utilized. 2
  • 3. The hiring of qualified staff and carefully selecting software increases the quality and reduces costs. A two-hour job may take a week due to poor technical skills, and the cost of software licensing may increase by as much as 5,000 USD from 10,000 USD because of the lack of attention paid to the productivity of the software tool. A standardized IT infrastructure provides higher computational quality at lower cost. In addition, professional developers with computational science backgrounds are the only group that has the sufficient computational knowledge and bookkeeping skills for software application design and the ability to apply technical concepts. Merging Computational Science and Drug Development Science for Drug Safety Evaluation can be evolved within a modern computer environment; and because Computational Technology grows quickly, designers would need an advanced vision for the future. A strong knowledge in computational science and bookkeeping helps developers use what is available and progress forward from it. This thesis explains a modern computational architecture for implementing Drug Safety Reporting Applications. This architecture uses advanced IT concepts to increase the quality of work on a large volume of data that may be dynamic rather than static and comes from distributed computer networks. This thesis aids in the study of Drug Safety in obtaining the best software solution advantages possible. Objectives: SAS is the software application that developers use to provide high-quality reporting applications for Drug Safety. The collection of concepts that work together is required in order to achieve a computer-based method for Drug Safety evaluation. This paper proposes an infrastructure that uses the optimal solutions for this process. The abstract is intended to use the information 3
  • 4. gathered to develop the system as a whole. It can accept data from both papers and electronic databases. Databases such as Oracle and Microsoft Access can be considered as backbones of the system. All computational terminologies that are recommended for this proposed infrastructure must be explained. For example; in some cases, data mining might be used to find a pattern and help to estimate descriptions of a data field. This ability of the proposed architecture in data mining should be illustrated. In this thesis, entity relational database modeling as well as data accessing, formatting, classification, and scripting is illustrated best by giving examples and working on creating descriptions of longitudinal data. Focusing on code consistency with all essential attributes and their effeciencies in the proposed infrastructure is included. Proposed software should support maintainablity; but focusing techniques on the data error concept is not within the scope of this paper. In order to achieve the best result, we need to use all available pieces of accurate data and perform the correct programming processing. These data can come from health care providers, consumers, literature, and other relevant databases. It is important to find the ordinary errors during scripting. Due to a missing part or step in coding for data processing (extracting and retrieving, manipulating data, or making narrative data from queries and assessing them) a large difference on the expected result and the accuracy of the reports may occur. Technical Specifications:  Data accessing: SAS data might come from other application platforms. These data might be formatted or non-formatted and therefore filed differently in varied environments. Accessing these data from several servers is done in the following steps. 4
  • 5. I. Use the SAS ODBC driver to access by communicating with either local or remote SAS servers using TCP/IP protocol. Data can come from a local, remote, or any type of database server. It can be in any format including raw data or any vendor’s software data set. The ability to read raw data in any format, from any kind of file (including variable-length records, binary files, and free- formatted data--even files with messy or missing data) is required. II. Combine and manipulate these data on the client side, analyze the out-coming data and distribute it by making an execute file from the server to multiple client. The following are examples of possible case in data accessing: a. Data may exist on a mainframe computer or pc network. These data might join to an existing data set, create new variables (columns), and produce tables and interactive graphs. b. Raw data may exist on a UNIX server. Compute other data values from them, form statistics, and create an HTML report to use in web application systems, then store on a web server in intranet /internet platform. c. Access may be needed to BMDP, SPSS, and OSIRIS files directly as well as files such as Microsoft Excel spreadsheet, Microsoft Access table, dBase, ORACLE forms and any other DBMS. In addition, both relational and non-relational databases, including any PC data source can be considered as a data file. d. The relational databases in DB2 format exists in OS/390, VM, DB2, UNIX or PC environment. e. ODBC, Informix, ORACLE and OLE DB data may come from any platform. They may also come from SYBAS 5
  • 6. machine or Teradata, MSSQL Server or any other machine. f. Baan or PeopleSoft files may come from ERP systems such as R/3 and SAP BW. Thus global data may be received and processed for creating an enterprise report.  Data Management: After accessing data, it is necessary to manage them, by creating, retrieving, and updating database information. This may require advanced programming skills because the information comes from a wide range of data sources and it is necessary to merge them together and then evaluate. Data with the same attributes need generic formatting that requires a manipulation process. Evaluating values of data requires computational operations that may be defined as functions. Saved sets of data in the data forms may have been extracted from subsets data. Complex conditional processes during data manipulation may be needed when a wide range of data source is merged.  After gathering and shaping information we need Statistical Analyzing to produce reports. These reports are customized and they may be complex. Tables, frequency counts, and cross-tabulation tables may be produced to create a variety of charts and plots. Also, the computation of a variety of descriptive statistics including linear regression analysis, standard deviation, correlations and other measures of association, as well as multi-way cross- tabulations and inferential statistics may be necessary.  These representations should be able to be reported to a wide variety of locations and platforms in order to suit client needs. Results may be required to be presented in many formats, such as an array of markup languages including HTML4 and XML, or formatted for a high-resolution printer such 6
  • 7. PostScript/ PDF/ PCL files, RTF or even color graphs that can be made interactive using ActiveX controls or Java applets. System architecture modeling Reporting data by investigators. Clinical Trials Hospital Labs. Data Dictionary Clinical Studies Modification Archive Data, (MedDRA/PubMed) (Oracle) Verifications Post marketing DATA Ware House Adverse Event Reporting Individual clinical Data User trials Analyses (SAS) Information must be gathered by drug providers. These data come from clinical studies by the FDA and other professional investigators. Other information comes from medical records of patients who were treated by the specific drug. Usually, drug providers do a study of their product before moving onto the evaluation step. The first step is the collecting of data to generate reports such as the country of origin for patients receiving the drug, worldwide patient exposure, demographic characteristics, most commonly reported body-system reactions (ordered by gender and/or age of patients), and the summary of death or other 7
  • 8. critical body reactions. Another resource is the company’s surveys on products completed by patients or clients who are volunteers in the U.S. or other countries. These surveys include match data from Med Watch Forms that the U.S. Department of Health and Human Services accepts as a voluntary reporting of adverse events and product problems. Also, these manufacturing companies may be able to receive FDA Reports generated on the basis of Med Watch Reports about this product. Furthermore, many of the surveys are answered by physicians and other doctors who have the EMR System and are able to answer detailed questions regarding medical conditions and other related medical issues. Any tool that is recommended here should be consistent with FDA Standards and the objectives that follow.  In any Adverse Event Reporting System, the Basic Calculation and Data Analysis have statistical bases on data sets that may frequently be ordered according to one or more variables coming from a variety of data sources. Thus an Adverse Events Reporting System can work on any possible platform. For example, if it uses E2B data element structures then it should be able of doing any possible interactive query or data flow transactions on shared data. SAS is compatible with all computer platforms. It works on any type of operating system. It supports data sharing concepts. It suports submission through the WEB or any other network that includes Oracle, Unix, NT servers or Mainframe machines. This means that any regardless of backbone, SAS can suport it.  Data sources may need to be summarized or checked before being reported. Scripting and programing concepts are one of the major necesities in development. SAS has a powerful scripting language that can do any required summarizing, verification and validation. 8
  • 9.  In the pharmaceutical field and bio-informatics, SAS software is generally thought of for statistical analysis programming but is also a largely untapped resource for its other many features. It’s screen building and object oriented development abilities are needed to keep up with the latest Information and Technology advances.  SAS is a stand-alone system produced by SAS, Inc. and sold in the open market. It exceeds all technical objectives specified here. The FDA has proposed MedDRA as a standardized dictionary of medical terminology. MedDRA has been used internationally to discuss the regulation of medical products. MedDRA provides symptoms, signs, diseases, and diagnoses information. It also includes other information such as:  Names of investigations (e.g. liver function analyses, metabolism tests)  Sites (e.g. application site reactions, implant site reactions and injection site reactions)  Therapeutic indications  Surgical and medical procedures  Social and family history terms SAS and MedDRA are FDA standards. They have high standard designing; and assure that company builders continue looking to find weaknesses and improve their products. All their documentation and userinterfaces are user friendly. SAS and MedDRA are generic softwares and any specific needs such as security of data or reliability of operations can be negotiated in a service level agreement. EMR Database: These data come from hospital laboratories and clinical data entry systems. They are documented before and after verification. All documentations are electronic and all reporting submitted electronically. MeDRA 9
  • 10. does encoding that is part of clinical data entry. All data entries are standard based approved by the FDA. Terminologies: A computerized Drug Safety Evaluations requires the following informatics terminologies:  Data classifications  Control Code  Formatting  Quality Control  Data Mining  Gathering information  Accessing and manipulating data  Scripting Each of these terminologies carries a process or methodology that will be discussed in the following. Data Classification: Any Structured Analysis of information needs classification. Data Classification is the first best-known task in data flow modeling. The data model of a Drug Adverse Event Reporting System is derived from conceptual information such as entities and their interrelationships. A mechanism serves as a store of all drug information which can link analysis, design, implementation and evolutions applied in most medical applications. This classification should be consistent and not clash. It is integrated in all parts that require maintainability. The outcome attributed to adverse events is the most important information that needs to be classified. The data classification for this attribute should be a standard classification that is matched by the FDA reporting program. 10
  • 11. The FDA uses MedDRA as a part of the proposed rule for post-marketing reporting. MedDRA is the abbreviation for Medical Dictionary for Regulatory Activities and it is an international terminology designed to support the classification, retrieval, presentation, and communication of medical information throughout the medical product regulatory cycle. Originally, MedDRA was written in English and distributed in ASCII file format; but it is now available in several other languages such as Dutch, French, German, Italian, Portuguese, Spanish, and Japanese. This on-line dictionary is intended to become the global medical terminology standard for use by every bio-pharmaceutical company in the world and has the best-known classification with an integrated platform in updating that can be used by all standard systems. In the majority of homegrown medical applications, the patient medical recording systems use this classification and it is valid for all phases of drugs and subscribing Pharmaceutical companies. MedDRA works as a catalog of medical disorders. It has a hierarchical data structure that has five terms. Developing queries or retrieving information about medical diagnoses need hierarchical searching on these terms, and other queries might be selected by grouping them thusly. The next page picture shows the SOC view of Cardiac and Vascular investigations (excl enzyme test): 11
  • 12. MedDRA classifications have an Object Oriented data structure as shown in the following screens. 12
  • 13. 13
  • 14. Each MedDRA has a unique code that can be use as a searching key. 14
  • 15. A query makes a link between collected data and terms in MeDRA. A query can create a selection on a description of medical data. This selection requires searching and enters the term to be sought into the 'Search for Value' field. The query then selects one of the records returned and identifies information about patients. After that, codes in the database are ready for any statistical evaluation. The other advantages of using MedDRA are:  MedDRA is on-line (not requiring installation or periodic updates on the client system). The application has a standardized interface, is well supported, and requires little effort to interface with any client computing environment. A good designer can get the best advantage of this classified information by using it as a shared data set. Updating this shared information maintains all the related outcomes that have referenced this data set.  Informatics terminologies such as encoding are already included in MedDRA for its own data sets.  MedDRA includes high standards that can be updated with queries or importing data; however, it requires quality control because it can disrupt everything.  Current MedDRA Version has MediMiner for the managing and analysis of the coded data included all data mining. This unique tool allows analysis of the impact of recoding the data sets from one MedDRA version to another when MedDRA is a standalone product that has been used as an integral component of our range of coding tools. MedDRA classification can be browsed by a tree that can be collapsed and viewed at every level of detail for all occurrences in every possible search category such as legend, terms and coding. 15
  • 16. Control Code: SAS and MedDRA both have code controlling utility to do the following:  Debugging system and maintenance ability in any branch of code to make a cross-reference listing showing all the program names that have been declared and used.  The analyzer discovers un-initialized variables, unreachable codes, uncalled functions and procedures as well as the number of times executed for each statement. MedDRA has MedMiner as its version control utility. During any updating in MedDRA MedDRA 3.1, MediMiner controls all changes by analyzing the coding sets. In MedDRA 4.1, it also impacts the recoding of data by identifying all codes that remain unchanged, and identifying those codes that may require recoding. It is also possible to identify the codes that no longer exist, those that have been changed in some way, and those that have a related change or where a multiracial (inherited from multiphase of original codes) change has had an impact. Primary and secondary changes are identified as well as changes in the current status of the code. SAS software includes Source Control Manager (SCM) utility as one of the options in Desktop selection of Solution menu. SAS->Solutions->Desktop->Development and Programming-> Source Code Manager 16
SCM includes a friendly GUI with SAS file check-in/check-out capabilities. The GUI lists all libraries, data sets, catalogs, and catalog entries in hierarchical order. SCM provides flexible testing, revision control, and version labeling together with an easy application distribution utility. With a version label, it is easy to create a copy of an application and place it in other locations on the network; the SAS/CONNECT utility can also place the application on remote machines.

Formatting:
Usability of information is one of the most important components of any application implementation. Usability requires readability, and the readability of any data set is facilitated by standardized formatting. Each line represents many separate pieces of information (data values), and formats determine how these values are displayed or used in calculations. Formats set the width of displayed values, the number of decimal places displayed, the handling of blanks, zeroes, and commas, and other details. SAS supports both its own standard formats and user-defined formats. Standard formats may be used for numeric, character, or picture data. Users can also write custom-made formats in DATA and PROC steps. User-defined formats are reusable and can be saved in format catalogs. If saved in a permanent SAS catalog, they remain there permanently; if saved in the catalog WORK.FORMATS, they are temporary and retrievable only in the SAS session or job in which they were created. Because catalogs are a type of SAS file residing in a SAS data library, they act as an execution-time facility that intercepts run-time errors caused by undefined formats. In this way a form of type checking is supported, which improves the reliability and readability of the information. If the SAS system option NOFMTERR is in effect, SAS falls back to its own default formatting when an undefined format is called, so in some cases these errors can be ignored and execution can continue.

Quality Control:
Delivering correct results requires quality control. SAS recognizes common errors such as syntax, execution-time, data, and semantic errors; in addition, users should check for common mistakes such as the following:
 Check for syntax errors:
o statements ending with a semicolon
o matching starting and ending quotation marks
o correctly spelled keywords
o every DO and SELECT statement followed by an END statement
 Check for execution errors:
o illegal mathematical operations
o observations out of order for BY-group processing
o an incorrect reference in an INFILE statement, such as a misspelled or otherwise incorrectly stated external file name
o a program that runs yet gives an incorrect result; these errors are often detectable by checking self-consistency and should always be reported, certainly in the debugging stage and often during production runs
 SAS executes the statements in a DATA step one by one, in the order in which they appear. After executing the DATA step, SAS moves to the next step and continues in the same fashion. Make certain that all SAS statements appear in the proper order so that SAS can execute them correctly.
 Check input statements and data. SAS can detect data errors during execution, but they do not terminate processing; after execution, SAS prints a note describing the error and lists the related values stored in the input buffer and the program data vector.
o The values read must be checked against the actual variable values expected by the INPUT statements.
o Any corresponding arrangement, such as formats, lists, and columns in input statements, must be checked as well.

Data mining:
Data mining is a class of database applications that look for hidden patterns in a group of data. Statistical analysis is the analysis method best matched to the nature of data mining. Statistical analysis can uncover hidden patterns in the large volumes of information coming from Adverse Event Reports or survey systems. A data mining process might look for combinations of variables that occur more often than expected; by applying statistical options, an informed guess can be made about which behaviors have occurred unusually frequently.
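As one hedged illustration of this idea (a sketch, not the full mining workflow), a simple cross-tabulation of suspected drug against MedDRA Preferred Term can flag drug/event combinations that occur more often than expected. The data set WORK.CODED_EVENTS and its columns are hypothetical, as in the earlier sketch.

/* cross-tabulate drug against MedDRA Preferred Term and compare observed
   counts with the counts expected under independence */
proc freq data=work.coded_events;
   tables drug_name*pt_term / chisq expected norow nocol;
run;

More formal signal-detection statistics could replace the simple chi-square comparison, but the mechanics of preparing and querying the coded data stay the same.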
Data mining is a critical aspect of these reporting systems. Occasionally, predictions may be even more important than detections in drug safety evaluation. In the United States, patients can file lawsuits against drug providers for severe adverse reactions, and these legal actions often make American drug companies reluctant to introduce drugs into the U.S. market. Data mining on data from other parts of the world offers a way to move drug safety evaluation from a reactive to a proactive posture, and from detection toward prediction; in effect, it helps drug providers adopt a safer marketing strategy rather than take risks. If MedDRA System Organ Class terms are adopted as classes of events, then one can select the related data from patient records for each event class, discover statistical rules or patterns automatically from the data, and later form a hypothesis and run tests on the patient record database to verify or refute it. In this way, data mining on data from other countries and from clinical studies can help protect drug providers against lawsuits.

SAS assists data analysis in an instructional way, so that even people with no statistical background are able to run the required processes on selected data sources (basic options include counting missing and non-missing values, minimum, maximum, range, sum, mean, variance, standard deviation, standard error of the mean, coefficient of variation, skewness, and kurtosis). In addition, access to data sources can be secured to prevent unauthorized use. SAS also allows different reports and presentations of the results to be created (including tabular listings and frequency reports with graphical presentations to visualize the results). SAS supports data mining with a large number of statistical procedures (regression, association discovery, time series, and time series cross-sectional (panel) data analysis). Data are usually analyzed by regression (one observation for each patient), but sometimes it is necessary to correlate with cross-sectional data such as geographic region, gender, smoking, alcohol use, and so on.

Gathering information and documenting system specifications:
The available information (such as the toxicological and pharmacokinetic profiles of the individual drug, the treatment indication or indications, the intended populations, etc.) may already be defined in relational databases. The backbone of such a system might be a SQL database, Access, or even Excel, but these data stores may not be suited to performing detailed statistical analyses at this stage; this is where SAS helps with the statistical analysis. SAS can be interfaced with databases to allow large volumes of data to be retrieved efficiently for analysis. Engines can be assigned to a SAS library, which records how the stored files are accessed. These files might come through a variety of engines such as ODBC, SPSS, SYBASE, REMOTE, META, MYSQL, ORACLE, DB2, ACCESS, and so on. To process the data, all the connections that may be created between the different sets of data records must be defined. The first link establishes the correspondence between the MedDRA classifications and the patient records; the patients' medical information works in tandem with the MedDRA classifications to build queries and analyze information. As part of the application development process, the following information must be specified:

1. Source data: Miscellaneous data sources may exist, and in order to get correct results, the prescription drug information provided by drug firms should be truthful, balanced, and accurately communicated. The same applies to data coming from clinical and post-marketing trials or from spontaneous reports (submitted individually by doctors or patients). Dynamic data are operational data from internal systems, such as the homegrown applications of clinics or hospitals, manual data coming from paper-chart patient histories, EMRs (Electronic Medical Records), and Adverse Event Reporting (MedWatch).

2. Data staging: This area covers the storage and processing of data extracted from internal and external systems prior to loading into a SAS data bank. The following is a list of cases.
• Information may be located in multiple SQL tables on a local computer or on external servers. If required, one may make a connection to the database server and use the data dynamically. For example, the Adverse Events Database includes side effects that are serious (such as death or risk of dying, hospitalization, disability, congenital anomaly, or required intervention to prevent permanent impairment or damage). These data are required for generating certain reports.
• Part of the information belongs to Aventis Reports or ClinTrace. Data from these two areas may need to be combined to complete an assignment; in that case an executable program can connect to the backbone databases of these two licensed vendor applications and use the data. Note: basic knowledge of these databases helps programmers create standard code. For example, an Aventis or ClinTrace Case ID (Manufacturer Control #) is assigned on an "Episode" basis for each patient. Adverse events (reported side effects) occurring within the same episode are entered under the same Case ID. For drugs that are given intermittently, additional episodes (Case IDs) are created for events that occur after different treatment cycles.
• Side effects are stored in the companies' Core Safety Data Sheets. These sheets are used for the global labeling of reports and are based on the diagnoses, which are in turn assessed by seriousness. All diagnoses reported from intensified monitoring (such as a clinical trial or post-marketing surveillance study) are assessed as associated or not associated with the study medications.
These data may be joined to MedDRA information to build a larger directory that is used in SQL scripts.
• Drug providers use certain information, such as whether a side effect arises from an internal or natural body process, in a causality algorithm for internal clinical interpretation or signal evaluation purposes. In some cases this algorithm must be applied as part of the script logic in the SAS code. If a company has a computerized analysis application, then depending on the software it may be possible to connect to and use that application from inside the SAS script.
• For data mining organized by diagnoses, MedDRA information is required. It is recommended to use SAS scripting to create a remote connection that reads the MedDRA ASCII files and imports the data into temporarily created tables; these tables are deleted at the end of the scripting process.
Note: All transactions, such as queries, statistical analyses, or visualizations drawn from these sources, should be consistent. Sometimes the data are not consistent enough; to solve this problem, all "no match" data need appropriate transformation or conversion from their original form to the MedDRA representation.

3. Metadata: Metadata is a term used to describe or specify the data. It defines all the characteristics of data required to build databases and applications and to support knowledge workers and information producers, including the data element name, meaning, format, domain values, business integrity rules, relationships, owner, and so on. For example, the following classification shows how the data concepts in MedDRA correspond:
1. SOC
   MedDRA Code, Numeric
   MedDRA Term, String
2. HLGT
   MedDRA Code, Numeric
   MedDRA Term, String
3. PT
   MedDRA Code, Numeric
   MedDRA Term, String
   COSTART Symbol, AlphaNumeric
   WHO-ART Code, Numeric
   ICD-9 Code, Numeric
   ICD-10 Code, Numeric
   HARTS Code, Numeric
   ICD-9-CM Code, Numeric
   J-ART Code, Numeric
   * SOC Code, Numeric
   * SOC Name, String
4. LLT (Lowest Level Term)
   MedDRA Code, Numeric
   MedDRA Term, String
   WHO-ART Code, Numeric
   COSTART Symbol, AlphaNumeric
   ICD-9-CM Code, Numeric
   CURRENCY, Character/Boolean
   HARTS Code, Numeric
   ICD-9 Code, Numeric
   J-ART Code, Numeric
* Multi-valued attribute

Defining metadata for the adverse event reporting data is also required. These data are:
o Patient identifier and patient information: age at time of event or date of birth, sex, weight, etc.
o Outcomes attributed to the adverse event, such as death, life-threatening occurrence, hospitalization (initial or prolonged), disability, congenital anomaly, required intervention to prevent impairment/damage, or other.
o Date of event and date of report, in mo/day/yr format.
o Description of the problem.
o Relevant tests/laboratory data, including dates.
o Other relevant history, including preexisting medical conditions (e.g., allergies, race, pregnancy, smoking or alcohol use, hepatic/renal dysfunction, etc.).

Most medical clinics still use Paper Medical Records (PMRs), but many others have begun to use Electronic Medical Records (EMRs). No standard form has yet been defined for EMRs, but all provide the same information, which requires metadata definitions. These data are:
o The patient's primary reason for the medical visit
o History of the onset of clinical signs and symptoms
o Current list of medications the patient is using
o Relevant past medical history, including hospital admissions, surgeries, and diagnoses
o History of family disease, such as diabetes, cancer, heart disease, and mental illness
o Social history: use of drugs, smoking, job stability, housing and living conditions, incarceration
o Review of systems: the patient's own account of current medical problems by body system, such as trouble sleeping at night, panic episodes, and results of tests
o Physical examination: the clinician's hands-on examination of the patient, including head, eyes, ears, nose, throat, chest, and extremities
o Labs, including blood glucose, cholesterol, and drug levels
o Studies such as X-ray, MRI, CT, and EKG
o Progress notes, such as a record of the temporal progression of signs and symptoms, labs, and studies over the length of the study or admission

4. The entity-relationship model
The specification of the information required for an adverse event serves as a starting point for constructing a conceptual schema (the overall design of the database) for the suggested database. The entity sets and attributes targeted here are the drug and patient entity sets. These entity sets participate in a relationship that has attributes of its own; this relationship is a "many to many" relationship. Other relationships can be designed between subsets of an entity set; the relationships between the drug entity set and the ingredient or side-effect entity sets are examples, and these are "many to one" relationships. This method of design helps save storage. In some other cases, such as the patient-drug relationship, the participants are limited to two entity sets, which leaves the design in one general set. In the following diagram, small rectangles show entity sets; large rectangles specify attributes; diamonds represent relationship sets; lines link attributes to entity sets and entity sets to relationship sets; arrows indicate that an entity falls exclusively within another entity; double lines indicate "many" relationship sets; bold diamonds show "many to one" relationship sets; and rectangles with non-indexed information indicate information about a relationship set.

[E-R diagram omitted: it shows the Patient and Drug entity sets joined by the Patient-Drugs relationship (with attributes such as ID, reason, date_of_event, date_of_report, therapy_start_date, therapy_end_date, diagnosis information, Lot_number, Exp_Date, MedDRACode, NDC number, adverse_desc, and route and dosage), a Drug-Ingredient relationship (ID, name, value, unit), an "Adverse reactions and side effects" relationship, patient attributes (id, first/middle/last name, date of birth, sex, weight, race information, country, and relevant information such as allergies, smoking, alcohol, pregnancies, dysfunction, and lab results), and drug attributes (id, generic name, trade name, dosage range, metric unit, category, and form).]

The above E-R model is a sample of what can be considered, although the attributes could be designed in more detail. For example, 'route and dosage' could be designed as a separate entity, because it includes many optional attributes that may otherwise be concatenated together as descriptive text. They may also be saved separately in a data source.
This E-R model gives substantial flexibility in designing the basic database schema.

Accessing and Manipulating Data:
The first step in accessing and manipulating data is the DATA step. The DATA step is used to access and read data and to program the data processing. As explained before, one of the strengths of SAS is fast and easy access to data from many different sources. In addition to its programming components, SAS has many other features in the DATA step that help in developing a standard application. The SAS language has all the statements required to accomplish typical data processing, among them reading and appending raw data files and SAS data sets and writing out the results. Subsetting data, combining multiple SAS files, creating SAS variables, recoding data values, and creating listing and summary reports that include advanced analysis features (such as web analytical solutions) are also possible. Special attention should be paid to managing SAS data set input and output, working with different data types, and manipulating data. It may also be necessary to control SAS data set input and output, combine, summarize, and then process iteratively with programming to perform data manipulations and transformations.

Accessing the data comes first. Sometimes the required data file is saved on another server and location. With an FTP server running, SAS can make an FTP connection and use the external data source remotely, without any copy of the downloaded data remaining on the local machine unless SAS writes it out. As an example, assume the data belong to cps-users and are located at ~/halley/thesis/main.data:

filename fromrcr ftp 'main.data'
   cd='halley/thesis'
   user='cps-user'
   host='cps.brockport.edu'
   recfm=v prompt;
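A short sketch of how the remote file behind the FROMRCR fileref might then be read follows; the column layout and variable names are assumptions made for illustration, not a description of the actual main.data file.

data work.remote_patients;
   /* read directly through the FTP fileref; no permanent local copy is kept
      unless this output data set is saved */
   infile fromrcr truncover;
   input PatientId $ 1-13 age 14-17 sex $ 18-23 weight 24-30;
run;

proc contents data=work.remote_patients;
run;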
Much of the data may arrive as raw data, which must be read into a SAS data set. As an example, a client might send a letter or a text file that includes part of a patient's information. The following script shows how to read these data into a SAS data set.

data PatientInfo;
   infile 'c:\thesis\data1.txt';
   input PatientId $ 1-13 age 14-17 sex $ 18-23 weight 24-30 +2 country;
run;

proc print data=PatientInfo;
run;

The SAS System    05:25 Thursday, December 15, 2005

PatientId     age  sex  weight  country
Hzan0616341    30   1     200     11
Amir5666892    40   2     180     12
J675bhgfdql    56   2       .     45
Nmjhg567908    12   1     100     23
Iu6-567-567    99   1     170     01

*** A missing value for a numeric variable is represented by a period (.).

Processing Examples:
• To use external files, SAS must be told where to find them. There are the following choices:
1- Identify the file directly in the INFILE, FILE, or other SAS statement that uses the file.
2- Set up a fileref for the file by using the FILENAME statement, and then use the fileref in the INFILE, FILE, or other SAS statement.
3- Use operating environment commands to set up a fileref, and then use the fileref in the INFILE, FILE, or other SAS statement.
Note: To use several files or members from the same directory, partitioned data set (PDS), or MACLIB, use the FILENAME statement to create a fileref that identifies the directory name. The fileref can then be used in the INFILE statement, with the name of the file, PDS member, or MACLIB member enclosed in parentheses immediately after the fileref, as shown in the example below:

/* filename data 'directory-or-PDS-or-MACLIB'; */
/* data1.txt and data2.txt are located in directory c:\thesis */
filename data 'c:\thesis';

data patientdata1;
   infile data('data1.txt');
   input PatientId $11. +2 age 2. +1 weight 3. +2 sex 1.
         +2 date1 mmddyy10. +2 date2 mmddyy10. +2 country 12.;
run;

data patientdata2;
   infile data('data2.txt');
   input PatientId $11. +2 age 2. +1 weight 3. +2 sex 1.
         +2 date1 mmddyy10. +2 date2 mmddyy10. +2 country 12.;
run;

• Also, from the File menu, ADX can import data from a SAS data set or from an Access database, an Excel spreadsheet, a dBase database, a delimited text file, and files in other common formats. This is helpful when information has been saved in a variety of formats.
• In SAS, one can gain access to data sources by defining a 'libref' and assigning access through it without copying the data into the SAS environment. A 'libref' acts as a shortcut to the metadata on the SAS Metadata Server. Any metadata on the SAS Metadata Server can be read with the META engine, which has options for controlling its output. The META engine works only with the metadata in the repository and does not affect the underlying data sources. If a table does not exist in the data source, the META engine creates the metadata based on the information the application specifies for the output table. Depending on the options in effect, deleting a table can remove only the metadata from the repository while leaving the table in the data source, or remove the table from the data source while leaving the metadata in the repository. A SAS library contains metadata objects that are defined by the 'libref'; these objects define the engines used to process the data. The library uses a URI (Uniform Resource Identifier) architecture. To access a SAS Metadata Server, define the host address; if working over a TCP network, also define the port number. If the protocol is not COM but Bridge, define a user ID and password, otherwise it will not be possible to log in to the SAS Metadata Server. In addition, any metadata repository may be referenced by a repository ID or name. To access these tables, one can use SAS/Warehouse Administrator as a tool. To locate the metadata, the objects must be identified and searched by their name, URI, and other identifiers such as their ID. The following statement illustrates this process:

libname upcase meta liburi="SASLibrary?@name='oralib'" ipaddr="d6292.us.GCS.com";

Scripting:
The goal of SQL scripting is to retrieve the available data from any possible data source. Most vendor applications have an SQL backbone, so with SQL scripting it is possible to perform queries on original or manipulated data (retrieving data from multiple tables; creating views, indexes, and tables; and updating or deleting values in existing tables and views, as well as summarizing them). SQL scripting can be done in the SAS environment or in the SQL environment. In the following example, the earlier E-R schema is reduced to relational tables from inside the SQL environment:

/*------------------------------------------------------------------*/
/* create a higher-level entity set for drug information */
CREATE TABLE drug(
   id CHAR(12) NOT NULL,
   generic_name CHAR(25),
   trade_name CHAR(25),
   dosage INT,
   unit INT,
   category INT,
   FOREIGN KEY (category) REFERENCES drug_category(category_id) ON DELETE CASCADE,
   FOREIGN KEY (unit) REFERENCES unit(unit_id) ON DELETE CASCADE,
   PRIMARY KEY (id)
) ENGINE=INNODB;

/* create the lower-level entity sets for drug information */
CREATE TABLE ingredient (
   id INT,
   drug_id CHAR(12),
   ingredient_name CHAR(25),
   ingredient_value INT,
   unit INT,
   INDEX drug_ind (drug_id),
   FOREIGN KEY (drug_id) REFERENCES drug(id) ON DELETE CASCADE,
   FOREIGN KEY (unit) REFERENCES unit(unit_id) ON DELETE CASCADE
) ENGINE=INNODB;

/* the side effects of each drug have descriptions that should be
   compatible with the MedDRA classification */
CREATE TABLE sideeffects (
   MedDRACode INT,
   drug_id CHAR(12),
   INDEX drug_ind (drug_id),
   FOREIGN KEY (drug_id) REFERENCES drug(id) ON DELETE CASCADE
) ENGINE=INNODB;

/* create a general entity set for patient information; this entity set can be
   expanded by other entity subsets, such as patient laboratory information or
   more information about the patient's history */
CREATE TABLE patient(
   id CHAR(12) NOT NULL,
   first_name CHAR(25),
   middle_name CHAR(25),
   last_name CHAR(25),
   DateOfBirth DATE,
   Sex INT,
   weight INT,
   race INT,
   country INT,
   FOREIGN KEY (race) REFERENCES races(races_id) ON DELETE CASCADE,
   FOREIGN KEY (country) REFERENCES countries(ipcode) ON DELETE CASCADE,
   PRIMARY KEY (id)
) ENGINE=INNODB;

/* some relevant patient information might come from the following suggested sub-entity set */
CREATE TABLE Relevant_Patients_Info (
   Info_id INT NOT NULL AUTO_INCREMENT,
   patient_id CHAR(25) NOT NULL,
   allergies_id INT,
   races_id INT,
   Num_pregnancies INT,
   smoking INT,
   alcohol_use INT,
   hepatic_id INT,
   dysfunctions_id INT,
   INDEX (allergies_id),
   FOREIGN KEY (allergies_id) REFERENCES allergies(allergies_id) ON UPDATE CASCADE ON DELETE RESTRICT,
   INDEX (races_id),
   FOREIGN KEY (races_id) REFERENCES races(races_id) ON UPDATE CASCADE ON DELETE RESTRICT,
   INDEX (hepatic_id),
   FOREIGN KEY (hepatic_id) REFERENCES hepatic(hepatic_id) ON UPDATE CASCADE ON DELETE RESTRICT,
   INDEX (dysfunctions_id),
   FOREIGN KEY (dysfunctions_id) REFERENCES dysfunctions(dysfunctions_id) ON UPDATE CASCADE ON DELETE RESTRICT,
   INDEX (patient_id),
   FOREIGN KEY (patient_id) REFERENCES patient(id) ON UPDATE CASCADE ON DELETE RESTRICT,
   PRIMARY KEY(Info_id)
) ENGINE=INNODB;

/* Transforming this E-R model into tabular form, including aggregation, is
   straightforward. The Patient-Drug relationship includes a column for each
   attribute in the primary key of each participating entity set (any
   concomitant medical products the patient uses and the therapy dates might
   come from related tables via the drug id and patient id; any available
   adverse event information describing the problem with using the drug
   should be included as well). */
CREATE TABLE Patients_drugs (
   Info_id INT NOT NULL AUTO_INCREMENT,
   patient_id CHAR(25) NOT NULL,
   drug_id CHAR(12) NOT NULL,
   therapy_start_date DATE,
   therapy_end_date DATE,
   MedDRACode_DiagnoseForUse INT,
   /* 1 == yes, 2 == no, 3 == does not apply */
   /* event abated after use stopped or dose reduced */
   Quest1 INT,
   /* event reappeared after reintroduction */
   Quest2 INT,
   Lot_number INT,
   Exp_Date DATE,
   NDCno INT,
   reason INT NOT NULL,
   date_of_event DATE,
   date_of_report DATE,
   adverse_desc TEXT,
   /* ... */
   INDEX (patient_id),
   FOREIGN KEY (patient_id) REFERENCES patient(id) ON UPDATE CASCADE ON DELETE RESTRICT,
   INDEX (drug_id),
   FOREIGN KEY (drug_id) REFERENCES drug(id) ON UPDATE CASCADE ON DELETE RESTRICT,
   PRIMARY KEY(Info_id)
) ENGINE=INNODB;

SQL scripting is also required to generate reports with summary statistics. In SAS, the SQL procedure (PROC SQL), which can be combined with the macro language, allows SQL to be written inside the SAS environment. SQL scripting therefore extends SAS coding to the retrieval and combination of data from tables or views.
New tables and views can be created along with indexes, and data values in PROC SQL tables can be updated. It is also possible to update and retrieve data from Database Management System tables or to modify a PROC SQL table by adding, modifying, or dropping columns.

Example: Assume the adverse event information from clinical studies, post-marketing trials, spontaneous reports, and miscellaneous sources (including independent drug identification numbers and retrospective data collection) is saved in the SQL tables above. The following script generates a report that shows the country of origin for patients receiving a drug in a post-marketing setting.

proc sql;
   /* Extracts and manipulates grouped and ordered data from the patient records
      to create a temporary view that holds the patient population in each
      country. The country field is stored as an id number; to represent it by
      country name, it is joined to the countries table. After the process is
      done, the temporary view is dropped. */
   create view temp as
   select country, count(country) as count,
          calculated Count/Subtotal as Percent format=percent8.2
   from patient, (select count(*) as Subtotal from patient) as survey2
   group by country
   order by count;
quit;

proc sql;
   /* extract the required data from the temporary view */
   title1 'Country of Origin for Patients Receiving the Suspected Drug in a Postmarketing Setting';
   select c.countryname, t.count as cc, "(", t.Percent, ")"
   from countries c, temp t
   where c.ipcode = t.country;
quit;

proc sql;
   drop view temp;
quit;

Country of Origin for Patients Receiving the Suspected Drug in a Postmarketing Setting
                                         22:04 Monday, January 16, 2006

CountryName                      Percent
--------------------------------------------------
Greece                 1   (0.1%)
Uruguay                2   (0.2%)
Taiwan                 2   (0.2%)
French Polynesia       2   (0.2%)
Peru                   2   (0.2%)
Korea                  2   (0.2%)
South Africa           3   (0.2%)
Portugal               3   (0.2%)
Turkey                 4   (0.3%)
Hungary                4   (0.3%)
Austria                4   (0.3%)
New Zealand            7   (0.5%)
Brazil                 7   (0.5%)
Norway                10   (0.8%)
Israel                11   (0.8%)
Chile                 15   (1.1%)
Netherlands           26   (2.0%)
Italy                 39   (3.0%)
Spain                 38   (2.9%)
Belgium               38   (2.9%)
United States         42   (3.2%)
Finland               44   (3.4%)
Germany               50   (3.8%)
Sweden                69   (5.3%)
Denmark               91   (7.0%)
Canada                97   (7.4%)
Australia            107   (8.2%)
Great Britain        271   (20.8%)
France               313   (24.0%)

The patient exposure to the drug can be calculated and presented in different ways. Although the available exposure data cover a period of time, the primary focus of a submitted report may be the number of exposures and cases that occurred in a specific period. In the following report, global patient exposures from 1989 to 2004 are provided:

proc sql;
   create view temp1 as
   select region, count(region) as SachetSales
   from patient
   group by region
   order by SachetSales;
quit;
proc sql;
   create view temp2 as
   select region, count(region) as Exposures
   from patient
   where patient_Id in
         (select patient_Id from Patients_drugs
          where substr(therapy_start_date,7,4) > '1983'
            and substr(therapy_end_date,7,4) < '2001')
   group by region
   order by Exposures;
quit;

proc sql;
   title1 'Worldwide Patient Exposure to the suspected drug 1989 to 1994';
   select c.region, t1.SachetSales, t2.Exposures
   from countries c, temp1 t1, temp2 t2
   where c.ipcode = t1.region and c.ipcode = t2.region;
quit;

proc sql;
   select sum(t1.SachetSales) as SumSachetSales,
          sum(t2.Exposures) as SumExposures
   from temp1 t1, temp2 t2;
quit;

proc sql;
   drop view temp1, temp2;
quit;

Worldwide Patient Exposure to the suspected drug 1989 to 1994
                                   20:55 Saturday, January 21, 2006

Region           SachetSales     Exposures
-------------------------------------------
Europe           230,649,500     1,895,749
Australia          5,292,542        43,500
Korea              3,067,300        25,211
Canada             1,497,100        12,305
Rest of World      2,405,064        19,768

SumSachetSales    SumExposures
------------------------------
   242,911,506       1,996,533

Inside the SQL scripting, one may occasionally work with data that have been imported from the MedDRA application. These data may already exist on a machine, so it is not necessary to access the MedDRA environment a second time; the SAS utilities can be used to convert the data from one form to another or to copy them between machines. A free trial of MedDRA is available on the MSSO website. It contains a sample copy of MedDRA data saved in an Access database, which can also be imported into an Excel file if needed. If the data set is standard and complete, it is better to use it as a shared data source. This shared data source may be stored in a Relational Database Management System (RDBMS), in an Excel spreadsheet, or even as data in a flat file. If it is stored on an external machine, it becomes an external data source and a SAS connection is required for access. The following SAS script retrieves the MedDRA classification from a data source. It imports data from an external file (a spreadsheet) into a SAS table. This code was generated and saved during the import wizard process; saving this type of script avoids redoing the work when the information is needed again.

PROC IMPORT OUT= WORK.MEDDRAInfo
     DATAFILE= "C:\thesis\CTCAEv3.xls"
     DBMS=EXCEL REPLACE;
     SHEET="'CTCAE v3#0 MedDRA Codes$'";
     GETNAMES=YES;
     MIXED=NO;
     SCANTEXT=YES;
     USEDATE=YES;
     SCANTIME=YES;
RUN;

The following script works as well:

filename xclfil 'C:\thesis\CTCAEv3.xls';
proc import datafile=xclfil out=WORK.MEDDRAInfo
     dbms=excel97 replace;
     getnames=yes;
run;

The above script retrieves the MedDRA classification from a data source. Often these data do not represent all of the MedDRA data; usually only a subset is required and is stored in an external file. Assume MedDRAClassifications.xls includes only the MedDRA classification data. To generate reports related to side effects, importing this file is enough to retrieve the appropriate symptom or sign information listed by outcome.

PROC IMPORT DBMS=EXCEL OUT=work.MedDRA
     DATAFILE="c:\thesis\MedDRAClassifications.xls" REPLACE;
run;
/* the same classifications could also be read from the CSV version, e.g. with
   infile 'c:\thesis\MedDRAClassifications.csv' delimiter=',' dsd; in a DATA step */

proc print data=MedDRA;
run;

The SAS System    05:25 Thursday, December 15, 2005

Obs   MedDRATermLevel1           MedDRATermLevel2
  1   Nervous system disorders
  2                              Balance disorder
  3                              Convulsion
  4                              Lethargy
  5                              Optic neuritis
  6                              Paraesthesia
  7                              Speech disorder
  8                              Tunnel vision
  9                              Visual field defect
 10
 11   Eye disorders
 12                              Astigmatism
 13                              Blindness
 ...

* Sometimes the information that comes from an Adverse Event Report, clinical trials, or any other post-marketing or pharmacovigilance application carries a provisional order number assigned to outcome data that cannot yet be correctly mapped to MedDRA. These order numbers alone can be used when electronic reports or data are submitted and are then automatically converted to the MedDRA codes.
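A minimal sketch of one way such a conversion could be held and applied is shown below, assuming a hypothetical two-column mapping table maintained by the safety group (provisional order number to MedDRA code); the table names, column names, and code values are placeholders, not real MedDRA content.

proc sql;
   /* hypothetical lookup: provisional order number -> MedDRA code */
   create table work.order_to_meddra
      (order_no num, meddra_code num);
   insert into work.order_to_meddra
      values(1001, 11111111)   /* placeholder codes, not real MedDRA codes */
      values(1002, 22222222);

   /* attach the MedDRA code to submitted records that carry only the
      provisional order number (work.submitted_events is hypothetical) */
   create table work.submitted_coded as
   select s.*, m.meddra_code
   from work.submitted_events as s
        left join work.order_to_meddra as m
        on s.order_no = m.order_no;
quit;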
From the parameter list created, values can be individually highlighted and chosen for processing. The required parameter values may be retrieved from tables created by scripts such as the following:

proc sql;
   create table reasonlist1 (Description char(60));
   insert into reasonlist1
      values('Patient Died')
      values('Life threatening illness')
      values('Required emergency room/doctor visit')
      values('Required hospitalization')
      values('Resulted in permanent disability')
      values('Resulted in prolongation of hospitalization')
      values('others');
quit;

The ordering of the above parameter values is important for selecting rows by their order number, and the descriptions of these values must be the same as those found on the FDA forms. The following script creates a parameter table for the abbreviations used in Drug Safety Reporting. The ordering and descriptions of these abbreviations are also consistent with FDA standards.

proc sql;
   create table abbreviations (abb char(8), Description char(80));
   insert into abbreviations
      values('ADR','adverse drug reaction')
      values('AE','adverse event')
      values('AERS','Adverse Event Reporting System')
      values('bid','twice daily')
      values('CI','confidence interval')
      values('CIOMS','Council for International Organizations of Medical Sciences')
      values('COSTART','Coding Symbols for Thesaurus of Adverse Reaction Terms')
      values('CSDS','Core Safety Data Sheet')
      values('CV','coefficient of variation')
      values('FDA','Food and Drug Administration')
      values('GABA','gamma-aminobutyric acid')
      values('HARTS','')
      values('IBD','International Birth Date')
      values('ICD9-10','International Classification of Diseases, 9th and 10th Editions/Revisions')
      values('ICD9CM','International Classification of Diseases, Ninth Revision, Clinical Modification')
      values('ICH','International Conference on Harmonisation')
      values('MedDRA','Medical Dictionary for Regulatory Activities')
      values('NDA','New Drug Application')
      values('PSUR','Periodic Safety Update Report')
      values('qd','once daily')
      values('qid','four times daily')
      values('SAE','serious adverse event')
      values('SD','standard deviation')
      values('SE','standard error')
      values('US','United States')
      values('WHO-ART','World Health Organization Adverse Reaction Terminology');
quit;

Formatting may be used for other parameter values. The ATTRIB statement permanently associates a format with a variable, and SAS uses the format to write the values of the variables specified.

attrib sales1-sales3 format=comma10.2;

Because of the permanent association made by the ATTRIB statement above, any subsequent DATA step or PROC step will use the COMMA10.2 format to write the values of sales1, sales2, and sales3. In addition to the default formats supplied by Base SAS software, custom-made formats can be created with the FORMAT procedure. The following format definition is used to define static parameter values that may be required. It expresses weights and measures using the USP (United States Pharmacopeia) standard abbreviations for dosage units.

proc format;
   value $dosage_units
      '1' = 'm'
      '2' = 'kg'
      '3' = 'g'
      '4' = 'mg'
      '5' = 'mcg'
      '6' = 'L'
      '7' = 'mL'
      '8' = 'mEq'
      '9' = 'mmol'
      '10' = '%';
run;
*see legend below for definitions;
(1) m (lower case) = meter
(2) kg = kilogram
(3) g = gram
(4) mg = milligram
(5) mcg = microgram (do not use the Greek letter mu, which has been misread as mg)
(6) L (upper case) = liter
(7) mL (lower/upper case) = milliliter (do not use cc, which has been misread as U or the number 4)
(8) mEq = milliequivalent
(9) mmol = millimole

A format can also be defined for the dosage form of the drug in question (see the procedure below):

proc format;
   value $dosage_form
      '1' = 'capsule'
      '2' = 'cream'
      '3' = 'ear drop'
      '4' = 'eye drop'
      '5' = 'inhaler'
      '6' = 'injection'
      '7' = 'oral solution'
      '8' = 'solution'
      '9' = 'suspension pediatric drop'
      '10' = 'syrup'
      '11' = 'tablet'
      '12' = 'chewable tablet'
      '13' = 'other';
run;

Formats for time durations and age ranges are also available:

proc format;
   value $time_duration_form
      '1' = 'hour'
      '2' = 'day'
      '3' = 'week'
      '4' = 'month'
      '5' = 'year';
run;

proc format;
   value $age_range_form
      '1' = 'children'
      '2' = 'adult';
run;

proc format;
   value $eating_format
      '1' = 'with meal'
      '2' = 'without meal'
      '3' = 'before meal'
      '4' = 'after meal'
      '5' = 'with a glass of water'
      '6' = 'other';
run;

proc format;
   value $time_format
      '1' = 'morning'
      '2' = 'noon'
      '3' = 'afternoon'
      '4' = 'evening'
      '5' = 'midnight';
run;

Other values are combinations of the formats defined above. For example, drug labels may read: "for adults, every morning, 2 tablets, 2 hours before meals, with a glass of water" or "for children under 8 years of age, ½ a tablet before meals, with a glass of water...." In a database, grouping may be based on the "Sex/Gender" field, where the values "Male," "Female," and "unknown" define the groups. These values can be stored as numeric variables (1, 2, and 3). The ordering of numeric levels in relation to classification variables must be chosen with care: if, in a statistical report, the data for female patients must appear after the data for males, the "Sex/Gender" field would use "2" for females and "1" for males. The following SAS script describes this formatting.
proc format library=proclib;
   value $sex
      '1' = 'male'
      '2' = 'female'
      '3' = 'unknown';
   picture pop low-high='000,000,000';
run;

Formatting has other uses in scripting. Many data values must be written with a format. In SAS, a format can be used with any of the following:
1. the PUT, PUTC, or PUTN functions
2. the %SYSFUNC macro function
3. a FORMAT or ATTRIB statement in a DATA step or a PROC step

num=15;
char=put(num,hex2.);
population=1145.32;
put population comma10.2;   /* writes: 1,145.32 */

One can also use a macro to apply a defined format to a value outside a DATA step:

%macro tst(amount);
   %put %sysfunc(putn(&amount,dollar10.2));
%mend tst;
%tst(1154.23);

Patient records are typically the kind of data that come from an Open Database Connectivity (ODBC) source. It is quite possible that these data exist as the backbone of a medical client-server application; in this case, access to the data via ODBC is required and the module "SAS/ACCESS for ODBC" must be installed on the computer. Configuring the database by referring to its DSN (Data Source Name) and the way it is accessed can also be required. Even parameter values can come from an ODBC source. These data may have dynamic values that are updated by end users through the web; normally such applications have administration modules that allow the end user to update the parameters.

Example: The following script shows how one can use part of the data stored in another vendor's Database Management System (DBMS) files and read it into a SAS data set. A 'libref' is declared that points to a library containing Oracle data, and SAS reads the data from an Oracle table into a SAS data set:

libname dblib oracle user=halley password=halley path='hrdept_002';

data patient.big;
   set dblib.patient;
run;

Memory allocation is an important consideration when creating or extending a data library. SAS allows space to be requested as needed; to optimize system performance and allocate space appropriately, one can pre-allocate the maximum space that may be needed. These methods are used more often when multivolume access to SAS data libraries is required. The DATA statement above may then change to:

/* We know this is a big data set. */
data patient.big (alq=100000 deq=5000);

As explained earlier, data can come from an external data file, and one can also connect to a remote system and work on the data there. In the following script, we connect to z/OS and UNIX servers to use DB2 and Oracle data:
/*************************************/
/* connect to z/OS                    */
/*************************************/
options comamid=tcp;
filename rlink '!sasroot\connect\saslink\tcptso.scr';
signon os390host;

/*************************************/
/* download DB2 data views using      */
/* the SAS/ACCESS engine              */
/*************************************/
rsubmit os390host;
   libname db db2;
   proc download data=db.patient out=db2dat;
   run;
endrsubmit;

/*************************************/
/* connect to UNIX                    */
/*************************************/
options remote=hrunix comamid=tcp;
filename rlink '!sasroot\connect\saslink\tcpunix.scr';
signon;

/*************************************/
/* download Oracle data using         */
/* the SAS/ACCESS engine              */
/*************************************/
rsubmit hrunix;
   libname oracle oracle user=hzan password=halley;
   proc download data=oracle.patient out=oracdat;
   run;
endrsubmit;

/*************************************/
/* sign off both links                */
/*************************************/
signoff hrunix;
signoff os390host cscript='!sasroot\connect\saslink\tcptso.scr';

/*************************************/
/* union the data into a SAS view     */
/*************************************/
proc sql;
   create view temp_joindata as
   select gender, country, count(*) as population
      from db2dat
      group by gender, country
   union
   select gender, country, count(*) as population
      from oracdat
      group by gender, country
   union
   select gender, country, count(*) as population
      from patient1
      group by gender, country;
quit;

proc sql;
   create view jointdata as
   select temp_joindata.gender, temp_joindata.population, countries.name as country
   from temp_joindata, countries
   where countries.codeId = temp_joindata.country
   order by gender, countries.name;
quit;

options fmtsearch=(proclib);

/* The NOWD option runs the REPORT procedure without the REPORT window and
   sends its output to the open output destination(s). */
proc report data=jointdata nowd;
   column gender country population;
   format gender $sex. country $50. population pop.;
   title 'Country of Origin for Patients Receiving the Drug in Post-marketing';
run;

Country of Origin for Patients Receiving this Drug in Post-marketing, for 04JAN06

Gender     Country            Population
Female     Algeria               743,453
Male                             235,984
Unknown                              167
Female     Denmark           423,457,698
Male                         546,876,345
Unknown                              897
Female     Spain           4,569,812,564
Male                         400,987,564
Unknown                              234
Female     United Kingdom    876,234,123
Male                         564,234,876
Unknown

Conclusions:
This thesis proposes ways to improve programming practices for standardizing Drug Safety Reporting Systems. The quality of a Drug Safety Reporting Application depends on the system architecture, methodologies, and modeling used by the programmer.
The degree to which an implementation is standardized is in direct proportion to the correctness of the methods used for accessing, gathering, and manipulating the data, and for its classification, control code, quality control, formatting, statistical analysis, and mining. Classification terms should follow a hierarchical structure consistent with FDA standards and MedDRA. Using code control with MediMiner and SCM is also important, and neither it nor quality control should be overlooked by programmers. Formatting of the data must be done properly and, again, consistently with FDA standards. Statistical analysis and data mining in these applications must also be done correctly, as they have a direct effect on the results. Ultimately, data gathering and access should be handled dynamically; manual access should not be considered. Above all, details such as the size of the data at the data-access stage should be carefully considered. As for the professionals working on such a system, an advanced background in computational, mathematical, and programming methods is required to apply these terminologies accurately. SAS programming, knowledge of object-oriented programming and data structures, database modeling, and SQL are all necessary skills for implementing a standard Drug Safety Reporting System, and knowledge of statistical modeling is particularly desirable for the data mining components. Finally, a computational science graduate or a professional software designer with good scripting skills can make the application work more dynamically and accurately. The workbench of a Drug Safety Reporting System is made up of the SAS and MedDRA applications: SAS supplies advanced data access technology, and the MedDRA classification matches the metadata required for designing the application. These existing components improve the reliability of the design, and SQL scripting extends it.
References
 SAS Publishing. The Analyst Application, Second Edition. July 2002.
 Adriaans, P., and D. Zantinge. 1996. Data Mining. Edinburgh Gate, England: Addison Wesley Longman.
 Hand, D. J. 1997. Construction and Assessment of Classification Rules. New York: John Wiley & Sons, Inc.
 Berry, M. J. A., and G. Linoff. 1996. Data Mining Techniques for Marketing, Sales, and Customer Support. New York: John Wiley & Sons, Inc.
 Bergeron, Bryan P. 2003. Bioinformatics Computing. New Jersey: Prentice Hall Professional Technical Reference / Pearson Education, Inc.
 Pharmacoepidemiology and Drug Safety, Vol. 1 (1992), Vol. 2 (1993), Vol. 6 (1997), and Vol. 7 (1998).
 Agresti, A. 1996. An Introduction to Categorical Data Analysis. New York: Wiley.
 Collett, D. 1994. Modelling Survival Data in Medical Research. London: Chapman & Hall/CRC.
 Benichou, C. (ed.). 1994. Adverse Drug Reactions: A Practical Guide to Diagnosis and Management. Chichester: John Wiley & Sons.
 Fuchi, K. 1981. "Aiming for knowledge information processing systems." Proceedings of the International Conference on Fifth Generation Computer Systems, Japan Information Processing Development Center, Tokyo; republished 1982 by North-Holland Publishing, Amsterdam.
 SAS online documentation: http://www.sas.com/service/library/onlinedoc
 CDER: http://www.fda.gov/cder/handbook/index.htm
 MedWatch: http://www.fda.gov/medwatch/getforms.htm
 MedDRA: http://www.meddrahelp.com/