The New York State University at Brockport
Department of Computational Science
Standardization of “Drug Safety” Reporting Applications
Halley M. Zand
Winter, 2005
Thesis advisor: Dr. Robert Tuzun
Abstract:
The purpose of this thesis is to develop an application process for
preparing reports on drug safety. The FDA is responsible for protecting public
health by assuring the safety and security of human and veterinary drugs.
Each year, companies that provide medications are required to generate reports
that assure the FDA of their drugs’ safety.
This thesis proposes an Information Technology infrastructure model that
gives the IT organizations of drug providers a strategic perspective on how to
computerize their Drug Safety reporting activity. It introduces software
development concepts, methods, techniques and tools for collecting data from
multiple platforms and generating reports from them by scripting queries.
Introduction:
According to Guidance Documents for Drug Evaluations and Research from the
U.S. Food and Drug Administration, all prescription drugs, both new and generic,
need to be approved by the FDA. To obtain these approvals, drug providers are
required to generate annual reports on product safety and attach them to their
application letter. In addition, any person can report a reaction to or problem
with a drug to the FDA. The FDA reviews applications and all reported clinical
outcomes to determine whether the reported events resulted from use of the
suspected drug or from other causes.
Manual reporting is not practical because of the large volume of data and the
differing platforms and formats in which the data are stored. Unfortunately,
tools and standards are often poorly used due to a lack of Database Application
Modeling, Programming and Software Engineering skills. User applications are
often cobbled together with little more efficiency than manual processing, and
tools for automation and large-scale data processing are not utilized.
Hiring qualified staff and carefully selecting software increases quality
and reduces costs. A two-hour job may take a week due to poor technical skills,
and the cost of software licensing may rise by as much as 5,000 USD over a
10,000 USD baseline when too little attention is paid to the productivity of
the software tool. A standardized IT infrastructure provides higher
computational quality at lower cost. In addition, professional developers with
computational science backgrounds are the only group that has sufficient
computational knowledge and bookkeeping skills for software application design
and the ability to apply technical concepts.
Computational Science and Drug Development Science can be merged for Drug
Safety Evaluation within a modern computing environment, and because
computational technology changes quickly, designers need a forward-looking
vision. Strong knowledge of computational science and bookkeeping helps
developers use what is available and build on it.
This thesis explains a modern computational architecture for implementing Drug
Safety Reporting Applications. This architecture uses advanced IT concepts to
increase the quality of work on large volumes of data that may be dynamic
rather than static and that come from distributed computer networks. The thesis
is intended to help those who study Drug Safety obtain the greatest possible
benefit from their software solution.
Objectives:
SAS is the software that developers use to provide high-quality
reporting applications for Drug Safety. A computer-based method for Drug
Safety evaluation requires a collection of concepts that work together.
This paper proposes an infrastructure that uses the optimal
solutions for this process. The intent is to use the information
gathered to develop the system as a whole. It can accept data from both paper
records and electronic databases. Databases such as Oracle and Microsoft Access
can be considered backbones of the system. All computational terminology
recommended for this proposed infrastructure must be explained. For
example, in some cases data mining might be used to find a pattern and help
estimate descriptions of a data field. This data mining ability of the proposed
architecture should be illustrated.
In this thesis, entity-relationship database modeling as well as data access,
formatting, classification, and scripting are illustrated by examples and by
creating descriptions of longitudinal data. The thesis also addresses code
consistency across all essential attributes and its efficiency in the proposed
infrastructure. Proposed software should support maintainability, but
techniques that focus on data errors are not within the scope of this
paper. To achieve the best result, we need to use all available
accurate data and process them with correct programs. These data can
come from health care providers, consumers, literature, and other relevant
databases. It is important to find ordinary errors during scripting. A
missing part or step in the data processing code (extracting and retrieving,
manipulating data, or making narrative data from queries and assessing them)
may cause a large difference between the expected and actual results and
reduce the accuracy of the reports.
Technical Specifications:
Data accessing:
SAS data might come from other application platforms. These data might be
formatted or unformatted and are therefore filed differently in varied
environments. Accessing these data from several servers is done in the
following steps.
I. Use the SAS ODBC driver to access the data by communicating with
either local or remote SAS servers using the TCP/IP protocol. Data
can come from a local, remote, or any other type of database server. They
can be in any format, including raw data or any vendor’s software
data set. The ability to read raw data in any format, from any kind
of file (including variable-length records, binary files, and
free-formatted data, even files with messy or missing data) is required.
II. Combine and manipulate these data on the client side, analyze
the resulting data, and distribute them by building an executable file
that the server makes available to multiple clients.
The following are examples of possible cases in data access:
a. Data may exist on a mainframe computer or PC network.
These data might be joined to an existing data set, used to create new
variables (columns), and used to produce tables and interactive
graphs.
b. Raw data may exist on a UNIX server. Other data values are
computed from them, statistics are formed, and an
HTML report is created for use in web application systems and
then stored on a web server on an intranet or internet platform.
c. Access may be needed to BMDP, SPSS, and OSIRIS
files directly as well as files such as Microsoft Excel
spreadsheets, Microsoft Access tables, dBase, ORACLE
forms and any other DBMS. In addition, both relational
and non-relational databases, including any PC data
source, can be considered data files.
d. Relational databases in DB2 format may exist in OS/390,
VM, UNIX or PC environments.
e. ODBC, Informix, ORACLE and OLE DB data may come
from any platform. They may also come from a Sybase
machine, Teradata, MS SQL Server or any other
machine.
f. Files may come from ERP systems such as Baan, PeopleSoft,
R/3 and SAP BW. Thus global data may be
received and processed for creating an enterprise report.
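The access cases above can be illustrated with a minimal sketch. It is written in Python rather than SAS so that it runs without a license, and every name in it is an assumption made for illustration: the sample CSV text stands in for a raw file on a UNIX server, an in-memory SQLite database stands in for a relational source such as Oracle or DB2, and the column names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export (stands in for a flat file on a UNIX server).
RAW_CSV = """patient_id,event_term
P001,Nausea
P002,Headache
P003,Nausea
"""

def load_raw_events(text):
    """Read delimited raw data into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def load_patients(conn):
    """Read patient rows from a relational source, keyed by patient_id."""
    rows = conn.execute("SELECT patient_id, sex, age FROM patients")
    return {pid: {"sex": sex, "age": age} for pid, sex, age in rows}

def join_sources(events, patients):
    """Add patient columns to each event row (a left join on patient_id)."""
    joined = []
    for event in events:
        extra = patients.get(event["patient_id"], {"sex": None, "age": None})
        joined.append({**event, **extra})
    return joined

# An in-memory SQLite database stands in for Oracle, DB2, or Access.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (patient_id TEXT, sex TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?)",
    [("P001", "F", 54), ("P002", "M", 37)],
)

combined = join_sources(load_raw_events(RAW_CSV), load_patients(conn))
for row in combined:
    print(row)
```

Events without a matching patient row are kept with empty patient columns, so that missing source data stays visible in later reports instead of being silently dropped.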
Data Management: After accessing data, it is necessary to manage them by
creating, retrieving, and updating database information. This may require
advanced programming skills because the information comes from a wide
range of data sources that must be merged and then
evaluated. Data with the same attributes need a common format, which
requires a manipulation process. Evaluating data values requires
computational operations that may be defined as functions. Data sets saved
in data forms may have been extracted as subsets of other data. Complex
conditional processing may be needed during data manipulation when a wide
range of data sources is merged.
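As a hedged illustration of giving the same attribute a common format before merging, the following Python sketch defines the formatting operations as functions and applies them to records from two sources. The code tables, date patterns, field names, and sample records are all hypothetical, and the sketch is not the SAS approach itself.

```python
from datetime import datetime

# Hypothetical codings of the same attribute in different source systems.
SEX_CODES = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "2": "F"}
DATE_PATTERNS = ("%m/%d/%Y", "%Y-%m-%d", "%d.%m.%Y")

def normalize_sex(value):
    """Map a source-specific code to 'M', 'F', or 'U' (unknown)."""
    return SEX_CODES.get(str(value).strip().lower(), "U")

def normalize_date(value):
    """Try each known pattern; return an ISO date string or None."""
    for pattern in DATE_PATTERNS:
        try:
            return datetime.strptime(value.strip(), pattern).date().isoformat()
        except ValueError:
            continue
    return None

def merge_sources(*sources):
    """Apply the same formatting functions to every record, then pool them."""
    merged = []
    for source in sources:
        for record in source:
            merged.append({
                "patient_id": record["patient_id"],
                "sex": normalize_sex(record["sex"]),
                "event_date": normalize_date(record["event_date"]),
            })
    return merged

clinic = [{"patient_id": "P001", "sex": "Female", "event_date": "03/05/2004"}]
survey = [{"patient_id": "P002", "sex": 1, "event_date": "2004-03-07"}]
merged = merge_sources(clinic, survey)
print(merged)
```

Values that match no known pattern become "U" or None rather than raising an error, which leaves the decision about unmatched data to a later quality control step.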
After gathering and shaping the information, we need statistical analysis to
produce reports. These reports are customized and may be complex.
Tables, frequency counts, and cross-tabulation tables may be produced to
create a variety of charts and plots. The computation of a variety of
descriptive statistics, including standard deviation, correlations and other
measures of association, as well as linear regression analysis, multi-way
cross-tabulations and inferential statistics, may also be necessary.
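Frequency counts and a two-way cross-tabulation can be sketched in a few lines. This is a Python illustration under stated assumptions, not SAS output: the sample reports and the field names "sex" and "reaction" are invented.

```python
from collections import Counter

# Invented sample reports for illustration.
reports = [
    {"sex": "F", "reaction": "Nausea"},
    {"sex": "F", "reaction": "Headache"},
    {"sex": "M", "reaction": "Nausea"},
    {"sex": "F", "reaction": "Nausea"},
    {"sex": "M", "reaction": "Rash"},
]

def frequency(rows, field):
    """Count how often each value of one field occurs."""
    return Counter(row[field] for row in rows)

def crosstab(rows, row_field, col_field):
    """Count each (row value, column value) combination."""
    return Counter((row[row_field], row[col_field]) for row in rows)

def format_crosstab(table):
    """Lay the counts out as a plain-text table with sorted labels."""
    row_labels = sorted({r for r, _ in table})
    col_labels = sorted({c for _, c in table})
    lines = ["\t".join([""] + col_labels)]
    for r in row_labels:
        counts = [str(table[(r, c)]) for c in col_labels]
        lines.append("\t".join([r] + counts))
    return "\n".join(lines)

by_reaction = frequency(reports, "reaction")
by_sex_reaction = crosstab(reports, "sex", "reaction")
print(by_reaction.most_common())
print(format_crosstab(by_sex_reaction))
```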
These results should be deliverable to a wide variety of
locations and platforms in order to suit client needs. Results may need
to be presented in many formats, such as markup languages
including HTML4 and XML, files formatted for a high-resolution printer such
as PostScript, PDF, or PCL, RTF, or even color graphs that can be made
interactive using ActiveX controls or Java applets.
System architecture modeling
[Figure: system architecture model. Labels in the diagram: Reporting data by
investigators; Clinical Trials; Hospital Labs; Data Dictionary (MedDRA/PubMed);
Clinical Studies; Modification Archive; Data (Oracle); Verifications;
Post-marketing Data Warehouse; Adverse Event Reporting; Individual clinical
trials; Data Analyses (SAS); User]
Information must be gathered by drug providers. These data come from clinical
studies by the FDA and other professional investigators. Other information
comes from the medical records of patients who were treated with the specific
drug. Usually, drug providers study their product before moving on to the
evaluation step. The first step is collecting data to generate reports such as
the country of origin of patients receiving the drug, worldwide patient
exposure, demographic characteristics, the most commonly reported body-system
reactions (ordered by gender and/or age of patients), and a summary of deaths
or other critical body reactions. Another resource is the company’s product
surveys completed by patients or clients who are volunteers in the U.S. or
other countries. These surveys include data matching the MedWatch forms that
the U.S. Department of Health and Human Services accepts for voluntary
reporting of adverse events and product problems. These manufacturing companies
may also be able to receive FDA reports generated on the basis of MedWatch
reports about the product. Furthermore, many of the surveys are answered by
physicians who have an EMR system and are able to answer
detailed questions regarding medical conditions and other related medical
issues.
Any tool that is recommended here should be consistent with FDA standards
and the objectives that follow.
In any Adverse Event Reporting System, the basic calculations and data
analysis are statistical operations on data sets that are frequently ordered
by one or more variables and that come from a variety of data sources.
An Adverse Event Reporting System should therefore be able to work on any
platform. For example, if it uses E2B data element structures, it should be
capable of any interactive query or data flow transaction on
shared data. SAS runs on the major computer platforms and operating
systems. It supports data sharing concepts. It supports
submission through the Web or any other network that includes Oracle, UNIX,
NT servers or mainframe machines. This means that, regardless of the
backbone, SAS can support it.
Data sources may need to be summarized or checked before being reported.
Scripting and programming are among the major necessities in
development. SAS has a powerful scripting language that can perform the
required summarizing, verification and validation.
In the pharmaceutical field and bioinformatics, SAS software is generally
thought of as a tool for statistical analysis programming, but its many other
features are a largely untapped resource. Its screen-building and
object-oriented development abilities are needed to keep up with the latest
advances in information technology.
SAS is a stand-alone system produced by SAS Institute Inc. and sold on the
open market. It meets or exceeds all technical objectives specified here.
The FDA has proposed MedDRA as a standardized dictionary of medical
terminology. MedDRA is used internationally in the regulation of
medical products. MedDRA provides terms for symptoms, signs, diseases, and
diagnoses. It also includes other information such as:
Names of investigations (e.g. liver function analyses, metabolism tests)
Sites (e.g. application site reactions, implant site reactions and injection
site reactions)
Therapeutic indications
Surgical and medical procedures
Social and family history terms
SAS and MedDRA are FDA standards. Both are designed to a high standard, and
their makers continue to look for weaknesses and improve
their products. Their documentation and user interfaces are user friendly. SAS
and MedDRA are generic software products, and any specific needs such as
security of data or reliability of operations can be negotiated in a service
level agreement.
EMR Database: These data come from hospital laboratories and clinical data
entry systems. They are documented before and after verification. All
documentation is electronic and all reports are submitted electronically.
MedDRA encoding is part of clinical data entry. All data entries are based on
standards approved by the FDA.
Terminologies:
Computerized Drug Safety Evaluation requires the following informatics
terminology:
Data classifications
Control Code
Formatting
Quality Control
Data Mining
Gathering information
Accessing and manipulating data
Scripting
Each of these terms carries a process or methodology that is
discussed in the following sections.
Data Classification:
Any structured analysis of information needs classification. Data
classification is the first and best-known task in data flow modeling. The data
model of a Drug Adverse Event Reporting System is derived from conceptual
information such as entities and their interrelationships. A classification
mechanism serves as a store of all drug information and can link the analysis,
design, implementation and evolution steps applied in most medical
applications. This classification should be consistent and free of clashes. It
is integrated into all parts that require maintainability.
The outcome attributed to an adverse event is the most important information
that needs to be classified. The data classification for this attribute should
be a standard classification that matches the FDA reporting program.
The FDA uses MedDRA as part of the proposed rule for post-marketing
reporting. MedDRA is the abbreviation for Medical Dictionary for Regulatory
Activities. It is an international terminology designed to support the
classification, retrieval, presentation, and communication of medical
information throughout the medical product regulatory cycle. Originally, MedDRA
was written in English and distributed in ASCII file format, but it is now
available in several other languages such as Dutch, French, German, Italian,
Portuguese, Spanish, and Japanese. This on-line dictionary is intended to
become the global medical terminology standard for use by every
bio-pharmaceutical company in the world. It has the best-known classification,
with an integrated updating platform that can be used by all standard systems.
In the majority of homegrown medical applications, the patient medical record
systems use this classification, and it is valid for all phases of drug
development and for all subscribing pharmaceutical companies.
MedDRA works as a catalog of medical disorders. It has a hierarchical data
structure with five levels of terms. Developing queries or retrieving
information about medical diagnoses requires hierarchical searching on these
terms, and other queries might select records by grouping them at one of these
levels.
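Hierarchical searching and grouping can be sketched as a walk from a term up through its parents. The Python example below is only an illustration: all codes and terms are invented rather than taken from MedDRA, the level names follow the five MedDRA levels (SOC, HLGT, HLT, PT, LLT), and each term is assumed to have a single parent, which is a simplification.

```python
# Each entry maps a term code to (level, term text, parent code).
# All codes and terms below are invented for illustration.
TERMS = {
    100: ("SOC", "Investigations", None),
    200: ("HLGT", "Cardiac and vascular investigations", 100),
    300: ("HLT", "Heart rate and pulse investigations", 200),
    301: ("HLT", "Vascular tests", 200),
    400: ("PT", "Heart rate increased", 300),
    401: ("PT", "Blood pressure increased", 301),
    500: ("LLT", "Pulse rate fast", 400),
}

def path_to_root(code):
    """Return the (level, text) pairs from a term up to its SOC."""
    path = []
    while code is not None:
        level, text, parent = TERMS[code]
        path.append((level, text))
        code = parent
    return path

def ancestor_at(code, level):
    """Find the term text at the requested level above (or at) a code."""
    for found_level, text in path_to_root(code):
        if found_level == level:
            return text
    return None

def group_by_level(codes, level):
    """Group term codes by their ancestor at one hierarchy level."""
    groups = {}
    for code in codes:
        groups.setdefault(ancestor_at(code, level), []).append(code)
    return groups

print(path_to_root(500))
print(group_by_level([400, 401, 500], "HLT"))
```

The same grouping function serves every level, so a report can be rolled up to HLT, HLGT, or SOC by changing one argument.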
[Figure: SOC view of Cardiac and Vascular investigations (excl enzyme tests)]
Each MedDRA term has a unique code that can be used as a search key.
A query makes a link between collected data and terms in MedDRA. A
query can create a selection on a description of medical data. To make
this selection, the user enters the term to be sought into the
'Search for Value' field. The query then selects one of the records
returned and identifies information about patients. After that, the codes in
the database are ready for any statistical evaluation.
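Linking collected descriptions to dictionary codes can be sketched as a lookup. In this Python illustration the dictionary entries, codes, and record fields are hypothetical and do not reproduce MedDRA content, and matching is a simple case-insensitive exact match rather than the search dialog described above.

```python
# Hypothetical lowest-level terms and codes; not real MedDRA content.
LLT_CODES = {"nausea": 9001, "feeling queasy": 9002, "headache": 9101}

def code_records(records, dictionary):
    """Attach a term code to each record whose description matches a term."""
    coded, unmatched = [], []
    for record in records:
        key = record["description"].strip().lower()
        code = dictionary.get(key)
        if code is None:
            unmatched.append(record)  # left for manual review
        else:
            coded.append({**record, "code": code})
    return coded, unmatched

records = [
    {"patient_id": "P001", "description": "Nausea"},
    {"patient_id": "P002", "description": " headache "},
    {"patient_id": "P003", "description": "dizzy spells"},
]
coded, unmatched = code_records(records, LLT_CODES)
print(coded)
print(unmatched)
```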
The other advantages of using MedDRA are:
MedDRA is on-line (it does not require installation or periodic updates on the
client system). The application has a standardized interface, is well
supported, and requires little effort to interface with any client computing
environment. A good designer can take full advantage of this classified
information by using it as a shared data set. Updating this shared information
keeps all the related outcomes that reference this data set up to date.
Informatics functions such as encoding are already included in MedDRA
for its own data sets.
MedDRA maintains high standards and can be updated with queries or by
importing data; however, updates require quality control because an incorrect
update can disrupt everything that depends on it.
The current MedDRA version has MediMiner for managing and analyzing
the coded data, including data mining. This tool allows analysis of
the impact of recoding data sets from one MedDRA version to another,
whether MedDRA is used as a standalone product or as an integral
component of a range of coding tools. The MedDRA classification can be
browsed as a tree that can be collapsed and viewed at every level of detail for
all occurrences in every possible search category such as legend, terms and
coding.
Control Code:
SAS and MedDRA both have code control utilities that do the following:
Support debugging and maintenance in any branch of code by producing a
cross-reference listing that shows all the program names that have been
declared and used.
Analyze code to discover uninitialized variables, unreachable code, uncalled
functions and procedures, and the number of times each statement is
executed.
MedDRA has MediMiner as its version control utility. During any update in
MedDRA 3.1, MediMiner controls all changes by analyzing the coding
sets. In MedDRA 4.1, it also assesses the impact of recoding data by
identifying all codes that remain unchanged and those that may require
recoding. It is also possible to identify the codes that no longer exist,
those that have been changed in some way, and those that have a related change
or where a multiaxial change (inherited from multiple paths of the original
codes) has had an impact. Primary and secondary changes are identified, as are
changes in the current status of the code.
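The kind of version impact analysis described here can be sketched as a comparison of two dictionary versions against the codes actually used in a data set. This Python example is not how MediMiner works internally, which the text does not describe; the version contents and codes are invented.

```python
def compare_versions(old_terms, new_terms, used_codes):
    """Classify codes used in a data set by what happened between versions."""
    result = {"unchanged": [], "changed": [], "removed": []}
    for code in sorted(used_codes):
        if code not in new_terms:
            result["removed"].append(code)      # code no longer exists
        elif old_terms.get(code) != new_terms[code]:
            result["changed"].append(code)      # may require recoding
        else:
            result["unchanged"].append(code)
    return result

# Invented dictionary versions: code -> term text.
version_old = {9001: "Nausea", 9002: "Feeling queasy", 9101: "Headache"}
version_new = {9001: "Nausea", 9101: "Headache NOS"}

impact = compare_versions(version_old, version_new, used_codes={9001, 9002, 9101})
print(impact)
```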
SAS software includes the Source Control Manager (SCM) utility as one of the
options in the Desktop selection of the Solutions menu:
SAS -> Solutions -> Desktop -> Development and Programming -> Source Control
Manager
SCM includes a friendly GUI with SAS file check-in/check-out capabilities.
This GUI lists all libraries, data sets, catalogs, and catalog entries in
hierarchical order. SCM has flexible testing, revision control, and version
labeling with an easy application distribution utility. With a version label,
it is easy to create a copy of an application and place it in other locations
on the network. The SAS/CONNECT utility can also place the application on
remote machines.
Formatting:
Usability of information is one of the most important components of any
application implementation. Usability requires readability, and the readability
of any data set is facilitated by standardized formatting. Each line contains
many separate pieces of information, which are data values, and the formats
determine how these values are displayed or used in calculations. These formats
set the width of displayed values, the number of decimal places displayed, the
handling of blanks, zeroes, and commas, as well as other details.
SAS supports its own standard formats and user-defined formats. Standard
formats might be used for numeric, character or picture data. Users can also
write or define custom formats in DATA and PROC steps. User-defined formats
are reusable and can be saved in format catalogs. If saved in a permanent SAS
catalog, they remain there permanently. If saved in the catalog WORK.FORMATS,
they are temporary and retrievable only in the same SAS session or job in
which they were created. Because catalogs are a type of SAS file that resides
in a SAS data library, they act as a handling facility at execution time and
intercept run-time errors caused by undefined formats. In this way, type
checking is supported and improves the readability of information. If the SAS
system option NOFMTERR is in effect, SAS uses its own default formatting when
an undefined format is called, so that in some cases these errors can be
ignored and execution continues.
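The behaviour of a format catalog, including the choice between stopping on an undefined format and falling back to a default display, can be imitated in a small Python sketch. The class, its method names, and the format names are hypothetical and only mirror the idea described above; they are not SAS syntax.

```python
class FormatCatalog:
    """A reusable store of user-defined value-to-label formats."""

    def __init__(self, strict=True):
        self.formats = {}
        self.strict = strict  # strict=False imitates a no-format-error option

    def define(self, name, mapping, other=None):
        """Register a format; 'other' labels values the mapping lacks."""
        self.formats[name] = (dict(mapping), other)

    def apply(self, name, value):
        """Return the formatted label for a value."""
        if name not in self.formats:
            if self.strict:
                raise KeyError(f"format {name!r} is not defined")
            return str(value)  # fall back to a default display
        mapping, other = self.formats[name]
        return mapping.get(value, other if other is not None else str(value))

catalog = FormatCatalog(strict=False)
catalog.define("sexfmt", {"M": "Male", "F": "Female"}, other="Unknown")
labels = [catalog.apply("sexfmt", v) for v in ("M", "F", "X")]
fallback = catalog.apply("agefmt", 54)  # undefined format, default display
print(labels, fallback)
```

With `strict=True` the same call on an undefined format raises an error, which corresponds to stopping the job instead of continuing with default formatting.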
Quality Control:
Delivering the correct result requires quality control. SAS recognizes common
errors such as syntax, execution-time, data and semantic errors; however, users
should also check for common mistakes such as the following:
Check for syntax errors:
o every statement ends with a semicolon
o quotation marks are opened and closed
o keywords are spelled correctly
o every DO and SELECT statement is followed by an END statement
Check for execution errors:
o illegal mathematical operations
o observations out of order for BY-group processing
o an incorrect reference in an INFILE statement, such as a misspelled or
otherwise incorrectly stated external file name
Check for semantic errors:
o A program may run, yet give an incorrect result. These errors are often
detectable by checking self-consistency and should always be reported,
certainly in the debugging stage, and often during production runs.
SAS usually executes the statements in a DATA step one by one, in the
order they appear. After executing the DATA step, SAS moves to the next
step and continues in the same fashion. Make certain that all SAS
statements appear in the right order so that SAS can execute them properly.
Check input statements and data:
o SAS can detect data errors during execution, but this does not terminate
processing. After execution, it prints a note describing the error. In that
note SAS lists the related values that are stored in the input buffer and the
program data vector.
o The values read by INPUT statements must be checked against the actual
variable values.
o Any corresponding arrangement such as formats, lists and columns for input
statements must be checked too.
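The handling of data errors described above, where a bad value produces a note but does not stop processing, can be sketched as follows. This Python example only imitates that behaviour; the input layout (patient identifier, comma, age) and the wording of the note are assumptions.

```python
def read_ages(lines):
    """Parse 'patient_id, age' lines, noting bad values without stopping."""
    records, notes = [], []
    for line_no, line in enumerate(lines, start=1):
        patient_id, _, raw_age = line.partition(",")
        try:
            age = int(raw_age)
        except ValueError:
            age = None  # keep the record, mark the value as missing
            notes.append(f"NOTE: invalid data for age in line {line_no}: {line!r}")
        records.append({"patient_id": patient_id.strip(), "age": age})
    return records, notes

lines = ["P001, 54", "P002, unknown", "P003,37"]
records, notes = read_ages(lines)
print(records)
for note in notes:
    print(note)
```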
Data mining:
Data mining is a class of database applications that look for hidden patterns
in a group of data. Statistical analysis is the data analysis method that
matches the nature of data mining. It might uncover the hidden
patterns in a large volume of information coming from Adverse Event
Reports or survey systems. A data mining process might identify combinations
of variables that occur together more often than expected. By applying
statistical options, an optimal guess can
be made about the best-matching behavior that has occurred frequently.
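One simple way to find combinations that occur more often than expected is to compare the observed count of each drug and event pair with the count expected if drug and event were independent. The Python sketch below uses this observed-to-expected ratio as an illustrative technique chosen for this example; the text does not prescribe a specific measure, and the sample reports are invented.

```python
from collections import Counter

def observed_vs_expected(reports):
    """reports: iterable of (drug, event) pairs, one pair per report."""
    reports = list(reports)
    total = len(reports)
    pair_counts = Counter(reports)
    drug_counts = Counter(drug for drug, _ in reports)
    event_counts = Counter(event for _, event in reports)
    rows = []
    for (drug, event), observed in pair_counts.items():
        # Expected count if drug and event were statistically independent.
        expected = drug_counts[drug] * event_counts[event] / total
        rows.append((drug, event, observed, expected, observed / expected))
    return sorted(rows, key=lambda row: row[4], reverse=True)

# Invented sample: drug A is mostly reported with rash, drug B with nausea.
sample = (
    [("A", "rash")] * 3 + [("A", "nausea")]
    + [("B", "nausea")] * 3 + [("B", "rash")]
)
ranking = observed_vs_expected(sample)
for row in ranking:
    print(row)
```

A ratio well above 1 marks a pair that deserves closer review; with small counts the ratio is unstable, so it should be read as a screening aid rather than evidence of causation.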
Data mining is a critical aspect of these reporting systems. Occasionally,
prediction may be even more important than detection in drug safety
evaluation. In the United States, patients can file lawsuits against drug
providers for severe adverse reactions. These legal actions often make American
drug companies fearful of introducing drugs into the U.S. market. However, data
mining on data from other parts of the world offers a way to move the drug
safety process from a reactive to a proactive posture, that is, from detection
to prediction. In effect, it helps drug providers adopt a safer marketing
strategy rather than take risks.
If MedDRA System Organ Class terms are adopted as a class of events, then one
can select related data from patient records for that event, discover
statistical rules or patterns automatically from the data, and later create a
hypothesis and run tests on the patient record database to verify or refute it.
In this way data mining, using data from other countries and from clinical
studies, can help protect drug providers against lawsuits.
SAS guides data analysis in an instructional way, so that even people with no
statistical knowledge are able to run the required processes on selected data
sources (the basic options include counts of missing and non-missing values,
minimum, maximum, range, sum, mean, variance, standard deviation, standard
error of the mean, coefficient of variation, skewness, and kurtosis). In
addition, access to data sources can be secured to prevent unauthorized access.
SAS also allows different reports and presentations of results to be created
(including tabular reports and frequency reports with graphical presentations
to visualize the results).
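Most of the basic descriptive measures listed above can be computed with a short function. The Python sketch below is an illustration rather than SAS output: it uses the sample (n - 1) definitions of variance and standard deviation, represents missing values as None, and omits skewness and kurtosis for brevity. The sample values are invented.

```python
import math
import statistics

def describe(values):
    """Basic descriptive statistics; None marks a missing value."""
    present = [v for v in values if v is not None]
    n = len(present)
    mean = statistics.mean(present)
    std = statistics.stdev(present)  # sample standard deviation (n - 1)
    return {
        "n": n,
        "n_missing": len(values) - n,
        "min": min(present),
        "max": max(present),
        "range": max(present) - min(present),
        "sum": sum(present),
        "mean": mean,
        "variance": statistics.variance(present),
        "std_dev": std,
        "std_error": std / math.sqrt(n),
        "coef_variation": 100 * std / mean,  # expressed as a percentage
    }

ages = [2, 4, 4, 4, 5, 5, 7, 9, None]
summary = describe(ages)
for name, value in summary.items():
    print(name, value)
```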
SAS supports data mining with a large number of statistical procedures
(regression, association discovery, time series, and time series
cross-sectional (panel) data analysis). Data are usually analyzed by regression
(one observation for each patient). Sometimes it is necessary to correlate them
with cross-sectional data such as geographic region, gender, smoking, alcohol
use, and so on.
Gathering information and documenting system specifications:
The available information (such as the toxicological and pharmacokinetic
profiles of the individual drug, the treatment indication or indications, the
intended populations, etc.) might have been defined in relational databases.
The backbone of this system might be SQL, Access or even Excel, but the
database query facility may not be suited to detailed statistical analysis of
the data at this stage. This is where SAS helps with statistical analysis. SAS
has been interfaced with databases to allow large volumes of data to be
retrieved efficiently for analysis. Any engine can be assigned to a SAS
library. The library is the place that holds all access to the stored files.
These files might come from a variety of engines such as ODBC, SPSS, SYBASE,
REMOTE, META, MYSQL, ACCESS, ORACLE, DB2, etc. For the processing of data, it
is necessary to define all connections that might be created between the
different sets of data records. The first link can express the correspondence
between the MedDRA classifications and the patient records. When concentrating
on the relevance of available data, the medical information of patients works
in tandem with the MedDRA classifications to build queries and analyze
information.
As part of the application development process, the following information
must be specified:
1. Source data: Miscellaneous data sources may exist, and in order to get
correct results, the prescription drug information provided by drug firms
should be truthful, balanced, and accurately communicated. The same
applies to data coming from clinical and post-marketing trials, or spontaneous
reports (submitted individually by doctors or patients). Dynamic data are
operational data from internal systems such as the homegrown applications
of clinics or hospitals, manual data coming from paper-chart patient
histories, EMR (Electronic Medical Records), and Adverse Event Reporting
(MedWatch).
2. Data Staging: This area includes the storage and processing of data
extracted from the internal and external systems prior to loading into a SAS
data bank. The following is a list of cases.
• Information may be located in multiple SQL tables on a local computer or
on external servers. If required, one may make a connection to the
database server and use the data dynamically. For example, the Adverse
Events Database includes side effects that are serious (such as
death or risk of dying, hospitalization, disability, congenital anomaly, or
required intervention to prevent permanent impairment or damage).
These data are required for generating some particular reports.
• Part of the information is held in Aventis Reports or ClinTrace. Data from
these two areas might be combined to complete an assignment: an
executable program makes a connection to the backbone
databases of these two licensed vendor applications and uses the data.
Note: Basic knowledge of these databases helps programmers
create standard code. For example, an Aventis or ClinTrace Case ID
(Manufacturer Control #) is assigned on an “Episode” basis for each
patient. Adverse Events (reported side effects) are temporally linked to
the same episode and are entered under the same Case ID. For drugs that are
given intermittently, additional episodes (Case IDs) are created for events
that occur after different treatment cycles.
• Side effects are stored in the company’s Core Safety Data Sheets. These
sheets are for global labeling of reports and are based on the diagnoses,
which are in turn assessed by seriousness. All diagnoses reported from
intensified monitoring (such as a clinical trial or post-marketing
surveillance study) are assessed as associated or not associated with
the study medications. These data may be joined to MedDRA
information to build a larger directory that is used in SQL scripts.
• Drug providers use certain information, such as whether a side effect
results from an internal or natural body process, in a causality algorithm for
internal clinical interpretation or signal evaluation purposes. In some
particular cases, this algorithm must be applied as part of the
script logic in the SAS code. If a company has a computerized analysis
application, then depending on its software, it may be possible to open a
connection and use this application inside the SAS script code.
• Data mining by diagnosis requires MedDRA information. It
is recommended to use SAS scripting to create a remote connection,
read the MedDRA ASCII files, and import the data into temporary
tables. These tables are deleted at the end of the scripting process.
Note: All transactions such as queries, statistical analyses or visualizations
coming from the sources should be consistent. Sometimes the data are not
complete enough to be consistent. To solve this problem, all “no match” data
need appropriate transformation or conversion from their original form to
the MedDRA representation.
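The staging case that imports MedDRA ASCII data into temporary tables and deletes them at the end of the script can be sketched as follows. The example is in Python with SQLite rather than SAS, and several details are assumptions: the '$' field delimiter, the three-field layout (code, term, parent code), the table name, and the sample lines, which are invented and not MedDRA content.

```python
import sqlite3

# Invented sample lines in an assumed '$'-delimited layout:
# code$term$parent code$
SAMPLE_ASC = "9001$Nausea$8001$\n9002$Feeling queasy$8001$\n"

def load_terms(conn, text):
    """Load delimited term lines into a temporary table; return row count."""
    conn.execute("CREATE TEMP TABLE llt (code INTEGER, term TEXT, pt_code INTEGER)")
    rows = []
    for line in text.splitlines():
        if line.strip():
            code, term, pt_code = line.split("$")[:3]
            rows.append((int(code), term, int(pt_code)))
    conn.executemany("INSERT INTO llt VALUES (?, ?, ?)", rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
loaded = load_terms(conn, SAMPLE_ASC)

# Use the staged terms while the script runs.
matches = conn.execute(
    "SELECT code FROM llt WHERE pt_code = ? ORDER BY code", (8001,)
).fetchall()
print(loaded, matches)

# Remove the temporary table at the end of the scripting process.
conn.execute("DROP TABLE llt")
```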
3. Metadata: A term for data that describe or specify other data. Metadata
define all of the characteristics of data required to build databases and
applications, and to support knowledge workers and information producers. This
includes data element name, meaning, format, domain values, business integrity
rules, relationships, owner, etc.
For example, the following classification shows the data concepts in
MedDRA:
1. SOC
   MedDRA Code        Numeric
   MedDRA Term        String
2. HLGT
   MedDRA Code        Numeric
   MedDRA Term        String
3. PT
   MedDRA Code        Numeric
   MedDRA Term        String
   COSTART Symbol     AlphaNumeric
   WHO_ART Code       Numeric
   ICDS Code          Numeric
   PT ICD-10 Code     Numeric
   HARTS Code         Numeric
   ICDS_CM Code       Numeric
   JART Code          Numeric
   * SOC Code         Numeric
   * SOC Name         Numeric
4. LLT – Lowest Level Term
   MedDRA Code        Numeric
   MedDRA Term        String
   WHO_ART Code       Numeric
   COSTART Symbol     AlphaNumeric
   ICDS_CM Code       Numeric
   CURRENCY           Character/Boolean
   HARTS Code         Numeric
   ICDS Code          Numeric
   JART Code          Numeric
* Multi-valued attribute
Defining metadata for the adverse event reporting data is also required. These
data are:
o Patient identifier and patient information: age at time of event or
date of birth, sex, weight, etc.
o Outcomes attributed to adverse events, such as death,
life-threatening occurrence, hospitalization (initial or prolonged),
disability, congenital anomaly, required intervention to prevent
impairment/damage, or other.
o Date of event and date of report in mo/day/yr format.
o Description of problem.
o Relevant tests/laboratory data including dates.
o Other relevant history including preexisting medical conditions (e.g.
allergies, race, pregnancy, smoking or alcohol use, hepatic/renal
dysfunction, etc.)
Most medical clinics still use Paper Medical Records (PMRs), but many
others have begun to use Electronic Medical Records (EMRs). No standard form
has yet been defined for EMRs, but all provide the same information, which requires
metadata definitions. These data are:
o Patient's primary reason for the medical visit
o History of onset of clinical signs and symptoms
o Current list of medications the patient is using
o Relevant past medical history, including hospital admissions,
surgeries, and diagnoses
o Family history of disease, such as diabetes, cancer, heart disease,
and other medical illness
o Social history: use of drugs, smoking, job stability, housing,
living conditions, and incarceration
o Review of systems: the patient's recollection of symptoms and current
medical problems, such as trouble sleeping at night, panic
episodes, and results of tests
o Physical examination: the clinician’s hands-on examination of the
patient, including head, eyes, ears, nose, throat, chest, and
extremities
o Labs, including blood glucose, cholesterol, and drug levels
o Studies such as X-ray, MRI, CT, and EKG
o Progress notes, such as a record of the temporal progression of signs
and symptoms, labs, and studies for the length of the study or
admission
4. The entity-relationship model
The specification of required information for an adverse event serves as a
starting point for constructing a conceptual schema (the overall design of the
database) for the suggested database. The entity sets targeted here are the drug
and patient entity sets, together with their attributes. These entity sets have a
relationship that has attributes of its own, and this relationship is “many to
many”. Other relationships might be designed between an entity set and its
subordinate entity sets. The relationships between the drug entity set and the
ingredient or side effect entity sets are examples; these are “many to one”
relationships. Designing the relationships this way helps save storage. In other
cases, such as the patient-drug relationship, at most two entity sets participate,
so the relationship can be kept in one general relationship set.
In the following diagram, small rectangles show the entity set; large rectangles
specify attributes; diamonds represent relationship sets; lines link attributes to
entity sets and entity sets to relationship sets; arrows indicate that an entity falls
exclusively into another entity; double lines indicate many relationship sets; bold
diamonds show “many to one” relationship sets, and rectangles with non-indexed
information indicate information about a relationship set.
[Figure: E_R diagram of the suggested database. The labels recovered from the diagram are listed below.]
Entity set Patient: 1. ID; 2. First Name; 3. Middle Name; 4. Last Name; 5. Date Of Birth; 6. Sex; 7. Weight; 8. race; 9. country
Patient relevant information: 10. allergies; 11. smoking; 12. alcohol; 13. pregnancies; 14. dysfunction; 15. Lab results information
Entity set Drug: 1. id; 2. generic name; 3. trade name; 4. the dosage range; 5. metric unit; 6. category; 7. the form of
Relationship set Patient-Drugs: ID; Reason; date_of_event; date_of_report; therapy_start_date; therapy_end_date; diagnose information; Lot_number; Exp_Date; NDC Num; adverse_desc; route and dosage
Relationship set Drug-ingredient: 1. ID; 2. Name; 3. Value; 4. Unit
Adverse reactions and side effects: 1. MedDRACode
The E_R model above is a sample of what can be considered; the attributes
can be designed in more detail. For example, ‘route and dosage’
could be designed as a separate entity, because it includes many optional
attributes that may either be concatenated into a single descriptive text field
or saved separately in a data source. This E_R model gives
substantial flexibility in designing the basic database schema.
Accessing and Manipulating Data:
The first step in accessing and manipulating data is the DATA step. The DATA
step is used for accessing and reading data and for programming the data
processing. As explained before, one of the strengths of SAS is fast and easy
access to many different sources. In addition to its programming components,
SAS has many other features in the DATA step that help to develop a standard
application. The SAS language has all the statements required for typical data
processing: reading raw data files and SAS data sets and writing the results;
subsetting data; combining multiple SAS files; creating SAS variables; recoding
data values; and creating listing and summary reports, including advanced
analysis features such as web analytical solutions.
Special focus should be placed on managing SAS data set input and output,
working with different data types, and manipulating data. It may also be
necessary to combine and summarize data sets and then process them iteratively
to perform data manipulations and transformations.
The data must first be accessed. Sometimes the required data file is stored on
another server. If an FTP server is running there, SAS can make an FTP
connection and use the external data source remotely, without leaving a copy of
the downloaded data on the local machine unless SAS writes it out. As an
example, assume the data belong to the user cps-user and are located at
~/halley/thesis/main.data.
filename fromrcr
ftp 'main.data'
cd='halley/thesis'
user='cps-user'
host='cps.brockport.edu'
recfm=v
prompt;
Much of the data may arrive as raw data, which must be read into a SAS data
set. As an example, a client might send a letter or a text file that includes
part of the patient’s information. The following script shows how to read these
data into a SAS data set.
data PatientInfo;
infile 'c:\thesis\data1.txt';
input PatientId $ 1-13 age 14-17 sex $ 18-23 weight 24-30
+2 country;
run;
proc print data=PatientInfo;
run;
The SAS System 05:25 Thursday, December 15, 2005 5
PatientId      age   sex   weight   country
Hzan0616341     30   1        200   11
Amir5666892     40   2        180   12
J675bhgfdql     56   2          .   45
Nmjhg567908     12   1        100   23
Iu6-567-567     99   1        170   01
*** A missing value for a numeric variable is represented by a period (.), as in the weight of the third record.
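For readers without a SAS installation, the column-input step can be approximated in Python. This is only a sketch under assumptions: the function names, the sample records, and the use of `None` for the SAS missing value (.) are illustrative and not part of the thesis; the column ranges follow the INPUT statement (PatientId 1-13, age 14-17, sex 18-23, weight 24-30, a two-column skip, then country).

```python
from typing import Optional


def to_number(field: str) -> Optional[float]:
    """Return the numeric value of a field, or None if it is blank or '.'."""
    text = field.strip()
    return None if text in ("", ".") else float(text)


def parse_patient_line(line: str) -> dict:
    """Slice one fixed-width record using the column ranges of the INPUT statement."""
    return {
        "PatientId": line[0:13].strip(),   # columns 1-13
        "age": to_number(line[13:17]),     # columns 14-17
        "sex": line[17:23].strip(),        # columns 18-23
        "weight": to_number(line[23:30]),  # columns 24-30
        "country": line[32:].strip(),      # after the +2 column skip
    }


def make_line(pid: str, age: str, sex: str, weight: str, country: str) -> str:
    """Build a hypothetical record laid out in the same columns (demo only)."""
    return f"{pid:<13}{age:>4}{sex:<6}{weight:>7}  {country}"


records = [
    parse_patient_line(make_line("Hzan0616341", "30", "1", "200", "11")),
    parse_patient_line(make_line("J675bhgfdql", "56", "2", ".", "45")),
]
for rec in records:
    print(rec)
```

Keeping the missing weight as `None` rather than 0 mirrors the SAS behaviour, so later averages or counts do not silently treat it as a real measurement.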
Processing Examples:
• To use external files, SAS must be told where to find them. There are
the following choices:
1- Identify the file directly in the INFILE, FILE, or other SAS statement that
uses the file.
2- Set up a fileref for the file by using the FILENAME statement, and then
use the fileref in the INFILE, FILE, or other SAS statement.
3- Use operating environment commands to set up a fileref, and then use the
fileref in the INFILE, FILE, or other SAS statement.
Note: To use several files or members from the same directory, partitioned data
set (PDS), or MACLIB, use the FILENAME statement to create a fileref that
identifies the directory, PDS, or MACLIB. Then use the fileref in the INFILE
statement and enclose the name of the file, PDS member, or MACLIB member in
parentheses immediately after the fileref, as shown in the example below:
/* filename data 'directory-or-PDS-or-MACLIB'; */
/* data1.txt and data2.txt are located in directory c:\thesis */
filename data 'c:\thesis';
data paitientdata1;
infile data('data1.txt');
input PatientId $11. +2 age 2. +1 weight 3. +2 sex 1. +2
date1 mmddyy10. +2 date2 mmddyy10. +2 country 12.;
run;
data paitientdata2;
infile data('data2.txt');
input PatientId $11. +2 age 2. +1 weight 3. +2 sex 1. +2
date1 mmddyy10. +2 date2 mmddyy10. +2 country 12.;
run;
• Also, from its File menu, ADX can import data from a SAS data set, an
Access database, an Excel spreadsheet, a dBase database, a delimited
text file, and files in other common formats. This is helpful when
information has been saved in a variety of formats.
• In SAS one can gain access to data sources by defining a ‘libref’ and
assigning access to them without copying them into the SAS
environment. The ‘libref’ acts as a shortcut to the metadata on the SAS
Metadata Server. Any metadata in the SAS Metadata Server can be read
by the META engine, which has options for controlling its output.
Under one setting, the engine creates only the metadata in the repository
and does not affect the data sources; if the table does not exist in the
data source, the engine creates the metadata from the information that the
application specifies for the output table. Depending on the option chosen,
deleting a table either deletes the metadata from the repository without
deleting the table from the data source, or deletes the table from the
data source without deleting the metadata from the repository.
A SAS library includes metadata objects that are identified through the
‘libref’. These objects define the engines that are used to process the
data. The library is addressed by a URI (Uniform Resource Identifier). To
get access to a SAS Metadata Server, define the host address. If working
in a TCP network, define the port number. If the protocol is BRIDGE
rather than COM, define a user ID and password; otherwise it will not be
possible to log into the SAS Metadata Server. In addition, any metadata
repository may be selected by its repository ID or name.
To access these tables, one can use SAS/Warehouse Administrator as a
tool. To locate the metadata, it identifies and searches for the
objects by their name, URI, and other identifiers such as their ID. The
following script displays this process.
libname upcase meta liburi="SASLibrary?@name='oralib'"
ipaddr="d6292.us.GCS.com";
Scripting:
The goal of SQL scripting is to derive the available data from any possible data
source. Most vendor applications have an SQL backbone, so SQL scripting makes it
possible to query original or manipulated data: retrieving data from
multiple tables; creating views, indexes, and tables; updating or deleting
values in existing tables and views; and summarizing them. SQL scripting
can be done in either the SAS or the SQL environment.
In the following example, the earlier E_R schema is reduced to tables
from inside the SQL environment:
/*------------------------------------------------------------------------------------------*/
/* create a higher-level entity set for drug information */
CREATE TABLE drug(
id CHAR(12) NOT NULL,
generic_name CHAR(25),
trade_name CHAR(25),
dosage INT,
unit INT,
category INT,
FOREIGN KEY (category) REFERENCES drug_category(category_id)
ON DELETE CASCADE,
FOREIGN KEY (unit) REFERENCES unit(unit_id)
ON DELETE CASCADE,
PRIMARY KEY (id)
) ENGINE=INNODB;
/* create the lower level entity sets for drug information */
CREATE TABLE ingredient (
id INT,
drug_id CHAR(12),
ingredient_name CHAR(25),
ingredient_value INT,
unit INT,
INDEX drug_ind (drug_id),
FOREIGN KEY (drug_id) REFERENCES drug(id)
ON DELETE CASCADE,
FOREIGN KEY (unit) REFERENCES unit(unit_id)
ON DELETE CASCADE
) ENGINE=INNODB;
/* the side effects of each drug have a description that should be
compatible with the MedDRA classification */
CREATE TABLE sideeffects (
MedDRACode INT,
drug_id CHAR(12),
INDEX drug_ind (drug_id),
FOREIGN KEY (drug_id) REFERENCES drug(id)
ON DELETE CASCADE
) ENGINE=INNODB;
/* create a general entity set for patient information; this entity set
can be expanded by other entity subsets such as patient laboratory
information or more information about the history of that patient.
race and country are lookup codes; the lookup tables races and
countries are assumed to exist */
CREATE TABLE paitient(
id CHAR(12) NOT NULL,
first_name CHAR(25),
middle_name CHAR(25),
last_name CHAR(25),
DateOfBirth DATE,
Sex INT,
weight INT,
race INT,
country INT,
FOREIGN KEY (race) REFERENCES races(races_id)
ON DELETE CASCADE,
FOREIGN KEY (country) REFERENCES countries(ipcode)
ON DELETE CASCADE,
PRIMARY KEY (id)
) ENGINE=INNODB;
/* some relevant patient information might come from the following
suggested sub entity set */
CREATE TABLE Relevant_Patients_Info (
Info_id INT NOT NULL AUTO_INCREMENT,
paitient_id CHAR(25) NOT NULL,
allergies_id INT,
races_id INT,
Num_pregnancies INT,
smoking INT,
alcohol_use INT,
hepatic_id INT,
dysfunctions_id INT,
INDEX (allergies_id),
FOREIGN KEY (allergies_id) REFERENCES allergies(allergies_id)
ON UPDATE CASCADE ON DELETE RESTRICT,
INDEX (races_id),
FOREIGN KEY (races_id) REFERENCES races(races_id)
ON UPDATE CASCADE ON DELETE RESTRICT,
INDEX (hepatic_id),
FOREIGN KEY (hepatic_id) REFERENCES hepatic(hepatic_id)
ON UPDATE CASCADE ON DELETE RESTRICT,
INDEX (dysfunctions_id),
FOREIGN KEY (dysfunctions_id) REFERENCES dysfunctions(dysfunctions_id)
ON UPDATE CASCADE ON DELETE RESTRICT,
INDEX (paitient_id),
FOREIGN KEY (paitient_id) REFERENCES paitient(id)
ON UPDATE CASCADE ON DELETE RESTRICT,
PRIMARY KEY(Info_id)
) ENGINE=INNODB;
/* Transforming this E_R model, which includes aggregation, to tabular
form is straightforward. The Patient-Drug relationship includes a column
for each attribute in the primary keys of the entity sets that take part
in the relationship. Any concomitant medical products that the patient
uses and the therapy dates can be found through the related rows for the
drug id and patient id. Any available adverse event information that
shows the problem of using that drug should also be included.
*/
CREATE TABLE Patients_drugs (
Info_id INT NOT NULL AUTO_INCREMENT,
paitient_id CHAR(25) NOT NULL,
drug_id CHAR(12) NOT NULL,
therapy_start_date DATE,
therapy_end_date DATE,
MedDRACode_DiagnoseForUse INT,
/* 1 == yes, 2==no, 3==doesn’t apply */
/* Event abated after use stopped or dose reduced */
Quest1 INT,
/* event reappeared after reintroduction */
Quest2 INT,
Lot_number INT,
Exp_Date DATE,
NDCno INT,
reason INT NOT NULL,
date_of_event DATE,
date_of_report DATE,
adverse_desc TEXT,
INDEX (paitient_id),
FOREIGN KEY (paitient_id) REFERENCES paitient(id) ON UPDATE CASCADE ON
DELETE RESTRICT,
INDEX (drug_id),
FOREIGN KEY (drug_id) REFERENCES drug(id) ON UPDATE CASCADE ON DELETE
RESTRICT,
PRIMARY KEY(Info_id)
) ENGINE=INNODB;
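The many-to-many patient-drug relationship can be tried out without a database server by using Python's built-in sqlite3 module. This is a reduced sketch under assumptions: SQLite stands in for MySQL/InnoDB, only a few columns of each table are kept, and the rows and the helper `drugs_per_patient` are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE drug (
    id           CHAR(12) PRIMARY KEY,
    generic_name CHAR(25)
);
CREATE TABLE paitient (
    id        CHAR(12) PRIMARY KEY,
    last_name CHAR(25)
);
CREATE TABLE Patients_drugs (
    Info_id     INTEGER PRIMARY KEY AUTOINCREMENT,
    paitient_id CHAR(12) NOT NULL
        REFERENCES paitient(id) ON UPDATE CASCADE ON DELETE RESTRICT,
    drug_id     CHAR(12) NOT NULL
        REFERENCES drug(id) ON UPDATE CASCADE ON DELETE RESTRICT,
    reason      INT NOT NULL
);
""")

# Hypothetical rows: one patient takes two drugs, one drug has two patients.
conn.executemany("INSERT INTO drug VALUES (?, ?)",
                 [("D1", "drug-a"), ("D2", "drug-b")])
conn.executemany("INSERT INTO paitient VALUES (?, ?)",
                 [("P1", "Smith"), ("P2", "Jones")])
conn.executemany(
    "INSERT INTO Patients_drugs (paitient_id, drug_id, reason) VALUES (?, ?, ?)",
    [("P1", "D1", 1), ("P1", "D2", 4), ("P2", "D1", 2)],
)


def drugs_per_patient(connection):
    """Return {patient id: number of distinct drugs} from the relationship table."""
    rows = connection.execute(
        "SELECT paitient_id, COUNT(DISTINCT drug_id) "
        "FROM Patients_drugs GROUP BY paitient_id"
    ).fetchall()
    return dict(rows)


print(drugs_per_patient(conn))
```

Because the relationship rows use ON DELETE RESTRICT, an attempt to delete a drug that still appears in Patients_drugs is rejected, which protects reported adverse events from losing their drug reference.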
SQL scripting is required to generate reports on summary statistics. The SQL
procedure (PROC SQL), which can be combined with the macro language, makes it
possible to write SQL inside the SAS environment. SQL scripting therefore
extends SAS coding to retrieving and combining data from tables or views. New
tables and views can be created, along with
indexes, and data values in PROC SQL tables can be updated. It is also
possible to update and retrieve data from Database Management System tables
and to modify a PROC SQL table by adding, modifying, or dropping columns.
Example: Assume the Adverse Events Information from clinical studies, post-
marketing trials, spontaneous reports, and miscellaneous sources (including
independent drug identification numbers and retrospective data collection) are
saved in the above SQL tables. The following script generates a report that
shows Country of Origin for Patients receiving a drug in a post-marketing setting.
proc sql;
/* It extracts and manipulates grouped and ordered data from
patient records to create a new temporary view table that includes
only patient populations in each country. Country field is defined
as an id number; to represent it by country name, it joins to the
columns from countries table. After process is done, the temporary
view table is dropped*/
create view temp as
select country, count(country) as count,
calculated Count/Subtotal as Percent format=percent8.2
from paitient,
(select count(*) as Subtotal from paitient) as survey2
group by country
order by count;
quit;
proc sql;
/* extracts required data from created temporary view table and
then drop it */
title1 'Country of Origin for Patients Receiving the suspected drug in a Postmarketing Setting';
select c.countryname,t.count as cc,"(", t.Percent ,")"
from countries c, temp t
where c.ipcode = t.country;
quit;
proc sql;
drop view temp;
quit;
Country of Origin for Patients Receiving the suspected drug in a Postmarketing Setting
22:04 Monday, January 16, 2006
CountryName Percent
--------------------------------------------------
Greece 1 (0.1%)
Uruguay 2 (0.2%)
Taiwan 2 (0.2%)
French Polynesia 2 (0.2%)
Peru 2 (0.2%)
Korea 2 (0.2%)
South Africa 3 (0.2%)
Portugal 3 (0.2%)
Turkey 4 (0.3%)
Hungary 4 (0.3%)
Austria 4 (0.3%)
New Zealand 7 (0.5%)
Brazil 7 (0.5%)
Norway 10 (0.8%)
Israel 11 (0.8%)
Chile 15 (1.1%)
Netherlands 26 (2.0%)
Italy 39 (3.0%)
Spain 38 (2.9%)
Belgium 38 (2.9%)
United States 42 (3.2%)
Finland 44 (3.4%)
Germany 50 (3.8%)
Sweden 69 (5.3%)
Denmark 91 (7.0%)
Canada 97 (7.4%)
Australia 107 (8.2%)
Great Britain 271 (20.8%)
France 313 (24.0%)
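The count-and-percent logic behind this report can be checked independently of SAS. The following Python sketch assumes one country code per patient record; the function name `country_report`, the codes, and the sample data are hypothetical.

```python
from collections import Counter


def country_report(patient_countries, country_names):
    """Return (country name, count, percent text) rows, smallest count first."""
    counts = Counter(patient_countries)
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1])
    return [
        (country_names[code], n, f"({n / total:.1%})")
        for code, n in ordered
    ]


# Hypothetical data: one country code per patient record.
names = {11: "France", 12: "Greece"}
rows = country_report([11, 12, 11, 11], names)
for name, n, pct in rows:
    print(f"{name:<20}{n:>5} {pct}")
```

Dividing each count by the overall total plays the role of the Subtotal subquery, and joining through `country_names` corresponds to the join with the countries table.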
Patient exposure to the drug can be calculated and presented in different
ways. Although exposure data may be available for a long period, the
primary focus of a submitted report may be the number of exposures and cases
that occurred in a specific period of time. In the following report, global patient
exposures from 1989 to 1994 are provided:
proc sql;
/* temp1: number of records per region, reported as SachetSales */
create view temp1 as
select region, count(region) as SachetSales
from paitient
group by region
order by SachetSales;
quit;
proc sql;
/* temp2: patients per region whose therapy falls inside the period;
year() is used because the therapy dates are date values */
create view temp2 as
select region, count(region) as Exposures
from paitient
where id in (select paitient_id from Patients_drugs
where year(therapy_start_date) > 1983
and year(therapy_end_date) < 2001)
group by region
order by Exposures;
quit;
proc sql;
title1 'Worldwide Patient Exposure to the suspected drug 1989 to 1994';
select c.region, t1.SachetSales, t2.Exposures
from countries c, temp1 t1, temp2 t2
where c.ipcode = t1.region and c.ipcode = t2.region;
quit;
proc sql;
/* join on region so that each region is counted once in the totals */
select sum(t1.SachetSales) as SumSachetSales,
sum(t2.Exposures) as SumExposures
from temp1 t1, temp2 t2
where t1.region = t2.region;
quit;
proc sql;
drop view temp1, temp2;
quit;
Worldwide Patient Exposure to the suspected drug 1989 to 1994 23
20:55 Saturday, January 21, 2006
Region           SachetSales    Exposures
-----------------------------------------
Europe           230,649,500    1,895,749
Australia          5,292,542       43,500
Korea              3,067,300       25,211
Canada             1,497,100       12,305
Rest of World      2,405,064       19,768
SumSachet      SumExposures
---------------------------
242,911,506       1,996,533
Within SQL scripting, one may occasionally work with data that were imported
from the MedDRA application. If these data already exist on the local
machine, it is not necessary to access the MedDRA environment a
second time. One can use SAS utilities to convert data from one form to
another or to copy them between machines. A free trial of MedDRA is available on the
MSSO website. It contains a sample copy of MedDRA data saved
in an Access database, which can also be exported to an Excel file if needed. If
the data set is standard and complete, it is better to use it as a
shared data source. This shared data source may be stored in a relational
database management system (RDBMS), in an Excel spreadsheet, or even in a
flat file. If it is stored on an external machine, it becomes an external data
source and a SAS connection is required for access.
The following SAS script retrieves MedDRA Classification from a data source. It
imports data from an external file (a spreadsheet) to a SAS table. This code was
generated and saved during the wizard importing process. Saving this type of
script helps to prevent redoing the work when the information is needed again.
PROC IMPORT OUT= WORK.MEDDRAInfo
DATAFILE= "C:\thesis\CTCAEv3.xls"
DBMS=EXCEL REPLACE;
SHEET="'CTCAE v3#0 MedDRA Codes$'";
GETNAMES=YES;
MIXED=NO;
SCANTEXT=YES;
USEDATE=YES;
SCANTIME=YES;
RUN;
The following script works as well:
filename xclfil 'C:\thesis\CTCAEv3.xls';
proc import
datafile=xclfil
out=WORK.MEDDRAInfo
dbms=excel97 replace;
getnames=yes;
run;
The above script retrieves the MedDRA classification from a data source. Often
these data do not represent all of MedDRA: usually only a subset of the
data is required, and it is stored in an external file.
Assume MedDRAClassifications.xls includes only the MedDRA
classification data. To generate reports related to side effects, importing this file
is enough to retrieve the appropriate symptom or sign information listed by
outcome.
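PROC IMPORT DBMS=EXCEL OUT=work.MedDRA
DATAFILE="c:\thesis\MedDRAClassifications.xls" REPLACE;
RUN;
/* If the same data arrive as a comma-delimited file, they can instead be
read in a DATA step with:
infile 'c:\thesis\MedDRAClassifications.csv' delimiter=',' dsd; */
proc print data=MedDRA;
run;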
The SAS System 05:25 Thursday, December 15, 2005 1
Obs   MedDRATermLevel1            MedDRATermLevel2
  1   Nervous system disorders
  2                               Balance disorder
  3                               Convulsion
  4                               Lethargy
  5                               Optic neuritis
  6                               Paraesthesia
  7                               Speech disorder
  8                               Tunnel vision
  9                               Visual field defect
 10
 11   Eye disorders
 12                               Astigmatism
 13                               Blindness
...
* Sometimes the information that comes from an adverse event report, a clinical trial, or any other
post-marketing or pharmacovigilance application has a provisional order number assigned
to outcome data that cannot be correctly mapped to MedDRA. These order numbers alone
can be used when electronic reports or data are submitted, and they are then automatically
converted to MedDRA codes.
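The conversion of provisional order numbers can be pictured as a lookup with a "no match" bucket. The Python sketch below is only an illustration: the function `map_to_meddra`, the record layout, and the code values are hypothetical, and the numbers shown are placeholders rather than real MedDRA codes.

```python
def map_to_meddra(records, order_to_code):
    """Attach a MedDRA code to each record; collect records with no match."""
    mapped, unmatched = [], []
    for rec in records:
        code = order_to_code.get(rec["order_no"])
        if code is None:
            unmatched.append(rec)  # needs a manual transformation first
        else:
            mapped.append({**rec, "meddra_code": code})
    return mapped, unmatched


# Placeholder values only: these are NOT real MedDRA codes.
order_to_code = {1: 90000001, 2: 90000002}
reports = [
    {"order_no": 1, "outcome": "Convulsion"},
    {"order_no": 7, "outcome": "free-text outcome"},
]
mapped, unmatched = map_to_meddra(reports, order_to_code)
print(len(mapped), "mapped;", len(unmatched), "to review")
```

Returning the unmatched records separately, instead of dropping them, keeps every reported outcome visible until it has been given a MedDRA representation.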
From the parameter list created, values can be individually highlighted and
chosen for processing. These required parameter values may be retrieved from
tables that have been created by scripts such as the following:
proc sql;
create table reasonlist1
( Description char(60));
insert into reasonlist1
values('Patient Died')
values('Life threatening illness')
values('Required emergency room/doctor visit')
values('Required hospitalization')
values('Resulted in permanent disability')
values('Resulted in prolongation of hospitalization')
values('others');
quit;
The ordering of the above parameter values is important for selecting the rows by
their Order Number and the description of these values must be the same as
those found on the FDA forms. The following script creates a parameter table for
the abbreviations used by Drug Safety Reporting. The ordering and description of
these abbreviations is also consistent with FDA standards.
proc sql;
create table abbreviations
( abb char(8), Description char(80));
insert into abbreviations
values( 'ADR','adverse drug reaction')
values( 'AE','adverse event')
values( 'AERS','Adverse Event Reporting System')
values( 'bid','twice daily')
values( 'CI','confidence interval')
values( 'CIOMS','Council for International Organizations of Medical Sciences')
values( 'COSTART','Coding Symbols for Thesaurus of Adverse Reaction Terms')
values( 'CSDS','Core Safety Data Sheet')
values( 'CV','coefficient of variation')
values( 'FDA','Food and Drug Administration')
values( 'GABA','Gamma amino butyric acid')
values( 'HARTS','Hoechst Adverse Reaction Terminology System')
values( 'IBD','International Birth Date')
values( 'ICD9-10','International Classification of Diseases, 9th and 10th Editions/Revisions')
values( 'ICD9CM','International Classification of Diseases, Ninth Revision, Clinical Modification')
values( 'ICH','International Conference on Harmonisation')
values( 'MedDRA','Medical Dictionary for Regulatory Activities')
values( 'NDA','New Drug Application')
values( 'PSUR','Periodic Safety Update Report')
values( 'qd','once daily')
values( 'qid','four times daily')
values( 'SAE','serious adverse event')
values( 'SD','standard deviation')
values( 'SE','standard error')
values( 'US','United States')
values( 'WHO-ART','World Health Organization Adverse Reaction Terminology');
quit;
Formatting may be used for other parameter values. The ATTRIB Statement
permanently associates a format with a variable. SAS uses the format to write
the values of the variables specified.
attrib sales1-sales3 format=comma10.2;
Due to the permanent association of the ATTRIB Statement in the above
command, any subsequent DATA Step or PROC Step will use COMMA10.2
format to write the values of sales1, sales2, and sales3.
In addition to the default formats that are supplied by Base SAS software, one
can create custom formats with the FORMAT procedure. The following format
procedure defines static parameter values that may be required. It
expresses weights and measures using USP (United States Pharmacopeia)
standard abbreviations for dosage units.
proc format;
value $dosage_units
'1' = 'm'
'2' = 'kg'
'3' = 'g'
'4' = 'mg'
'5' = 'mcg'
'6' = 'L'
'7' = 'mL'
'8' = 'mEq'
'9' = 'mmol'
'10' = '%';
run;
* See the legend below for definitions.
(1) m (lower case) = meter
(2) kg = kilogram
(3) g = gram
(4) mg = milligram
(5) mcg = microgram
(do not use the Greek letter mu which has been misread as mg)
(6) L (upper case) = liter
(7) mL (lower/upper case) = milliliter (do not use cc which has been misread as U or the
number 4)
(8) mEq = milliequivalent
(9) mmol = millimole
The FORMAT procedure can also be used to define a format for the dosage form
of the drug in question (see the procedure below):
proc format;
value $dosage_form
'1' = 'capsule'
'2' = 'cream'
'3' = 'ear drop'
'4' = 'eye drop'
'5' = 'inhaler'
'6' = 'injection'
'7' = 'oral solution'
'8' = 'solution'
'9' = 'suspension pediatric drop'
'10' = 'syrup'
'11' = 'tablet'
'12' = 'chewable tablet'
'13' = 'other';
run;
Formats for time durations, age ranges, meal instructions, and times of day
can be defined in the same way:
proc format;
value $time_duration_form
'1' = 'hour'
'2' = 'day'
'3' = 'week'
'4' = 'month'
'5' = 'year';
run;
proc format;
value $age_range_form
'1' = 'children'
'2' = 'adult';
run;
proc format;
value $eating_format
'1' = 'with meal'
'2' = 'without meal'
'3' = 'before meal'
'4' = 'after meal'
'5' = 'with a glass of water'
'6' = 'other';
run;
proc format;
value $time_format
'1' = 'morning'
'2' = 'noon'
'3' = 'afternoon'
'4' = 'evening'
'5' = 'midnight';
run;
Other values are combinations of the formats defined above. For example, drug
labels may read: “for adults, every morning, 2 tablets, 2 hours before meals, with a
glass of water” or “for children under 8 years of age, ½ a tablet before meals, with a
glass of water….”
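Combining coded values into one label text amounts to several lookups followed by string assembly. The Python sketch below illustrates the idea only; the dictionaries repeat a few of the code/label pairs listed in the formats, while the function `label_text` and the sentence layout are assumptions.

```python
# A few code/label pairs copied from the formats; the selection is illustrative.
AGE_RANGE = {"1": "children", "2": "adult"}
DOSAGE_FORM = {"1": "capsule", "11": "tablet"}
TIME_OF_DAY = {"1": "morning", "2": "noon", "4": "evening"}
EATING = {"1": "with meal", "3": "before meal", "4": "after meal"}


def label_text(age, time_of_day, quantity, form, eating):
    """Compose a label instruction from stored codes; unknown codes raise KeyError."""
    return (
        f"for {AGE_RANGE[age]}, every {TIME_OF_DAY[time_of_day]}, "
        f"{quantity} {DOSAGE_FORM[form]}, {EATING[eating]}"
    )


print(label_text("2", "1", 2, "11", "3"))
```

Storing only the codes and composing the text at report time keeps the wording consistent: a change to one label is made in a single place.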
In a database, grouping may be based on the “Sex/Gender” field, where
the values “Male”, “Female”, and “Unknown” define subgroups. These
values can be stored as coded values (1, 2, and 3). The ordering of the coded
levels of a classification variable must be chosen with care. If a statistical
report requires the data for female patients to appear after the data for males,
the “Sex/Gender” field would use “2” for females and “1” for males. The following
SAS script defines this format.
proc format library=proclib;
value $sex
'1'='male'
'2'='female'
'3'='unknown';
picture pop low-high='000,000,000';
run;
Formatting has other uses in scripting. Many data values must be presented
through a format. In SAS one can apply a format with any of the following:
1. PUT, PUTC, or PUTN functions
2. %SYSFUNC macro function
3. FORMAT/ATTRIB statement in a DATA step or a PROC step
data _null_;
num=15;
char=put(num,hex2.); /* char is '0F' */
population=1145.32;
put population comma10.2;
run;
result: 1,145.32
One can also wrap a format call in a user-defined macro. The macro below
applies the format to its argument outside a DATA step, using %SYSFUNC to call PUTN.
%macro tst(amount);
%put %sysfunc(putn(&amount,dollar10.2));
%mend tst;
%tst (1154.23);
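For comparison, the three SAS formats used here have rough Python counterparts. The helper names are hypothetical, and the functions only approximate COMMA10.2, DOLLAR10.2, and HEX2. for ordinary positive values; they do not reproduce SAS behaviour for values that are too wide or missing.

```python
def comma10_2(value: float) -> str:
    """Approximate SAS COMMA10.2: thousands separators, 2 decimals, width 10."""
    return f"{value:>10,.2f}"


def dollar10_2(value: float) -> str:
    """Approximate SAS DOLLAR10.2: as above with a leading dollar sign."""
    return f"{'$' + format(value, ',.2f'):>10}"


def hex2(value: int) -> str:
    """Approximate SAS HEX2.: two uppercase hexadecimal digits."""
    return format(value, "02X")


print(comma10_2(1145.32))
print(dollar10_2(1154.23))
print(hex2(15))
```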
Patient records are typically the kind of data that come through Open
Database Connectivity (ODBC). These data have very possibly existed
as the backbone of a medical client-server application. In this case, access to the data
via ODBC is required. The module "SAS/ACCESS for ODBC" must be installed on
the computer. It may also be necessary to configure the database by defining a
DSN (Data Source Name) and how it is accessed. Even parameter values
can come from an ODBC source. These may be dynamic values that are
updated by end users through the web. Normally, these applications have
administration sections that allow the end user to update parameters.
Example:
The following script shows how one can use part of the data stored in
another vendor's Database Management System (DBMS) and copy it
into a SAS data set. In the script, a ‘libref’ is declared that points
to a library containing Oracle data, and SAS reads data from an Oracle table into a SAS
data set:
libname dblib oracle user=halley password=halley path='hrdept_002';
data paitient.big;
set dblib.paitient;
run;
Space allocation is an important concept in creating or extending a data
library. SAS requests space as it is needed. To optimize system
performance and allocate space appropriately, one can pre-allocate the largest
amount of space that may be needed. These methods are used more often when
multivolume access to SAS data libraries is required.
The above data statement may then change to:
/* Know this is a big data set. */
data paitient.big (alq=100000 deq=5000);
As explained earlier, data can come from an external data file. Additionally,
one can connect to a remote host and work on its data. In the following script, we
connect to z/OS and UNIX servers to use DB2 and Oracle data:
/*************************************/
/* connect to z/OS */
/*************************************/
options comamid=tcp;
filename rlink '!sasroot\connect\saslink\tcptso.scr';
signon os390host;
/*************************************/
/* download DB2 data views using */
/* SAS/ACCESS engine */
/*************************************/
rsubmit os390host;
libname db db2;
proc download data=db.paitient
out=db2dat;
run;
endrsubmit;
/*************************************/
/* connect to UNIX */
/*************************************/
options
remote=hrunix comamid=tcp;
filename rlink '!sasroot\connect\saslink\tcpunix.scr';
signon;
/*************************************/
/* download Oracle data using */
/* SAS/ACCESS engine */
/*************************************/
rsubmit hrunix;
libname oracle oracle user=hzan password=halley;
proc download
data=oracle.paitient out=oracdat;
run;
endrsubmit;
/*************************************/
/* sign off both links */
/*************************************/
signoff hrunix;
signoff os390host cscript=
'!sasroot\connect\saslink\tcptso.scr';
/*************************************/
/* union data into SAS view */
/*************************************/
proc sql;
/* UNION ALL keeps every per-source count, even when two sources
return identical rows */
create view temp_joindata as
select gender, country, count(*) as population
from db2dat group by gender, country
union all
select gender, country, count(*) as population
from oracdat group by gender, country
union all
select gender, country, count(*) as population
from paitient1 group by gender, country;
quit;
proc sql;
/* add the per-source counts and replace the country code by its name */
create view jointdata as
select t.gender, c.name as country,
sum(t.population) as population
from temp_joindata t, countries c
where c.codeId = t.country
group by t.gender, c.name
order by t.gender, c.name;
quit;
options fmtsearch=(proclib);
/* The NOWD option runs the REPORT procedure without the REPORT window
and sends its output to the open output destination(s). */
proc report data=jointdata nowd;
column gender country population;
format gender $sex. country $50. population pop.;
title 'Country of Origin for Patients Receiving the drug in Post marketing';
run;
Country of Origin for Patients Receiving this drug in Post marketing
for 04JAN06
Gender    country           Population
Female    Algeria              743,453
Male                           235,984
Unknown                            167
Female    Denmark          423,457,698
Male                       546,876,345
Unknown                            897
Female    Spain           456,9812,564
Male                       400,987,564
Unknown                            234
Female    United Kingdom   876,234,123
Male                       564,234,876
Unknown
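The combination step, counting patients per gender and country in each source and then adding the counts, can be checked with a few lines of Python. This sketch is an assumption-laden stand-in: the lists imitate the DB2, Oracle, and local extracts, and the function name and sample rows are invented.

```python
from collections import Counter


def combine_counts(*sources):
    """Count patients per (gender, country) across every source."""
    total = Counter()
    for rows in sources:
        total.update((row["gender"], row["country"]) for row in rows)
    return total


# Hypothetical extracts standing in for the DB2, Oracle, and local data sets.
db2dat = [{"gender": "1", "country": 11}, {"gender": "2", "country": 11}]
oracdat = [{"gender": "1", "country": 11}]
local = [{"gender": "2", "country": 12}]

population = combine_counts(db2dat, oracdat, local)
for (gender, country), n in sorted(population.items()):
    print(gender, country, n)
```

Adding the per-source counts matters because a plain SQL UNION removes duplicate rows: two sources that happen to report the same gender, country, and count would otherwise be merged into one row and undercounted.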
Conclusions:
This thesis proposes ways to improve programming practices for standardizing
Drug Safety Reporting Systems. The quality of a Drug Safety Reporting
Application depends on the system architecture, methodologies, and modeling
used by the programmer. The degree to which an implementation is standardized
is in direct proportion to the correctness of the methods used for accessing,
gathering, and manipulating the data, and for its classification, control
code, quality control, formatting, statistical analysis, and mining.
Classification terms should follow a hierarchical structure that is consistent
with FDA standards and MedDRA. Using the control code with MedMinder and the
SCM is also important. Neither control code nor quality control should be
overlooked by programmers. Data must be formatted properly and, again,
consistently with FDA standards. Statistical analysis and data mining in these
types of applications must also be done correctly, as they have a direct
effect on the results. Finally, data gathering and access should be handled
dynamically, and manual access should be avoided. Above all, details such as
the size of the data must be carefully protected during the data access stage.
As for the professional working on the system, an advanced background in
computational, mathematical, and programming methods is required to apply
these methodologies accurately. SAS programming, knowledge of object-oriented
programming, data structures, database modeling, and SQL are all necessary
skills for implementing a Standard Drug Safety Reporting System. Knowledge of
statistical modeling is particularly desirable in data mining applications.
Finally, a computational science graduate or a professional software designer
with good scripting skills can make the application work more dynamically and
accurately. The workbench of Drug Safety Reporting Systems is made up of SAS
and MedDRA applications. SAS supports advanced data access technology, and
the MedDRA classification matches the metadata required for designing this
application. These existing components improve the reliability of the design,
and SQL scripting extends it.