De-identifying Clinical Data
Khaled El Emam, CHEO RI & uOttawa
www.ehealthinformation.ca




Electronic Health Information Laboratory, CHEO Research Institute, 401 Smyth Road, Ottawa K1H 8L1, Ontario; www.ehealthinformation.ca
Secondary Use/Disclosure

[Diagram: individuals, custodians, an agent, and a recipient, connected by flows labeled collection, use, and disclosure]



Data Flows
 • Mandatory disclosures
 • Uses by an agent for secondary
   purposes
 • Permitted discretionary disclosures for
   secondary purposes
 • Other disclosures for secondary
   purposes



Obtaining Consent - I
 • Sometimes it is not possible or
   practical to obtain consent:
           – Making contact to obtain consent may
             reveal the individual’s condition to others
             against their wishes
           – The size of the population may be too large
             to obtain consent from everyone
           – Many patients may have relocated or died



Obtaining Consent - II
           – There may be a lack of existing or
             continuing relationship with the patients
           – There is a risk of inflicting psychological,
             social or other harm by contacting
             individuals or their families in delicate
             circumstances
           – It would be difficult to contact individuals
             through advertisements and other public
             notices



Impact of Obtaining Consent
 • In the case where explicit consent is
   used, consenters and non-consenters
   differ on:
           – age, sex, race, marital status, educational
             level, socioeconomic status, health status,
             mortality, lifestyle factors, functioning
 • The consent rate for express consent
   varied from 16% to 93%


Limiting Principles
 • Do not collect, use, or disclose PHI if
   other information will serve the
   purpose
 • For example, even if it is easier to
   disclose a whole record, that should
   not be done if lesser information will
   reasonably satisfy the purpose
 • De-identification would be one element
   in limiting the amount of PHI that is
   collected/used/disclosed

Breaches
 • In many large research hospitals and
   hospital networks it is simply not
   possible to control and manage all of
   the databases and data sets that are
   created, used, and disclosed for
   research
 • Breach frequency and severity are
   growing
 • De-identification provides one way to
   manage the risks, however

Trust
 • Patients change their behavior if they
   perceive a threat to privacy
 • This can have a negative impact on the
   quality of the data that is used for
   research




Deloitte Survey (2007)
  • N=827 respondents in North America
  • 43% reported more than 10 privacy breaches
    within the last 12 months in their
    organizations
  • Over 85% reported at least one privacy
    breach
  • Over 63% reported multiple privacy breaches
    requiring notification
  • Breaches involving 1000+ records were
    reported by 34% of respondents

Verizon Study
  • Based on forensic engagements conducted by
    Verizon
  • Breaches resulting from external sources:
    73%
  • Caused by insiders: 18%
  • Implicated business partners: 39%
  • The median number of records involved in an
    insider breach was 10 times that of an
    external breach
  • Biggest causes are errors and hackers

HIMSS Leadership Survey
  • Survey of healthcare IT executives, n=307
  • Conducted in the 2007-2008 timeframe
  • 24% of respondents reported that they have
    had a security breach in their organization in
    the last 12 months
  • 16% of respondents reported that they have
    had a security breach in their organization in
    the last 6 months
  • Half indicated that an internal security breach
    is a concern to their organizations

HIMSS Analytics Report
  • IT executives and security officers at
    healthcare institutions; n=263
  • Half of respondents are concerned with
    internal inadvertent access to patient data
  • 13% indicated that their organization has had
    a security breach in the last 12 months
  • 80% of these were internal breaches




Medical Record Breaches 2008
  • For all of 2008 (datalossdb.org)
  • 83 breaches involving medical records (14%
    of total)
  • Approx. 7.2 million records involved in these
    breaches (21.5% of all records)




Does this Happen Here?
  • Do you know of any cases where computer
    equipment was stolen from a hospital? Did this
    equipment contain personal health information?
  • Do you know of any cases where memory sticks with
    data on them were lost?
  • Does anyone email data to their Hotmail or Gmail
    accounts so that they can access it from home
    or while travelling?
  • Do people still share passwords?



Known Data Leaks
  • PHI on second-hand computers
  • Leaks through peer-to-peer file sharing networks
  • PowerPoint files on the Internet
  • Password-protected files sent by email




Identity Theft
  • William Ernst Black (Edmonton 1999)
  • The creation of identity packages using
    information about dead children who were
    living in one jurisdiction but died in another
    ($37k for each identity package)
  • Example: drug smuggler was caught with
    these identity packages
  • Example: American getting free medical care
    in Canada

Patient Concerns
  • There is evidence (from surveys) that the general
    public has changed their behavior to adjust for
    perceived privacy risks with respect to their PHI:
            – 15% to 17% of US adults
            – 11% to 13% of Canadian adults
  • There is also evidence that vulnerable populations
    exhibit similar behaviors (e.g., adolescents, people
    with HIV or at high risk for HIV, those undergoing
    genetic testing, mental health patients, and battered
    women)


Behavior Change - I
  • Going to another doctor
  • Paying out of pocket when insured to avoid
    disclosure
  • Not seeking care to avoid disclosure to an employer
    or to not be seen entering a clinic by other members
    of the community
  • Giving inaccurate or incomplete information on
    medical history
  • Asking a doctor not to record a health problem or
    to record a less serious or less embarrassing one


Behavior Change - II
  • 87% of US physicians reported that a patient
    had asked them not to include certain
    information in their record
  • 78% of US physicians reported that they
    have withheld information due to privacy
    concerns




Asymmetry Principle - I
  • Trust is hard to gain but easy to lose:
            – Negative events/news carry more weight than
              positive ones (negativity bias); they are seen
              as more diagnostic
            – Avoiding loss: people weight negative
              information more heavily in an effort to avoid loss
            – Sources of negative information appear more
              credible (positive information seems self-serving)




Asymmetry Principle - II
            – People interpret information according to their
              prior beliefs: if they have negative prior beliefs
              then negative events will reinforce them and
              positive events will have little impact
            – Undecided individuals tend to be affected more
              by negative information
            – People with positive prior beliefs may feel
              betrayed by negative information/events




Canadian Public - 2007
[Bar chart, vertical axis 0 to 100: responses of 5-7 on a 7 pt scale, by region]
  Total 39, BC 37, Alberta 46, Prairies 34, Ont 40, Que 37, Atlantic 44, Territories 35

Question: "In your opinion, how safe and secure is the health information which EXISTS about you?"


Canadian Public - 2003

[Stacked bar chart, horizontal axis 0 to 100; legend: Agree (5-7), Neither (4), Disagree (1-3), DK/NR]
Statement: "I really worry that my personal health information might be used for other purposes in the future which have little to do with my health"

How not to De-identify
 • Just removing the name and address
   information is not enough
 • It is quite easy to re-identify
   individuals from the other data that is
   left
 • There are a number of public, real-life
   examples of re-identification actually
   happening


Example Data With PHI




Types of Variables
 • Identifying variables: variables that
   can directly identify a patient
 • Quasi-identifiers: variables that can
   indirectly identify a patient
 • Sensitive variables: sensitive clinical
   information that the patient would not
   want to be known beyond the circle of
   care


De-identified Data?




Examples of Re-identification




User #4417749
  •      “tea for good health”
  •      “numb fingers”, “hand tremors”
  •      “dry mouth”
  •      “60 single men”
  •      “dog that urinates on everything”
  •      “landscapers in Lilburn, Ga”
  •      “homes sold in shadow lake subdivision
         gwinnett county georgia”

Thelma Arnold
  • 62-year-old widow living in Lilburn, Ga,
    re-identified by the New York Times
  • She has three dogs




What Happened Next?
  • Maureen Govern, CTO of AOL, “resigns”
  • Abdur Chowdhury, the AOL researcher who
    released the data, was fired
  • Abdur’s boss in the research
    department was fired
  • Big embarrassment for AOL




Uniqueness in the US Population
 • Studies show that between 63% and
   87% of the US population is unique on
   the combination of date of birth, ZIP code, and gender
 • Uniqueness makes it quite easy to
   re-identify individuals using a variety of
   techniques
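
A minimal sketch of how uniqueness on a set of quasi-identifiers could be measured. The function name, the field names, and the four records are made up for illustration; they are not data from the studies cited above.

```python
from collections import Counter

def percent_unique(records, quasi_identifiers):
    """Percentage of records whose quasi-identifier combination occurs exactly once."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return 100.0 * unique / len(keys) if keys else 0.0

# Illustrative records only (hypothetical values).
records = [
    {"dob": "1957-03-12", "zip": "30047", "gender": "M"},
    {"dob": "1957-03-12", "zip": "30047", "gender": "M"},
    {"dob": "1978-07-01", "zip": "30047", "gender": "M"},
    {"dob": "1968-12-09", "zip": "30047", "gender": "F"},
]

on_all_three = percent_unique(records, ["dob", "zip", "gender"])
on_gender_only = percent_unique(records, ["gender"])
print(on_all_three, on_gender_only)
```

Adding quasi-identifiers to the key can only keep a record's group the same size or shrink it, which is why combinations such as date of birth plus ZIP code plus gender make so many people unique.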




Uniqueness in Canadian Population
[Line chart: percent of uniques (0% to 100%) against number of characters in the postal code (1 to 6); series: PC, PC + Gender, PC + DoB, PC + DoB + Gender]

Example
 • This example shows the risk of re-
   identification using just demographics




Types of Disclosure
 • Identity Disclosure: being able to
   determine the identity associated with
   a record
 • Attribute Disclosure: discovering
   something new about an individual
   known to be in the database




Disclosure and Invasion-of-Privacy
 • An important first criterion is deciding
   on the sensitivity of the data and the
   potential for harm to the patients from
   a secondary use/disclosure
 • If the invasion-of-privacy is deemed
   low then there may not be a need to
   de-identify the data



Invasion-of-Privacy - I
 • The personal information in the Data is
   highly detailed
 • The information in the Data is of a
   highly sensitive and personal nature
 • The information in the Data comes
   from a highly sensitive context
 • Many people would be affected if there
   were a Data breach or the Data were
   processed inappropriately by the
   recipient/agent

Invasion-of-Privacy - II
 • A Data breach, or inappropriate processing
   of the Data by the recipient/agent,
   may cause direct and quantifiable
   damages and measurable injury to the
   patients
 • If the recipient/agent is located in a
   different jurisdiction, there is a
   possibility, for practical purposes, that
   the data sharing agreement will be
   difficult to enforce

Invasion-of-Privacy – Consent - I
 • There is a provision in the relevant
   legislation permitting the
   disclosure/use of the Data without the
   consent of the patients
 • The Data was unsolicited or given
   freely or voluntarily by the patients
   with little expectation of it being
   maintained in total confidence


Invasion-of-Privacy – Consent - II
 • The patients provided express consent,
   when the Data was originally collected or at
   some point since then, for it to be
   disclosed for this secondary Purpose
 • The custodian has consulted well-
   defined groups or communities
   regarding the disclosure of the Data
   and had a positive response

Invasion-of-Privacy – Consent - III
 • A strategy for informing/notifying the
   public about potential disclosures for
   the recipient’s secondary Purpose was
   in place when the data was collected or
   since then
 • Obtaining consent from the individuals
   at this point is inappropriate or
   impractical


Identity Disclosure
 • Three common types:
           – Prosecutor risk
           – Journalist risk
           – Rareness
 • All three are concerned with the risk of
   re-identifying a single individual




Prosecutor vs. Journalist
 • If either of the following is true then
   prosecutor risk is relevant:
           – The data represents the whole population
             such that everyone is known to be in it or
             the sampling fraction is very high
           – If not the whole population, it is possible
             for an intruder to know that a particular
             person has a record in the data
              • Patient may self-reveal
              • Data collection method is revealing
 • Otherwise journalist risk is relevant

Prosecutor Risk - I
 • The intruder has background
   information about a specific individual
   known to be in the database
 • The amount of background information
   will depend on the intruder
 • The intruder is attempting to find the
   record belonging to that individual in
   the database


Prosecutor Risk - II
 • Examples of intruders:
           –     Neighbor
           –     Ex-spouse
           –     Employer
           –     Relative




Example
    Date of Birth    Gender    Postal Code    Diagnosis
    12/03/1957       M         K0J 1P0        …
    01/07/1978       M         K0J 1P0        …
    09/12/1968       F         K0J 1P0        …
    17/08/1987       F         K0J 1P0        …
    25/02/1974       F         K0J 1T0        …
    23/05/1985       M         K0J 1T0        …
    14/03/1965       F         K0J 2A0        …
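
A rough sketch of one way to put a number on prosecutor risk for the table above. It assumes a commonly used measure that the slide itself does not state: a record's risk is 1/f, where f is the number of records sharing its quasi-identifier values. The function name `max_risk` and the choice of the first three postal code characters are illustrative assumptions.

```python
from collections import Counter

# Rows from the example table: (date of birth, gender, postal code).
rows = [
    ("12/03/1957", "M", "K0J 1P0"),
    ("01/07/1978", "M", "K0J 1P0"),
    ("09/12/1968", "F", "K0J 1P0"),
    ("17/08/1987", "F", "K0J 1P0"),
    ("25/02/1974", "F", "K0J 1T0"),
    ("23/05/1985", "M", "K0J 1T0"),
    ("14/03/1965", "F", "K0J 2A0"),
]

def max_risk(rows, key):
    """Largest 1/f over all records, where f is the size of the record's group under `key`."""
    sizes = Counter(key(r) for r in rows)
    return max(1.0 / sizes[key(r)] for r in rows)

# All three quasi-identifiers as released: every record is alone in its group.
full = max_risk(rows, key=lambda r: r)

# Coarser release (assumed for illustration): gender plus first three postal characters.
coarse = max_risk(rows, key=lambda r: (r[1], r[2][:3]))

print(full, coarse)
```

With the full date of birth every record is unique, so an intruder who knows these three facts about a neighbor finds exactly one candidate record. Dropping the date of birth and shortening the postal code puts every record in a group of at least three.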




Selecting Variables – Prosecutor - I
 • In the best case assumption, a
   neighbor would know:
           – Address and telephone information about
             the VIP
           – Household and dwelling information
             (number of children, value of property,
             type of property)
           – Key dates (births, deaths, weddings)
           – Visible characteristics: gender, race,
             ethnicity, language spoken at home,
             weight, height, physical disabilities
           – Profession

Selecting Variables – Prosecutor - II
 • What would an ex-spouse know:
           – The same things that a neighbor would
             know
           – Basic medical history (allergies, chronic
             diseases)
           – Income, years of schooling
 • All of these variables would be
   considered quasi-identifiers if they
   appear in the database


Journalist Risk
 • The journalist is not looking for a
   specific person; re-identifying any
   person will do
 • The journalist has access to a database
   that s/he can use for matching
 • This is called an identification database




Journalist Matching Example
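[Diagram: a Medical Database and an Identification DB that share a set of quasi-identifiers]
  Medical Database only: clinical and lab data
  Quasi-identifiers (in both): DoB, initials, gender, postal code
  Identification DB only: name, address, telephone no.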
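
The matching step in the diagram can be sketched as an exact join on the shared quasi-identifiers. This is only an illustration: the function name `link`, the field names, and all records (including the placeholder names "Person A" to "Person C") are hypothetical, and a real attack might use fuzzy rather than exact matching.

```python
from collections import defaultdict

QUASI = ("dob", "initials", "gender", "postal_code")

def link(medical, identification, quasi=QUASI):
    """Pair each medical record with an identity when exactly one identity matches on `quasi`."""
    index = defaultdict(list)
    for person in identification:
        index[tuple(person[q] for q in quasi)].append(person)
    matches = []
    for rec in medical:
        candidates = index.get(tuple(rec[q] for q in quasi), [])
        if len(candidates) == 1:  # an ambiguous match does not identify anyone
            matches.append((rec, candidates[0]))
    return matches

# Hypothetical data for illustration only.
medical = [
    {"dob": "1957-03-12", "initials": "JS", "gender": "M", "postal_code": "K0J 1P0", "lab": "..."},
    {"dob": "1968-12-09", "initials": "AB", "gender": "F", "postal_code": "K0J 1P0", "lab": "..."},
]
identification = [
    {"dob": "1957-03-12", "initials": "JS", "gender": "M", "postal_code": "K0J 1P0", "name": "Person A"},
    {"dob": "1968-12-09", "initials": "AB", "gender": "F", "postal_code": "K0J 1P0", "name": "Person B"},
    {"dob": "1968-12-09", "initials": "AB", "gender": "F", "postal_code": "K0J 1P0", "name": "Person C"},
]

matches = link(medical, identification)
print([identity["name"] for _, identity in matches])
```

Only the first medical record is re-identified here. The second matches two people in the identification database, so the journalist cannot tell which one it belongs to.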

Assessing Journalist Risk
  • In general, we want to know how rare
    the quasi-identifier values would be in
    the population (e.g.,
    homeowners/professionals/civil
    servants in the geographic area of
    interest)
  • If the combination is not rare then
    the journalist risk is small


Selecting Variables – Journalist - I
 • Depends on what information can be
   obtained in an identification database
 • For an external intruder, likely
   variables are those available in public
   registries:
           –     Key dates (birth, death, marriage)
           –     Profession
           –     Home address and telephone number
           –     Type of dwelling
           –     Gender, ethnicity, race
           –     Income if a highly paid public servant

Selecting Variables – Journalist - II
 • Assume that an internal intruder would
   be able to get all relevant
   administrative data:
           – Key dates (birth, death, admission,
             discharge, visit)
           – Gender, address, telephone number




Inference of Variables - I
 • Even though a particular quasi-identifier
   may not be known to the
   intruder (prosecutor risk), available in
   an identification database (journalist),
   or available in the disclosed database
   (all three risks), it may be possible to
   infer it from other variables
 • Variables that can be inferred should
   be treated as quasi-identifiers


Inference of Variables - II
 • Inferred variables should be added to
   the disclosed database if they are not
   there because they may be used in a
   re-identification attack, and you want
   to take them into account during risk
   assessment




Inference Examples
 • Gender, ethnicity, religious origin from
   name
 • Age from graduation date
 • Profession from payer of insurance
   claim (e.g., civil servants have a single
   health insurer)
 • Age and gender from a diagnostic or
   lab code (e.g., mammogram or PSA test)
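
As a rough illustration of the last example, the sketch below derives a likely gender from a test type so that the inferred value can be treated as a quasi-identifier. The mapping table, the field names, and the helper `add_inferred_gender` are hypothetical; they are not taken from any coding standard or from the slides.

```python
# Hypothetical mapping from a test type to the gender it usually implies.
IMPLIED_GENDER = {"mammogram": "F", "PSA": "M"}

def add_inferred_gender(record):
    """Return a copy of the record with an inferred gender, if one can be derived."""
    inferred = dict(record)
    if "gender" not in inferred:
        implied = IMPLIED_GENDER.get(record.get("test"))
        if implied is not None:
            inferred["gender"] = implied
    return inferred

# Toy records with no gender field; only the first has a gender-specific test.
records = [{"id": 1, "test": "PSA"}, {"id": 2, "test": "glucose"}]
augmented = [add_inferred_gender(r) for r in records]
```

The inferred field would then be included among the quasi-identifiers during risk assessment.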


Rareness
  • If individuals are rare on the quasi-
    identifiers, then they are at higher
    prosecutor and journalist re-
    identification risk
  • If an individual has a rare and visible
    characteristic/feature, then that also
    makes them easier to re-identify (e.g.,
    put an ad on the radio)
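
One way to make "rare on the quasi-identifiers" concrete is to count how many records share each combination of quasi-identifier values and flag the small groups. The sketch below assumes records are plain dictionaries; the field names and the `min_size` parameter are illustrative choices, not part of the slides.

```python
from collections import Counter

def class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def rare_records(records, quasi_identifiers, min_size):
    """Return records whose quasi-identifier combination occurs fewer than min_size times."""
    sizes = class_sizes(records, quasi_identifiers)
    return [r for r in records
            if sizes[tuple(r[q] for q in quasi_identifiers)] < min_size]

# Toy data; the field names are illustrative only.
records = [
    {"gender": "F", "birth_year": 1970, "region": "K1H"},
    {"gender": "F", "birth_year": 1970, "region": "K1H"},
    {"gender": "M", "birth_year": 1935, "region": "K1H"},
]
rare = rare_records(records, ["gender", "birth_year", "region"], min_size=2)
```

Here the single male record born in 1935 is unique on the chosen quasi-identifiers and would be flagged.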


Attribute Disclosure
  • If there is very little variation on
    sensitive variables
  • The data set can represent a whole
    population or some subset
  • Learn something new about a person
    without actually finding which record
    belongs to them
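
A simple check for this situation is to look for groups of records that share the same quasi-identifier values and also share a single sensitive value. The sketch below is only an assumption about how such a check could look; the function name and the toy fields are hypothetical.

```python
from collections import defaultdict

def homogeneous_groups(records, quasi_identifiers, sensitive):
    """Return quasi-identifier combinations whose records all share one sensitive value."""
    values = defaultdict(set)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        values[key].add(r[sensitive])
    return {key: next(iter(vals)) for key, vals in values.items() if len(vals) == 1}

# Toy data; the field names are illustrative only.
records = [
    {"gender": "M", "age_group": "60-69", "diagnosis": "diabetes"},
    {"gender": "M", "age_group": "60-69", "diagnosis": "diabetes"},
    {"gender": "F", "age_group": "30-39", "diagnosis": "asthma"},
    {"gender": "F", "age_group": "30-39", "diagnosis": "fracture"},
]
exposed = homogeneous_groups(records, ["gender", "age_group"], "diagnosis")
```

Anyone known to be a man aged 60-69 in this toy data set can be linked to the diagnosis without locating his specific record.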



A Pragmatic Approach
 • It is important to ensure that the
   quasi-identifiers are plausible for the
   data and the recipients of the data
 • If you select many quasi-identifiers
   then that will by definition increase the
   re-identification risk
 • Ideally, each selected quasi-identifier
   should be associated with a realistic re-
   identification scenario


Constructing an Identification DB
 • This may be a single physical database
   or a join of multiple sources together
   to construct a virtual database
 • It will have the quasi-identifiers as well
   as identity information, but will not
   have the sensitive information (e.g.,
   clinical or financial details)
 • The sources may be public and free,
   public and for a fee, or fully
   commercial
Examples of Identification DBs - I
 • These are databases or sources
   (Canada):
           – Obituaries: available from newspapers and
             funeral homes; there are obituary
             aggregator sites that make this simple
           – PPSR: Personal Property Security
             Registration; contains information on loans
             secured by property (e.g., cars)
           – Land Registry: information on house
             ownership

Examples of Identification DBs - II
           – Membership Lists: provide comprehensive
             listings of professionals (e.g., doctors,
             lawyers, civil servants)
           – Salary Disclosure Reports: provided by
             governments for those earning higher than
             a certain threshold
           – White Pages: public telephone directory
           – Job Sites: CVs posted in public and closed
             job web sites
           – Donations: Disclosures of donations to
             political parties (include address)
Voter Lists - I
 • Cannot legally be used for purposes
   outside of an election (in Canada)
 • But, a charity allegedly supporting a
   terrorist group (Tamil Tigers) was
   found by the RCMP to have Canadian
   voter lists
 • Volunteers do not necessarily destroy
   or dispose of the lists after an election
   (and in many cases do not sign
   anything before they get them)
Voter Lists - II
 • It is not expensive (or difficult) to
   become a candidate in an election and
   get the voter list:
           – Alberta: $500
           – BC: $100
           – NB: $100 (+ nominated by 25 electors)
           – Ontario: $100
           – Quebec: $0 (+ nominated by 100 electors)
 • Canadian voter lists do not contain the
   DoB (yet)
Economics of Identification DBs
 • Some data sources have a fee for each
   individual record/search
 • This makes the cost of creating an
   identification database quite high
 • This may impose a large economic
   burden on an intruder and act as a
   deterrent from creating identification
   databases



Internal Identification Databases
 • An internal intruder may have access
   to administrative databases that can
   act as an identification DB
 • For example, in a hospital an internal
   intruder may have access to all
   admissions; this is not sensitive data
   so is less protected but has enough
   demographics that it can be good as an
   identification database
 • This puts internal intruders at a huge
   advantage
Internal Access
 • An internal intruder can get access to
   such an administrative database when:
           – they had access in a previous position but that access
             was not revoked
           – people in the organization share access credentials,
             so the intruder can use someone else’s credentials
             to get the administrative database
           – they have access as part of their job and there are no
             audit trails
           – internal systems are not well protected because
             internal people are trusted and the intruder knows how
             to break into the system to get the data


Public Registries
 • In the following slides I will explain
   how to create identification databases
   from public registries in Canada




Professional Groups - I
 • We can construct identification databases for specific
   professional groups
 • [Diagram. Sources shown: Membership Lists, PPSR, White Pages]
Professional Groups - II
  • College of Physicians and Surgeons of Ontario
  • Law Society of Upper Canada
  • Professional Engineers Ontario
  • College of Occupational Therapists
  • College of Physical Therapists
  • Public servants (e.g., GEDS)
  • …




What is the success rate?
   Ability to get (source)                                  CPSO    LSUC
   Home postal codes (PPSR and telephone directory)          60%     45%
   Practice/firm postal codes (CPSO/LSUC)                   100%    100%
   Date of birth (PPSR)                                      40%     45%
   Gender (CPSO/genderizing LSUC)                           100%    100%
   Initials (CPSO/LSUC)                                     100%    100%
What is the success rate by gender?
   Ability to get (source)                                  CPSO    LSUC
   MALE
   Home postal codes (PPSR and telephone directory)          63%     48%
   Date of birth (PPSR)                                      45%     48%
   FEMALE
   Home postal codes (PPSR and telephone directory)          49%     40%
   Date of birth (PPSR)                                      29%     40%
Homeowners
 • We can construct identification databases for specific
   postal codes
 • [Diagram. Sources shown: Canada Post, Land Registry, PPSR, White Pages]
What is the success rate?
   Ability to get                Ott      To
   Initials                      93%    100%
   DoB                           33%     40%
   Telephone number              80%     50%
   Gender                        87%     95%
Re-id Risk for Homeowners
  • The number of households per postal
    code is quite small (Ott: 15; To: 20)
  • The individuals (homeowners) were
    unique on common combinations of
    quasi-identifiers (e.g., gender and DoB)
  • For these individuals re-identification
    risk is very high



Civil Servants - I
  • GEDS is on the Internet: Government
    Electronic Directory Services
  • There are 386,630 individuals in the
    federal government (159,652 in
    Ontario and 28,046 in Alberta)
  • GEDS has approx. 170,000 entries
  • Incomplete because: organizations can
    opt-out, some individuals need to opt-
    in, and some employees and orgs are
    exempted (e.g., CSIS, DND)
Civil Servants - II
  • We selected a sample of 40 individuals
    in health care related federal
    departments in Ontario
  • Able to get home address for 50%,
    home telephone number for 40%,
    gender for 100%, DoB for 22.5%
  • Provincial governments have similar
    sources


Re-identification Threshold
  • There is a spectrum of re-identification
    risk
  • When does the probability of re-
    identification become so high that the
    information is deemed identifiable?
  • Canadian privacy law tends not to be
    precise about this
  • Gordon case: serious possibility test


Canadian Definitions - I
Privacy Law                 Definition
Ontario PHIPA               “Identifying information” means information that identifies an
                            individual or for which it is reasonably foreseeable in the
                            circumstances that it could be utilized, either alone or with other
                            information, to identify an individual.
Nfld PPHI                   “Identifying information” means information that identifies an
                            individual or for which it is reasonably foreseeable in the
                            circumstances that it could be utilized either alone or together
                            with other information to identify an individual.
Sask THIPA                  “De-identified personal health information” means personal
                            health information from which any information that may
                            reasonably be expected to identify an individual has been
                            removed.
Canadian Definitions - II
Privacy Law                  Definition
Alberta HIA                  “Individually identifying” means that the identity of the individual
                             who is the subject of the information can be readily ascertained
                             from the information; “nonidentifying” means that the identity of
                             the individual who is the subject of the information cannot be
                             readily ascertained from the information.
NB PPIA                      “Identifiable individual” means an individual can be identified by
                             the contents of the information because the information includes
                             the individual’s name, makes the individual’s identity obvious, or
                             is likely in the circumstances to be combined with other
                             information that includes the individual’s name or makes the
                             individual’s identity obvious.
Re-identification Risk Spectrum




Re-identification Threshold
  • Privacy legislation treats the threshold
    in two ways:
            – Discretionary/permitted disclosures and
              uses = threshold can be anywhere along
              the spectrum
            – Only de-identified information without
              consent = information is identifiable or
              not; there is no spectrum
  • Any systematic approach to dealing
    with thresholds must cover both
Threshold Precedents - I
  • We will use healthcare precedents as
    an indication of the risk that society
    has agreed to take:
            – The largest probability of re-identification
              that is used in any policy or guideline
              document in Canada or the US is 0.33
            – If the probability is > 0.33 then the
              information would certainly be considered
              identifiable


Threshold Precedents - II
            – The most common probability of re-
              identification used in disclosure control of
              health data is 0.2 (cell size of 5)
            – It makes sense that a value of 0.2 would
              be used as a “default” risk
  • Below 0.33 there are many degrees of
    de-identification
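
The link between a cell size of 5 and a probability of 0.2 is simply 1/5. The sketch below applies that reading to a data set: it takes the maximum risk to be one over the smallest group of records sharing the same quasi-identifier values, then compares it to a threshold. The function names and toy fields are assumptions for illustration, and the code expects a non-empty record list.

```python
from collections import Counter

MAX_PRECEDENT = 0.33   # largest probability cited in the precedents
DEFAULT_RISK = 0.2     # most common value, equivalent to a cell size of 5

def max_risk(records, quasi_identifiers):
    """Highest re-identification probability, taken as 1 / smallest cell size."""
    sizes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return 1.0 / min(sizes.values())

def meets_threshold(records, quasi_identifiers, threshold=DEFAULT_RISK):
    """True when the maximum risk does not exceed the chosen threshold."""
    return max_risk(records, quasi_identifiers) <= threshold

# Toy data: one cell of 5 records and one cell of 4 records.
records = [{"sex": "F", "region": "K"}] * 5 + [{"sex": "M", "region": "K"}] * 4
risk = max_risk(records, ["sex", "region"])
```

With a smallest cell of 4 the maximum risk is 0.25, which fails the 0.2 default but still falls below the 0.33 ceiling.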




Example
  • The choice of threshold has a
    significant impact on risk assessment
    results




De-identification Techniques
[Diagram. Labels: D1, D2, D3; identifying variables; quasi-identifying variables; Randomization; Coding; Suppression; Heuristics; Analytics]
Examples of Analytics
  • Table aggregation – disclose only
    summary tables
  • Generalization
  • Record or variable suppression
  • Geographic aggregation
  • Sub-sampling
  • Adding noise
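
Two of these techniques, generalization and record suppression, can be sketched in a few lines. The example below is a minimal illustration under assumed field names and an assumed minimum cell size; it is not the method used by any particular tool.

```python
from collections import Counter

def generalize_age(age, width=10):
    """Replace an exact age with a range such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def suppress_small_cells(records, quasi_identifiers, min_size):
    """Drop records whose quasi-identifier combination occurs fewer than min_size times."""
    sizes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return [r for r in records
            if sizes[tuple(r[q] for q in quasi_identifiers)] >= min_size]

# Toy data; the field names are illustrative only.
raw = [{"age": 41, "sex": "F"}, {"age": 47, "sex": "F"}, {"age": 83, "sex": "M"}]
generalized = [{"age": generalize_age(r["age"]), "sex": r["sex"]} for r in raw]
released = suppress_small_cells(generalized, ["age", "sex"], min_size=2)
```

Generalizing first lets the two women share one cell, so only the single remaining unique record has to be suppressed.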


Common De-identification Heuristic
  • If geographic area has a small
    population, then:
            – Suppress all data from that area
            – Aggregate the geographic area
  • Applied for a variety of data sets,
    including public health data sets
  • For many applications this heuristic
    results in significant loss of data or
    imperils analysis
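
The heuristic can be sketched as a population cutoff applied per area, with aggregation as the alternative to suppression. The populations, the cutoff value, and the use of an FSA prefix as the aggregation key are all assumptions made for this illustration.

```python
def apply_cutoff(records, area_population, cutoff, area_key="fsa"):
    """Keep records from areas at or above the population cutoff; suppress the rest."""
    return [r for r in records if area_population.get(r[area_key], 0) >= cutoff]

def aggregate_population(area_population, prefix_len=1):
    """Aggregate areas that share a prefix (e.g., the first character of the FSA)."""
    totals = {}
    for area, pop in area_population.items():
        region = area[:prefix_len]
        totals[region] = totals.get(region, 0) + pop
    return totals

# Hypothetical populations, for illustration only.
population = {"K1H": 15000, "K1G": 30000, "M5V": 50000}
records = [{"fsa": "K1H"}, {"fsa": "K1G"}, {"fsa": "M5V"}]
kept = apply_cutoff(records, population, cutoff=20000)
regions = aggregate_population(population)
```

With these made-up numbers the 20,000 cutoff suppresses every record from one area, while aggregating to the first character keeps all records at a much coarser geography. Either way information is lost, which is the drawback noted in the last bullet.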

Examples
  •      HIPAA: 20k rule
  •      Census Bureau: 100k rule
  •      Statistics Canada: 70k rule
  •      British Census: 120k rule




The Problem
  • Such generic rules ignore the specific
    variables that are included in a data
    set
  • A smaller cutoff should be used if few
    variables are in a data set
  • A larger cutoff should be used if many
    variables are in a data set



Automation - I




Automation - II




                         Our GAPS Models    20,000 Cutoff     70,000 Cutoff     100,000 Cutoff
Province                  FSA      Pop       FSA      Pop      FSA      Pop      FSA      Pop
Alberta                   55%      84%       38%      71%      1.4%     5%       0        0
British Columbia          68%      87%       46%      70%      1.1%     4%       0        0
Manitoba                  59%      88%       39%      68%      0        0        0        0
New Brunswick             20%      51%       4.5%     19%      0        0        0        0
Newfoundland              55%      83%       30%      62%      0        0        0        0
Nova Scotia               47%      82%       16%      43%      0        0        0        0
Ontario                   69%      91%       49%      76%      1.4%     5%       0.2%     1%
PEI                       57%      90%       43%      79%      0        0        0        0
Quebec                    59%      84%       36%      63%      1%       5%       0.25%    0
Saskatchewan              60%      93%       49%      84%      2%       7%       0        2%
Risk Methodology
  • De-identification by itself is not
    sufficient:
            – Using low thresholds results in rapid data
              quality deterioration
            – Using high thresholds is perceived as too
              risky
            – We want to create incentives for the data
              recipients to improve their security and
              privacy practices
  • Methodology allows you to select and
    justify a threshold
Managing Re-identification Risk
[Diagram. Labels: Amount of De-identification; Risk Exposure; Mitigating Controls; Invasion-of-Privacy; Motives & Capacity; links marked with + and - signs]
The Tradeoffs
                              Ability to Re-identify the Data
Mitigating Controls           Low                  High
Low                           balanced             dangerous
High                          conservative         balanced
[Axis annotations: higher cost burden on data recipient; lower data quality]
Steps in Risk Methodology
  • The methodology has two steps to
    evaluate the overall risks
  • First we determine the probability of a
    re-identification attempt
  • Then we determine the re-identification
    risk to use




Determining Pr Re-identification Attempts




Determining Risk Threshold to Use




Implementation of Methodology
 • An important component of this
   methodology is the ability to audit the
   data recipient/agent receiving the data
 • Update audits are performed regularly
 • Data sharing agreements are put in
   place for external recipients and
   external agents (internal ones usually
   covered by employment agreements)
 • The elements in the security maturity
   profile are part of the data sharing
   agreement
Compliance Audits
 • The audits use a publicly available
   checklist
 • Audit results would be generally
   accepted so that recipients do not need
   to get audited repeatedly for different
   disclosures
 • Intended to be rapid (one or two days
   on-site) and cheap ($1k to $2k)



Example - Pharmacy Data
 • Request to CHEO for prescription data
   from a commercial data broker
 • Concern that this data could potentially
   identify patients
 • We performed a study to evaluate re-
   identification risk and come up with an
   anonymous version of the data




Prescription Records Example
Original fields:
 • Patient age in days
 • Patient gender
 • Forward Sortation Area
 • Admission date
 • Discharge date
 • Diagnosis
 • Dispensed drug
Generalized fields:
 • Gender
 • Length of stay in days
 • Quarter and year of admission
 • Patient’s region (first character of the postal code)
 • Patient’s age in weeks
 • Diagnosis
 • Dispensed drug
Conditions:
 • Regular third party privacy/security audits
 • Breach notification protocols must be in place
 • Restrictions on further distribution of raw data
 • Data destruction provisions
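
Reading the two field lists on this slide as a mapping from the original record to a generalized one, the transformation could look roughly like the sketch below. The dictionary keys, the quarter label format, and the sample values are assumptions for illustration; the slides do not specify an implementation.

```python
from datetime import date

def deidentify(record):
    """Generalize one prescription record along the lines of the two field lists."""
    admitted, discharged = record["admission_date"], record["discharge_date"]
    return {
        "gender": record["gender"],
        # Exact dates are replaced by a length of stay and a quarter/year.
        "length_of_stay_days": (discharged - admitted).days,
        "admission_quarter": f"{admitted.year}-Q{(admitted.month - 1) // 3 + 1}",
        # The FSA is reduced to its first character.
        "region": record["fsa"][0],
        # Age in days is coarsened to whole weeks.
        "age_weeks": record["age_days"] // 7,
        "diagnosis": record["diagnosis"],
        "dispensed_drug": record["dispensed_drug"],
    }

# Hypothetical record, for illustration only.
raw = {
    "age_days": 100, "gender": "F", "fsa": "K1H",
    "admission_date": date(2008, 5, 2), "discharge_date": date(2008, 5, 6),
    "diagnosis": "J45", "dispensed_drug": "salbutamol",
}
out = deidentify(raw)
```

The output no longer carries exact dates or the full Forward Sortation Area, while the diagnosis and drug fields pass through unchanged.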




An Example Deployment





The De-identification of Clinical Data

  • 1. De-identifying Clinical Data Khaled El Emam, CHEO RI & uOttawa
  • 2. www.ehealthinformation.ca Electronic Health Information Laboratory, CHEO Research Institute, 401 Smyth Road, Ottawa K1H 8L1, Ontario; www.ehealthinformation.ca
  • 3. Secondary Use/Disclosure (diagram; parties: individuals, custodian, agent, recipient; flows: collection, use, disclosure)
  • 4. Data Flows • Mandatory disclosures • Uses by an agent for secondary purposes • Permitted discretionary disclosures for secondary purposes • Other disclosures for secondary purposes
  • 5. Obtaining Consent - I • Sometimes it is not possible or practical to obtain consent: – Making contact to obtain consent may reveal the individual’s condition to others against their wishes – The size of the population may be too large to obtain consent from everyone – Many patients may have relocated or died
  • 6. Obtaining Consent - II – There may be a lack of existing or continuing relationship with the patients – There is a risk of inflicting psychological, social or other harm by contacting individuals or their families in delicate circumstances – It would be difficult to contact individuals through advertisements and other public notices
  • 7. Impact of Obtaining Consent • In the case where explicit consent is used, consenters and non-consenters differ on: – age, sex, race, marital status, educational level, socioeconomic status, health status, mortality, lifestyle factors, functioning • The consent rate for express consent varied from 16% to 93%
  • 8. Limiting Principles • Do not collect, use, or disclose PHI if other information will serve the purpose • For example, even if it is easier to disclose a whole record, that should not be done if lesser information will reasonably satisfy the purpose • De-identification would be one element in limiting the amount of PHI that is collected/used/disclosed
  • 9. Breaches • In many large research hospitals and hospital networks it is simply not possible to control and manage all of the databases and data sets that are created, used, and disclosed for research • Breach frequency and severity are growing • However, de-identification provides one way to manage the risks
  • 10. Trust • Patients change their behavior if they perceive a threat to privacy • This can have a negative impact on the quality of the data that is used for research
  • 12. Deloitte Survey (2007) • N=827 respondents in North America • 43% reported more than 10 privacy breaches within the last 12 months in their organizations • Over 85% reported at least one privacy breach • Over 63% reported multiple privacy breaches requiring notification • Breaches involving 1000+ records were reported by 34% of respondents
  • 13. Verizon Study • Based on forensic engagements conducted by Verizon • Breaches resulting from external sources: 73% • Caused by insiders: 18% • Implicated business partners: 39% • The median number of records involved in an insider breach was 10 times that of an external breach • Biggest causes are errors and hackers
  • 15. HIMSS Leadership Survey • Survey of healthcare IT executives, n=307 • Conducted in the 2007-2008 timeframe • 24% of respondents reported that they have had a security breach in their organization in the last 12 months • 16% of respondents reported that they have had a security breach in their organization in the last 6 months • Half indicated that an internal security breach is a concern to their organizations
  • 16. HIMSS Analytics Report • IT executives and security officers at healthcare institutions; n=263 • Half of respondents are concerned with internal inadvertent access to patient data • 13% indicated that their organization has had a security breach in the last 12 months • 80% of these were internal breaches
  • 17. Medical Record Breaches 2008 • For all of 2008 (datalossdb.org) • 83 breaches involving medical records (14% of total) • Approx. 7.2 million records involved in these breaches (21.5% of all records)
  • 18. Does this Happen Here? • Do you know of any cases where computer equipment was stolen from a hospital? Did this equipment contain personal health information? • Do you know of any cases where memory sticks with data on them were lost? • Does anyone email data to their Hotmail or Gmail accounts so that they can access them from home or while travelling? • Do people still share passwords?
  • 19. Known Data Leaks • PHI on second-hand computers • Leaks through peer-to-peer file sharing networks • PowerPoint files on the Internet • Password protected files sent by email
  • 20. Identity Theft • William Ernst Black (Edmonton 1999) • The creation of identity packages using information about dead children who were living in one jurisdiction but died in another ($37k for each identity package) • Example: drug smuggler was caught with these identity packages • Example: American getting free medical care in Canada
  • 21. Patient Concerns • There is evidence (from surveys) that the general public has changed their behavior to adjust for perceived privacy risks with respect to their PHI: – 15% to 17% of US adults – 11% to 13% of Canadian adults • There is also evidence that vulnerable populations exhibit similar behaviors (e.g., adolescents, people with HIV or at high risk for HIV, those undergoing HIV genetic testing, mental health patients and battered women)
  • 22. Behavior Change - I • Going to another doctor • Paying out of pocket when insured to avoid disclosure • Not seeking care to avoid disclosure to an employer or to not be seen entering a clinic by other members of the community • Giving inaccurate or incomplete information on medical history • Asking a doctor not to record a health problem or record a less serious or embarrassing one
  • 23. Behavior Change - II • 87% of US physicians reported that a patient had asked them not to include certain information in their record • 78% of US physicians reported that they have withheld information due to privacy concerns
  • 24. S
  • 25. Asymmetry Principle - I • Trust is hard to gain but easy to lose: – Negative events/news carry more weight than positive ones (negativity bias); negative information is more diagnostic – Avoiding loss: people weigh negative information more heavily in an effort to avoid loss – Sources of negative information appear more credible (positive information seems self-serving)
  • 26. Asymmetry Principle - II – People interpret information according to their prior beliefs: if they have negative prior beliefs then negative events will reinforce that and positive events will have little impact – Undecided individuals tend to be affected more by negative information – People with positive prior beliefs may feel betrayed by negative information/events
  • 27. Canadian Public - 2007 • [Bar chart, 0 to 100 scale, by region: Total, BC, Alberta, Prairies, Ont, Que, Atlantic, Territories; bar values between 34 and 46] • Survey question: “In your opinion, how safe and secure is the health information which EXISTS about you?” (5-7 on a 7 pt scale)
  • 28. Canadian Public - 2003 • [Bar chart, 0 to 100 scale; categories: Agree (5-7), Neither (4), Disagree (1-3), DK/NR] • Survey statement: “I really worry that my personal health information might be used for other purposes in the future which have little to do with my health”
  • 29. How not to De-identify • Just removing the name and address information is not enough • It is quite easy to re-identify individuals from the other data that is left • There are a number of public real-life examples of re-identification actually happening
  • 30. Example Data With PHI
  • 31. Types of Variables • Identifying variables: variables that can directly identify a patient • Quasi-identifiers: variables that can indirectly identify a patient • Sensitive variables: sensitive clinical information that the patient would not want to be known beyond the circle of care
  • 32. De-identified Data?
  • 33. Examples of Re-identification
  • 34. Examples of Re-identification
  • 35. Examples of Re-identification
  • 36. Examples of Re-identification
  • 37. User #4417749 • “tea for good health” • “numb fingers”, “hand tremors” • “dry mouth” • “60 single men” • “dog that urinates on everything” • “landscapers in Lilburn, Ga” • “homes sold in shadow lake subdivision gwinnett county georgia”
  • 38. Thelma Arnold • 62-year-old widow living in Lilburn, Ga., re-identified by the New York Times • She has three dogs
  • 39. What Happened Next? • Maureen Govern, CTO of AOL, “resigns” • Abdur Chowdhury, the AOL researcher who released the data, was fired • Abdur’s boss in the research department was fired • Big embarrassment for AOL
  • 40. Examples of Re-identification
  • 41. Examples of Re-identification
  • 42. Examples of Re-identification
  • 43. Uniqueness in the US Population • Studies show that between 63% and 87% of the US population is unique on their date of birth + ZIP code + gender • Uniqueness makes it quite easy to re-identify individuals using a variety of techniques
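The uniqueness measure on this slide can be illustrated with a minimal Python sketch. It assumes that "unique" means a record's combination of quasi-identifier values occurs exactly once in the data set; the function name `fraction_unique` and the sample records are made up for illustration and are not taken from the cited studies.

```python
from collections import Counter

def fraction_unique(records, quasi_identifiers):
    """Share of records whose quasi-identifier combination occurs exactly once."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys) if keys else 0.0

# Made-up records, for illustration only.
people = [
    {"dob": "1957-03-12", "zip": "02138", "gender": "M"},
    {"dob": "1957-03-12", "zip": "02138", "gender": "M"},
    {"dob": "1978-07-01", "zip": "02138", "gender": "M"},
    {"dob": "1968-12-09", "zip": "02139", "gender": "F"},
]

print(fraction_unique(people, ["dob", "zip", "gender"]))  # 0.5
print(fraction_unique(people, ["zip", "gender"]))         # 0.25
```

Dropping the date of birth from the key lowers the share of unique records in this toy data, which is the same effect the slide describes at population scale.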
  • 44. Uniqueness in Canadian Population • [Line chart: Percent Uniques (0% to 100%) against Number of Characters in Postal Code (1 to 6); series: PC, PC + Gender, PC + DoB, PC + DoB + Gender]
  • 45. Example • This example shows the risk of re-identification using just demographics
  • 46. Types of Disclosure • Identity Disclosure: being able to determine the identity associated with a record • Attribute Disclosure: discovering something new about an individual known to be in the database
  • 47. Disclosure and Invasion-of-Privacy • An important first criterion is deciding on the sensitivity of the data and the potential for harm to the patients from a secondary use/disclosure • If the invasion-of-privacy is deemed low then there may not be a need to de-identify the data
  • 48. Invasion-of-Privacy - I • The personal information in the Data is highly detailed • The information in the Data is of a highly sensitive and personal nature • The information in the Data comes from a highly sensitive context • Many people would be affected if there was a Data breach or the Data was processed inappropriately by the recipient/agent
  • 49. Invasion-of-Privacy - II • If there was a Data breach or the Data was processed inappropriately by the recipient/agent that may cause direct and quantifiable damages and measurable injury to the patients • If the recipient/agent is located in a different jurisdiction, there is a possibility, for practical purposes, that the data sharing agreement will be difficult to enforce
  • 50. Invasion-of-Privacy – Consent - I • There is a provision in the relevant legislation permitting the disclosure/use of the Data without the consent of the patients • The Data was unsolicited or given freely or voluntarily by the patients with little expectation of it being maintained in total confidence
  • 51. Invasion-of-Privacy – Consent - II • The patients have provided express consent that their Data can be disclosed for this secondary Purpose when it was originally collected or at some point since then • The custodian has consulted well-defined groups or communities regarding the disclosure of the Data and had a positive response
  • 52. Invasion-of-Privacy – Consent - III • A strategy for informing/notifying the public about potential disclosures for the recipient’s secondary Purpose was in place when the data was collected or since then • Obtaining consent from the individuals at this point is inappropriate or impractical
  • 53. Identity Disclosure • Three common types: – Prosecutor risk – Journalist risk – Rareness • All three are concerned with the risk of re-identifying a single individual
  • 54. Prosecutor vs. Journalist • If either of the following is true then prosecutor risk is relevant: – The data represents the whole population such that everyone is known to be in it or the sampling fraction is very high – If not the whole population, it is possible for an intruder to know that a particular person has a record in the data • Patient may self-reveal • Data collection method is revealing • Otherwise journalist risk is relevant
  • 55. Prosecutor Risk - I • The intruder has background information about a specific individual known to be in the database • The amount of background information will depend on the intruder • The intruder is attempting to find the record belonging to that individual in the database
  • 56. Prosecutor Risk - II • Examples of intruders: – Neighbor – Ex-spouse – Employer – Relative
  • 57. Example • Date of Birth | Gender | Postal Code | Diagnosis • 12/03/1957 | M | K0J 1P0 | … • 01/7/1978 | M | K0J 1P0 | … • 09/12/1968 | F | K0J 1P0 | … • 17/08/1987 | F | K0J 1P0 | … • 25/02/1974 | F | K0J 1T0 | … • 23/05/1985 | M | K0J 1T0 | … • 14/03/1965 | F | K0J 2A0 | …
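As a rough illustration of how the rows on this slide translate into prosecutor risk, the Python sketch below groups records by a chosen set of quasi-identifiers and assigns each record a risk of one over the size of its group. That reciprocal definition is an assumption here, chosen to be consistent with the deck's later pairing of a 0.2 probability with a cell size of 5; the function name `prosecutor_risk` is hypothetical.

```python
from collections import Counter

# (date of birth, gender, postal code) rows from the example slide.
rows = [
    ("12/03/1957", "M", "K0J 1P0"),
    ("01/7/1978", "M", "K0J 1P0"),
    ("09/12/1968", "F", "K0J 1P0"),
    ("17/08/1987", "F", "K0J 1P0"),
    ("25/02/1974", "F", "K0J 1T0"),
    ("23/05/1985", "M", "K0J 1T0"),
    ("14/03/1965", "F", "K0J 2A0"),
]

def prosecutor_risk(rows, key):
    """Per-record risk, assumed to be 1 / (number of records sharing the same key)."""
    keys = [key(r) for r in rows]
    sizes = Counter(keys)
    return [1 / sizes[k] for k in keys]

by_postal_code = prosecutor_risk(rows, key=lambda r: r[2])
by_gender_pc = prosecutor_risk(rows, key=lambda r: (r[1], r[2]))
by_all = prosecutor_risk(rows, key=lambda r: r)

print(by_postal_code)  # [0.25, 0.25, 0.25, 0.25, 0.5, 0.5, 1.0]
print(max(by_gender_pc), max(by_all))  # 1.0 1.0
```

With the full date of birth included, every record in this small table is unique, so an intruder who knows all three values for a target would find exactly one matching row.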
  • 58. Selecting Variables – Prosecutor - I • In the best case assumption, a neighbor would know: – Address and telephone information about the VIP – Household and dwelling information (number of children, value of property, type of property) – Key dates (births, deaths, weddings) – Visible characteristics: gender, race, ethnicity, language spoken at home, weight, height, physical disabilities – Profession
  • 59. Selecting Variables – Prosecutor - II • What would an ex-spouse know: – The same things that a neighbor would know – Basic medical history (allergies, chronic diseases) – Income, years of schooling • All of these variables would be considered quasi-identifiers if they appear in the database
  • 60. Journalist Risk • The journalist is not looking for a specific person; re-identifying any person will do • The journalist has access to a database that s/he can use for matching • This is called an identification database
  • 61. Journalist Matching Example • [Diagram: a Medical Database (clinical and lab data) and an Identification DB (name, address, telephone no.) matched on shared Quasi-Identifiers: DoB, initials, gender, postal code]
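The matching step in this diagram can be sketched in Python as a join on the shared quasi-identifiers. This is a minimal sketch under stated assumptions: the field names, the `link` function, and all records (including the clearly fictional names) are invented for illustration, and a match is only reported when exactly one identity fits a medical record.

```python
from collections import defaultdict

QUASI = ("dob", "initials", "gender", "postal_code")

def link(medical, identification, quasi=QUASI):
    """Return (name, diagnosis) pairs for medical records matching exactly one identity."""
    index = defaultdict(list)
    for person in identification:
        index[tuple(person[q] for q in quasi)].append(person)
    matches = []
    for rec in medical:
        candidates = index.get(tuple(rec[q] for q in quasi), [])
        if len(candidates) == 1:  # ambiguous or missing matches are skipped
            matches.append((candidates[0]["name"], rec["diagnosis"]))
    return matches

# Fictional data, for illustration only.
medical = [
    {"dob": "1965-03-14", "initials": "JS", "gender": "F",
     "postal_code": "K0J 2A0", "diagnosis": "asthma"},
    {"dob": "1978-07-01", "initials": "AB", "gender": "M",
     "postal_code": "K0J 1P0", "diagnosis": "diabetes"},
]
identification = [
    {"name": "Jane Smith", "dob": "1965-03-14", "initials": "JS",
     "gender": "F", "postal_code": "K0J 2A0"},
    {"name": "Alan Brown", "dob": "1978-07-01", "initials": "AB",
     "gender": "M", "postal_code": "K0J 1P0"},
    {"name": "Adam Black", "dob": "1978-07-01", "initials": "AB",
     "gender": "M", "postal_code": "K0J 1P0"},
]

print(link(medical, identification))  # [('Jane Smith', 'asthma')]
```

The second medical record is not linked because two identities share its quasi-identifier values, which is the situation a non-rare combination creates for the journalist.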
  • 62. Assessing Journalist Risk • In general, we want to know how rare the quasi-identifier values would be in the population (e.g., homeowners/professionals/civil servants in the geographic area of interest) • If the combination is not rare then there is small journalist risk
  • 63. Selecting Variables – Journalist - I • Depends on what information can be obtained in an identification database • For an external intruder, likely variables are those available in public registries: – Key dates (birth, death, marriage) – Profession – Home address and telephone number – Type of dwelling – Gender, ethnicity, race – Income if a highly paid public servant
  • 64. Selecting Variables – Journalist - II • Assume that an internal intruder would be able to get all relevant administrative data: – Key dates (birth, death, admission, discharge, visit) – Gender, address, telephone number
  • 65. Inference of Variables - I • Even though a particular quasi-identifier may not be known to the intruder (prosecutor risk), available in an identification database (journalist), or available in the disclosed database (all three risks), it may be possible to infer it from other variables • Variables that can be inferred should be treated as quasi-identifiers
  • 66. Inference of Variables - II • Inferred variables should be added to the disclosed database if they are not there because they may be used in a re-identification attack, and you want to take them into account during risk assessment
  • 67. Inference Examples • Gender, ethnicity, religious origin from name • Age from graduation date • Profession from payer of insurance claim (e.g., civil servants have a single health insurer) • Age and gender from a diagnostic or lab code (e.g., mammogram or PSA test)
  • 68. Rareness • If individuals are rare on the quasi-identifiers, then they are at higher prosecutor and journalist re-identification risk • If an individual has a rare and visible characteristic/feature, then that also makes them easier to re-identify (e.g., put an ad on the radio)
  • 69. Attribute Disclosure • If there is very little variation on sensitive variables • The data set can represent a whole population or some subset • Learn something new about a person without actually finding which record belongs to them
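One way to look for attribute disclosure is to group records on the quasi-identifiers and flag groups whose sensitive value never varies. The Python sketch below is a simplified illustration: it treats "very little variation" as "no variation at all", ignores groups of size one (which are an identity disclosure concern instead), and uses hypothetical field names and made-up records.

```python
from collections import defaultdict

def homogeneous_groups(records, quasi, sensitive):
    """Map each quasi-identifier group of size >= 2 with a single sensitive value to that value."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[q] for q in quasi)].append(r[sensitive])
    return {
        key: values[0]
        for key, values in groups.items()
        if len(values) >= 2 and len(set(values)) == 1
    }

# Made-up records, for illustration only.
records = [
    {"age_band": "30-39", "fsa": "K0J", "diagnosis": "diabetes"},
    {"age_band": "30-39", "fsa": "K0J", "diagnosis": "diabetes"},
    {"age_band": "30-39", "fsa": "K0J", "diagnosis": "diabetes"},
    {"age_band": "40-49", "fsa": "K0J", "diagnosis": "asthma"},
    {"age_band": "40-49", "fsa": "K0J", "diagnosis": "hypertension"},
]

flagged = homogeneous_groups(records, ["age_band", "fsa"], "diagnosis")
print(flagged)  # {('30-39', 'K0J'): 'diabetes'}
```

Anyone known to be in the flagged group can be assumed to have that diagnosis even though no single record has been tied to them.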
  • 70. A Pragmatic Approach • It is important to ensure that the quasi-identifiers are plausible for the data and the recipients of the data • If you select many quasi-identifiers then that will by definition increase the re-identification risk • Ideally, each selected quasi-identifier should be associated with a realistic re-identification scenario
  • 71. Constructing an Identification DB • This may be a single physical database or a join of multiple sources together to construct a virtual database • It will have the quasi-identifiers as well as identity information, but will not have the sensitive information (e.g., clinical or financial details) • The sources may be public and free, public and for a fee, or fully commercial
  • 72. Examples of Identification DBs - I • These are databases or sources (Canada): – Obituaries: available from newspapers and funeral homes; there are obituary aggregator sites that make this simple – PPSR: Personal Property Security Registration; contains information on loans secured by property (e.g., cars) – Land Registry: information on house ownership
  • 73. Examples of Identification DBs - II – Membership Lists: provide comprehensive listings of professionals (e.g., doctors, lawyers, civil servants) – Salary Disclosure Reports: provided by governments for those earning higher than a certain threshold – White Pages: public telephone directory – Job Sites: CVs posted in public and closed job web sites – Donations: disclosures of donations to political parties (include address)
  • 74. Voter Lists - I • Cannot legally be used for purposes outside of an election (in Canada) • But, a charity allegedly supporting a terrorist group (Tamil Tigers) was found by the RCMP to have Canadian voter lists • Volunteers do not necessarily destroy or dispose of the lists after an election (and in many cases do not sign anything before they get them)
  • 75. Voter Lists - II • It is not expensive (or difficult) to become a candidate in an election and get the voter list: – Alberta: $500 – BC: $100 – NB: $100 (+nominated by 25 electors) – Ontario: $100 – Quebec: $0 (+nominated by 100 electors) • Canadian voter lists do not contain the DoB (yet)
  • 76. Economics of Identification DBs • Some data sources have a fee for each individual record/search • This makes the cost of creating an identification database quite high • This may impose a large economic burden on an intruder and act as a deterrent from creating identification databases
  • 77. Internal Identification Databases • An internal intruder may have access to administrative databases that can act as an identification DB • For example, in a hospital an internal intruder may have access to all admissions; this is not sensitive data so is less protected but has enough demographics that it can be good as an identification database • This puts internal intruders at a huge advantage
  • 78. Internal Access • An internal intruder can get access to such an administrative database: – had access in a previous position but that access was not revoked – people in the organization share access credentials, so the intruder can use someone else’s credentials to get the administrative database – has access as part of his/her job and there are no audit trails – internal systems are not well protected because internal people are trusted and the intruder knows how to break in to the system to get the data
  • 79. Public Registries • In the following slides I will explain how to create identification databases from public registries in Canada
  • 80. Professional Groups - I • We can construct identification databases for specific professional groups • [Diagram: sources combined: Membership Lists, PPSR, White Pages]
  • 81. Professional Groups - II • College of Physicians and Surgeons of Ontario • Law Society of Upper Canada • Professional Engineers Ontario • College of Occupational Therapists • College of Physical Therapists • Public servants (e.g., GEDS) • …….
  • 82. What is the success rate? (CPSO / LSUC) • Ability to get home postal codes (source: PPSR and telephone directory): 60% / 45% • Ability to get practice/firm postal codes (source: CPSO/LSUC): 100% / 100% • Ability to get date of birth (source: PPSR): 40% / 45% • Ability to get gender (source: CPSO/genderizing LSUC): 100% / 100% • Ability to get initials (source: CPSO/LSUC): 100% / 100%
  • 83. What is the success rate by gender? (CPSO / LSUC) • MALE – Ability to get home postal codes (source: PPSR and telephone directory): 63% / 48% – Ability to get date of birth (source: PPSR): 45% / 48% • FEMALE – Ability to get home postal codes (source: PPSR and telephone directory): 49% / 40% – Ability to get date of birth (source: PPSR): 29% / 40%
  • 84. Homeowners • We can construct identification databases for specific postal codes • [Diagram: sources combined: Canada Post, Land Registry, PPSR, White Pages]
  • 85. What is the success rate? (Ott / To) • Ability to get initials: 93% / 100% • Ability to get DoB: 33% / 40% • Ability to get telephone number: 80% / 50% • Ability to get gender: 87% / 95%
  • 86. Re-id Risk for Homeowners • The number of households per postal code is quite small (Ott: 15; To: 20) • The individuals (homeowners) were unique on common combinations of quasi-identifiers (e.g., gender and DoB) • For these individuals re-identification risk is very high
  • 87. Civil Servants - I • GEDS is on the Internet: Government Electronic Directory Services • There are 386,630 individuals in the federal government (159,652 in Ontario and 28,046 in Alberta) • GEDS has approx. 170,000 entries • Incomplete because: organizations can opt out, some individuals need to opt in, and some employees and orgs are exempted (e.g., CSIS, DND)
  • 88. Civil Servants - II • We selected a sample of 40 individuals in health care related federal departments in Ontario • Able to get home address for 50%, home telephone number for 40%, gender for 100%, DoB for 22.5% • Provincial governments have similar sources
  • 89. Re-identification Threshold • There is a spectrum of re-identification risk • When does the probability of re-identification become so high that the information is deemed identifiable? • Canadian privacy law tends not to be precise about this • Gordon case: serious possibility test
  • 90. Canadian Definitions - I (Privacy Law: Definition) • Ontario PHIPA: “Identifying information” means information that identifies an individual or for which it is reasonably foreseeable in the circumstances that it could be utilized, either alone or with other information, to identify an individual. • Nfld PPHI: “Identifying information” means information that identifies an individual or for which it is reasonably foreseeable in the circumstances that it could be utilized either alone or together with other information to identify an individual. • Sask THIPA: “De-identified personal health information” means personal health information from which any information that may reasonably be expected to identify an individual has been removed.
  • 91. Canadian Definitions - II (Privacy Law: Definition) • Alberta HIA: “Individually identifying” means that the identity of the individual who is the subject of the information can be readily ascertained from the information; “nonidentifying” means that the identity of the individual who is the subject of the information cannot be readily ascertained from the information. • NB PPIA: “Identifiable individual” means an individual can be identified by the contents of the information because the information includes the individual’s name, makes the individual’s identity obvious, or is likely in the circumstances to be combined with other information that includes the individual’s name or makes the individual’s identity obvious.
  • 92. Re-identification Risk Spectrum
  • 93. Re-identification Threshold • Privacy legislation treats the threshold in two ways: – Discretionary/permitted disclosures and uses = threshold can be anywhere along the spectrum – Only de-identified information without consent = information is identifiable or not; there is no spectrum • Any systematic approach to dealing with thresholds must cover both
  • 94. Threshold Precedents - I • We will use healthcare precedents as an indication of the risk that society has agreed to take: – The largest probability of re-identification that is used in any policy or guideline document in Canada or the US is 0.33 – If the probability is > 0.33 then the information would certainly be considered identifiable
  • 95. Threshold Precedents - II – The most common probability of re-identification used in disclosure control of health data is 0.2 (cell size of 5) – It makes sense that a value of 0.2 would be used as a “default” risk • Below 0.33 there are many degrees of de-identification
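The link between a probability threshold and a minimum cell size can be made concrete with a short Python sketch. It assumes the worst-case risk of a data set is one over its smallest cell (equivalence class) size, which is how 0.2 corresponds to a cell size of 5; the constant and function names are hypothetical, and the keys stand in for any quasi-identifier combination.

```python
from collections import Counter

DEFAULT_THRESHOLD = 0.2   # corresponds to a minimum cell size of 5
MAX_THRESHOLD = 0.33      # largest probability in the precedents cited on the slides

def max_risk(keys):
    """Worst-case risk, assumed to be 1 / (size of the smallest cell)."""
    if not keys:
        return 0.0
    sizes = Counter(keys)
    return 1 / min(sizes.values())

def meets_threshold(keys, threshold=DEFAULT_THRESHOLD):
    """True when no cell is small enough to push the risk above the threshold."""
    return max_risk(keys) <= threshold

cells_of_5_and_6 = ["A"] * 5 + ["B"] * 6
cells_of_5_and_4 = ["A"] * 5 + ["B"] * 4
cells_of_5_and_3 = ["A"] * 5 + ["B"] * 3

print(meets_threshold(cells_of_5_and_6))                 # True
print(meets_threshold(cells_of_5_and_4))                 # False
print(meets_threshold(cells_of_5_and_4, MAX_THRESHOLD))  # True
print(meets_threshold(cells_of_5_and_3, MAX_THRESHOLD))  # False
```

A smallest cell of 4 gives a risk of 0.25, which sits between the two precedent values: acceptable under the 0.33 ceiling but not under the 0.2 default.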
  • 96. Example • The choice of threshold has a significant impact on risk assessment results
  • 97. De-identification Techniques • [Diagram: identifying variables and quasi-identifying variables across data sets D1, D2, D3; techniques shown: Analytics, Heuristics, Randomization, Coding, Suppression]
  • 98. Examples of Analytics • Table aggregation – disclose only summary tables • Generalization • Record or variable suppression • Geographic aggregation • Sub-sampling • Adding noise
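Generalization, one of the techniques listed above, can be sketched in Python as follows. The specific choices here are assumptions made for illustration: dates of birth in dd/mm/yyyy form are reduced to the year, and postal codes are truncated to their first three characters; the function names and the sample records are made up.

```python
from collections import Counter

def generalize(record, pc_chars=3):
    """Reduce date of birth to year and postal code to its first pc_chars characters."""
    dob, gender, postal_code = record
    _day, _month, year = dob.split("/")
    return (year, gender, postal_code.replace(" ", "")[:pc_chars])

def smallest_class(records):
    """Size of the smallest group of records sharing identical values."""
    return min(Counter(records).values())

# Made-up (date of birth, gender, postal code) records, for illustration only.
raw = [
    ("12/03/1957", "M", "K0J 1P0"),
    ("30/11/1957", "M", "K0J 1T0"),
    ("09/12/1968", "F", "K0J 1P0"),
    ("02/01/1968", "F", "K0J 2A0"),
]
generalized = [generalize(r) for r in raw]

print(smallest_class(raw))          # 1
print(smallest_class(generalized))  # 2
```

Every raw record is unique, while after generalization each record shares its values with one other record; the price is the loss of exact dates and full postal codes.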
  • 99. Common De-identification Heuristic • If a geographic area has a small population, then: – Suppress all data from that area – Aggregate the geographic area • Applied for a variety of data sets, including public health data sets • For many applications this heuristic results in significant loss of data or imperils analysis
  • 100. Examples • HIPAA: 20k rule • Census Bureau: 100k rule • Statistics Canada: 70k rule • British Census: 120k rule
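The population-cutoff heuristic and the cutoffs listed above can be sketched in Python. This is an illustrative sketch only: the area names and populations are invented, the function name is hypothetical, and treating an area exactly at the cutoff as releasable is an assumption, since the slides do not state how the boundary is handled.

```python
# Cutoffs as listed on the slide.
CUTOFFS = {
    "HIPAA": 20_000,
    "Statistics Canada": 70_000,
    "Census Bureau": 100_000,
    "British Census": 120_000,
}

def releasable_areas(populations, cutoff):
    """Areas kept under the heuristic; areas below the cutoff would be suppressed or aggregated."""
    # Assumption: an area exactly at the cutoff is kept.
    return {area for area, pop in populations.items() if pop >= cutoff}

# Made-up area populations, for illustration only.
populations = {"A": 15_000, "B": 45_000, "C": 80_000, "D": 130_000}

for rule, cutoff in CUTOFFS.items():
    print(rule, sorted(releasable_areas(populations, cutoff)))
```

The same four areas yield very different releases depending on which rule is applied, and none of the rules looks at which variables the data set actually contains.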
  • 101. The Problem • Such generic rules ignore the specific variables that are included in a data set • A smaller cutoff should be used if few variables are in a data set • A larger cutoff should be used if many variables are in a data set
  • 102. Automation - I
  • 103. Automation - II
  • 104. [Table by province; column groups, each with FSA and Pop sub-columns: 20,000 Cutoff, 70,000 Cutoff, 100,000 Cutoff, Our GAPS Models; the four FSA / Pop pairs per province are listed in extracted order] • Alberta: 55% / 84%, 38% / 71%, 1.4% / 5%, 0 / 0 • British Columbia: 68% / 87%, 46% / 70%, 1.1% / 4%, 0 / 0 • Manitoba: 59% / 88%, 39% / 68%, 0 / 0, 0 / 0 • New Brunswick: 20% / 51%, 4.5% / 19%, 0 / 0, 0 / 0 • Newfoundland: 55% / 83%, 30% / 62%, 0 / 0, 0 / 0 • Nova Scotia: 47% / 82%, 16% / 43%, 0 / 0, 0 / 0 • Ontario: 69% / 91%, 49% / 76%, 1.4% / 5%, 0.2% / 1% • PEI: 57% / 90%, 43% / 79%, 0 / 0, 0 / 0 • Quebec: 59% / 84%, 36% / 63%, 1% / 5%, 0.25% / 0 • Saskatchewan: 60% / 93%, 49% / 84%, 2% / 7%, 0 / 2%
  • 105. Risk Methodology • De-identification by itself is not sufficient: – Using low thresholds results in rapid data quality deterioration – Using high thresholds is perceived as too risky – We want to create incentives for the data recipients to improve their security and privacy practices • Methodology allows you to select and justify a threshold
  • 106. Managing Re-identification Risk [Diagram relating Amount of De-identification, Mitigating Controls, Invasion-of-Privacy, and Motives & Capacity to Re-identification Risk Exposure]
  • 107. The Tradeoffs
                                  Ability to Re-identify the Data
                                  Low              High
    Mitigating Controls: Low      balanced         dangerous
    Mitigating Controls: High     conservative     balanced
    [Annotations on the figure: "higher cost burden on data recipient", "lower data quality"]
  • 108. Steps in Risk Methodology • The methodology has two steps to evaluate the overall risks • First we determine the probability of a re-identification attempt • Then we determine the re-identification risk threshold to use
  • 109. Determining Pr Re-identification Attempts
  • 110. Determining Risk Threshold to Use
  • 111. Implementation of Methodology • An important component of this methodology is the ability to audit the data recipient/agent receiving the data • Update audits are performed regularly • Data sharing agreements are put in place for external recipients and external agents (internal ones are usually covered by employment agreements) • The elements in the security maturity profile are part of the data sharing agreement
  • 112. Compliance Audits • The audits use a publicly available checklist • Audit results would be generally accepted so that recipients do not need to get audited repeatedly for different disclosures • Intended to be rapid (one or two days on-site) and cheap ($1k to $2k)
  • 113. Example - Pharmacy Data • Request to CHEO for prescription data from a commercial data broker • Concern that this data could potentially identify patients • We performed a study to evaluate re-identification risk and come up with an anonymous version of the data
  • 114. Prescription Records Example
    [Two-column slide; left column:] • Patient age in days • Gender • Forward Sortation Area • Admission date • Discharge date • Diagnosis • Dispensed drug
    [Right column:] • Patient gender • Length of stay in days • Quarter and year of admission • Patient’s region (first character of the postal code) • Patient’s age in weeks • Diagnosis • Dispensed drug
    • Regular third party privacy/security audits • Breach notification protocols must be in place • Restrictions on further distribution of raw data • Data destruction provisions
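  One reading of this slide is that the first set of fields is the data as requested and the second set is a generalized version of the same record. Under that assumption, the transformation can be sketched as follows. The function name, the dictionary keys, and the sample values are hypothetical, and the quarter label format is an arbitrary choice, not something the slides specify.

```python
from datetime import date


def generalize_record(record):
    """Generalize one prescription record (field mapping assumed from the slide)."""
    admit = record["admission_date"]
    discharge = record["discharge_date"]
    quarter = (admit.month - 1) // 3 + 1
    return {
        "gender": record["gender"],
        # Exact admission and discharge dates are replaced by a duration...
        "length_of_stay_days": (discharge - admit).days,
        # ...and by the quarter and year of admission.
        "admission_quarter": f"{admit.year}-Q{quarter}",
        # Forward Sortation Area reduced to the first character of the postal code.
        "region": record["fsa"][0],
        # Age in days coarsened to whole weeks.
        "age_weeks": record["age_days"] // 7,
        "diagnosis": record["diagnosis"],
        "drug": record["drug"],
    }


# Made-up record for illustration only.
raw = {
    "age_days": 400,
    "gender": "F",
    "fsa": "K1H",
    "admission_date": date(2008, 5, 12),
    "discharge_date": date(2008, 5, 15),
    "diagnosis": "J45",
    "drug": "salbutamol",
}

released = generalize_record(raw)
print(released)
```

  The sketch covers only the field-level generalization; the audit, breach notification, redistribution, and data destruction conditions listed on the slide are contractual and are not represented in code.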
  • 115. An Example Deployment
  • 116. An Example Deployment
  • 117. An Example Deployment