Feedback-Based Annotation, Selection and Refinement of
          Schema Mappings for Dataspaces

       Khalid Belhajjame, Norman W. Paton, Suzanne M. Embury,
             Alvaro A. A. Fernandes, and Cornelia Hedeler




                             EDBT/ICDT 2010
Data Integration

[Figure: a scientist asks "What are the available proteins of the Fruit Fly?" against an integration schema, which is connected through mappings to the sources PedroDB, PepSeeker, Pride and GPMDB.]

Towards Pay-as-you-go Data Integration

  Data Integration
    –  Setting up a data integration system requires significant upfront effort.
    –  The specification of schema mappings has proved to be time and resource consuming: it requires deep knowledge of the sources to be integrated as well as the user's requirements.

  Dataspaces: Pay-as-you-go Data Integration [Franklin et al. 2005]
    –  Reduce the up-front cost required to set up a data integration system: provide some services immediately.
    –  Gradually improve the services provided by the system through interaction with end users in a pay-as-you-go fashion.

   M. J. Franklin, A. Y. Halevy, and D. Maier. From databases to dataspaces: a new abstraction for information management. SIGMOD Record, 34(4):27–33, 2005.

Pay-as-you-go Data Integration

[Figure: a scientist asks "What are the available proteins of the Fruit Fly?" against an integration schema, which is connected through mappings to the sources PedroDB, PepSeeker, Pride and GPMDB; additional labels: Bootstrap, Dataspaces.]

Objective of the present work:
Investigate Pay-as-you-go Annotation, Selection, and Refinement of Schema Mappings

Pay-as-you-go Data Integration

 We consider that the integration schema and the source schemas are relational,
 and that the schema mappings that define the extent of a relation r in the
 integration schema are global-as-view (GAV) mappings of the form:

                                 m = ⟨r, qs⟩
 where qs is a relational query over the source schemas.


 A relation in the integration schema can be associated with multiple
candidate mappings: We consider a setting in which multiple matching
mechanisms can be used, each of which could give rise to multiple mapping
candidates for populating the same relation of the integration schema.
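
As an illustration of the mapping form m = ⟨r, qs⟩, here is a minimal Python sketch. The Mapping class, the toy source tables and the two candidate queries are hypothetical and are not taken from the slides; a source query is modelled simply as a function over in-memory source relations.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Row = Tuple[str, ...]
Sources = Dict[str, List[Row]]  # source relation name -> rows


@dataclass(frozen=True)
class Mapping:
    """A global-as-view mapping m = <r, qs> (hypothetical representation)."""
    relation: str                                  # r: relation of the integration schema
    source_query: Callable[[Sources], Set[Row]]    # qs: query over the source schemas

    def extent(self, sources: Sources) -> Set[Row]:
        # The extent of r under this mapping is whatever qs returns.
        return self.source_query(sources)


# Invented toy data standing in for two proteomics sources.
sources: Sources = {
    "ProteinEntry": [("P1", "Adh", "fruit fly"), ("P2", "INS", "human")],
    "PeptideHit": [("P1", "Adh"), ("P3", "w")],
}

# Two candidate mappings populating the same integration relation "Protein".
m1 = Mapping("Protein", lambda s: {(acc, name) for acc, name, _ in s["ProteinEntry"]})
m2 = Mapping("Protein", lambda s: set(s["PeptideHit"]))
candidates = [m1, m2]
```

Both candidates target the same relation but return different extents, which is the situation the annotation and selection steps have to deal with.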


Outline

  User Feedback
  Annotation of Schema Mappings
  Selection of Schema Mappings Based on User Requirements
  Refinement of Schema Mappings

User Feedback

  Query: What are the available fruit fly proteins?
  Results:

[Figure: result tuples with a Feedback column in which the user marks each tuple with ✔ or ✖.]

User Feedback (cont.)

  Let m be a candidate mapping, and UF a set of feedback instances supplied by the user:

  tp(m,UF): the tuples that are expected by the user and that are retrieved by the mapping m.

  fp(m,UF): the tuples that are not expected by the user and that are retrieved by the mapping m.

  fn(m,UF): the tuples that are expected by the user and that are not retrieved by the mapping m.
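
The three sets defined above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: UF is represented as a dictionary from tuple to a boolean "expected" flag, and each function receives the set of tuples retrieved by m rather than m itself. Both choices, and the sample data, are hypothetical.

```python
from typing import Dict, Set, Tuple

Row = Tuple[str, ...]
Feedback = Dict[Row, bool]  # tuple -> True if expected by the user, False otherwise


def tp(retrieved: Set[Row], uf: Feedback) -> Set[Row]:
    """Tuples expected by the user and retrieved by the mapping."""
    return {t for t, expected in uf.items() if expected and t in retrieved}


def fp(retrieved: Set[Row], uf: Feedback) -> Set[Row]:
    """Tuples not expected by the user but retrieved by the mapping."""
    return {t for t, expected in uf.items() if not expected and t in retrieved}


def fn(retrieved: Set[Row], uf: Feedback) -> Set[Row]:
    """Tuples expected by the user but not retrieved by the mapping."""
    return {t for t, expected in uf.items() if expected and t not in retrieved}


# Invented example: the mapping retrieves three tuples, the user comments on three.
retrieved = {("P1",), ("P2",), ("P3",)}
uf = {("P1",): True, ("P2",): False, ("P4",): True}
```

Tuples without feedback, such as ("P3",) here, fall into none of the three sets, so the sets only reflect what the user has commented on so far.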
  



Annotating Mappings

Using a simple annotation scheme, a schema mapping can be annotated as:
 Correct
 Incorrect

The set of schema mappings is likely to be incomplete and, therefore, we may end up annotating all mappings as incorrect.

Because of this, we use a less stringent scheme for annotating mappings.

Annotating Mappings (cont.)

Instead, we use and adapt the notions of precision and recall used in information retrieval to measure the quality of a mapping.

 Precision:

 Recall:

 F measure:
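
A minimal sketch of the three measures, assuming the standard information-retrieval definitions applied to the sizes of the tp, fp and fn sets of a mapping. The exact formulas used by the authors may differ, for example in how the F measure is weighted; the beta parameter and the zero-denominator conventions below are assumptions.

```python
def precision(n_tp: int, n_fp: int) -> float:
    """Fraction of the retrieved tuples with feedback that the user expected."""
    total = n_tp + n_fp
    return n_tp / total if total else 0.0  # convention: 0.0 when nothing is known


def recall(n_tp: int, n_fn: int) -> float:
    """Fraction of the tuples expected by the user that the mapping retrieved."""
    total = n_tp + n_fn
    return n_tp / total if total else 0.0


def f_measure(p: float, r: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (beta = 1 gives F1)."""
    b2 = beta * beta
    denom = b2 * p + r
    return (1 + b2) * p * r / denom if denom else 0.0


# Invented counts: 3 true positives, 1 false positive, 3 false negatives.
p = precision(3, 1)
r = recall(3, 3)
f = f_measure(p, r)
```

Because the counts come from feedback only, these values are estimates relative to what the user has annotated so far, not the values that complete knowledge of the expected results would give.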
  
 	
  


Mapping Annotation: Validation

Questions:

     –  How much user feedback is required for approximating the real precision and recall, i.e., those based on complete knowledge of the expected results?

     –  Does the pay-as-you-go philosophy hold?

Mapping Annotation: Validation (cont.)

Experiment:
  Data:
    –  Two datasets: the Mondial geographical database and the Amalgam data integration benchmark
    –  Candidate schema mappings: created using the IBM InfoSphere Data Architect.

  Process: we applied the two-step process illustrated below for multiple iterations.
    1.  Generate a sample of feedback instances.
    2.  Compute the relative precision and recall of the candidate mappings given cumulative feedback.
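
The two-step process above can be simulated roughly as follows. This is only a sketch under assumptions: it uses invented integer "tuples" instead of the Mondial or Amalgam data, covers precision only, draws feedback in fixed-size random batches, and takes the error to be the absolute difference between the feedback-based precision and the true precision. None of these choices is confirmed by the slides.

```python
import random
from typing import Dict, List, Set


def true_precision(retrieved: Set[int], expected: Set[int]) -> float:
    """Precision computed with complete knowledge of the expected results."""
    return len(retrieved & expected) / len(retrieved) if retrieved else 0.0


def estimated_precision(retrieved: Set[int], feedback: Dict[int, bool]) -> float:
    """Precision computed only from the tuples the user has commented on."""
    hits = sum(1 for t, ok in feedback.items() if ok and t in retrieved)
    misses = sum(1 for t, ok in feedback.items() if not ok and t in retrieved)
    return hits / (hits + misses) if hits + misses else 0.0


def simulate(retrieved: Set[int], expected: Set[int], batch: int, seed: int = 0) -> List[float]:
    """Return the precision error after each round of cumulative feedback."""
    rng = random.Random(seed)
    pool = sorted(retrieved | expected)
    rng.shuffle(pool)
    feedback: Dict[int, bool] = {}
    errors: List[float] = []
    target = true_precision(retrieved, expected)
    for start in range(0, len(pool), batch):
        # Step 1: generate a new sample of feedback instances.
        for t in pool[start:start + batch]:
            feedback[t] = t in expected
        # Step 2: recompute the estimate from the cumulative feedback.
        errors.append(abs(estimated_precision(retrieved, feedback) - target))
    return errors


retrieved = set(range(0, 60))
expected = set(range(20, 100))
errors = simulate(retrieved, expected, batch=10, seed=42)
```

Once feedback covers every tuple, the estimate coincides with the true precision, so the last error is zero; how quickly the error shrinks before that point is what the validation questions ask about.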
  


Mapping Annotation: Error in Precision

[Figure: plot of the error in precision; axis label: Error.]

Mapping Annotation: Error in Recall

[Figure: plot of the error in recall; axis label: Error.]

Mapping Selection

  Mapping selection should be tailored to meet user requirements.

  We use a selection method that aims to maximise the recall such that the precision of the results is higher than a given precision threshold.

  We cast this selection problem as a search problem that aims to maximise the following utility function:




    D. A. Menascé and V. Dubey. Utility-based QoS brokering in service oriented architectures. In ICWS, pages 422–430. IEEE CS, 2007.
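
The selection criterion stated above (maximise recall subject to a precision threshold) can be sketched as an exhaustive search over subsets of candidate mappings. This is an illustrative stand-in, not the utility-based search used by the authors: the function name, the data layout, the use of "at least the threshold" rather than "strictly higher", and the sample data are all assumptions, and exhaustive enumeration is only practical for a handful of mappings.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def select_mappings(
    results: Dict[str, Set[str]],   # mapping name -> tuples it retrieves
    uf: Dict[str, bool],            # tuple -> expected by the user?
    threshold: float,
) -> Tuple[Tuple[str, ...], float]:
    """Pick the subset of mappings with the best feedback-based recall
    among those whose feedback-based precision reaches the threshold."""
    expected = {t for t, ok in uf.items() if ok}
    unexpected = {t for t, ok in uf.items() if not ok}
    best: Tuple[str, ...] = ()
    best_recall = -1.0
    names = sorted(results)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            union = set().union(*(results[n] for n in subset))
            n_tp = len(union & expected)
            n_fp = len(union & unexpected)
            if n_tp + n_fp == 0:
                continue  # no feedback applies to this subset
            prec = n_tp / (n_tp + n_fp)
            rec = n_tp / len(expected) if expected else 0.0
            # Assumption: ">=" is used for the precision constraint.
            if prec >= threshold and rec > best_recall:
                best, best_recall = subset, rec
    return best, max(best_recall, 0.0)


# Invented example with three candidate mappings.
results = {"m1": {"a", "b"}, "m2": {"c", "x"}, "m3": {"d", "x", "y"}}
uf = {"a": True, "b": True, "c": True, "d": True, "x": False, "y": False}
chosen, chosen_recall = select_mappings(results, uf, threshold=0.75)
```

In this example m1 alone is precise but covers only half of the expected tuples, while adding m2 raises recall and still meets the threshold; adding m3 would push precision below it.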
  

Mapping Selection: Precision

  Do we meet the precision requirement, i.e., is the precision threshold set by the user respected?

Mapping Selection: Recall

  Do we get some benefits for recall, i.e., does the method we use maximise the recall?

Mapping Refinement

  We distinguish two kinds of refinement:

  Mapping refinement that seeks to reduce the number of false positives
     A candidate mapping is refined by modifying a source query so that the number of false positives it returns is reduced.

  Mapping refinement that aims to increase the number of true positives
     A candidate mapping m is refined by modifying a source query so that the number of true positives it returns is increased.

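Mapping Refinement: Example

[Figure: the user states "I want fruit fly proteins". The integration schema contains the relation Protein(Accession, name, gene); the candidate mapping is m = <Protein, ProteinEntry>, where ProteinEntry belongs to the source schema.]
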
Mapping Refinement: The Space of Solutions

    The space of solutions is composed of the mappings that can be constructed out of the candidate mappings. Specifically, by:

    i. Joining the source query of a candidate mapping.

    ii. Augmenting the source query of a candidate mapping with a selection condition.

    iii. Relaxing the selection condition of the source query of a candidate mapping.

    iv. Combining the source queries of two or more mappings using union, difference and intersection.
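
The four refinement operations listed above can be illustrated on small in-memory result sets. This is a sketch only: the helper names, the toy relations and the choice of joining on the first column are hypothetical, and a real implementation would rewrite the relational source query rather than filter its results.

```python
from typing import Callable, Set, Tuple

Row = Tuple[str, ...]


def join_on_first(rows: Set[Row], other: Set[Row]) -> Set[Row]:
    """(i) Join two result sets on their first column (assumed join key)."""
    return {r + o[1:] for r in rows for o in other if r[0] == o[0]}


def with_selection(rows: Set[Row], cond: Callable[[Row], bool]) -> Set[Row]:
    """(ii) Augment a query with a selection condition."""
    return {r for r in rows if cond(r)}


# Invented data: (accession, name, species)
protein_entry = {("P1", "Adh", "fruit fly"), ("P2", "INS", "human"), ("P3", "w", "fruit fly")}
other_source = {("P3", "w", "fruit fly"), ("P4", "ey", "fruit fly")}
genes = {("P1", "Adh-gene")}

# (ii) A selection condition removes false positives (non fruit fly proteins).
refined = with_selection(protein_entry, lambda r: r[2] == "fruit fly")

# (iii) Relaxing the condition admits more tuples again.
relaxed = with_selection(protein_entry, lambda r: r[2] in {"fruit fly", "human"})

# (iv) Set operators combine the results of several mappings.
combined = refined | other_source        # union can add true positives
common = refined & other_source          # intersection
only_first = refined - other_source      # difference

# (i) A join attaches attributes from another relation.
joined = join_on_first(refined, genes)
```

Selection and intersection or difference tend to reduce false positives, while relaxation and union tend to increase true positives, which matches the two kinds of refinement distinguished earlier.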
  


15/04/2009                                           Khalid
Exploring the Space of Solutions

  The space of mappings that can be obtained by refinement is potentially large.

  A search algorithm that explores the whole space of the possible mappings may not be able to find a solution in a bounded time.

  In the context of the present work, we used an evolutionary algorithm for exploring the space of mappings that can be obtained by refinement.
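
To make the idea of an evolutionary search concrete, here is a small generic sketch. It is not the authors' RefineMappings algorithm: it assumes that an individual is a subset of candidate mappings combined by union only, that fitness is the F1 score computed from user feedback, and that evolution uses elitist selection, uniform crossover and single-bit mutation. All names and data are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple


def f1(rows: Set[str], uf: Dict[str, bool]) -> float:
    """Feedback-based F1 score of a result set."""
    n_tp = sum(1 for t, ok in uf.items() if ok and t in rows)
    n_fp = sum(1 for t, ok in uf.items() if not ok and t in rows)
    n_fn = sum(1 for t, ok in uf.items() if ok and t not in rows)
    denom = 2 * n_tp + n_fp + n_fn
    return 2 * n_tp / denom if denom else 0.0


def evolve(results: List[Set[str]], uf: Dict[str, bool],
           generations: int = 30, pop_size: int = 8,
           seed: int = 0) -> Tuple[Tuple[bool, ...], float]:
    """Search for a subset of mappings whose union scores well against feedback."""
    rng = random.Random(seed)
    n = len(results)

    def rows_of(ind: Tuple[bool, ...]) -> Set[str]:
        return set().union(*(results[i] for i in range(n) if ind[i]))

    def fitness(ind: Tuple[bool, ...]) -> float:
        return f1(rows_of(ind), uf)

    pop = [tuple(rng.random() < 0.5 for _ in range(n)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # elitism: keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            i = rng.randrange(n)
            child[i] = not child[i]                           # single-bit mutation
            children.append(tuple(child))
        pop = parents + children
    best = max(pop, key=fitness)
    return best, fitness(best)


results = [{"a", "b"}, {"c", "x"}, {"d", "x", "y"}]
uf = {"a": True, "b": True, "c": True, "d": True, "x": False, "y": False}
best, best_fitness = evolve(results, uf)
```

Because the better half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next, and the number of generations bounds the running time regardless of the size of the search space.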
  


Mapping Refinement Algorithm

Mapping Refinement: Validation

  Question:
     Can mapping refinement improve the quality of initial candidate mappings, and, if so, at what cost, i.e., what is the amount of user feedback required?

  Experiment: To answer the above question we applied the following process for multiple iterations.
    1) Generate a sample of feedback instances.
    2) Annotate the set of candidate mappings.
    3) Refine candidate mappings using the RefineMappings algorithm.

Conclusions

  Pay-as-you-go Annotation of Schema Mappings
    We showed how schema mappings can be incrementally annotated based on feedback supplied by end users.
    We also showed through an evaluation exercise that the more feedback the user supplies, the better the quality of the mapping annotation computed.

  Application: Selection and Refinement of Schema Mappings in Dataspaces
    Mapping annotations computed based on user feedback are used as input for enabling the selection and the refinement of schema mappings.
    The evaluation exercises also showed that mapping refinement is more cost effective in the first feedback iterations.


More Related Content

Viewers also liked

Intégration incrémentale de données (Valenciennes juin 2010)
Intégration incrémentale de données (Valenciennes juin 2010)Intégration incrémentale de données (Valenciennes juin 2010)
Intégration incrémentale de données (Valenciennes juin 2010)Khalid Belhajjame
 
Linking the prospective and retrospective provenance of scripts
Linking the prospective and retrospective provenance of scriptsLinking the prospective and retrospective provenance of scripts
Linking the prospective and retrospective provenance of scriptsKhalid Belhajjame
 
A Sightseeing Tour of Prov and Some of its Extensions
A Sightseeing Tour of Prov and Some of its ExtensionsA Sightseeing Tour of Prov and Some of its Extensions
A Sightseeing Tour of Prov and Some of its ExtensionsKhalid Belhajjame
 
Research Object Model in Sepublica
Research Object Model in SepublicaResearch Object Model in Sepublica
Research Object Model in SepublicaKhalid Belhajjame
 
Detecting Duplicate Records in Scientific Workflow Results
Detecting Duplicate Records in Scientific Workflow ResultsDetecting Duplicate Records in Scientific Workflow Results
Detecting Duplicate Records in Scientific Workflow ResultsKhalid Belhajjame
 

Viewers also liked (8)

Intégration incrémentale de données (Valenciennes juin 2010)
Intégration incrémentale de données (Valenciennes juin 2010)Intégration incrémentale de données (Valenciennes juin 2010)
Intégration incrémentale de données (Valenciennes juin 2010)
 
Linking the prospective and retrospective provenance of scripts
Linking the prospective and retrospective provenance of scriptsLinking the prospective and retrospective provenance of scripts
Linking the prospective and retrospective provenance of scripts
 
Credible workshop
Credible workshopCredible workshop
Credible workshop
 
A Sightseeing Tour of Prov and Some of its Extensions
A Sightseeing Tour of Prov and Some of its ExtensionsA Sightseeing Tour of Prov and Some of its Extensions
A Sightseeing Tour of Prov and Some of its Extensions
 
Edbt2014 talk
Edbt2014 talkEdbt2014 talk
Edbt2014 talk
 
Research Object Model in Sepublica
Research Object Model in SepublicaResearch Object Model in Sepublica
Research Object Model in Sepublica
 
Anr cair meeting feb 2016
Anr cair meeting feb 2016Anr cair meeting feb 2016
Anr cair meeting feb 2016
 
Detecting Duplicate Records in Scientific Workflow Results
Detecting Duplicate Records in Scientific Workflow ResultsDetecting Duplicate Records in Scientific Workflow Results
Detecting Duplicate Records in Scientific Workflow Results
 

Similar to Edbt 2010, Belhajjame

Performance of Weighted Least Square Filter Based Pan Sharpening using Fuzzy ...
Performance of Weighted Least Square Filter Based Pan Sharpening using Fuzzy ...Performance of Weighted Least Square Filter Based Pan Sharpening using Fuzzy ...
Performance of Weighted Least Square Filter Based Pan Sharpening using Fuzzy ...IRJET Journal
 
Vibration Analysis for condition Monitoring & Predictive Maintenance using Em...
Vibration Analysis for condition Monitoring & Predictive Maintenance using Em...Vibration Analysis for condition Monitoring & Predictive Maintenance using Em...
Vibration Analysis for condition Monitoring & Predictive Maintenance using Em...IRJET Journal
 
Improved Weighted Least Square Filter Based Pan Sharpening using Fuzzy Logic
Improved Weighted Least Square Filter Based Pan Sharpening using Fuzzy LogicImproved Weighted Least Square Filter Based Pan Sharpening using Fuzzy Logic
Improved Weighted Least Square Filter Based Pan Sharpening using Fuzzy LogicIRJET Journal
 
IRJET - A Research on Eloquent Salvation and Productive Outsourcing of Massiv...
IRJET - A Research on Eloquent Salvation and Productive Outsourcing of Massiv...IRJET - A Research on Eloquent Salvation and Productive Outsourcing of Massiv...
IRJET - A Research on Eloquent Salvation and Productive Outsourcing of Massiv...IRJET Journal
 
Fruit Classification and Quality Prediction using Deep Learning Methods
Fruit Classification and Quality Prediction using Deep Learning MethodsFruit Classification and Quality Prediction using Deep Learning Methods
Fruit Classification and Quality Prediction using Deep Learning MethodsIRJET Journal
 
IRJET- Smart Classroom Attendance System: Survey
IRJET- Smart Classroom Attendance System: SurveyIRJET- Smart Classroom Attendance System: Survey
IRJET- Smart Classroom Attendance System: SurveyIRJET Journal
 
AUTOMATED WASTE MANAGEMENT SYSTEM
AUTOMATED WASTE MANAGEMENT SYSTEMAUTOMATED WASTE MANAGEMENT SYSTEM
AUTOMATED WASTE MANAGEMENT SYSTEMIRJET Journal
 
IRJET - Finger Vein Extraction and Authentication System for ATM
IRJET -  	  Finger Vein Extraction and Authentication System for ATMIRJET -  	  Finger Vein Extraction and Authentication System for ATM
IRJET - Finger Vein Extraction and Authentication System for ATMIRJET Journal
 
IRJET - Review of Various Multi-Focus Image Fusion Methods
IRJET - Review of Various Multi-Focus Image Fusion MethodsIRJET - Review of Various Multi-Focus Image Fusion Methods
IRJET - Review of Various Multi-Focus Image Fusion MethodsIRJET Journal
 
HOUSE PRICE ESTIMATION USING DATA SCIENCE AND ML
HOUSE PRICE ESTIMATION USING DATA SCIENCE AND MLHOUSE PRICE ESTIMATION USING DATA SCIENCE AND ML
HOUSE PRICE ESTIMATION USING DATA SCIENCE AND MLIRJET Journal
 
Diabetic Retinopathy Detection
Diabetic Retinopathy DetectionDiabetic Retinopathy Detection
Diabetic Retinopathy DetectionIRJET Journal
 
IRJET- Face Recognition using Landmark Estimation and Convolution Neural Network
IRJET- Face Recognition using Landmark Estimation and Convolution Neural NetworkIRJET- Face Recognition using Landmark Estimation and Convolution Neural Network
IRJET- Face Recognition using Landmark Estimation and Convolution Neural NetworkIRJET Journal
 
IRJET - Calorie Detector
IRJET -  	  Calorie DetectorIRJET -  	  Calorie Detector
IRJET - Calorie DetectorIRJET Journal
 
Food Image to the Recipe Generator
Food Image to the Recipe GeneratorFood Image to the Recipe Generator
Food Image to the Recipe GeneratorIRJET Journal
 
IRJET- Restful Backend to Serve any Frontend System
IRJET- Restful Backend to Serve any Frontend SystemIRJET- Restful Backend to Serve any Frontend System
IRJET- Restful Backend to Serve any Frontend SystemIRJET Journal
 
IRJET - Comparative Study of Flight Delay Prediction using Back Propagati...
IRJET -  	  Comparative Study of Flight Delay Prediction using Back Propagati...IRJET -  	  Comparative Study of Flight Delay Prediction using Back Propagati...
IRJET - Comparative Study of Flight Delay Prediction using Back Propagati...IRJET Journal
 
Agricultural Product Price and Crop Cultivation Prediction based on Data Scie...
Agricultural Product Price and Crop Cultivation Prediction based on Data Scie...Agricultural Product Price and Crop Cultivation Prediction based on Data Scie...
Agricultural Product Price and Crop Cultivation Prediction based on Data Scie...IRJET Journal
 
IRJET - Automated Fraud Detection Framework in Examination Halls
 IRJET - Automated Fraud Detection Framework in Examination Halls IRJET - Automated Fraud Detection Framework in Examination Halls
IRJET - Automated Fraud Detection Framework in Examination HallsIRJET Journal
 

Similar to Edbt 2010, Belhajjame (20)

Performance of Weighted Least Square Filter Based Pan Sharpening using Fuzzy ...
Performance of Weighted Least Square Filter Based Pan Sharpening using Fuzzy ...Performance of Weighted Least Square Filter Based Pan Sharpening using Fuzzy ...
Performance of Weighted Least Square Filter Based Pan Sharpening using Fuzzy ...
 
Vibration Analysis for condition Monitoring & Predictive Maintenance using Em...
Vibration Analysis for condition Monitoring & Predictive Maintenance using Em...Vibration Analysis for condition Monitoring & Predictive Maintenance using Em...
Vibration Analysis for condition Monitoring & Predictive Maintenance using Em...
 
Improved Weighted Least Square Filter Based Pan Sharpening using Fuzzy Logic
Improved Weighted Least Square Filter Based Pan Sharpening using Fuzzy LogicImproved Weighted Least Square Filter Based Pan Sharpening using Fuzzy Logic
Improved Weighted Least Square Filter Based Pan Sharpening using Fuzzy Logic
 
IRJET - A Research on Eloquent Salvation and Productive Outsourcing of Massiv...
IRJET - A Research on Eloquent Salvation and Productive Outsourcing of Massiv...IRJET - A Research on Eloquent Salvation and Productive Outsourcing of Massiv...
IRJET - A Research on Eloquent Salvation and Productive Outsourcing of Massiv...
 
Fruit Classification and Quality Prediction using Deep Learning Methods
Fruit Classification and Quality Prediction using Deep Learning MethodsFruit Classification and Quality Prediction using Deep Learning Methods
Fruit Classification and Quality Prediction using Deep Learning Methods
 
IRJET- Smart Classroom Attendance System: Survey
IRJET- Smart Classroom Attendance System: SurveyIRJET- Smart Classroom Attendance System: Survey
IRJET- Smart Classroom Attendance System: Survey
 
AUTOMATED WASTE MANAGEMENT SYSTEM
AUTOMATED WASTE MANAGEMENT SYSTEMAUTOMATED WASTE MANAGEMENT SYSTEM
AUTOMATED WASTE MANAGEMENT SYSTEM
 
IRJET - Finger Vein Extraction and Authentication System for ATM
IRJET -  	  Finger Vein Extraction and Authentication System for ATMIRJET -  	  Finger Vein Extraction and Authentication System for ATM
IRJET - Finger Vein Extraction and Authentication System for ATM
 
IRJET - Review of Various Multi-Focus Image Fusion Methods
IRJET - Review of Various Multi-Focus Image Fusion MethodsIRJET - Review of Various Multi-Focus Image Fusion Methods
IRJET - Review of Various Multi-Focus Image Fusion Methods
 
HOUSE PRICE ESTIMATION USING DATA SCIENCE AND ML
HOUSE PRICE ESTIMATION USING DATA SCIENCE AND MLHOUSE PRICE ESTIMATION USING DATA SCIENCE AND ML
HOUSE PRICE ESTIMATION USING DATA SCIENCE AND ML
 
Diabetic Retinopathy Detection
Diabetic Retinopathy DetectionDiabetic Retinopathy Detection
Diabetic Retinopathy Detection
 
IRJET- Face Recognition using Landmark Estimation and Convolution Neural Network
IRJET- Face Recognition using Landmark Estimation and Convolution Neural NetworkIRJET- Face Recognition using Landmark Estimation and Convolution Neural Network
IRJET- Face Recognition using Landmark Estimation and Convolution Neural Network
 
IRJET - Calorie Detector
IRJET -  	  Calorie DetectorIRJET -  	  Calorie Detector
IRJET - Calorie Detector
 
Food Image to the Recipe Generator
Food Image to the Recipe GeneratorFood Image to the Recipe Generator
Food Image to the Recipe Generator
 
IRJET- Restful Backend to Serve any Frontend System
IRJET- Restful Backend to Serve any Frontend SystemIRJET- Restful Backend to Serve any Frontend System
IRJET- Restful Backend to Serve any Frontend System
 
MANET
MANETMANET
MANET
 
IRJET - Comparative Study of Flight Delay Prediction using Back Propagati...
IRJET -  	  Comparative Study of Flight Delay Prediction using Back Propagati...IRJET -  	  Comparative Study of Flight Delay Prediction using Back Propagati...
IRJET - Comparative Study of Flight Delay Prediction using Back Propagati...
 
Presentation_final.pdf
Presentation_final.pdfPresentation_final.pdf
Presentation_final.pdf
 
Agricultural Product Price and Crop Cultivation Prediction based on Data Scie...
Agricultural Product Price and Crop Cultivation Prediction based on Data Scie...Agricultural Product Price and Crop Cultivation Prediction based on Data Scie...
Agricultural Product Price and Crop Cultivation Prediction based on Data Scie...
 
IRJET - Automated Fraud Detection Framework in Examination Halls
 IRJET - Automated Fraud Detection Framework in Examination Halls IRJET - Automated Fraud Detection Framework in Examination Halls
IRJET - Automated Fraud Detection Framework in Examination Halls
 

More from Khalid Belhajjame

Lineage-Preserving Anonymization of the Provenance of Collection-Based Workflows
Lineage-Preserving Anonymization of the Provenance of Collection-Based WorkflowsLineage-Preserving Anonymization of the Provenance of Collection-Based Workflows
Lineage-Preserving Anonymization of the Provenance of Collection-Based WorkflowsKhalid Belhajjame
 
Privacy-Preserving Data Analysis Workflows for eScience
Privacy-Preserving Data Analysis Workflows for eSciencePrivacy-Preserving Data Analysis Workflows for eScience
Privacy-Preserving Data Analysis Workflows for eScienceKhalid Belhajjame
 
Converting scripts into reproducible workflow research objects
Converting scripts into reproducible workflow research objectsConverting scripts into reproducible workflow research objects
Converting scripts into reproducible workflow research objectsKhalid Belhajjame
 
Introduction to ProvBench @ Provenance Week 2014
Introduction to ProvBench @ Provenance Week 2014Introduction to ProvBench @ Provenance Week 2014
Introduction to ProvBench @ Provenance Week 2014Khalid Belhajjame
 
Small Is Beautiful: Summarizing Scientific Workflows Using Semantic Annotat...
Small Is Beautiful:  Summarizing Scientific Workflows  Using Semantic Annotat...Small Is Beautiful:  Summarizing Scientific Workflows  Using Semantic Annotat...
Small Is Beautiful: Summarizing Scientific Workflows Using Semantic Annotat...Khalid Belhajjame
 
Case studyworkshoponprovenance
Case studyworkshoponprovenanceCase studyworkshoponprovenance
Case studyworkshoponprovenanceKhalid Belhajjame
 

Edbt 2010, Belhajjame

  • 1. Feedback-Based Annotation, Selection and Refinement of Schema Mappings for Dataspaces. Khalid Belhajjame, Norman W. Paton, Suzanne M. Embury, Alvaro A. A. Fernandes, and Cornelia Hedeler. EDBT/ICDT 2010
  • 2. Data Integration. Scientist: "What are the available proteins of the Fruit Fly?" [Figure: an integration schema connected through mappings to the sources PedroDB, PepSeeker, Pride and GPMDB]
  • 3. Towards Pay-as-you-go Data Integration
      Data Integration
      – Setting up a data integration system requires significant upfront effort.
      – The specification of schema mappings has proved to be time and resource consuming: it requires deep knowledge of the sources to be integrated as well as of the user's requirements.
      Dataspaces: Pay-as-you-go Data Integration [Franklin et al. 2005]
      – Reduce the up-front cost required to set up a data integration system: provide some services immediately.
      – Gradually improve the services provided by the system through interaction with end users in a pay-as-you-go fashion.
      M. J. Franklin, A. Y. Halevy, and D. Maier. From databases to dataspaces: a new abstraction for information management. SIGMOD Record, 34(4):27–33, 2005.
  • 4. Pay-as-you-go Data Integration. Scientist: "What are the available proteins of the Fruit Fly?" [Figure: dataspaces bootstrap the mappings between the integration schema and the sources PedroDB, PepSeeker, Pride and GPMDB]
      Objective of the present work: investigate pay-as-you-go annotation, selection, and refinement of schema mappings.
  • 5. Pay-as-you-go Data Integration
      – We consider that the integration schema and the source schemas are relational, and that the schema mappings that define the extent of a relation r in the integration schema are global-as-view mappings of the form m = ⟨r, qs⟩, where qs is a relational query over the source schemas.
      – A relation in the integration schema can be associated with multiple candidate mappings: we consider a setting in which multiple matching mechanisms can be used, each of which could give rise to multiple mapping candidates for populating the same relation of the integration schema.
  • 6. Outline
      – User Feedback
      – Annotation of Schema Mappings
      – Selection of Schema Mappings Based on User Requirements
      – Refinement of Schema Mappings
  • 7. User Feedback. Query: What are the available fruit fly proteins? [Figure: the result tuples, each marked by the user as expected (✔) or not expected (✖)]
  • 8. User Feedback (cont.)
      Let m be a candidate mapping, and UF a set of feedback instances supplied by the user:
      – tp(m, UF): the tuples that are expected by the user and that are retrieved by the mapping m.
      – fp(m, UF): the tuples that are not expected by the user and that are retrieved by the mapping m.
      – fn(m, UF): the tuples that are expected by the user and are not retrieved by the mapping m.
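The three sets defined on slide 8 can be computed directly from a mapping's result and the feedback collected so far. The following is a minimal sketch, not the authors' implementation: the `Feedback` class, the `classify` function and the sample tuples are all hypothetical names and data introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    """One feedback instance: a tuple and whether the user expects it (hypothetical type)."""
    row: tuple
    expected: bool


def classify(mapping_result, feedback):
    """Split the feedback into tp, fp and fn relative to one mapping's result."""
    retrieved = set(mapping_result)
    tp = {f.row for f in feedback if f.expected and f.row in retrieved}
    fp = {f.row for f in feedback if not f.expected and f.row in retrieved}
    fn = {f.row for f in feedback if f.expected and f.row not in retrieved}
    return tp, fp, fn


# Made-up result of a candidate mapping for the fruit fly query.
result = [("P1", "fly"), ("P2", "fly"), ("P3", "mouse")]
uf = [
    Feedback(("P1", "fly"), True),     # expected and retrieved
    Feedback(("P3", "mouse"), False),  # not expected but retrieved
    Feedback(("P4", "fly"), True),     # expected but not retrieved
]
tp, fp, fn = classify(result, uf)
```

Note that `("P2", "fly")` falls into none of the sets: the user has given no feedback on it yet, which is why the annotations are relative to UF.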
  • 9. Outline
      – User Feedback
      – Annotation of Schema Mappings
      – Selection of Schema Mappings Based on User Requirements
      – Refinement of Schema Mappings
  • 10. Annotating Mappings
      Using a simple annotation scheme, a schema mapping can be annotated as:
      – Correct
      – Incorrect
      The set of schema mappings is likely to be incomplete, and, therefore, we may end up annotating all mappings as incorrect. Because of this, we use a less stringent scheme for mapping annotation.
  • 11. Annotating Mappings (cont.)
      Instead, we adapt the notions of precision and recall used in information retrieval to measure the quality of a mapping:
      – Precision
      – Recall
      – F-measure
      [The formulas on this slide were not preserved in the transcript.]
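Assuming the slide follows the standard information-retrieval definitions, restricted to the feedback sets tp, fp and fn of slide 8, the three measures would read as below. This is a reconstruction under that assumption; in particular, the balanced form of the F-measure is a guess, and the slide may use a weighted variant.

```latex
\[
\mathit{Precision}(m, UF) = \frac{|tp(m, UF)|}{|tp(m, UF)| + |fp(m, UF)|}
\qquad
\mathit{Recall}(m, UF) = \frac{|tp(m, UF)|}{|tp(m, UF)| + |fn(m, UF)|}
\]
\[
F(m, UF) = \frac{2 \cdot \mathit{Precision}(m, UF) \cdot \mathit{Recall}(m, UF)}
                {\mathit{Precision}(m, UF) + \mathit{Recall}(m, UF)}
\]
```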
  • 12. Mapping Annotation: Validation
      Questions:
      – How much user feedback is required for approximating the real precision and recall, i.e., those based on complete knowledge of the expected results?
      – Does the pay-as-you-go philosophy hold?
  • 13. Mapping Annotation: Validation (cont.)
      Experiment:
      Data:
      – Two datasets: the Mondial geographical database and the Amalgam data integration benchmark.
      – Candidate schema mappings: created using the IBM InfoSphere Data Architect.
      Process: we applied the following two-step process for multiple iterations.
      1. Generate a sample of feedback instances.
      2. Compute the relative precision and recall of the candidate mappings given the cumulative feedback.
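The two-step process on slide 13 can be simulated on synthetic data to see how the error behaves as feedback accumulates. The sketch below is an illustration only: it does not use Mondial or Amalgam, the function names are hypothetical, and it assumes feedback arrives in uniformly random batches.

```python
import random


def relative_precision_recall(retrieved, expected, annotated):
    """Precision and recall of a mapping, restricted to the annotated tuples."""
    tp = len(retrieved & expected & annotated)
    fp = len((retrieved - expected) & annotated)
    fn = len((expected - retrieved) & annotated)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def run_experiment(retrieved, expected, batch_size, seed=0):
    """Annotate tuples in random batches and record the error after each batch."""
    rng = random.Random(seed)
    universe = sorted(retrieved | expected)
    rng.shuffle(universe)
    # "Real" values: those based on complete knowledge of the expected results.
    real_p, real_r = relative_precision_recall(retrieved, expected, set(universe))
    annotated, errors = set(), []
    for start in range(0, len(universe), batch_size):
        annotated |= set(universe[start:start + batch_size])  # step 1: new feedback sample
        p, r = relative_precision_recall(retrieved, expected, annotated)  # step 2
        errors.append((len(annotated), abs(p - real_p), abs(r - real_r)))
    return errors


# Synthetic data: the mapping retrieves ids 0..59, the user expects ids 20..99.
errors = run_experiment(set(range(60)), set(range(20, 100)), batch_size=10)
for n, err_p, err_r in errors:
    print(f"{n:3d} feedback instances: precision error {err_p:.3f}, recall error {err_r:.3f}")
```

Once every tuple has been annotated, the relative values coincide with the real ones, so the last recorded error is zero by construction.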
  • 14. Mapping Annotation: Error in Precision [Plot; axis label: Error]
  • 15. Mapping Annotation: Error in Recall [Plot; axis label: Error]
  • 16. Outline
      – User Feedback
      – Annotation of Schema Mappings
      – Selection of Schema Mappings Based on User Requirements
      – Refinement of Schema Mappings
  • 17. Mapping Selection
      – Mapping selection should be tailored to meet user requirements.
      – We use a selection method that aims to maximise the recall such that the precision of the results is higher than a given precision threshold.
      – We cast this selection problem as a search problem that aims to maximise a utility function. [The formula on this slide was not preserved in the transcript.]
      D. A. Menascé and V. Dubey. Utility-based QoS brokering in service oriented architectures. In ICWS, pages 422–430. IEEE CS, 2007.
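Since the utility function itself is missing from the transcript, the selection goal on slide 17 can only be illustrated with a stand-in. The sketch below swaps in a plain exhaustive search: among all subsets of candidate mappings, it keeps the one whose union has the highest recall while its precision meets the threshold. The function names and sample data are hypothetical, and treating "higher than" as "at least" is an assumption.

```python
from itertools import combinations


def precision_recall(result, expected, unexpected):
    """Precision and recall of a result set relative to the user feedback."""
    tp = len(result & expected)
    fp = len(result & unexpected)
    fn = len(expected - result)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def select_mappings(candidates, expected, unexpected, threshold):
    """Pick the subset of mappings with the best recall whose precision meets the threshold."""
    best, best_recall = (), -1.0
    names = sorted(candidates)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            # The selected mappings jointly populate the relation: union their results.
            result = set().union(*(candidates[n] for n in subset))
            p, r = precision_recall(result, expected, unexpected)
            if p >= threshold and r > best_recall:  # assumption: threshold is inclusive
                best, best_recall = subset, r
    return best, best_recall


# Made-up results of three candidate mappings (tuple ids) and the user's feedback.
candidates = {"m1": {1, 2, 3}, "m2": {3, 4, 9}, "m3": {5, 8, 9}}
expected, unexpected = {1, 2, 3, 4, 5}, {8, 9}
best, best_recall = select_mappings(candidates, expected, unexpected, threshold=0.75)
```

Exhaustive enumeration is exponential in the number of candidates, so it only suits small candidate sets; it is used here because it makes the constrained objective explicit.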
  • 19. Mapping Selection: Precision. Do we meet the precision requirement, i.e., is the precision threshold set by the user respected?
  • 20. Mapping Selection: Precision [Plot]
  • 21. Mapping Selection: Recall. Do we get some benefits for recall, i.e., does the method we use maximise the recall?
  • 22. Mapping Selection: Recall [Plot]
  • 23. Outline
      – User Feedback
      – Annotation of Schema Mappings
      – Selection of Schema Mappings Based on User Requirements
      – Refinement of Schema Mappings
  • 24. Mapping Refinement
      We distinguish two kinds of refinement:
      – Mapping refinement that seeks to reduce the number of false positives: a candidate mapping is refined by modifying its source query so that the number of false positives it returns is reduced.
      – Mapping refinement that aims to increase the number of true positives: a candidate mapping m is refined by modifying its source query so that the number of true positives it returns is increased.
  • 25. Mapping Refinement: Example [Figure: the user wants fruit fly proteins; the integration schema relation Protein(accession, name, gene) is populated from the source schema by the candidate mapping m = ⟨Protein, ProteinEntry⟩]
  • 26. Mapping Refinement: The Space of Solutions
      The space of solutions is composed of the mappings that can be constructed out of the candidate mappings, specifically by:
      i. Joining the source query of a candidate mapping.
      ii. Augmenting the source query of a candidate mapping with a selection condition.
      iii. Relaxing the selection condition of the source query of a candidate mapping.
      iv. Combining the source queries of two or more mappings using union, difference and intersection.
      15/04/2009, Khalid
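The refinement operators listed on slide 26 can be illustrated on in-memory relations. The sketch below is an assumption-laden toy: the `organism` relation, every attribute name and every row are invented for illustration (the transcript does not show the source schema), and relations are plain lists of dictionaries rather than real queries.

```python
def select(rows, predicate):
    """(ii) Augment a source query with a selection condition."""
    return [r for r in rows if predicate(r)]


def join(left, right, key):
    """(i) Join two source relations on a shared attribute."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def project(rows, attrs):
    """Project onto the attributes of the integration relation, as a set of tuples."""
    return {tuple(r[a] for a in attrs) for r in rows}


# Hypothetical source relations.
protein_entry = [
    {"accession": "P1", "name": "Adh", "org_id": 1},
    {"accession": "P2", "name": "Hbb", "org_id": 2},
    {"accession": "P3", "name": "Act5C", "org_id": 1},
    {"accession": "P4", "name": "per", "org_id": 3},
]
organism = [
    {"org_id": 1, "species": "Drosophila melanogaster"},
    {"org_id": 2, "species": "Mus musculus"},
    {"org_id": 3, "species": "Drosophila simulans"},
]
attrs = ("accession", "name")

original = project(protein_entry, attrs)  # unrefined mapping: every protein
joined = join(protein_entry, organism, "org_id")
# (i) + (ii): join, then keep only fruit fly proteins, which removes false positives.
refined = project(select(joined, lambda r: r["species"] == "Drosophila melanogaster"), attrs)
# (iii): relax the condition to the whole genus, which can add true positives.
relaxed = project(select(joined, lambda r: r["species"].startswith("Drosophila")), attrs)
# (iv): combine results with set operators, e.g. what the relaxation added.
added_by_relaxing = relaxed - refined
```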
  • 27. Exploring the Space of Solutions
      – The space of mappings that can be obtained by refinement is potentially large.
      – A search algorithm that explores the whole space of possible mappings may not be able to find a solution in bounded time.
      – In the context of the present work, we used an evolutionary algorithm for exploring the space of mappings that can be obtained by refinement.
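To make the idea of an evolutionary search over refinements concrete, here is a deliberately small sketch. It is not the RefineMappings algorithm from the slides, whose details are not in the transcript: individuals are expression trees that combine candidate mappings with union, intersection and difference only, fitness is the balanced F-measure over the feedback, and all names, parameters and data are assumptions.

```python
import random

OPS = {
    "union": lambda a, b: a | b,
    "intersection": lambda a, b: a & b,
    "difference": lambda a, b: a - b,
}


def evaluate(expr, candidates):
    """Evaluate an expression tree to the set of tuples it returns."""
    if isinstance(expr, str):  # a leaf names a candidate mapping
        return candidates[expr]
    op, left, right = expr
    return OPS[op](evaluate(left, candidates), evaluate(right, candidates))


def f_measure(result, expected, unexpected):
    """Balanced F-measure relative to the feedback: 2tp / (2tp + fp + fn)."""
    tp = len(result & expected)
    fp = len(result & unexpected)
    fn = len(expected - result)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def mutate(expr, names, rng):
    """Combine an individual with a random candidate using a random operator."""
    return (rng.choice(sorted(OPS)), expr, rng.choice(names))


def refine(candidates, expected, unexpected, generations=20, size=8, seed=0):
    """Evolve refined mappings and return the fittest one with its score."""
    rng = random.Random(seed)
    names = sorted(candidates)

    def fitness(expr):
        return f_measure(evaluate(expr, candidates), expected, unexpected)

    population = list(names)  # start from the unrefined candidate mappings
    for _ in range(generations):
        offspring = [mutate(rng.choice(population), names, rng) for _ in range(size)]
        # Elitism: keep the fittest of parents and offspring combined.
        population = sorted(population + offspring, key=fitness, reverse=True)[:size]
    return population[0], fitness(population[0])


# Made-up candidate results (tuple ids) and user feedback.
candidates = {"m1": {1, 2, 3, 8}, "m2": {3, 4, 5}, "m3": {8, 9}}
expected, unexpected = {1, 2, 3, 4, 5}, {8, 9}
best, score = refine(candidates, expected, unexpected)
```

Because parents compete with their offspring for survival, the returned score can never be worse than that of the best initial candidate; whether the search reaches the optimum depends on the seed and the number of generations.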
  • 28. Mapping Refinement Algorithm [Figure]
  • 29. Mapping Refinement: Validation
      Question: can mapping refinement improve the quality of the initial candidate mappings, and, if so, at what cost, i.e., what is the amount of user feedback required?
      Experiment: to answer the above question, we applied the following process for multiple iterations.
      1) Generate a sample of feedback instances.
      2) Annotate the set of candidate mappings.
      3) Refine the candidate mappings using the RefineMappings algorithm.
  • 30. Mapping Refinement: Validation (cont.) [Plot]
  • 31. Conclusions
      Pay-as-you-go Annotation of Schema Mappings
      – We showed how schema mappings can be incrementally annotated based on feedback supplied by end users.
      – We also showed through an evaluation exercise that the more feedback the user supplies, the better the quality of the mapping annotations computed.
      Application: Selection and Refinement of Schema Mappings in Dataspaces
      – Mapping annotations computed from user feedback are used as input for the selection and the refinement of schema mappings.
      – The evaluation exercises also showed that mapping refinement is more cost effective in the first feedback iterations.
  • 32. Feedback-Based Annotation, Selection and Refinement of Schema Mappings for Dataspaces. Khalid Belhajjame, Norman W. Paton, Suzanne M. Embury, Alvaro A. A. Fernandes, and Cornelia Hedeler