Towards Expertise Modelling for Routing Data Cleaning Tasks within a Community of Knowledge Workers

Presented at the ICIQ 2012

Published in: Technology
  • Personal background
  • Show start and the end before
  • Break the builds
  • The ranking allows us to select best worker based on scoring method

    1. Towards Expertise Modelling for Routing Data Cleaning Tasks within a Community of Knowledge Workers
       Umair ul Hassan, Sean O’Riain, Edward Curry
       Digital Enterprise Research Institute (www.deri.ie), National University of Ireland, Galway
       17th International Conference on Information Quality (ICIQ 2012), Paris, France
       Copyright 2012 Digital Enterprise Research Institute. All rights reserved.
    2. Agenda
       • Paper Overview
       • Motivation
       • Enterprise Data Landscape
       • Collaborative Data Quality
       • Human Computation
       • Problem Space
       • Task Routing
       • Challenges of Push Routing
       • CAMEE Prototype
       • DBpedia.org & SKOS
       • Expertise Assessment
       • Task Routing
       • Experiments
       • Summary
    3. Paper Overview
       • Motivation
         – Data quality management is limited to a few individuals (e.g. MDM)
         – Involve the community of users in data quality tasks
         – Tasks require expertise and domain knowledge
       • Problem
         – How to assess and model human expertise
         – How to effectively route tasks to appropriate workers
       • Contribution
         – Concept-based approach for modelling and assessing knowledge workers’ expertise
         – Concept-matching approach for routing data quality tasks
         – Prototype implementation using the SKOS vocabulary
    4. Motivation: Big Data & Information Quality
    5. Enterprise Data Landscape
       • Enterprises will have to deal with much more data in future
       • The Reality (Relevant External Data): all data relevant to the enterprise and its operations
       • The Known (Enterprise Data): data directly managed by the enterprise and its departments
       • The Managed (Collaboratively Managed): reference data managed through well-defined MDM policies and a governance council
       • Source: “The data deluge,” The Economist, Feb. 2010
    6. Collaborative Data Quality
       [Diagram labels: Developers, Data Governance, Data Sources, Data Quality Algorithms, Human Computation, External Crowd, Internal Community, Clean Data]
    7. Human Computation
       • Solve computationally hard problems with the help of humans
         – Algorithms control human workers
         – Computation is carried out by humans
       [Diagram labels: Developer, Algorithm, Workers; Define, Compute]
       * Barowy et al., “AutoMan: a platform for integrating human-based and digital computation,” OOPSLA ’12
    8. Human Computation
       [Diagram labels: Input, Task Design, Task Router, Output Aggregation, Output; Our Focus; phases: before computation, during computation, after computation]
       * Edith Law and Luis von Ahn, Human Computation: Core Research Questions and State of the Art
    9. Problem Space: Challenges of Task Routing in Collaborative Data Quality
    10. Task Routing
        • Pull Routing
          – System provides an interface to support workers
          – Workers actively seek tasks and assign them to themselves
        [Diagram labels: Algorithm, Tasks, Search & Browse Interface, Select, Workers, Result]
        * www.mturk.com
    11. Task Routing
        • Push Routing
          – System has complete control over the assignment of tasks, based on criteria such as expertise, cost, and latency
          – Workers passively receive tasks
        [Diagram labels: Algorithm, Tasks, Assign, Task Interface, Workers, Result]
        * www.mobileworks.com
    12. Challenges of Push Routing
        • Workers have different domain knowledge and expertise
          1. How to represent the expertise required for a task?
          2. How to assess and represent the expertise of workers?
          3. How to match a task with the expertise of workers?
    13. CAMEE: Collaborative Management of Enterprise Entities
        • Leverages concepts from data to build expertise profiles
          – Data Concepts: use concepts from the data sources
          – Task: associate data concepts with tasks
          – Expertise: profile worker expertise against concepts
          – Routing: leverage profiles for making routing decisions
    14. CAMEE architecture
        • Input (dirty dataset):
            dbp-res:X-Men:_First_Class rdfs:type dbp-owl:Film .
            foaf:name "X-Men: First Class"@en .
            dbp-prop:budget "9600.0" .
            dbp-owl:distributor dbp-res:20th_Century_Fox
        • Candidate update from the data quality algorithms:
            dbp-res:X-Men:_First_Class dbp-prop:released "25-05-2011"
        • Components: Task Manager, Data Quality Algorithms, Task Model, Expertise Model, Routing Model, Crowd Manager, Task UI, Feedback Manager
        • Crowd: Worker1 (Sci-Fi, Action, Adventure), Worker2 (Drama, Action, Thriller)
        • Numbered flow labels: 1) Update & Concepts, 2) Assessment, 3) Expertise, 4) Task, 5) Task, 6) Response
        • Example task: Was film “X-Men: First Class” released on 25 May 2011? Response: True
        • Output (clean dataset):
            dbp-res:X-Men:_First_Class rdfs:type dbp-owl:Film .
            foaf:name "X-Men: First Class"@en .
            dbp-prop:budget "9600.0" .
            dbp-owl:distributor dbp-res:20th_Century_Fox
            dbp-prop:released "25-05-2011"
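The flow on this slide (a candidate update becomes a yes/no task, and a confirming response moves the value into the clean dataset) can be sketched roughly as follows. This is only an illustration under assumptions: the `Triple` class, the function names, and the question template are hypothetical and are not taken from the CAMEE implementation.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Triple:
    """A single subject-predicate-object statement (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


def make_release_task(label: str, update: Triple) -> str:
    """Turn a candidate release-date update into a yes/no question.

    Assumes the date literal uses the DD-MM-YYYY layout seen on the slide.
    """
    released = datetime.strptime(update.obj, "%d-%m-%Y")
    month = released.strftime("%B")  # month name in the default locale
    return f'Was film "{label}" released on {released.day} {month} {released.year}?'


def apply_response(dataset: set, update: Triple, confirmed: bool) -> set:
    """Return a new dataset that includes the update only if a worker confirmed it."""
    return dataset | {update} if confirmed else set(dataset)


dirty = {
    Triple("dbp-res:X-Men:_First_Class", "dbp-owl:distributor", "dbp-res:20th_Century_Fox"),
}
update = Triple("dbp-res:X-Men:_First_Class", "dbp-prop:released", "25-05-2011")

question = make_release_task("X-Men: First Class", update)
clean = apply_response(dirty, update, confirmed=True)
```

A rejected update leaves the dataset unchanged, so the same function covers both worker answers.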
    15. Prototype: SKOS Concept-based Implementation of CAMEE
    16. Challenges of Push Routing
        • Workers have different domain knowledge and expertise
          1. How to represent the expertise required for a task?
             – DBpedia & SKOS concepts
          2. How to assess and represent the expertise of workers?
             – Expertise based on self-assessment or task assessment
          3. How to match a task with the expertise of workers?
             – Task routing based on matching
    17. DBpedia & SKOS
        • DBpedia.org
          – Structured database built from Wikipedia facts
        • Simple Knowledge Organization System (SKOS)
          – Common model for knowledge organization: facilitates interoperability and machine readability
          – “Concept” is the basic element: identified by a URI and represented with RDF
          – Defines concept schemes: hierarchical and associative relationships
        * www.dbpedia.org, www.w3.org/2004/02/skos/
    18. Example Entity
        * http://dbpedia.org/resource/A_Beautiful_Mind_(film)
    19. CAMEE with SKOS
        • Source Data
          – Entity: A Beautiful Mind
          – Properties & values:
              dbpedia-owl:Work/runtime 135.0
              dbpedia-owl:director dbpedia:Ron_Howard
              dbpedia-owl:producer dbpedia:Ron_Howard, dbpedia:Brian_Graze
              dbpedia-owl:starring dbpedia:Ed_Harris, dbpedia:Russell_Crowe
        • Data Quality Algorithm
          – Update: Missing Value
          – dbpedia-owl:writer = dbpedia:Akiva_Goldsman
          – SKOS Concepts: American_biographical_films, Films_set_in_the_1950s
        • Task Model
          – Task: Confirm Missing Value
          – Did Akiva Goldsman write the movie "A Beautiful Mind"?
          – SKOS Concepts: American_biographical_films, Films_set_in_the_1950s
        • Worker Expertise (Workers & Expertise Model)
          – SKOS Concepts: Films_set_in_the_1950s (Good), Films_about_psychiatry (Poor), American_drama_films (Fair)
        • Task Routing (Routing Model)
          – Match the task's SKOS concepts (American_biographical_films, Films_set_in_the_1950s) against the worker's rated concepts
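The matching step on this slide can be illustrated with a small scoring function. This is a minimal sketch, not the scoring method used in the paper: the numeric values assigned to the rating labels and the choice of averaging over the task's concepts are assumptions made for illustration, and `match_score` is a hypothetical name.

```python
# Assumed numeric values for the rating labels; the slides give no such mapping.
RATING = {"Poor": 0.25, "Fair": 0.5, "Good": 0.75, "Excellent": 1.0}


def match_score(task_concepts: set[str], worker_profile: dict[str, str]) -> float:
    """Mean rating over the task's concepts; concepts the worker has not rated count as 0."""
    if not task_concepts:
        return 0.0
    total = sum(
        RATING[worker_profile[concept]]
        for concept in task_concepts
        if concept in worker_profile
    )
    return total / len(task_concepts)


# Concepts and ratings copied from the slide.
task = {"American_biographical_films", "Films_set_in_the_1950s"}
worker = {
    "Films_set_in_the_1950s": "Good",
    "Films_about_psychiatry": "Poor",
    "American_drama_films": "Fair",
}

score = match_score(task, worker)  # only Films_set_in_the_1950s overlaps
```

Averaging over all task concepts penalises workers who know only part of a task's subject area; summing over the overlap alone would be an equally plausible alternative.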
    20. Expertise Assessment
        • Build profiles of workers
          – To quantify the expertise or knowledge levels of workers against concepts
        • Two approaches
          – Self-Assessment: workers rate their own knowledge of each concept
          – Task Assessment: workers provide responses to assessment tasks
        • Expertise profiles in the form of a matrix E(C,W)
          – where C is the set of concepts and W is the set of workers

          Concept                        Worker 1   Worker 2   Worker 3
          1990s_comedy-drama_films       0.6        0.2        0.2
          Films_about_psychiatry         0.6        0.2        0.6
          American_biographical_films    0.8        0.4        0.4
          American_comedy-drama_films    0.8        0.6        0.6
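One way to hold the matrix E(C,W) and rank workers for a task is sketched below, using the values from the table on this slide. The nested-dictionary layout, the `rank_workers` name, and the mean-over-concepts score are assumptions for illustration; the slides do not specify the scoring method.

```python
# Expertise matrix E(C, W): concept -> worker -> score, values copied from the slide.
E = {
    "1990s_comedy-drama_films":    {"Worker 1": 0.6, "Worker 2": 0.2, "Worker 3": 0.2},
    "Films_about_psychiatry":      {"Worker 1": 0.6, "Worker 2": 0.2, "Worker 3": 0.6},
    "American_biographical_films": {"Worker 1": 0.8, "Worker 2": 0.4, "Worker 3": 0.4},
    "American_comedy-drama_films": {"Worker 1": 0.8, "Worker 2": 0.6, "Worker 3": 0.6},
}


def rank_workers(matrix: dict, task_concepts: list[str]) -> list[tuple[str, float]]:
    """Rank workers by mean expertise over the task's concepts (assumed scoring rule)."""
    if not task_concepts:
        return []
    workers = sorted({w for row in matrix.values() for w in row})
    scores = {
        w: sum(matrix.get(c, {}).get(w, 0.0) for c in task_concepts) / len(task_concepts)
        for w in workers
    }
    # Highest score first; ties keep alphabetical worker order because sort is stable.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_workers(E, ["Films_about_psychiatry", "American_biographical_films"])
best_worker = ranking[0][0]
```

For a task tagged with Films_about_psychiatry and American_biographical_films, the mean scores are roughly 0.7, 0.5 and 0.3 for Workers 1, 3 and 2, so Worker 1 would be selected.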
    21. Example Self-Assessment
    22. Example Task-Assessment
    23. Task Routing
    24. Experiments: Leveraging Expertise Profiles for Task Routing
    25. Experiment
        • Hypothesis
          – Data quality tasks routed using concept-based expertise profiles have higher response rates if the expertise model is built using a task-assessment approach rather than a self-assessment approach.
        • Two stages of the experiment
          – Assessment stage (build profiles)
          – Routing stage (leverage profiles)
    26. Dataset
        • Popular movies in DBpedia
          – Top 100 grossing movies in Hollywood and Bollywood

          Characteristic                    Value
          Number of entities (dbp:Film)     724
          No. of concepts (film genres)     42
          No. of data quality tasks         230

        • Knowledge workers

          Characteristic                    Value
          No. of workers                    11
          Tasks for assessment stage        100
          Tasks for routing stage           130
    27. Response Rate
        • Hypothesis
          – Data quality tasks routed using concept-based expertise profiles have higher response rates if the expertise model is built using a task-assessment approach rather than a self-assessment approach.
        • Data: responses by routing approach (assessment method)

          Response             Random     Matching             Matching
                                          (Self-Assessment)    (Task Assessment)
          Don't know           71.54%     58.46%               10.00%
          Strongly Disagree     5.38%     16.92%               29.23%
          Disagree              6.92%      2.31%               13.08%
          Neutral               2.31%      2.31%                8.46%
          Agree                 3.85%      4.62%               12.31%
          Strongly Agree       10.00%     15.38%               26.92%
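The table can be read as a response rate per routing approach if a "Don't know" answer is treated as a non-response. That reading is an assumption, as is taking each column over the 130 routing-stage tasks reported on the dataset slide (the percentages are consistent with whole-task counts out of 130). A short calculation under those assumptions:

```python
# Share of "Don't know" answers per routing approach, copied from the table (percent).
DONT_KNOW = {
    "Random": 71.54,
    "Matching (Self-Assessment)": 58.46,
    "Matching (Task Assessment)": 10.00,
}
ROUTING_TASKS = 130  # tasks in the routing stage, from the dataset slide

# Assumed definition: response rate = 100% minus the "Don't know" share.
response_rate = {approach: 100.0 - share for approach, share in DONT_KNOW.items()}

# Approximate number of answered tasks, assuming each column covers all 130 tasks.
answered = {
    approach: ROUTING_TASKS - round(share / 100.0 * ROUTING_TASKS)
    for approach, share in DONT_KNOW.items()
}

for approach, rate in response_rate.items():
    print(f"{approach}: {rate:.2f}% ({answered[approach]} of {ROUTING_TASKS} tasks)")
```

Under this reading the response rate rises from about 28% for random routing to about 42% with self-assessment profiles and 90% with task-assessment profiles, which is the direction the hypothesis predicts.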
    28. Assessment Effort
        • Combine self-assessment with task assessment
          – Filtering assessment tasks based on self-rated concepts reduces the effort required during assessment
          – For example, filter tasks with concepts of Good or higher self-rating
        [Chart: effort (average decisions per worker, axis 0 to 150) by assessment method for expertise profiling: RND, SA, TA, SA&TA, SA&TA (Poor+), SA&TA (Fair+), SA&TA (Good+), SA&TA (Excellent). SA: Self-Assessment; TA: Task Assessment]
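The filtering idea on this slide can be sketched as follows. The rating scale is taken from the chart labels; the rule that a task is kept when at least one of its concepts meets the threshold is an assumption (the slides do not say whether one or all concepts must qualify), and `filter_assessment_tasks` is a hypothetical name.

```python
# Ordinal self-rating scale, lowest to highest, as labelled on the chart.
LEVELS = ["Poor", "Fair", "Good", "Excellent"]


def filter_assessment_tasks(
    tasks: dict[str, set[str]],
    self_ratings: dict[str, str],
    minimum: str = "Good",
) -> list[str]:
    """Keep tasks with at least one concept self-rated at `minimum` or higher.

    Concepts the worker did not rate are treated as not meeting the threshold.
    """
    threshold = LEVELS.index(minimum)
    return [
        task_id
        for task_id, concepts in tasks.items()
        if any(
            LEVELS.index(self_ratings[c]) >= threshold
            for c in concepts
            if c in self_ratings
        )
    ]


# Illustrative data: task ids are made up, concept names come from the slides.
tasks = {
    "t1": {"American_biographical_films", "Films_set_in_the_1950s"},
    "t2": {"Films_about_psychiatry"},
    "t3": {"American_drama_films", "1990s_comedy-drama_films"},
}
self_ratings = {
    "Films_set_in_the_1950s": "Good",
    "Films_about_psychiatry": "Poor",
    "American_drama_films": "Fair",
}

good_plus = filter_assessment_tasks(tasks, self_ratings, minimum="Good")
fair_plus = filter_assessment_tasks(tasks, self_ratings, minimum="Fair")
```

Raising the threshold shrinks the set of assessment tasks shown to a worker, which is the effort reduction the chart describes.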
    29. Task Routing
        • Likelihood of response and quality of response remain near maximum during the routing stage
          – For example, filter tasks with concepts of Good or higher self-rating
        [Chart: Response Rate and Accuracy (axis 0% to 100%) by assessment method for expertise profiling: RND, SA, TA, SA&TA, SA&TA (Poor+), SA&TA (Fair+), SA&TA (Good+), SA&TA (Excellent); the chart also carries the label Effort (average decisions per worker). SA: Self-Assessment; TA: Task Assessment]
    30. Summary
        • Conclusion
          – Effective task routing is a fundamental aspect of collaborative data quality management
          – Concepts are effective for expertise assessment and modelling
          – Task routing that leverages task-assessment-based profiles has a better likelihood of response from workers
        • Future directions
          – Load balancing under constraints: cost, latency, motivation, expertise, utility
          – Trade-off between assessment for profiling and exploitation
    31. Further Reading
        • 17th International Conference on Information Quality (ICIQ 2012), Paris, 16-17 November 2012
        • U. Ul Hassan, S. O’Riain, and E. Curry, “Towards Expertise Modelling for Routing Data Cleaning Tasks within a Community of Knowledge Workers,” in 17th International Conference on Information Quality (ICIQ ’12), 2012.
        • http://www.deri.ie/about/team/member/umair_ul_hassan/
    32. Selected References: Big Data & Data Quality
        • S. Lavalle, E. Lesser, R. Shockley, M. S. Hopkins, and N. Kruschwitz, “Big Data, Analytics and the Path from Insights to Value,” MIT Sloan Management Review, vol. 52, no. 2, pp. 21–32, 2011.
        • A. Haug and J. S. Arlbjørn, “Barriers to master data quality,” Journal of Enterprise Information Management, vol. 24, no. 3, pp. 288–303, 2011.
        • R. Silvola, O. Jaaskelainen, H. Kropsu-Vehkapera, and H. Haapasalo, “Managing one master data – challenges and preconditions,” Industrial Management & Data Systems, vol. 111, no. 1, pp. 146–162, 2011.
        • E. Curry, S. Hasan, and S. O’Riain, “Enterprise Energy Management using a Linked Dataspace for Energy Intelligence,” in Second IFIP Conference on Sustainable Internet and ICT for Sustainability, 2012.
        • D. Loshin, Master Data Management. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2008.
        • B. Otto and A. Reichert, “Organizing Master Data Management: Findings from an Expert Survey,” in Proceedings of the 2010 ACM Symposium on Applied Computing (SAC ’10), 2010, pp. 106–110.
    33. Selected References: Collective Intelligence, Crowdsourcing & Human Computation
        • E. Curry, A. Freitas, and S. O’Riain, “The Role of Community-Driven Data Curation for Enterprises,” in Linking Enterprise Data, D. Wood, Ed. Boston, MA: Springer US, 2010, pp. 25–47.
        • A. Doan, R. Ramakrishnan, and A. Y. Halevy, “Crowdsourcing systems on the World-Wide Web,” Communications of the ACM, vol. 54, no. 4, p. 86, Apr. 2011.
        • E. Law and L. von Ahn, “Human Computation,” Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 5, no. 3, pp. 1–121, Jun. 2011.
        • M. J. Franklin, D. Kossmann, T. Kraska, S. Ramesh, and R. Xin, “CrowdDB: Answering Queries with Crowdsourcing,” in Proceedings of the 2011 international conference on Management of data (SIGMOD ’11), 2011, p. 61.
        • P. Wichmann, A. Borek, R. Kern, P. Woodall, A. K. Parlikad, and G. Satzger, “Exploring the ‘Crowd’ as Enabler of Better Information Quality,” in Proceedings of the 16th International Conference on Information Quality, 2011, pp. 302–312.
    34. Selected References: Expert Finding
        • K. Balog, L. Azzopardi, and M. de Rijke, “Formal models for expert finding in enterprise corpora,” in Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR ’06), 2006, p. 43.
        • K. Balog, T. Bogers, L. Azzopardi, M. de Rijke, and A. van den Bosch, “Broad expertise retrieval in sparse data environments,” in Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR ’07), 2007, p. 551.
        • K. Balog and M. de Rijke, “Determining expert profiles (with an application to expert finding),” in Proceedings of the 20th international joint conference on Artificial intelligence, 2007, pp. 2657–2662.
    35. Selected References: Linked Data & User Feedback
        • S. O’Riain, E. Curry, and A. Harth, “XBRL and open data for global financial ecosystems: A linked data approach,” International Journal of Accounting Information Systems, Mar. 2012.
        • U. Ul Hassan, S. O’Riain, and E. Curry, “Leveraging Matching Dependencies for Guided User Feedback in Linked Data Applications,” in 9th International Workshop on Information Integration on the Web (IIWeb2012), 2012.
        • A. Miles and J. R. Pérez-Agüera, “SKOS: Simple Knowledge Organisation for the Web,” Cataloging & Classification Quarterly, vol. 43, no. 3–4, pp. 69–83, Apr. 2007.
        • C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C. Becker, R. Cyganiak, and S. Hellmann, “DBpedia - A crystallization point for the Web of Data,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 7, no. 3, pp. 154–165, Sep. 2009.
        • S. R. Jeffery, M. J. Franklin, and A. Y. Halevy, “Pay-as-you-go user feedback for dataspace systems,” in Proceedings of the 2008 ACM SIGMOD international conference on Management of data (SIGMOD ’08), 2008, pp. 847–860.
