Heterogeneous data annotation

  1. Heterogeneous Data Annotation Approaches (Application in Transportation Domain)
     Presented by: Yomna M. I. Hassan
     Submitted to: Dr. Abeer ElKorany
  2. Content
     • What is Annotation?
     • Why Annotation? (Motivation)
     • Related Research Tracks
     • Heterogeneous Data Types
     • Why Focus on Heterogeneous Data?
     • Annotation Techniques
     • Research on Traffic Data
     • Next Steps
  3. What is Annotation?
     • Annotation = Tagging = Labeling (in research)
     • Associating metadata with content
     • Simple keywords
     • Tags are either descriptive, structural, or administrative (like license numbers)
     • Types of annotation:
       • Content-dependent
       • Content-independent: such as data location, data format, authorship, etc. [1]
  4. Why Annotation? (Motivation)
     • Data integration and unification
     • Facilitation of search and data retrieval
     • Failure of normal data integration methods such as indexing (as the database has no standard schema-like structure) [1]
  5. Related Research Tracks within Semantics
     • Mobile business intelligence
     • Interoperability of semantic data
     • Scalability of semantic web methods and tools
  6. Heterogeneous Data Types
     • Heterogeneous in sources
     • Heterogeneous in formats
     • Heterogeneous in granularity
  7. Why Focus on Heterogeneous Data?
     • Web 2.0 and the expansion of user involvement in data generation
     • Increased confidence and accuracy, and reduced ambiguity
     • Increased robustness: one sensor can contribute information where others are unavailable, inoperative, or ineffective
     • Enhanced spatial and temporal coverage: one sensor can work when or where another sensor cannot
     • Decreased costs: (a) a suite of "average" sensors can achieve the same level of performance as a single, highly reliable sensor at a significantly lower cost, and (b) fewer sensors may be required to obtain a picture of the system state that is sufficient for a particular application
  8. Annotation Techniques
     • Annotation
       • Manual
       • Semi-Automated
       • Automated
  9. Basic tagging system [2]
  10. Generalized Framework for Heterogeneous Data Annotation
     • Format unification
     • Domain-related extraction
     • Annotation (labeling) based on available information
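The three steps on this slide can be sketched as a small pipeline. This is only an illustration under stated assumptions: the function names (`unify`, `extract`, `annotate`), the `DOMAIN_TERMS` vocabulary, and the two sample feeds are hypothetical and do not come from the slides or the cited work.

```python
import csv
import io
import json

# Hypothetical domain vocabulary (keyword -> label); not taken from the slides.
DOMAIN_TERMS = {
    "accident": "Incident",
    "congestion": "TrafficState",
    "speed": "Measurement",
}


def unify(raw, fmt):
    """Step 1, format unification: turn JSON or CSV text into a list of dicts."""
    if fmt == "json":
        data = json.loads(raw)
        return data if isinstance(data, list) else [data]
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(raw)))
    raise ValueError(f"unsupported format: {fmt}")


def extract(record):
    """Step 2, domain-related extraction: find domain terms in the field values."""
    text = " ".join(str(value) for value in record.values()).lower()
    return sorted(term for term in DOMAIN_TERMS if term in text)


def annotate(record):
    """Step 3, annotation: attach labels derived from the extracted terms."""
    labels = sorted({DOMAIN_TERMS[term] for term in extract(record)})
    return {**record, "labels": labels}


# Two sources in different formats (made-up sample data).
json_feed = '[{"source": "sensor-12", "note": "Speed drop, congestion ahead"}]'
csv_feed = "source,note\ntweet-7,Accident on ring road\n"

records = unify(json_feed, "json") + unify(csv_feed, "csv")
annotated = [annotate(record) for record in records]
```

Keeping the steps as separate functions means a new source format only touches `unify`, while the extraction and labeling logic stays shared across sources.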
  11. Examples of Automated Tagging Research
     • Automated tagging through a training dataset
     • Rule-based annotation
     • Semi-automated annotation framework
     • Other examples
  12. Automated Tagging through Training Dataset [3]
     • Framework for real-time tag recommendation. The tagged training documents are treated as triplets of (words, docs, tags).
     • A two-way Poisson Mixture Model (PMM) is proposed to model the document distribution into mixture components within each cluster and aggregate words into word clusters simultaneously.
     • A new document is classified by the mixture model based on its posterior probabilities so that tags are recommended according to their ranks.
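The last bullet, ranking tags by posterior probability, can be illustrated with a much simplified sketch. It is not the two-way PMM of [3]: there is no word clustering and no training step, and the vocabulary, component weights, Poisson rates, and tag distributions below are hand-set, hypothetical values chosen only to show the posterior computation.

```python
import math

VOCAB = ["traffic", "road", "gene", "protein"]

# Hypothetical, hand-set parameters; a real model would be fit on tagged documents.
COMPONENTS = [
    {"weight": 0.5, "rates": [3.0, 2.0, 0.1, 0.1],
     "tags": {"transport": 0.8, "sensors": 0.2}},
    {"weight": 0.5, "rates": [0.1, 0.1, 3.0, 2.0],
     "tags": {"biology": 0.9, "sensors": 0.1}},
]


def log_poisson(count, rate):
    """Log of the Poisson probability of seeing `count` given mean `rate`."""
    return count * math.log(rate) - rate - math.lgamma(count + 1)


def posteriors(counts):
    """Posterior probability of each mixture component for a word-count vector."""
    logs = [
        math.log(comp["weight"])
        + sum(log_poisson(k, r) for k, r in zip(counts, comp["rates"]))
        for comp in COMPONENTS
    ]
    top = max(logs)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(value - top) for value in logs]
    total = sum(weights)
    return [w / total for w in weights]


def recommend(counts):
    """Rank tags by their probability summed over components, weighted by posterior."""
    scores = {}
    for posterior, comp in zip(posteriors(counts), COMPONENTS):
        for tag, prob in comp["tags"].items():
            scores[tag] = scores.get(tag, 0.0) + posterior * prob
    return sorted(scores, key=scores.get, reverse=True)


# Word counts for a new document, in VOCAB order.
ranked = recommend([4, 1, 0, 0])
```

For a document dominated by "traffic" and "road", the first component takes nearly all of the posterior mass, so its tags lead the ranking.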
  13. Rule-based Annotation [4]
     • XML description of each sensor
     • Matching with the domain ontology is done through a rule-based approach
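A minimal sketch of the idea, assuming a made-up sensor description and rule set: the XML element and attribute names, the sensor id, and the concept names such as `traffic:Speed` are all hypothetical and are not taken from [4].

```python
import xml.etree.ElementTree as ET

# Hypothetical XML description of one sensor.
SENSOR_XML = """
<sensor id="s-101">
  <property name="vehicle_speed" unit="km/h"/>
  <property name="air_temp" unit="celsius"/>
  <property name="battery" unit="volt"/>
</sensor>
"""

# Each rule pairs a predicate over a property's attributes with an ontology
# concept. Both the predicates and the concept names are illustrative.
RULES = [
    (lambda attrs: "speed" in attrs["name"], "traffic:Speed"),
    (lambda attrs: attrs["unit"] in {"celsius", "kelvin"}, "weather:Temperature"),
]


def annotate_sensor(xml_text):
    """Return the sensor id and, per property, the concepts whose rule matched."""
    root = ET.fromstring(xml_text.strip())
    annotations = {}
    for prop in root.findall("property"):
        attrs = prop.attrib
        annotations[attrs["name"]] = [
            concept for matches, concept in RULES if matches(attrs)
        ]
    return root.get("id"), annotations


sensor_id, concepts = annotate_sensor(SENSOR_XML)
```

Properties that no rule matches, such as `battery` here, keep an empty concept list, which makes unannotated fields easy to spot and route to manual review.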
  14. Semi-automated Annotation Framework [5]
     • Was used in a shared museum system
     • Terminology: turn from literal values to concepts
     • Annomobile (automated annotation): based on feature matching in XML with the domain ontology
  15. Other Examples [6]
     • Annotation for gene prediction
     • Spatial annotation (image matching based on location within body structure)
     • Textual annotation (by word pattern matching)
  16. Research on Traffic Data
     • Annotation can be helpful for different types of traffic analysis, such as traffic time estimation, accident avoidance, and accident notification
  17. HMM for Automatic Annotation of Traffic Trajectories [8]
     • Algorithms for integrating information from geographic objects (with the spatial extent of points, lines, or regions)
     • HMM for semantic annotation of stops
     • The trajectory: a sequence of (x, y, t) points
     • A sequence of stops is computed and forms the real observation (O)
     • The exact POI data are the superficial hidden states, whilst the POI categories are the real hidden states that we are interested in
     • Our goal is to identify the real hidden states and use them to annotate the stops
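Recovering the hidden POI categories from a sequence of stops is a standard HMM decoding problem, which can be sketched with the Viterbi algorithm. The sketch below is an assumption-laden illustration, not the model of [8]: each stop is reduced to a hypothetical zone type, and the category names and all start, transition, and emission probabilities are invented for the example.

```python
import math

# Hidden states: POI categories (hypothetical).
STATES = ["home", "office", "shop"]

# All probabilities below are invented for illustration.
START = {"home": 0.6, "office": 0.3, "shop": 0.1}
TRANS = {
    "home": {"home": 0.2, "office": 0.6, "shop": 0.2},
    "office": {"home": 0.4, "office": 0.3, "shop": 0.3},
    "shop": {"home": 0.6, "office": 0.2, "shop": 0.2},
}
# Emission: probability of observing a stop in a given zone type per category.
EMIT = {
    "home": {"residential": 0.8, "business": 0.1, "mall": 0.1},
    "office": {"residential": 0.1, "business": 0.8, "mall": 0.1},
    "shop": {"residential": 0.1, "business": 0.2, "mall": 0.7},
}


def viterbi(observations):
    """Most likely category sequence for a non-empty sequence of observed stops."""
    best = {
        s: math.log(START[s]) + math.log(EMIT[s][observations[0]]) for s in STATES
    }
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for obs in observations[1:]:
        step, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda q: best[q] + math.log(TRANS[q][s]))
            step[s] = best[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][obs])
            pointers[s] = prev
        best = step
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=best.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


stops = ["residential", "business", "mall", "residential"]
labels = viterbi(stops)
```

Working in log space avoids numeric underflow on long trajectories; the decoded `labels` are what would be attached to the stops as annotations.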
  18. Next Steps
     • Focus on image annotation
     • Potential challenges faced within image annotation:
       1. Handling differences in format
       2. Handling differences in the information details available with the image
       3. Implicit details (for example: camera positioning, zooming, etc.)
     • Look into details related to transportation systems
  19. Thank you
  20. References
     [1] Gertz, Michael. "Data annotation in collaborative research environments." Workshop on Data Derivation and Provenance. 2002.
     [2] Kim, Hak Lae, et al. "Review and alignment of tag ontologies for semantically-linked data in collaborative tagging spaces." Semantic Computing, 2008 IEEE International Conference on. IEEE, 2008.
     [3] Song, Yang, et al. "Real-time automatic tag recommendation." Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 2008.
     [4] Moraru, Alexandra, Carolina Fortuna, and Dunja Mladenić. "Using semantic annotation for knowledge extraction from geographically distributed and heterogeneous sensor data." 4th SensorKDD. ACM (2010).
     [5] Hyvönen, Eero, Mirva Salminen, and Miikka Junnila. "Annotation of heterogeneous database content for the semantic web." Proceedings of the 4th International Workshop on Knowledge Markup and Semantic Annotation (SemAnnot 2004), Hiroshima, Japan. 2004.
     [6] Richardson, Lorna, et al. "EMAGE mouse embryo spatial gene expression database: 2010 update." Nucleic acids research 38.suppl 1 (2010): D703-D709.
     [7] Ou, Qing. Fusing Heterogeneous Traffic Data: Parsimonious Approaches Using Data-data Consistency. Netherlands TRAIL Research School, 2011.
     [8] Yan, Zhixian, et al. "SeMiTri: a framework for semantic annotation of heterogeneous trajectories." Proceedings of the 14th international conference on extending database technology. ACM, 2011.
