Metadata for Information
Resources
Metadata
• Metadata, literally “data about data,” has
become a widely used yet still frequently
underspecified term that is understood in
different ways by the diverse professional
communities that design, create, describe,
preserve, and use information systems and
resources
Metadata
• For the past hundred years at least, the creation
and management of metadata has primarily been
the responsibility of information professionals
engaged in
– cataloging
– classification
– indexing
• But as information resources are increasingly put
online by the general public, metadata
considerations are no longer solely the province
of information professionals.
Metadata Creation
• Metadata creation is—or should often be—a
collaborative effort
• Until the mid-1990s, metadata was a term
used primarily by communities involved with
the management and interoperability of
geospatial data and with data management
and systems design and maintenance in
general
Metadata
• In general, all information objects, regardless
of the physical or intellectual form they take,
have three features:
– Content
– Context
– Structure
• All three can and should be reflected
through metadata.
Metadata
• Content relates to what the object contains or is
about and is intrinsic to an information object.
• Context indicates the who, what, why, where, and
how aspects associated with the object’s creation
and is extrinsic to an information object.
• Structure relates to the formal set of associations
within or among individual information objects and
can be intrinsic (basic, inherent, essential) or extrinsic
or both.
Metadata
• Cultural heritage information professionals such as
museum registrars, library catalogers, and archival
processors often apply the term metadata to the
value-added information that they create to arrange,
describe, track, and otherwise enhance access to
information objects and the physical collections related
to those objects.
• Such metadata is frequently governed by community-
developed and community-fostered standards and best
practices in order to ensure quality, consistency, and
interoperability (info exchange).
Metadata Authors
• Metadata created by users
• Metadata created by trained information
professionals.
• Activities such as
– social tagging
– social bookmarking
• The resulting forms of user-created metadata
such as “folksonomies”
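As a rough illustration of how user-created tags add up to a folksonomy, the sketch below counts how often each tag is applied across users. The tagging data, the example.org URLs, and the `build_folksonomy` helper are all made up for illustration; real social tagging services may normalize tags differently.

```python
from collections import Counter

def build_folksonomy(taggings):
    """Count how often each normalized tag was applied across all users."""
    counts = Counter()
    for _user, _resource, tags in taggings:
        # Normalize case and whitespace so "Metadata " and "metadata" merge.
        counts.update(tag.strip().lower() for tag in tags)
    return counts

# Hypothetical social-bookmarking data: (user, resource, tags applied).
taggings = [
    ("alice", "http://example.org/a", ["Metadata", "libraries"]),
    ("bob", "http://example.org/a", ["metadata", "cataloging"]),
    ("carol", "http://example.org/b", ["metadata "]),
]

folksonomy = build_folksonomy(taggings)
print(folksonomy.most_common(2))
```

No controlled vocabulary is involved here: the vocabulary is simply whatever the users typed, which is what distinguishes a folksonomy from professionally created metadata.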
Web Search and Ranking
• How do Web search engines work?
• How do they use metadata, data, links, and
relevance ranking to help users find what they
are seeking?
Importance of Metadata
• Hardware and software come and go, sometimes
becoming obsolete with alarming rapidity
• But high-quality, standards-based, system-independent
metadata can be
– used
– reused
– migrated
– disseminated (spread)
in any number of ways, even in ways that we cannot
anticipate at this moment
Digitization+Metadata
• Digitization combined with the creation of
carefully crafted metadata can significantly
enhance end-user access
Information Resources
• Our users are the primary reason that we
create digital resources.
• Exercise
• Need a list of Information resources and
related objects
1. Photographs
2. Text Books
3. Journals
4. Research papers
5. Audios
6. Articles
7. Magazines
8. Videos
9. Publications
10. Media
11. Blogs
12. Websites
13. Encyclopedias
14. Expert opinions
15. Databases
16. Newspapers
17. Almanacs
18. Conference proceedings
19. Dictionaries
20. Encyclopedias
21. Handbooks
22. Diaries
23. Interviews
24. Letters
25. Original works of art
26. Speeches
27. Works of literature
28. Biographies
29. Dissertations
30. Indexes, abstracts, bibliographies (used to locate a secondary source)
31. Journal articles
32. Monographs
Twitter Metadata
• Twitter has the following objects
– Users
– Tweets
– Entities: media, urls, user_mentions,
hashtags, symbols
Users Metadata
• Users can be anyone or anything.
They tweet, follow, create lists, have a
home_timeline, can be mentioned, and can
be looked up in bulk
• User metadata contains the following
fields:
Users Metadata
• contributors_enabled
– Indicates that the user has an account with “contributor
mode” enabled, allowing Tweets issued by the user to be
co-authored by another account.
• created_at
– The UTC datetime at which the user account was created on
Twitter.
• default_profile
– When true, indicates that the user has not altered the theme or
background of their user profile.
Users Metadata
• default_profile_image
– When true, indicates that the user has not uploaded their
own avatar and a default egg avatar is used instead.
• id
– The integer representation of the unique identifier for this
User.
• id_str
– The string representation of the unique identifier for this
User.
Users Metadata
• description
– String describing the account
• favourites_count
• follow_request_sent
• following
• followers_count
• friends_count
• …more attributes
Tweets Metadata
• The text of a tweet is just 140 characters.
• But a tweet is also filled with metadata:
information about when it was sent, by whom, using what Twitter
application, and so on.
• contributors
– A collection of brief user objects (usually only one) indicating
users who contributed to the authorship of the tweet
• coordinates
– Represents the geographic location of this Tweet as reported by
the user or client application
• created_at
– UTC time when this Tweet was created.
Tweets Metadata
• current_user_retweet
– Only surfaces on methods supporting
the include_my_retweet parameter, when set to true.
• entities
– Entities which have been parsed out of the text of the Tweet
• id
– The integer representation of the unique identifier for this
Tweet.
• id_str
– The string representation of the unique identifier for this
Tweet.
• text
– The actual UTF-8 text of the status update
• …more
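To make the fields above concrete, here is a minimal sketch that loads a tweet-like JSON payload and reads a few of its metadata fields. The payload and all of its values are invented for illustration, and the `created_at` layout is assumed to be the day/month/time/offset/year form shown in the sample string; a real API response carries many more fields.

```python
import json
from datetime import datetime, timezone

# Hypothetical tweet payload; every value here is invented for illustration.
raw = """
{
  "created_at": "Wed Aug 27 13:08:45 +0000 2008",
  "id": 1050118621198921728,
  "id_str": "1050118621198921728",
  "text": "Metadata is data about data #metadata",
  "coordinates": null,
  "user": {"id_str": "12345", "followers_count": 42, "default_profile_image": true}
}
"""

tweet = json.loads(raw)

# Assumed layout: abbreviated English day and month names plus a UTC offset.
created = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
created_utc = created.astimezone(timezone.utc)

# id and id_str carry the same identifier in two representations.
assert str(tweet["id"]) == tweet["id_str"]

print(created_utc.isoformat())
print(len(tweet["text"]), "characters of text,", len(tweet), "top-level fields")
```

The string form `id_str` is useful for clients whose number type cannot hold large integers exactly; Python integers have no such limit, so both forms agree here.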
Places Metadata
• locality: the city the place is in
• region: the administrative region the place is in
• iso3: the country code
• postal_code: in the preferred local format for the place
• phone: in the preferred local format for the place,
including the long-distance code
• twitter: Twitter screen name, without the @
• url: official/canonical URL for the place
• app:id: an ID or comma-separated list of IDs
representing the place in the application's
place database
Entities for Tweets: media entity
• An array of media attached to the Tweet with the Twitter
Photo Upload feature.
– id: the media ID (int format)
– id_str: the media ID (string format)
– media_url: the URL of the media file
– media_url_https: the SSL URL of the media file
– url: the media URL that was extracted
– display_url: not a URL but a string to display instead of the media URL
– expanded_url: the fully resolved media URL
– sizes: thumb, small, medium, and large
– type: only photo for now
– indices: the character positions the media was extracted from
urls entity
• An array of URLs extracted from the Tweet text. Each URL
entity comes with the following attributes:
– url: the t.co URL that was extracted from the Tweet text
– display_url: not a valid URL but a string to display instead of the URL
– expanded_url: the resolved URL
– indices: the character positions the URL was extracted from
user_mentions entity
– id: the user ID (int format)
– id_str: the user ID (string format)
– screen_name: the user screen name
– name: the user full name
– indices: the character positions the user mention was extracted from
hashtags entity
– text: the hashtag text
– indices: the character positions the hashtag was extracted from
symbols entity
• An array of financial symbols starting with the
dollar sign extracted from the Tweet text
– text: the symbol text
– indices: the character positions the symbol was extracted from
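Each entity type above carries an indices field. A small sketch of how those positions can be used to cut the entity back out of the tweet text follows. The tweet text, the entity values, and the `slice_entities` helper are hypothetical, and the indices are assumed to be a [start, end) pair of character offsets.

```python
def slice_entities(text, entities):
    """Return (kind, substring) pairs cut out of text using each entity's indices."""
    found = []
    for kind, items in entities.items():
        for item in items:
            start, end = item["indices"]  # assumed [start, end) character offsets
            found.append((kind, text[start:end]))
    return found

# Hypothetical tweet text and entities; offsets were counted by hand for this string.
text = "Reading #metadata notes by @example_user $ABC"
entities = {
    "hashtags": [{"text": "metadata", "indices": [8, 17]}],
    "user_mentions": [{"screen_name": "example_user", "indices": [27, 40]}],
    "symbols": [{"text": "ABC", "indices": [41, 45]}],
}

for kind, fragment in slice_entities(text, entities):
    print(kind, fragment)
```

Note that the extracted fragment includes the leading `#`, `@`, or `$`, while the entity's own text or screen_name field does not.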
Photo metadata: a simple concept
There are 3 main categories of data:
• Administrative – identification of the creator, creation
date and location, contact information for licensors of
the image, and other technical details.
• Descriptive – information about the visual content.
This may include headline, title, captions and
keywords. This can be done using free text or codes
from a controlled vocabulary.
• Rights – copyright information and underlying rights in
the visual content including model and property rights,
and rights usage terms.
Classes Of Metadata
• Technical Metadata
1. Most modern image-capture devices generate
information about themselves and the pictures
they record, such as that stored in Exif.
2. These data describe an image’s technical
characteristics, such as its size, color profile, ISO
speed and other camera settings.
3. Some professional cameras can be configured to
add detailed ownership and descriptive
information.
Descriptive Metadata
A photographer or image collection manager can enter and embed various
information about an image’s contents.
This can include
1. captions,
2. headlines,
3. titles,
4. keywords,
5. location of capture, etc.
These metadata fields were included in the original IPTC-IIM schema
and expanded in the IPTC Core and IPTC Extension metadata schemas.
Good descriptive metadata are key to unlocking an image collection to find stored
images.
Administrative Metadata
Image files can also include
1. licensing or rights usage terms,
2. specific restrictions on using an image,
3. model releases,
4. source information, such as the identity of the creator,
and contact information for the rights holder or
licensor.
These types of metadata have been comprehensively
addressed and standardized within the PLUS system.
TABLE 1 - ARTICLE IDENTIFICATION
News Metadata
Weather Metadata
Observing site name: Aviemore
Location (deg lat, deg long): 57N, 3W
Elevation: 226m
Parameter observed: temperature
Operator: Met Office
Started: 11:50 01/01/01
Ended: 12:00 01/01/01
Value: 4
Units: Celsius
Instrument
instrument number: 123456
instrument inspection date: 01/01/01
instrument type: Magic mercury 1234
Data
Metadata
1000 Genomes
• Example 1000 Genomes Data
• CHROM 4
• POS 42208061
• ID rs186575857
• REF T
• ALT C
• QUAL 100
• FILTER PASS
• INFO AA=T;AN=2184;AC=1;RSQ=0.8138;AF=0.0005;
• FORMAT GT:DS:GL
• GENOTYPE 0|0:0.000:-0.03,-1.19,-5.00
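The record above packs several metadata values into single fields. As a minimal sketch, the INFO string can be split into key/value pairs and the FORMAT keys can be paired with the genotype values; the helper names `parse_info` and `parse_genotype` are made up for this example, and the values are copied from the slide.

```python
def parse_info(info):
    """Split a semicolon-separated INFO string into a key/value dict."""
    result = {}
    for item in info.split(";"):
        if not item:
            continue  # skip the empty piece left by a trailing semicolon
        key, _, value = item.partition("=")
        result[key] = value
    return result

def parse_genotype(format_spec, sample):
    """Pair each colon-separated FORMAT key with the matching sample value."""
    return dict(zip(format_spec.split(":"), sample.split(":")))

# Field values taken from the example record on the slide.
record = {
    "CHROM": "4", "POS": 42208061, "ID": "rs186575857",
    "REF": "T", "ALT": "C", "QUAL": "100", "FILTER": "PASS",
    "INFO": "AA=T;AN=2184;AC=1;RSQ=0.8138;AF=0.0005;",
    "FORMAT": "GT:DS:GL",
    "GENOTYPE": "0|0:0.000:-0.03,-1.19,-5.00",
}

info = parse_info(record["INFO"])
genotype = parse_genotype(record["FORMAT"], record["GENOTYPE"])
print(info)
print(genotype)
```

In this record the AF value is close to AC divided by AN (1 / 2184), which shows how the fields describe one another as well as the variant.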
Facebook Comments Metadata
Photos
Understanding Metadata
Metadata is key to ensuring that
resources will survive and continue to be
accessible into the future
• Metadata is structured information that
– describes,
– explains,
– locates, or
– otherwise makes it easier to retrieve, use, or manage
an information resource.
• Metadata is often called
– data about data or
– Information about information.
Usage of the Term Metadata
• The term is used differently in different communities.
• Some use it to refer to machine-understandable
information, while
• others use it only for records that describe electronic
resources.
• In the library environment, metadata is commonly used
for any formal scheme of resource description, applying
to any type of object (digital or non-digital).
• Traditional library cataloging is a form of metadata.
MARC 21 & AACR
• MARC 21 and the rule sets used with it, such as AACR2,
are metadata standards.
• MARC 21 (Machine-Readable Cataloging) Format for
Bibliographic Data: 1999 Edition, Update No. 1 (October 2000)
through Update No. 21 (September 2015), Library of Congress
• AACR (Anglo-American Cataloguing Rules) and its allied
products are published jointly by the American Library
Association, the Canadian Library Association, and the
Chartered Institute of Library and Information
Professionals.
• Metadata schemes have also been developed to
describe various types of textual and non-
textual objects including
– Published books
– electronic documents
– Archival finding aids
– Art objects
– Educational and training materials and
– Scientific datasets
Metadata Types
There are three main types of metadata:
• Descriptive metadata - describes a resource for
purposes such as discovery and identification.
• It can include elements such as
– title, abstract, author, and keywords.
• Structural metadata - indicates how compound
objects are put together, for example,
– How pages are ordered to form chapters.
• Administrative metadata – provides information to
help manage a resource, such as
– when and how it was created, File type and other technical
information, and who can access it.
Metadata Types
• There are several subsets of administrative
data; two that sometimes are listed as
separate metadata types are:
– Rights management metadata : which deals with
intellectual property rights, and
– Preservation metadata : which contains
information needed to archive and preserve a
resource
Aggregation
• Metadata can describe resources at any level
of aggregation. It can describe
– a collection,
– a single resource, or
– a component part of a larger resource (for
example, a photograph in an article).
• Just as catalogers make decisions about
whether a catalog record should be created
for a whole set of volumes or for each
particular volume in the set, the metadata creator
must decide at which level to describe a resource.
Storing Metadata
• Metadata can be embedded in a digital object
or it can be stored separately.
• Metadata is often embedded in HTML
documents and in the headers of image files.
Storing Metadata
• Storing metadata with the object
– Ensures the metadata will not be lost,
– obviates problems of linking between data and metadata,
– and helps ensure that the metadata and object will be
updated together
• Storing metadata separately
– can simplify the management of the metadata itself and
– facilitate search and retrieval
• Therefore, metadata is commonly stored in a database
system and linked to the objects described
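A minimal sketch of the "stored separately" option, using Python's built-in sqlite3 module: metadata rows live in a database table and are linked to the described objects by file path. The table layout, the paths, and the `find_by_format` helper are assumptions made for this example, not a prescribed design.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata ("
    "object_path TEXT PRIMARY KEY, title TEXT, creator TEXT, format TEXT)"
)

# Hypothetical records; object_path is the link back to the described object.
records = [
    ("objects/report.pdf", "Annual Report", "Example Library", "application/pdf"),
    ("objects/photo.jpg", "Reading Room", "Example Library", "image/jpeg"),
]
conn.executemany("INSERT INTO metadata VALUES (?, ?, ?, ?)", records)

def find_by_format(connection, mime_type):
    """Search the metadata alone, without opening any of the objects."""
    rows = connection.execute(
        "SELECT object_path, title FROM metadata WHERE format = ? ORDER BY object_path",
        (mime_type,),
    )
    return rows.fetchall()

print(find_by_format(conn, "image/jpeg"))
```

The trade-off named on the slide is visible here: search is easy, but if an object is moved or renamed, the stored object_path must be updated or the link breaks.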
What Does Metadata Do?
• An important reason for creating descriptive
metadata is to facilitate discovery of relevant
information
• In addition to resource discovery, metadata
can
– help organize electronic resources,
– facilitate interoperability and legacy resource
integration,
– provide digital identification, and
– support archiving and preservation
Resource Discovery
• Metadata serves the same functions in resource
discovery as good cataloging does by:
– allowing resources to be found by relevant criteria,
– identifying resources,
– bringing similar resources together,
– distinguishing dissimilar resources, and
– giving location information
Organizing Electronic Resources
• As the number of Web-based resources grows
exponentially, aggregate sites or portals are increasingly
useful in organizing links to resources based on audience or
topic.
• Such lists can be built as static webpages, with the names
and locations of the resources “hardcoded” in the HTML.
• However, it is more efficient and increasingly more
common to build these pages dynamically from metadata
stored in databases.
• Various software tools can be used to automatically extract
and reformat the information for Web applications.
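As a sketch of building such a page dynamically rather than hardcoding it, the function below turns metadata records into an HTML list of links for one topic. The records, the example.org URLs, and the `render_portal` helper are hypothetical; in practice the records would come from a database query.

```python
from html import escape

def render_portal(records, topic):
    """Build an HTML list of links for every record tagged with the topic."""
    items = [
        f'<li><a href="{escape(r["url"], quote=True)}">{escape(r["title"])}</a></li>'
        for r in records
        if topic in r["subjects"]
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

# Hypothetical metadata records standing in for rows fetched from a database.
records = [
    {"title": "Maps & Atlases", "url": "http://example.org/maps", "subjects": ["geography"]},
    {"title": "Poetry Archive", "url": "http://example.org/poems", "subjects": ["literature"]},
]

page = render_portal(records, "geography")
print(page)
```

When a resource's name or location changes, only its metadata record needs updating; the page is regenerated from the records.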
Interoperability
• Describing a resource with metadata allows it to be
understood by both humans and machines in ways
that promote interoperability.
• Interoperability is the ability of multiple systems with
different hardware and software platforms, data
structures, and interfaces to exchange data with
minimal loss of content and functionality.
• Using defined metadata schemes, shared transfer
protocols, and crosswalks (mapping) between
schemes, resources across the network can be
searched more seamlessly.
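A crosswalk can be pictured as a table that maps element names in one scheme to their nearest equivalents in another. The sketch below is illustrative only: the target field names, the sample record, and the `apply_crosswalk` helper are invented, and it also reports elements with no mapping, which is where content is lost in an exchange.

```python
# Hypothetical crosswalk from Dublin Core element names to a local scheme.
CROSSWALK = {
    "Title": "title",
    "Creator": "author",
    "Date": "date_issued",
    "Identifier": "url",
}

def apply_crosswalk(record, crosswalk):
    """Translate element names; return (mapped, unmapped) so losses are visible."""
    mapped, unmapped = {}, {}
    for element, values in record.items():
        target = crosswalk.get(element)
        if target is None:
            unmapped[element] = values  # no equivalent in the target scheme
        else:
            mapped[target] = values
    return mapped, unmapped

# Invented source record; each element may repeat, so values are lists.
source = {"Title": ["A Guide to Maps"], "Creator": ["Doe, Jane"], "Type": ["Text"]}

mapped, unmapped = apply_crosswalk(source, CROSSWALK)
print(mapped)
print(unmapped)
```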
Digital Identification
• Most metadata schemes include elements such as standard
numbers to uniquely identify the work or object to which the
metadata refers.
– The location of a digital object may also be given using a file name, a URL
(Uniform Resource Locator), or some more persistent identifier such as
a PURL (Persistent URL or URI) or a DOI (Digital Object Identifier).
• Persistent identifiers are preferred because object locations
often change, making the standard URL (and therefore the
metadata record) invalid.
• In addition to the actual elements that point to the object, the
metadata can be combined to act as a set of identifying data,
differentiating one object from another for validation purposes.
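One way to picture why persistent identifiers are preferred is a resolver that maps a stable identifier to the object's current location. This is a toy sketch, not how any real PURL or DOI service is implemented; the `Resolver` class, the identifier, and the URLs are all hypothetical.

```python
class Resolver:
    """Toy lookup table from persistent identifiers to current locations."""

    def __init__(self):
        self._locations = {}

    def register(self, pid, url):
        # Registering an existing identifier again simply updates its location.
        self._locations[pid] = url

    def resolve(self, pid):
        return self._locations[pid]

resolver = Resolver()
resolver.register("example:0001", "http://old.example.org/docs/report.pdf")

# The metadata record stores only the persistent identifier.
metadata_record = {"title": "Annual Report", "identifier": "example:0001"}

# The object moves; only the resolver entry changes, not the metadata record.
resolver.register("example:0001", "http://new.example.org/archive/report.pdf")

print(resolver.resolve(metadata_record["identifier"]))
```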
Archiving and Preservation
• Most current metadata efforts center around
the discovery of recently created resources.
• However, there is a growing concern that
digital resources will not survive in usable
form into the future.
– Digital information is fragile
– It can be corrupted or altered, intentionally or
unintentionally.
– It may become unusable as storage media and
hardware and software technologies change.
Format Migration and Emulation
• Format migration and perhaps emulation of current hardware
and software behavior in future hardware and software
platforms are strategies for overcoming these challenges.
• Metadata is key to ensuring that resources will survive
and continue to be accessible into the future.
• Archiving and preservation require special elements
– to track the lineage of a digital object (where it came from
and how it has changed over time),
– to detail its physical characteristics, and
– to document its behavior in order to emulate it on future
technologies.
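The preservation elements listed above can be sketched as a small record that keeps a lineage log and the object's format. The checksum field is an addition made for this example (the slides do not mention checksums); it is one common way to detect the corruption or alteration described earlier. The class, field, and function names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field

def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest of the object's bytes."""
    return hashlib.sha256(data).hexdigest()

@dataclass
class PreservationRecord:
    identifier: str
    file_format: str   # a physical characteristic of the object
    checksum: str      # assumption: a stored digest used to detect changes
    lineage: list = field(default_factory=list)

    def record_event(self, event, date):
        """Append one step of the object's history."""
        self.lineage.append({"event": event, "date": date})

def is_unaltered(record, data):
    """True when the bytes still match the checksum stored in the metadata."""
    return sha256_of(data) == record.checksum

original = b"example digital object"
rec = PreservationRecord("obj-001", "text/plain", sha256_of(original))
rec.record_event("ingested", "2015-09-01")
rec.record_event("migrated to new storage", "2016-03-15")

print(is_unaltered(rec, original))
print(is_unaltered(rec, original + b"!"))
```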
Structuring Metadata
• Metadata schemes (also called schema) are sets of metadata
elements designed for a specific purpose, such as
– describing a particular type of information resource.
• The definition or meaning of the elements themselves is known as
the semantics of the scheme
• The values given to metadata elements are the content
• Metadata schemes generally specify names of elements and their
semantics
• Optionally, they may specify
– content rules for how content must be formulated, for example,
how to identify the main title,
– representation rules for content, for example, capitalization rules,
and
– allowable content values, for example, terms that must be taken from a
specified controlled vocabulary.
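To illustrate the difference between element names, required elements, and allowable content values, here is a minimal sketch of a scheme and a validator. The scheme definition, its vocabulary terms, and the `validate` function are invented for this example and do not correspond to any published standard.

```python
# Hypothetical scheme: element names, whether each is required, and an
# optional controlled vocabulary of allowable values.
SCHEME = {
    "title": {"required": True},
    "type": {"required": True, "allowed": {"Text", "Image", "Sound"}},
    "language": {"required": False, "allowed": {"en", "fr", "de"}},
}

def validate(record, scheme):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for element, rules in scheme.items():
        value = record.get(element)
        if value is None:
            if rules.get("required"):
                problems.append(f"missing required element: {element}")
            continue
        allowed = rules.get("allowed")
        if allowed is not None and value not in allowed:
            problems.append(f"{element}: {value!r} is not in the controlled vocabulary")
    for element in record:
        if element not in scheme:
            problems.append(f"unknown element: {element}")
    return problems

good = {"title": "Metadata Demystified", "type": "Text", "language": "en"}
bad = {"type": "Pamphlet", "colour": "blue"}

print(validate(good, SCHEME))
print(validate(bad, SCHEME))
```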
Structuring Metadata
• There may also be syntax rules for how the elements and
their content should be encoded.
• A metadata scheme with no prescribed syntax rules is
called syntax independent.
• Metadata can be encoded in any definable syntax.
• Many current metadata schemes use
– SGML (Standard Generalized Mark-up Language) or
– XML (Extensible Mark-up Language).
• XML, developed by the World Wide Web Consortium
(W3C), is a simplified subset of SGML that allows for locally
defined tag sets and the easy exchange of structured
information.
• SGML is the parent of both: HTML was originally defined as an
SGML application and XML as an SGML subset. SGML itself allows for
the richest mark-up of a document.
Metadata Schemes and
Element Sets
• Many different metadata schemes are developed in a
variety of user environments and disciplines.
• Some of the most common ones are
– Dublin Core Metadata Initiative (DCMI)
– The Text Encoding Initiative (TEI)
– Metadata Encoding and Transmission Standard (METS)
– Metadata Object Description Schema (MODS)
– Learning Object Metadata
– E-Commerce – <indecs> and ONIX
– Visual Objects – CDWA and VRA
– MPEG Multimedia Metadata
– Metadata schemes for datasets
Dublin Core
• The Dublin Core Metadata Element Set arose from
discussions at a 1995 workshop sponsored by OCLC
and the National Center for Supercomputing
Applications (NCSA).
• As the workshop was held in Dublin, Ohio, the element
set was named the Dublin Core.
• The continuing development of the Dublin Core and
related specifications is managed by the Dublin Core
Metadata Initiative (DCMI).
• The original objective of the Dublin Core was to define
a set of elements that could be used by authors to
describe their own Web resources.
Dublin Core Example
Title="Metadata Demystified"
Creator="Brand, Amy"
Creator="Daly, Frank"
Creator="Meyers, Barbara"
Subject="metadata"
Description="Presents an overview of metadata conventions in publishing."
Publisher="NISO Press"
Publisher="The Sheridan Press"
Date="2003-07"
Type="Text"
Format="application/pdf"
Identifier="http://www.niso.org/standards/resources/Metadata_Demystified.pdf"
Language="en"
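Because Dublin Core is syntax independent, the same record can be encoded in XML. The sketch below does so with Python's standard library, using the Dublin Core element namespace; the `metadata` wrapper element and the `to_xml` helper are choices made for this example, not part of the standard. A list of pairs is used instead of a dict because elements such as Creator repeat.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The record from the slide, as (element, value) pairs so elements can repeat.
record = [
    ("title", "Metadata Demystified"),
    ("creator", "Brand, Amy"),
    ("creator", "Daly, Frank"),
    ("creator", "Meyers, Barbara"),
    ("subject", "metadata"),
    ("description", "Presents an overview of metadata conventions in publishing."),
    ("publisher", "NISO Press"),
    ("publisher", "The Sheridan Press"),
    ("date", "2003-07"),
    ("type", "Text"),
    ("format", "application/pdf"),
    ("identifier", "http://www.niso.org/standards/resources/Metadata_Demystified.pdf"),
    ("language", "en"),
]

def to_xml(pairs):
    """Wrap each pair in a namespaced element under a hypothetical root."""
    root = ET.Element("metadata")  # wrapper name is an assumption
    for name, value in pairs:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root

root = to_xml(record)
print(ET.tostring(root, encoding="unicode"))
```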
The Text Encoding Initiative (TEI)
• The Text Encoding Initiative is an international project to develop
guidelines for marking up electronic texts such as novels, plays, and
poetry, primarily to support research in the humanities. In addition
to specifying how to encode the text of a work, the TEI Guidelines
for Electronic Text Encoding and Interchange also specify a header
portion, embedded in the resource, that consists of metadata about
the work. The TEI header, like the rest of the TEI, is defined as an
SGML DTD (Document Type Definition)— a set of tags and rules
defined in SGML syntax that describe the structure and elements of
a document.
• This SGML mark-up becomes part of the electronic resource itself.
Since the TEI DTD is rather large and complicated in order to apply
to a vast range of texts and uses, a simpler subset of the DTD,
known as TEI Lite, is commonly used in libraries.
• It is assumed that TEI-encoded texts are electronic versions of
printed texts.
Metadata Encoding and
Transmission Standard (METS)
• The Metadata Encoding and Transmission
Standard (METS) was developed to fill the need
for a standard data structure for describing
complex digital library objects.
• METS is an XML Schema for creating XML
document instances that express the structure of
digital library objects, the associated descriptive
and administrative metadata, and the names and
locations of the files that comprise the digital
object.
Metadata Object Description Schema (MODS)
• The Metadata Object Description Schema (MODS) is a descriptive
metadata schema that is a derivative of MARC 21 and intended to
either carry selected data from existing MARC 21 records or enable
the creation of original resource description records.
• It includes a subset of MARC fields and uses language based tags
rather than the numeric ones used in MARC 21 records.
• In some cases, it regroups elements from the MARC 21
bibliographic format.
• Like METS, MODS is expressed using the XML schema language.
• Although the MODS standard can stand on its own, it may also
complement other metadata formats.
A MODS Record Example
<mods>
  <titleInfo>
    <title>Metadata demystified</title>
  </titleInfo>
  <name type="personal">
    <namePart type="family">Brand</namePart>
    <namePart type="given">Amy</namePart>
    <role>
      <roleTerm authority="marcrelator" type="text">author</roleTerm>
    </role>
  </name>
  <typeOfResource>text</typeOfResource>
  <originInfo>
    <dateIssued>2003</dateIssued>
    <place>
      <placeTerm type="text">Bethesda, MD</placeTerm>
    </place>
    <publisher>NISO Press</publisher>
  </originInfo>
  <identifier type="isbn">1-880124-59-9</identifier>
</mods>
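A record like this can be read with any XML parser. The sketch below uses Python's standard library to pull a few values out by their language-based tag names. It assumes the record exactly as shown, with straight quotation marks and no namespace declaration; a record that declares an XML namespace would need namespace-qualified names in the paths.

```python
import xml.etree.ElementTree as ET

# The MODS record from the slide, with straight quotation marks.
MODS_XML = """
<mods>
  <titleInfo><title>Metadata demystified</title></titleInfo>
  <name type="personal">
    <namePart type="family">Brand</namePart>
    <namePart type="given">Amy</namePart>
    <role><roleTerm authority="marcrelator" type="text">author</roleTerm></role>
  </name>
  <typeOfResource>text</typeOfResource>
  <originInfo>
    <dateIssued>2003</dateIssued>
    <place><placeTerm type="text">Bethesda, MD</placeTerm></place>
    <publisher>NISO Press</publisher>
  </originInfo>
  <identifier type="isbn">1-880124-59-9</identifier>
</mods>
"""

mods = ET.fromstring(MODS_XML)

# Paths use element names plus attribute tests, mirroring the record's structure.
title = mods.findtext("titleInfo/title")
family = mods.findtext("name/namePart[@type='family']")
given = mods.findtext("name/namePart[@type='given']")
isbn = mods.findtext("identifier[@type='isbn']")

print(f"{title} / {given} {family} / ISBN {isbn}")
```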
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
(ISHITA) Call Girls Service Hyderabad Call Now 8617697112 Hyderabad Escorts
(ISHITA) Call Girls Service Hyderabad Call Now 8617697112 Hyderabad Escorts(ISHITA) Call Girls Service Hyderabad Call Now 8617697112 Hyderabad Escorts
(ISHITA) Call Girls Service Hyderabad Call Now 8617697112 Hyderabad Escorts
 

Metadata.pptx

  • 2. Metadata • Metadata, literally “data about data,” has become a widely used yet still frequently underspecified term that is understood in different ways by the diverse professional communities that design, create, describe, preserve, and use information systems and resources
  • 3. Metadata • For the past hundred years at least, the creation and management of metadata has primarily been the responsibility of information professionals engaged in – cataloging – classification – indexing • But as information resources are increasingly put online by the general public, metadata considerations are no longer solely the province of information professionals.
  • 4. Metadata Creation • Metadata creation is—or should often be—a collaborative effort • Until the mid-1990s, metadata was a term used primarily by communities involved with the management and interoperability of geospatial data and with data management and systems design and maintenance in general
  • 5. Metadata • In general, all information objects, regardless of the physical or intellectual form they take, have three features— – Content – Context – Structure— • All of which can and should be reflected through metadata.
  • 6. Metadata • Content relates to what the object contains or is about and is intrinsic to an information object. • Context indicates the who, what, why, where, and how aspects associated with the object’s creation and is extrinsic to an information object. • Structure relates to the formal set of associations within or among individual information objects and can be intrinsic (basic, inherent, essential) or extrinsic or both.
  • 7. Metadata • Cultural heritage information professionals such as museum registrars, library catalogers, and archival processors often apply the term metadata to the value-added information that they create to arrange, describe, track, and otherwise enhance access to information objects and the physical collections related to those objects. • Such metadata is frequently governed by community- developed and community-fostered standards and best practices in order to ensure quality, consistency, and interoperability (info exchange).
  • 9. Metadata Authors • Metadata created by users • Metadata created by trained information professionals. • Activities such as – social tagging – social bookmarking • The resulting forms of user-created metadata such as “folksonomies”
  • 10. Web Search and Ranking • How do Web search engines work? • How do they use metadata, data, links, and relevance ranking to help users find what they are seeking?
  • 11. Importance of Metadata • Hardware and software come and go • Sometimes becoming obsolete with alarming rapidity • But high-quality, standards-based, system-independent metadata can be – Used – Reused – Migrated – disseminated (spread) in any number of ways, • Even in ways that we cannot anticipate at this moment
  • 12. Digitization+Metadata • Digitization combined with the creation of carefully crafted metadata can significantly enhance end-user access
  • 13. Information Resources • Our users are the primary reason that we create digital resources. • Exercise • Need a list of Information resources and related objects
  • 14. 1. Photographs 2. Text Books 3. Journals 4. Research papers 5. Audios 6. Articles 7. Magazines 8. Videos 9. Publications 10. Media 11. Blogs 12. Websites 13. Encyclopedias 14. Expert opinions 15. Databases 16. Newspapers 17. Almanacs 18. Conference proceedings 19. Dictionaries 20. Encyclopedias 21. Handbooks 22. Diaries 23. Interviews 24. Letters 25. Original works of art 26. Speeches 27. Works of literature 28. Biographies 29. Dissertations 30. Indexes, abstracts, bibliographies (used to locate a secondary source) 31. Journal articles 32. Monographs
  • 15. Twitter Metadata • Twitter has the following objects – Users – Tweets – Entities : media, urls, user_mentions, hashtags, symbols
  • 16. Users Metadata • Users can be anyone or anything. They tweet, follow, create lists, have a home_timeline, can be mentioned, and can be looked up in bulk • Metadata for the users contain the following fields:
  • 17. Users Metadata • contributors_enabled – Indicates that the user has an account with “contributor mode” enabled – Allowing for Tweets issued by the user to be co-authored by another account. • created_at – The UTC datetime that the user account was created on Twitter. • default_profile – indicates that the user has not altered the theme or background of their user profile.
  • 18. Users Metadata • default_profile_image – When true, indicates that the user has not uploaded their own avatar and a default egg avatar is used instead. • id – The integer representation of the unique identifier for this User. • id_str – The string representation of the unique identifier for this User.
  • 19. Users Metadata • description – String describing the account • favourites_count • follow_request_sent • following • followers_count • friends_count • …more attributes
  • 20. Tweets Metadata • A tweet is just 140 characters of text. • A tweet is filled with metadata: information about when it was sent, by whom, using what Twitter application, and so on. • contributors – A collection of brief user objects (usually only one) indicating users who contributed to the authorship of the tweet • coordinates – Represents the geographic location of this Tweet as reported by the user or client application • created_at – UTC time when this Tweet was created.
  • 21. Tweets Metadata • current_user_retweet – Only surfaces on methods supporting the include_my_retweet parameter, when set to true. • entities – Entities which have been parsed out of the text of the Tweet • id – The integer representation of the unique identifier for this Tweet. • id_str – The string representation of the unique identifier for this Tweet • text – The actual UTF-8 text of the status update • …more
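The tweet fields listed on the two slides above can be read with a few lines of Python. This is a minimal sketch, not Twitter's own client code: the payload values are invented, and the `created_at` layout is assumed to be the classic `Wed Oct 10 20:19:24 +0000 2018` form.

```python
import json
from datetime import datetime

# Hypothetical tweet payload using field names from the slides;
# every value here is invented for illustration.
raw = """{
  "id": 1234567890123456789,
  "id_str": "1234567890123456789",
  "created_at": "Wed Oct 10 20:19:24 +0000 2018",
  "text": "Metadata is data about data #metadata",
  "coordinates": null
}"""

tweet = json.loads(raw)

# Assumed timestamp layout; %z parses the "+0000" UTC offset.
created = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")

# id and id_str carry the same identifier. The string form is commonly
# preferred in languages whose numbers cannot hold a 64-bit integer exactly.
same_id = str(tweet["id"]) == tweet["id_str"]

# coordinates is null when no location was reported.
has_location = tweet["coordinates"] is not None
```

Only `text` is content here; every other key is metadata about when and how the tweet was produced.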
  • 22. Places Metadata • locality: the city the place is in • region: the administrative region the place is in • iso3: the country code • postal_code: in the preferred local format for the place • phone: in the preferred local format for the place, including long-distance code • twitter: Twitter screen name, without @ • url: official/canonical URL for the place • app:id: an ID or comma-separated list of IDs representing the place in the application's place database.
  • 23. Entities for Tweets: media entity • An array of media attached to the Tweet with the Twitter Photo Upload feature. • id: The media ID (int format) • id_str: The media ID (string format) • media_url: The URL of the media file • media_url_https: The SSL URL of the media file • url: The media URL that was extracted • display_url: Not a URL but a string to display instead of the media URL • expanded_url: The fully resolved media URL • sizes: thumb, small, medium and large • type: Only photo for now • indices: The character positions the media was extracted from
  • 24. urls entity • An array of URLs extracted from the Tweet text. Each URL entity comes with the following attributes: • url: The t.co URL that was extracted from the Tweet text • display_url: Not a valid URL but a string to display instead of the URL • expanded_url: The resolved URL • indices: The character positions the URL was extracted from
  • 25. user_mentions entity • id: The user ID (int format) • id_str: The user ID (string format) • screen_name: The user screen name • name: The user full name • indices: The character positions the user mention was extracted from
  • 26. hashtags entity • text: The hashtag text • indices: The character positions the hashtag was extracted from
  • 27. symbols entity • An array of financial symbols starting with the dollar sign extracted from the Tweet text • text: The symbol text • indices: The character positions the symbol was extracted from
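Each entity above carries an `indices` pair that points back into the tweet text. The sketch below assumes the pair is a start offset and an exclusive end offset counted in characters; the tweet text and entity values are hypothetical.

```python
# Hypothetical tweet text; offsets were counted by hand for this string.
text = "Reading about #metadata with @niso $TWTR"

# Entity groups named as on the slides; indices are assumed to be
# [start, end) character offsets into `text`.
entities = {
    "hashtags": [{"text": "metadata", "indices": [14, 23]}],
    "user_mentions": [{"screen_name": "niso", "indices": [29, 34]}],
    "symbols": [{"text": "TWTR", "indices": [35, 40]}],
}

def entity_spans(text, entities):
    """Return the substrings that each entity's indices point at, in text order."""
    spans = []
    for group in entities.values():
        for entity in group:
            start, end = entity["indices"]
            spans.append((start, text[start:end]))
    return [span for _, span in sorted(spans)]

spans = entity_spans(text, entities)
```

The extracted span includes the leading `#`, `@`, or `$`, while the entity's own `text` or `screen_name` field omits it.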
  • 29. Photo metadata: a simple concept There are 3 main categories of data: • Administrative – identification of the creator, creation date and location, contact information for licensors of the image, and other technical details. • Descriptive – information about the visual content. This may include headline, title, captions and keywords. This can be done using free text or codes from a controlled vocabulary. • Rights – copyright information and underlying rights in the visual content including model and property rights, and rights usage terms.
  • 30. Classes Of Metadata • Technical Metadata 1. Most modern image-capture devices generate information about themselves and the pictures they record, such as that stored in Exif. 2. These data describe an image’s technical characteristics, such as its size, color profile, ISO speed and other camera settings. 3. Some professional cameras can be configured to add detailed ownership and descriptive information.
  • 31. Descriptive Metadata A photographer or image collection manager can enter and embed various information about an image’s contents. This can include 1. captions, 2. headlines, 3. titles, 4. keywords, 5. location of capture, etc. These metadata fields were included in the original IPTC-IIM schema and expanded in the IPTC Core and IPTC Extension metadata schemas. Good descriptive metadata are key to unlocking an image collection to find stored images.
  • 32. Administrative Metadata Image files can also include 1. licensing or rights usage terms, 2. Specific restrictions on using an image, 3. Model releases, 4. Source information, such as the identity of the creator, and contact information for the rights holder or licensor. These types of metadata have been comprehensively addressed and standardized within the PLUS system.
  • 33. TABLE 1 - ARTICLE IDENTIFICATION News Metadata
  • 34. Weather Metadata • Observing site name: Aviemore • Location (deg lat, deg long): 57N, 3W • Elevation: 226 m • Parameter observed: temperature • Operator: Met Office • Started: 11:50 01/01/01 • Ended: 12:00 01/01/01 • Value: 4 • Units: Celsius • Instrument – instrument number: 123456 – instrument inspection date: 01/01/01 – instrument type: Magic mercury 1234 • (slide labels: Data, Metadata)
  • 35. 1000 Genomes • Example 1000 Genomes Data • CHROM 4 • POS 42208061 • ID rs186575857 • REF T • ALT C • QUAL 100 • FILTER PASS • INFO AA=T;AN=2184;AC=1;RSQ=0.8138;AF=0.0005; • FORMAT GT:DS:GL • GENOTYPE 0|0:0.000:-0.03,-1.19,-5.00
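The record on this slide can be unpacked into its metadata fields with plain string handling. This is a sketch that uses only the values shown above; the variable names are illustrative, and a real pipeline would more likely use a dedicated VCF parser.

```python
# The record from the slide, keyed by column name.
record = {
    "CHROM": "4", "POS": "42208061", "ID": "rs186575857",
    "REF": "T", "ALT": "C", "QUAL": "100", "FILTER": "PASS",
    "INFO": "AA=T;AN=2184;AC=1;RSQ=0.8138;AF=0.0005;",
    "FORMAT": "GT:DS:GL",
    "GENOTYPE": "0|0:0.000:-0.03,-1.19,-5.00",
}

# INFO is a semicolon-separated list of key=value pairs.
info = {}
for item in record["INFO"].split(";"):
    if item:
        key, _, value = item.partition("=")
        info[key] = value

# FORMAT names the colon-separated sub-fields of the genotype column.
sample = dict(zip(record["FORMAT"].split(":"), record["GENOTYPE"].split(":")))

# "|" marks a phased genotype, "/" an unphased one.
phased = "|" in sample["GT"]
separator = "|" if phased else "/"

# Genotype digits index into [REF, ALT1, ALT2, ...].
alleles = [record["REF"]] + record["ALT"].split(",")
called = [alleles[int(i)] for i in sample["GT"].split(separator)]

# Three log likelihoods, one per possible genotype.
likelihoods = [float(x) for x in sample["GL"].split(",")]
```

Here `called` resolves `0|0` to two copies of the reference allele, and `int(info["AC"]) / int(info["AN"])` reproduces the reported `AF` after rounding to four decimal places.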
  • 40. Metadata is key to ensuring that resources will survive and continue to be accessible into the future
  • 41. • Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. • Metadata is often called – data about data or – information about information.
  • 42. Term metadata usage • Used differently in different communities. • Some use it to refer to machine-understandable information, while others use it only for records that describe electronic resources. • In the library environment, metadata is commonly used for any formal scheme of resource description, applying to any type of object (digital or non-digital). • Traditional library cataloging is a form of metadata.
  • 43. MARC 21 & AACR • MARC 21 and the rule sets used with it, such as AACR2, are metadata standards. • Format for Bibliographic Data • MARC 21 (Machine Readable Cataloging)- 1999 Edition Update No. 1 (October 2000) through Update No. 21 (September 2015) - Library of Congress • AACR (Anglo American Cataloging Rules) and its allied products are published jointly by the American Library Association, the Canadian Library Association, and the Chartered Institute of Library and Information Professionals.
  • 44. • Metadata have also been developed to describe various types of textual and non- textual objects including – Published books – electronic documents – Archival finding aids – Art objects – Educational and training materials and – Scientific datasets
  • 45. Metadata Types There are three main types of metadata: • Descriptive metadata - describes a resource for purposes such as discovery and identification. • It can include elements such as – title, abstract, author, and keywords. • Structural metadata - indicates how compound objects are put together, for example, – How pages are ordered to form chapters. • Administrative metadata – provides information to help manage a resource, such as – when and how it was created, File type and other technical information, and who can access it.
  • 46. Metadata Types • There are several subsets of administrative data; two that sometimes are listed as separate metadata types are: – Rights management metadata : which deals with intellectual property rights, and – Preservation metadata : which contains information needed to archive and preserve a resource
  • 47. Aggregation • Metadata can describe resources at any level of aggregation. It can describe – a collection, – a single resource, or – a component part of a larger resource (for example, a photograph in an article). • Metadata creators face the same choice catalogers do when deciding whether a catalog record should be created for a whole set of volumes or for each particular volume in the set.
  • 48. Storing Metadata • Metadata can be embedded in a digital object or it can be stored separately. • Metadata is often embedded in HTML documents and in the headers of image files.
  • 49. Storing Metadata • Storing metadata with the object – ensures the metadata will not be lost, – obviates problems of linking between data and metadata, – and helps ensure that the metadata and object will be updated together • Storing metadata separately – can simplify the management of the metadata itself and – facilitate search and retrieval • Therefore, metadata is commonly stored in a database system and linked to the objects described
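Storing metadata separately and linking it to the objects it describes might look like the sketch below. The table layout, column names, and file paths are all hypothetical; an in-memory SQLite database stands in for the database system.

```python
import sqlite3

# In-memory database standing in for a separate metadata store.
conn = sqlite3.connect(":memory:")

# Hypothetical layout: one row per (object, element, value) statement.
# object_uri is the link back to the object being described.
conn.execute(
    "CREATE TABLE metadata (object_uri TEXT, element TEXT, value TEXT)"
)

rows = [
    ("files/report.pdf", "title", "Metadata Demystified"),
    ("files/report.pdf", "subject", "metadata"),
    ("files/photo.jpg", "title", "Aviemore weather station"),
    ("files/photo.jpg", "subject", "weather"),
]
conn.executemany("INSERT INTO metadata VALUES (?, ?, ?)", rows)

# Search runs against the metadata alone; the objects are never opened.
query = (
    "SELECT DISTINCT object_uri FROM metadata "
    "WHERE element = ? AND value = ? ORDER BY object_uri"
)
hits = [uri for (uri,) in conn.execute(query, ("subject", "metadata"))]
```

The trade-off from the slide shows up directly: search is a simple query, but nothing stops `files/report.pdf` from being moved without its rows being updated.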
  • 50. What Does Metadata Do? • An important reason for creating descriptive metadata is to facilitate discovery of relevant information • In addition to resource discovery, metadata can help organize electronic resources • Facilitate interoperability and legacy resource integration • Provide digital identification, and • support archiving and preservation
  • 51. Resource Discovery • Metadata serves the same functions in resource discovery as good cataloging does by: • Allowing resources to be found by relevant criteria • Identifying resources • Bringing similar resources together • Distinguishing dissimilar resources and • Giving location information
  • 52. Organizing Electronic Resources • As the number of Web-based resources grows exponentially, aggregate sites or portals are increasingly useful in organizing links to resources based on audience or topic. • Such lists can be built as static webpages, with the names and locations of the resources “hardcoded” in the HTML. • However, it is more efficient and increasingly more common to build these pages dynamically from metadata stored in databases. • Various software tools can be used to automatically extract and reformat the information for Web applications.
  • 53. Interoperability • Describing a resource with metadata allows it to be understood by both humans and machines in ways that promote interoperability. • Interoperability is the ability of multiple systems with different hardware and software platforms, data structures, and interfaces to exchange data with minimal loss of content and functionality. • Using defined metadata schemes, shared transfer protocols, and crosswalks (mapping) between schemes, resources across the network can be searched more seamlessly.
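A crosswalk can be pictured as a lookup table from one scheme's elements to another's. The mapping below from Dublin Core element names to MODS-style element paths is a simplified assumption for illustration, not an authoritative crosswalk, and `coverage` is deliberately left unmapped to show where content could be lost.

```python
# Illustrative crosswalk table (an assumption for this sketch).
DC_TO_MODS = {
    "title": "titleInfo/title",
    "creator": "name/namePart",
    "subject": "subject/topic",
    "description": "abstract",
    "publisher": "originInfo/publisher",
    "date": "originInfo/dateIssued",
    "identifier": "identifier",
}

def crosswalk(record):
    """Split a record into mapped target elements and unmapped leftovers."""
    mapped, unmapped = {}, {}
    for element, values in record.items():
        target = DC_TO_MODS.get(element)
        if target is None:
            unmapped[element] = values  # would be lost in the exchange
        else:
            mapped.setdefault(target, []).extend(values)
    return mapped, unmapped

dc_record = {
    "title": ["Metadata Demystified"],
    "creator": ["Brand, Amy", "Daly, Frank", "Meyers, Barbara"],
    "publisher": ["NISO Press", "The Sheridan Press"],
    "date": ["2003-07"],
    "coverage": ["(hypothetical unmapped element)"],
}

mods_like, lost = crosswalk(dc_record)
```

Reporting the unmapped elements alongside the result makes the "minimal loss of content" goal something that can be measured for each exchange.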
  • 54. Digital Identification • Most metadata schemes include elements such as standard numbers to uniquely identify the work or object to which the metadata refers. – The location of a digital object may also be given using a file name, URL (Uniform Resource Locator) – Some more persistent identifier such as a PURL (Persistent URL or URI) – DOI (Digital Object Identifier) • Persistent identifiers are preferred because object locations often change, making the standard URL (and therefore the metadata record) invalid. • In addition to the actual elements that point to the object, the metadata can be combined to act as a set of identifying data, differentiating one object from another for validation purposes.
  • 55. Archiving and Preservation • Most current metadata efforts center around the discovery of recently created resources. • However, there is a growing concern that digital resources will not survive in usable form into the future. – Digital information is fragile – It can be corrupted or altered, intentionally or unintentionally. – It may become unusable as storage media and hardware and software technologies change.
  • 56. Format Migration and Emulation • Format migration and perhaps emulation of current hardware and software behavior in future hardware and software platforms are strategies for overcoming these challenges. • Metadata is key to ensuring that resources will survive and continue to be accessible into the future. • Archiving and preservation require special elements – to track the lineage of a digital object (where it came from and how it has changed over time), – to detail its physical characteristics, and – to document its behavior in order to emulate it on future technologies.
  • 57. Structuring Metadata • Metadata schemes (also called schema) are sets of metadata elements designed for a specific purpose, such as – describing a particular type of information resource. • The definition or meaning of the elements themselves is known as the semantics of the scheme • The values given to metadata elements are the content • Metadata schemes generally specify names of elements and their semantics • Optionally, they may specify • content rules for how content must be formulated, for example, how to identify the main title, • representation rules for content , for example, capitalization rules, and • allowable content values, for example, terms must be used from a specified controlled vocabulary.
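The parts of a scheme named above (element names, content rules, allowable values) can be expressed as data and checked mechanically. The scheme below is hypothetical: three elements, one required, one with a date pattern as its content rule, and one restricted to a small controlled vocabulary.

```python
import re

# Hypothetical scheme: element name -> rules for its content.
SCHEME = {
    "title": {"required": True},
    "date": {"pattern": r"\d{4}(-\d{2}){0,2}"},           # content rule
    "type": {"allowed": {"Text", "Image", "Sound", "Dataset"}},  # controlled vocabulary
}

def validate(record):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for element, rules in SCHEME.items():
        value = record.get(element)
        if value is None:
            if rules.get("required"):
                problems.append(f"{element}: missing")
            continue
        if "pattern" in rules and not re.fullmatch(rules["pattern"], value):
            problems.append(f"{element}: bad format")
        if "allowed" in rules and value not in rules["allowed"]:
            problems.append(f"{element}: not in vocabulary")
    for element in record:
        if element not in SCHEME:
            problems.append(f"{element}: unknown element")
    return problems

good = validate({"title": "Metadata Demystified", "date": "2003-07", "type": "Text"})
bad = validate({"date": "July 2003", "type": "Essay", "colour": "blue"})
```

The element names and their meanings are the semantics; the values in each record are the content; the pattern and vocabulary are the optional rules a scheme may add.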
  • 58. Structuring Metadata • There may also be syntax rules for how the elements and their content should be encoded. • A metadata scheme with no prescribed syntax rules is called syntax independent. • Metadata can be encoded in any definable syntax. • Many current metadata schemes use – SGML (Standard Generalized Mark-up Language) or – XML (Extensible Mark-up Language). • XML, developed by the World Wide Web Consortium (W3C), is a simplified subset of SGML that, unlike HTML's fixed tag set, allows for locally defined tag sets and the easy exchange of structured information. • SGML is the parent standard: HTML was originally defined as an SGML application and XML as an SGML profile, and SGML allows for the richest mark-up of a document.
  • 59. Metadata Schemes and Element Sets • Many different metadata schemes are developed in a variety of user environments and disciplines. • Some of the most common ones are – Dublin Core Metadata Initiative (DCMI) – The Text Encoding Initiative (TEI) – Metadata Encoding and Transmission Standard (METS) – Metadata Object Description Schema (MODS) – Learning Object Metadata – E-Commerce – <indecs> and ONIX – Visual Objects – CDWA and VRA – MPEG Multimedia Metadata – Metadata schemes for datasets
  • 60. Dublin Core • The Dublin Core Metadata Element Set arose from discussions at a 1995 workshop sponsored by OCLC and the National Center for Supercomputing Applications (NCSA). • As the workshop was held in Dublin, Ohio, the element set was named the Dublin Core. • The continuing development of the Dublin Core and related specifications is managed by the Dublin Core Metadata Initiative (DCMI). • The original objective of the Dublin Core was to define a set of elements that could be used by authors to describe their own Web resources.
  • 61. Dublin Core Example Title="Metadata Demystified" Creator="Brand, Amy" Creator="Daly, Frank" Creator="Meyers, Barbara" Subject="metadata" Description="Presents an overview of metadata conventions in publishing." Publisher="NISO Press" Publisher="The Sheridan Press" Date="2003-07" Type="Text" Format="application/pdf" Identifier="http://www.niso.org/standards/resources/Metadata_Demystified.pdf" Language="en"
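Because Dublin Core was meant for authors describing their own Web resources, one common way to embed a record like this is as HTML `meta` tags. The sketch below assumes the widely used `DC.` name prefix convention; the helper name `to_meta_tags` is hypothetical.

```python
from html import escape

# The Dublin Core record from the slide as (element, value) pairs.
# A list is used because elements such as Creator repeat.
record = [
    ("Title", "Metadata Demystified"),
    ("Creator", "Brand, Amy"),
    ("Creator", "Daly, Frank"),
    ("Creator", "Meyers, Barbara"),
    ("Subject", "metadata"),
    ("Publisher", "NISO Press"),
    ("Date", "2003-07"),
    ("Type", "Text"),
    ("Format", "application/pdf"),
    ("Language", "en"),
]

def to_meta_tags(pairs):
    """Render each pair as an HTML meta tag, escaping the value."""
    lines = []
    for name, value in pairs:
        content = escape(value, quote=True)
        lines.append(f'<meta name="DC.{name.lower()}" content="{content}">')
    return "\n".join(lines)

html_head = to_meta_tags(record)
```

Repeating an element simply produces repeated tags, which is how the three creators are carried.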
  • 62. The Text Encoding Initiative (TEI) • The Text Encoding Initiative is an international project to develop guidelines for marking up electronic texts such as novels, plays, and poetry, primarily to support research in the humanities. In addition to specifying how to encode the text of a work, the TEI Guidelines for Electronic Text Encoding and Interchange also specify a header portion, embedded in the resource, that consists of metadata about the work. The TEI header, like the rest of the TEI, is defined as an SGML DTD (Document Type Definition)— a set of tags and rules defined in SGML syntax that describe the structure and elements of a document. • This SGML mark-up becomes part of the electronic resource itself. Since the TEI DTD is rather large and complicated in order to apply to a vast range of texts and uses, a simpler subset of the DTD, known as TEI Lite, is commonly used in libraries. • It is assumed that TEI-encoded texts are electronic versions of printed texts.
  • 63. Metadata Encoding and Transmission Standard (METS) • The Metadata Encoding and Transmission Standard (METS) was developed to fill the need for a standard data structure for describing complex digital library objects. • METS is an XML Schema for creating XML document instances that express the structure of digital library objects, the associated descriptive and administrative metadata, and the names and locations of the files that comprise the digital object.
  • 64. Metadata Object Description Schema (MODS) • The Metadata Object Description Schema (MODS) is a descriptive metadata schema that is a derivative of MARC 21 and intended to either carry selected data from existing MARC 21 records or enable the creation of original resource description records. • It includes a subset of MARC fields and uses language based tags rather than the numeric ones used in MARC 21 records. • In some cases, it regroups elements from the MARC 21 bibliographic format. • Like METS, MODS is expressed using the XML schema language. • Although the MODS standard can stand on its own, it may also complement other metadata formats.
  • 65. A MODS Record Example <mods> <titleInfo> <title>Metadata demystified</title> </titleInfo> <name type="personal"> <namePart type="family">Brand</namePart> <namePart type="given">Amy</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <typeOfResource>text</typeOfResource> <originInfo> <dateIssued>2003</dateIssued> <place> <placeTerm type="text">Bethesda, MD</placeTerm> </place> <publisher>NISO Press</publisher> </originInfo> <identifier type="isbn">1-880124-59-9</identifier> </mods>
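A record like the one above can be read with Python's standard XML parser. This sketch uses the record as shown on the slide, without a namespace declaration; published MODS records normally declare the MODS namespace, in which case the paths would need a namespace prefix.

```python
import xml.etree.ElementTree as ET

# The MODS record from the slide (no namespace, as shown there).
MODS = """
<mods>
  <titleInfo><title>Metadata demystified</title></titleInfo>
  <name type="personal">
    <namePart type="family">Brand</namePart>
    <namePart type="given">Amy</namePart>
    <role><roleTerm authority="marcrelator" type="text">author</roleTerm></role>
  </name>
  <typeOfResource>text</typeOfResource>
  <originInfo>
    <dateIssued>2003</dateIssued>
    <place><placeTerm type="text">Bethesda, MD</placeTerm></place>
    <publisher>NISO Press</publisher>
  </originInfo>
  <identifier type="isbn">1-880124-59-9</identifier>
</mods>
"""

root = ET.fromstring(MODS)

# Language-based tags make the paths readable without a field-number table.
title = root.findtext("titleInfo/title")
family = root.findtext("name/namePart[@type='family']")
given = root.findtext("name/namePart[@type='given']")
role = root.findtext("name/role/roleTerm")
publisher = root.findtext("originInfo/publisher")
isbn = root.findtext("identifier[@type='isbn']")

citation = f"{family}, {given}. {title}. {publisher}, {root.findtext('originInfo/dateIssued')}."
```

The `type` attributes carry the distinctions (family versus given name, ISBN versus other identifiers) that MARC 21 encodes in numeric tags and subfield codes.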

Editor's Notes

  1. This is an example of SNP data in the 20110521 release. The INFO field contains information such as the ancestral allele, allele count, allele number, and allele frequency. The genotype field always contains the individual genotype first, which is an index into an array of the reference and alternative alleles. Normally there are only 0 and 1, but if the variant is multi-allelic there will be higher indexes too. The pipe symbol indicates that this is a phased genotype; unphased genotypes are delimited by /. The other fields in the genotype column are generally measures of genotype quality. In this instance the second field is a dosage measure from Mach/Thunder and the third field is a genotype likelihood giving a log likelihood for the three possible genotypes RR, RA, AA.