A discussion of the concept of electronic resource management (ERM) and the need for it in a library. It also covers different software solutions for managing electronic resources in libraries.
Open Source Database Management Software available on the Net by Dlis Mu
This document discusses open source database management software available online. It provides an introduction to online databases and database management systems. It then covers the history of database systems from the 1940s to current web databases. It also discusses the structure of databases and different types including bibliographic, full-text, numeric, image, audio/video, and mixed databases.
INFORMATION TECHNOLOGY IN INFORMATION AGENCIES (IMD257 / IMD204) by Kumprinx Amin
The document discusses different types of libraries and how they manage information using technology. A traditional library operates manually in a physical building, while a hybrid library contains both print and electronic resources. A digital library stores and provides access to digitized materials online. A virtual library exists solely online without a physical space. Libraries use technologies like online public access catalogs, barcode scanning, and integrated library systems to catalog, circulate, and track physical and electronic resources.
INFORMATION TECHNOLOGY IN INFORMATION AGENCIES (IMD257 / IMD204) by Kumprinx Amin
This document is an assignment for an Information Technology in Information Agencies course. It contains 3 sections: 1) a definition of information technology and its purposes and functions, 2) an explanation of 5 types of information agencies (libraries, virtual libraries, museums, archives, and record centers) with examples of each, and 3) a description of 5 types of information technology used in information agencies (OCLC, Evergreen, Z39.50, Dublin Core, and digital libraries) along with explanations of each.
INNOVATION AND RESEARCH (Digital Library Information Access) by Libcorpio
Innovation and research, Digital Library Information Access, LIS Education, Library and Information Science, LIS Studies, Information Management, Education and Learning, Library science, Information science, Digital Libraries, Research on Digital Libraries, DL, Innovation in libraries and publishing, Areas of Research for DL, Information Discovery, Collection Management and Preservation, Interoperability, Economic, Social and Legal Issues, Core Topics In Digital Libraries, DL Research Around The World
Access to electronic information resources in libraries by avid
Recent advances in information technology have already influenced life in more than one direction. Their impact on library and information science is also significant, especially in advanced countries, largely as a result of the growth of electronic publishing and of the networks that facilitate scholarly communication. These technological advances are driving a far-reaching change in libraries, which are trying to accommodate all types of media in order to provide electronic information services to users more conveniently and effectively. The article describes the main types of electronic resources used in libraries and briefly touches on their advantages, disadvantages, and usage.
UiTM IM110 IMD253: ORGANIZATION OF INFORMATION (IMD253) Individual Assignment by Kumprinx Amin
FINAL PROJECT INDIVIDUAL:
ANALYZE AND REPORT
TABLE OF CONTENTS
Z39.50: An Information Retrieval Protocol
• Introduction
• History and Background
• Objective & Purpose
• Function
• Benefit
• Conclusion
MARC Standard
• Introduction
• History and Background
• Objective & Purpose
• Function
• Benefit
• Conclusion
DOMAINS OF USER STUDIES (User Studies and User Education) by Libcorpio
Domains of user studies include defining terms, understanding user needs and behaviors, search strategies, and barriers to information seeking. Research has studied who library users are, how they seek information, and what challenges they face. Questionnaires are commonly used to analyze user search patterns and barriers. Understanding user information behavior helps libraries better meet user needs through collections, services, and facilities.
Technical skills in multimedia for ODL learners by Daniel Koloseni
This document discusses technical skills for multimedia and online learning. It covers:
1) General concepts of browsers, the World Wide Web, URLs, and how they allow access to interconnected documents stored on websites.
2) Information searching techniques including using appropriate search terms, search engines, Boolean operators like AND and OR, and case sensitivity.
3) The searching process, how search engines handle Boolean logic and operators, and how correct spelling is important for finding exact matches.
4) Operators like +, -, and "" that can define the relationship between search terms and refine search results.
5) How to identify file formats from extensions and use the proper program to open downloaded materials.
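The Boolean matching described in points 2–4 above can be sketched as a toy matcher. This is a minimal illustration over an invented in-memory collection, not any particular search engine's implementation:

```python
# Toy Boolean matching over a small invented "collection".
# Real engines use inverted indexes; this only shows the AND/OR logic.
docs = {
    "d1": "library catalogs provide access to collections",
    "d2": "search engines index web pages",
    "d3": "library search tools index collections",
}

def terms(text):
    """Lowercase a document into a set of terms (case-insensitive matching)."""
    return set(text.lower().split())

def boolean_and(*words):
    """Return ids of documents containing ALL of the words."""
    return {d for d, t in docs.items() if all(w in terms(t) for w in words)}

def boolean_or(*words):
    """Return ids of documents containing ANY of the words."""
    return {d for d, t in docs.items() if any(w in terms(t) for w in words)}
```

The `+` and `-` operators mentioned in point 4 correspond to requiring or excluding a term, i.e. an AND with a positive or negated membership test.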
This document provides an overview of an information retrieval system (IRS). It defines information retrieval as obtaining relevant information from a collection to meet a user's need. An IRS has three main components: a document subsystem for acquiring, representing, and organizing data; a user subsystem for representing queries; and a search/retrieval subsystem for matching queries to documents. It describes the basic concepts, such as how a user's query is scored and ranked to return relevant results, a process that can be iterated. The objectives are to highlight probabilistic models and establish relationships between popular techniques. The functions are to analyze information sources and queries in order to match and retrieve relevant items.
The document discusses information retrieval, which involves obtaining information resources relevant to an information need from a collection. The information retrieval process begins when a user submits a query. The system matches queries to database information, ranks objects based on relevance, and returns top results to the user. The process involves document acquisition and representation, user problem representation as queries, and searching/retrieval through matching and result retrieval.
Data integration allows different data types to be merged for use in business processes and functions. A digital library system applies data integration to merge heterogeneous data sources into a unified form that can be accessed through the library. Key elements of a digital library include networked access to digitized content, metadata to facilitate searching and discovery, and storage of digital objects and metadata in a repository. Requirements for building a digital library include hardware like servers and storage devices, software, and a network for content delivery and user access.
This document provides a full syllabus with questions and answers related to the course "Information Retrieval" including definitions of key concepts, the historical development of the field, comparisons between information retrieval and web search, applications of IR, components of an IR system, and issues in IR systems. It also lists examples of open source search frameworks and performance measures for search engines.
This document provides an overview of information retrieval models. It begins with definitions of information retrieval and how it differs from data retrieval. It then discusses the retrieval process and logical representations of documents. A taxonomy of IR models is presented including classic, structured, and browsing models. Boolean, vector, and probabilistic models are explained as examples of classic models. The document concludes with descriptions of ad-hoc retrieval and filtering tasks and formal characteristics of IR models.
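The vector model named above can be sketched in a few lines: documents and the query become term-frequency vectors, scored by cosine similarity. A minimal illustration with invented data, not a production ranker:

```python
# Minimal vector-space ranking: term-frequency vectors + cosine similarity.
import math
from collections import Counter

def tf_vector(text):
    """Term-frequency vector of a text (bag of lowercased words)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rank(query, docs):
    """Return (doc_id, score) pairs sorted best-first."""
    q = tf_vector(query)
    scores = [(d, cosine(q, tf_vector(t))) for d, t in docs.items()]
    return sorted(scores, key=lambda p: p[1], reverse=True)
```

The Boolean model, by contrast, returns an unranked set (a document either satisfies the query or not), which is exactly the distinction the taxonomy above draws among the classic models.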
1. The document defines key terms related to information retrieval systems such as information, retrieval, system, and discusses the basic components and functions of IRS.
2. It explains that the role of users is to formulate queries, and the role of librarians is to assist users in meeting their information needs.
3. The document contrasts older IRS that retrieved entire documents with modern IRS that allow storage, organization, and access to text and multimedia information through techniques like keyword searching and hyperlinks.
Functions of information retrieval system (1) by silambu111
The document discusses information retrieval systems. It defines information retrieval as the process of searching collections of documents to identify those dealing with a particular subject. Information retrieval systems aim to facilitate literature searching. They involve representing, storing, organizing, and providing access to information items so that users can easily find information of interest. Information retrieval draws from multiple disciplines and involves subsystems for documents, users, and searching/matching.
The document provides guidance on early planning for data management, including becoming familiar with funder requirements, planning for the types and formats of data that will be created, designing a system for taking notes, organizing files through consistent naming schemes and use of folders, adding metadata to files to aid in documentation and discovery, and using RSS feeds to organize web-based information. It also touches on issues like plagiarism, data protection, intellectual property rights, and remote access to and backup of data.
WHAT ARE METADATA STANDARDS? EXPLAIN DUBLIN CORE IN DETAIL. by Shweta Bhavsar
This document discusses metadata standards and provides details on Dublin Core. It defines metadata as "structured data about data" and explains its role in digital resource management. It outlines different types of metadata standards including descriptive, structural, and administrative. Dublin Core is introduced as a general metadata standard consisting of 15 elements used to describe resources. It was developed in 1995 to make it easy to discover web resources by providing a simple yet flexible set of elements for description. The document provides details on the development and governance of Dublin Core as well as its two forms: simple Dublin Core with 15 elements, and qualified Dublin Core which includes additional elements and properties.
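The 15-element simple Dublin Core set described above can be shown as plain key-value data. The element names are the standard set; the sample record values are invented for illustration:

```python
# The 15 simple Dublin Core elements; sample record values are invented.
SIMPLE_DC_ELEMENTS = [
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
]

sample_record = {
    "Title": "Introduction to Digital Libraries",
    "Creator": "Example Author",
    "Date": "1995",
    "Type": "Text",
    "Language": "en",
}

def is_valid_simple_dc(record):
    """Check that every element used is one of the 15 simple DC elements.
    Simple Dublin Core makes all elements optional and repeatable."""
    return all(k in SIMPLE_DC_ELEMENTS for k in record)
```

Qualified Dublin Core extends this with refinements (e.g. qualifying Date), which is why the document distinguishes the two forms.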
Information retrieval (IR) is the science of searching for documents and information within documents. IR is interdisciplinary and involves computer science, mathematics, psychology and other fields. Information storage and retrieval (ISAR) systems allow users to store, manipulate and analyze data and report results on a regular basis. ISAR is widely used in fields like science, business and healthcare. The information retrieval process begins with a user entering a query, which is then matched to objects stored in the database to retrieve relevant results based on scoring algorithms. Performance is evaluated using precision and recall metrics.
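The precision and recall metrics mentioned above reduce to two set ratios, sketched here over document-id sets:

```python
# Precision and recall over sets of document ids.
def precision(retrieved, relevant):
    """Fraction of retrieved items that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def recall(retrieved, relevant):
    """Fraction of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0
```

The two pull in opposite directions: retrieving everything maximizes recall but ruins precision, which is why systems are evaluated on both.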
This document discusses information storage and retrieval. It covers basic concepts of information storage including common storage media like hard drives, floppy disks, CDs, DVDs, and USB flash drives. It also discusses basic concepts of information retrieval and the major components of IR systems including databases, search mechanisms, languages, and interfaces. Finally, it discusses retrieval techniques, IR systems, evaluating IR systems, and future trends in IR.
Basis of information retrieval, part 1: retrieval tools by Saroj Suwal
This document discusses various tools for retrieving literature, including catalogs, indexes, registers, and online databases. It describes the purpose and format of each tool. Catalogs provide access to collections and contain descriptive metadata. Indexes arrange information alphabetically and by subject but do not provide location details. Registers function like catalogs for museum collections. Bibliographic databases contain searchable references to published works. Secondary publications abstract and index primary documents to help users find relevant information.
Information retrieval is concerned with searching for documents and metadata about documents. Documents contain information to be retrieved. There is overlap between terms like data retrieval, document retrieval, information retrieval, and text retrieval. Automated information retrieval systems are used to reduce information overload. Libraries and universities use IR systems to provide access to materials. Web search engines are a prominent example of IR applications. The idea of using computers for information retrieval was popularized in 1945. Early automated systems emerged in the 1950s and large-scale systems in the 1970s such as the Lockheed Dialog system. Many measures exist for evaluating IR system performance including precision, recall, and precision-recall curves.
The document discusses various digital resources and applications used in libraries, including databases, integrated library systems, online catalogs, and digital collections. It describes how early systems focused on automating circulation and cataloging processes, while modern libraries integrate access to both physical and online resources. The document also outlines benefits of technologies like document imaging processes and computer-aided instruction for storing documents digitally and enhancing teaching.
This topic was presented at a "Workshop On Best Practices in Library: Digital Library" Organised by Rabindra Library, Assam University, Silchar on November 29, 2013
A digital library is an integrated set of services for capturing, cataloguing, storing, searching, protecting, and retrieving information, which provides coherent organization of and convenient access to typically large amounts of digital information.
The document discusses search engines, their key functions, and early search engines before Google. It describes how search engines crawl the web to index pages, then use algorithms to provide relevant results when users search. Early search engines like Archie, Gopher, and AltaVista had limited capabilities compared to modern search engines like Google. The document also discusses search engine optimization and factors that determine result rankings.
Search engines and digital libraries both provide search capabilities but work differently:
- Search engines use crawlers to index the web and return keyword search results, while digital libraries provide access to structured collections of digitized materials from libraries and archives.
- Both have limitations in coverage, and within digital libraries searching often has to be done separately in each database rather than through a single federated search.
- It is important for users to understand how each system works, what is included in their indexes, and how results may be influenced by business models, sponsorship, or other factors to get the most relevant information for their needs.
This document discusses various techniques for finding information on the Internet, including:
1. Directly visiting websites if the URL is known.
2. Using search engines to find information by entering keywords or using Boolean, proximity, field, or truncation search operators.
3. Browsing subject directories or exploring the "deep web" of databases not indexed by search engines.
4. Joining email discussion groups or Usenet newsgroups to ask experts questions and read discussions on topics of interest.
The document provides details about a non-credit course on search engine optimization (SEO) taken by a student. It includes the course contents, which cover the basics of SEO, on-page optimization techniques like meta tags and keywords, off-page optimization like link building, analytics tools, SEO reporting, and applications of SEO. The document also discusses the pros and cons of SEO and provides a conclusion.
This document provides an overview of basic computer and internet concepts needed for an information technology course. It defines key terms like the internet, World Wide Web, URLs, web browsers, and hyperlinks. It also explains the structure of the web and differences between search engines and directories. Productivity software like word processors, spreadsheets, and presentations are discussed. The reading aims to give students a foundation in technology literacy.
Lost in the Net? Navigating Search Engines by Johan Koren
This document discusses search engines and how they work. It defines a search engine as a computer program that uses clusters of computers to search the web or a specific site for keywords or phrases entered by users. It explains that search engines build indexes of words found on webpages and their locations, and allow users to search those indexes. The document also notes that search results can vary based on personalization, clickthrough data, and other factors, and provides tips on how to focus searches using techniques like phrase searching and field searching.
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ... by inventionjournals
This document discusses an enhanced web usage mining system using fuzzy clustering and collaborative filtering recommendation algorithms. It aims to address challenges with existing recommender systems like producing low quality recommendations for large datasets. The system architecture uses fuzzy clustering to predict future user access based on browsing behavior. Collaborative filtering is then used to produce expected results by combining fuzzy clustering outputs with a web database. This approach aims to provide users with more relevant recommendations in a shorter time compared to other systems.
Google is a popular search engine that helps users find information on the internet. It crawls websites to index their content, analyzes the indexed information and stores it in vast databases, then retrieves relevant pages for user queries by ranking pages according to their algorithms. Other search engines and tools include Yahoo, Bing, subject directories that organize information by topic, metasearch engines that search multiple engines simultaneously, and specialized engines for specific subjects like health, movies or jobs.
The document provides an overview of search engine optimization (SEO) and how search engines work. It discusses the main components of search engines: spiders that download web pages; crawlers that follow links to find new pages; indexers that analyze page elements; databases to store downloaded pages; and results engines that return relevant pages for user queries. It also covers SEO best practices like optimizing title tags and keywords, maintaining a clear site structure, and avoiding cloaking techniques that provide misleading content to search engines.
This document describes an intelligent meta search engine that was developed to efficiently retrieve relevant web documents. The meta search engine submits user queries to multiple traditional search engines including Google, Yahoo, Bing and Ask. It then uses a crawler and modified page ranking algorithm to analyze and rank the results from the different search engines. The top results are then generated and displayed to the user, aimed to be more relevant than results from individual search engines. The meta search engine was implemented using technologies like PHP, MySQL and utilizes components like a graphical user interface, query formulator, metacrawler and redundant URL eliminator.
An Intelligent Meta Search Engine for Efficient Web Document Retrievaliosrjce
This document describes an intelligent meta search engine that was developed to efficiently retrieve relevant web documents. The meta search engine queries multiple traditional search engines like Google, Yahoo, Bing and Ask simultaneously using a single user query. It then ranks the retrieved results using a new two phase ranking algorithm called modified ranking that considers page relevance and popularity. The goal of the new meta search engine is to produce more efficient search results compared to traditional search engines. It includes components like a graphical user interface, query formulator, metacrawler, redundant URL eliminator and modified ranking algorithm to retrieve and rank results.
This document provides an overview of basic technology concepts and definitions relevant to the class, including the history and structure of the Internet and World Wide Web. It discusses how the Internet began as a government network and is now a global system of interconnected networks. Key points about the Web include that it is part of the Internet and allows users to navigate nonlinearly between pages through hyperlinks. The document also defines common terms like URLs, websites, web browsers, and search engines and directories. It provides examples of different types of digital content and online resources as well as basic software applications.
A search engine uses automated software programs called spiders that crawl the web to index pages and create a searchable database. When a user searches for keywords, the search engine software returns relevant results from the index. There are three main types of search engines - directories that are compiled by humans, hybrid engines that combine human and automated results, and meta search engines that search multiple other engines at once. Each search engine indexes pages differently and has a unique algorithm to determine search results.
Search engines index web pages and content to provide relevant results for user queries. There are different types of search engines including general web search engines, vertical search engines that focus on specific content like images or video, enterprise search engines for internal company searches, and social search engines that factor in user interactions. New forms of search continue to emerge like semantic search that leverages metadata to improve relevance and selection-based search that is invoked solely by mouse clicks.
The document describes how internet search engines work. It discusses the main components: (1) crawlers or spiders that crawl websites and index their content; (2) indexing software that extracts information from documents and adds them to a database; (3) search engines allow users to search the database by keywords and return relevant results.
Design Issues for Search Engines and Web Crawlers: A ReviewIOSR Journals
This document provides a review of design issues for search engines and web crawlers. It discusses how search engines use web crawlers to collect web documents for storage and indexing from the growing World Wide Web. The three main parts of a search engine are the crawler, indexer, and query engine. Crawler-based search engines create listings automatically using algorithms while human-powered directories rely on human organization. Designing efficient search engines and crawlers faces challenges from the diversity of web documents and changing user behaviors. Web crawlers prioritize URLs to download pages and extract new links to update search engine databases.
This seminar presentation discusses search engines. It defines a search engine as a program that uses keywords to search documents and returns results in order of relevance. The presentation outlines the main components of a search engine: the web crawler, database, and search interface. It also describes how search engines work by crawling links, indexing words, and ranking pages using algorithms like PageRank. Finally, it discusses different types of search engines and how artificial intelligence is used to improve search engine quality.
The document discusses search engines and web crawlers. It provides information on how search engines work by using web crawlers to index web pages and then return relevant results when users search. It also compares major search engines like Google, Yahoo, MSN, Ask Jeeves, and Live Search based on factors like market share, database size and freshness, ranking algorithms, and treatment of spam. Google is highlighted as having the largest market share and best algorithms for determining natural vs artificial links.
Google's search engine works by crawling and indexing the web. It uses web crawlers to discover publicly available web pages by following links from page to page. The crawled pages are indexed, organized, and stored in Google's massive database. When a user searches, Google's algorithms analyze over 200 signals and factors to rank and filter results based on relevance. Key factors in ranking include PageRank, which analyzes the number and quality of links between pages. Google also works to identify and filter spam sites that try to manipulate search rankings.
Meet up Milano 14 _ Axpo Italia_ Migration from Mule3 (On-prem) to.pdfFlorence Consulting
Quattordicesimo Meetup di Milano, tenutosi a Milano il 23 Maggio 2024 dalle ore 17:00 alle ore 18:30 in presenza e da remoto.
Abbiamo parlato di come Axpo Italia S.p.A. ha ridotto il technical debt migrando le proprie APIs da Mule 3.9 a Mule 4.4 passando anche da on-premises a CloudHub 1.0.
Discover the benefits of outsourcing SEO to Indiadavidjhones387
"Discover the benefits of outsourcing SEO to India! From cost-effective services and expert professionals to round-the-clock work advantages, learn how your business can achieve digital success with Indian SEO solutions.
Instagram has become one of the most popular social media platforms, allowing people to share photos, videos, and stories with their followers. Sometimes, though, you might want to view someone's story without them knowing.
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptxBrad Spiegel Macon GA
Brad Spiegel Macon GA’s journey exemplifies the profound impact that one individual can have on their community. Through his unwavering dedication to digital inclusion, he’s not only bridging the gap in Macon but also setting an example for others to follow.
Gen Z and the marketplaces - let's translate their needsLaura Szabó
The product workshop focused on exploring the requirements of Generation Z in relation to marketplace dynamics. We delved into their specific needs, examined the specifics in their shopping preferences, and analyzed their preferred methods for accessing information and making purchases within a marketplace. Through the study of real-life cases , we tried to gain valuable insights into enhancing the marketplace experience for Generation Z.
The workshop was held on the DMA Conference in Vienna June 2024.
Ready to Unlock the Power of Blockchain!Toptal Tech
Imagine a world where data flows freely, yet remains secure. A world where trust is built into the fabric of every transaction. This is the promise of blockchain, a revolutionary technology poised to reshape our digital landscape.
Toptal Tech is at the forefront of this innovation, connecting you with the brightest minds in blockchain development. Together, we can unlock the potential of this transformative technology, building a future of transparency, security, and endless possibilities.
1. Presented By
Dr. Jayant M. Nandagaoli
Head, Department of Library & Information Science
(jayantnandagaoli@gmail.com)
2. • * Information search is a systematic & exhaustive search
for published or unpublished material on a specific
subject or topic.
• * It is an intermediate stage between reference work
& research.
• * It is often the 1st step in a research project.
• * It is a systematic investigation to obtain the desired
information.
• * It can be carried out manually or by computer, both
offline and online.
• * It is undertaken to meet specific information needs and
to overcome problems.
3. • * Ascertaining the purpose, scope & depth of the
information need to be searched:-
* Facet analysis:- identifying key terms and their
relationships.
* Formulation of search strategy:- a plan or course
of action.
• * Selection, collection and searching of information
sources:- from tertiary to primary sources.
* Presentation of results.
4. • There are several types of resources available, such as:
• * Electronic Journals and Newsletters
• * Directories of WWW Electronic Journals
• * Online Indexes of Print or Electronic Journals
• * Table of Contents
• * Preprints and Working Papers
• * Discussion Lists or Forums/Usenet Newsgroups
• * Directories of Newsgroups and Mailing Lists.
• * Subject Databases
• * Campus Wide Information Systems (CWIS)
5. • * Technical Reports
• * Library Catalogues
• * Patents
• * Document Delivery
* Reference Sources
• * Courseware Directories
• * Digital Libraries
• * Software Archives
• * Data Archives
• * Blogs, YouTube & other social networking sites
• * Other Resources, etc.
6. * Search engine: A web search engine is designed to search for
information on the World Wide Web and FTP servers. The
search results are generally presented in a list and
are often called hits.
* The information may consist of web pages, images,
and other types of files. Some search engines also
mine data available in databases or open directories. Unlike
web directories, which are maintained by human editors,
search engines operate algorithmically or use a mixture of
algorithmic and human input.
7. Internet search engines are special sites on the Web that are
designed to help people find information stored on other sites.
There are three pieces of software that together make up a
search engine: the spider software, the index software and the
query software. There are differences in the ways various search
engines work, but they all perform three basic tasks:
1) They search the Internet or select pieces from the Internet based
on important keywords.
2) They keep an index of the words they find and where they find
them.
3) They allow users to look for words or combinations of words
found in that index.
8. A search engine operates in the following order:
Crawling
Follow links to find information
Indexing
Record what words appear where
Ranking
What information is a good match to a user query?
What information is inherently good? (this is what SEO targets)
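The three stages above can be sketched in a few lines of Python. This is a toy illustration: the "web" is a hypothetical in-memory dict of pages and links, and ranking is a simple term-frequency count, not a real engine's algorithm.

```python
# Toy sketch of the crawl -> index -> rank pipeline.
from collections import defaultdict, deque

# Hypothetical mini-web: each page has text and outgoing links.
PAGES = {
    "a.html": {"text": "chocolate cake recipes", "links": ["b.html"]},
    "b.html": {"text": "chocolate history and trivia", "links": ["c.html"]},
    "c.html": {"text": "gardening tips", "links": []},
}

def crawl(start):
    """Crawling: follow links to discover pages."""
    seen, queue = set(), deque([start])
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        queue.extend(PAGES[url]["links"])
    return seen

def index(urls):
    """Indexing: record which words appear on which pages."""
    inverted = defaultdict(set)
    for url in urls:
        for word in PAGES[url]["text"].split():
            inverted[word].add(url)
    return inverted

def rank(inverted, query):
    """Ranking: score pages by how many query words they contain."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for url in inverted.get(word, ()):
            scores[url] += 1
    return sorted(scores, key=scores.get, reverse=True)

idx = index(crawl("a.html"))
print(rank(idx, "chocolate recipes"))  # ['a.html', 'b.html']
```

Here a.html ranks first because it matches both query words, while b.html matches only one.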
9. 1. Spiders: To find information on the hundreds of
millions of Web pages that exist, a search engine
employs special software robots, called spiders, to
build lists of the words found on Web sites.
"Spiders" take a Web page's content and create key
search words that enable online users to find pages
they're looking for.
2. Crawling: When a spider is building its lists, the
process is called Web crawling. In order to build and
maintain a useful list of words, a search engine's spiders
have to look at a lot of pages.
10. 3. Indexing: Storing the collected data for fast access.
4. Meta tags: The contents of each page are then analyzed
to determine how the page should be indexed (for example,
words are extracted from the titles, headings, or
special fields such as meta tags).
11. The spider software 'crawls the web looking for new pages to
collect and add to the search engine indices'. The difference is that
the spider doesn't collect images or formatting - it is only interested
in the text, the links AND the URL (for example,
http://www.example.com/page.html) from which they come. It doesn't
display anything, and it gathers as much information as it can in the
shortest time possible.
The index software catches everything the spider can throw at it
(yes, that's another metaphor). The index makes sense of the mass of
text, links and URLs using what is called an algorithm - a complex
mathematical formula that indexes the words, the pairs of words and
so on. The spider takes the information it has gathered about a web
page and sends it to the index software, where it is analyzed and
stored.
12. The query software - it is the front end of what everybody
thinks of as a search engine. It may look simple but the query
software presents the results of all the quite remarkable spider
and index software that works away invisibly on our behalf.
So, when you type in your search words and hit search, the
search engine will try to match your words with the best,
most relevant web pages it can find by 'searching the web'. But
this too is a metaphor, and perhaps the most important one.
The query software doesn't actually search the web - instead, it
checks through all the records that have been created by its
own index software. And those records are made possible by
the text, links and URL material the spider software collects.
13. When someone types chocolate into the query box on a search engine page
(such as Google), then it's time for the query software to go to work.
14. Keyword Density: The ratio of the number of
occurrences of a word or phrase on a page to the total
number of words on the page.
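The definition above translates directly into a one-line calculation. This sketch handles a single keyword (counting multi-word phrases would need a sliding window); the sample text is invented for illustration.

```python
# Keyword density as defined above: occurrences of a term divided by
# the total number of words on the page.
def keyword_density(text, term):
    words = text.lower().split()
    return words.count(term.lower()) / len(words)

page = "chocolate cake and chocolate biscuits with chocolate icing"
print(f"{keyword_density(page, 'chocolate'):.2%}")  # 3 of 8 words -> 37.50%
```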
15. * Research has shown that about 80% of internet traffic
is generated through search engines.
* Approximately 75% of users stay only on the 1st
page of the search results, and only about 20% of
users go on to the 2nd page.
* Every search engine has limitations as to coverage.
* Some have compromised search with economics, i.e.
becoming little more than advertisers.
* Search engines are also often victims of spam
indexing, affecting what is included and how it is ranked.
16. • Internal search: An internal search can only be used
to find content on a single website (or intranet or
extranet).
• External or public search: A public search can be
used to find content on any website, anywhere on the
web.
• Meta search engine: A meta search engine uses the
indexes of other search engines to find content
anywhere on the web.
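A meta search engine's core job is to merge result lists from several engines and drop redundant URLs. The sketch below uses invented stand-in engine functions (real engines would be queried over HTTP); only the merge-and-deduplicate logic is the point.

```python
# Minimal meta search sketch: query several (mock) engines, merge the
# result lists, and eliminate redundant URLs.
def engine_a(query):
    # Stand-in for a real search engine API call.
    return ["http://a.example/1", "http://shared.example/x"]

def engine_b(query):
    return ["http://shared.example/x", "http://b.example/2"]

def meta_search(query, engines):
    seen, merged = set(), []
    for engine in engines:
        for url in engine(query):
            if url not in seen:      # redundant URL elimination
                seen.add(url)
                merged.append(url)
    return merged

print(meta_search("chocolate", [engine_a, engine_b]))
```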
17. Look for links to an “About” page the information should
be specific and should include:
* A name of someone responsible for the information
* A description of what the organization or individual
does
* Where they are located and who they serve
* Specific contact information
* Names of executive members and their roles if it is
an organization or business
18. * Every word matters- All the words you put in the
search box will be used.
* Search is always case insensitive- A search for
[ugc.ac.in] is the same as a search for [UGC.AC.IN].
* Generally, punctuation is ignored, including
@#$%^&*()=+[]/ and other special characters.
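The normalisation rules above (lowercase everything, drop most punctuation) can be sketched as a small preprocessing step; real engines apply many more rules, so this is only an illustration.

```python
# Sketch of query normalisation: case-insensitive, punctuation ignored.
import string

def normalize(query):
    cleaned = query.translate(str.maketrans("", "", string.punctuation))
    return cleaned.lower().split()

print(normalize("Computer Help!") == normalize("computer help"))  # True
```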
19. • Ex. http://www.gov.ns.ca/premier/
• { http:// }{ www.gov.ns.ca/ }{ premier/ }
• ↓ ↓ ↓
• Html web page Home Page Directory or
Subordinate Page
20. • Web address, or URL (Uniform Resource Locator):
http://www.sc.edu/beaufort/library/pages/bones/lesson1.shtml
• "http" means hypertext transfer protocol and refers to the format
used to transfer and deal with information
• "www" stands for World Wide Web and is the general name for the
host server that supports text, graphics, sound files, etc. (It is not an
essential part of the address, and some sites choose not to use it)
• "sc" is the second-level domain name and usually designates the
server's location, in this case, the University of South Carolina
• "edu" is the top-level domain name
• "beaufort" is the directory name
• "library" is the sub-directory name
• "pages" and "bones" are the folder and sub-folder names
• "lesson1" is the file name
• "shtml" is the file type extension and, in this case, stands for
"scripted hypertext mark-up language" (that's the language the
computer reads). The addition of the "s" indicates that the server will
scan the page for commands that require additional insertion before the
page is sent to the user.
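The same example URL can be taken apart programmatically with Python's standard urllib, which splits an address into scheme, host, and path components:

```python
# Splitting the example URL above into its named parts.
from urllib.parse import urlparse
import posixpath

url = "http://www.sc.edu/beaufort/library/pages/bones/lesson1.shtml"
parts = urlparse(url)
print(parts.scheme)                    # http  (the protocol)
print(parts.netloc)                    # www.sc.edu  (the host)
print(parts.netloc.split(".")[-1])     # edu  (the top-level domain)
print(posixpath.basename(parts.path))  # lesson1.shtml  (the file name)
```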
21. • Only a few top-level domains are currently recognized,
but this is changing. Here is a list of the domains
generally accepted by all:
• .edu -- Educational site (usually a university or
college)
• .com -- Commercial business site
• .gov -- Governmental/non-military site
• .mil -- Military sites and agencies
• .net -- Networks, internet service providers,
organizations
• .org -- Non-profit organizations and others.
22. • In mid November 2000, the Internet Corporation for
Assigned Names and Numbers (ICANN) voted to
accept an additional seven new suffixes, which are
expected to be made available to users :
• .aero -- restricted use by air transportation industry
• .biz -- general use by businesses
• .coop -- restricted use by cooperatives
• .info -- general use by both commercial and non-
commercial sites
• .museum -- restricted use by museums
• .name -- general use by individuals
• .pro -- restricted use by certified professionals and
professional entities
23. • * Keep it simple- Most queries do not require
advanced operators or unusual syntax. Enter the
exact name.
* Think how the page you are looking for will be
written- The engine is not human; it is a program
that matches the given words. Use the words that are
most likely to appear on the page.
* Describe what you need with as few terms as
possible- Since all words are used, each additional
word limits the results.
24. • * Surround searches in quotes- For multiple common
words, group the query with quotes, e.g. for computer
and help use "computer help".
* Avoid stop words- If not important, don't enter
them, e.g. instead of why does my computer not
boot use "computer" and "boot".
* To exclude an undesirable concept- Unwanted terms
can be excluded by using the minus (-) sign, e.g.
"computer help" -windows
25. • * Fill in the blank (*)- Include an asterisk (*) in the
query as a placeholder for any unknown term; it
returns the best matches, e.g. google * will return
results about many of Google's products.
* Search exactly (+)- The engine usually considers
synonyms, such as Orphan / Abandoned Child.
To avoid this, +orphan will match the
word precisely.
26. Search for an exact word or phrase "search query":- Use quotes to search
for an exact word or set of words. This option is handy when searching for
song lyrics or a line from literature. "imagine all the people"
Search within a site or domain site:query:- If you are looking for more results
from a certain website, include site: in your query. For example, you can find
all mentions of "olympics" on the New York Times website like this:
olympics site:nytimes.com or olympics site:.gov
Search for pages that link to a URL link:query:- Using the link: operator,
you can find pages that link to a certain page. For example, you can find all
the pages that link to google.com. link:google.com or link:google.com/images
Search for pages that are similar to a URL related:query:-
related:nytimes.com
Search for either word query OR query:- world cup location 2014 OR 2018
Search for a number range number..number:- Separate numbers by two
periods without spaces (..) to see results that contain numbers in a given range
of things like dates, prices, and measurements. camera $50..$100
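To show how an engine might pick these operators out of a raw query string, here is a rough parsing sketch; real query parsers are far more elaborate, and the function and field names are invented for illustration.

```python
# Rough sketch: extract the site: operator, quoted phrases, and plain
# terms from a query string.
import re

def parse_query(q):
    ops = {"site": None, "phrases": [], "terms": []}
    m = re.search(r"site:(\S+)", q)
    if m:
        ops["site"] = m.group(1)      # domain restriction
        q = q.replace(m.group(0), "")
    ops["phrases"] = re.findall(r'"([^"]+)"', q)  # exact phrases
    q = re.sub(r'"[^"]+"', "", q)
    ops["terms"] = q.split()          # remaining plain keywords
    return ops

print(parse_query('olympics site:nytimes.com "opening ceremony"'))
```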
29. Copyright is a right given by the law to creators of literary,
dramatic, musical and artistic works and producers of
cinematograph films and sound recordings.
Three conditions must be met for copyright protection to apply:
The work must be original.
The work must be fixed, or presented in a tangible form
such as writing, film, or photography.
A qualified person must create the work. A qualified
person is one living in a country that is a member of the
Universal Copyright Convention (UCC) or the Berne
Convention.
31. Copyright is concerned with the rights of authors, composers,
artists and other creators in their works. Copyright grants them
the right, for a limited period of time, to authorize or prohibit
certain uses of their works by others. These rights encompass
basically two aspects:
Economic
The main aim of copyright is to provide a stimulus for
creativity – ensuring economic returns on the creation and
protection from violation of the creation.
Moral
Moral rights generally cover the right of “paternity” by which
authors have the right to claim authorship of their works,
ensuring that their names are mentioned in connection with
them.
32. Copyright is in essence a bundle of rights covering the following:
Rights for reproduction, i.e. exclusive rights to make copies of the
work.
Rights for modification/adaptation, i.e. exclusive rights to modify
and make adaptations and create derivative works.
Rights for distribution, i.e. the right to distribute the work to the
public.
Rights for public performance, i.e. the right to recite, play, dance, or
act with or without the aid of a machine.
Rights for public display, i.e. the right to display the work anywhere
that is open to the public.
33. Section 52 of the Indian Copyright Act, 1957 lists the acts that do not
legally constitute infringement.
Fair use of a work is permitted for:
Research
Non-commercial private study
Criticism, review, and reporting of current events in newspapers/ periodicals/
broadcasting
Classroom teaching or examination purposes
Reproduction for use in public libraries (not more than 3 copies)
Reproduction of an unpublished work kept in a library, subject to certain
conditions
Publication in a collection for the use of educational institutions in certain
circumstances
In the case of computer programmes:
utilizing the computer programme for the purpose for which it was supplied
making back-up copies purely as temporary protection against loss, destruction or
damage
34. Piracy of copyrighted materials and the demand for stronger intellectual
property rights are not a new phenomenon. But digitization has made
piracy much easier:
Easy reproduction
Very low cost of reproduction
Easier substitutability of digitized copies
Inexpensive dissemination of digital content
The most important aspect of digital content is that access to the content is
synonymous with control of the content, which, combined with the low cost of
content reproduction and dissemination, causes a virtual loss of ownership in
terms of the content's economic value. This is a major problem for
content owners and content industries.
In 2006, the Copyright Office in India posted proposals to amend the
Copyright Act, 1957 on its website. One of the proposed amendments
seeks to introduce Digital Rights Management (DRM) into Indian
copyright law. The purpose of such an introduction into the Indian copyright
laws has been to "keep pace with national and international developments
and advances in technologies".
35. The Digital Millennium Copyright Act (DMCA) is a
controversial United States digital rights management (DRM)
law enacted October 28, 1998.
The intent behind DMCA was to create an updated version of
copyright laws to deal with the special challenges of regulating
digital material. Broadly, the aim of DMCA is to protect the
rights of both copyright owners and consumers.
The law complies with the World Intellectual Property
Organization (WIPO) Copyright Treaty and the WIPO
Performances and Phonograms Treaty, both of which were
ratified by over 50 countries around the world in 1996.
36. With the advent of ICT we are moving towards 'thick' copyright, where
content industries are adopting copyright management technology
measures for the protection of the intellectual property in digital works
like e-books, e-journals, databases, computer programs, movies,
music, etc. This is referred to as Digital Rights Management, or
DRM.
Digital Rights Management is a collective name for technologies, or
a range of techniques, that prevent one from using a copyrighted
digital work beyond the degree to which the copyright owner (or a
publisher who may not actually hold the copyright) wishes to allow
one to use it. It is actually a range of techniques that use
information about rights and rights holders to manage copyright
material and the terms and conditions on which it is made available
to users.
37. Digital rights management is a far-reaching term that refers to any
scheme that controls access to copyrighted material using
technological means. In essence, DRM removes usage control from
the person in possession of digital content and puts it in the hands of a
computer program. The applications and methods are endless -- here
are just a few examples of digital rights management:
A company sets its servers to block the forwarding of sensitive e-mail.
An e-book server restricts access to, copying of and printing of
material based on constraints set by the copyright holder of the
content.
A movie studio includes software on its DVDs that limits the number
of copies a user can make to two.
A music label releases titles on a type of CD that includes bits of
information intended to confuse ripping software.
38. Management of Digital Rights: The responsibility of expressing and
managing the rights to content in electronic or digital form, as a corollary
to content in print.
Digital management of rights: The ability to physically manage
intellectual property and proprietary rights in content by way of an
electronic system or process, associated with copyright management
systems.
DRM thus covers both:
Management of digital rights
Digital management of rights
39. Components of DRM
Secure Containers
They make content inaccessible to users who are not
authorized to access it. These containers mainly rely
on cryptographic algorithms such as DES or AES, e.g. InterTrust's
DigiFile and Microsoft's file format for e-books.
Distribution of such protected content typically involves
the distributor,
the clearinghouse, and
the consumer.
Usually a DRM system is integrated with an e-commerce
system that handles financial payments and triggers the
functions of the clearinghouse.
40. Rights Expressions
The Rights entity allows expressions to be made about the allowable
permissions, constraints, obligations, and any other rights-related
information about Users and Content. Hence, the Rights entity is
critical because it represents the expressiveness of the language that
will be used to inform the rights metadata.
Content Identification and Description System
They help uniquely identify the content (eg. International Standard
Book Number) and associate descriptive metadata with the content.
Payment Systems
The systems that enable the monetary transactions need to be a part
of the secure and trusted system in order for the system to operate.
Watermarking and Fingerprinting
This set of technologies, often referred to as forensic technologies, is
used to identify content.
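One simple form of content identification can be sketched with a cryptographic hash: the digest of a work's bytes gives a short fingerprint that changes if even one character of the content is altered. This is a toy illustration of forensic identification, not a real watermarking scheme (which embeds marks inside the content itself).

```python
# Toy fingerprint: a truncated SHA-256 digest of the content bytes.
import hashlib

def fingerprint(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()[:16]

original = b"Chapter 1: It was a dark and stormy night."
tampered = b"Chapter 1: It was a dark and stormy day."
print(fingerprint(original) == fingerprint(tampered))  # False
```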
41. Identification of People and Organization
Not only does a rights owner need to associate a claim of ownership
with the content but also the consumer will need to be uniquely
identified. Such user identification systems are a prerequisite for DRM
systems to be able to limit content access to legitimate users.
Authentication Systems
DRM requires algorithms to authenticate the person or
organization that wants to interact with any content. This function will
involve cryptographic algorithms and may need an agency that issues
electronic certificates, often referred to as a "Trusted Third Party" or TTP.
Event Reporting
A mechanism to report events such as the purchase of a piece of
content is important to allow event-based payments to be processed.
These event-based payments are examples of new business models that
DRM can enable.
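Authenticated event reporting of the kind described above can be sketched with an HMAC: the reporting client signs a purchase event with a secret shared with the clearinghouse, which verifies the signature before processing payment. The key and event format here are invented for illustration; real systems would use certificates issued by a TTP.

```python
# Sketch: signing and verifying a purchase event report with HMAC.
import hmac
import hashlib

SECRET = b"shared-secret"  # hypothetical key shared with the clearinghouse

def sign_event(event: str) -> str:
    return hmac.new(SECRET, event.encode(), hashlib.sha256).hexdigest()

def verify_event(event: str, signature: str) -> bool:
    # Constant-time comparison guards against timing attacks.
    return hmac.compare_digest(sign_event(event), signature)

event = "user42 purchased ebook-1234"
sig = sign_event(event)
print(verify_event(event, sig))                        # True
print(verify_event("user42 purchased ebook-9", sig))   # False
```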
42. Benefits of DRM:-
• Secure e-book distribution
• Content authenticity
• Transaction non-repudiation
• Market participant identification
• Protection of digital content
43. New Prohibitions On Circumvention Of Protection
Technologies:-
1. Prohibits the "circumvention" of any effective "technological
protection measure" (e.g., a password or form of encryption)
used by a copyright holder to restrict access to its material.
2. Prohibits the manufacture of any device, or the offering of
any service, primarily designed to defeat an effective
"technological protection measure".
3. Defers the effective date of these prohibitions for two years
and 18 months, respectively.
44. New Prohibitions On Circumvention Of Protection
Technologies:-
4. Requires that the Librarian of Congress issue a three-year waiver
from the anti-circumvention prohibition when there is evidence that
the new law adversely affects or may adversely affect "fair use" and
other non-infringing uses of any class of work.
5. Expressly states that many valuable activities based on the "fair use"
doctrine (including reverse engineering, security testing, privacy
protection and encryption research) will not constitute illegal
"anti-circumvention".
45. Limitations On Online Service Provider Liability:-
1. Exempts any OSP or carrier of digital information (including
libraries) from copyright liability because of the content of a
transmission made by a user of the provider's or carrier's
system (e.g., the user of a library computer system).
2. Establishes a mechanism for a provider to avoid copyright
infringement liability due to the storage of infringing
information on an OSP's own computer system, or the use of
"information location tools" and hyperlinks, if the provider
acts "expeditiously to remove or disable access to" infringing
material identified in a formal notice by the copyright holder
46. Digital Preservation:-
1. expressly permit authorized institutions to make up to three,
digital preservation copies of an eligible copyrighted work.
2. electronically "loan" those copies to other qualifying
institutions.
3. permit preservation, including by digital means, when the
existing format in which the work has been stored becomes
obsolete.