Today we’re going to let two popular applications go mano a mano.
CAN DO THIS ON WEB starting at HSLS home page.
In this corner, from tech giant Google, we have the challenger, Google Scholar. It has been called “Google for Grownups.” It’s different from original Google in that we’re searching just the scholarly literature, not the entire universe of the Web. But it has the same undemanding, simple interface we all love in original Google.
In the other corner, from the National Library of Medicine, we have MEDLINE, the long-time defending champ. It is available through different search engines developed by different organizations.
Today I’m mostly going to use PubMed as my example of MEDLINE.
But HSLS also subscribes to Ovid MEDLINE--another search engine for the same core set of records.
To briefly review MEDLINE’s origins…
Produced by the National Library of Medicine (NLM), one of the National Institutes of Health. Available with different search engines developed by different organizations.
Many people think that PubMed and Ovid MEDLINE consist of millions of full-text articles. But that’s not true.
Think about it: MEDLINE went online in 1971, with citations reaching back to 1966, long before the Web, let alone full text. Links to full text have been added to PubMed and Ovid MEDLINE, but the articles themselves aren’t part of MEDLINE.
MEDLINE is a structured database (a set of records) about articles. DB = records x fields: Title, Authors, Journal, Year, Abstract, etc.
In an increasing number of cases, though, bringing up the record then lets you link to full text if your library has a subscription to the journal in question.
Fields in a record (this is from the Ovid MEDLINE interface): Note that one of the fields is MeSH Subject Headings—more about that in a minute.
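As a rough sketch of the "records x fields" idea, one record can be modeled as a small Python data class. The class name, field names, and search helper are illustrative assumptions, not NLM's actual schema. The sample values come from the citation table in the slides, and fields not shown there are left empty.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CitationRecord:
    """One database record: a description of an article, not the article itself."""
    title: str
    authors: List[str]
    journal: str
    year: int
    abstract: Optional[str] = None  # left empty when no abstract is available
    mesh_headings: List[str] = field(default_factory=list)  # subject headings


def search_field(records, field_name, text):
    """Return records whose named field contains `text` (case-insensitive)."""
    needle = text.lower()
    hits = []
    for record in records:
        value = getattr(record, field_name)
        if value is None:
            continue  # nothing to search in an empty field
        haystack = " ".join(value) if isinstance(value, list) else str(value)
        if needle in haystack.lower():
            hits.append(record)
    return hits


# A one-record "database" built from a row of the slide's citation table.
database = [
    CitationRecord(
        title="My journey from wool research to insulin",
        authors=["Zahn H"],
        journal="Journal of Peptide Science",
        year=2000,
    ),
]

print([r.title for r in search_field(database, "journal", "peptide")])
```

The search only ever sees the fields in the record, which is why a detail that appears only in the article's full text cannot be found this way.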
Here is the same MEDLINE record as it appears in Ovid MEDLINE and in PubMed. Again, the database record (the content) is the same. The difference is in the interface: appearance, search engine, features and tools.
It’s easy to see which journals are included in MEDLINE.
Accessing the Journals database from PubMed
There is a record for each journal. It includes the standard title abbreviation format. Information about coverage (whether and how far back it’s included in MEDLINE) is clearly stated.
MeSH = organized, hierarchical system of standardized terms used to index all articles
Does take a little time to learn how it works, but then you can sit back and let it do the heavy lifting
Makes searching easier: you can use the MeSH terms to search just the subject headings and not worry about the different synonyms that authors may have used in their titles or abstracts.
Cf. “textword” or “keyword” searching (searching for a particular string of characters)
Show the Related Articles function in PubMed. Good example of a special feature unique to a specific interface or search engine. PubMed’s Related Articles function is very good, but Ovid MEDLINE’s is weak.
All articles on the same topic are indexed with the same term, even if their authors use different terms or term variations. This consistency makes searching easier: you can use the MeSH terms to search just the subject headings and not worry about the different synonyms that authors may have used in their titles or abstracts.
EXPLAIN STRING OF CHARACTERS
Show MeSH browser with SCOPE NOTE giving definition. Search PLAQUE in GS.
You could show this live, too.
Hierarchical controlled vocabulary
CVA (cerebrovascular accident) => MeSH Stroke
MeSH allows concept-based as well as keyword searching:
Not just same string of letters as what you type in, but related to concept you enter
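The difference between textword and concept searching can be sketched in a few lines of Python. Everything here is a toy: the entry-term table and the three records are invented for illustration, and real MeSH mapping is far richer than a dictionary lookup. Only the CVA => Stroke mapping is taken from the notes.

```python
# Toy entry-term table: variant wordings map to one standardized heading.
ENTRY_TERMS = {
    "cva": "Stroke",
    "cerebrovascular accident": "Stroke",
    "stroke": "Stroke",
}

# Hypothetical records, invented for this example only.
RECORDS = [
    {"title": "Rehabilitation after CVA", "headings": {"Stroke"}},
    {"title": "Outcomes of cerebrovascular accident in older adults",
     "headings": {"Stroke"}},
    {"title": "Stroke prevention in primary care", "headings": {"Stroke"}},
]


def textword_search(records, text):
    """Match the literal string of characters against each title."""
    needle = text.lower()
    return [r["title"] for r in records if needle in r["title"].lower()]


def concept_search(records, text):
    """Translate the typed term to a heading, then match on the heading."""
    heading = ENTRY_TERMS.get(text.lower())
    if heading is None:
        return []
    return [r["title"] for r in records if heading in r["headings"]]


print(len(textword_search(RECORDS, "CVA")))  # only the title containing "CVA"
print(len(concept_search(RECORDS, "CVA")))   # every record indexed under Stroke
```

The textword search finds one title, while the concept search finds all three, because the indexer has already done the work of collecting the synonyms.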
It takes time: indexing may take one week or many months, depending on the journal.
Now, PubMed is one version of MEDLINE that does include brand new articles, transmitted by the publishers at publication time. But the records for those articles don’t get indexed with MeSH terms immediately.
NOW DO FAUCI SEARCH
EXAMPLE: Search fauci -> The human immunodeficiency virus: infectivity and mechanisms of pathogenesis -> group of 5 »
No road map; hard to tell what you’re looking at since you get different kinds of records for the same item (not just multiple full text links for same record!!!)
EXAMPLE: Search human immunodeficiency virus infectivity/fauci/1988
Results: 1 article and 2 article citations
Further link to 6 versions, some of which are full text and some of which are records for the article with links to full text, some of which are just records for the article
Now do searches:
[Build] PubMed: "smiling"[mh] AND "esthetics, dental"[MeSH Terms]
NO LIMITS, though we could restrict to English, for example.
Change smiling to [majr] => more precise search, fewer results.
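For anyone who wants to script the same search, here is a minimal sketch that assembles the field-tagged query and a matching NCBI E-utilities ESearch URL. The helper names are made up for this example, and the code only builds strings; it does not contact PubMed, so no result counts are shown.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tagged(term, tag):
    """Quote a term and attach a PubMed field tag, e.g. "smiling"[mh]."""
    return f'"{term}"[{tag}]'


def build_query(*clauses):
    """Join clauses with AND, as in the search built on the slide."""
    return " AND ".join(clauses)


def esearch_url(query):
    """Return an ESearch URL for the query (string building only)."""
    params = {"db": "pubmed", "term": query, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


# Broad version: both concepts as MeSH headings.
broad = build_query(tagged("smiling", "mh"),
                    tagged("esthetics, dental", "MeSH Terms"))

# Narrower version: smiling must be a major topic of the article.
narrow = build_query(tagged("smiling", "majr"),
                     tagged("esthetics, dental", "MeSH Terms"))

print(broad)
print(narrow)
print(esearch_url(narrow))
```

Swapping the tag from [mh] to [majr] is the only change between the two queries, which mirrors the demo step of tightening the search.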
[Do in Advanced Scholar Search] GS: smile esthetic in title.
Since results are relevance ranked, it’s not surprising that the top articles are not particularly new. So add: Articles excluding patents + since 2000.
But change to <smile aesthetic> and the results are different.
Covers multidisciplinary literature and non-journal sources, though it is not unique in these regards. Scopus and ISI Web of Science are multidisciplinary, too. CINAHL and PsycINFO are examples of databases that include both journal and non-journal literature.
Full-text searching: Lets you search for specific details in the article itself (place, substance, personal names).
Relevancy ranking: Per GS documentation, ranking weighs where the article was published, who wrote it, and how often and how recently it has been cited in other scholarly literature.
Recent article by Dolores Judkins in Medical Library Association News Expert Searching column
Retrieval is generally huge (though more specialized so less tsunami-like than regular Google)
Only first 1000 citations can be viewed. While you may feel this is plenty, you have no way of knowing whether you’re seeing 1000 out of 1500 or 1000 out of 15,000.
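The arithmetic behind that worry is simple enough to sketch. The 1000-citation cap and the two totals come from the note above; the function name is just for illustration.

```python
VIEW_CAP = 1000  # viewing limit described in the notes


def visible_fraction(total_hits, cap=VIEW_CAP):
    """Share of a result set that can actually be viewed under the cap."""
    if total_hits <= 0:
        raise ValueError("total_hits must be positive")
    return min(total_hits, cap) / total_hits


for total in (1500, 15000):
    print(f"{total} hits: {visible_fraction(total):.0%} viewable")
# 1500 hits: 67% viewable
# 15000 hits: 7% viewable
```

Seeing two-thirds of the results and seeing one-fifteenth of them are very different situations, but without a reliable total you cannot tell which one you are in.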
This brings up a larger problem…
The number of records, the date range of coverage, and the scope of the records are unknown. Nobody knows which or how many journals are included. There is no sense of whether all articles from a given journal are included or not. There is no sense of how far back in time the coverage goes for a given journal. If database coverage isn’t well-documented, how do you know what you’re missing?
You can limit results to a subject area such as “Medicine, Pharmacology, and Veterinary Science” or “Biology, Life Sciences, and Environmental Science,” but no information is available on how these are defined.
Comparing Scholar to MEDLINE, what you most notice is the absence of standardized terms and quality control.
No “controlled vocabulary” à la MeSH. You’re on your own in trying to think up all the synonyms, all the different terms and phrases an author might use to express a particular concept. Intangible concepts like “dental esthetics” can be hard to translate into synonyms.
No consistent format for journal titles: you need to search on the full title and maybe several possible abbreviations to find everything.
***DEMO Nature Genetics*** GS: Nature Genetics 10,500; Nat Genet 3,080. PubMed: 5461 for both.
With PubMed, you get the same number no matter which form of the title you use. This is another example of the way terms in PubMed are standardized so you don’t need to think up all the possible variations.
Parser = GS’s program for analyzing syntax and breaking text down into meaningful subunits.
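What a standardized title format buys you can be sketched with a toy authority table. The variant list and the four sample citation strings are invented for illustration; this is not how PubMed or GS is implemented, only a picture of why one standard form yields one count.

```python
import re
from collections import Counter

# Toy authority table: known variants map to one standard title.
VARIANTS = {
    "nature genetics": "Nature Genetics",
    "nat genet": "Nature Genetics",
}


def normalize(title):
    """Lowercase, drop periods, collapse spaces, then look up the standard form."""
    key = re.sub(r"\s+", " ", title.replace(".", "").lower()).strip()
    return VARIANTS.get(key, title.strip())


# Hypothetical journal strings as they might appear in citations.
citations = ["Nature Genetics", "Nat Genet", "Nat. Genet.", "nature  genetics"]

raw_counts = Counter(citations)                           # four different strings
standard_counts = Counter(normalize(c) for c in citations)  # one standard title

print(len(raw_counts))
print(standard_counts["Nature Genetics"])
```

Without the authority table, a searcher has to guess every variant; with it, any form of the title leads to the same set of records.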
New article in Online Information Review by long-time GS critic Péter Jacsó of the University of Hawaii (Jacsó P. Metadata mega mess in Google Scholar. Online Information Review 2010;34(1):175-191.)
His article looks especially at the problem of using GS to evaluate author impact. Recommendation: Don’t use it for that purpose! Examples: article section names and other prominently placed terms parsed as author names. Here we have the record for an item by co-authors Objectives, Policy, and Disclosure. Another is by Registered, Please Login, Options and Access.
So if GS is used to evaluate author impact, these ghost authors would either get or share the credit for the articles involved. Jacsó singles out records from the Lancet journals as having been particularly badly mauled. These examples are ones I found on GS just last week. But some of Jacsó’s examples can’t be replicated now. He claims that Google has a habit of quickly fixing such errors when he and others report on them.
There are records with phantom publication years, too, taken from other numeric data (such as volume number or author street address).
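If you export GS records for any kind of counting, a crude sanity check can catch the most obvious phantoms. This is a hypothetical heuristic, not anything GS itself does: the label list is drawn from the ghost-author examples in these notes, and the plausible year range is an assumption you would adjust.

```python
# Section labels seen posing as author names in the examples above.
SECTION_LABELS = {"objectives", "policy", "disclosure",
                  "registered", "please login"}


def suspicious_authors(authors):
    """Return the author strings that look like page or section labels."""
    return [a for a in authors if a.strip().lower() in SECTION_LABELS]


def plausible_year(year, earliest=1800, latest=2010):
    """True if the year falls inside an assumed plausible publication range."""
    return earliest <= year <= latest


print(suspicious_authors(["Objectives", "Policy", "Disclosure"]))
print(suspicious_authors(["Zahn H"]))
print(plausible_year(2000), plausible_year(34))
```

A check like this only flags known patterns. It cannot confirm that a record is right, which is why the recommendation against using GS for author impact still stands.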
Per Catherine Arnott Smith: [GS] will give you results that are useful, but only in the same sense that putting a shovel into the ground and taking it out again will give you some dirt. More work on the user’s part is required. Scope and coverage are a mystery.
Includes Google Book records plus lots of other kinds of material. No road map; hard to tell what you’re looking at since you get multiple records for the same pub (not just multiple full text links for same record!!!)
Weakly defined: Not just scope but also: source of Related Articles list? (PubMed’s is based on experimental research and documented in the medical informatics literature) Multi-format = not just journal articles
One thing GS is really handy for is finding full text of an article.
But there’s a newer kid on the block: Pubget--full-text’s killer app. Automatically finds PubMed article PDFs, aided by your choice of institutional setting.
Keepers = your personal library. If there’s an available PDF, it appears as an icon with the record. Better than EndNote or RefWorks because you don’t have to find, click through to, or attach the PDF; it just appears for you.
I’ve had better luck using it in some browsers than others (Chrome>IE>FF). I had all of Adobe Acrobat installed, not just Adobe Reader. That was a problem.
Pubget PaperPlane – Drag to browser toolbar for one-click access to PDF from PubMed Abstract display
Search: pump a1c. Refine with more -> adolescent. Filter by Journal sources: MEDLINE/PubMed.
Google Scholar vs. MEDLINE for Health Sciences Literature Searching
Patricia M. Weiss, MLIS
Health Sciences Library System
University of Pittsburgh
March 15, 2010
• Largest database of indexed journal
citations for health sciences literature
–Indexed = Including standard topic
• >16 M citations from 5400 journals,
• Produced by National Library of
Medicine (NLM) (part of NIH)
Author Title Journal Publication
European Review for Medical
& Pharmacological Sciences
DL, Pass F
Delivery of insulin by jet
Diabetes Technology &
Zahn H My journey from wool
research to insulin
Journal of Peptide Science 2000
Insulin, the molecule of the
Archives of Physiology &
MeSH (Medical Subject Headings)
• MEDLINE’s “Controlled vocabulary”
• Articles on the same topic indexed with
the same term, even if authors use
• Basis for excellent PubMed Related articles
Different Terms, Same MeSH
in laparoscopic resection
of gastric neoplasms.
MeSH headings for both titles:
Same Term, Different MeSH
The diagnosis of
Mechanism of senile
plaque formation in
You are searching the bulk of health
Easy to determine if a journal is included
and how far back it goes
Concept as well as textword searching
Standardized terms and formats
(concepts, journal names)
Limited primarily to health sciences
Searchable MEDLINE record does not
include full text.
Search results are ranked by date, not
MeSH has a learning curve, can be hard to
Human indexers can be inconsistent.
No citation data or direct export to
About Google Scholar
• Harvests full text with publisher
permission, then makes it fully searchable
• Journal articles plus books, theses,
abstracts, online repositories, and other
scholarly Web sources
• 2 different types of entries
– Main entries for publication itself
– [CITATION] entries for cited references that
GS cannot find online
Scholar does it, PubMed doesn’t
Searches full text
Search results are ranked by relevance.
Includes citation data and links to citing
Direct export to EndNote and RefWorks
(a manual procedure in PubMed)
• Retrieval is generally huge.
• Relevancy rankings based in part on #
times/cited may bias results toward older
• Can’t select citations to download or print
• Only first 1000 citations can be viewed.
Judkins DZ. So you want to use Google… MLA News 2010 Feb;50(2):14-15.
The Denominator Problem
Number and date range of records are
Coverage (which journals?) unknown
Search results are delivered in round
numbers (“about 127,000”)
Absence of Standardized Terms
No “controlled vocabulary”
(standardized concept terms) à la MeSH
No consistent format for journal titles
Parsing errors create phantom author
names and dates.
For Best Google Scholar Results…
• Use fewer precise search terms to
minimize retrieval; use synonyms to
• Configure Advanced Scholar Search to look
for them in title.
• Configure Scholar Preferences/Library
Links by entering University of Pittsburgh
Both Tools are Useful
GS is poorly defined and lacks consistency.
For serious research, GS is not a
replacement for MEDLINE.
GS does make it easy to find some articles
As a multidisciplinary and multi-format
resource, GS may present items not found