Google Scholar vs. MEDLINE for Health Sciences Literature Searching


  • Today we’re going to let 2 popular applications go mano-a-mano against each other.

  • CAN DO THIS ON WEB starting at HSLS home page.

    In this corner, from tech giant Google, we have the challenger, Google Scholar.
    It has been called “Google for Grownups.”
    It’s different from original Google in that we’re searching just the scholarly literature, not the entire universe of the Web.
    But it has the same undemanding, simple interface we all love in original Google.
  • In the other corner, from the National Library of Medicine, we have MEDLINE, the long-time defending champ.
    Available with different search engines developed by different organizations

    Today I’m mostly going to use PubMed as my example of MEDLINE.

  • But HSLS also subscribes to Ovid MEDLINE--another search engine for the same core set of records.
  • To briefly review MEDLINE’s origins…

    Produced by National Library of Medicine (NLM), one of the National Institutes of Health
  • Many people think that PubMed and Ovid MEDLINE consist of millions of full-text articles. But that’s not true.

    Think about it: MEDLINE was created in 1966—long before the Internet, let alone full text.
    Links to full text have been added to PubMed and Ovid MEDLINE, but the articles themselves aren’t part of MEDLINE

    MEDLINE is a structured database (a set of records) about articles.
    Database = records x fields

    In an increasing number of cases, though, bringing up the record then lets you link to full text if your library has a subscription to the journal in question.
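
A minimal sketch of the "records x fields" idea, using the four sample citations from the deck's MEDLINE Database slide. The field names and the `search_field` helper are illustrative assumptions, not MEDLINE's actual field tags or search engine:

```python
# Four sample records from the slide; each record is a set of named fields.
# Field names are illustrative assumptions, not MEDLINE's real field tags.
records = [
    {"author": "Gualandi-Signorini AM",
     "title": "Insulin formulations--a review",
     "journal": "European Review for Medical & Pharmacological Sciences",
     "year": 2001},
    {"author": "Bremseth DL, Pass F",
     "title": "Delivery of insulin by jet injection: recent observations",
     "journal": "Diabetes Technology & Therapeutics",
     "year": 2001},
    {"author": "Zahn H",
     "title": "My journey from wool research to insulin",
     "journal": "Journal of Peptide Science",
     "year": 2000},
    {"author": "Ionescu-Tirgoviste C",
     "title": "Insulin, the molecule of the century",
     "journal": "Archives of Physiology & Biochemistry",
     "year": 1996},
]


def search_field(records, field, text):
    """Return the records whose given field contains text (case-insensitive)."""
    needle = text.lower()
    return [r for r in records if needle in str(r[field]).lower()]


print(len(search_field(records, "title", "insulin")))             # 4
print(search_field(records, "journal", "diabetes")[0]["author"])  # Bremseth DL, Pass F
```

Note that no field holds the article text itself: the search runs over information about the articles, which is the point the notes make.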
  • Fields in a record (this is from the Ovid MEDLINE interface):
    Note that one of the fields is MeSH Subject Headings—more about that in a minute.
  • Here is the same MEDLINE record as it appears in Ovid MEDLINE and in PubMed.
    Again, the database record (the content) is the same.
    Difference is in the interface (appearance, search engine, features and tools).
  • It’s easy to see which journals are included in MEDLINE.

    Accessing the Journals database from PubMed

    There is a record for each journal
    Includes standard title abbreviation format
    Information about coverage (whether and how far back it’s included in MEDLINE) is clearly stated.

  • MeSH = Organized hierarchical system of standardized terms used to index all articles

    Does take a little time to learn how it works, but then you can sit back and let it do the heavy lifting

    Makes searching easier: you can use the MeSH terms to search just the subject headings and not worry about the different synonyms that authors may have used in their titles or abstracts.

    Cf. “textword” or “keyword” searching (searching for a particular string of characters)

    Show Related Articles function in PubMed.
    Good example of special features unique to a specific interface or search engine.
    PubMed’s Related Articles function is very good, but Ovid MEDLINE’s is weak.
  • All articles on the same topic are indexed with the same term, even if their authors use different terms or term variations.
    This consistency makes searching easier: you can use the MeSH terms to search just the subject headings and not worry about the different synonyms that authors may have used in their titles or abstracts.


  • Show MeSH browser with SCOPE NOTE giving definition.
    Search PLAQUE in GS.
  • You could show this live, too.

    Hierarchical controlled vocabulary

    CVA (cerebrovascular accident) => MeSH Stroke

  • MeSH allows concept-based as well as keyword searching:

    Not just same string of letters as what you type in, but related to concept you enter
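
The difference between textword and concept searching can be sketched with the four titles and MeSH headings shown on the deck's "Different Terms, Same MeSH" and "Same Term, Different MeSH" slides. The two helper functions are hypothetical illustrations, not how PubMed or Ovid implement searching:

```python
# Titles and MeSH headings are taken from the slides; the data structure
# and both helpers are illustrative assumptions.
articles = [
    {"title": "Treatment of gastric cancer.",
     "mesh": ["Stomach Neoplasms"]},
    {"title": "Technical considerations in laparoscopic resection of gastric neoplasms.",
     "mesh": ["Stomach Neoplasms"]},
    {"title": "The diagnosis of plaque-induced periodontal diseases.",
     "mesh": ["Dental Plaque"]},
    {"title": "Mechanism of senile plaque formation in Alzheimer disease.",
     "mesh": ["Senile Plaques"]},
]


def keyword_search(articles, text):
    """Textword search: match a literal string of characters in the title."""
    needle = text.lower()
    return [a for a in articles if needle in a["title"].lower()]


def mesh_search(articles, heading):
    """Concept search: match the standardized subject heading."""
    return [a for a in articles if heading in a["mesh"]]


print(len(keyword_search(articles, "gastric cancer")))  # 1 (misses "gastric neoplasms")
print(len(mesh_search(articles, "Stomach Neoplasms")))  # 2 (both articles on the topic)
print(len(keyword_search(articles, "plaque")))          # 2 (dental and Alzheimer mixed)
print(len(mesh_search(articles, "Dental Plaque")))      # 1 (only the dental article)
```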
  • Indexing takes time: it may be one week or many months, depending on the journal

    Now, PubMed is one version of MEDLINE that does include brand new articles, transmitted by the publishers at publication time. But the records for those articles don’t get indexed with MeSH terms immediately.

    EXAMPLE: Search fauci -> The human immunodeficiency virus: infectivity and mechanisms of pathogenesis -> group of 5 »
  • No road map; hard to tell what you’re looking at since you get different kinds of records for the same item (not just multiple full text links for same record!!!)

    EXAMPLE: Search human immunodeficiency virus infectivity/fauci/1988

    Results: 1 article and 2 article citations

    Further link to 6 versions: some are full text, some are records for the article with links to full text, and some are just records for the article.
  • Now do searches:

    [Build] PubMed: "smiling"[mh] AND "esthetics, dental"[MeSH Terms]
    NO LIMITS, though we could restrict to English, for example
    Change smiling to [majr] => more precise search, fewer results

    [Do in Advanced Scholar Search] GS: smile esthetic in title
    Since relevance ranked, not surprising that top articles are not particularly new
    So add: Articles excluding patents + since 2000
    But change to <smile aesthetic> and results are different.
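
The PubMed search above can also be assembled programmatically. The sketch below builds the query string with the `[mh]` and `[majr]` field tags from the notes and wraps it in a URL for NCBI's E-utilities ESearch service. E-utilities is not mentioned in the presentation, so treat its use here as an assumption, and the helper names as hypothetical; nothing is sent over the network:

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (an addition for illustration).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def mesh_query(*headings, major=()):
    """AND together MeSH headings; headings listed in `major` get [majr]."""
    parts = []
    for heading in headings:
        tag = "majr" if heading in major else "mh"
        parts.append(f'"{heading}"[{tag}]')
    return " AND ".join(parts)


def esearch_url(term, retmax=20):
    """Build (but do not send) an ESearch request URL for PubMed."""
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": term, "retmax": retmax})


broad = mesh_query("smiling", "esthetics, dental")
narrow = mesh_query("smiling", "esthetics, dental", major={"smiling"})

print(broad)   # "smiling"[mh] AND "esthetics, dental"[mh]
print(narrow)  # "smiling"[majr] AND "esthetics, dental"[mh]
print(esearch_url(narrow))
```

Restricting a heading to `[majr]` asks for articles where that concept is a major topic, which is why the notes expect a more precise search with fewer results.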
  • Covers multidisciplinary literature and non-journal sources, though not unique in these regards.
    Scopus and ISI Web of Science are multidisciplinary, too.
    CINAHL and PsycINFO are examples of databases that include both journal and non-journal literature
    Full-text searching
    Lets you search for specific details in the article itself (place, substance, personal names)
    Relevancy ranking
    Per GS documentation: where it was published, who it was written by, as well as how often and how recently it has been cited in other scholarly literature.
  • Recent article by Dolores Judkins in Medical Library Association News Expert Searching column

    Retrieval is generally huge (though more specialized so less tsunami-like than regular Google)

    Only first 1000 citations can be viewed. While you may feel this is plenty, you have no way of knowing whether you’re seeing 1000 out of 1500 or 1000 out of 15,000.

    This brings up a larger problem…
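
The arithmetic behind the "1000 out of 1500 or 1000 out of 15,000" remark, as a small sketch. The 1000-citation cap comes from the notes; the function name is hypothetical:

```python
VIEW_CAP = 1000  # per the notes: only the first 1000 citations can be viewed


def viewable_share(total_hits, cap=VIEW_CAP):
    """Fraction of a result set that can actually be viewed under the cap."""
    if total_hits <= 0:
        raise ValueError("total_hits must be positive")
    return min(total_hits, cap) / total_hits


for total in (1500, 15000):
    print(f"{total} hits: {viewable_share(total):.0%} viewable")
# 1500 hits: 67% viewable
# 15000 hits: 7% viewable
```

The calculation only works when the true total is known, and Scholar reports totals in round "about" figures.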

  • Number of records, date range of coverage, and scope of records are unknown
    Nobody knows which or how many journals are included.
    There is no sense of whether all articles from a given journal are included or not.
    There is no sense of how far back in time the coverage goes for a given journal.
    If database coverage isn’t well-documented, how do you know what you’re missing?

    You can limit results to a subject area such as “Medicine, Pharmacology, and Veterinary Science” or “Biology, Life Sciences, and Environmental Science,” but no information is available on how these are defined.

  • Comparing Scholar to MEDLINE, what you most notice is the absence of standardized terms and quality control.

    No “controlled vocabulary” a la MeSH
    You’re on your own with trying to think up all the synonyms, all the different terms and phrases an author might use to evoke a particular context.
    Intangible concepts like “dental esthetics” can be hard to translate into synonyms.

    No consistent format for journal titles
    - Need to search on full title and maybe several possible abbreviations to find everything
    DEMO (Nature Genetics): “Nature Genetics” returns 10,500 results; “Nat Genet” returns 3,080.
    PubMed: 5461 for both
    With PubMed, same number no matter which form of the title you use.
    This is another example of the way terms in PubMed are standardized so you don’t need to think up all the possible variations.
    Parser = GS’ program for analyzing syntax and breaking text down into meaningful subunits.
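
Why a standardized title format gives the same count for every form of a journal name can be sketched as follows. The variant table and the toy citation list are invented for illustration (only "Nature Genetics" and "Nat Genet" come from the demo), and this is not PubMed's actual mechanism:

```python
# Hypothetical lookup from title variants to one standard abbreviation.
JOURNAL_VARIANTS = {
    "nature genetics": "Nat Genet",
    "nat genet": "Nat Genet",
}


def canonical_journal(name):
    """Normalize case, periods, and spacing, then look up the standard form."""
    key = " ".join(name.lower().replace(".", "").split())
    return JOURNAL_VARIANTS.get(key, name)


# Toy data: the same journal written four ways, plus one unrelated journal.
citations = ["Nature Genetics", "Nat Genet", "Nat. Genet.", "NATURE GENETICS",
             "Diabetes Technology & Therapeutics"]


def count_exact(citations, title):
    """String matching: each spelling is counted separately."""
    return sum(1 for c in citations if c == title)


def count_canonical(citations, title):
    """Standardized matching: every spelling maps to one journal."""
    target = canonical_journal(title)
    return sum(1 for c in citations if canonical_journal(c) == target)


print(count_exact(citations, "Nature Genetics"), count_exact(citations, "Nat Genet"))          # 1 1
print(count_canonical(citations, "Nature Genetics"), count_canonical(citations, "Nat Genet"))  # 4 4
```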
  • New article in Online Information Review by long-time GS critic Péter Jacsó of the University of Hawaii
    (Jacsó P. Metadata mega mess in Google Scholar. Online Information Review 2010;34(1):175-191.)

    His article looks especially at the problem of using GS to evaluate author impact.
    Recommendation: Don’t use it for that purpose!
    Examples of article section names and other prominently placed terms parsed as author names:
    Here we have the record for an item by co-authors Objectives, Policy, and Disclosure
    Another by Registered, Please Login, Options and Access

    So if GS is used to evaluate author impact, these ghost authors would either get or share the credit for the articles involved.
    Jacsó singles out records from the Lancet journals as having been particularly badly mauled.
    These examples are ones I found on GS just last week.
    But some of Jacsó’s examples can’t be replicated now. He claims that Google has a habit of quickly fixing such errors when he and others report on them.

    There are records with phantom publication years, too, taken from other numeric data (such as volume number or author street address).
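
Anyone pulling author lists out of Scholar results might screen them for phantom names like the ones above before counting anything. The sketch below is a hypothetical stoplist check built only from the examples in these notes; it is not a method from the cited article, and a real screen would need a far longer list:

```python
# Page-furniture terms seen parsed as author names in the examples above.
SUSPECT_TERMS = {
    "objectives", "policy", "disclosure",
    "registered", "please login", "options", "access",
}


def phantom_authors(authors):
    """Return the author strings that match a known page-furniture term."""
    return [a for a in authors if a.strip().lower() in SUSPECT_TERMS]


record_a = ["Objectives", "Policy", "Disclosure"]
record_b = ["Registered", "Please Login", "Options", "Access"]
record_c = ["Bremseth DL", "Pass F"]  # genuine authors from the MEDLINE sample slide

print(phantom_authors(record_a))  # ['Objectives', 'Policy', 'Disclosure']
print(phantom_authors(record_c))  # []
```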
  • Per Catherine Arnott Smith: [GS] will give you results that are useful, but only in the same sense that putting a shovel into the ground and taking it out again will give you some dirt. More work on the user’s part is required. Scope and coverage are a mystery.

    Includes Google Book records plus lots of other kinds of material.
    No road map; hard to tell what you’re looking at since you get multiple records for the same pub (not just multiple full text links for same record!!!)

  • Weakly defined: not just the scope, but also the source of the Related Articles list. (PubMed’s is based on experimental research and documented in the medical informatics literature.)
    Multi-format = not just journal articles
  • One thing GS is really handy for is finding full text of an article.

    But there’s a newer kid on the block: Pubget--full-text’s killer app.
    Automatically finds PubMed article PDFs, aided by your choice of institutional setting.

    Keepers = your personal library. If there’s an available PDF, it appears as an icon with the record.
    Better than EndNote or RefWorks because you don’t have to find, click through to, or attach the PDF; it just appears for you.

    I’ve had better luck using it in some browsers than others (Chrome > IE > Firefox). I had the full Adobe Acrobat installed, not just Adobe Reader, and that was a problem.

    Pubget PaperPlane – Drag to browser toolbar for one-click access to PDF from PubMed Abstract display
  • Search pump a1c
    Refine with more->adolescent
    Filter by Journal sources – MEDLINE/PubMed
  • Google Scholar vs. MEDLINE for Health Sciences Literature Searching

    1. Google Scholar vs. MEDLINE for Health Sciences Literature Searching. Patricia M. Weiss, MLIS, Health Sciences Library System, University of Pittsburgh. March 15, 2010
    2. PubMed
    3. Ovid MEDLINE
    4. MEDLINE Refresher • Largest database of indexed journal citations for health sciences literature – Indexed = Including standard topic descriptors • >16 M citations from 5400 journals, 1949 - • Produced by National Library of Medicine (NLM) (part of NIH)
    5. MEDLINE Database: RECORDS (rows) x FIELDS (information categories)
       Author | Title | Journal | Publication
       Gualandi-Signorini AM | Insulin formulations--a review | European Review for Medical & Pharmacological Sciences | 2001
       Bremseth DL, Pass F | Delivery of insulin by jet injection: recent observations | Diabetes Technology & Therapeutics | 2001
       Zahn H | My journey from wool research to insulin | Journal of Peptide Science | 2000
       Ionescu-Tirgoviste C | Insulin, the molecule of the century | Archives of Physiology & Biochemistry | 1996
    6. Fields in a MEDLINE Record
    7. NLM’s PubMed Ovid MEDLINE
    8. Journals Database in PubMed
    9. MeSH (Medical Subject Headings) • MEDLINE’s “Controlled vocabulary” • Articles on the same topic indexed with the same term, even if authors use different terms. • Basis for excellent PubMed Related articles function
    10. Different Terms, Same MeSH • Title #1: Treatment of gastric cancer. • Title #2: Technical considerations in laparoscopic resection of gastric neoplasms. • MeSH headings for both titles: Stomach Neoplasms
    11. Same Term, Different MeSH • Title #1: The diagnosis of plaque-induced periodontal diseases. MeSH heading: Dental Plaque • Title #2: Mechanism of senile plaque formation in Alzheimer disease. MeSH heading: Senile Plaques
    12. MeSH Tree Structures – 2010
    13. MeSH Database in PubMed
    14. MEDLINE Strengths • You are searching the bulk of health sciences literature. • Easy to determine if a journal is included and how far back it goes • Concept as well as textword searching • Standardized terms and formats (concepts, journal names)
    15. MEDLINE Limitations • Limited primarily to health sciences journals • Searchable MEDLINE record does not include full text. • Search results are ranked by date, not relevance. • MeSH has a learning curve, can be hard to use. • Human indexers can be inconsistent. • No citation data or direct export to EndNote/RefWorks
    16. About Google Scholar • Harvests full text with publisher permission, then makes it fully searchable • Journal articles plus books, theses, abstracts, online repositories, and other scholarly Web sources • 2 different types of entries – Main entries for publication itself – [CITATION] entries for cited references that GS cannot find online
    17. Scholar Strengths (Scholar does it, PubMed doesn’t) • Multidisciplinary • Searches full text • Search results are ranked by relevance. • Includes citation data and links to citing references • Direct export to EndNote and RefWorks (a manual procedure in PubMed)
    18. Scholar Limitations • Retrieval is generally huge. • Relevancy rankings based in part on # times/cited may bias results toward older literature. • Can’t select citations to download or print • Only first 1000 citations can be viewed. Judkins DZ. So you want to use Google… MLA News 2010 Feb;50(2):14-15.
    19. Scholar Limitations: The Denominator Problem • Number and date range of records are unknown. • Coverage (which journals?) unknown • Search results are delivered in round numbers (“about 127,000”)
    20. Scholar Limitations: Absence of Standardized Terms and Formats • No “controlled vocabulary” (standardized concept terms) à la MeSH • No consistent format for journal titles • Parsing errors create phantom author names and dates.
    21. Phantom Authors in Scholar
    22. For Best Google Scholar Results… • Use fewer precise search terms to minimize retrieval; use synonyms to increase retrieval. • Configure Advanced Scholar Search to look for them in title. • Configure Scholar Preferences/Library Links by entering University of Pittsburgh
    23. Bottom Line: Both Tools are Useful • GS is poorly defined and lacks consistency. • For serious research, GS is not a replacement for MEDLINE. • GS does make it easy to find some articles quickly. • As a multidisciplinary and multi-format resource, GS may present items not found in MEDLINE.
    24. Pubget + PaperPlane!
    25. • Science-specific search engine • Finds reports, peer-reviewed article PDFs, patents, preprints, repositories, abstracts, more • Sources are listed by publisher and others • Advanced Search, search refine and filter options • Preferences: specify U of Pittsburgh - HSLS