E or P? A Comparative Analysis of
Electronic and Print Book Usage
        Electronic Resources & Libraries
                     Austin
                March 19, 2013
              Michael Levine-Clark
              Christopher C. Brown
Methodology
Duke University Press eBooks
•   Added October 2008
•   Loaded MARC records December 2008
•   Purchase all e/p
•   1,480 e-books
    – Frontlist approximately 120 per year
    – Backlist
• 2,416 p-books
    – Many predate the e-book collection
• 1,150 in both formats
The Data
• Gathered circ data
  – Through December 2008
  – Each subsequent December (2009-2012)
  – Cumulative
• Compiled e-book use data
  – At end of each year, 2009-2012
  – For each year
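The circulation counts are cumulative while the e-book counts are per year, so the two need to be put on the same footing before they can be compared. A minimal Python sketch, assuming snapshots keyed by year, subtracts consecutive snapshots to recover checkouts per period. The function name and the sample counts are hypothetical and not taken from the study data.

```python
def period_checkouts(snapshots):
    """Convert cumulative checkout counts (keyed by year) into per-period counts."""
    years = sorted(snapshots)
    return {
        later: snapshots[later] - snapshots[earlier]
        for earlier, later in zip(years, years[1:])
    }


# Hypothetical cumulative December snapshots for one print title.
cumulative = {2008: 10, 2009: 12, 2010: 12, 2011: 15, 2012: 16}

per_year = period_checkouts(cumulative)
since_2008 = cumulative[2012] - cumulative[2008]

print(per_year)
print("Checkouts since Dec 2008:", since_2008)
```

The first snapshot (2008 here) acts as the baseline, so it has no per-period value of its own.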
Apples and Oranges
• pBook checkouts
  – Undergrad: 3 weeks
  – Grad: 10 weeks
  – Faculty: 1 year
  – Potentially many uses per checkout, plus some
    use while deciding whether to check out
• eBook use
  – Each visit to the book counts as one use
About Discovery and Data
• Discovery through catalog records
• Data merger issues:
  – Title variations
  – ISBN complexities
  – Multi-volume issues
E and P Typically Pattern Together in Results
[Screenshots: search results in the Classic Catalog and in the Encore (next-gen) Catalog]
Data Difficulties: Title Variations
                         Catalog Record             Vendor Record
Series used with title   The Sri Lanka reader :     World Readers : Sri
                         history, culture,          Lanka Reader : History,
                         politics / John Clifford   Culture, Politics
                         Holt, ed.
Series used with title   Julia Child's The French   Spin Offs : Julia Child's
                         chef / Dana Polan.         The French Chef
Word renderings          Present tense : rock &     Present Tense : Rock
                         roll and culture           and Roll and Culture
Spaces                   Percussion : drumming,     Percussion: Drumming,
                         beating, striking          Beating, Striking




Vendors and catalogers don’t necessarily agree on how a title is formed.
This makes matching on title nearly impossible.
More Title Variations
                      Catalog Record                 Vendor Record
Gremlin characters    Affective communities :        Affective communities:
(diacritics)          anticolonial thought,          anticolonial thought,
                      fin-de-si├ ¿cle radicalism,    Fin-De-SiFcle radicalism,
                      and the politics of            and the politics of
                      friendship                     friendship
Presence/Absence of   The life and traditions of     Life and Traditions of the Red
Subtitles             the Red man                    Man : A rediscovered
                                                     treasure of Native American
                                                     Literature
Title Discrepancies   A coincidence of desires :     Coincidence of Desire :
                      anthropology, queer studies,   Anthropology, Queer
                      Indonesia                      Studies, Indonesia
Translated Titles     Desencuentros de la            Divergent Modernities :
                      modernidad en América          Culture and Politics in
                      Latina. English, Divergent     Nineteenth-Century Latin
                      modernities : culture and      America
                      politics in nineteenth-century
                      Latin America
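A minimal Python sketch of why title matching breaks down, using titles from the tables above. The normalization rules in `normalize_title` are illustrative assumptions, not the presenters' method: they reconcile case, spacing, and "&" versus "and", but cannot reconcile a series name that the vendor prepends to the title.

```python
import re

LEADING_ARTICLES = ("the ", "a ", "an ")


def normalize_title(title):
    """Reduce a title to a lowercase, punctuation-free string for comparison."""
    # Drop a trailing statement of responsibility such as " / Dana Polan."
    title = title.split(" / ")[0]
    title = title.lower().replace("&", " and ")
    # Replace punctuation with spaces, then collapse runs of whitespace.
    title = re.sub(r"[^\w\s]", " ", title)
    title = re.sub(r"\s+", " ", title).strip()
    # Remove one leading article, which vendors often omit.
    for article in LEADING_ARTICLES:
        if title.startswith(article):
            title = title[len(article):]
            break
    return title


pairs = [
    ("Present tense : rock & roll and culture",
     "Present Tense : Rock and Roll and Culture"),
    ("Percussion : drumming, beating, striking",
     "Percussion: Drumming, Beating, Striking"),
    ("The Sri Lanka reader : history, culture, politics / John Clifford Holt, ed.",
     "World Readers : Sri Lanka Reader : History, Culture, Politics"),
]

for catalog, vendor in pairs:
    matched = normalize_title(catalog) == normalize_title(vendor)
    print(matched, "|", catalog)
```

The first two pairs match after normalization and the third does not, because no rule here can tell that "World Readers" is a series name rather than part of the title.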
Data Difficulties: Multiple ISBNs
Data Difficulties: ISBN 10? ISBN 13?
Data Difficulties: ISBN Irregularities




   Note the ISBN 10 alongside the ISBN 13, the parentheses, and the
   multiple ISBNs.
Data Solution: Create an ISBN 9

ISBN 9 drops the 978 prefix of an ISBN 13 and the final
check digit of either form, leaving a usable match point
in cases where the electronic and print versions agree
on base ISBN.
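A minimal Python sketch of the ISBN 9 derivation described above. The function name `isbn9`, the handling of parenthetical qualifiers, and the sample ISBNs are assumptions for illustration: the sample numbers are made up and their check digits are not validated. ISBN 13s with a 979 prefix have no ISBN 10 equivalent, so this sketch returns `None` for them.

```python
import re
from typing import Optional


def isbn9(raw: str) -> Optional[str]:
    """Return the 9-digit core shared by an ISBN 10 and its 978-prefixed ISBN 13."""
    # Ignore anything after an opening parenthesis, e.g. a "(pbk.)" qualifier.
    before_qualifier = raw.split("(")[0]
    # Keep digits and the X check character; drop hyphens and spaces.
    cleaned = re.sub(r"[^0-9Xx]", "", before_qualifier)
    if len(cleaned) == 10:
        core = cleaned[:9]       # drop the final check digit
    elif len(cleaned) == 13 and cleaned.startswith("978"):
        core = cleaned[3:12]     # drop the 978 prefix and the check digit
    else:
        return None
    return core if core.isdigit() else None


# Made-up examples: the same base ISBN written in both forms.
print(isbn9("0-8223-1234-5"))
print(isbn9("978-0-8223-1234-9 (pbk.)"))
print(isbn9("not an ISBN"))
```

A record that carries several ISBNs would yield several candidate keys, so a real overlay still has to decide which one to match on.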
Data Difficulties: Too Many Sources
• Usage reports include only titles that are
  actually used
  – Needed to pull in unused titles from elsewhere
     • Different formats
Data Methodology using Microsoft Access
•    Get annual use stats of e-books from vendor
•    Get master list of e-titles from vendor
•    Derive ISBN 9 for each list for proper overlay
•    Overlay annual use stats onto master list of e-books
•    Get circ stats for print books from ILS
•    Derive master list of all print titles from ILS
•    Derive ISBN 9 for each p title
•    Overlay annual circ stats onto master list of p-books
•    Merge circ and use data together
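The steps above were carried out in Microsoft Access. As a rough equivalent, here is a minimal Python sketch of the overlay-and-merge logic using plain dictionaries keyed by a derived ISBN 9. The function names, keys, and counts are all hypothetical.

```python
def overlay(master_keys, counts):
    """Attach counts to every title on the master list; unused titles get 0."""
    return {key: counts.get(key, 0) for key in master_keys}


def merge_formats(e_use, p_circ):
    """Combine e-book and print figures on ISBN 9; None marks a format not held."""
    merged = {}
    for key in sorted(set(e_use) | set(p_circ)):
        merged[key] = {
            "e_sessions": e_use.get(key),
            "p_checkouts": p_circ.get(key),
        }
    return merged


# Hypothetical data (all values are made up).
e_master = ["082230001", "082230002", "082230003"]
e_usage_report = {"082230001": 12}      # vendor report lists used titles only
p_master = ["082230002", "082230003", "082230004"]
p_circ_report = {"082230002": 4, "082230004": 1}

e_use = overlay(e_master, e_usage_report)
p_circ = overlay(p_master, p_circ_report)
combined = merge_formats(e_use, p_circ)

for key, row in combined.items():
    print(key, row)
```

Keeping 0 (held but unused) distinct from `None` (format not held) matters here, because the unused titles are exactly what the vendor usage reports leave out.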
Summary of Data Issues
• Title Differences: “and” vs “&”, etc.
  – Cataloger’s idea of title
  – Vendor’s idea of title
• ISBN Differences
  – Catalog records contain multiple ISBNs, 10 or 13 digit
  – Vendor records may contain ISBN 10 or ISBN 13
• Print Books
  – Circulation stats purged after a time
  – What does a “circulation” mean?
• E-Books
  – What does an e-usage mean?
• Can we compare a print circ with an e-use?
Data Conclusions
• Microsoft Access for overlays; Microsoft Excel
  for analysis
• Overlay on title is nearly impossible
• Better standards are needed – a single ISBN,
  please!
• Deriving an “ISBN9” was the only way to get
  anywhere, but even this was far from perfect
Usage
eBooks
• User Sessions
  – 588 titles used (39.7%)
  – 5,149 sessions
     • 8.8 per title used
     • 3.5 per title in the
       collection
  – 892 titles not used
• Pages Viewed
  – Total pages: 35,236
  – Average (for books
    used): 59.9
  – Highest: 2,861
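The rates on this slide follow from the raw counts. A short Python check using only figures quoted in the deck (1,480 e-books, 588 used, 5,149 sessions, 35,236 pages viewed); the variable names are illustrative:

```python
titles_in_collection = 1480
titles_used = 588
sessions = 5149
pages_viewed = 35236

share_used = titles_used / titles_in_collection * 100
sessions_per_used_title = sessions / titles_used
sessions_per_title = sessions / titles_in_collection
pages_per_used_title = pages_viewed / titles_used
titles_not_used = titles_in_collection - titles_used

print(f"Titles used: {share_used:.1f}%")                           # 39.7%
print(f"Sessions per title used: {sessions_per_used_title:.1f}")   # 8.8
print(f"Sessions per title held: {sessions_per_title:.1f}")        # 3.5
print(f"Pages viewed per title used: {pages_per_used_title:.1f}")  # 59.9
print(f"Titles not used: {titles_not_used}")                       # 892
```

The denominator changes the picture considerably: sessions per title drop from 8.8 to 3.5 once unused titles are counted.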
eBooks
• Pages Printed
  –   68 titles
  –   Total pages: 3,244
  –   Average: 47.7 pages
  –   Highest: 380
• Pages Copied
  –   54 titles
  –   Total pages: 640
  –   Average: 11.9 pages
  –   Highest: 64
pBooks
• 1,528 titles used (63.2%)
• 903 titles used since Dec
  2008 (37.4%)
• 4,611 checkouts (2,930
  before Dec 2008)
   – 3.0 per title used
   – 1.9 per title in the collection
   – 1.1 per title used (post-2008 checkouts)
   – 0.7 per title in the collection (post-2008 checkouts)
Most Used eBooks, User Sessions
• Women and Gender Equity in Development
  Theory and Practice (2006)
  – 1,821 user sessions (1,706 in 2012)
  – 2,861 pages viewed (2,765 in 2012)
  – 380 pages printed (all in 2012)
  – 8 checkouts (6 since 2008)
• Date Which Will Live (2003)
  – 399 user sessions (all in 2011-2012)
  – 494 pages viewed
  – 93 pages printed
  – 3 checkouts (1 each in 2009, 2011, 2012)
Most Used pBooks
• The Argumentative Turn in Policy Analysis and
  Planning (1993)
  – 37 checkouts (36 before Dec 2008)
  – 2 user sessions, 0 pages printed
• Displacing Whiteness: Essays In Social and
  Cultural Criticism (1997)
  – 24 checkouts (22 before Dec 2008)
  – 3 user sessions, 0 pages printed
Most Used pBooks Since 2008
• Kurosawa: Film Studies and Japanese Cinema
  (2000)
  – 19 checkouts (12 since 2008)
  – No eBook
• The Cinema of Naruse Mikio (2008)
  – 11 checkouts (all since 2008)
  – 6 user sessions, 45 pages viewed
• Vibrant Matter: A Political Ecology of Things
  (2010)
  – 11 checkouts (all since 2010)
  – 7 user sessions, 205 pages viewed
Dual Format Availability: A Preference for Print
• 1,150 titles available in both formats
• Print Use
  – 619 titles checked out since Dec 2008 (53.8%)
  – 825 titles checked out (including before Dec 2008)
    (71.7%)
• Electronic Use
  – 451 titles with user sessions (39.2%)
Dual Format Use
• 394 titles used in both formats
  – 4,221 user sessions (2,400 excluding the title with
    1,821 sessions)
     • 10.7 per title used (6.1 excluding that title)
  – 1,524 p-book checkouts (801 before Dec 2008)
     • 3.9 per title used (1.8 for uses since 2008)
  – 54 titles with pages printed (out of 68)
     • 7.4 pages per title used
  – 68.4 pages viewed on average
Dual Format Use post-2008
• 332 titles used in both formats
  – 3,981 user sessions (2,160 excluding the title with
    1,821 sessions)
     • 12.3 per title used (6.7 excluding that title)
  – 712 p-book checkouts
     • 2.2 per title used
  – 48 titles with pages printed (out of 68)
     • 8.3 pages per title used
  – 72.0 pages viewed on average
P Used, E Not

• 431 titles
   – 1,004 checkouts
      • 2.3 per title used
   – 297 titles with
     checkouts since 2008
      • 479 checkouts
          – 1.6 per title used
E Used, P Not
• 57 titles
• 246 user sessions
   – 4.3 per title
• 906 pages viewed
   – 15.9 per title
• 3 titles with pages
  printed
[Charts: eBook Use (two slides); Print Use]
How Closely Are P/E Usage Linked?
Increased Checkouts, 2008-2012
• For titles available at the start of the project
  (Dec 2008), how many more checkouts were
  there by Dec 2012?
• Was that increase linked in any way to e-
  usage?
• Was it linked in any way to type of e-usage?
Increased Checkouts 2008-2012
• 686 titles with increased checkouts
  – Measuring titles available prior to Dec 2008
• 408 available in both formats
• 235 also had e-use
  – 15.5 user sessions per title
  – 81.2 pages viewed per title
Observations
• Use of E may lead to use of P
• Use of P doesn’t seem to lead to use of E
• If both formats are used,
  – They are both used at a higher rate than average
  – They see more meaningful use as e-books
     • Pages viewed
     • User sessions
Thoughts
• If dual format usage is higher by all measures,
  does this mean that people’s preference is for
  good content, not format?
                       BUT
• When both formats are available, print is
  more likely to be used (53.8% vs 39.2%).
  – Does e-discovery drive p-use?
Does Subject Impact Use?
LC Class Example: B – Philosophy, Psychology, Religion
• 148 titles in print (6.1% of all Duke print titles)
   – 64 titles checked out since 2008 (7.1%)
• 102 e-books (6.9%)
   – 49 e-books used (8.3%)
• 79 titles available in both formats (6.9%)
   – 48 titles checked out (7.8%)
   – 40 e-books used (8.9%)
• 1.9 checkouts per title used (all print) (+0.8)
• 2.3 checkouts per title (both formats used) (+0.5)
• 5.1 user sessions per title (both formats used) (-1.0)
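The percentages for class B can be reproduced from counts quoted elsewhere in the deck (2,416 print titles with 903 used since 2008; 1,480 e-books with 588 used). A minimal Python sketch follows. The function names are illustrative, and reading the difference as "share of used titles minus share of collection" is an assumption, though it reproduces the +1.0% (print) and +1.4% (e-book) reported for class B in the best and worst lists.

```python
def share(part, whole):
    """Percentage that part represents of whole."""
    return part / whole * 100


def share_difference(class_held, all_held, class_used, all_used):
    """Share of used titles minus share of the collection, in percentage points."""
    return share(class_used, all_used) - share(class_held, all_held)


# Class B print: 148 of 2,416 titles held; 64 of 903 titles used since 2008.
print_diff = share_difference(148, 2416, 64, 903)
# Class B e-books: 102 of 1,480 titles held; 49 of 588 titles used.
ebook_diff = share_difference(102, 1480, 49, 588)

print(f"Print:  {share(148, 2416):.1f}% held, {share(64, 903):.1f}% used, {print_diff:+.1f}")
print(f"E-book: {share(102, 1480):.1f}% held, {share(49, 588):.1f}% used, {ebook_diff:+.1f}")
```

A positive value means the class accounts for a larger share of use than of the collection.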
LC Class – Best & Worst in Print
(Difference Between % of Collection and % of Checkouts – post 2008)
Worst:
• P – Lang & Lit (n=579): -4.6%
• H – Soc Sci (515): -1.1%
• J – Poli Sci (178): -0.8%
• Q – Science (38): -0.7%
• T – Technology (44): -0.5%
Best:
• M – Music (63): +2.4%
• F – Hist of Americas (183): +1.5%
• G – Geog, Anth, Rec (82): +1.3%
• E – Hist of Americas (140): +1.2%
• B – Phil, Psych, Rel (148): +1.0%
LC Class – Best & Worst eBooks
(Difference Between % of Collection and % of Titles with User Session)
Worst:
• P – Lang & Lit (n=285): -3.3%
• Q – Science (23): -0.5%
• M – Music (64): -0.4%
• T – Technology (22): -0.3%
• K – Law (40): -0.2%
Best:
• B – Phil, Psych, Rel (102): +1.4%
• F – Hist of Americas (158): +0.9%
• N – Art (42): +0.7%
• D – History (107): +0.4%
• E – Hist of Americas (98): +0.4%
LC Class – Best & Worst E & P
(Difference Between % of Collection and % of Titles Used – Both Available)
Worst:
• P – Lang & Lit (n=231): -3.9% P, -3.9% E
• Q – Science (18): -0.6% P, -0.7% E
• T – Technology (19): -0.5% P, -0.5% E
• G – Geog, Anth, Rec (52): -0.3% P, +2.0% E
Best:
• E – Hist of Americas (81): +1.0% P, +0.9% E
• B – Phil, Psych, Rel (79): +1.4% P, +2.0% E
• H – Soc Sci (260): +0.4% P, 0.0% E
• D – History (87): +0.4% P, -0.2% E
Two Oddities – E&P Available
• J – Poli Sci (54): +1.0% P, -0.5% E
• M – Music (48): +1.3% P, -0.8% E
Observations
• Some of the subject and format differences
  have to do with publication date
  – Lots of old social science material in print
• Some differences are surely local
• The sample size for most LC Classes is too
  small to be meaningful
Further Questions
• How does discovery play in?
• What might ILL/resource sharing tell us about
  demand for P when E is available?
Thank You
