HathiTrust and HTRC: the changing Digital Library
El Colegio de Mexico | 20.May.14

Beth Plale – @bplale
Professor, School of Informatics and Computing
Director, HathiTrust Research Center
Indiana University
Tweet us - @HathiTrust #HTRC
  
#HTRC @HathiTrust
  
HathiTrust
• HathiTrust is a consortium of academic & research institutions, offering a collection of millions of titles digitized from libraries around the world.
– Founding members: University of Michigan, Indiana University, University of California, and University of Virginia
http://www.hathitrust.org/htrc
http://www.hathitrust.org
→ Distinguished from
  
Take a look at Details of HathiTrust Collection
  
Content
• Books and journals
– Pilots around images, audio, born-digital
• Digitization sources
– Google (96.8%, 10,162,104)
– Internet Archive (2.9%, 301,972)
– Local (0.3%, 31,840)
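The percentages for the digitization sources can be cross-checked against the volume counts. The short sketch below does only that arithmetic; the dictionary and function names are illustrative and not part of any HathiTrust tool.

```python
# Volume counts per digitization source, as listed on the slide.
volumes_by_source = {
    "Google": 10_162_104,
    "Internet Archive": 301_972,
    "Local": 31_840,
}


def source_shares(counts):
    """Return each source's share of the total as a percentage, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


total_volumes = sum(volumes_by_source.values())
shares = source_shares(volumes_by_source)
for name, pct in shares.items():
    print(f"{name}: {pct}% of {total_volumes:,} volumes")
```

The three counts add up to 10,495,916 volumes, and the rounded shares agree with the figures on the slide.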
  
Content Sources

Content Package
  
Metadata
• Bibliographic
• Structural
• Rights
• Administrative (preservation)
• Holdings
  
HathiTrust Repository Organization

File System
  
Content distribution
Not public domain outside available
  
à HathiTrust repository is a latent
goldmine for text mining analysis,
analysis of large-scale corpi through
computational tools, and time-based
analysis
à Restricted nature of HT content
suggests need for new forms of
access that preserve intimate nature
of research investigation while
honoring restrictions
à Paradigm: computation moves to
the data (not vice versa)
#HTRC	
  	
  @HathiTrust	
  
	
Mission of HT Research Center
• Research arm of HathiTrust
• Goal: enable researchers world-wide to carry out computational investigation of HT repository through
– Develop model for access: the ‘workset’
– Develop tools that facilitate research by digital humanities and informatics communities
– Develop secure cyberinfrastructure that allows computational investigation of entire copyrighted and public domain HathiTrust repository
• Established: July, 2011
• Collaborative effort of Indiana University and University of Illinois
  
	
  
	
  
HTRC system
(diagram labels: Complexity hiding interface; The complexity; Request; Tabular info; Statistical plots; Spatial plots; Workset builder)
  
HTRC Timeline
• Phase I: development, 01 Jul 2011 – 31 Mar 2013
– HTRC software and services release v1.0: http://sourceforge.net/p/htrc/code/
• Phase II: outreach, 01 Apr 2013 - present
– 2nd HTRC UnCamp Sep ‘13
Attendees of UnCamp’13
  
Access to copyrighted materials: HTRC Data Capsule
A secure computing framework that:
• Trusts that researcher will not deliberately leak repository data, but
• Prevents malware acting on user's behalf from leaking data.

Enforces:
• Non-consumptive use: framework provides safe handling of large volumes of protected data
• Openness: framework supports user-contributed analysis tools (that is, it does not limit uses to a known set of algorithms)
• Efficiency: framework supports user-contributed analysis tools without resorting to code walkthroughs prior to acceptance
• Large-scale and low cost: protections can be extended to utilization of large-scale national (public) supercomputers
  
HTRC Secure Capsule Architectural Components
(diagram labels: VM Image Manager; VM Image Store; VM Image Builder; VM Manager; VM instance; Secure Capsule cluster; Registry Services, worksets; SSH; Research results; Researcher)
  
	
  
	
  
HTRC Secure Capsule Architecture
(diagram labels: VM Image Manager; VM Image Store; VM Image Builder; VM Manager; VM instance; Registry Services, worksets; SSH; Research results; Researcher)
Diagram annotations:
• Researcher requests new VM of type X
• Image instance is created
• Researcher installs tools onto VM through window on her desktop
• Upon run, Secure Capsule controls I/O behind the scenes
• Final location of results is registry
  
HTRC secure data capsule: view from researcher desktop
  
EXAMPLES OF RESEARCH CARRIED OUT THROUGH HATHI TRUST RESEARCH CENTER
• Author Gender Identification
• Using Topic Modeling to Locate (down to sentence level) Philosophical Arguments in Science Texts
  
GENDER IDENTIFICATION OF HTRC AUTHORS BY NAMES

Stacy Kowalczyk, Asst. Professor, Dominican University
Zong Peng, HTRC, Indiana University
Ref talk by Stacy Kowalczyk, http://www.hathitrust.org/htrc_uncamp2013
  
Gender Identification of Text
• Question Investigated: Can we use author names in bibliographic records to identify gender?
• 2.6 million bibliographic records
– Extracted personal author data
– MARC 100 abcd and 700 abcd
• 606,437 unique personal author strings
• Bibliographic data is not fielded like patent names
• Relying on standard cataloging practice
– Last name, first name middle name, titles/honorifics, dates
  
Why interesting to HTRC? Introduces a new source of metadata, drawn from sources with varying authority

Raises questions:
1) How should community contributed metadata be distinguished from more authoritative sources?
2) How should variability of quality even within a single contribution be conveyed to community?
  
Authors vs Names
• Methuen, Algernon Methuen Marshall, Sir bart., 1856-1924
• Methuem, Algernon
• Methuen Algernon
• Methuen Marshall, Sir, bart., 1856-
• Methuen, A. Sir, 1856-1924
• Methuen, A. Sir, bart., 1856-1924
• Methuen Marshall, Sir bart 1856-1924
• Methuen, Algernon Methuen Marshall, Sir, 1856-1924
• Methuen, Algernon Methuen Marshall, Sir, bart., 1856-1924
• Methuen, Algernon, 1856-1924
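Because the author strings are unfielded, any analysis first has to split them into parts. The sketch below is one possible way to do that, assuming the cataloging order named in the talk (last name, forenames, titles/honorifics, dates). The function name `parse_author`, the tiny `TITLES` set, and the date pattern are hypothetical illustrations, not the method used in the study.

```python
import re

# Hypothetical, deliberately small honorific list for illustration only.
TITLES = {"sir", "bart", "baron", "count", "duke", "father", "cardinal",
          "lady", "mrs", "miss", "countess", "duchess", "sister"}
# Matches "1856-1924" and open-ended dates such as "1856-".
DATES = re.compile(r"^\d{4}-(\d{4})?$")


def parse_author(raw):
    """Split an unfielded author string into surname, forenames, titles, and dates."""
    head, _, tail = raw.partition(",")
    head_tokens = head.split()
    if tail.strip():
        # With a comma, everything before the first comma is taken as the surname.
        surname = head.strip()
        rest = tail.replace(",", " ").split()
    else:
        # Without a comma, assume the first token is the surname.
        surname = head_tokens[0] if head_tokens else ""
        rest = head_tokens[1:]
    parsed = {"surname": surname, "forenames": [], "titles": [], "dates": None}
    for token in rest:
        if DATES.match(token):
            parsed["dates"] = token
        elif token.strip(".").lower() in TITLES:
            parsed["titles"].append(token.strip("."))
        else:
            parsed["forenames"].append(token)
    return parsed


for raw in ["Methuen, Algernon Methuen Marshall, Sir bart., 1856-1924",
            "Methuen Algernon",
            "Methuen, A. Sir, 1856-1924"]:
    print(parse_author(raw))
```

Even this small parser shows why the variants are hard to reconcile: an initial such as "A." carries no gender signal, and a misspelled surname stays a separate string unless further matching is applied.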
  
Sources of Data
• The Virtual International Authority File
– Hosted by OCLC
• Harvested names from multiple data sources
– Census bureau
– Baby name sites
• EU Patent Research names list (Frietsch et al, 2009; Naldi et al. 2005)
– Developed an extensive list of European names
• Titles and honorifics
– Multiple web resources
– Sir, Baron, Count, Duke, Father, Cardinal, etc
– Lady, Mrs. Miss, Countess, Duchess, Sister, etc
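One way these sources could be combined is sketched below: check honorifics first, then look the first forename up in each name list, and report "ambiguous" when the evidence disagrees. The precedence of titles over name lists, the function name `classify`, and the contents of `NAME_LISTS` are assumptions made for illustration; the talk does not describe the actual decision rules.

```python
MALE_TITLES = {"sir", "baron", "count", "duke", "father", "cardinal"}
FEMALE_TITLES = {"lady", "mrs", "miss", "countess", "duchess", "sister"}

# Hypothetical stand-ins for the harvested name lists (VIAF, web names, patent names).
NAME_LISTS = {
    "viaf": {"algernon": "male", "jane": "female"},
    "web": {"algernon": "male", "leslie": "female", "mary": "female"},
    "patents": {"leslie": "male", "mary": "female"},
}


def classify(forenames, titles):
    """Return 'male', 'female', 'ambiguous', or 'unknown' for one author string."""
    votes = set()
    for title in titles:
        key = title.strip(".").lower()
        if key in MALE_TITLES:
            votes.add("male")
        elif key in FEMALE_TITLES:
            votes.add("female")
    # Assumption: name lists are consulted only when no honorific decided the case.
    if not votes and forenames:
        first = forenames[0].strip(".").lower()
        for names in NAME_LISTS.values():
            if first in names:
                votes.add(names[first])
    if not votes:
        return "unknown"
    if len(votes) > 1:
        return "ambiguous"
    return votes.pop()


print(classify(["Algernon"], ["Sir"]))
print(classify(["Leslie"], []))
print(classify(["A."], []))
```

The four outcomes mirror the categories reported in the results (female, male, unknown, ambiguous).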
  
Initial Gender Results
• Approximately 80% of name strings have initial gender identification
– Female: 59,365 (10%)
– Male: 425,994 (70%)
– Unknown: 114,204 (19%)
– Ambiguous: 5,965 (less than 1%)
  
Results by Data Source
Against the whole set of name strings
• VIAF
– 19% hit rate
• Web Names
– 54% hit rate
• Patents Names
– 8%
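A hit rate "against the whole set of name strings" can be read as the fraction of all name strings for which a source returns a match. The sketch below computes it under that reading; `hit_rate` and the sample data are hypothetical and do not reproduce the study's figures.

```python
def hit_rate(forenames, source_names):
    """Fraction of name strings whose forename appears in the source list."""
    if not forenames:
        return 0.0
    hits = sum(1 for name in forenames if name.lower() in source_names)
    return hits / len(forenames)


# Made-up sample: two of four forenames are found in the source.
sample = ["Algernon", "Mary", "Zong", "A."]
print(hit_rate(sample, {"algernon", "mary"}))
```

Because the denominator is the full set of strings, hit rates from different sources are directly comparable but need not sum to 100%.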
  
	
  
Colin Allen, Jamie Murdock
Cognitive Science, Indiana University
Ref talk by Jamie Murdock, http://www.hathitrust.org/htrc_uncamp2013
  
The InPhO project is instructive because it demonstrates an interaction sequence between a researcher and his/her corpus that is nuanced, multistep, and multi-modal.

The HTRC cyberinfrastructure must be able to handle such a nuanced form of interaction between a researcher and their texts.
  
Digging into philosophy of science
• Establish points of contact between philosophy and science: where philosophical arguments on anthropomorphism appear in science texts
• Use topic modeling to identify the volumes and pages within these volumes that are “rich” in a chosen topic
• Use semi-formal discourse analysis technique to identify key arguments in selected pages to incrementally expose and represent argument structures
  
The How
• 1315 volumes from HTRC selected using keyword search for ‘darwin’, ‘romanes’, ‘anthropomorphism’, and ‘comparative psychology’
• Set contains lots of uninteresting books: e.g., college course catalogs
• Apply LDA on 86 volume subset
• Using IPython (iPy) Notebook
  
LDA topic modeling
• LDA (Latent Dirichlet Allocation) uses a Bayesian updating method to generate a set of “topics”: probability distributions over the set of terms in a corpus
• Number of topics is a parameter in the modeling technique
• Method finds set of topics that is best able to reproduce the term distributions in documents belonging to the corpus
• Documents may be whole volumes, chapters, articles, single pages, even individual sentences (modeler’s choice)
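To make the bullets above concrete, here is a minimal LDA sketch in plain Python. It uses collapsed Gibbs sampling, one common inference method for LDA; the talk does not say which inference method or library the project used, so the function `lda_gibbs`, its parameters, and the toy corpus are all assumptions for illustration.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs is a list of token lists. Returns (doc_topic, topic_word, vocab), where
    doc_topic[d][k] and topic_word[k][w] are smoothed probability estimates.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    n_dk = [[0] * num_topics for _ in docs]               # tokens in doc d on topic k
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # tokens of word w on topic k
    n_k = [0] * num_topics                                # all tokens on topic k
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)  # random initial topic
            z_doc.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][wid] -= 1
                n_k[k] -= 1
                # Resample its topic from the full conditional distribution.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][wid] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][wid] += 1
                n_k[k] += 1

    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta)
         for w in range(vocab_size)]
        for k in range(num_topics)
    ]
    return doc_topic, topic_word, vocab


# Toy corpus: each "document" could equally be a volume, a page, or a sentence.
docs = [
    "animal mind instinct animal reason".split(),
    "instinct animal mind reason mind".split(),
    "course catalog credit course semester".split(),
    "semester credit catalog course credit".split(),
]
doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics=2)
for k, row in enumerate(topic_word):
    top = sorted(zip(row, vocab), reverse=True)[:3]
    print(f"topic {k}:", [word for _, word in top])
```

The number of topics is passed in as a parameter, as on the slide, and changing what counts as a document only changes how `docs` is built, not the model itself.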
  
Volume level topic modeling on ‘anthropomorphism’ yields set of topics

.. Of set of topics, choose ‘16’ as best

Volumes most similar to topic 16

Repeat LDA at page level

Topic model at page level for topics anthropomorphism, animal, and psychology

Words sorted by similarity

Pick top 3: topics 16, 10, 26

Show documents of topics 10, 16, 26
  
Drop to sentence level
• Select three books with highest aggregate of 20-40 topic-relevant pages for more precise analysis
• Manually augment argument analysis
– Remodeling of three volumes at sentence level
– Training other methods using human analysis plus sentence similarity
Promising early results …
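The book-selection step can be sketched as a small aggregation over page-level topic scores. The reading below is an assumption: for each volume, sum the scores of its most topic-relevant pages (the slide mentions 20-40 pages) and keep the three volumes with the highest totals. The function `top_volumes`, its parameters, and the sample scores are hypothetical.

```python
from collections import defaultdict
import heapq


def top_volumes(page_scores, pages_per_volume=20, num_volumes=3):
    """Rank volumes by the summed scores of their most topic-relevant pages.

    page_scores is an iterable of (volume_id, page_number, score) tuples, where
    score is the page's weight on the chosen topic(s).
    """
    by_volume = defaultdict(list)
    for volume_id, _page, score in page_scores:
        by_volume[volume_id].append(score)
    aggregate = {
        volume_id: sum(heapq.nlargest(pages_per_volume, scores))
        for volume_id, scores in by_volume.items()
    }
    ranked = sorted(aggregate.items(), key=lambda item: item[1], reverse=True)
    return ranked[:num_volumes]


# Made-up page scores for four volumes.
scores = [
    ("volA", 1, 0.9), ("volA", 2, 0.8), ("volA", 3, 0.1),
    ("volB", 1, 0.5), ("volB", 2, 0.4),
    ("volC", 1, 0.95),
    ("volD", 1, 0.2), ("volD", 2, 0.1),
]
selected = top_volumes(scores, pages_per_volume=2, num_volumes=3)
print([volume_id for volume_id, _total in selected])
```

Capping the number of pages per volume keeps a very long book from outranking a shorter one simply because it has more pages.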
  
Scholarly Commons User Support Service
• Develop training materials
• Educational workshops
• Tool and workset creation
• Collaborate with librarians and DH centers at HT institutions
• Assist researchers in HTRC text data mining research projects
• Based at University of Illinois Library
  
Scholarly Commons User Support
• Gives HT institutions exclusive access to training and learning materials that help them establish programs that integrate HTRC tools and services into their scholarly commons programs in libraries and digital humanities centers.
• Physically located in the University of Illinois Library’s Scholarly Commons.
• Supported by several Library staff and faculty. Key among these is the Digital Humanities Research Specialist, who will assist with the development of training and outreach initiatives in support of researchers working with the HathiTrust Research Center and HathiTrust digital library affiliates who seek to start their own HTRC research services.
• Effort involves planning, implementation and continuous development of training materials, educational workshops, potential tools, and outreach activities in support of the usage of HTRC tools and datasets.
  
Thanks to sponsors
http://www.hathitrust.org/htrc
http://www.hathitrust.org
  

More Related Content

What's hot

DMTM 2015 - 01 Course Introduction
DMTM 2015 - 01 Course IntroductionDMTM 2015 - 01 Course Introduction
DMTM 2015 - 01 Course IntroductionPier Luca Lanzi
 
Research Data Management in the Humanities and Social Sciences
Research Data Management in the Humanities and Social SciencesResearch Data Management in the Humanities and Social Sciences
Research Data Management in the Humanities and Social SciencesCelia Emmelhainz
 
Linked Open Data in Libraries, Archives & Museums
Linked Open Data in Libraries, Archives & MuseumsLinked Open Data in Libraries, Archives & Museums
Linked Open Data in Libraries, Archives & MuseumsJon Voss
 
DMTM 2015 - 02 Data Mining
DMTM 2015 - 02 Data MiningDMTM 2015 - 02 Data Mining
DMTM 2015 - 02 Data MiningPier Luca Lanzi
 
Research as infrastructure, Digital Humanities Congress, Sheffield 2012
Research as infrastructure, Digital Humanities Congress, Sheffield 2012Research as infrastructure, Digital Humanities Congress, Sheffield 2012
Research as infrastructure, Digital Humanities Congress, Sheffield 2012University of South Australlia
 
Research into Practice case study 2: Library linked data implementations an...
	Research into Practice case study 2:  Library linked data implementations an...	Research into Practice case study 2:  Library linked data implementations an...
Research into Practice case study 2: Library linked data implementations an...Hazel Hall
 
Linked Open Data for Digital Humanities
Linked Open Data for Digital HumanitiesLinked Open Data for Digital Humanities
Linked Open Data for Digital HumanitiesChristophe Guéret
 
Research data management: a tale of two paradigms:
Research data management: a tale of two paradigms: Research data management: a tale of two paradigms:
Research data management: a tale of two paradigms: Martin Donnelly
 
Research Data Management at the University of Edinburgh
Research Data Management at the University of EdinburghResearch Data Management at the University of Edinburgh
Research Data Management at the University of EdinburghEDINA, University of Edinburgh
 
Big Data and ContentMining for Libraries
Big Data and ContentMining for LibrariesBig Data and ContentMining for Libraries
Big Data and ContentMining for Librariespetermurrayrust
 
Knowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/InteroperabilityKnowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/InteroperabilityJames Hendler
 
Principles and practice of Open Science
Principles and practice of Open SciencePrinciples and practice of Open Science
Principles and practice of Open Sciencepetermurrayrust
 
ContentMining and Copyright at CopyCamp2017
ContentMining and Copyright at CopyCamp2017ContentMining and Copyright at CopyCamp2017
ContentMining and Copyright at CopyCamp2017petermurrayrust
 
From Theory to Practice: Can Opennesss Improve the Quality of OER Research?
From Theory to Practice: Can Opennesss Improve the Quality of OER Research? From Theory to Practice: Can Opennesss Improve the Quality of OER Research?
From Theory to Practice: Can Opennesss Improve the Quality of OER Research? Beck Pitt
 
From Structured Data to Linked Open Governmental Data
From Structured Data to Linked Open Governmental DataFrom Structured Data to Linked Open Governmental Data
From Structured Data to Linked Open Governmental DataDongpo Deng
 
Tragedy of the (Data) Commons
Tragedy of the (Data) CommonsTragedy of the (Data) Commons
Tragedy of the (Data) CommonsJames Hendler
 
Data Science, Data Curation, and Human-Data Interaction
Data Science, Data Curation, and Human-Data InteractionData Science, Data Curation, and Human-Data Interaction
Data Science, Data Curation, and Human-Data InteractionUniversity of Washington
 
Search, Exploration and Analytics of Evolving Data
Search, Exploration and Analytics of Evolving DataSearch, Exploration and Analytics of Evolving Data
Search, Exploration and Analytics of Evolving DataNattiya Kanhabua
 
Brown Bag: New Models of Scholarly Communication for Digital Scholarship, by ...
Brown Bag: New Models of Scholarly Communication for Digital Scholarship, by ...Brown Bag: New Models of Scholarly Communication for Digital Scholarship, by ...
Brown Bag: New Models of Scholarly Communication for Digital Scholarship, by ...Micah Altman
 

What's hot (20)

DMTM 2015 - 01 Course Introduction
DMTM 2015 - 01 Course IntroductionDMTM 2015 - 01 Course Introduction
DMTM 2015 - 01 Course Introduction
 
Research Data Management in the Humanities and Social Sciences
Research Data Management in the Humanities and Social SciencesResearch Data Management in the Humanities and Social Sciences
Research Data Management in the Humanities and Social Sciences
 
Linked Open Data in Libraries, Archives & Museums
Linked Open Data in Libraries, Archives & MuseumsLinked Open Data in Libraries, Archives & Museums
Linked Open Data in Libraries, Archives & Museums
 
DMTM 2015 - 02 Data Mining
DMTM 2015 - 02 Data MiningDMTM 2015 - 02 Data Mining
DMTM 2015 - 02 Data Mining
 
Urban Data Science at UW
Urban Data Science at UWUrban Data Science at UW
Urban Data Science at UW
 
Research as infrastructure, Digital Humanities Congress, Sheffield 2012
Research as infrastructure, Digital Humanities Congress, Sheffield 2012Research as infrastructure, Digital Humanities Congress, Sheffield 2012
Research as infrastructure, Digital Humanities Congress, Sheffield 2012
 
Research into Practice case study 2: Library linked data implementations an...
	Research into Practice case study 2:  Library linked data implementations an...	Research into Practice case study 2:  Library linked data implementations an...
Research into Practice case study 2: Library linked data implementations an...
 
Linked Open Data for Digital Humanities
Linked Open Data for Digital HumanitiesLinked Open Data for Digital Humanities
Linked Open Data for Digital Humanities
 
Research data management: a tale of two paradigms:
Research data management: a tale of two paradigms: Research data management: a tale of two paradigms:
Research data management: a tale of two paradigms:
 
Research Data Management at the University of Edinburgh
Research Data Management at the University of EdinburghResearch Data Management at the University of Edinburgh
Research Data Management at the University of Edinburgh
 
Big Data and ContentMining for Libraries
Big Data and ContentMining for LibrariesBig Data and ContentMining for Libraries
Big Data and ContentMining for Libraries
 
Knowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/InteroperabilityKnowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/Interoperability
 
Principles and practice of Open Science
Principles and practice of Open SciencePrinciples and practice of Open Science
Principles and practice of Open Science
 
ContentMining and Copyright at CopyCamp2017
ContentMining and Copyright at CopyCamp2017ContentMining and Copyright at CopyCamp2017
ContentMining and Copyright at CopyCamp2017
 
From Theory to Practice: Can Opennesss Improve the Quality of OER Research?
From Theory to Practice: Can Opennesss Improve the Quality of OER Research? From Theory to Practice: Can Opennesss Improve the Quality of OER Research?
From Theory to Practice: Can Opennesss Improve the Quality of OER Research?
 
From Structured Data to Linked Open Governmental Data
From Structured Data to Linked Open Governmental DataFrom Structured Data to Linked Open Governmental Data
From Structured Data to Linked Open Governmental Data
 
Tragedy of the (Data) Commons
Tragedy of the (Data) CommonsTragedy of the (Data) Commons
Tragedy of the (Data) Commons
 
Data Science, Data Curation, and Human-Data Interaction
Data Science, Data Curation, and Human-Data InteractionData Science, Data Curation, and Human-Data Interaction
Data Science, Data Curation, and Human-Data Interaction
 
Search, Exploration and Analytics of Evolving Data
Search, Exploration and Analytics of Evolving DataSearch, Exploration and Analytics of Evolving Data
Search, Exploration and Analytics of Evolving Data
 
Brown Bag: New Models of Scholarly Communication for Digital Scholarship, by ...
Brown Bag: New Models of Scholarly Communication for Digital Scholarship, by ...Brown Bag: New Models of Scholarly Communication for Digital Scholarship, by ...
Brown Bag: New Models of Scholarly Communication for Digital Scholarship, by ...
 

Similar to Plale HathiTrust El Colegio de Mexico May2014

HathiTrust Reserach Center Nov2013
HathiTrust Reserach Center Nov2013HathiTrust Reserach Center Nov2013
HathiTrust Reserach Center Nov2013Beth Plale
 
HathiTrust Research Center Secure Commons
HathiTrust Research Center Secure CommonsHathiTrust Research Center Secure Commons
HathiTrust Research Center Secure CommonsBeth Plale
 
CST4599 July 2020
CST4599 July 2020 CST4599 July 2020
CST4599 July 2020 EISLibrarian
 
Research Data Publishing
Research Data PublishingResearch Data Publishing
Research Data PublishingBrian Hole
 
THe HathiTrust Research Center: Digital Humanities at Scale
THe HathiTrust Research Center: Digital Humanities at ScaleTHe HathiTrust Research Center: Digital Humanities at Scale
THe HathiTrust Research Center: Digital Humanities at ScaleRobert H. McDonald
 
A Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceA Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceGlobus
 
Building a Public Research Center for the HathiTrust Digital Library
Building a Public Research Center for the HathiTrust Digital LibraryBuilding a Public Research Center for the HathiTrust Digital Library
Building a Public Research Center for the HathiTrust Digital LibraryRobert H. McDonald
 
Elephant in the Room: Scaling Storage for the HathiTrust Research Center
Elephant in the Room: Scaling Storage for the HathiTrust Research CenterElephant in the Room: Scaling Storage for the HathiTrust Research Center
Elephant in the Room: Scaling Storage for the HathiTrust Research CenterRobert H. McDonald
 
Open Data - Principles and Techniques
Open Data - Principles and TechniquesOpen Data - Principles and Techniques
Open Data - Principles and TechniquesBernhard Haslhofer
 
JCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening SlidesJCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening SlidesRobert H. McDonald
 
From Open Access to Open Data
From Open Access to Open DataFrom Open Access to Open Data
From Open Access to Open DataBrian Hole
 
The HathiTrust Research Center (HTRC): An Overview and Demo
The HathiTrust Research Center (HTRC): An Overview and DemoThe HathiTrust Research Center (HTRC): An Overview and Demo
The HathiTrust Research Center (HTRC): An Overview and DemoRobert H. McDonald
 
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & MuseumsALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & MuseumsJon Voss
 
Metadata Ownership & Metadata Rights
Metadata Ownership & Metadata RightsMetadata Ownership & Metadata Rights
Metadata Ownership & Metadata RightsChelcie Rowell
 
HathiTrust Research Center Data Capsule Overview 09.10.14
HathiTrust Research Center Data Capsule Overview 09.10.14HathiTrust Research Center Data Capsule Overview 09.10.14
HathiTrust Research Center Data Capsule Overview 09.10.14Robert H. McDonald
 
HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9 HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9 Scott Edmunds
 
The HathiTrust Research Center: An Overview of Advanced Computational Services
The HathiTrust Research Center: An Overview of Advanced Computational ServicesThe HathiTrust Research Center: An Overview of Advanced Computational Services
The HathiTrust Research Center: An Overview of Advanced Computational ServicesRobert H. McDonald
 
INNOVATION AND ‎RESEARCH (Digital Library ‎Information Access)‎
INNOVATION AND ‎RESEARCH (Digital Library ‎Information Access)‎INNOVATION AND ‎RESEARCH (Digital Library ‎Information Access)‎
INNOVATION AND ‎RESEARCH (Digital Library ‎Information Access)‎Libcorpio
 
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...Robert H. McDonald
 

Similar to Plale HathiTrust El Colegio de Mexico May2014 (20)

HathiTrust Reserach Center Nov2013
HathiTrust Reserach Center Nov2013HathiTrust Reserach Center Nov2013
HathiTrust Reserach Center Nov2013
 
HathiTrust Research Center Secure Commons
HathiTrust Research Center Secure CommonsHathiTrust Research Center Secure Commons
HathiTrust Research Center Secure Commons
 
CST4599 July 2020
CST4599 July 2020 CST4599 July 2020
CST4599 July 2020
 
Research Data Publishing
Research Data PublishingResearch Data Publishing
Research Data Publishing
 
THe HathiTrust Research Center: Digital Humanities at Scale
THe HathiTrust Research Center: Digital Humanities at ScaleTHe HathiTrust Research Center: Digital Humanities at Scale
THe HathiTrust Research Center: Digital Humanities at Scale
 
A Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceA Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials Science
 
Building a Public Research Center for the HathiTrust Digital Library
Building a Public Research Center for the HathiTrust Digital LibraryBuilding a Public Research Center for the HathiTrust Digital Library
Building a Public Research Center for the HathiTrust Digital Library
 
Elephant in the Room: Scaling Storage for the HathiTrust Research Center
Elephant in the Room: Scaling Storage for the HathiTrust Research CenterElephant in the Room: Scaling Storage for the HathiTrust Research Center
Elephant in the Room: Scaling Storage for the HathiTrust Research Center
 
Open Data - Principles and Techniques
Open Data - Principles and TechniquesOpen Data - Principles and Techniques
Open Data - Principles and Techniques
 
JCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening SlidesJCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening Slides
 
From Open Access to Open Data
From Open Access to Open DataFrom Open Access to Open Data
From Open Access to Open Data
 
A Clean Slate?
A Clean Slate?A Clean Slate?
A Clean Slate?
 
The HathiTrust Research Center (HTRC): An Overview and Demo
The HathiTrust Research Center (HTRC): An Overview and DemoThe HathiTrust Research Center (HTRC): An Overview and Demo
The HathiTrust Research Center (HTRC): An Overview and Demo
 
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & MuseumsALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
 
Metadata Ownership & Metadata Rights
Metadata Ownership & Metadata RightsMetadata Ownership & Metadata Rights
Metadata Ownership & Metadata Rights
 
HathiTrust Research Center Data Capsule Overview 09.10.14
HathiTrust Research Center Data Capsule Overview 09.10.14HathiTrust Research Center Data Capsule Overview 09.10.14
HathiTrust Research Center Data Capsule Overview 09.10.14
 
HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9 HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9
 
The HathiTrust Research Center: An Overview of Advanced Computational Services
The HathiTrust Research Center: An Overview of Advanced Computational ServicesThe HathiTrust Research Center: An Overview of Advanced Computational Services
The HathiTrust Research Center: An Overview of Advanced Computational Services
 
INNOVATION AND ‎RESEARCH (Digital Library ‎Information Access)‎
INNOVATION AND ‎RESEARCH (Digital Library ‎Information Access)‎INNOVATION AND ‎RESEARCH (Digital Library ‎Information Access)‎
INNOVATION AND ‎RESEARCH (Digital Library ‎Information Access)‎
 
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...
 

Plale HathiTrust El Colegio de Mexico May2014

  • 1. HathiTrust and HTRC: the changing Digital Library
      El Colegio de Mexico | 20.May.14
      Beth Plale – @bplale
      Professor, School of Informatics and Computing
      Director, HathiTrust Research Center
      Indiana University
      Tweet us - @HathiTrust #HTRC
  • 2. #HTRC @HathiTrust
      HathiTrust
      • HathiTrust is a consortium of academic & research institutions, offering a collection of millions of titles digitized from libraries around the world.
          – Founding members: University of Michigan, Indiana University, University of California, and University of Virginia
      http://www.hathitrust.org/htrc
      http://www.hathitrust.org
      → Distinguished from
  • 4. Take a look at details of the HathiTrust collection
  • 5. Content
      • Books and journals
          – Pilots around images, audio, born-digital
      • Digitization sources
          – Google (96.8%, 10,162,104)
          – Internet Archive (2.9%, 301,972)
          – Local (0.3%, 31,840)
  • 6. Content Sources
  • 7. Content Package
  • 8. Metadata
      • Bibliographic
      • Structural
      • Rights
      • Administrative (preservation)
      • Holdings
  • 9. HathiTrust Repository Organization
  • 10. HathiTrust Repository Organization
  • 11. File System
  • 12. Content distribution
  • 13. Content distribution
      Not public domain outside available
  • 14. → The HathiTrust repository is a latent goldmine for text mining analysis, analysis of large-scale corpora through computational tools, and time-based analysis
      → The restricted nature of HT content suggests a need for new forms of access that preserve the intimate nature of research investigation while honoring restrictions
      → Paradigm: computation moves to the data (not vice versa)
  • 15. Mission of HT Research Center
      • Research arm of HathiTrust
      • Goal: enable researchers worldwide to carry out computational investigation of the HT repository by
          – Developing a model for access: the ‘workset’
          – Developing tools that facilitate research by the digital humanities and informatics communities
          – Developing secure cyberinfrastructure that allows computational investigation of the entire copyrighted and public domain HathiTrust repository
      • Established: July 2011
      • Collaborative effort of Indiana University and University of Illinois
  • 16. HTRC system [diagram: a request passes through a complexity-hiding interface to "the complexity"; outputs are tabular info, statistical plots, and spatial plots]
  • 17. Complexity hiding interface
  • 19. HTRC Timeline
      • Phase I: development, 01 Jul 2011 – 31 Mar 2013
          – HTRC software and services release v1.0 http://sourceforge.net/p/htrc/code/
      • Phase II: outreach, 01 Apr 2013 - present
          – 2nd HTRC UnCamp Sep ’13
      [photo: attendees of UnCamp ’13]
  • 20. Access to copyrighted materials: HTRC Data Capsule
      A secure computing framework that:
      • Trusts that the researcher will not deliberately leak repository data, but
      • Prevents malware acting on the user's behalf from leaking data.
      Enforces:
      • Non-consumptive use: framework provides safe handling of large volumes of protected data
      • Openness: framework supports user-contributed analysis tools (that is, uses are not limited to a known set of algorithms)
      • Efficiency: framework supports user-contributed analysis tools without resorting to code walkthroughs prior to acceptance
      • Large-scale and low cost: protections can be extended to utilization of large-scale national (public) supercomputers
  • 21. HTRC Secure Capsule Architectural Components [diagram labels: Researcher; SSH; VM Manager; VM instance; VM Image Manager; VM Image Store; VM Image Builder; Secure Capsule cluster; Registry Services, worksets; Research results]
  • 22. HTRC Secure Capsule Architecture [diagram with the components of the previous slide and steps marked 1) to 4); annotations as they appear on the slide:]
      • Researcher requests new VM of type X
      • Image instance is created
      • Researcher installs tools onto the VM through a window on her desktop
      • Upon run, Secure Capsule controls I/O behind the scenes
      • Final location of results is the registry
  • 23. HTRC secure data capsule: view from researcher desktop
  • 24. EXAMPLES OF RESEARCH CARRIED OUT THROUGH HATHITRUST RESEARCH CENTER
      • Author Gender Identification
      • Using Topic Modeling to Locate (down to sentence level) Philosophical Arguments in Science Texts
  • 25. GENDER IDENTIFICATION OF HTRC AUTHORS BY NAMES
      Stacy Kowalczyk, Asst. Professor, Dominican University
      Zong Peng, HTRC, Indiana University
      Ref: talk by Stacy Kowalczyk, http://www.hathitrust.org/htrc_uncamp2013
  • 26. Gender Identification of Text
      • Question investigated: Can we use author names in bibliographic records to identify gender?
      • 2.6 million bibliographic records
          – Extracted personal author data
          – MARC 100 abcd and 700 abcd
      • 606,437 unique personal author strings
      • Bibliographic data is not fielded like patent names
      • Relying on standard cataloging practice
          – Last name, first name middle name, titles/honorifics, dates
  • 27. Why interesting to HTRC? Introduces a new source of metadata, drawn from sources with varying authority
      Raises questions:
      1) How should community-contributed metadata be distinguished from more authoritative sources?
      2) How should variability of quality, even within a single contribution, be conveyed to the community?
  • 28. Authors vs Names
      • Methuen, Algernon Methuen Marshall, Sir bart., 1856-1924
      • Methuem, Algernon
      • Methuen Algernon
      • Methuen Marshall, Sir, bart., 1856-
      • Methuen, A. Sir, 1856-1924
      • Methuen, A. Sir, bart., 1856-1924
      • Methuen Marshall, Sir bart 1856-1924
      • Methuen, Algernon Methuen Marshall, Sir, 1856-1924
      • Methuen, Algernon Methuen Marshall, Sir, bart., 1856-1924
      • Methuen, Algernon, 1856-1924
  • 29. Sources of Data
      • The Virtual International Authority File
          – Hosted by OCLC
      • Harvested names from multiple data sources
          – Census bureau
          – Baby name sites
      • EU Patent Research names list (Frietsch et al., 2009; Naldi et al., 2005)
          – Developed an extensive list of European names
      • Titles and honorifics
          – Multiple web resources
          – Sir, Baron, Count, Duke, Father, Cardinal, etc.
          – Lady, Mrs., Miss, Countess, Duchess, Sister, etc.
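The cataloging pattern from slide 26 and the honorific lists above suggest a simple lookup pipeline. The following is only a minimal sketch of that idea, not the authors' implementation: the function names `parse_author` and `infer_gender`, the tiny name sets, and the rule that a name found in both lists counts as "ambiguous" are all assumptions made for illustration.

```python
import re

# Hypothetical, tiny lookup tables. The study itself drew on VIAF, harvested
# web name lists and a patent-research names list; these sets only stand in.
MALE_TITLES = {"sir", "bart", "baron", "count", "duke", "father", "cardinal"}
FEMALE_TITLES = {"lady", "mrs", "miss", "countess", "duchess", "sister"}
MALE_NAMES = {"algernon", "john", "leslie"}
FEMALE_NAMES = {"mary", "jane", "leslie"}
DATES = re.compile(r"^\d{4}\s*-\s*(\d{4})?$")  # "1856-1924" or open "1856-"


def parse_author(raw):
    """Split 'Last, First Middle, titles, dates' into labelled parts."""
    parts = [p.strip() for p in raw.split(",") if p.strip()]
    record = {"last": parts[0] if parts else "",
              "given": [], "titles": [], "dates": None}
    for part in parts[1:]:
        if DATES.match(part):
            record["dates"] = part
            continue
        for token in part.split():
            word = token.rstrip(".").lower()
            if word in MALE_TITLES or word in FEMALE_TITLES:
                record["titles"].append(word)
            else:
                record["given"].append(word)
    return record


def infer_gender(raw):
    """Return 'male', 'female', 'ambiguous' or 'unknown' for a name string."""
    record = parse_author(raw)
    male_title = any(t in MALE_TITLES for t in record["titles"])
    female_title = any(t in FEMALE_TITLES for t in record["titles"])
    if male_title != female_title:          # exactly one kind of title present
        return "male" if male_title else "female"
    for name in record["given"]:            # first given name with a hit decides
        in_male, in_female = name in MALE_NAMES, name in FEMALE_NAMES
        if in_male and in_female:
            return "ambiguous"
        if in_male:
            return "male"
        if in_female:
            return "female"
    return "unknown"


print(infer_gender("Methuen, A. Sir, 1856-1924"))
print(infer_gender("Methuen Algernon"))
```

The second call shows why the variants on slide 28 matter: without the comma the whole string is read as a last name, so no given name is available to look up.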
  • 30. Initial Gender Results
      • Approximately 80% of name strings have initial gender identification
          – Female: 59,365 (10%)
          – Male: 425,994 (70%)
          – Unknown: 114,204 (19%)
          – Ambiguous: 5,965 (less than 1%)
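The percentages can be checked against the counts. The short calculation below assumes the denominator is the 606,437 unique personal author strings reported on slide 26; the slide itself does not state the denominator. Under that assumption the four categories add up to 605,528 strings, 909 fewer than the total, a gap the slides do not explain.

```python
# Counts as given on the slide; the total comes from slide 26 (assumed denominator).
counts = {"female": 59_365, "male": 425_994, "unknown": 114_204, "ambiguous": 5_965}
total_strings = 606_437

# Share of all unique name strings falling in each category, in percent.
shares = {label: 100 * n / total_strings for label, n in counts.items()}
identified = shares["female"] + shares["male"]

for label, pct in shares.items():
    print(f"{label:<10}{pct:6.2f}%")
print(f"identified{identified:6.2f}%")
print("strings not accounted for:", total_strings - sum(counts.values()))
```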
  • 31. Results by Data Source
      Against the whole set of name strings:
      • VIAF
          – 19% hit rate
      • Web Names
          – 54% hit rate
      • Patents Names
          – 8%
  • 32. Colin Allen, Jamie Murdock
      Cognitive Science, Indiana University
      Ref: talk by Jamie Murdock, http://www.hathitrust.org/htrc_uncamp2013
  • 33. The InPhO project is instructive because it demonstrates an interaction sequence between a researcher and their corpus that is nuanced, multistep, and multi-modal.
      The HTRC cyberinfrastructure must be able to handle such a nuanced form of interaction between a researcher and their texts.
  • 34. Digging into philosophy of science
      • Establish points of contact between philosophy and science: where philosophical arguments on anthropomorphism appear in science texts
      • Use topic modeling to identify the volumes and pages within these volumes that are “rich” in a chosen topic
      • Use a semi-formal discourse analysis technique to identify key arguments in selected pages to incrementally expose and represent argument structures
  • 35. The How
      • 1315 volumes from HTRC selected using keyword search for ‘darwin’, ‘romanes’, ‘anthropomorphism’, and ‘comparative psychology’
      • Set contains lots of uninteresting books, e.g., college course catalogs
      • Apply LDA on an 86-volume subset
      • Using IPython Notebook
  • 36. LDA topic modeling
      • LDA (Latent Dirichlet Allocation) uses a Bayesian updating method to generate a set of “topics”: probability distributions over the set of terms in a corpus
      • The number of topics is a parameter of the modeling technique
      • The method finds the set of topics that is best able to reproduce the term distributions in documents belonging to the corpus
      • Documents may be whole volumes, chapters, articles, single pages, even individual sentences: the modeler's choice
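To make the description above concrete, here is a toy sketch of LDA fitted by collapsed Gibbs sampling using only the Python standard library. Gibbs sampling is one common way to fit LDA and is not necessarily the inference method used in the project described in the talk. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the four miniature "documents" are assumptions for illustration only.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenised documents with a collapsed Gibbs sampler.

    Returns (topics, doc_mix): topics[k] maps each term to its probability
    under topic k; doc_mix[d][k] is the weight of topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    # Random initial topic assignment for every token.
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    topics = [
        {w: (topic_word[t][w] + beta) / (topic_total[t] + V * beta) for w in vocab}
        for t in range(num_topics)
    ]
    doc_mix = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    return topics, doc_mix


# Invented miniature corpus; each "document" could equally be a page or sentence.
docs = [
    "animal mind instinct animal psychology".split(),
    "instinct animal behaviour mind".split(),
    "course catalog credit semester course".split(),
    "semester credit catalog tuition".split(),
]
topics, doc_mix = lda_gibbs(docs, num_topics=2)
for k, topic in enumerate(topics):
    top_terms = sorted(topic, key=topic.get, reverse=True)[:3]
    print(k, top_terms)
```

The number of topics is passed in explicitly, matching the slide's point that it is a parameter of the technique, and the unit of a "document" is simply whatever token lists are supplied.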
  • 37. Volume-level topic modeling on ‘anthropomorphism’ yields a set of topics
  • 38. Of the set of topics, choose ‘16’ as best
  • 39. Volumes most similar to topic 16
  • 40. Repeat LDA at page level
  • 41. Topic model at page level for topics anthropomorphism, animal, and psychology
  • 42. Words sorted by similarity
  • 43. Pick top 3: topics 16, 10, 26
  • 44. Show documents of topics 10, 16, 26
  • 45. Drop to sentence level
      • Select three books with highest aggregate of 20-40 topic-relevant pages for more precise analysis
      • Manually augment argument analysis
          – Remodeling of three volumes at sentence level
          – Training other methods using human analysis plus sentence similarity
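The volume selection step can be pictured as counting topic-relevant pages per volume and keeping the top three. The sketch below is one hedged reading of that step: the slides do not say how a page is judged "topic-relevant", so the `threshold` parameter, the function name `top_volumes`, and the sample weights are all hypothetical.

```python
def top_volumes(page_weights, threshold=0.3, k=3):
    """Rank volumes by how many of their pages are rich in the chosen topic.

    page_weights maps (volume_id, page_number) to the weight of the chosen
    topic on that page; threshold is an assumed cut-off for "topic-relevant".
    """
    relevant_pages = {}
    for (volume_id, _page), weight in page_weights.items():
        if weight >= threshold:
            relevant_pages[volume_id] = relevant_pages.get(volume_id, 0) + 1
    # Most relevant pages first; ties broken by volume id for repeatability.
    ranked = sorted(relevant_pages, key=lambda v: (-relevant_pages[v], v))
    return ranked[:k]


# Invented page-level topic weights for four volumes.
page_weights = {
    ("vol-a", 1): 0.70, ("vol-a", 2): 0.55, ("vol-a", 3): 0.10,
    ("vol-b", 1): 0.40, ("vol-b", 2): 0.05,
    ("vol-c", 1): 0.90, ("vol-c", 2): 0.80, ("vol-c", 3): 0.60,
    ("vol-d", 1): 0.02,
}
print(top_volumes(page_weights))
```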
  • 47. Scholarly Commons User Support Service
      • Develop training materials
      • Educational workshops
      • Tool and workset creation
      • Collaborate with librarians and DH centers at HT institutions
      • Assist researchers in HTRC text data mining research projects
      • Based at University of Illinois Library
  • 48. Scholarly Commons User Support
      • Gives HT institutions exclusive access to training and learning materials that help them establish programs that integrate HTRC tools and services into their scholarly commons programs in libraries and digital humanities centers.
      • Physically located in the University of Illinois Library's Scholarly Commons.
      • Supported by several Library staff and faculty. Key among these is the Digital Humanities Research Specialist, who will assist with the development of training and outreach initiatives in support of researchers working with the HathiTrust Research Center and HathiTrust digital library affiliates who seek to start their own HTRC research services.
      • Effort involves planning, implementation, and continuous development of training materials, educational workshops, potential tools, and outreach activities in support of the usage of HTRC tools and datasets.
  • 50. #HTRC @HathiTrust
      http://www.hathitrust.org/htrc
      http://www.hathitrust.org