Text and Data Mining: what librarians need to know
EIFL-Licensing/EIFL-IP webinar, 6 February 2014

www.bl.uk

Ben White
Ben O’Steen

British Library
Text and data mining of large datasets is often described as the new frontier for science and research.

This presentation is from a webinar hosted by the EIFL-Licensing Programme and the EIFL-IP (Copyright and Libraries) Programme on 6 February 2014. The webinar can be found here: http://bit.ly/1iwr4io

In the webinar, Benjamin White (Head of Intellectual Property at the British Library) provided a clear introduction to what text and data mining is and how it differs from other methods of information retrieval.

About EIFL

Working in collaboration with libraries in more than 60 developing and transition countries in Africa, Asia, Europe, and Latin America, EIFL enables access to knowledge for education, learning, research and sustainable community development. Visit eifl.net to learn more.

Connect to EIFL on:

Facebook - facebook.com/eIFL.net
Twitter - twitter.com/EIFLnet
LinkedIn - linkedin.com/groups/Friends-EIFL-1862455
Google+ - plus.google.com/+EiflNet/posts

Text and Data Mining: what librarians need to know

  4. How Much Data is there? 2013: 1.8 zettabytes? And 80% is unstructured.
  6. Learning and Research • For millennia, learning has been based on people reading; • Taking notes; • Extracting facts and data; and • Organising information.
  7. Pre mid-1990s = pen, pencil and eyes.
  8. Computers can now read © Woodguy
  9. And a lot faster than humans
  10. How to Do Research in 2013? Post mid-1990s = pen, pencil, eyes AND computers. There are off-the-shelf text and data mining tools from software providers, but researchers write their own programs too.
  11. What is Text and Data Mining? (NOT search by a search engine) Algorithms "intelligently" analyse and read the text / data (using statistics, probabilities, computational linguistics, etc.) to do, amongst other things, the following: i) make assumptions about what text strings are about (e.g. is the "tree" a piece of wood, a family tree or the tree of life in biology?); ii) analyse what the entire text is about; iii) see whether there is a positive or negative relationship between two preselected variables.
  12. Text Mining Shakespeare
  13. What is Text and Data Mining? This allows, for example: i) people to see whether there is some kind of relationship between a chemical / enzyme, etc. and a medical disease; ii) people to discover a previously unknown use for a drug or a chemical compound; iii) organisations to organise electronic data by subject category, etc.
  14. TDM & Libraries: Libraries are important as they provide access to scholarly information. There is a lot of text and data on the web, but also very valuable content in books and journals. People want to hold the data locally and work on it using their own tools.
  15. Text and Data Mining: Big Business. Video Time! (hopefully) http://www.youtube.com/watch?v=2YQNQ_GLe9Q
  16. Savings in the Health Sector
  18. New Medical Discoveries
  19. Reduces Reading Times Exponentially
  20. Not Just Computer Scientists Either © South Wiltshire Girls School
  21. The Right to Read is the Right to Mine? • Facts and data are not subject to copyright or database rights • But computers have to copy in order to mine the data, so is mining a licensable activity? (The EU has an "internet browser" exception as browsers cache …) • European Commission stakeholder dialogue on TDM / "Licences for Europe": the research / library sector, the technology sector and open access publishers boycotted it.
  22. The Right to Read is the Right to Mine? • How would you license the internet? • UKPMC: 75 publishers had articles with the word "malaria" in the title. From its experience, the BL estimates that negotiating a new licence takes 16 months on average. • TDM goes across thousands or tens of thousands of articles which you ALREADY have legal access to. How can you renegotiate this with all the publishers concerned? • UK universities are experiencing server access being suspended automatically when abnormal access is detected.
  23. Thank you (unless indicated otherwise)
  24. Now it’s question time!
  25. Further information • Find out more about the EIFL-Licensing programme: www.eifl.net/licensing • Find out more about the EIFL-IP programme: www.eifl.net/copyright
  26. Stay connected • Visit our website: www.EIFL.net • Subscribe to our newsletter: www.EIFL.net/subscribe • Join email lists for EIFL programmes • facebook.com/EIFLnet • twitter.com/EIFLnet • www.flickr.com/photos/EIFL
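
Slide 11 mentions algorithms that decide what a text string is about, using "tree" as the example. As a rough illustration only (not the presenters' method or any particular tool's), the Python sketch below picks a sense by counting how many words a sentence shares with a small, hand-written list of clue words for each sense, a much simplified form of the dictionary-overlap (Lesk-style) approach. The sense labels, clue words and function names are all hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical clue words for each sense of "tree". A real system would draw
# these from a dictionary or thesaurus, or learn them from a large corpus.
SENSES = {
    "wood": {"oak", "branch", "leaves", "forest", "timber", "bark"},
    "family tree": {"ancestor", "genealogy", "grandfather", "descendants", "family"},
    "tree of life": {"species", "evolution", "phylogenetic", "biology", "organisms"},
}


def tokenize(text: str) -> set[str]:
    """Lower-case the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def guess_sense(sentence: str) -> str | None:
    """Return the sense whose clue words overlap most with the sentence.

    Ties go to the sense listed first in SENSES; None means no clue word
    appeared, so the sketch makes no guess.
    """
    words = tokenize(sentence)
    scores = {sense: len(words & clues) for sense, clues in SENSES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    for sentence in [
        "The old oak tree lost a branch in the storm.",
        "She traced her family tree back to her grandfather.",
        "The tree of life shows how species diverged through evolution.",
    ]:
        print(f"{guess_sense(sentence)!r:16} <- {sentence}")
```

Hand-written clue lists keep the idea visible, but they break down quickly on real text; production tools use much larger vocabularies or statistical models trained on corpora.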

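Slide 13 mentions checking whether there is a relationship between a chemical or enzyme and a disease. One common starting point, shown here as an assumption rather than as what any tool on the slides does, is to count how often two terms appear in the same sentence and compare that with what chance alone would predict (the "lift"). The terms "Enzyme X" and "Disease Y", the sample sentences and the function names are invented for illustration.

```python
from __future__ import annotations

import re


def split_sentences(text: str) -> list[str]:
    """Very rough splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def mentions(term: str, sentence: str) -> bool:
    """True if `term` appears as a whole word or phrase (case-insensitive)."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, sentence, re.IGNORECASE) is not None


def cooccurrence(text: str, term_a: str, term_b: str) -> tuple[int, float]:
    """Return (number of sentences mentioning both terms, lift).

    lift = P(a and b) / (P(a) * P(b)), estimated per sentence. Above 1 means
    the terms share a sentence more often than independent terms would.
    """
    sentences = split_sentences(text)
    n = len(sentences)
    n_a = sum(mentions(term_a, s) for s in sentences)
    n_b = sum(mentions(term_b, s) for s in sentences)
    n_ab = sum(mentions(term_a, s) and mentions(term_b, s) for s in sentences)
    if n_a == 0 or n_b == 0:
        return n_ab, 0.0  # a term never appears, so there is nothing to measure
    lift = (n_ab / n) / ((n_a / n) * (n_b / n))
    return n_ab, lift


if __name__ == "__main__":
    # Invented toy text; "Enzyme X" and "Disease Y" are placeholder names.
    sample = (
        "Enzyme X was elevated in patients with Disease Y. "
        "Enzyme X levels were normal in controls. "
        "Disease Y is rare. "
        "The weather was mild. "
        "Enzyme X and Disease Y were measured together."
    )
    both, lift = cooccurrence(sample, "Enzyme X", "Disease Y")
    print(both, round(lift, 2))  # prints: 2 1.11
```

A lift above 1 only says that the two terms appear together more often than chance in this text; it is a lead for a researcher to follow up, not evidence of a causal link.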
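
Slide 22 notes that UK universities have had server access suspended automatically when abnormal access was detected. The slides do not say how publishers detect it, so the sketch below assumes one plausible rule, a per-client cap on the number of requests within a sliding time window, to show why a mining script that downloads far faster than a human reader can trip such a check. The class name, the limits and the request rates are hypothetical.

```python
from __future__ import annotations

from collections import deque


class AccessMonitor:
    """Flag a client that makes more than `max_requests` requests within any
    `window_seconds` period (a sliding-window rate check).

    Hypothetical rule for illustration: publishers do not disclose their
    actual detection thresholds, and real systems may use other signals.
    """

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._recent: dict[str, deque[float]] = {}  # client -> request times

    def record(self, client: str, timestamp: float) -> bool:
        """Log one request at `timestamp` (seconds); return True if the client
        has now exceeded the limit within the current window."""
        times = self._recent.setdefault(client, deque())
        times.append(timestamp)
        # Forget requests that are at least one window older than this one.
        while times and timestamp - times[0] >= self.window_seconds:
            times.popleft()
        return len(times) > self.max_requests


if __name__ == "__main__":
    monitor = AccessMonitor(max_requests=5, window_seconds=60.0)
    # A script fetching one article per second versus a person reading one
    # article every 30 seconds (both made-up rates).
    script_flags = [monitor.record("script", float(t)) for t in range(6)]
    reader_flags = [monitor.record("reader", 30.0 * i) for i in range(20)]
    print(script_flags)  # [False, False, False, False, False, True]
    print(any(reader_flags))  # False
```

Under a rule like this, the check cannot tell an authorised mining project from misuse: it only sees the request rate, which is why access the institution already pays for can still be cut off.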