Databases, Markup, and Regular Expressions
2 November 2010

Class slidedeck for LIS 644, "Digital Trends, Tools, and Debates," at the University of Wisconsin-Madison's School of Library and Information Studies.

  1. Databases, Markup, and Regular Expressions
     2 November 2010
  2. Weekly reflection
     • What keeps you from “being technical,” or feeling like you are?
     • Alternately, if you know you’re technical, how did you get to be that way?
     • Or both! What keeps you from feeling as technical as you actually are?
  3. Tool of the week: text editors
     • AKA “programmers’ editors”
     • Just the text, ma’am! No binary garbage, no WYSIWYG; that’s not what these are FOR.
     • Look for:
         • Regular expressions (“grep”)
         • Syntax coloring in your favorite language
         • Code-folding, code completion... lots of bells and whistles
     • Windows: UltraEdit. Mac: BBEdit (TextWrangler is OK). Cross-platform: jEdit. Emacs and vi are for geeks only.
  4. Tip of the week: Getting the most out of library school
     • An MLS does not guarantee you a library job. Anybody who says it does is lying to you.
     • You get out of library school what you put in.
         • The “extras” like workshops, talks, committees? NOT EXTRAS.
         • Don’t breeze through. Take the classes that mean something.
         • Pick your practicum carefully.
         • Look for champions. You’ll need those recommendations.
     • Get professionally involved NOW.
     • Take any chance to have your résumé and sample cover letter read by a professional librarian.
  5. The relational database
     • Proposed by E. F. Codd in 1970.
     • RUNS THE WORLD. Almost every non-trivial web application you’ll find has a relational DB underneath it.
     • Interacts with the outside world (i.e. programs) through SQL: Structured Query Language.
         • There is an actual SQL standard...
         • ... but no two databases implement it quite the same way.
         • The basics, however, are pretty consistent.
     • Taught here at SLIS. If you have any thoughts of being a techie, TAKE THAT CLASS.
  6. Tables
     • Most tables represent the “things” you’re describing.
     • Some tables relate those things to each other.

     BOOK
     book_id  book_isbn  book_barcode
     1        441009328  12345_67890
     2        441478123  01234_56789
     3        441012248  23456_78901

     PATRON
     patron_id  patron_lname  patron_phone
     1          Salo          262-5493
     2          Gorman        265-5291
     3          Tobias        265-6381
  7. Primary key, foreign key
     (Same BOOK and PATRON tables as slide 6.)
     • Every row in a table should have some kind of unique identifier within the table: PRIMARY KEY.
         • It is often named <thing>_id, and often just a number.
     • You can use a PK in other tables to refer to a row. In that other table, it is a FOREIGN KEY.
     • For BOOK, could I have chosen a different PK? PATRON?
  8. The magic: relations!

     BOOK
     book_id  book_isbn  book_barcode
     1        441009328  12345_67890
     2        441478123  01234_56789
     3        441012248  23456_78901

     PATRON
     patron_id  patron_lname  patron_phone
     1          Salo          262-5493
     2          Gorman        265-5291
     3          Tobias        265-6381

     CHECKOUT
     checkout_id  book_id  patron_id
     1            2        1
     2            3        1
     3            1        3
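The three tables above can be sketched as a real schema. This is a minimal sketch using Python's built-in `sqlite3` module; the column types, the `NOT NULL` and `UNIQUE` constraints, and the helper name `build_library` are assumptions for illustration, not something the slides specify. It shows CHECKOUT holding two foreign keys, and the database refusing a checkout that points at a patron who does not exist.

```python
import sqlite3


def build_library(conn: sqlite3.Connection) -> None:
    """Create BOOK, PATRON, and CHECKOUT as on the slides and load their rows."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE book (
            book_id      INTEGER PRIMARY KEY,
            book_isbn    TEXT NOT NULL,
            book_barcode TEXT NOT NULL UNIQUE
        );
        CREATE TABLE patron (
            patron_id    INTEGER PRIMARY KEY,
            patron_lname TEXT NOT NULL,
            patron_phone TEXT
        );
        -- CHECKOUT relates books to patrons through two foreign keys.
        CREATE TABLE checkout (
            checkout_id INTEGER PRIMARY KEY,
            book_id     INTEGER NOT NULL REFERENCES book (book_id),
            patron_id   INTEGER NOT NULL REFERENCES patron (patron_id)
        );
        INSERT INTO book VALUES
            (1, '441009328', '12345_67890'),
            (2, '441478123', '01234_56789'),
            (3, '441012248', '23456_78901');
        INSERT INTO patron VALUES
            (1, 'Salo', '262-5493'),
            (2, 'Gorman', '265-5291'),
            (3, 'Tobias', '265-6381');
        INSERT INTO checkout VALUES (1, 2, 1), (2, 3, 1), (3, 1, 3);
        """
    )


conn = sqlite3.connect(":memory:")
build_library(conn)

# A checkout that points at a patron who does not exist is rejected.
try:
    conn.execute("INSERT INTO checkout (book_id, patron_id) VALUES (1, 99)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

Identifiers are stored as text here because barcodes such as 01234_56789 would lose their leading zero as numbers.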
  9. My First SQL Query
     • Syntax: SELECT <thing(s) you want> FROM <table(s)> WHERE <how you know which things you want>;
         • Often “how you know...” is the information you’re starting with.
     • What’s the barcode on the book with the ISBN 441478123?
         • What happens if we have two copies of the book with this ISBN?

     BOOK
     book_id  book_isbn  book_barcode
     1        441009328  12345_67890
     2        441478123  01234_56789
     3        441012248  23456_78901

     SELECT book_barcode FROM book WHERE book_isbn = '441478123';
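To try the query above, here is a minimal sketch with Python's built-in `sqlite3` module. The function name `barcodes_for_isbn` and the fourth row (a second copy with barcode 34567_89012) are hypothetical, added only to explore the slide's "two copies" question; the `?` placeholder stands in for the quoted ISBN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book (book_id INTEGER PRIMARY KEY, book_isbn TEXT, book_barcode TEXT)"
)
conn.executemany(
    "INSERT INTO book VALUES (?, ?, ?)",
    [
        (1, "441009328", "12345_67890"),
        (2, "441478123", "01234_56789"),
        (3, "441012248", "23456_78901"),
    ],
)


def barcodes_for_isbn(conn, isbn):
    """Run the slide's query and return every matching barcode as a list."""
    rows = conn.execute(
        "SELECT book_barcode FROM book WHERE book_isbn = ?", (isbn,)
    ).fetchall()
    return [row[0] for row in rows]


print(barcodes_for_isbn(conn, "441478123"))

# Hypothetical second copy of the same title: same ISBN, new barcode.
conn.execute("INSERT INTO book VALUES (4, '441478123', '34567_89012')")
print(barcodes_for_isbn(conn, "441478123"))
```

With the second copy in place the same query returns two rows, which is why an ISBN alone cannot identify one physical book.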
  10. A little harder!
      • Who has checked out the book with barcode 12345_67890?
      • Oh no! Everything’s in different tables!

      BOOK
      book_id  book_isbn  book_barcode
      1        441009328  12345_67890
      2        441478123  01234_56789
      3        441012248  23456_78901

      PATRON
      patron_id  patron_lname  patron_phone
      1          Salo          262-5493
      2          Gorman        265-5291
      3          Tobias        265-6381

      CHECKOUT
      checkout_id  book_id  patron_id
      1            2        1
      2            3        1
      3            1        3
  11. Subqueries
      (Same BOOK, PATRON, and CHECKOUT tables as slide 10.)
      • You can put whole queries in the WHERE clause!
      • So. What do you want, and from which table?
          • patron_lname from the PATRON table
          • SELECT patron_lname FROM patron WHERE...
          • Or “SELECT patron_lname, patron_phone FROM patron WHERE...”
  12. Where what?
      (Same BOOK, PATRON, and CHECKOUT tables as slide 10.)
      • Where the patron_id is associated with the right book_id in the CHECKOUT table.
          • WHERE patron_id = (SELECT patron_id FROM checkout WHERE...)
  13. Where what?
      (Same BOOK, PATRON, and CHECKOUT tables as slide 10.)
      • You now want the book_id from the BOOK table given the barcode number.
          • WHERE book_id = (SELECT book_id FROM book WHERE book_barcode = '12345_67890')
  14. Putting it all together

      SELECT patron_lname FROM patron WHERE patron_id =
        (SELECT patron_id FROM checkout WHERE book_id =
          (SELECT book_id FROM book WHERE book_barcode = '12345_67890'));

      • Whew!
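The nested query above can be run against the slide's three tables. This sketch uses Python's built-in `sqlite3` module; the names `NESTED`, `JOINED`, and `borrowers` are hypothetical. The `JOINED` version is not on the slides: it is a common alternative that asks the same question with JOINs instead of subqueries. One caution on the slide's version: in many databases, `=` against a subquery that returns more than one row is an error, so `IN` is the safer operator once a book can have several checkouts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE book (book_id INTEGER PRIMARY KEY, book_isbn TEXT, book_barcode TEXT);
    CREATE TABLE patron (patron_id INTEGER PRIMARY KEY, patron_lname TEXT, patron_phone TEXT);
    CREATE TABLE checkout (checkout_id INTEGER PRIMARY KEY, book_id INTEGER, patron_id INTEGER);
    INSERT INTO book VALUES
        (1, '441009328', '12345_67890'),
        (2, '441478123', '01234_56789'),
        (3, '441012248', '23456_78901');
    INSERT INTO patron VALUES
        (1, 'Salo', '262-5493'), (2, 'Gorman', '265-5291'), (3, 'Tobias', '265-6381');
    INSERT INTO checkout VALUES (1, 2, 1), (2, 3, 1), (3, 1, 3);
    """
)

# The slide's query, with ? standing in for the quoted barcode.
NESTED = """
    SELECT patron_lname FROM patron WHERE patron_id =
        (SELECT patron_id FROM checkout WHERE book_id =
            (SELECT book_id FROM book WHERE book_barcode = ?))
"""

# Alternative (not from the slides): the same question asked with JOINs.
JOINED = """
    SELECT patron.patron_lname
    FROM patron
    JOIN checkout ON checkout.patron_id = patron.patron_id
    JOIN book ON book.book_id = checkout.book_id
    WHERE book.book_barcode = ?
"""


def borrowers(conn, query, barcode):
    """Return the last names the given query finds for one barcode."""
    return [row[0] for row in conn.execute(query, (barcode,))]


print(borrowers(conn, NESTED, "12345_67890"))  # ['Tobias']
print(borrowers(conn, JOINED, "12345_67890"))  # ['Tobias']
```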
  15. Markup: XML and (X)HTML
  16. Markup
      • In the dark ages of typesetting, we told text what to look like.
          [ol0[ep[fy120,10,12,1]blah[ep
          • Renear: “presentational” markup.
      • Lots of drawbacks to this approach!
          • If “what it looks like” changes, you have to change EVERY SINGLE PLACE where that particular kind of text appears.
          • You can’t do ANYTHING consistently across documents with different designs.
  17. Paragraphs and characters
      • Most WYSIWYG programs mark text this way.
          • Microsoft Word: “paragraph” and “character” styles.
      • Most copyeditors still think this way, too.
          • “keymarking” = going through a manuscript to decide what each paragraph of text is and label it
      • Notice the difference! Now you can tell text what to BE.
          • Heading 1, Body Text, Abstract, Citation
      • What does that let you do?
      • But there’s a problem with this, too...
  18. Nested structures
      • Structures exist in texts that are bigger than paragraphs.
          • A list has a beginning and end... but not within the same list item, most times! And abstracts can be >1 paragraph.
          • What about a section? Or a pullout? Or a chapter?
          • Need some hierarchy here!
      • WYSIWYG programs can’t do this at all, or do it very badly. Markup does it very well!
      • And so (leaving aside decades of development) we have XML.
  19. Extensible Markup Language
      • A set of rules for delimiting text structures.
      • Also a family of standards designed to work with marked-up text structures!
          • DOM: Document Object Model (for programmers)
          • XSLT: transform one text structure to another
          • XPath: drill down into a text structure
          • ... etc.
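To give a feel for "drilling down" into a text structure, here is a minimal sketch using Python's built-in `xml.etree.ElementTree`, which offers tree access in the spirit of the DOM (it is not the W3C DOM API itself) and a limited subset of XPath. The `<greetings>` wrapper, the "farewell" type, and the "Moon" addressee are hypothetical sample data. XSLT is not in Python's standard library, so it is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, reusing tag names from the slides' examples.
DOC = """
<greetings>
  <exclamation type="greeting">Hello, <addressee>World</addressee>!</exclamation>
  <exclamation type="farewell">Goodbye, <addressee>Moon</addressee>!</exclamation>
</greetings>
"""

root = ET.fromstring(DOC)

# Tree-style access: the document is a tree of elements you can walk.
print(root.tag, [child.get("type") for child in root])

# XPath-style access: ask for matching elements by path instead of walking.
addressees = [
    elem.text
    for elem in root.findall("./exclamation[@type='farewell']/addressee")
]
print(addressees)  # ['Moon']

# The full text of an element, including text inside nested tags.
print("".join(root[0].itertext()))  # Hello, World!
```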
  20. The Rules
      • Thou shalt use Unicode, or else mark thy preferred encoding.
      • Thou shalt put thy markup in angle brackets, clearly marking the start and end of a text run with “tags.”
          • <exclamation>Hello, World!</exclamation>
      • To mark a point instead of a text run, thou shalt use empty tags.
          • <empty /> OR <empty></empty>
      • Thou shalt enclose thine entire document in ONE SET of tags.
      • Thou shalt not permit overlapping text runs; thou shalt keep thy hierarchy clean.
          • Right (nested): <exclamation>Hello, <addressee>World</addressee>!</exclamation>
          • Wrong (overlapping): <exclamation>Hello, <addressee>World!</exclamation></addressee>
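These rules can be checked mechanically. A minimal sketch, assuming Python's built-in `xml.etree.ElementTree` as the parser; the function name `is_well_formed` is hypothetical. Any XML parser should accept the nested example and reject the overlapping one.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if an XML parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


nested = "<exclamation>Hello, <addressee>World</addressee>!</exclamation>"
overlapping = "<exclamation>Hello, <addressee>World!</exclamation></addressee>"

print(is_well_formed(nested))       # True
print(is_well_formed(overlapping))  # False

# Both spellings of an empty tag are accepted.
print(is_well_formed("<empty />"), is_well_formed("<empty></empty>"))

# Two top-level elements break the ONE SET of tags rule.
print(is_well_formed("<a></a><b></b>"))  # False
```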
  21. More rules
      • To describe a text run further, thou mayst add “attributes” (key-value pairs) to thy start tags. Thou shalt put quote marks around the value!
          • <exclamation type="greeting">Hello, World!</exclamation>
      • Thou shalt neither use angle brackets nor ampersands in thy text, lest thou confuse the computer. Thou shalt refer to them thus: & as &amp;, < as &lt;, and > as &gt;.
      • Thou shalt always use the same case in thine tag and attribute names.
          • Wrong (mismatched case): <exclamation>Hello, World!</EXCLAMATION>
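The escaping and quoting rules are easy to get wrong by hand, so most languages ship helpers for them. A minimal sketch using `escape` and `quoteattr` from Python's built-in `xml.sax.saxutils`; the function `make_exclamation` and the sample text are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def make_exclamation(text, kind):
    """Build one <exclamation> element as a string, escaping as the rules require."""
    # quoteattr adds the quote marks around the attribute value;
    # escape turns &, <, and > into &amp;, &lt;, and &gt;.
    return f"<exclamation type={quoteattr(kind)}>{escape(text)}</exclamation>"


markup = make_exclamation("Fish & chips < caviar", "opinion")
print(markup)

# Parsing the result gives back the original, unescaped text.
elem = ET.fromstring(markup)
print(elem.get("type"), "|", elem.text)

# Tag names are case-sensitive, so a mismatched end tag is rejected.
try:
    ET.fromstring("<exclamation>Hello, World!</EXCLAMATION>")
except ET.ParseError as err:
    print("rejected:", err)
```

Building elements through a library such as `ElementTree` instead of string formatting applies the same escaping automatically.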
  22. That’s pretty much it. Those are the rules! And if your document obeys them, it is “well-formed.”
  23. But wait! Don’t different kinds of text have rules of their own?
  24. Markup languages
      • The basic rules of XML, plus constraints relating to the type of text you’re dealing with.
          • Tag and attribute name/value constraints
          • Hierarchy constraints
          • Required/optional constraints
          • Constraints on number of occurrences
      • These constraints are laid out in a Schema or DTD.
          • “Parser” checks that you’ve followed the XML rules and are “well-formed.”
          • “Validator” checks that you’ve followed your constraints. If you have, you are “well-formed” AND “valid.”
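The difference between the parser step and the validator step can be sketched in a few lines. This is a toy stand-in, not a real Schema or DTD validator: the `RULES` table, the two-tag `<list>`/`<item>` language, and the `check` function are all hypothetical, and Python's standard library does not validate against DTDs or schemas. It only illustrates the kinds of constraints listed above: required attributes, hierarchy, and number of occurrences.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical constraints for a made-up two-tag language. For each tag:
# required attributes, and allowed children as (min, max) counts (None = no max).
RULES = {
    "list": {"required_attrs": ["type"], "children": {"item": (1, None)}},
    "item": {"required_attrs": [], "children": {}},
}


def check(text):
    """Return a list of problems; an empty list means well-formed AND valid."""
    try:
        root = ET.fromstring(text)  # the "parser" step
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]

    problems = []

    def visit(elem):  # the "validator" step
        rule = RULES.get(elem.tag)
        if rule is None:
            problems.append(f"unknown tag <{elem.tag}>")
            return
        for attr in rule["required_attrs"]:
            if attr not in elem.attrib:
                problems.append(f"<{elem.tag}> is missing required attribute '{attr}'")
        counts = Counter(child.tag for child in elem)
        for tag, (low, high) in rule["children"].items():
            n = counts[tag]
            if n < low or (high is not None and n > high):
                problems.append(f"<{elem.tag}> has {n} <{tag}> children")
        for child in elem:
            if child.tag in rule["children"]:
                visit(child)
            else:
                problems.append(f"<{child.tag}> is not allowed inside <{elem.tag}>")

    visit(root)
    return problems


print(check('<list type="bulleted"><item>a</item><item>b</item></list>'))  # []
print(check("<list><item>a</item></list>"))
print(check('<list type="bulleted"><item>a</list>'))
```

The second document is well-formed but not valid; the third never reaches the validator because the parser rejects it first.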
  25. Markup languages we use
      • XHTML, of course!
          • (the “X” is because this version of HTML uses the XML rules)
          • (earlier versions of HTML didn’t)
      • MODS and METS and MARCXML, oh my!
      • TEI
          • Text Encoding Initiative
          • For marking up books, manuscripts, dictionaries, etc.
      • EAD
          • Encoded Archival Description
          • For marking up finding aids.
  26. Regular expressions: the metadata librarian’s lifesaver!
  27. http://xkcd.com/208