Tech 802: Data, Databases & XML


Published on

Monday, January 14, 2012 presentation on three types of data (unstructured, structured and semi-structured) and how XML plays a role in content management systems, ONIX (bibliographic data sharing), RSS (Really Simple Syndication) and XML-first publishing for ebooks.

  1. Data, Databases & XML: A Crash Course. Monique Sherrett, monique@boxcarmarketi…
  2. 3 Types of Data
     Unstructured Data
     • e.g. Word documents, PDFs, audio/video files, emails
     • No search
     • No version control
     Structured Data
     • e.g. inventory management database, WordPress
     • Searchable
     • Version and user control (secure access)
     • Relationship structures (show everything tagged "winter")
     • Import / Export
     • Display options
     • Machine readable; run queries against the data
     Semi-Structured Data
     • e.g. XML (HTML, ONIX, RSS)
     • Formal/standardized data
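As a rough illustration of "run queries against the data" and "show everything tagged winter", here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample rows are invented for the example; they are not WordPress's actual schema.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tags (post_id INTEGER REFERENCES posts(id), tag TEXT);
""")
conn.executemany(
    "INSERT INTO posts (id, title) VALUES (?, ?)",
    [(1, "Snowshoe trails"), (2, "Beach reads"), (3, "Holiday gift guide")],
)
conn.executemany(
    "INSERT INTO tags (post_id, tag) VALUES (?, ?)",
    [(1, "winter"), (2, "summer"), (3, "winter")],
)


def titles_tagged(connection, tag):
    """Return the titles of all posts carrying the given tag, in id order."""
    rows = connection.execute(
        "SELECT p.title FROM posts p JOIN tags t ON t.post_id = p.id "
        "WHERE t.tag = ? ORDER BY p.id",
        (tag,),
    )
    return [title for (title,) in rows]


winter_titles = titles_tagged(conn, "winter")
print(winter_titles)
```

The same question asked of a folder of Word documents would mean opening each file by hand, which is the practical difference between unstructured and structured data.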
  3. Structured Data: WordPress
     • Open Source content management system based on PHP and MySQL
       – Open Source: source code is freely available, which encourages development by many independent programmers.
       – CMS: a database + presentation layer (set of templates)
       – MySQL: a type of database
       – PHP: a scripting language designed to produce dynamic web pages
     • Plugin architecture (Akismet for spam, SEO by Yoast, WP to Twitter, etc.)
     • Pages & Posts
     • Categories & Tags
  4. Pages vs Posts
     Page (~unstructured)
     • Static content, won't change frequently
     • e.g. About page
     • Can be organized manually in a hierarchy: page (parent) and subpages (child)
       – About Us > Team; About Us > History
     Post (~structured)
     • Frequently updated content dynamically organized in a hierarchy (chronological, category), plus archive
       – News articles, event information
       – Frequently published in an RSS feed that is subscribed to by users
  5. Semi-Structured Data: RSS
     • Really Simple Syndication or Rich Site Summary
     • Publish it. Subscribe to it. Pull it into other websites.
     • RSS is a standardized XML file format.
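Because RSS is a standardized XML format, any XML parser can read a feed. The sketch below parses a made-up, minimal RSS 2.0 document with Python's standard library; the feed content and the example.com URLs are placeholders, not a real feed.

```python
import xml.etree.ElementTree as ET

# A made-up, minimal RSS 2.0 feed used only for illustration.
RSS_SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example Press News</title>
    <link>https://example.com/</link>
    <description>Made-up feed for illustration</description>
    <item>
      <title>Spring catalogue announced</title>
      <link>https://example.com/spring</link>
    </item>
    <item>
      <title>Author event next week</title>
      <link>https://example.com/event</link>
    </item>
  </channel>
</rss>"""


def read_items(rss_text):
    """Return (title, link) pairs for every <item> in the feed."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.iter("item")
    ]


items = read_items(RSS_SAMPLE)
for title, link in items:
    print(f"{title}: {link}")
```

A feed reader or another website "pulling in" a feed does essentially this, on a schedule, against the feed's URL.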
  6. WordPress As Database
     • Instead of a series of HTML files, WordPress offers a system that allows for the organization and efficient storage & retrieval of information.
       – Structured data can be exported into semi-structured data (RSS, XML)
  7. RSS is XML
     • eXtensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both machine-readable and human-readable.
     • RSS, XHTML (unzipped EPUB) and ONIX (ONline Information eXchange, a standard for sharing bibliographic data) are some of the hundreds of XML-based languages that have been developed.
     • How might we use XML for the Tech Project?
  8. Current db: export to XML → Rename / modify XML → New db: import from XML
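The export, rename and import steps on this slide can be sketched end to end in a few lines. This is only one possible approach, using in-memory SQLite databases; the table layout and the element names ("name" in the old system, "title" in the new one) are assumptions made up for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_to_xml(conn):
    """Step 1: dump the old database's rows as XML."""
    root = ET.Element("books")
    for book_id, name in conn.execute("SELECT id, name FROM books ORDER BY id"):
        book = ET.SubElement(root, "book", id=str(book_id))
        ET.SubElement(book, "name").text = name
    return ET.tostring(root, encoding="unicode")


def rename_elements(xml_text, old, new):
    """Step 2: rename every <old> element to <new> so it matches the new system."""
    root = ET.fromstring(xml_text)
    for element in list(root.iter(old)):
        element.tag = new
    return ET.tostring(root, encoding="unicode")


def import_from_xml(conn, xml_text):
    """Step 3: load the modified XML into the new database."""
    root = ET.fromstring(xml_text)
    rows = [(int(b.get("id")), b.findtext("title")) for b in root.iter("book")]
    conn.executemany("INSERT INTO books (id, title) VALUES (?, ?)", rows)


# Hypothetical old and new databases with slightly different column names.
old_db = sqlite3.connect(":memory:")
old_db.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, name TEXT)")
old_db.executemany(
    "INSERT INTO books VALUES (?, ?)", [(1, "First Book"), (2, "Second Book")]
)

new_db = sqlite3.connect(":memory:")
new_db.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")

xml_out = rename_elements(export_to_xml(old_db), "name", "title")
import_from_xml(new_db, xml_out)
imported = new_db.execute("SELECT id, title FROM books ORDER BY id").fetchall()
print(imported)
```

The XML file in the middle is what makes the move possible: neither database needs to know anything about the other, only about the agreed element names.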
  10. ONIX is XML
     • International standard for representing and communicating book and product info in electronic form
       – text-readable (human & computer)
       – tagged/markup
       – transferred by email or FTP (file transfer protocol)
  11. Publisher db: export to ONIX & FTP file to server → Server → Bookseller db: grab file from server & import from ONIX
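The publisher-to-bookseller exchange can be sketched as a round trip: one side writes tagged records, the other reads them back. The example below is ONIX-style only. It borrows a few element names from ONIX (ONIXMessage, Product, RecordReference, ProductIdentifier, ProductIDType, IDValue, TitleText) but flattens the structure, so its output would not validate against the real ONIX schema. The record data is a placeholder, and the FTP transfer step is left out.

```python
import xml.etree.ElementTree as ET


def build_onix_like(records):
    """Publisher side: turn database rows into a simplified ONIX-style message."""
    message = ET.Element("ONIXMessage")
    for rec in records:
        product = ET.SubElement(message, "Product")
        ET.SubElement(product, "RecordReference").text = rec["ref"]
        ident = ET.SubElement(product, "ProductIdentifier")
        # "15" is the ONIX code-list value for ISBN-13.
        ET.SubElement(ident, "ProductIDType").text = "15"
        ET.SubElement(ident, "IDValue").text = rec["isbn13"]
        # Simplification: real ONIX nests the title several levels deeper.
        ET.SubElement(product, "TitleText").text = rec["title"]
    return ET.tostring(message, encoding="unicode")


def read_onix_like(xml_text):
    """Bookseller side: pull the same fields back out of the message."""
    root = ET.fromstring(xml_text)
    return [
        {
            "ref": product.findtext("RecordReference"),
            "isbn13": product.findtext("ProductIdentifier/IDValue"),
            "title": product.findtext("TitleText"),
        }
        for product in root.iter("Product")
    ]


# Placeholder record; the reference and ISBN are not real identifiers.
publisher_rows = [
    {"ref": "example-0001", "isbn13": "9780000000002", "title": "A Sample Title"}
]
onix_file = build_onix_like(publisher_rows)
bookseller_rows = read_onix_like(onix_file)
print(bookseller_rows)
```

Because both sides agree on the tags, the bookseller's import does not depend on how the publisher's database is organized.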
  13. EDI: Electronic Data Interchange
     • Structured (db to db) transmission of data
     • Often XML tagged format
  14. Questions on XML?
     • Data, database questions?
     • Tech project?
  15. Webcast: A Roadmap to Efficiently Producing Multi-Format/Multi-Screen eBooks: Lessons from Market Innovators. November 8, 2012
  16. Speakers
     • Thad McIlroy – Electronic publishing analyst and author, The Future of Publishing
     • Stephen Driver – Vice President, Production Services, The Rowman & Littlefield Publishing Group
  17. XML Workflows for eBooks
  18. XML Adoption by Sector (chart: STM, Educational, Trade)
  19. XML Defined
     XML is:
     • A device-independent, system-independent method of storing and processing electronic text
     • Markup for form and/or meaning
     • A data interchange format used by many applications on the Web
  20. XML Provides Real Solutions
     • But it is a big, ugly, unwieldy bear
     • And its conceptual metaphors bear little resemblance to book publishing
     • It's based on 25-year-old thinking about technical documents and ecommerce
     • Yet it's the only real game in town
     • ONIX book metadata is enabled by XML
  21. The Importance of XML
     • XML enables content management
     • Separates form from content
     • Combines style sheets with the power of databases in an extensible language
     • Its long-term killer feature is semantic markup – marking up meaning, making text discoverable
     • Future-proofing content
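"Separates form from content" can be made concrete with a small sketch: one XML source, two different presentations. The element names (chapter, title, para) and both output styles are invented for the example; a production workflow would more likely use style sheets or a composition engine.

```python
import xml.etree.ElementTree as ET
from html import escape

# One content source; element names are illustrative only.
SOURCE = (
    "<chapter>"
    "<title>Getting Started</title>"
    "<para>XML stores the content.</para>"
    "<para>Templates decide the look.</para>"
    "</chapter>"
)


def to_html(xml_text):
    """Render the chapter for the web."""
    root = ET.fromstring(xml_text)
    parts = [f"<h1>{escape(root.findtext('title'))}</h1>"]
    parts += [f"<p>{escape(p.text or '')}</p>" for p in root.findall("para")]
    return "\n".join(parts)


def to_plain_text(xml_text):
    """Render the same chapter as plain text with an underlined heading."""
    root = ET.fromstring(xml_text)
    title = root.findtext("title")
    lines = [title.upper(), "=" * len(title)]
    lines += [p.text or "" for p in root.findall("para")]
    return "\n".join(lines)


html_output = to_html(SOURCE)
text_output = to_plain_text(SOURCE)
print(html_output)
print(text_output)
```

Changing the look means changing one of the two functions; the source file is never touched, which is the sense in which the content is "future-proofed".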
  22. XML Tagging
     Semantic tagging requires human judgment but offers the benefit of meaning

     <book price="49.95" ISBN="string" publicationdate="2012-12-09">
       <title>string</title>
       <author>
         <first-name>string</first-name>
         <last-name>string</last-name>
       </author>
       <genre>string</genre>
     </book>
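To show what "the benefit of meaning" buys in practice, the sketch below feeds the slide's book record to a parser and asks for fields by name. The XML is the slide's sample written with straight quotes, since a parser rejects attribute values opened with a typographic quote; the summarize and is_well_formed helpers are names made up for this example.

```python
import xml.etree.ElementTree as ET

# The slide's sample record, with straight quotes around every attribute value.
BOOK_XML = """<book price="49.95" ISBN="string" publicationdate="2012-12-09">
  <title>string</title>
  <author>
    <first-name>string</first-name>
    <last-name>string</last-name>
  </author>
  <genre>string</genre>
</book>"""


def summarize(xml_text):
    """Pull named fields out of a <book> record."""
    book = ET.fromstring(xml_text)
    return {
        "price": float(book.get("price")),
        "published": book.get("publicationdate"),
        "title": book.findtext("title"),
        "author_last": book.findtext("author/last-name"),
    }


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


summary = summarize(BOOK_XML)
# Same record with a curly opening quote, as word processors tend to produce.
curly_version = BOOK_XML.replace('price="49.95"', 'price=\u201c49.95"')
print(summary)
print(is_well_formed(BOOK_XML), is_well_formed(curly_version))
```

The second check is a reminder that XML copied through a word processor often picks up "smart" quotes, and that a parser is a quick way to catch them.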
  23. Structured Tagging by Authors? Typéfi sample approach
  24. If you show this to editors... "They're going to start drinking at their desks"
  25. Templated Designs: How much book content fits into automatic composition?
  26. The Human Factor: New Internal Skills & Positions
     • The production skill set changes substantially
     • Much of the existing knowledge base changes or becomes obsolete
     • The move from design & composition & production management to content & product architecting and engineering
     • There is an enormous training challenge ahead
  27. Key Takeaways
     • XML is complex, but packed with value
     • XML is not an all-or-nothing deal
       – You should start with small steps
     • XML's complexity demands outside help
       – Services, consultants, trainers, associations
     • The rapid proliferation of output formats can only be mastered with a structured approach like XML
  28. Obstacles to using XML
     • XML is intimidating, full of jargon
     • We're editors, not programmers
     • And what about the authors?
     • You mean I can't move that line of text half a pica?! And other design concerns
     • Editorial, or "my book's too good for a template"
  29. So how'd we solve it?
     • We manipulated XML to our uses, not the other way around
     • We still used authors' Word documents as the source
     • Template interiors were something we had already been doing for years
     • XML coding was translated into a coding structure virtually all production people know: typesetting short tags
     • We adapted existing XML approaches to our specific needs by discarding coding that didn't fit our content
  30. But weren't there problems?
  31. A Multi-Channel Workflow Example
  32. 1. Word document received from author
  33. 2. Word file coded for XML conversion (resembles standard typesetting short tags)
  34. 3. Typesetting short tags replaced with XML via conversion process (some file editing required)
  35. 4. Final PDF generated after style template applied to XML file. EPUB, .mobi and WebPDF generated.
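Steps 2 and 3 of this workflow, replacing short tags with XML, can be sketched as a small conversion script. The slides do not show the publisher's actual codes, so the short tags here (ct, h1, tx), the XML element names they map to, and the sample text are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from typesetting-style short tags to XML element names.
TAG_MAP = {"ct": "title", "h1": "heading", "tx": "para"}


def short_tags_to_xml(coded_text):
    """Convert lines such as '<tx>Some text' into a <chapter> XML document."""
    chapter = ET.Element("chapter")
    for line in coded_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if not line.startswith("<") or ">" not in line:
            raise ValueError(f"line has no short tag: {line!r}")
        code, content = line[1:].split(">", 1)
        if code not in TAG_MAP:
            raise ValueError(f"unknown short tag: {code!r}")
        ET.SubElement(chapter, TAG_MAP[code]).text = content.strip()
    return ET.tostring(chapter, encoding="unicode")


# Hypothetical coded manuscript, as it might look after step 2.
CODED = """<ct>Chapter One
<h1>Background
<tx>Authors & editors keep working in Word."""

xml_out = short_tags_to_xml(CODED)
print(xml_out)
```

The script raises an error on untagged or unknown lines rather than guessing, which corresponds to the "some file editing required" noted in step 3. It also escapes the ampersand automatically, one of the details production staff no longer need to handle by hand.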
  36. Insider Tips
     • Know your staff
       Who can adjust and how will you address those who can't?
     • Know your content
       Using the right tool for the job is critical; not all content is suitable for XML composition
     • Be realistic about the learning curve
       If you're still paper editing, making the leap straight to XML may be too great, so start small
     • Be flexible
       You'll likely revisit several core values of your publishing program; identify the most important things and be honest about the less important ones
  37. Insider Tips, cont.
     • XML need not be an off-the-shelf product
       You can and should work to customize it to your own production needs
     • See it through
       It's taken us two years to arrive at a point where we're comfortable, and we're still making changes
     • Partner with the right vendors
       Find someone willing and capable of adapting to your publishing needs
     • When you need a hammer, use a hammer
       Remember XML is just another tool; it shouldn't be your only tool.
  38. Questions?
  39. What's Next
     Tech Course 802
     1. Christine on Tues 15th: coming in to talk templates and WordPress
     2. Next Tues 22nd: Chloe and Stacey coming in to talk about ebooks and XML
     3. Following Mon 28 and Tues 29: Brenda J Walker and Haig Armen on apps
     Tech Project 607
     1. This Wed 16th: Content to present assignment to Design & Tech so we can all be on the same page and on Thurs carry on with wireframes/design mockups (Design), platform set up (Tech) and discoverability/ed calendar (Content)
     2. Following Wed 23rd: Present to Alan and David designs and ideas so far.