Data, Databases & XML
A Crash Course.

Monique Sherrett
monique@boxcarmarketing.com

3 Types of Data
Unstructured Data
• e.g. Word documents, PDFs, audio/video files, emails
• No search
• No version control
Structured Data
• e.g. inventory management database, WordPress
• Searchable
• Version and user control (secure access)
• Relationship structures (show everything tagged “winter”)
• Import / Export
• Display options
• Machine readable; run queries against the data
Semi-Structured Data
• e.g. XML (HTML, ONIX, RSS)
• Formal/standardized data

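To make "run queries against the data" concrete, here is a minimal sketch of the "show everything tagged winter" idea using Python's built-in sqlite3 module. The table names (items, tags, item_tags) and the sample rows are assumptions made up for illustration; they are not WordPress's actual schema.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with hypothetical tables and rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE item_tags (item_id INTEGER, tag_id INTEGER);
        INSERT INTO items VALUES (1, 'Snowshoe Trails'), (2, 'Beach Reads'), (3, 'Holiday Baking');
        INSERT INTO tags VALUES (1, 'winter'), (2, 'summer');
        INSERT INTO item_tags VALUES (1, 1), (2, 2), (3, 1);
        """
    )
    return conn


def items_tagged(conn, tag):
    """Return the titles of all items carrying the given tag, sorted by title."""
    rows = conn.execute(
        """
        SELECT items.title
        FROM items
        JOIN item_tags ON item_tags.item_id = items.id
        JOIN tags ON tags.id = item_tags.tag_id
        WHERE tags.name = ?
        ORDER BY items.title
        """,
        (tag,),
    )
    return [title for (title,) in rows]


if __name__ == "__main__":
    print(items_tagged(build_demo_db(), "winter"))
```

The relationship lives in the item_tags table, which is what lets one question ("what is tagged winter?") be answered for any number of items without opening each one.
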
Structured Data: WordPress
• Open Source content management system based on PHP and MySQL
   – Open Source: source code is freely available, which encourages development by many independent programmers.
   – CMS: a database + presentation layer (set of templates)
   – MySQL: a type of database
   – PHP: a scripting language designed to produce dynamic web pages
• Plugin architecture (Akismet for spam, SEO by Yoast, WP to Twitter, etc.)
• Pages & Posts
• Categories & Tags

Pages vs Posts
Page (~unstructured)
• Static content, won’t change frequently
• e.g. About page
• Can be organized manually in a hierarchy: page (parent) and subpages (child)
   – About Us > Team; About Us > History
Post (~structured)
• Frequently updated content dynamically organized in a hierarchy (chronological, category), plus archive
   – News articles, event information
   – Frequently published in an RSS feed that users subscribe to

Semi-Structured Data: RSS
• Really Simple Syndication or Rich Site Summary
• Publish it. Subscribe to it. Pull it into other websites.
• RSS is a standardized XML file format.

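Because RSS is a standardized XML format, any XML parser can pull a feed into another site. The sketch below reads a small hand-written RSS 2.0 feed with Python's standard xml.etree.ElementTree; the feed contents and the example.com URLs are placeholders, not a real feed.

```python
import xml.etree.ElementTree as ET

# A hand-written sample feed; titles and links are placeholders.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Press News</title>
    <link>https://example.com/</link>
    <description>News and events</description>
    <item>
      <title>Fall catalogue announced</title>
      <link>https://example.com/fall-catalogue</link>
    </item>
    <item>
      <title>Author reading next week</title>
      <link>https://example.com/author-reading</link>
    </item>
  </channel>
</rss>"""


def read_feed(xml_text):
    """Return the channel title and a list of (item title, item link) pairs."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    items = [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]
    return channel.findtext("title"), items


if __name__ == "__main__":
    feed_title, entries = read_feed(SAMPLE_FEED)
    print(feed_title)
    for title, link in entries:
        print("-", title, link)
```
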
WordPress As Database
• Instead of a series of HTML files, WordPress offers a system that allows for the organization and efficient storage & retrieval of information.
   – Structured data can be exported into semi-structured data (RSS, XML)

RSS is XML
• eXtensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is machine- and human-readable.
• RSS, XHTML (unzipped EPUB) and ONIX (ONline Information eXchange, a standard for sharing bibliographic data) are some of the hundreds of XML-based languages that have been developed.
• How might we use XML for the Tech Project?

Current db → Export to XML → Rename / Modify XML → Import from XML → New db

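The export, rename and import steps can be sketched in a few lines of Python using only the standard library. Everything named here is an assumption for illustration: an old "books" table with a "writer" column, a new "titles" table that calls the same field "author", and two invented rows.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_to_xml(conn):
    """Export every row of the (hypothetical) old 'books' table as XML."""
    root = ET.Element("books")
    for title, writer in conn.execute("SELECT title, writer FROM books ORDER BY title"):
        book = ET.SubElement(root, "book")
        ET.SubElement(book, "title").text = title
        ET.SubElement(book, "writer").text = writer
    return ET.tostring(root, encoding="unicode")


def rename_elements(xml_text, old, new):
    """Rename every element called `old` to `new` (the Rename / Modify step)."""
    root = ET.fromstring(xml_text)
    for element in list(root.iter(old)):
        element.tag = new
    return ET.tostring(root, encoding="unicode")


def import_from_xml(conn, xml_text):
    """Load <book> records into the (hypothetical) new 'titles' table."""
    root = ET.fromstring(xml_text)
    rows = [(b.findtext("title"), b.findtext("author")) for b in root.findall("book")]
    conn.executemany("INSERT INTO titles (title, author) VALUES (?, ?)", rows)
    return len(rows)


def migrate_demo():
    """Run the whole flow between two in-memory databases."""
    old_db = sqlite3.connect(":memory:")
    old_db.execute("CREATE TABLE books (title TEXT, writer TEXT)")
    old_db.executemany(
        "INSERT INTO books VALUES (?, ?)", [("Atlas", "A. Lee"), ("Birds", "B. Cho")]
    )
    new_db = sqlite3.connect(":memory:")
    new_db.execute("CREATE TABLE titles (title TEXT, author TEXT)")
    xml_text = rename_elements(export_to_xml(old_db), "writer", "author")
    import_from_xml(new_db, xml_text)
    return new_db.execute("SELECT title, author FROM titles ORDER BY title").fetchall()


if __name__ == "__main__":
    print(migrate_demo())
```

The XML in the middle is what makes the two databases independent: neither needs to know the other's table layout, only the agreed element names.
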
ONIX is XML
• International standard for representing and communicating book and product info in electronic form
   – text-readable (human & computer)
   – tagged/markup
   – transferred by email or FTP (file transfer protocol)
   – More info: Bisg.org

Publisher db → Export to ONIX & FTP file to Server → Server → Grab file from Server & Import from ONIX → Bookseller db

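As a rough illustration of the bookseller's import step, the sketch below reads an ISBN and a title out of a heavily simplified ONIX-style record. The element names follow my understanding of the ONIX 3.0 reference tags (with ProductIDType 15 meaning ISBN-13) and should be checked against the official specification; the sample omits the namespace, the header and many required fields, so it is not a valid ONIX message. The record reference, ISBN and title are placeholders.

```python
import xml.etree.ElementTree as ET

# Simplified, not schema-valid; values are placeholders.
SAMPLE_ONIX = """<ONIXMessage release="3.0">
  <Product>
    <RecordReference>com.example.0001</RecordReference>
    <ProductIdentifier>
      <ProductIDType>15</ProductIDType>
      <IDValue>9780000000002</IDValue>
    </ProductIdentifier>
    <DescriptiveDetail>
      <TitleDetail>
        <TitleType>01</TitleType>
        <TitleElement>
          <TitleElementLevel>01</TitleElementLevel>
          <TitleText>An Example Title</TitleText>
        </TitleElement>
      </TitleDetail>
    </DescriptiveDetail>
  </Product>
</ONIXMessage>"""


def read_products(onix_text):
    """Return one dict per <Product> with its ISBN-13 (if any) and title."""
    root = ET.fromstring(onix_text)
    products = []
    for product in root.findall("Product"):
        isbn13 = None
        for identifier in product.findall("ProductIdentifier"):
            # Assumption: code 15 identifies an ISBN-13.
            if identifier.findtext("ProductIDType") == "15":
                isbn13 = identifier.findtext("IDValue")
        title = product.findtext("DescriptiveDetail/TitleDetail/TitleElement/TitleText")
        products.append({"isbn13": isbn13, "title": title})
    return products


if __name__ == "__main__":
    print(read_products(SAMPLE_ONIX))
```
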
EDI: Electronic Data Interchange
• Structured (db to db) transmission of data
• Often XML tagged format

Questions on XML?
• Data, database questions?
• Tech project?

WEBCAST

A Roadmap to Efficiently Producing
Multi-Format/Multi-Screen eBooks

Lessons from Market Innovators




November 8, 2012
Speakers

• Thad McIlroy
   – Electronic publishing analyst and author, The Future of Publishing

• Stephen Driver
   – Vice President, Production Services, The Rowman & Littlefield Publishing Group

XML Workflows for eBooks

XML Adoption by Sector
[Chart comparing three sectors: STM, Educational, Trade]

XML Defined
XML is:
• A device-independent, system-independent method of storing and processing electronic text
   – Markup for form and/or meaning
• A data interchange format used by many applications on the Web.

XML Provides Real Solutions
• But it is a big, ugly, unwieldy bear
• And its conceptual metaphors bear little resemblance to those of book publishing
• It’s based on 25-year-old thinking about technical documents and ecommerce
• Yet it’s the only real game in town
• ONIX book metadata is enabled by XML

The Importance of XML
• XML enables content management
• Separates form from content
• Combines style sheets with the power of databases in an extensible language
• Its long-term killer feature is semantic markup: marking up meaning, making text discoverable
• Future-proofing content

XML Tagging
Semantic tagging requires human judgment
but offers the benefit of meaning
<book price="49.95" ISBN="string" publicationdate="2012-12-09">
   <title>string</title>
   <author>
      <first-name>string</first-name>
      <last-name>string</last-name>
   </author>
   <genre>string</genre>
</book>
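
The benefit of meaning is that software can ask for "the price" or "the author's last name" by name. A minimal sketch with Python's standard xml.etree.ElementTree, using the same book record with straight quotes around every attribute value (a typographic quote there makes the file not well-formed, as the second helper checks). The function names are my own.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<book price="49.95" ISBN="string" publicationdate="2012-12-09">
   <title>string</title>
   <author>
      <first-name>string</first-name>
      <last-name>string</last-name>
   </author>
   <genre>string</genre>
</book>"""


def describe_book(xml_text):
    """Pull named facts out of a <book> record."""
    book = ET.fromstring(xml_text)
    first = book.findtext("author/first-name")
    last = book.findtext("author/last-name")
    return {
        "price": float(book.get("price")),
        "publication_year": int(book.get("publicationdate")[:4]),
        "title": book.findtext("title"),
        "author": f"{first} {last}",
        "genre": book.findtext("genre"),
    }


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(describe_book(BOOK_XML))
```
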
Structured Tagging by Authors?
Typéfi sample approach

If you show this to editors...
“They’re going to start
drinking at their desks”
Templated Designs
How much book content fits
into automatic composition?
The Human Factor
New Internal Skills & Positions
• The production skill set changes substantially
   – Much of the existing knowledge base changes or becomes obsolete
   – The move from design & composition & production management to content & product architecting and engineering
   – There is an enormous training challenge ahead

Key Takeaways
• XML is complex, but packed with value
• XML is not an all-or-nothing deal
   – You should start with small steps
• XML’s complexity demands outside help
   – Services, consultants, trainers, associations
• The rapid proliferation of output formats can only be mastered with a structured approach like XML

Obstacles to using XML
• XML is intimidating, full of jargon
• We’re editors, not programmers
• And what about the authors?
• You mean I can’t move that line of text half a pica?! And other design concerns
• Editorial, or “my book’s too good for a template”

So how’d we solve it?
• We manipulated XML to our uses, not the other way around
• We still used authors’ Word documents as the source
• Template interiors were something we had already been doing for years
• XML coding was translated into a coding structure virtually all production people know: typesetting short tags
• We adapted existing XML approaches to our specific needs by discarding coding that didn’t fit our content

But weren’t there problems?

A Multi-Channel Workflow Example
1. Word document received from author
2. Word file coded for XML conversion (resembles standard typesetting short tags)
3. Typesetting short tags replaced with XML via conversion process (some file editing required)
4. Final PDF generated after style template applied to XML file. EPUB, .mobi and WebPDF generated.

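Step 3 of the workflow above might look something like the following sketch. The short-tag syntax (a code such as <ct> or <tx> at the start of each line) and the mapping to XML element names are hypothetical; the slides do not show the actual codes or conversion tool used. Lines the converter does not recognise raise an error, standing in for the "some file editing required" part.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mapping from typesetting short tags to XML element names.
SHORT_TAG_MAP = {"ct": "title", "h1": "heading", "tx": "para"}
LINE_PATTERN = re.compile(r"^<(\w+)>(.*)$")


def short_tags_to_xml(text):
    """Convert short-tagged lines into a single <chapter> XML string."""
    chapter = ET.Element("chapter")
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = LINE_PATTERN.match(line)
        if match is None or match.group(1) not in SHORT_TAG_MAP:
            # Unknown or missing code: flag the line for a person to fix.
            raise ValueError(f"line needs manual editing: {line!r}")
        element = ET.SubElement(chapter, SHORT_TAG_MAP[match.group(1)])
        element.text = match.group(2).strip()
    return ET.tostring(chapter, encoding="unicode")


if __name__ == "__main__":
    coded = "<ct>Chapter One\n<h1>Getting Started\n<tx>Fish & chips for everyone."
    print(short_tags_to_xml(coded))
```

Building elements through the library, rather than pasting angle brackets around text, means characters such as & are escaped automatically in the XML output.
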
Insider Tips
• Know your staff
  Who can adjust, and how will you address those who can’t?
• Know your content
  Using the right tool for the job is critical; not all content is suitable for XML composition
• Be realistic about the learning curve
  If you’re still paper editing, making the leap straight to XML may be too great, so start small
• Be flexible
  You’ll likely revisit several core values of your publishing program; identify the most important things and be honest about the less important ones

Insider Tips, cont.
• XML need not be an off-the-shelf product
  You can and should work to customize it to your own production needs
• See it through
  It’s taken us two years to arrive at a point where we’re comfortable, and we’re still making changes
• Partner with the right vendors
  Find someone willing and able to adapt to your publishing needs
• When you need a hammer, use a hammer
  Remember XML is just another tool; it shouldn’t be your only tool.

Questions?

What’s Next
Tech Course 802
1. Christine on Tues 15th: coming in to talk templates and WordPress
2. Next Tues 22nd: Chloe and Stacey coming in to talk about ebooks and XML
3. Following Mon 28 and Tues 29: Brenda J Walker and Haig Armen on apps

Tech Project 607
1. This Wed 16th: Content to present assignment to Design & Tech so we can all be on the same page, and on Thurs carry on with wireframes/design mockups (Design), platform set up (Tech) and discoverability/ed calendar (Content)
2. Following Wed 23rd: Present to Alan and David designs and ideas so far.
