2001: Bridging the Gap between RSS and Java Old School Style


Before Atom, RSS, and the like had really caught on, many people were looking for ways to handle syndicated content. This was a fairly successful talk that I ended up giving quite a few times.

Published in: Technology, Education

  1. Enabling Live Newsfeeds using RSS, Servlets and Transformations
     Russell Castagnaro
     Russell@4charity.com

     Introduction
     Presenter
       • Russell Castagnaro
       • Chief Mentor
       • 4Charity.com
       • SyncTank Solutions, Inc
       • russell@4charity.com
       • Experience
  2. Introduction
     4Charity.com
       • Application Service Provider for the non-profit industry
       • Pure Java development
       • http://www.4charity.com
       • Locations:
           • San Francisco, CA (HQ)
           • Honolulu, HI (Tech Team)

     Goals
       • Leverage the Servlet 2.2 API
       • Employ XML for data and configuration
       • Use the Resource Description Framework (RDF) Site Summary format for content data
       • Format XML using XSL Transformations
       • Eliminate hard-coded values
  3. What’s the deal?
       • Newsfeeds are becoming a requirement for portal sites.
       • Easy integration with existing web services is a key requirement.
       • How can we avoid writing custom code for each information provider?
       • Can we avoid applets?!

     Background
       • In 1999 I wrote an information portal application.
       • Live newsfeeds seemed like a good idea.
       • I wrote custom parsers and employed an open-source tool called Cocoon.
       • Every time the HTML changed, I had to recode!
  4. Code Example
       • Needed a different ‘ParseSpec’ for each content provider
       • URLToXMLConsumer.java
       • SpaceProducer.java
       • These worked great for 2 months...

     ‘ParseSpec’

         #HeadlineEntry
         cacheTime=6000
         HeadlineEntry=start=\n,end=<p>,attributes=Link,URL,Headline,Source,Date
         HeadlineEntry.Link=start=<a href=",end=">
         HeadlineEntry.Headline=start=">,end=</a>
         HeadlineEntry.Source=start=<font size="-1">,end=</font>
         #HeadlineEntry.Description=start=<br>,end=<br>
         HeadlineEntry.Date=start=- <i>,end=</i>
         HeadlineEntry.DTD="http://space.synctank.com/dtds/newsfeed.dtd"
         HeadlineEntry.Doctype=Newsfeed
         HeadlineEntry.URL=http://search.news.yahoo.com/search/news?p=space+aerospace&n=
         HeadlineEntry.QTY=10
         HeadlineEntry.XML=version="1.0"
         HeadlineEntry.Header= <?xml-stylesheet href="http://space.synctank.com/xsl/spacenews.xsl" type="text/xsl"?>\n
         <?cocoon-process type="xslt"?>\n
         <!-- ============================================================ -->\n
         <!-- spacenews.xml -->\n
         <!-- Simple XML file that uses the Newsfeed DTD. -->\n
         <!-- Author: XML Loader Russell Castagnaro Thu Nov 18 22:59:07 HST 1999 -->\n
         <!-- ============================================================ -->\n
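The slides do not show the parser that consumed the ‘ParseSpec’ rules above, so the following is only a minimal sketch of how start/end delimiter rules like these could be applied to a page of HTML. The class name `DelimiterExtractor`, its method names, and the sample page are all hypothetical and are not taken from URLToXMLConsumer.java.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of start/end delimiter scraping; not the author's code.
public class DelimiterExtractor {

    /** Returns the text between the first 'start' and the following 'end', or null if either is missing. */
    public static String between(String text, String start, String end) {
        int s = text.indexOf(start);
        if (s < 0) return null;
        s += start.length();
        int e = text.indexOf(end, s);
        if (e < 0) return null;
        return text.substring(s, e);
    }

    /**
     * Cuts the page into entries using entryStart/entryEnd, then applies each
     * field rule (a {start, end} pair) inside every entry.
     */
    public static List<Map<String, String>> extract(String page, String entryStart, String entryEnd,
                                                    Map<String, String[]> fieldRules) {
        List<Map<String, String>> entries = new ArrayList<>();
        int pos = 0;
        while (true) {
            int s = page.indexOf(entryStart, pos);
            if (s < 0) break;
            s += entryStart.length();
            int e = page.indexOf(entryEnd, s);
            if (e < 0) break;
            String chunk = page.substring(s, e);
            Map<String, String> fields = new LinkedHashMap<>();
            for (Map.Entry<String, String[]> rule : fieldRules.entrySet()) {
                String value = between(chunk, rule.getValue()[0], rule.getValue()[1]);
                if (value != null) fields.put(rule.getKey(), value);
            }
            entries.add(fields);
            pos = e + entryEnd.length();
        }
        return entries;
    }

    public static void main(String[] args) {
        // Field rules mirroring three of the ParseSpec lines; the page text is invented.
        Map<String, String[]> rules = new LinkedHashMap<>();
        rules.put("Link", new String[] {"<a href=\"", "\">"});
        rules.put("Headline", new String[] {"\">", "</a>"});
        rules.put("Source", new String[] {"<font size=\"-1\">", "</font>"});

        String page = "\n<a href=\"http://example.com/a\">Shuttle launch</a> "
                + "<font size=\"-1\">(AP)</font><p>";
        for (Map<String, String> entry : extract(page, "\n", "<p>", rules)) {
            System.out.println(entry);
        }
    }
}
```

The sketch also shows why this approach was fragile: every rule is tied to the exact markup of one provider, so any change to the provider's HTML silently breaks the extraction.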
  5. Java Code
       • URLToXMLProducer.xml and subclasses

     Nice Features
       • All search providers’ content was converted to one XML document type.
       • Once the XML was created, all search engines’ results were handled easily with XSLT.
  6. Document Type Definition

         <?xml version="1.0" encoding="US-ASCII" ?>
         <!-- Newsfeed.dtd -->
         <!-- Simple DTD that defines a grammar for news Feeds. -->
         <!-- Author: Russell Castagnaro Nov 15 1999 -->
         <!ELEMENT Newsfeed (HeadlineEntry)+>
         <!ELEMENT HeadlineEntry (Link, Headline, Source, Description, Date)>
         <!ELEMENT Link (#PCDATA)>
         <!ELEMENT Headline (#PCDATA)>
         <!ELEMENT Source (#PCDATA)>
         <!ELEMENT Description (#PCDATA)>
         <!ELEMENT Date (#PCDATA)>

     NewsFeed Content (XML)

         <?xml version="1.0"?>
         <?xml-stylesheet href="spacenews.xsl" type="text/xsl"?>
         <?cocoon-process type="xslt"?>
         <Newsfeed>
           <HeadlineEntry>
             <Link>http://dailynews.yahoo.com/h/ap/19991222/sc/space_shuttle_77.html</Link>
             <Headline>Shuttle Astronauts Begin <b>Space</b>walk</Headline>
             <Source>(Associated Press)</Source>
             <Date>Dec 22 6:08 PM EST</Date>
           </HeadlineEntry>
           <HeadlineEntry>
             <Link>http://biz.yahoo.com/rf/991222/xr.html</Link>
             <Headline>RESEARCH ALERT - Boeing raised to buy</Headline>
             <Source>(Reuters)</Source>
             <Date>Dec 22 12:03 PM EST</Date>
           </HeadlineEntry>
         </Newsfeed>
  7. Transforming the Newsfeed
     Make the news feed human readable:
       • Create a stylesheet using the XML DOCTYPE rules
       • Transform the XML document using the XSL document
     * Specifics on transformations coming soon!

     The StyleSheet

         <?xml version="1.0"?>
         <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
           <xsl:output method="html" indent="no"/>
           <xsl:template match="/">
             <TABLE width="100%" cellpadding="0" cellspacing="0" border="0">
               <TR><TD bgcolor="#3366CC" align="left" valign="middle">
                 <font face="helvetica, arial" size="2" color="#FFFFFF">
                 <nobr><b>News</b></nobr></font>
               </TD><TD align="right" bgcolor="#3366CC" valign="top" >
                 <a href="/space/news/spacenews.xml">
                 <font face="helvetica, arial" size="1" color="#FFFFFF">View</font></a>
                 <IMG SRC="/space/images/spacer2.gif" BORDER="0" WIDTH="5" HEIGHT="2"/>
               </TD></TR>
               <TR><TD>
                 <font size="2" face="Arial, Helvetica, sans-serif">
                 <b>Space and Aerospace News</b></font><BR/>
                 <xsl:apply-templates/>
               </TD></TR></TABLE>
           </xsl:template>
           <xsl:template match="HeadlineEntry">
             <B><FONT face="helvetica, arial" size="1">
             <A HREF="{Link}"><xsl:value-of select="Headline"/></A>
             </FONT></B> - <I>
             <FONT size="-2" face="Arial, Helvetica, sans-serif">
             <xsl:value-of select="Source"/></FONT></I><BR/>
           </xsl:template>
         </xsl:stylesheet>
  8. HTML Content

     Then the Display Format Changed
       • Simple changes in the format from any site required significant changes.
       • Changing the parsing rules was not trivial.
       • Eventually this became boring and tiresome.
  9. Interesting Points
       • I was not interested in manipulating XML documents within Java.*
       • I did not want to deal with DOM or SAX.
       • I was interested in displaying data in a clean, efficient manner.
       • The producer code I created was a bit embarrassing.
     * I was not lazy. I had a very full schedule at the time. Sheesh!

     Time Warp (Oct 2000)
       • None of my parsing instructions still worked.
       • I had no interest in using the old code.
       • There had to be a better way.
       • I heard about O’Reilly’s Meerkat project…
  10. Enter RDF Site Summary
       • The preliminary format was v0.91 from Netscape (remember them?)
       • Resource Description Framework (RDF) Site Summary (RSS 0.91)
       • http://my.netscape.com/publish/formats/rss-0.91.dtd
       • Eliminates the need to parse through HTML for content.
       • A standard: RSS 1.0, built on the W3C’s RDF, has now been published.

      RSS Example

          <?xml version="1.0" encoding="iso-8859-1"?>
          <!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
              "http://my.netscape.com/publish/formats/rss-0.91.dtd">
          <rss version="0.91">
            <channel>
              <title> Space science news</title>
              <link>http://www.moreover.com</link>
              <description>Space science news - news headlines from around the web, refreshed every 15 minutes</description>
              <language>en-us</language>
              <image>
                <title>moreover...</title>
                <url>http://i.moreover.com/pics/rss.gif</url>
                <link>http://www.moreover.com</link>
                <width>144</width>
                <height>16</height>
                <description>News headlines from more than 1,800 sources, harvested every 15 minutes...</description>
              </image>
              <item>
                <title>NASA releases space station crew logs</title>
                <link>http://c.moreover.com/click/here.pl?r16768175</link>
                <description>floridatoday.com Mar 22 2001 12:20AM ET</description>
              </item>
              <item>
                <title>Tough love but support for space by George W. Bushs team</title>
                <link>http://c.moreover.com/click/here.pl?r16768185</link>
                <description>floridatoday.com Mar 22 2001 12:20AM ET</description>
              </item>
            </channel>
          </rss>

      C:\development\Castagnaro\space\space-moreover.xml
  11. RSS Stylesheet Example

          <?xml version="1.0"?>
          <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
            <xsl:template match="rss">
              <!-- version is an attribute of rss, so it is selected with @version -->
              <Foo bar="{@version}">
                <xsl:apply-templates/>
              </Foo>
            </xsl:template>
            <xsl:template match="channel">
              <TABLE width="100%" cellpadding="0" cellspacing="0" border="0">
                <TR>
                  <TH align="left" bgcolor="#3366CC" valign="top" >
                  <a alt="{description}" href="{link}"><font face="helvetica, arial" size="2" color="#FFFFFF"><nobr><xsl:value-of select="title"/></nobr></font></a></TH>
                </TR>
                <xsl:apply-templates select="image"/>
                <xsl:apply-templates select="item"/>
              </TABLE>
            </xsl:template>
            <xsl:template match="image">
              <TR><TD align="right"><a href="{link}">
              <IMG SRC="{url}" BORDER="0" WIDTH="{width}" HEIGHT="{height}"/></a></TD></TR>
            </xsl:template>
            <xsl:template match="item">
              <TR><TD colspan="2"><B><FONT face="helvetica, arial" size="1">
              <A HREF="{link}"><xsl:value-of select="title"/></A></FONT></B> - <I>
              <FONT size="-2" face="Arial, Helvetica, sans-serif"><xsl:value-of select="description"/></FONT></I></TD></TR>
            </xsl:template>
          </xsl:stylesheet>

      Newsfeed HTML
  12. Access to RSS Feeds
      Where do you find providers?
       • Directory of open RSS providers:
           • http://www.superopendirectory.com/directory/4/standards/rss/sources
      RSS Providers
       • 10.am
           • http://10.am/search/-rss?search=<your term here>
           • List of topics: http://10.am/extra/ocsdirectory.xml
       • echofactor
           • http://www.echofactor.com/feed_categories.html?format=RSS
       • MoreOver
           • http://w.moreover.com/categories/category_list.html
      Now we need to make this content readable!

      Transforming XML to HTML
      We have many options for performing XSL transformations:
       • Depend on the client’s browser to transform the XML
       • Write a servlet to handle the transformation
       • Use software that is widely available and standards-based
      Issues:
       • IE 5.x is one of the few browsers that support XSL transformations.
       • Publicly available software has many merits too.
       • Servlets are easy enough. Transformations can be done in fewer than 10 lines.
  13. Transformation in a Servlet

          // XSLTProcessor and its helper classes come from the Xalan-Java 1 API (org.apache.xalan.xslt).
          public void service(HttpServletRequest req, HttpServletResponse res)
                  throws IOException, ServletException {
              // Set the content type before obtaining the writer.
              res.setContentType("text/html");
              PrintWriter out = res.getWriter();
              // Both file names are taken from the request and resolved against sourcePath.
              File xmlFile = new File(sourcePath, req.getParameter("XML"));
              File xslFile = new File(sourcePath, req.getParameter("XSL"));
              try {
                  XSLTProcessor processor = XSLTProcessorFactory.getProcessor();
                  processor.process(new XSLTInputSource(new FileReader(xmlFile)),
                                    new XSLTInputSource(new FileReader(xslFile)),
                                    new XSLTResultTarget(out));
              } catch (Exception e) {
                  out.println("Error: " + e.getMessage());
              }
              out.flush();
          }

      One Problem
       • We have to get the XML (RSS) file from the content provider!
       • Use the networking classes to access the URL.
       • Be considerate of your provider!
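The servlet on the slide above uses the Xalan-Java 1 classes. As a hedged alternative for readers on a current JDK, the same XML-plus-stylesheet step can be sketched with the standard JAXP API (`javax.xml.transform`), which is a swapped-in technique and not what the talk used. The class name `RssTransformSketch` and the tiny feed and stylesheet below are invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: applies an XSL stylesheet to an XML document with JAXP.
public class RssTransformSketch {

    /** Transforms the XML text with the given stylesheet text and returns the result as a string. */
    public static String transform(String xml, String xsl) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Compile the stylesheet into a Transformer.
        Transformer transformer = factory.newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        // Run the transformation, writing the output into the StringWriter.
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Invented sample feed with a single item.
        String xml = "<rss version='0.91'><channel><title>Sample</title>"
                + "<item><title>First headline</title><link>http://example.com/1</link></item>"
                + "</channel></rss>";
        // Invented stylesheet that renders each item as a linked list entry.
        String xsl = "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:output method='html'/>"
                + "<xsl:template match='/'><ul><xsl:apply-templates select='rss/channel/item'/></ul></xsl:template>"
                + "<xsl:template match='item'><li><a href='{link}'><xsl:value-of select='title'/></a></li></xsl:template>"
                + "</xsl:stylesheet>";
        System.out.println(transform(xml, xsl));
    }
}
```

In a servlet, the `StringWriter` would be replaced by the response writer, and the compiled stylesheet could be reused across requests rather than recompiled each time.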
  14. New Code

          public void doGet(HttpServletRequest req, HttpServletResponse res) {
              try {
                  // Set the content type before obtaining the writer.
                  res.setContentType("text/html");
                  PrintWriter out = res.getWriter();
                  URLConnection con;
                  DataInputStream in;
                  // Fetch the RSS document from the provider's URL.
                  URL url = new URL(sourceURL);
                  con = url.openConnection();
                  con.connect();
                  String type = null;
                  in = new DataInputStream(con.getInputStream());
                  FileReader fr = new FileReader(xslsrc);
                  try {
                      XSLTProcessor processor = XSLTProcessorFactory.getProcessor();
                      processor.process(new XSLTInputSource(in),
                                        new XSLTInputSource(fr),
                                        new XSLTResultTarget(out));
                  } catch (Exception e) {
                      log("Error: " + e.getMessage());
                  } finally {
                      // Always release the network stream and the stylesheet reader.
                      in.close();
                      fr.close();
                  }
                  out.flush();
              } catch (Exception e) {
                  …
              }
          }

      XSLT Model (diagram labels: Request, Response, Servlet, URL Loaded XML, XSLT Processor, XSL Document, HTML NewsFeed)
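The fetch step on the slide above opens the provider URL without any time limit. A minimal sketch of the same step with connect and read timeouts, using only `java.net.URLConnection`, might look like the following. The class name `FeedFetcher` and the timeout value are assumptions for illustration, and the bytes are returned undecoded so that an XML parser can honor the encoding declared in the feed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical sketch: downloads a feed with timeouts so a slow provider cannot hang the servlet.
public class FeedFetcher {

    /** Reads the whole resource at the URL and returns its raw bytes. */
    public static byte[] fetch(URL url, int timeoutMillis) throws IOException {
        URLConnection con = url.openConnection();
        con.setConnectTimeout(timeoutMillis); // give up if the connection cannot be established
        con.setReadTimeout(timeoutMillis);    // give up if the provider stops sending data
        try (InputStream in = con.getInputStream()) {
            return in.readAllBytes();         // the stream is closed even if reading fails
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FeedFetcher <feed-url>");
            return;
        }
        byte[] feed = fetch(URI.create(args[0]).toURL(), 5000);
        System.out.println("Fetched " + feed.length + " bytes");
    }
}
```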
  15. Setting up your servlet
       • Most app servers and web servers support WARs and deployment descriptors.
       • You create a WebApp which has servlets, parameters and servlet mappings.

      Deployment Descriptor

          <web-app>
            <servlet>
              <servlet-name>newsServlet</servlet-name>
              <servlet-class>com.synctank.http.servlets.RSSServlet</servlet-class>
              <init-param>
                <param-name>ERROR_URL</param-name>
                <param-value>/error.jsp</param-value>
                <description>The error page for this app.</description>
              </init-param>
              <init-param>
                <param-name>SOURCE_SERVLET_URI</param-name>
                <param-value>http://www.moreover.com/cgi-local/page?o=rss&amp;c=Space%20science%20news</param-value>
                <description>An absolute url that points to your XML</description>
              </init-param>
  16. Deployment Descriptor

              <init-param>
                <param-name>STYLESHEET</param-name>
                <param-value>/xsl/rss.xsl</param-value>
                <description>The Stylesheet for presentation of the headlines.
                  Should be a subdirectory of the war. The default is /xsl/rss.xsl
                </description>
              </init-param>
              <load-on-startup>0</load-on-startup>
            </servlet>
            <servlet-mapping>
              <servlet-name>newsServlet</servlet-name>
              <url-pattern>/newsy</url-pattern>
            </servlet-mapping>
            <welcome-file-list>
              <welcome-file>/foo/news.html</welcome-file>
            </welcome-file-list>
            <error-page>
              <error-code>404</error-code>
              <location>/error.jsp</location>
            </error-page>
          </web-app>

      War directory structure

          Root
            WEB-INF
              web.xml
              classes
                com/synctank/http/servlets/RSSServlet.class
            xsl
              rss.xsl
            docs
              Index.html
            error.jsp
  17. Moving Forward
       • RSS version 1.0, built on the W3C’s RDF, has been published.
       • 1.0 has more flexibility.
       • Once more providers support

      Review
       • Don’t do the time!
       • Leverage RSS and open content providers.
       • Use XSL to transform XML content to your format of choice.
       • Cache requests to content providers (keep them free!)
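The slides recommend caching requests to content providers but do not show how. One way to do it, sketched under assumptions, is a small time-based cache that calls the provider only when the stored copy is older than a chosen maximum age. The class name `FeedCache`, the millisecond unit, and the injected fetcher and clock are all hypothetical choices made so the sketch can run without a network.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch: keeps each feed for maxAgeMillis before asking the provider again.
public class FeedCache {

    private static final class Entry {
        final String body;
        final long fetchedAt;
        Entry(String body, long fetchedAt) {
            this.body = body;
            this.fetchedAt = fetchedAt;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long maxAgeMillis;
    private final Function<String, String> fetcher; // loads the feed text for a URL
    private final LongSupplier clock;               // current time in milliseconds

    public FeedCache(long maxAgeMillis, Function<String, String> fetcher, LongSupplier clock) {
        this.maxAgeMillis = maxAgeMillis;
        this.fetcher = fetcher;
        this.clock = clock;
    }

    /** Returns the cached feed if it is still fresh, otherwise fetches and stores a new copy. */
    public String get(String url) {
        long now = clock.getAsLong();
        Entry cached = entries.get(url);
        if (cached != null && now - cached.fetchedAt < maxAgeMillis) {
            return cached.body;
        }
        String body = fetcher.apply(url);
        entries.put(url, new Entry(body, now));
        return body;
    }

    public static void main(String[] args) {
        // A stand-in fetcher that only reports when it is called; a real one would hit the network.
        FeedCache cache = new FeedCache(15 * 60 * 1000L, url -> {
            System.out.println("fetching " + url);
            return "<rss/>";
        }, System::currentTimeMillis);
        cache.get("http://example.com/feed.xml");
        cache.get("http://example.com/feed.xml"); // served from the cache, no second fetch
    }
}
```

Two concurrent requests for an expired entry may both reach the provider in this version; that is usually acceptable for a newsfeed, and it keeps the sketch free of locking.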
  18. Finally
       • Thanks for attending
       • Source Code Available
           • http://www.synctank.com/xmldevcon
           • russell@4charity.com
       • Aloha
