Transcript of "Semantic web, python, construction industry"
Setting for this talk
I assume you’re either a researcher or someone interested in the
semantic web. And that you’re interested in the fun to be had with
python and plone in that regard.
If you’re a researcher, you might be a bit strapped for time. You’re
supposed to be doing actual research, not using Emacs typing in code.
Another issue might be the quality of your prototypes. Making it look
good takes a lot of effort, likewise error checking and so on. What you
can take with you out of this talk is perhaps some ideas on using plone
for your demos. Hey, maybe you’ll just make a UML model and have
your work done for you by the archgenxml code generator!
If you’re attracted to the semantic web, you might be either attracted or
shocked by the simple way in which I’m using it. At least, you might say
I advocate using plone in a more data-oriented instead of the common
Enough about you all, now about me :-) I’ve worked now for five years
as a researcher in construction information. Mostly information
exchange in building projects, a bit semantic web based. Now, the
construction industry has much the same problems as the software
industry. Almost the same picture was hanging in the coffee room at the
university and I was one of the very few programmers, so this was a
joke on the construction industry.
Likewise, the construction industry is often a shining light to which the
programmer must look. Ah, no, actually. How much effort goes into a
building’s specification? The drawing takes a lot of time, but you’ve got
36 hours for a good written documentation of the non-drawn things.
And actually following the instructions? Getting everything delivered on
time? Large infrastructural projects normally go way over budget. Partly
because the politicians want things done differently, but is that any
different from clients of the software industry? So you see.
Well, automatic couplings between the drawing and the cost estimation.
Would be logical, don’t you think? Ouch, effectively 95% of the
computer drawings are just lines. For the cost estimation program, it’s
almost the same as a scanned-in hand-made drawing. Those written
specifications? Text. Pure text. Perhaps a classification system.
Problem is that there are no large players. The top 5 companies in the
Netherlands combined have about 5% market share. All those small
little companies you find in every town… When Boeing or Airbus wants
their suppliers to use a certain system, it happens. If the number 1
construction player says it, it doesn’t happen. Too costly, as #2 wants
something else and #3 also.
Semantic web - simply
“Semantic web” sounds very Artificial Intelligence like. Well, python
borrowed some things from Lisp, so that’s not so out of place at this
conference :-) I myself am treating the semantic web in a very simple
way: on a document or application level, make your data available as
XML or RDF, downloadable via http. Perhaps password protected if
needed. Now, how bad can that be?
In plone, it is often just a case of adding another page template. If you
look at the screen you see a part of my application. A template that
creates such a page takes some work with getting everything in the
right place with the right icons in front of it and so on. But exporting it as
a bit of XML takes a quarter of the time. Just select the items, print the right
tag and slap the title into the tag. That’s it.
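To give an idea of how little is involved, here is a minimal sketch in plain python rather than a real plone page template. The item list, the tag names and the two function names are my own assumptions for illustration. The second function shows what a client (like that java desktop one) has to do once it has downloaded the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the items a page template would select.
items = [
    {"id": "overpass", "title": "Overpass"},
    {"id": "bridge-deck", "title": "Bridge deck"},
]


def export_items(items):
    """Return the items as a small XML document (a string)."""
    root = ET.Element("items")
    for item in items:
        # One tag per item, with the title slapped into the tag.
        element = ET.SubElement(root, "item", id=item["id"])
        element.text = item["title"]
    return ET.tostring(root, encoding="unicode")


def read_titles(xml_text):
    """What a client does after downloading the file: parse and read."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall("item")]


xml_text = export_items(items)
titles = read_titles(xml_text)
```

In a real site the string would be served over http instead of being passed around in memory, but the data round trip is the same.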
One week after doing that I got an email from a fellow researcher at a
different institute saying he’s made a java desktop client to view the
same data. WHAT?! Yes, that’s pretty easy to do, just download an xml
file and you’ve got all the data you need. As a researcher, this ease of
accessing your data might mean the difference between seeing your
research used by others or just lingering in your papers and on just
One thing that’s more core semantic web that I’d like to advertise here
is the use of ontologies. An ontology is basically a set of terms or
definitions that can be used by other applications. Defined in a
computer-readable way. Every single item in the ontology has a unique
ID, which is a URL. I’ll talk about that ID mechanism later, it’s really a
great mechanism to get your data out of an isolated position. Back to
the ontology. If application 1 says “this object of mine is an Overpass
according to that ontology” and application 2 says the same about an
object they’ve got stored, they’re compatible. If application 1 is the design
app you’re working in and application 2 is a cost estimation app, you
can suddenly get a good cost estimation for your overpass.
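A rough sketch of that overpass example, in plain python. The ontology URL, the object data, the price and the function name are all made up for illustration; the only point is that both applications use the same URL as the type.

```python
# Hypothetical ontology term; the URL is invented for this example.
OVERPASS = "http://example.org/ontology#Overpass"

# Application 1: a design app that tags its objects with an ontology term.
design_objects = [
    {"name": "Crossing 1", "type": OVERPASS, "length_m": 40},
]

# Application 2: a cost estimation app keyed by the same ontology term.
cost_per_metre = {OVERPASS: 25000}  # made-up figure


def estimate(design_object, price_table):
    """Price a design object if both apps know its ontology term."""
    unit_price = price_table.get(design_object["type"])
    if unit_price is None:
        return None  # no shared term, so no automatic estimate
    return unit_price * design_object["length_m"]


total = estimate(design_objects[0], cost_per_metre)
```

Neither application needs to know anything about the other one. The shared ID is the whole coupling.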
The previous screen you saw was the plone collaborative ontology
editor I made. What you see here is the UML model of the application.
I’ll explain the reasoning behind it a bit. First things first: this was used
as the input for the ArchGenXML code generator. An older version, so
things would’ve been a bit cleaner if I did it again from scratch.
[FU/TS explanation] [draw a hamburger diagram on the board]
Now back to the time-strapped researcher’s neck-saving measure:
ArchGenXML generates a complete archetypes-based plone product
out of this. The entire directory structure, the config.py, the Install.py,
the .py files for the classes, empty templates ready to fill for extra
screens you defined on your classes. Containment/aggregation is
mapped to folder structures, with a possibility to restrict the allowed
types inside the folder. All automatically. You’re wasting time if you’re
doing it all by hand at the moment!
XML is much more well-known, but RDF is also quite handy. What you
see on the screen is pretty much the basic principle of RDF. You’ve got
sets of information (files) with classes within them, all identified by a
URL. Classes in other files can refer to any URL, just like in the normal
web: you can hyperlink to any file. With RDF every part of information
gets a URL, so every bit of information is suddenly referenceable.
The fun part with RDF is that those links have a URL too. So you can
have a different URL for “is father of” and for “goes to that conference”.
All of RDF eventually boils down to subject/verb/object, so “I” “am
presenting at” “europython”. Can’t be any simpler. You can make your
own vocabulary! And everybody can join in!
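The subject/verb/object idea fits in a few lines of plain python. This is only a sketch with tuples, not rdflib's API, and every URL and name in it is a hypothetical example.

```python
# All URLs below are hypothetical examples.
EX = "http://example.org/vocab#"

# Each statement is a (subject, verb, object) tuple; the verb is a URL too.
triples = {
    (EX + "me", EX + "isPresentingAt", EX + "europython"),
    (EX + "visitor", EX + "goesTo", EX + "europython"),
}


def objects(triples, subject, verb):
    """Return all objects linked to the subject by the given verb URL."""
    return sorted(o for s, v, o in triples if s == subject and v == verb)


conferences = objects(triples, EX + "me", EX + "isPresentingAt")
```

Because the verbs have their own URLs, "is presenting at" and "goes to" stay distinct links even though they point at the same conference.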
For this, I had to integrate the python rdflib (which is pretty nice) with
zope (which I love). But zope’s database has extensionclasses which
bite python2.2’s new style classes (which are used by rdflib). So I had
to manually mutilate the code. Not pretty. Not a pretty result. But… I
could load RDF files into zope. The new Zemantic project does the
same, but in a much cleaner way. So don’t ask for my product! Bug
Plone is good for a researcher, especially with archgenxml code
generation. You get a good user interface which is collaborative in
nature. With relatively low effort. On the screen you see a collaborative
ontology editor in the lower left corner, a PloneMall powered catalog
application in the top left and a project creation/administration tool on
the right. The lower left one kept me busy for three months but the
other two took me a week each from start to finish.
All three applications export or import http-provided data. Just serve or
download a file and your program gets much more interesting. Catalog
and “object tree” (a project management application) both use the
ontology and therefore can understand each other. So… Make your
application data-friendly and export your stuff.
Reinout van Rees
This research was done at Delft University of Technology; since 1 April I'm
working at Zest Software, where I'm trying to make medium-sized
organisations happy with a shiny plone website.
Couple of things not mentioned which might interest you, so if you want,
ask a question: python for data conversion, importance of project
automation (also when writing 200-page dissertations), data openness.