My day job is that of internet services manager at a medium-sized UK company which builds proprietary solutions involving web service and XML technologies. My primary interests initially revolved around XSLT and XML technologies. I started up EXSLT along with Dave Pawson and Jeni Tennison, which now enjoys widespread adoption and implementation. I have done quite a bit of technical reviewing and co-authoring for the now defunct WROX, and as with many authors I have now moved over to their benefactors; I am now writing a tome for APRESS, provisionally titled Ant Gems, which is due to go to print in March next year. I am a typical application developer, having refreshed my skill set as the times have changed.
This talk will outline a journey which started with a desire to answer XSLT list members' questions and led to a personal rediscovery of the genetic algorithm. Over the years of answering people's questions on the XSLT list, I noticed that most of the questions were simple mapping transformations, e.g. I have this source XML and want to transform it into target XML. If the user knew the source and target XML, what automated methods could be brought to bear? I won't be talking about REST, not because I don't think it important or that it can't be considered valid; I see both REST and SOAP based web services existing quite happily together. xml technologies and some
Specifications are being finalized to handle complex messaging: orchestration, coordination, and routing
Due to the lack of critically adopted WS orchestration, composition and coordination standards, many are finding the MVC architecture approach a good match. Instead of exposing a large variety of web services, expose one controller web service, which places the importance on the message body: simple interactions, complex messages. List MVC examples.
SOAP response could be styled by XSLT. This technique lies between typical Web Services and REST.
I personally use Systinet WASP server; it takes care of everything a developer would want to not deal with, especially security. The folks who made WASP have a deep heritage with CORBA. Instantly solve some problems, e.g. stateful web services. Anchor is usually used with negative connotations, e.g. boat anchor antipatterns. With many specifications still being worked out, having an anchor in a storm is useful.
By integrating Amazon Web Services, Amazon.com Research Services for Microsoft Office System will provide Microsoft Office System users with convenient and seamless access to Amazon.com from within Microsoft productivity applications via the Research Task Pane. Users will be able to access Amazon.com information and make purchases without launching a browser or leaving their document, e-mail message or presentation. For example, a customer reading a bibliography in a Word document could easily click on a book title and purchase it from within the Research Task Pane without having to leave the Word document. Alternatively, a user will be able to add a footnote, bibliography entry and even cover art for books without needing to manually enter the information into a document. The Research Task Pane, a feature in the Microsoft Office 2003 Edition desktop applications (Word, Excel, the Outlook messaging and collaboration client, the PowerPoint presentation graphics program and Access) and in Microsoft Office System products OneNote (TM) note-taking program, Publisher and Visio drawing and diagramming software, uses industry-standard XML to enable users to retrieve and navigate relevant internal or external Web-based information, all from within Office programs.
"Amazon.com is breaking new ground in its use of XML-enabled Web Services that connect data from disparate systems, allow greater access to content, and create a more valuable experience for Web users," said Gytis Barzdukas, director of Office Product Management at Microsoft. "By using the advancements of the Microsoft Office System, Microsoft and Amazon.com are transforming the desktop into a dynamic interface for Office customers everywhere."
"We are excited to help make this new service available to our customers," said Jeff Barr, Web services technical program manager at Amazon.com. "This Microsoft Office System solution adds significant convenience for our Microsoft users and Amazon.com customers in finding and discovering products. We look forward to receiving feedback from users and to adding more features in the future."
Google (file:wsdl, file:wsil). Look for inspection.wsil. Refer to xmethods or well-known UDDI registries. The importance of a human-understandable description of a web service should not be underestimated. What if the human description is in a different language? Is the interface enough for automatic composition methods?
With unlimited processing power and network bandwidth, random search is fine. Intelligent software agents must have knowledge of the problem domain, either gained via learning (neural network) or through experts embedding knowledge. As you will find out, GA does not need any specialist knowledge to solve a problem, and is quicker than random/linear search of large problem domains.
This approach is not specific to any problem domain; it can be applied to anything. Different partially effective gene combinations, or "schemata", are searched in a parallel manner. Analogies are good in computing, but can be dangerous and can cloud over some of the more subtle aspects. It may well be that what actually happens in nature is completely irrelevant; it just so happens that for some groups of problems this technique is potentially useful. Just because an analogy 'feels right' does not mean it explains 'how' something works. Analogies are good for illustration purposes, not for explanation purposes.
where
M(H, t) = number of strings in population 't' with the schema 'H'
f(H) = average fitness of the strings with the schema 'H'
F = average fitness of the entire population
p1 = probability of the schema being destroyed by crossover
p2 = probability of the schema being destroyed by mutation
There are many variations.
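The formula these definitions belong to did not survive in the notes. Assuming it is the usual statement of Holland's schema theorem, it can be written with the symbols above as follows, where M(H, t+1) is the expected number of strings carrying schema H in the next generation:

```latex
\[
  M(H,\, t+1) \;\ge\; M(H,\, t)\,\frac{f(H)}{F}\,\bigl(1 - p_1 - p_2\bigr)
\]
```

Read this way, a schema whose average fitness f(H) is above the population average F gains copies from one generation to the next, provided crossover and mutation do not destroy it too often.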
There are primary and secondary genetic operations.
Fitness usually encompasses domain-specific factors. Primary operations: reproduction / recombination. Secondary operations: mutation / editing / encapsulation.
I was reviewing a book by WROX called Beginning Databases, and since I was XML through and through I was forced to re-examine the differences between hierarchical data models and relational ones, etc. Somehow this investigation led me to S BOX structures in LISP, which re-introduced me to the genetic algorithm and the idea of partial schemata being used to solve problems. The XSLT guru David Carlisle probably didn't know it, but he and his lot at XSLT UK caused me to investigate the functional programming (fp) approach using XSLT.
LISP symbolic expressions contain lists or atoms and use Polish notation.
LISP is good at:
Programs and data have the same form
A LISP program is its own parse tree
The EVAL function for LISP is an easy way to chain execution
LISP facilitates the programming of hierarchical structures
LISP is not a special GA language; in my opinion working with hierarchical computer programs is more expressive.
Most programming languages internally convert to a parse tree; XML, and especially XSLT, is akin to LISP in that we have direct access to the 'tree'. Since XSLT is XML, we can easily manipulate computer programs as if they were data, which is important in the genetic operations. Since XSLT is the language for transforming XML, we could use it to transform XSLT programs. In practice there is a performance hit to this approach. In any event, this talk focuses on the strategy, and not the precise implementation method.
There will be reasons why I use Ant revealed later on; for one, this was a natural choice as this talk is the final chapter in the previously mentioned book. Ant is a natural for dealing with lots of files, and as we will be generating lots of XSLT populations and applying transformations and various processes to them, it was a no-brainer. I have been using SAXON from the beginning; it is the only XSLT processor that implements XSLT 2.0.
This type of equivalency problem was chosen to make the prototype's output easy to validate. In addition, we are looking for logical equivalency and are not worried about whitespace at the moment.
* Comes from processing specific xslt individual with source.xml
500 XSLT documents. Going to generate 51 generations.
The generator can be supplied with parameters to define node depth and repeats, supply a random seed, and weight the odds for certain elements or attributes to be generated. It uses a DTD to define allowable elements. As you can see, the example template really does nothing useful; it is typical that starting populations consistently have a low fitness for their ultimate purpose.
I wanted to reduce complexity in my early experiments, so I avoided what I call early taxonomisation.
We indirectly measure the fitness of an XSLT program by checking its output against a desired target XML. The source XML is transformed by each XSLT individual in the population. Best fitness for our purposes is defined as an exact match between result and target XML. Fitness does not have to be the result of a single metric; we could have multiple tests for the fitness of an individual. Source and target XML were supplied as part of the problem formulation.
Note that we have added a <root/> element; this is to ensure that XSLT that returned nothing at least returned a valid XML document with one root node. There were situations where logically the fitness metric was not sufficient for certain special cases; in actuality, having a number of source and target XML documents solved this issue.
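As a rough sketch of such a generator (not the one used for this talk), the Python below builds random stylesheets from a small hard-coded whitelist that stands in for the DTD. The names max_depth, repeats, seed and the WEIGHTS table are assumptions chosen to mirror the options listed above.

```python
import random
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"
ET.register_namespace("xsl", XSL)
VALUE_OF = "{%s}value-of" % XSL

# Hypothetical whitelist standing in for the DTD: element name -> weight.
WEIGHTS = {VALUE_OF: 1, "a": 3, "b": 3, "c": 3, "d": 3}

def random_children(parent, rng, depth, max_depth, repeats):
    """Append up to `repeats` random children, recursing until max_depth."""
    if depth >= max_depth:
        return
    names = list(WEIGHTS)
    odds = [WEIGHTS[n] for n in names]
    for _ in range(rng.randint(0, repeats)):
        name = rng.choices(names, weights=odds)[0]
        child = ET.SubElement(parent, name)
        if name == VALUE_OF:
            child.set("select", ".")  # leaf instruction, no children
        else:
            random_children(child, rng, depth + 1, max_depth, repeats)

def random_stylesheet(rng, max_depth=3, repeats=3):
    """Build one stylesheet with a single template matching the document root."""
    root = ET.Element("{%s}stylesheet" % XSL, {"version": "1.0"})
    template = ET.SubElement(root, "{%s}template" % XSL, {"match": "/"})
    random_children(template, rng, 0, max_depth, repeats)
    return root

def random_population(size, seed, max_depth=3, repeats=3):
    """Generation 0: `size` random stylesheets, reproducible from `seed`."""
    rng = random.Random(seed)
    return [random_stylesheet(rng, max_depth, repeats) for _ in range(size)]
```

Calling random_population(500, seed=1) would give a generation 0 of the size used in the talk; the same seed always yields the same population.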
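To make the idea of a diff-based fitness concrete, here is a minimal sketch that counts the nodes present in one document but not in the other. It is a crude stand-in for the node count of an XML diff patch file, not the diff tool itself: it ignores sibling order and attributes, and the function names are my own.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def node_paths(elem, prefix=""):
    """Yield one (path, text) entry per element, ignoring surrounding whitespace."""
    path = prefix + "/" + elem.tag
    yield (path, (elem.text or "").strip())
    for child in elem:
        yield from node_paths(child, path)

def raw_fitness(result_xml, target_xml):
    """Count nodes found in one document but not the other (0 = exact match)."""
    # Wrap both fragments in <root/> so an empty result is still well-formed XML.
    result = ET.fromstring("<root>" + result_xml + "</root>")
    target = ET.fromstring("<root>" + target_xml + "</root>")
    r, t = Counter(node_paths(result)), Counter(node_paths(target))
    return sum(((r - t) + (t - r)).values())
```

Both arguments are XML fragments without an XML declaration. An individual that outputs nothing scores the full node count of the target rather than failing to parse.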
IBM's is based on some novel thinking, though I have not used it (commercial). Microsoft's is fine and fast.
The same individual can be chosen for multiple operations, any number of times. Individuals with better fitness have a larger slice of the pie, so they will be selected more often. There can be some additional fitness penalties; for example, in generation 0 many XSLT files may be invalid and not process at all.
Raw fitness is a metric in terms of the problem. For example, if you are trying to optimise some business process that sells products, the number of products sold could be the fitness ranking (the more the better). Fitness could be calculated over a series of values and event outcomes, e.g. we could have multiple source and target XMLs and the overall ranking of an individual would be its ranking
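The "slice of the pie" is commonly called roulette-wheel selection. A minimal sketch follows, assuming standardized fitness where 0 is best and using the 1/(1+s) adjustment to turn it into a weight; the function names are illustrative only.

```python
import random

def adjusted(standardized):
    """Map standardized fitness (0 is best) to a weight in (0, 1]."""
    return 1.0 / (1.0 + standardized)

def roulette_select(population, fitnesses, k, rng=random):
    """Pick k individuals with replacement, odds proportional to adjusted fitness."""
    weights = [adjusted(f) for f in fitnesses]
    return rng.choices(population, weights=weights, k=k)
```

A penalty for an invalid stylesheet can be expressed by giving it a very large standardized fitness, which shrinks its slice towards zero without removing it outright.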
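One simple way to combine several fitness cases is to sum the per-case scores. The notes do not say how the cases were actually combined, so the summation below is an assumption, and run and score are hypothetical hooks supplied by the caller.

```python
def overall_fitness(individual, cases, run, score):
    """Sum the per-case raw fitness over all (source, target) pairs.

    run(individual, source) -> result and score(result, target) -> number
    are hypothetical hooks supplied by the caller.
    """
    return sum(score(run(individual, source), target) for source, target in cases)
```

With three source and target pairs, an individual that only solves one special case by accident still scores badly on the other two.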
From the selected population an individual is selected to be perfectly reproduced into the new generation
Normally creates 2 offspring, though in nature this is not the case.
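A minimal sketch of subtree crossover on XML trees, assuming each individual is held as an ElementTree element. The helper names are my own, and this is an illustration of the operation rather than the implementation used in the talk.

```python
import copy
import random
import xml.etree.ElementTree as ET

def child_slots(root):
    """List every (parent, index) position that holds a non-root element."""
    return [(p, i) for p in root.iter() for i in range(len(p))]

def crossover(parent_a, parent_b, rng=random):
    """Swap one randomly chosen subtree between copies of two parents."""
    a, b = copy.deepcopy(parent_a), copy.deepcopy(parent_b)
    slots_a, slots_b = child_slots(a), child_slots(b)
    if slots_a and slots_b:
        (pa, ia), (pb, ib) = rng.choice(slots_a), rng.choice(slots_b)
        pa[ia], pb[ib] = pb[ib], pa[ia]  # exchange the two subtrees
    return a, b
```

The parents are copied first, so they stay available for further selections, and the two offspring together contain exactly the nodes of the two parents.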
Secondary operations tend to speed up convergence towards a solution, though if used too much they will prevent convergence from ever occurring.
Pick a point and randomly mutate. Asexual. In XSLT this must run the XML generator again to obtain a nodeset to augment. A form of crossover.
A random node is selected and its arguments are reorganized. Since ordering in XML is rarely important, this operation has been omitted from our process. Asexual.
If a function has no side effects, is not context dependent, and has only constant atoms as arguments, the editing operation will evaluate that function and replace it with a value. <xsl:if test="true()"> <a></a> </xsl:if> <xsl:value-of select="count(//a)"/> should always return the same amount if the source XML remains the same, so editing would resolve this and replace the xsl instruction with a 1.
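A minimal sketch of mutation on an XML tree: pick a point and replace the subtree there with a freshly generated one. The tiny random_subtree function and the TAGS list are hypothetical stand-ins for re-running the XML generator.

```python
import copy
import random
import xml.etree.ElementTree as ET

TAGS = ["a", "b", "c", "d"]  # hypothetical element vocabulary

def random_subtree(rng, depth=2):
    """Build a small random tree; stands in for re-running the XML generator."""
    node = ET.Element(rng.choice(TAGS))
    if depth > 0:
        for _ in range(rng.randint(0, 2)):
            node.append(random_subtree(rng, depth - 1))
    return node

def mutate(individual, rng=random):
    """Replace one randomly chosen subtree of a copy with a freshly generated one."""
    mutant = copy.deepcopy(individual)
    slots = [(p, i) for p in mutant.iter() for i in range(len(p))]
    if slots:
        parent, index = rng.choice(slots)
        parent[index] = random_subtree(rng)
    return mutant
```

Seen this way, mutation is crossover with a random newcomer instead of a second selected parent.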
Identify useful subtrees by searching high fitness individuals for common subtrees. The effect of encapsulation is that the selected subtree is no longer subject to the potentially disruptive effects of crossover.
Variety in a population drops quickly after generation 0, because GA focuses on marginally better fitness. To improve genetic diversity apply decimation, a set of rules which removes very poor fitness individuals. The example shows a 1 node XSLT, which is indeed very poor for solving our problem. An empty stylesheet is no use to us.
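A minimal sketch of decimation, assuming each individual has already been scored with a standardized fitness where lower is better. The keep_fraction and worst_allowed parameters are illustrative rules, not the ones used in the talk.

```python
def decimate(scored, keep_fraction=0.1, worst_allowed=None):
    """Keep the best share of (individual, standardized_fitness) pairs.

    Lower fitness is better. Individuals scoring above worst_allowed,
    if given, are dropped before the share is taken.
    """
    if worst_allowed is not None:
        scored = [s for s in scored if s[1] <= worst_allowed]
    ranked = sorted(scored, key=lambda s: s[1])
    keep = max(1, int(len(ranked) * keep_fraction)) if ranked else 0
    return ranked[:keep]
```

An empty stylesheet would simply receive a fitness above worst_allowed and fall out before selection starts.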
There are situations where convergence around a single version never occurs
Compiling xslt templates
It's hard to apply genetic operations to languages that do not have any discreteness, like XML has with angle brackets demarcating each instruction. This is why s-boxes and the functional approach were the AI choice, because it was easy to
Harvesting program is found at www.semantic-web.co.uk. WSIL solved UDDI/WSDL umbrella.
SOAP 1.1 would have these HTTP headers:
Content-Type: text/xml
SOAPAction: "http://example.com/ticker"
A SOAP 1.2 message would have the following:
Content-Type: application/soap+xml; action="http://example.com/ticker"
Moving all of the metadata into the one place where it should be is also a good thing.
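For anyone crafting requests by hand, the difference can be captured in a small helper. This is only a sketch; soap_headers is a hypothetical function name, and the action URI is the example one from the slide.

```python
def soap_headers(version, action):
    """Return the HTTP headers that carry the SOAP action for a given version."""
    if version == "1.1":
        # SOAP 1.1: the action travels in its own SOAPAction header.
        return {"Content-Type": "text/xml", "SOAPAction": '"%s"' % action}
    if version == "1.2":
        # SOAP 1.2: the action is a parameter of the media type.
        return {"Content-Type": 'application/soap+xml; action="%s"' % action}
    raise ValueError("unsupported SOAP version: %r" % version)
```

For example, soap_headers("1.2", "http://example.com/ticker") returns a single Content-Type header with the action folded into it.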
<inspection/>: top-level element, defines namespaces used
<service/>: contains a service: referencedNamespace, location of WSDL, UDDI-specific stuff
<description/> and <link/>: may contain other elements, known as extensibility elements
Shows how we can use it with both UDDI and WSDL.
The link element imports more WSIL service definitions.
2 conventions of usage: place inspection.wsil in the root web directory of the web server, or under the current dir of the web service itself, with the root-level WSIL containing links to these encapsulated WSIL docs.
Avoid the 2nd convention of using a meta tag, and use a RDDL doc to describe.
<HEAD>
<META name="serviceInspection" content="localservices.wsil">
<META name="serviceInspection" content="http://www.example.com/calculators.wsil">
<META name="serviceInspection" content="ftp://www.anotherexample.com/translators.wsil">
</HEAD>
An XML schema exists.
Notice the extension mechanism. Very easy to extend: any description or link element can have an extension element.
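To show how little is needed to harvest a WSIL document, here is a minimal sketch that pulls out the WSDL locations and the links to further WSIL documents. The sample document and its URLs are made up, and the inspection namespace URI is the one I recall from WS-Inspection 1.0, so treat it as an assumption to check against the specification.

```python
import xml.etree.ElementTree as ET

WSIL = "http://schemas.xmlsoap.org/ws/2001/10/inspection/"  # assumed namespace
WSDL = "http://schemas.xmlsoap.org/wsdl/"

# Made-up sample document for illustration only.
SAMPLE = """<inspection xmlns="http://schemas.xmlsoap.org/ws/2001/10/inspection/">
  <service>
    <description referencedNamespace="http://schemas.xmlsoap.org/wsdl/"
                 location="http://www.example.com/ticker.wsdl"/>
  </service>
  <link referencedNamespace="http://schemas.xmlsoap.org/ws/2001/10/inspection/"
        location="http://www.example.com/more/inspection.wsil"/>
</inspection>"""

def harvest(wsil_text):
    """Return (WSDL locations, linked WSIL locations) found in one WSIL document."""
    root = ET.fromstring(wsil_text)
    ns = {"w": WSIL}
    wsdl = [d.get("location")
            for d in root.findall("w:service/w:description", ns)
            if d.get("referencedNamespace") == WSDL]
    links = [l.get("location") for l in root.findall("w:link", ns)]
    return wsdl, links
```

A harvester would fetch each linked location in turn and repeat, collecting WSDL locations as it goes.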
Higher-order orchestration standards are striving to become established
Supporting standards for SOA should stabilize by Q2 2004, with heavy commercial uptake for Q4 2004
XML, XSLT, and XPath are successful
XML Schema, RELAX NG and DTD are the primary forms of schema languages
UDDI is struggling to make an impact with developers
There are some key differences, though, between SOA and CORBA/DCOM/RMI that developers and architects are getting confused by.
We are possibly occupying that no man's land between white box reuse and true black box components
Does a car build itself based on a set of criteria? Do we expect it? Nanotechnology...
Allowing problem domain experts to formulate problems assists in direct requirements capture
Will a functional approach be the true path to black box reuse?
In a world of unlimited processing, who cares if a computer program is elegantly constructed?
In a world of unlimited bandwidth, who cares if we use XML as the preferred over-the-wire format?
Successful programmatic methods are useful because they assist in modeling the problem. If that model is then used to generate a million-line program... focus on model-led development
Proof of Concept: SOA Application Composition using the Genetic Algorithm Jim Fuller http://www.ruminate.co.uk http://www.slgchorus.com
Indirectly consume web services via WSDL / UDDI and subsequent generation of stub code
Direct Consumption of SOAP via manual crafting of HTTP Request headers + SOAP envelope
Primary use cases: Integration and Interoperability
Emerging use cases: orchestration, higher level business processes, and automated application composition
MVC type architectures are popular
[Diagram: Client Tier, Presentation Tier, Business Tier, Integration Tier, Resource Tier; Data Repository, XML Binding, Persistence; Model, View, Controller; external web services, internal web services]
WS MVC with the Browser
The Model receives events from the Controller and updates itself, sending data which gets transformed by our view components.
[Diagram: Controller (EventHandler, SOAPEventHandler); Model; View (IE web service client-side processing, XSLT templates, CSS, Global.xml, Global.xsl); Internet Explorer client; HTTP GET, HTTP POST request, HTTP response; internal web services, external web services]
Step 0. Randomly generate the initial population of XSLT documents
Step 1. Evaluate fitness via an XML diff of target.xml against result.xml
Step 2. Select individuals according to their fitness, to be used by step 3
Step 3. Apply primary and secondary genetic operations to generate a new offspring population from the selected individuals
Step 4. Repeat steps 1, 2 and 3 to generate X number of generations
Step 5. Choose the best-fit individual of the last generation
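The steps above can be sketched as one loop. To keep the example self-contained it evolves flat lists of element names towards a fixed target list instead of evolving XSLT documents, so TARGET, fitness and the one-point crossover are toy stand-ins, and the probabilities p_cross and p_mutate are assumed values.

```python
import itertools
import random

TERMINALS = ["a", "b", "c", "d"]
TARGET = ["a", "b", "c", "d"]  # toy stand-in for target.xml

def fitness(ind):
    """Standardized fitness: mismatched positions plus length difference (0 is best)."""
    mismatches = sum(x != y for x, y in zip(ind, TARGET))
    return mismatches + abs(len(ind) - len(TARGET))

def random_individual(rng):
    return [rng.choice(TERMINALS) for _ in range(rng.randint(1, 6))]

def evolve(m=500, g=51, seed=0, p_cross=0.9, p_mutate=0.05):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(m)]      # step 0
    for _ in range(g - 1):                                       # step 4
        scores = [fitness(ind) for ind in population]            # step 1
        cum = list(itertools.accumulate(1.0 / (1.0 + s) for s in scores))

        def pick():                                              # step 2
            return rng.choices(population, cum_weights=cum)[0]

        offspring = []
        while len(offspring) < m:                                # step 3
            if rng.random() < p_cross:                           # recombination
                a, b = pick(), pick()
                child = a[:rng.randint(0, len(a))] + b[rng.randint(0, len(b)):]
            else:                                                # reproduction
                child = list(pick())
            if child and rng.random() < p_mutate:                # mutation
                child[rng.randrange(len(child))] = rng.choice(TERMINALS)
            offspring.append(child or random_individual(rng))
        population = offspring
    return min(population, key=fitness)                          # step 5
```

Calling evolve() uses M=500 and G=51 as in the talk, with generation 0 counted as the first of the 51 generations.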
Objective: Generate an XSLT program that transforms source XML into result XML which is equivalent to target XML
Terminal Set: <a/> <b/> <c/> <d/>
Function Set: Subset of XSLT instructions
Raw fitness: Node count on xmldiff patch file difference between result XML and target XML
Fitness Cases: One fitness case
Standardized fitness: Same as raw fitness, approaching 0 is better fitness
Parameters: M=500, G=51
Seeding mutation with the ws:invoke statement vastly sped up the process
New timeout factors necessary
GA process significantly slowed down due to inclusion of web services
GA process was more effective with better fitness evaluation; e.g. ranking fitness consisted of 3 source and target pairs
Objective: Generate an XSLT program that multiplies 2 numbers, converts to Celsius and returns the number in Chinese
Terminal Set: <a/>, <b/> (2 numbers)
Function Set: Subset of XSLT instructions + ws:invoke
Raw fitness: Node count on xmldiff patch file difference between result XML and target XML
Fitness Cases: Three fitness cases
Parameters: M=1000, G=51