Implementing the Genetic Algorithm in XSLT: PoC
Implementing the Genetic Algorithm in XSLT from 2002

Published in: Technology
  • My day job is internet services manager at a medium-sized UK company that builds proprietary solutions involving web services and XML technologies. My primary interests initially revolved around XSLT and XML technologies. I started EXSLT along with Dave Pawson and Jeni Tennison, and it now enjoys widespread adoption and implementation. I have done quite a bit of technical reviewing and co-authoring for the now-defunct WROX and, as with many of its authors, I have since moved over to their benefactors: I am now writing a tome for APRESS, provisionally titled Ant Gems, which is due to go to print in March next year. I am a typical application developer, having refreshed my skill set as the times have changed.
  • This talk will outline a journey that started with a desire to answer XSLT list members' questions and led to a personal rediscovery of the genetic algorithm. Over the years of answering people's questions on the XSLT list, I noticed that most of them were simple mapping transformations, e.g. "I have this source XML and want to transform it into this target XML". If the user knows the source and target XML, what automated methods could be brought to bear? I won't be talking about REST, not because I don't think it is important or valid; I see both REST and SOAP-based web services existing quite happily together.
  • Specifications are being finalized to handle complex messaging: orchestration, coordination, and routing
  • Due to the lack of widely adopted WS orchestration, composition, and coordination standards, many are finding the MVC architecture approach a good match. Instead of exposing a large variety of web services, expose one controller web service, which shifts the importance to the message body: simple interactions, complex messages. List MVC examples.
  • The SOAP response could be styled by XSLT. This technique lies between typical Web Services and REST.
  • I personally use the Systinet WASP server; it takes care of everything a developer would rather not deal with, especially security. The folks who made WASP have a deep heritage with CORBA. It instantly solves some problems, e.g. stateful web services. "Anchor" is usually used with negative connotations, e.g. the boat anchor antipattern. With many specifications still being worked out, having an anchor in a storm is useful.
  • By integrating Amazon Web Services, Research Services for Microsoft Office System will provide Microsoft Office System users with convenient and seamless access to Amazon.com from within Microsoft productivity applications via the Research Task Pane. Users will be able to access information and make purchases without launching a browser or leaving their document, e-mail message or presentation. For example, a customer reading a bibliography in a Word document could easily click on a book title and purchase it from within the Research Task Pane without having to leave the Word document. Alternatively, a user will be able to add a footnote, bibliography entry and even cover art for books without needing to manually enter the information into a document. The Research Task Pane, a feature in the Microsoft Office 2003 Edition desktop applications (Word, Excel, the Outlook messaging and collaboration client, the PowerPoint presentation graphics program and Access) and in Microsoft Office System products OneNote (TM) note-taking program, Publisher and Visio drawing and diagramming software, uses industry-standard XML to enable users to retrieve and navigate relevant internal or external Web-based information, all from within Office programs. "Amazon.com is breaking new ground in its use of XML-enabled Web Services that connect data from disparate systems, allow greater access to content, and create a more valuable experience for Web users," said Gytis Barzdukas, director of Office Product Management at Microsoft. "By using the advancements of the Microsoft Office System, Microsoft and Amazon.com are transforming the desktop into a dynamic interface for Office customers everywhere." "We are excited to help make this new service available to our customers," said Jeff Barr, Web services technical program manager at Amazon.com. "This Microsoft Office System solution adds significant convenience for our Microsoft users and customers in finding and discovering products. We look forward to receiving feedback from users and to adding more features in the future."
  • Google (file:wsdl, file:wsil). Look for inspection.wsil. Refer to xmethods or well-known UDDI registries. The importance of a human-understandable description of a web service should not be underestimated. What if the human description is in a different language? Is the interface enough for automatic composition methods?
  • With unlimited processing power and network bandwidth, random search is fine. Intelligent software agents must have knowledge of the problem domain, either gained via learning (neural network) or through experts embedding knowledge. As you will find out, the GA does not need any specialist knowledge to solve a problem, and is quicker than random/linear search of large problem domains.
  • This approach is not specific to any problem domain; it can be applied to anything. Different partially effective gene combinations, or "schemata", are searched in a parallel manner. Analogies are good in computing, but can be dangerous and can cloud over some of the more subtle aspects. It may well be that what actually happens in nature is completely irrelevant; it just so happens that for some groups of problems this technique is potentially useful. Just because an analogy 'feels right' does not mean it explains 'how' something works: analogies are good for illustration purposes, not for explanation purposes.
  • where M(H, t) is the number of strings in population 't' with the schema 'H'; f(H) is the average fitness of the strings with the schema 'H'; F is the average fitness of the entire population; p1 is the probability of the schema being destroyed by crossover; and p2 is the probability of the schema being destroyed by mutation. There are many variations.
  • The genetic operations divide into primary and secondary operations.
  • Fitness usually encompasses domain-specific factors. Primary operations: reproduction / recombination. Secondary operations: mutation / editing / encapsulation.
  • I was reviewing a book by WROX called Beginning Databases, and since I was XML through and through I was forced to re-examine the differences between hierarchical and relational data models, etc. Somehow this investigation led me to S BOX structures in LISP, which re-introduced me to the genetic algorithm and the idea of partial schemata being used to solve problems. The XSLT guru David Carlisle probably didn't know it, but he and his lot at XSLT UK caused me to investigate the FP approach using XSLT.
  • LISP symbolic expressions contain lists or atoms and use Polish (prefix) notation. What LISP is good at: programs and data have the same form; a LISP program is its own parse tree; the EVAL function gives an easy way to chain execution; LISP facilitates the programming of hierarchical structures. LISP is not a special GA language; in my opinion, working with hierarchical computer programs is simply more expressive.
  • Most programming languages are internally converted to a parse tree; XML, and especially XSLT, is akin to LISP in that we have direct access to the 'tree'. Since XSLT is XML, we can easily manipulate computer programs as if they were data, which is important in the genetic operations. Since XSLT is the language for transforming XML, we could use it to transform XSLT programs. In practice there is a performance hit to this approach. In any event, this talk focuses on the strategy, and not the precise implementation method.
  • The reasons why I use Ant will be revealed later on; for one, it was a natural choice as this talk is the final chapter in the previously mentioned book. Ant is a natural for dealing with lots of files and, as we will be generating lots of XSLT populations and applying transformations and various processes to them, it was a no-brainer. I have been using SAXON from the beginning; it is the only XSLT processor that implements XSLT 2.0.
  • This type of equivalency problem was chosen to make the prototype's output easy to validate. In addition, we are looking for logical equivalency and are not worried about whitespace at the moment.
  • * Comes from processing specific xslt individual with source.xml
  • A population of 500 XSLT documents; we are going to generate 51 generations.
  • The generator can be supplied with parameters to define node depth and repeats, supply a random seed, and weight the odds for certain elements or attributes to be generated. It uses a DTD to define allowable elements. As you can see, the example template really does nothing useful; it is typical for starting populations to have consistently low fitness for their ultimate purpose.
  • I wanted to reduce complexity in my early experiments, so I avoided what I call early taxonomisation.
  • We indirectly measure the fitness of an XSLT program by checking its output against a desired target XML. The transformation is run with each XSLT individual in the population. Best fitness for our purposes is defined as an exact match between result and target XML. Fitness does not have to be the result of a single metric; we could have multiple tests for the fitness of an individual. Source and target XML were supplied as part of the problem formulation.
  • Note that we have added a <root/> element; this ensures that XSLT which returned nothing at least returned a valid XML document with one root node. There were situations where the fitness metric was logically not sufficient for certain special cases; in practice, having a number of source and target XML documents solved this issue.
  • IBM's is based on some novel thinking, though I have not used it (it is commercial). Microsoft's is fine and fast.
  • The same individual can be chosen for multiple operations, any number of times. Better-fitness individuals have a larger slice of the pie, so they will be selected more often. There can be some additional fitness penalties; for example, in generation 0 many XSLT files may be invalid and not process at all.
  • Raw fitness is a metric in terms of the problem. For example, if you are trying to optimise some business process that sells products, the number of products sold could be the fitness ranking (the more the better). Fitness could be calculated over a series of values and event outcomes, e.g. we could have multiple source and target XML documents, and the overall ranking of an individual would be its ranking across all of them.
  • From the selected population an individual is selected to be perfectly reproduced into the new generation
  • Normally creates 2 offspring, though in nature this is not the case.
  • Secondary operations tend to speed up convergence towards a solution, though if used too much they will prevent convergence from ever occurring.
  • Pick a point and randomly mutate it. Asexual. In XSLT this means running the XML generator again to obtain a node-set to graft in, which makes mutation a form of crossover.
  • A random node is selected and its arguments are reorganized. Since ordering in XML is rarely important, this operation has been omitted from our process. Asexual.
  • If a function has no side effects, is not context dependent, and has only constant atoms as arguments, the editing operation will evaluate that function and replace it with a value. <xsl:if test="true()"> <a></a> </xsl:if> <xsl:value-of select="count(//a)"/> should always return the same amount if the source XML remains the same, so editing would resolve this and replace the XSL instruction with a 1.
  • Identify useful subtrees by searching high fitness individuals for common subtrees. The effect of encapsulation is that the selected subtree is no longer subject to the potentially disruptive effects of crossover.
  • Variety in a population drops quickly after generation 0, because GA focuses on marginally better fitness. To improve genetic diversity apply decimation, a set of rules which removes very poor fitness individuals. The example shows a 1 node XSLT, which is indeed very poor for solving our problem. An empty stylesheet is no use to us.
  • There are situations where convergence around a single version never occurs
  • Compiling xslt templates
  • It's hard to apply genetic operations to languages that do not have any discreteness, such as XML has with angle brackets demarcating each instruction. This is why s-boxes and the functional approach were the AI choice, because it was easy to
  • * Comes from processing specific xslt individual with source.xml
  • Harvesting program is found at wsil solved UDDI/WSDL umbrella
  • SOAP 1.1 would have these HTTP headers: Content-Type: text/xml and SOAPAction: "". A SOAP 1.2 message would have the following: Content-Type: application/soap+xml; action= Moving all of the metadata into the one place where it should be is also a good thing.
  • <inspection/> top level element defines namespaces used <service/> contains a service referencedNamespace, location of wsdl, UDDI specific stuff <description/> and <link/> may contain other elements, known as extensibility elements
  • Shows how we can use it with both UDDI and WSDL. The link element imports more WSIL service definitions. There are 2 conventions of usage: place inspection.wsil in the root web directory of the web server, or under the current directory of the web service itself, with the root-level WSIL containing links to these encapsulated WSIL docs. Avoid the 2nd convention of using a meta tag and use a RDDL doc to describe: <HEAD> <META name="serviceInspection" content="localservices.wsil"> <META name="serviceInspection" content=""> <META name="serviceInspection" content=""> </HEAD>. An XML schema exists.
  • Notice extension mechanism Very easy to extend, any description or link element can have extension element
  • A population of 500 XSLT documents; we are going to generate 51 generations.
  • Higher-order orchestration standards are striving to become established. Supporting standards for SOA should stabilize by Q2 2004, with heavy commercial uptake for Q4 2004. XML, XSLT, and XPath are successful. XML Schema, RELAX NG and DTD are the primary forms of schema languages. UDDI is struggling to make an impact with developers. There are some key differences between SOA and CORBA/DCOM/RMI that developers and architects are getting confused by. We are possibly occupying that no man's land between white-box reuse and true black-box components.
  • Does a car build itself based on a set of criteria ? Do we expect it ? Nano technology ……. Allowing problem domain experts to formulate problems assists in direct requirements capture Will a functional approach be the true path to black box reuse ? In a world of unlimited processing, who cares if a computer program is elegantly constructed ? In a world of unlimited bandwidth who cares if we use XML as the preferred over the wire format ? Successful programmatic methods are useful because they assist in modeling the problem. If that model is then used to generate a million line program…..focus on model-led development
  • Transcript

    • 1. Proof of Concept: SOA Application Composition using the Genetic Algorithm Jim Fuller
    • 2. Introduction
      • Technical Director / Internet Services Manager for Stuart Lawrence Group companies
      • on-IDLE ltd sponsored the 1st XSLT conference in the world, XSLT UK 2001, along with Dave Pawson
      • co-founder of the EXSLT effort, along with Dave Pawson, Jeni Tennison, Uche Ogbuji, et al.
      • Technical reviewer and author for now defunct WROX, on books dealing with XML, XSLT and web services
    • 3. Lecture Overview
      • How we use WS today
      • XSLT and S-expressions
      • Genetic Algorithm refresher
      • Early Genetic Experiments with XSLT
      • Application composition using Genetic Algorithm
      • Conclusions
    • 4. How we use WS in today's applications
      • Indirect consumption of web services via WSDL / UDDI and subsequent generation of stub code
      • Direct Consumption of SOAP via manual crafting of HTTP Request headers + SOAP envelope
      • Primary use cases: Integration and Interoperability
      • Emerging use cases: orchestration, higher level business processes, and automated application composition
    • 5. MVC type architectures are popular
      • [diagram: Client Tier, Presentation Tier, Business Tier, Integration Tier, Resource Tier; Model, View, Controller; Data Repository, XML Binding, Persistence; external web services, internal web services]
    • 6. WS MVC with the Browser
      • Controller: EventHandler, SOAPEventHandler
      • Model: the Model receives events from the Controller and updates itself, sending data which gets transformed by our view components.
      • View: IE web service client-side processing, XSLT templates, CSS, Global.xml, Global.xsl
      • [diagram: Internet Explorer client, HTTP GET / HTTP POST request, HTTP response, internal web services, external web services]
    • 7. SOA Anchor
      • Stability via web service server : BEA Weblogic, IBM Websphere, Systinet WASP, .NET, ColdFusionMX
      • versioning control of web services
      • Easy to deploy same web service through multiple transports
      • Smooth out learning curve for many of the underlying XML technologies ( SAML )
      • security integration with underlying PKI
      • Instant solution to some problems
      • Deploy existing code as web service, no need for ‘special’ web service code embedded in your own code
    • 8. Bazaar not opened yet
      • Currently developers ask how can *I* use them in *my* applications.
      • Web services live behind the firewall and solve integration problems; extraprise.
      • Google, Amazon and Microsoft are all examples of monolithic web services.
      • Many deployed web services are highly specific to a certain problem domain.
      • Who will bind a specific public web service with their precious application ? (Amazon in research pane).
    • 9. The world of ‘millions of web services’
      • The question is not ‘how will a developer find a web service?’ but how will a machine find and use the right web service ?
      • How will the developer/machine know it's the right one? That it's stable, the correct version, and can be trusted…
      • The promise of SOA is real time application composition generating applications or components, based on a set of general evolving criteria
    • 10. Automatic application composition methods
      • One approach, not linked to any problem domain, is to use the Genetic Algorithm…though there are obvious constraints on using these methods
      • Random search of the problem domain
      • AI / intelligent software agent methods
    • 11. Genetic Algorithm Refresher
      • The Genetic Algorithm ( GA ) is a model of the evolution of a population of artificial individuals.
      • Each individual is a chromosome which contains discrete units of information; in computers this can be a string, binary numbers, etc.
      • With each generation the best-fitness individuals are selected for genetic operations to create a new generation
      • The driving force behind the search for new and better solutions is the retention and combination of good partial solutions to a problem
    • 12. Abridged Genetic Algorithm
      • The Fundamental Theorem of Genetic Algorithms
      • M(H, t): number of individuals in population 't' with the schema 'H'.
      • f(H): average fitness of the individuals with the schema 'H'.
      • F: average fitness of the entire population.
      • p1: probability of the schema being destroyed by crossover.
      • p2: probability of the schema being destroyed by mutation.
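
The equation itself did not survive in this transcript. A commonly quoted form of the schema theorem, written with the symbols listed on the slide, is sketched below. Treat it as an assumed reconstruction and not the slide's exact formula: the use of generation index t+1 and the simple additive treatment of p1 and p2 are assumptions, and the speaker notes point out that many variations exist.

```latex
M(H,\, t+1) \;\ge\; M(H,\, t)\,\frac{f(H)}{F}\,\bigl(1 - p_1 - p_2\bigr)
```

Read informally: a schema whose average fitness f(H) is above the population average F, and which is unlikely to be broken up by crossover or mutation, is expected to appear in more individuals in the next generation.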
    • 13. GA operations
      • Reproduction : An individual is perfectly replicated to a new population
      • Crossover ( Recombination ) : Parental material is recombined to create offspring to join new population
      • Mutation : random changes
      • Permutation : reordering
      • Editing : evaluation to a terminal
      • Encapsulation : single indivisible function
      • Decimation : removal of individuals
    • 14. Genetic Programming Process
      • Step 0 . Create a random initial population of individuals
      • Step 1 . Evaluate the fitness of each individual
      • Step 2 . Select individuals according to their fitness, which will participate in generating offspring (moms+dads)
      • Step 3 . Apply primary and secondary genetic operations to generate new offspring population
      • Step 4 . Repeat steps 1, 2 and 3 to generate X number of generations
      • Step 5 . choose best fit individual
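
The steps above can be sketched as a small generic loop. This is a minimal illustration in Python, not the Ant/XSLT prototype from the talk: the names `run_ga`, `init_bits`, `count_zeros` and `one_point` are hypothetical, only reproduction and crossover are applied, lower fitness is treated as better, and counting generation 0 as one of the `generations` is an assumption.

```python
import random


def run_ga(init, fitness, crossover, pop_size=500, generations=51,
           p_crossover=0.90, rng=None):
    """Generic GA loop; fitness is standardized (lower is better)."""
    rng = rng or random.Random()
    # Step 0: create a random initial population (generation 0).
    population = [init(rng) for _ in range(pop_size)]
    # Step 4: repeat steps 1-3; generation 0 counts towards the total (assumption).
    for _ in range(generations - 1):
        # Step 1: evaluate the fitness of each individual.
        scores = [fitness(ind) for ind in population]
        # Step 2: fitter individuals get a larger selection weight.
        weights = [1.0 / (1.0 + s) for s in scores]
        # Step 3: build the offspring population with the primary operations.
        offspring = []
        while len(offspring) < pop_size:
            if rng.random() < p_crossover:
                mom, dad = rng.choices(population, weights=weights, k=2)
                offspring.extend(crossover(mom, dad, rng))
            else:  # reproduction: copy a selected individual unchanged
                offspring.extend(rng.choices(population, weights=weights, k=1))
        population = offspring[:pop_size]
    # Step 5: choose the best-fit individual of the last generation.
    return min(population, key=fitness)


# Toy problem (hypothetical): evolve an 8-bit list with as few zeros as possible.
def init_bits(rng):
    return [rng.randint(0, 1) for _ in range(8)]


def count_zeros(bits):
    return bits.count(0)


def one_point(mom, dad, rng):
    cut = rng.randrange(1, len(mom))
    return [mom[:cut] + dad[cut:], dad[:cut] + mom[cut:]]


best = run_ga(init_bits, count_zeros, one_point,
              pop_size=50, generations=20, rng=random.Random(42))
print(best, count_zeros(best))
```

The toy problem uses bit lists only to keep the sketch short; the loop itself does not care what an individual is, which is the property the talk relies on when the individuals become XSLT documents.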
    • 15. Symbolic expressions and XSLT
      • XSLT list questions: I originally wanted to solve 'I want to transform source XML to target XML using XSLT'. Could use generic templates or some other automated process.
      • Vestigial Lisp memories: S-expressions are similar to XSLT / XML, with data and programming in one
      • XSLT guru David Carlisle's presence at XSLT UK 2001 opened my eyes to functional programming
      • My work with EXSLT defined the limitations of XSLT…which led me to build frameworks to implement complex MVC architectures
    • 16. Simplest Lisp Example
      • (+ (* 2 3) 4) evaluates to 10, and the symbolic expression looks like a tree: [tree diagram: + at the root, with children * and 4; * has children 2 and 3]
      • Hierarchical computer programs are more expressive than manipulating linear strings
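
To make the "program is its own parse tree" point concrete, here is a minimal sketch that stores the slide's expression as nested Python tuples and evaluates it recursively. The tuple encoding and the `evaluate` helper are illustrative assumptions, not anything from the talk.

```python
import operator

# Operators allowed at function nodes (a deliberately tiny, hypothetical set).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(expr):
    """Evaluate a prefix expression stored as (operator, arg, arg, ...)."""
    if not isinstance(expr, tuple):
        return expr  # an atom evaluates to itself
    op, *args = expr
    values = [evaluate(arg) for arg in args]
    result = values[0]
    for value in values[1:]:
        result = OPS[op](result, value)
    return result


tree = ("+", ("*", 2, 3), 4)  # the Lisp form (+ (* 2 3) 4)
print(evaluate(tree))         # 10
```

Because the program is just a nested data structure, swapping or replacing a subtree (as the genetic operations do) is an ordinary data manipulation.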
    • 17. XSLT are also general hierarchical computer programs
      • <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
      • <xsl:template match="a">
      • <d/>
      • <c/>
      • </xsl:template>
      • </xsl:stylesheet>
      • [tree diagram: <xsl:stylesheet/> at the root, containing <xsl:template/>, which contains <d/> and <c/>]
      • There are some differences, e.g. there are a variety of node types within XML
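
The same tree view can be shown for the stylesheet on this slide. The sketch below parses it with Python's standard `xml.etree.ElementTree` and prints it as a nested tuple, mirroring the Lisp form. The `to_sexpr` helper is hypothetical, attributes are ignored for brevity, and the namespace URI is the standard XSLT one.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

STYLESHEET = f"""<xsl:stylesheet xmlns:xsl="{XSL_NS}" version="2.0">
  <xsl:template match="a">
    <d/>
    <c/>
  </xsl:template>
</xsl:stylesheet>"""


def to_sexpr(node):
    """Render an element and its children as (name, child, child, ...)."""
    name = node.tag.replace("{" + XSL_NS + "}", "xsl:")
    return (name, *[to_sexpr(child) for child in node])


program = ET.fromstring(STYLESHEET)
print(to_sexpr(program))
# ('xsl:stylesheet', ('xsl:template', ('d',), ('c',)))
```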
    • 18. Problem definition
      • Create a GA process that will discover an XSLT program which, given a source.xml, generates a target.xml
      • Prototype uses ASF Ant to control the whole process
      • Michael Kay's excellent SAXON XSLT processor; XSLT 2.0 simplified the situation by removing the need to deal with RTFs (result tree fragments) and node-set usage
      • Initially create a simple problem, e.g. that of transforming a source xml into a copy of itself
    • 19. Source XML
      • <a>
      • <b>
      • <c>
      • <d></d>
      • </c>
      • </b>
      • </a>
    • 20. Target XML
      • <a>
      • <b>
      • <c>
      • <d></d>
      • </c>
      • </b>
      • </a>
    • 21. Early Genetic Experiment
      • Step 0 . Randomly generate initial population of xslt documents
      • Step 1 . Evaluate fitness via an XML diff of target.xml against result.xml
      • Step 2 . select individuals according to their fitness which can be used by step 3
      • Step 3 . Apply primary and secondary genetic operations to generate new offspring population from selected individuals
      • Step 4 . Repeat steps 1, 2 and 3 to generate X number of generations
      • Step 5 . choose best fit individual of last generation
    • 22.
      • Objective: generate an XSLT program that transforms source XML into result XML which is equivalent to target XML
      • Terminal set: <a/> <b/> <c/> <d/>
      • Function set: subset of XSLT instructions
      • Raw fitness: node count of the xmldiff patch file describing the difference between result XML and target XML
      • Fitness cases: one fitness case
      • Standardized fitness: same as raw fitness; approaching 0 is better fitness
      • Parameters: M=500, G=51
    • 23. Step 0. Generate Initial Population
      • Used the IBM XML Generator to generate a population of XSLT documents.
      • <?xml version='1.0'?>
      • <!-- Created by IBM XML Generator
      • numberLevels=10, maxRepeats=3, Random seed=1060890913224
      • fixedOdds=1, impliedOdds=4, defaultOdds=4
      • maxIdRefs=3, maxEntities=3, maxNMTokens=3
      • isExplicitRoot=true, root element name is 'xsl:stylesheet'
      • entOdds=1 Entity list:[]
      • doctype declaration?false
      • -->
      • <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      • <xsl:template match="a">
      • <c/>
      • <d/>
      • </xsl:template>
      • </xsl:stylesheet>
    • 24. Avoid ‘early taxonomisation’
      • No attributes
      • No namespaces
      • No schemas
      • Xmlgenerator DTD defines allowable terminals and functions e.g. xsl:apply-templates, xsl:for-each, xsl:value-of, xsl:copy-of, xsl:choose, xsl:if, xsl:copy.
      • used <a>, <b>, <c>, <d> as the only allowable elements
    • 25. Ant: generate_initial_population
      • <target name="generate_initial_population">
      • <tempfile property="temp.file" prefix="xslt_" suffix=".xsl" destdir="${dirs.src}"/>
      • <!-- defines start.TODAY, start.DSTAMP, start.TSTAMP properties //-->
      • <tstamp prefix="start"/>
      • <!-- current population number //-->
      • <property name="xslt.build_number" value="${gen_count}"/>
      • <!-- run the XML generator to write one random XSLT individual to temp.file //-->
      • <java classname="…"
      • fork="true"
      • failonerror="false"
      • output="${temp.file}">
      • <arg value="${xslt.initial_dtd}"/>
      • <arg value="-root"/>
      • <arg value="${xslt.root_node}"/>
      • <arg value="-nodecl"/>
      • <arg value="-f"/>
      • <arg value="1"/>
      • <arg value="-l"/>
      • <arg value="10"/>
      • </java>
      • </target>
    • 26. Step 1: Evaluate Fitness
      • Each individual is ranked by testing the XSLT program against a source XML
      • [diagram: each xslt in the XSLT generation + Source.xml → transformation → result.xml; result.xml + Target.xml → xml diff → evaluate fitness]
    • 27. Step 1. evaluate fitness (cont)
      • Could have chosen multiple source and target xml to use in fitness assessment
      • Output of transformation (result.xml) is xmldiff’ed with target xml
      • I used an extremely simple xml diff tool that just output xml patch
      • Converted Diff patch file into a number, which is the number of nodes contained in the patch file
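
A minimal sketch of the last step, turning a patch file into a number. It assumes that "number of nodes" means element nodes below the `<diff>` root (the slides do not say whether attributes or text count), and it uses a placeholder namespace URI, `urn:example:diff`, because the real one is not preserved in this transcript.

```python
import xml.etree.ElementTree as ET


def raw_fitness(patch_xml):
    """Count the element nodes inside a diff patch; 0 means a perfect match."""
    root = ET.fromstring(patch_xml)
    return sum(1 for _ in root.iter()) - 1  # do not count the <diff> root itself


# An empty patch: result and target are already equivalent.
PERFECT = "<diff/>"

# A patch that has to insert the whole target tree (placeholder namespace URI).
WORST = """<diff xmlns:diff="urn:example:diff">
  <diff:insert dst="1"><a><b><c><d/></c></b></a></diff:insert>
</diff>"""

print(raw_fitness(PERFECT))  # 0
print(raw_fitness(WORST))    # 5
```

With this metric a smaller number is better, which is why the experiment can use raw fitness directly as standardized fitness.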
    • 28. RESULT XML from XSLT individual transformation with SOURCE XML, and the TREEDIFFMERGE DIFFERENCE PATCH for each
      • Result XML: <?xml version="1.0" encoding="utf-8"?><root><a> <b> <c> <d/> </c> </b> </a></root>
      • Difference patch: <?xml version="1.0" encoding="UTF-8"?> <diff />
      • Result XML: <?xml version="1.0" encoding="utf-8"?><root> <a/><a><a><c/><c><a><d/></a><c/></c></a><b><b/><a/><c/><b> <c> <d/> </c> </b></b><a/></a><d><a><c/><a/><a/></a><c/></d><c/> </root>
      • Difference patch: <?xml version="1.0" encoding="UTF-8"?> <diff xmlns:diff=''> <diff:copy src="2" dst="1"> <diff:copy src="16" dst="2" /> </diff:copy> </diff>
      • Result XML: <?xml version="1.0" encoding="UTF-8"?> <root/>
      • Difference patch: <?xml version="1.0" encoding="UTF-8"?> <diff xmlns:diff=''> <diff:insert dst="1"> <a> <b> <c> <d /> </c> </b> </a> </diff:insert> </diff>
    • 29. XML Diff issues
      • Most diff algorithms are based on a paper published in 1976 by J. W. Hunt and M. D. McIlroy, An Algorithm for Differential File Comparison
      • XML is not just text, it has a structure; text-based diff programs do not take this into account
      • simple example: <footie/> versus <footie></footie>; logically these are equal
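
The `<footie/>` example can be checked directly. The sketch below is not the diff tool used in the prototype; it compares canonical forms using `xml.etree.ElementTree.canonicalize` from the Python standard library (available from Python 3.8). The `logically_equal` helper is a hypothetical name, and `strip_text=True` is used on the assumption that whitespace-only differences should be ignored, as in the talk.

```python
import xml.etree.ElementTree as ET


def logically_equal(xml_a, xml_b):
    """Compare two XML documents by canonical form, not by raw text."""
    return (ET.canonicalize(xml_a, strip_text=True)
            == ET.canonicalize(xml_b, strip_text=True))


print("<footie/>" == "<footie></footie>")                 # False: the text differs
print(logically_equal("<footie/>", "<footie></footie>"))  # True: same tree
```

A comparison like this only answers "equal or not"; the prototype needed a tree diff that also says how far apart two documents are, so that fitness can improve gradually.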
    • 30. Ant: transform_src
      • <target name="transform_src">
      • <java classname="net.sf.saxon.Transform"
      • fork="true"
      • failonerror="false"
      • output="${current_xslt_file}.xml">
      • <arg value="${source_xml}"/>
      • <arg value="${current_xslt_file}"/>
      • </java>
      • </target>
    • 31. Ant: fitness_src
      • <target name="fitness_src">
      • <java classname="TreeDiffMerge"
      • fork="true"
      • failonerror="false"
      • output="${current_xslt_file}.fitness.xml">
      • <arg value="-d"/>
      • <arg value="${current_xslt_file}"/>
      • <arg value="${target_xml}"/>
      • </java>
      • </target>
    • 32. Step 2. Select individuals
      • Probabilistic selection to choose which individuals participate in genetic operation
      • Select individuals for genetic operations, based on their fitness [diagram: XSLT population → selected XSLT population]
    • 33. A word on fitness
      • Raw fitness : is the natural representation in terms of the specific problem
      • Standardized fitness : lower the better
      • Adjusted fitness : lies between 0-1
      • Normalized fitness : lies between 0-1 with sum of fitness values = 1
      • In our case the lower the number of ‘different’ nodes the better, use standardized fitness
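
The fitness variants and the probabilistic selection of Step 2 can be sketched together. The formulas below are the usual genetic programming definitions (adjusted = 1 / (1 + standardized), normalized = adjusted divided by the population total); the slides give only the value ranges, so using these exact formulas is an assumption, and the function names are hypothetical.

```python
import random


def adjusted(standardized):
    """Map standardized fitness (0 is best) into the range (0, 1], 1 is best."""
    return [1.0 / (1.0 + s) for s in standardized]


def normalized(standardized):
    """Scale adjusted fitness so the whole population sums to 1."""
    adj = adjusted(standardized)
    total = sum(adj)
    return [a / total for a in adj]


def select(population, standardized, k, rng):
    """Roulette-wheel selection: fitter individuals get a larger slice of the pie.

    Sampling is with replacement, so one individual can be picked many times.
    """
    return rng.choices(population, weights=normalized(standardized), k=k)


scores = [0, 1, 3]           # e.g. patch node counts for three individuals
print(adjusted(scores))      # [1.0, 0.5, 0.25]
print(normalized(scores))
print(select(["x1.xsl", "x2.xsl", "x3.xsl"], scores, 5, random.Random(1)))
```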
    • 34. Step 3. Primary Genetic Operations: Reproduction
      • Individual reproduced into new generation [diagram: selected XSLT population → new generation]
    • 35. Step 3. Primary Genetic Operations: Crossover ( Recombination )
      • Select parents ('Mom' and 'Dad') from the selected XSLT population, then crossover creates 2 offspring for the new generation
    • 36. Step 3. Primary Genetic Operations: Crossover ( Recombination )
      • Swap nodes between selected parent XSLT: 'Mom XSLT' and 'Dad XSLT' produce two 'offspring xslt' for the new generation
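
A minimal sketch of subtree crossover on two small stylesheets. The prototype performed its genetic operations with XSLT itself; this Python version, with the hypothetical helpers `subtree_slots` and `crossover`, only illustrates the node swap. It picks one random non-root node in each parent and exchanges the two subtrees, leaving the parents untouched.

```python
import copy
import random
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
ET.register_namespace("xsl", XSL_NS)


def subtree_slots(root):
    """List every (parent, index) position that holds a non-root element."""
    return [(parent, i) for parent in root.iter() for i, _ in enumerate(parent)]


def crossover(mom, dad, rng):
    """Return two offspring made by swapping one random subtree between copies."""
    kid_a, kid_b = copy.deepcopy(mom), copy.deepcopy(dad)
    slots_a, slots_b = subtree_slots(kid_a), subtree_slots(kid_b)
    if not slots_a or not slots_b:
        return kid_a, kid_b  # a parent with no children has nothing to swap
    (parent_a, i), (parent_b, j) = rng.choice(slots_a), rng.choice(slots_b)
    parent_a[i], parent_b[j] = parent_b[j], parent_a[i]
    return kid_a, kid_b


mom = ET.fromstring(
    f'<xsl:stylesheet xmlns:xsl="{XSL_NS}" version="1.0">'
    '<xsl:template match="a"><c/><d/></xsl:template></xsl:stylesheet>')
dad = ET.fromstring(
    f'<xsl:stylesheet xmlns:xsl="{XSL_NS}" version="1.0">'
    '<xsl:template match="/"><a><b/></a></xsl:template></xsl:stylesheet>')

for kid in crossover(mom, dad, random.Random(5)):
    print(ET.tostring(kid, encoding="unicode"))
```

Nothing here checks that an offspring is still a sensible stylesheet; in the approach described in the talk, invalid or useless offspring simply score poorly at the next fitness evaluation.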
    • 37. Step 3. Secondary Genetic Operations
      • Mutation : is a form of random crossover
      • Permutation : Reorganize nodes
      • Editing : evaluate a set of nodes
      • Encapsulation : takes a branch and replaces with 1 indivisible node
      • Decimation : removes individual based on domain specific criteria
    • 38. Step 3. Secondary Genetic Operations: mutation
      • Pick a node in the 'selected XSLT' and randomly mutate it, giving a completely new set of instructions in the 'offspring xslt'
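
Mutation can be sketched the same way as crossover: pick a random non-root node and replace it with a freshly generated subtree. In the talk the new node-set comes from running the XML generator again; the `random_subtree` function below is a hypothetical stand-in that only builds trees from the terminal set `<a/>`, `<b/>`, `<c/>`, `<d/>`.

```python
import copy
import random
import xml.etree.ElementTree as ET

TERMINALS = ["a", "b", "c", "d"]  # the allowable elements from the experiment


def random_subtree(rng, depth=2):
    """Stand-in for the XML generator: build a small random element tree."""
    node = ET.Element(rng.choice(TERMINALS))
    if depth > 0:
        for _ in range(rng.randint(0, 2)):
            node.append(random_subtree(rng, depth - 1))
    return node


def mutate(individual, rng):
    """Return a copy with one randomly chosen non-root node replaced."""
    child = copy.deepcopy(individual)
    slots = [(parent, i) for parent in child.iter() for i, _ in enumerate(parent)]
    if slots:
        parent, i = rng.choice(slots)
        parent[i] = random_subtree(rng)
    return child


selected = ET.fromstring("<root><a><b/></a><c/></root>")
print(ET.tostring(mutate(selected, random.Random(3)), encoding="unicode"))
```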
    • 39. Step 3. Secondary Genetic Operations: permutation
      • The node order of the 'selected XSLT' is permuted in the 'offspring xslt'
    • 40. Step 3. Secondary Genetic Operations: editing
      • Replace a node of the 'selected XSLT' with its evaluated expression in the 'offspring xslt'
    • 41. Step 3. Secondary Genetic Operations: encapsulation
      • Identify useful subtrees in the 'selected XSLT' and encapsulate them by defining a new function
    • 42. Step 3. Secondary Genetic Operations: decimation
      • Identify very poor fitness individuals and remove them from the population, e.g. <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> </xsl:stylesheet> or <xsl:stylesheet/>
    • 43. Ant: select, perform, and generate new population
      • <target name="select_crossover_population">
      • … .xslt transformation selected crossover using xslt
      • </target>
      • <target name="select_reproduction_population">
      • … .xslt transformation selected reproduction using xslt
      • </target>
      • <target name="perform_genetic_operation">
      • … .genetic operations were performed using xslt
      • </target>
      • <target name="generate_new_generation">
      • … new individuals were copied over to a new directory
      • </target>
    • 44. Step 4. Generate X populations
      • M= 500, g = 51
      • Set initial genetic operation probabilities:
      • 90% crossover on selected individuals
      • 10% reproduction on selected individuals
      • 0% secondary operations on selected individuals
      • Define termination criteria if you want an ongoing process until a desired fitness is obtained.
      • Iterate until done
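The operation probabilities above (90% crossover, 10% reproduction, 0% secondary) amount to a weighted random choice per offspring. The following is a minimal Python sketch of that idea; the dictionary and function names are assumptions for illustration, and the talk itself drove this from Ant properties.

```python
import random

# Probabilities taken from the slide; the key names are hypothetical.
OPERATION_PROBABILITIES = {
    "crossover": 0.90,
    "reproduction": 0.10,
    "secondary": 0.0,
}


def choose_operation(rng, probabilities=OPERATION_PROBABILITIES):
    """Pick one genetic operation, weighted by its configured probability."""
    names = list(probabilities)
    weights = [probabilities[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def plan_generation(pop_size, rng):
    """Count how many offspring each operation would produce for one generation."""
    counts = {name: 0 for name in OPERATION_PROBABILITIES}
    for _ in range(pop_size):
        counts[choose_operation(rng)] += 1
    return counts
```

With M = 500, `plan_generation(500, random.Random())` would typically assign roughly 450 offspring to crossover and 50 to reproduction.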
    • 45. Ant properties
      • <project name="early_genetic_trial" default="build" basedir=".">
      • <!-- setup ant-contrib//-->
      • <property name="ant-contrib.jar" location="c:\java\ant-contrib-0.3.jar"/>
      • <taskdef resource="net/sf/antcontrib/…"
      • classpath="${ant-contrib.jar}"/>
      • <!-- genetic parameters//-->
      • <property name="genetic.pop_size" value="500"/>
      • <property name="genetic.number_of_generations" value="51"/>
      • <property name="gen.reproduction_probability" value=".10"/>
      • <property name="gen.recombinate_probability" value=".90"/>
      • <property name="gen.mutation_probability" value=".0"/>
      • <property name="gen.permuation_probability" value=".0"/>
      • <property name="gen.editing_probability" value=".0"/>
      • <property name="gen.encapsulation_probability" value=".0"/>
      • <property name="gen.decimation_probability" value=".0"/>
      • <!-- xml properties //-->
      • <property name="source_xml" value="c:\genetic\generate_initial_population\source.xml"/>
      • <property name="target_xml" value="c:\genetic\generate_initial_population\target.xml"/>
      • <!-- xslt properties //-->
      • … contained xslt properties
      • <!-- directory properties //-->
      • … contained directory properties
      • <!-- report properties //-->
      • <property name="…" value="C:\java\jakarta-ant-1.5.1\etc\log.xsl"/>
    • 46. Simplified Ant Build Target
      • <target name="build" depends="clean, create">
      • <antcall target="generate_initial_population">
      • <param name="no_of_individuals" value="${genetic.pop_size}"/>
      • </antcall>
      • <antcall target="initiate_genetic_run"/>
      • <antcall target="report"/>
      • <echo message="successfully ran genetic run"/>
      • </target>
    • 47. Results
      • Non-normative results indicate acceptable processing time, e.g. approx. 7 minutes to solve this problem on a PIII with 128 MB RAM
      • For simple mappings this was an effective technique
      • Often the best-fit individuals were poorly performing XSLT; a criterion that timed processing needed to be added to the fitness measure
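One way to fold processing time into fitness is to add a weighted time penalty to the raw score. The sketch below is only an illustration of that idea in Python; `transform`, `raw_fitness` and `seconds_weight` are hypothetical names, and the talk does not say how the timing criterion was actually weighted.

```python
import time


def timed_fitness(transform, source_xml, raw_fitness, seconds_weight=1.0):
    """Return (penalized fitness, elapsed seconds); lower fitness is better.

    transform:   callable that applies one candidate stylesheet to source_xml
    raw_fitness: callable that scores the result against the target (0 = exact)
    """
    start = time.perf_counter()
    result = transform(source_xml)
    elapsed = time.perf_counter() - start
    # Slow but correct candidates rank below fast and correct ones.
    return raw_fitness(result) + seconds_weight * elapsed, elapsed
```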
    • 48. Ruminations
      • Early success with the XSLT approach proved the applicability of GA to XML-based technologies
      • It was easy to let people define a source and target XML
      • Issues of speed and efficiency can be addressed later on
      • How could I involve web services in such a process?
    • 49. GA Strategies to Consider
      • Could apply the genetic algorithm directly in another language, e.g. Java or C#?
      • Leverage the existing XSLT approach and add SOAP as a new function/terminal via an XSLT extension
    • 50. Enhance existing Prototype
      • Augment the XSLT approach and introduce web services into the terminal/function set
      • Needed a local repository of web services to add to the existing function set
      • Needed to enhance XSLT with a generic SOAP XSLT extension which indirectly invokes a web service via its WSDL definition
      • Adjust the generation of the initial population to include the SOAP extension
    • 51. Simple application composition
      • Step 0. Randomly generate an initial population of XSLT documents; this is now a 2-stage process to include web services via the new function
      • Step 1. Evaluate fitness via an XML diff of target.xml against result.xml
      • Step 2. Select individuals according to their fitness, for use by step 3
      • Step 3. Apply primary and secondary genetic operations to generate a new offspring population from the selected individuals
      • Step 4. Repeat steps 1, 2 and 3 to generate X generations
      • Step 5. Choose the best-fit individual of the last generation
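Steps 0 to 5 above form a standard generational loop. The following Python sketch shows the control flow only, assuming lower fitness is better (as with a diff size). The callbacks `generate`, `fitness` and `breed`, the "keep the better half" selection and the copying of the best individual are assumptions for illustration, not the Ant/XSLT pipeline from the talk.

```python
import random


def run_ga(generate, fitness, breed, pop_size, generations, rng):
    """Generational GA loop; returns the best individual and best fitness per generation."""
    # Step 0: randomly generate the initial population.
    population = [generate(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):  # Step 4: repeat steps 1 to 3.
        # Step 1: evaluate fitness and rank (lower is closer to the target).
        ranked = sorted(population, key=fitness)
        history.append(fitness(ranked[0]))
        # Step 2: select individuals (assumption: keep the better half).
        selected = ranked[: max(2, pop_size // 2)]
        # Step 3: the best individual is reproduced unchanged, the rest are bred.
        population = [ranked[0]] + [
            breed(rng.choice(selected), rng.choice(selected), rng)
            for _ in range(pop_size - 1)
        ]
    # Step 5: choose the best-fit individual of the last generation.
    best = min(population, key=fitness)
    history.append(fitness(best))
    return best, history
```

Because the best individual is always carried over, the recorded best fitness can never get worse from one generation to the next; a run could also stop early once it reaches 0.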
    • 52. Web Services Search Engine
      • Long term storage in WSIL format
      • Data was stored in the Xindice XML repository
      • It is accessible via WebDAV and HTTP GET
      • It can be queried using XPath
      • Harvested by a combination of Google, scanning and general web robot techniques
    • 53. Manual Harvesting of Web Services
      • Google ‘file: wsil’ or inspection.wsil
      • Google ‘file: wsdl’
      • Scanning common Application Server ports, sending simple SOAP messages
      • Xmethods and general registries
      • Did not want to bind to either WSDL or UDDI….
    • 54. Simple WSIL example
      • <?xml version="1.0"?>
      • <inspection xmlns="…">
      • <service>
      • <description referencedNamespace="…" location="…" />
      • </service>
      • </inspection>
    • 55. WSIL with 2 services
      • <?xml version="1.0"?>
      • <inspection xmlns="…"
      • xmlns:wsiluddi="…">
      • <service>
      • <abstract>A stock quote service with two descriptions</abstract>
      • <description referencedNamespace="…"
      • location="…"/>
      • <description referencedNamespace="urn:uddi-org:api">
      • <wsiluddi:serviceDescription location="…">
      • <wsiluddi:serviceKey>4FA28580-5C39-11D5-9FCF-BB3200333F79</wsiluddi:serviceKey>
      • </wsiluddi:serviceDescription>
      • </description>
      • </service>
      • <service>
      • <description referencedNamespace="…"
      • location="…"/>
      • </service>
      • <link referencedNamespace="…"
      • location="…"/>
      • </inspection>
      • </inspection>
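Harvested WSIL documents like the ones above can be reduced to a list of service description locations, which is the information the initial population needs. This is a minimal Python sketch that matches elements by local name, so it does not depend on the exact namespace URIs; the function names and the sample URLs in the usage note are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def description_locations(wsil_text):
    """Collect the location attribute of every <description> directly under a <service>."""
    root = ET.fromstring(wsil_text)
    locations = []
    for service in root.iter():
        if local_name(service.tag) != "service":
            continue
        for description in service:
            # Descriptions without a location (e.g. UDDI references) are skipped.
            if local_name(description.tag) == "description" and description.get("location"):
                locations.append(description.get("location"))
    return locations
```

Applied to a two-service document whose descriptions point at `http://example.org/a.wsdl` and `http://example.org/b.wsdl`, the function would return those two URLs in document order and ignore the `<link>` element.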
    • 56. inspection.wsil at XMETHODS
      • <?xml version='1.0' encoding='UTF-8'?>
      • <inspection xmlns='' xmlns:wsiluddi='' xmlns:wsilxmethods=' '>
      • <service>
      • <abstract>Get the Barnes &amp; Noble price by ISBN</abstract>
      • <description referencedNamespace='' location=''/>
      • <description referencedNamespace=''>
      • <wsilxmethods:serviceDetailPage location=''>
      • <wsilxmethods:serviceID>272507</wsilxmethods:serviceID>
      • </wsilxmethods:serviceDetailPage>
      • </description>
      • </service>
      • … ..
      • </inspection>
    • 57. XSLT Generic SOAP client
      • Created an extension function in SAXON, which grew out of a SOAP debugging tool effort (another talk!)
      • Able to invoke a web service via its WSDL and to randomly choose a web service
      • Web service invocation is called during the XSLT transformation
      • Function prototype: ws:invoke(wsdl,methodname,nodeset)
    • 58. Example of using a web service in XSLT
      • <xsl:stylesheet xmlns:xsl="…"
      • xmlns:ws="…"
      • version="2.0">
      • <xsl:template match="a">
      • <xsl:value-of select="ws:invoke('http://somewsdlfile.wsdl','getGUID',a)"/>
      • <b/>
      • </xsl:template>
      • </xsl:stylesheet>
    • 59. Issues
      • Step 0 generation required additional stages to introduce ws:invoke combined with WSIL information
      • Encapsulation was applied to XSLT statements that contained the ws:invoke function, so crossover would not change the statement
      • Always chose the 1st method (in order) in the WSDL
      • Step 0 consistently generated highly unfit programs, which required a larger population size
      • Seeding ws:invoke statements via mutation greatly sped up the process
      • New timeout factors were necessary
      • The GA process slowed down significantly due to the inclusion of web services
      • The GA process was more effective with better fitness evaluation, e.g. ranking fitness over 3 source/target pairs
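The encapsulation rule above (crossover must not change a statement that contains ws:invoke) can be sketched as a restriction on where crossover is allowed to cut. The Python below is an illustration under assumptions: treating any element whose subtree holds a `select` attribute with `ws:invoke(` as indivisible is a hypothetical reading of the slide, and all function names are invented for the example.

```python
import copy
import random
import xml.etree.ElementTree as ET


def is_encapsulated(node):
    """Assumption: a statement is indivisible if it, or anything below it, calls ws:invoke."""
    return any("ws:invoke(" in n.get("select", "") for n in node.iter())


def crossover_points(tree):
    """List (parent, index) slots that may be swapped; never descend into encapsulated nodes."""
    points = []

    def walk(parent):
        for i, child in enumerate(parent):
            points.append((parent, i))  # the node may still move as a whole
            if not is_encapsulated(child):
                walk(child)

    walk(tree)
    return points


def crossover(a, b, rng):
    """Swap one random subtree between copies of two individuals."""
    a, b = copy.deepcopy(a), copy.deepcopy(b)
    points_a, points_b = crossover_points(a), crossover_points(b)
    if points_a and points_b:
        (parent_a, i), (parent_b, j) = rng.choice(points_a), rng.choice(points_b)
        parent_a[i], parent_b[j] = parent_b[j], parent_a[i]
    return a, b
```

An encapsulated statement can still migrate between individuals, but its contents always travel together.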
    • 60. Objective: Generate an XSLT program that multiplies 2 numbers, converts to Celsius and returns the number in Chinese
      • Terminal Set: <a/>, <b/> (2 numbers)
      • Function Set: Subset of XSLT instructions + ws:invoke
      • Raw fitness: Node count on the xmldiff patch file (difference between result XML and target XML)
      • Fitness Cases: three fitness cases
      • Parameters: M=1000, G=51
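Raw fitness here is the size of the difference between result and target XML, summed over the fitness cases. The Python sketch below stands in for that measure using the standard library `difflib` over a flat list of nodes. It is not the xmldiff tool from the talk, and the node serialization and function names are assumptions for illustration.

```python
import difflib
import xml.etree.ElementTree as ET


def node_lines(xml_text):
    """Flatten a document into one 'tag=text' entry per element, in document order."""
    root = ET.fromstring(xml_text)
    return [f"{node.tag}={(node.text or '').strip()}" for node in root.iter()]


def raw_fitness(result_xml, target_xml):
    """Count nodes that would have to be deleted or inserted; 0 means an exact match."""
    matcher = difflib.SequenceMatcher(
        None, node_lines(result_xml), node_lines(target_xml), autojunk=False
    )
    return sum(
        (i2 - i1) + (j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


def total_fitness(transform, cases):
    """Sum raw fitness over (source_xml, target_xml) fitness cases."""
    return sum(raw_fitness(transform(source), target) for source, target in cases)
```

Scoring a candidate against several source/target pairs, as in the three fitness cases above, makes it harder for a stylesheet that merely hard-codes one target document to rank well.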
    • 61. Results
      • Multiply 2 numbers, convert to Celsius and return the result in Chinese: 2 hours on average
      • Tried a variety of more complicated problems, with many runs never converging to a solution; it is apparent that there is not enough 'genetic material' online yet
      • The prototype proved that GA can be applied
      • Assisting the GA always sped up the process
      • Many optimization opportunities
    • 62. Enhancement
      • Could have used Dimitre Novatchev's FXSL, though this would have imposed a pure functional programming (FP) viewpoint on the process
      • Use UDDI as the web services repository
      • Apply GA to Ant or an XML pipeline, or even to BPEL, WS-CAF or any XML vocabulary
      • Prototyping with Ant was successful, but eventually the approach will be embedded in a software framework
    • 63. The Internet as a maturing Software Framework
      • Inheritance versus composition reuse mechanisms
      • Hierarchical versus relational data models
      • Synchronous versus asynchronous
      • Stateful versus stateless
      • Declarative versus OO versus procedural
      • Coarse grained versus RPC versus Object based web services
    • 64. Conclusion
      • In 5 years time will there be advances in hardware processing to make GA techniques viable?
      • Problem domain experts can formulate a representation of a problem to be solved using simple XML
      • Coders become farmers
      • It's counterintuitive to generate a million-line 'messy' program to solve a problem
      • Are there any amendments/changes to key specifications that will assist or restrict the GA method?
      • Thank you, any questions ?
    • 65. References
      • John R. Koza, Genetic Programming, MIT Press, 1992
      • W3C, SOAP Version 1.2
      • W3C, XML Version 1
      • W3C, XSLT Version 2
      • W3C, WSDL Version 1
      • WSIL Version 1
      • J. W. Hunt and M. D. McIlroy, An Algorithm for Differential File Comparison, 1976
      • SAXON XSLT processor by Michael Kay
      • ASF Ant
      • FXSL, Dimitre Novatchev