XML-Motor

A new compact XML algorithm without any dependencies. It's implemented as a RubyGem to provide a non-native XML parser for particular usages. RubyGem at http://rubygems.org/gems/xml-motor and https://github.com/abhishekkr/rubygem_xml_motor

Published in: Technology, Education
Transcript

  • 1. aXML-Motor XML Document Parsing Algorithm
     version 2011.11.04
     Abhishek Kumar ~=ABK=~
     http://github.com/abhishekkr
     http://www.twitter.com/abionic

     Algorithm, Ruby Source and Gem:
     [axml-motor] @GitHub: http://github.com/abhishekkr/axml-motor.git
     rubygems src @GitHub: http://github.com/abhishekkr/rubygem_xml_motor.git
     gem install @RubyGems: http://rubygems.org/gems/xml-motor

     Algorithm Walk-through

     Example XML Content:

     <BODY>
       <DIV id=banner>
         <H1>aXML-Motor</H1>
         <H5>A new algorithm based compact XML Parser with <I>no dependencies</I>. </H5>
       </DIV>
       <DIV id=details>
         <SPAN class=github>@github: <A href=http://github.com/abhishekkr/axml-motor.git> axml-motor</A> </SPAN>
         <DIV class=gem>
           <SPAN id=source class=github>@github: <A href=http://github.com/abhishekkr/rubygem-xml-motor.git>rubygem-xml-motor</A> </SPAN>
           <SPAN class=rubygems>@rubygems: <A href=http://rubygems.org/gems/xml-motor.git>xml-motor</A> </SPAN>
         </DIV>
         <I> Its a new algorithm implemented to build a real compact parser (v0.0.2 has less than 200 ruby source code lines) without any dependencies.</I>
       </DIV>
     </BODY>
  • 2. [Step.1] Split the XML Content

     (1.1) Split by < and store as XMLNodes

     [0] BODY>
     [1] DIV id=banner>
     [2] H1>aXML-Motor
     [3] /H1>
     [4] H5>A new algorithm based compact XML Parser with
     [5] I>no dependencies
     [6] /I>.
     [7] /H5>
     [8] /DIV>
     [9] DIV id=details>
     [10] SPAN class=github>@github:
     [11] A href=http://github.com/abhishekkr/axml-motor.git>axml-motor
     [12] /A>
     [13] /SPAN>
     [14] DIV class=gem>
     [15] SPAN id=source class=github>@github:
     [16] A href=http://github.com/abhishekkr/rubygem-xml-motor.git>rubygem-xml-motor
     [17] /A>
     [18] /SPAN>
     [19] SPAN class=rubygems>@rubygems:
     [20] A href=http://rubygems.org/gems/xml-motor.git>xml-motor
     [21] /A>
     [22] /SPAN>
     [23] /DIV>
     [24] I> Its a new algorithm implemented to build a real compact parser (v0.0.2 has less than 200 ruby source code lines) without any dependencies.
     [25] /I>
     [26] /DIV>
     [27] /BODY>

     (1.2) Split the result of step 1.1 by > and update XMLNodes

     [0] [BODY, ]
     [1] [DIV id=banner, ]
     [2] [H1, aXML-Motor]
     [3] [/H1, ]
     [4] [H5, A new algorithm based compact XML Parser with ]
     [5] [I, no dependencies]
     [6] [/I, .]
     [7] [/H5, ]
     [8] [/DIV, ]
     [9] [DIV id=details, ]
     [10] [SPAN class=github, @github: ]
     [11] [A href=http://github.com/abhishekkr/axml-motor.git, axml-motor]
  • 3. [12] [/A, ]
     [13] [/SPAN, ]
     [14] [DIV class=gem, ]
     [15] [SPAN id=source class=github, @github: ]
     [16] [A href=http://github.com/abhishekkr/rubygem-xml-motor.git, rubygem-xml-motor]
     [17] [/A, ]
     [18] [/SPAN, ]
     [19] [SPAN class=rubygems, @rubygems: ]
     [20] [A href=http://rubygems.org/gems/xml-motor.git, xml-motor]
     [21] [/A, ]
     [22] [/SPAN, ]
     [23] [/DIV, ]
     [24] [I, Its a new algorithm implemented to build a real compact parser (v0.0.2 has less than 200 ruby source code lines) without any dependencies.]
     [25] [/I, ]
     [26] [/DIV, ]
     [27] [/BODY, ]

     (1.3) Split the first element of each line by space/tab. Mark the 1st part as tag_name, then split each remaining part by = to make one key=>value pair per attribute, and update XMLNodes

     [0] [ [BODY, {}], ]
     [1] [ [DIV, {id=>banner}], ]
     [2] [ [H1, {}], aXML-Motor ]
     [3] [ [/H1, {}], ]
     [4] [ [H5, {}], A new algorithm based compact XML Parser with ]
     [5] [ [I, {}], no dependencies]
     [6] [ [/I, {}], .]
     [7] [ [/H5, {}], ]
     [8] [ [/DIV, {}], ]
     [9] [ [DIV, {id=>details}], ]
     [10] [ [SPAN, {class=>github}], @github: ]
     [11] [ [A, {href=>http://github.com/abhishekkr/axml-motor.git}], axml-motor]
     [12] [ [/A, {}], ]
     [13] [ [/SPAN, {}], ]
     [14] [ [DIV, {class=>gem}], ]
     [15] [ [SPAN, {id=>source, class=>github}], @github: ]
     [16] [ [A, {href=>http://github.com/abhishekkr/rubygem-xml-motor.git}], rubygem-xml-motor]
     [17] [ [/A, {}], ]
     [18] [ [/SPAN, {}], ]
     [19] [ [SPAN, {class=>rubygems}], @rubygems: ]
     [20] [ [A, {href=>http://rubygems.org/gems/xml-motor.git}], xml-motor]
     [21] [ [/A, {}], ]
     [22] [ [/SPAN, {}], ]
     [23] [ [/DIV, {}], ]
     [24] [ [I, {}], Its a new algorithm implemented to build a real compact parser (v0.0.2 has less than 200 ruby source code lines) without any dependencies.]
     [25] [ [/I, {}], ]
     [26] [ [/DIV, {}], ]
     [27] [ [/BODY, {}], ]

  • 4. Here, we have the XMLNodes as we wanted them. Now it's time to index them.

     [Step.2] Index the processed XMLNodes

     There are three things involved in indexing XMLNodes:

     Tag_Name: Iterating through all elements of XMLNodes, every element has three components including the Tag Name, which is available at XMLNodes.all[ [TAG_NAMES, *], *]
     Depth: The place/level of the Node in the XML Node Tree, starting from 0.
     Index: The index value of the Node in the XMLNodes Array.

     How to Index-ify?
     There will be one element per Tag_Name, holding a Hash whose keys are the Depths at which the tag is found. Each value is an array of 2*number_of_nodes entries (the starting and ending Index of each Node at that Depth).

     Example: From the above XMLNodes, [DIV] would hold {1=>[1,8, 9,26], 2=>[14,23]}
     Because Tag_Name DIV has the Index sets 1,8 and 9,26 at Depth 1, and the Index set 14,23 at Depth 2.

     Indexed XMLTags for the above processed XMLNodes will be as follows:

     calculated XMLTags
     [BODY] = {0=>[0,27]}
     [DIV] = {1=>[1,8, 9,26], 2=>[14,23]}
     [H1] = {2=>[2,3]}
     [H5] = {2=>[4,7]}
     [I] = {3=>[5,6], 2=>[24,25]}
     [SPAN] = {2=>[10,13], 3=>[15,18, 19,22]}
     [A] = {3=>[11,12], 4=>[16,17, 20,21]}

     [Step.3] Grab My Node from processed XMLNodes using XMLTags

     Now suppose I aim for a Tag_Name XYZ. Look up XMLTags[XYZ], iterate through all of its depths and extract 2 indexes at a time. Each pair of indexes indicates the start and end node; fetch all values within those nodes from XMLNodes.
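     Steps 1 and 2 above can be sketched in Ruby roughly as follows. This is a minimal illustration and not the xml-motor gem's actual source: the method names split_nodes and index_tags are hypothetical, the sample XML is a shortened stand-in for the example document, and the sketch assumes unquoted attribute values without spaces and no self-closing tags, as in the walk-through.

```ruby
# Hypothetical helper names; a sketch of Steps 1 and 2, not the gem's API.

# Step 1: split by "<", then by ">", then separate tag name and attributes.
# Returns entries shaped like [[tag_name, {attr => value}], text].
def split_nodes(xml)
  xml.split("<").reject { |chunk| chunk.strip.empty? }.map do |chunk|
    tag_part, text = chunk.split(">", 2)
    name, *attr_parts = tag_part.split
    attrs = attr_parts.each_with_object({}) do |pair, hash|
      key, value = pair.split("=", 2)
      hash[key] = value
    end
    [[name, attrs], text.to_s]
  end
end

# Step 2: for every tag name, record start/end indexes grouped by depth.
def index_tags(nodes)
  tags = Hash.new { |h, name| h[name] = Hash.new { |d, depth| d[depth] = [] } }
  depth = 0
  nodes.each_with_index do |((name, _attrs), _text), i|
    if name.start_with?("/")
      depth -= 1                      # a closing tag ends the node one level up
      tags[name[1..-1]][depth] << i
    else
      tags[name][depth] << i
      depth += 1                      # children of this node sit one level deeper
    end
  end
  tags
end

# Shortened stand-in for the example document.
xml = "<BODY><DIV id=banner><H1>aXML-Motor</H1></DIV>" \
      "<DIV id=details><DIV class=gem><I>hi</I></DIV></DIV></BODY>"
nodes = split_nodes(xml)
tags  = index_tags(nodes)
p tags["DIV"]   # depth 1 holds [1, 4, 5, 10], depth 2 holds [6, 9]
```

     The flat array of start/end pairs per depth mirrors the XMLTags layout shown above, so the same pairs can be read back two at a time in Step 3.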
  • 5. This will return the set of values held by Tag_Name XYZ.

     Suppose a tree path is given as ABC.XYZ. Start from the top node, ABC, and grab all of its Index-Ranges. Then move on to the lower node and keep only the Indexes that fall within the Index-Ranges of the earlier node. This ends with the filtered set of Indexes for XYZ that fall under the Index-Range of ABC.

     To check for a Tag_Name with an attribute, for every filtered Index-Range, just check whether its start node has the required attribute as a key-value pair.

     Example:

     Case: Grabbing SPAN, with attribute "class=github"
     It's a single node, so grab all its Index-Ranges: (10,13), (15,18) and (19,22).
     Here, just XMLNodes[10] and XMLNodes[15] have the required attribute.
     Now, grab all data from XMLNodes[10][1] to XMLNodes[13-1][1] and from XMLNodes[15][1] to XMLNodes[18-1][1].
     Result:
     [@github: <A href=http://github.com/abhishekkr/axml-motor.git>axml-motor</A> ,
      @github: <A href=http://github.com/abhishekkr/rubygem-xml-motor.git>rubygem-xml-motor</A>]

     Case: Grabbing H5.I
     Top node is H5, grab all its Index-Ranges: (4,7).
     Second node is I, grab all ranges falling within the ranges of the previous node: (5,6).
     Now, grab all data from XMLNodes[5][1] to XMLNodes[6-1][1].
     Result:
     [no dependencies]

     Below, you'll also see that you need not give the entire hierarchy to fetch a descendant from the child tree of any node. Giving just the major scope nodes works as well as providing the exact hierarchy.

     Case: Grabbing DIV.A
     Top node is DIV, grab all its Index-Ranges: (1,8), (9,26) and (14,23).
     Second node is A, grab all ranges falling within the ranges of the previous node: (11,12), (16,17) and (20,21).
     Now, grab all data from XMLNodes[11][1] to XMLNodes[12-1][1], from XMLNodes[16][1] to XMLNodes[17-1][1] and from XMLNodes[20][1] to XMLNodes[21-1][1].
     Result:
     [axml-motor, rubygem-xml-motor, xml-motor]
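     The lookup in Step 3 can be sketched in Ruby along these lines. Again this is only an illustration under stated assumptions, not the gem's implementation: ranges_for, filter_ranges, content_of and grab are hypothetical names, SAMPLE_NODES and SAMPLE_TAGS are a small hand-built fragment in the XMLNodes/XMLTags shapes described above, and sorting the ranges to get document order is a choice made for this sketch.

```ruby
# Hypothetical names throughout; a sketch of Step 3, not the gem's API.

# Hand-built fragment for:
# <DIV><SPAN class=github>one <A href=x>a1</A></SPAN><SPAN class=rubygems>two</SPAN></DIV>
SAMPLE_NODES = [
  [["DIV", {}], ""],
  [["SPAN", { "class" => "github" }], "one "],
  [["A", { "href" => "x" }], "a1"],
  [["/A", {}], ""],
  [["/SPAN", {}], ""],
  [["SPAN", { "class" => "rubygems" }], "two"],
  [["/SPAN", {}], ""],
  [["/DIV", {}], ""]
].freeze
SAMPLE_TAGS = {
  "DIV"  => { 0 => [0, 7] },
  "SPAN" => { 1 => [1, 4, 5, 6] },
  "A"    => { 2 => [2, 3] }
}.freeze

# All [start, stop] pairs of one tag, read two indexes at a time per depth.
def ranges_for(tags, name)
  tags.fetch(name, {}).values.flat_map { |idx| idx.each_slice(2).to_a }
end

# For a path like "DIV.A", keep only the lower tag's ranges that fall
# inside some range of the tag before it.
def filter_ranges(tags, path)
  names = path.split(".")
  names.drop(1).reduce(ranges_for(tags, names.first)) do |outer, name|
    ranges_for(tags, name).select do |lo, hi|
      outer.any? { |olo, ohi| olo < lo && hi < ohi }
    end
  end
end

# Text of the start node plus every node up to stop - 1, with tags re-rendered.
def content_of(nodes, start, stop)
  inner = nodes[(start + 1)...stop].map do |(name, attrs), text|
    "<#{name}#{attrs.map { |k, v| " #{k}=#{v}" }.join}>" + text
  end
  nodes[start][1] + inner.join
end

# attr is an optional "key=value" filter checked on each start node.
def grab(nodes, tags, path, attr = nil)
  ranges = filter_ranges(tags, path).sort
  if attr
    key, value = attr.split("=", 2)
    ranges = ranges.select { |start, _stop| nodes[start][0][1][key] == value }
  end
  ranges.map { |start, stop| content_of(nodes, start, stop) }
end

p grab(SAMPLE_NODES, SAMPLE_TAGS, "SPAN", "class=github")
p grab(SAMPLE_NODES, SAMPLE_TAGS, "DIV.A")
```

     On this fragment the first call yields the github SPAN's content with its inner A tag rebuilt, and the second yields only the text of A, matching the shape of the results in the cases above.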
