Native XML processing in C++ (BoostCon'11)
XML programming has emerged as a powerful data processing paradigm with its own rules for abstracting, partitioning, programming styles, and idioms. Seasoned XML programmers expect, and their productivity depends on, the availability of languages and tools that allow usage of the patterns and practices native to the domain of XML programming. The object-oriented community, however, prefers XML data binding tools over dedicated XML languages because these tools automatically generate a statically-typed, vocabulary-specific object model from a given XML schema. Unfortunately, these tools often sidestep the expectations of seasoned XML programmers because of the difficulties in synthesizing abstractions of XML programming using purely object-oriented principles. This talk demonstrates how this prevailing gap can be significantly narrowed by a novel application of multi-paradigm programming capabilities of C++. In particular, it shows how generic programming, meta-programming, generative programming, strategic programming, and operator overloading supported by C++ together enable native and typed XML programming.

Transcript

  • 1. LEESA: Toward Native XML Processing Using Multi-paradigm Design in C++. May 16, 2011. Dr. Sumant Tambe (Software Engineer, Real-Time Innovations) and Dr. Aniruddha Gokhale (Associate Professor of EECS Dept., Vanderbilt University). www.dre.vanderbilt.edu/LEESA
  • 2. Outline of the talk:
      • XML programming in C++, specifically data binding
      • What XML data binding stole from us!
      • Restoring order: LEESA
      • LEESA by examples
      • LEESA in detail
          • Architecture of LEESA
          • Type-driven data access
          • XML schema representation using Boost.MPL
          • LEESA descendant axis and strategic programming
          • Compile-time schema conformance checking
          • LEESA expression templates
      • Evaluation: productivity, performance, compilers
      • C++0x and LEESA
      • LEESA in future
  • 3. XML Infoset, Cω
  • 4. Facets of XML programming:
      • Type system: regular types, anonymous complex elements, repeating subsequence
      • XML data model: XML information set (infoset), e.g., elements, attributes, text, comments, processing instructions, namespaces, etc.
      • Schema languages: XSD, DTD, RELAX NG
      • Programming languages: XPath, XQuery, XSLT
      • Idioms and best practices: XPath child, parent, sibling, and descendant axes; wildcards
  • 5. Predominant categories & examples (non-exhaustive)
      • DOM API: Apache Xerces-C++, RapidXML, Tinyxml, Libxml2, PugiXML, lxml, Arabica, MSXML, and many more …
      • Event-driven APIs (SAX and SAX-like): Apache SAX API for C++, Expat, Arabica, MSXML, CodeSynthesis XSD/e, and many more …
      • XML data binding: Liquid XML Studio, Code Synthesis XSD, Codalogic LMX, xmlplus, OSS XSD, XBinder, and many more …
      • Boost XML??
          • No XML library in Boost (as of May 16, 2011)
          • Issues: very broad requirements, large XML specifications, good XML libraries exist already, encoding issues, round tripping issues, and more …
  • 6. XML data binding: the process
      [Figure: the XML schema is input to an XML schema compiler (code generator), which generates an object-oriented data access layer as C++ code. That code and the XML query/traversal program, which uses it, are input to the C++ compiler, which generates the executable.]
      • Automatically generate vocabulary-specific classes from the schema
      • Develop application code using generated classes
      • Parse an XML into an object model at run-time
      • Manipulate the objects directly (CRUD)
      • Serialize the objects back to XML
  • 7. Example: Book catalog xml and xsd
      XML:
        <catalog>
          <book>
            <name>The C++ Programming Language</name>
            <price>71.94</price>
            <author>
              <name>Bjarne Stroustrup</name>
              <country>USA</country>
            </author>
          </book>
          <book>
            <name>C++ Coding Standards</name>
            <price>36.41</price>
            <author>
              <name>Herb Sutter</name>
              <country>USA</country>
            </author>
            <author>
              <name>Andrei Alexandrescu</name>
              <country>USA</country>
            </author>
          </book>
        </catalog>
      XSD:
        <xs:complexType name="book">
          <xs:sequence>
            <xs:element name="name" type="xs:string" />
            <xs:element name="price" type="xs:double" />
            <xs:element name="author" maxOccurs="unbounded">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="name" type="xs:string" />
                  <xs:element name="country" type="xs:string" />
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
        <xs:element name="catalog">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="book" type="lib:book" maxOccurs="unbounded">
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
  • 8. Example: Book catalog xsd and generated C++ code (the XSD is the same schema as on slide 7)
        class author {
          private:
            std::string name_;
            std::string country_;
          public:
            std::string get_name() const;
            void set_name(std::string const &);
            std::string get_country() const;
            void set_country(std::string const &);
        };
        class book {
          private:
            std::string name_;
            double price_;
            std::vector<author> author_sequence_;
          public:
            std::string get_name() const;
            void set_name(std::string const &);
            double get_price() const;
            void set_price(double);
            std::vector<author> get_author() const;
            void set_author(std::vector<author> const &);
        };
        class catalog {
          private:
            std::vector<book> book_sequence_;
          public:
            std::vector<book> get_book() const;
            void set_book(std::vector<book> const &);
        };
  • 9. Book catalog application program
      • Example: Find all author names
        std::vector<std::string> get_author_names (const catalog & root)
        {
          std::vector<std::string> name_seq;
          for (catalog::book_const_iterator bi (root.get_book().begin ());
               bi != root.get_book().end ();
               ++bi)
          {
            for (book::author_const_iterator ai (bi->get_author().begin ());
                 ai != bi->get_author().end ();
                 ++ai)
            {
              name_seq.push_back(ai->get_name());
            }
          }
          return name_seq;
        }
      • Advantages of XML data binding
          • Easy to use
          • C++ programming style and idioms
          • Vocabulary-specific API
          • Efficient
          • Type safety
  • 10. We lost something along the way. A lot actually!
      • Loss of succinctness
          • XML child axis replaced by nested for loops
          • Example: Find all author names
      Using XPath (1 line):
        /book/author/name/text()
      Using XML data binding (20 lines):
        std::vector<std::string> get_author_names (const catalog & root)
        {
          std::vector<std::string> name_seq;
          for (catalog::book_const_iterator bi = root.get_book().begin ();
               bi != root.get_book().end ();
               ++bi)
          {
            for (book::author_const_iterator ai = bi->get_author().begin ();
                 ai != bi->get_author().end ();
                 ++ai)
            {
              name_seq.push_back(ai->get_name());
            }
          }
          return name_seq;
        }
  • 11. Loss of expressive power
      • Example: "Find all names recursively"
      • What if catalogs are recursive too!
      • Descendant axis replaced by manual recursion. Hard to maintain.
      Recursive catalog document:
        <catalog>
          <catalog>
            <catalog>
              <catalog>
                <book><name>...</name></book>
                <book><name>...</name></book>
              </catalog>
              <book>...</book>
              <book>...</book>
            </catalog>
            <book>
              <name>...</name>
              <price>...</price>
              <author>
                <name>...</name>
                <country>...</country>
              </author>
            </book>
            <book>...</book>
            <book>...</book>
          </catalog>
        </catalog>
      Using XPath (1 line):
        //name/text()
      Using XML data binding using BOOST_FOREACH (20+ lines):
        std::vector<std::string> get_author_names (const catalog & c)
        {
          std::vector<std::string> name_seq;
          BOOST_FOREACH(const book &b, c.get_book())
          {
            BOOST_FOREACH(const author &a, b.get_author())
            {
              name_seq.push_back(a.get_name());
            }
          }
          return name_seq;
        }
        std::vector<std::string> get_all_names (const catalog & root)
        {
          std::vector<std::string> name_seq(get_author_names(root));
          BOOST_FOREACH (const catalog &c, root.get_catalog())
          {
            std::vector<std::string> names = get_all_names(c);
            name_seq.insert(name_seq.end(), names.begin(), names.end());
          }
          return name_seq;
        }
  • 12. Loss of XML programming idioms
      • Cannot use "wildcard" types
      • Example: Without spelling "Catalog" and "Book", find names that are exactly at the third level.
      Using XPath (1 line):
        /*/*/name/text()
      Using XML data binding:
        std::vector<std::string> get_author_names (const catalog & root)
        {
          std::vector<std::string> name_seq;
          . . .
          . . .
          return name_seq;
        }
      • Also known as structure-shyness
          • Descendant axis and wildcards don't spell out every detail of the structure
      • Casting Catalog to Object class isn't good enough
          • object.get_book(): compiler error!
          • object.get_children(): inevitable casting!
  • 13. Hybrid approach: Pass XPath expression as a string
      Using XML data binding + XPath:
        DOMElement* root (static_cast<DOMElement*> (c._node ()));
        DOMDocument* doc (root->getOwnerDocument ());
        dom::auto_ptr<DOMXPathExpression> expr (
          doc->createExpression (
            xml::string ("//author").c_str (),
            resolver.get ()));
        dom::auto_ptr<DOMXPathResult> r (
          expr->evaluate (
            doc, DOMXPathResult::ITERATOR_RESULT_TYPE, 0));
        while (r->iterateNext ())
        {
          DOMNode* n (r->getNodeValue ());
          author* a (
            static_cast<author*> (
              n->getUserData (dom::tree_node_key)));
          cout << "Name : " << a->get_name () << endl;
        }
      • No universal support
      • Boilerplate setup code: DOM, XML namespaces, memory management
      • Casting is inevitable
      • Look and feel of the two APIs is (vastly) different: iterateNext() vs. begin()/end()
      • Can't use predicates on data outside xml
          • E.g. Find authors of highest selling books: "/book[?condition?]/author/name"
  • 14. Schema-specificity (too much object-oriented bias?)
      • Each class has a different interface (not generic)
      • Naming conventions of XML data binding tools vary
      [Figure: three classes with differently named accessors. Catalog: +get_Book(); Book: +get_Author(), +get_Price(), +get_name(); Author: +get_Name(), +get_Country()]
      • Lost succinctness (axis-oriented expressions)
      • Lost structure-shyness (descendant axis, wildcards)
      • Can't use Visitor design pattern (stateful traversal) with XPath
  • 15. LEESA: Language for Embedded QuEry and TraverSAl. Multi-paradigm Design in C++
  • 16. A book catalog xsd
      • Generated six C++ classes
          • Complex classes: Catalog, Book, Author
          • Simple classes: Price, Country, Name
      • Price, Country, and Name are simple wrappers
      • Catalogs are recursive
        <catalog>
          <catalog>
            <catalog>
              <catalog>...</catalog>
            </catalog>
            <book>
              <name>...</name>
              <price>...</price>
              <author>
                <name>...</name>
                <country>...</country>
              </author>
            </book>
          </catalog>
        </catalog>
      [Figure: class diagram. Catalog (+get_Book(), +get_Catalog()) contains many Books and many Catalogs; Book (+get_Author(), +get_Price(), +get_Name()) contains one Price, one Name, and many Authors; Author (+get_Name(), +get_Country()) contains one Name and one Country.]
  • 17. Restoring succinctness
      • Example: Find all author names
      • Child axis traversal
      [Figure: catalog class diagram, as on slide 16]
      Using XPath (1 line):
        /book/author/name/text()
      Using LEESA (3 lines):
        Catalog croot = load_catalog("catalog.xml");
        std::vector<Name> author_names =
          evaluate(croot, Catalog() >> Book() >> Author() >> Name());
  • 18. Restoring expressive power
      • Example: Find all names recursively
      • Descendant axis traversal
      [Figure: catalog class diagram, as on slide 16]
      Using XPath (1 line):
        //name/text()
      Using LEESA (2 lines):
        Catalog croot = load_catalog("catalog.xml");
        std::vector<Name> names = DescendantsOf(Catalog(), Name())(croot);
      • Fully statically typed execution
      • Efficient: LEESA "knows" where Names are!
  • 19. Restoring xml programming idioms (structure-shyness)
      • Example: Without spelling intermediate types, find names that are exactly at the third level.
      • Wildcards in a typed query!
      [Figure: catalog class diagram, as on slide 16]
      Using XPath (1 line):
        /*/*/name/text()
      Using LEESA (3 lines):
        namespace LEESA { struct Underbar {} _; }
        Catalog croot = load_catalog("catalog.xml");
        std::vector<Name> names =
          LevelDescendantsOf(Catalog(), _, _, Name())(croot);
      • Fully statically typed execution
      • Efficient: LEESA "knows" where Books, Authors, and Names are!
  • 20. User-defined filters
      • Example: Find names of authors from Country == USA
      • Basically unary functors
      • Supports free functions, function objects, boost::bind, C++0x lambda
      [Figure: catalog class diagram, as on slide 16]
      Using XPath (1 line):
        //author[country/text() = 'USA']/name/text()
      Using LEESA (6 lines):
        Catalog croot = load_catalog("catalog.xml");
        std::vector<Name> author_names =
          evaluate(croot,
                   Catalog()
                   >> DescendantsOf(Catalog(), Author())
                   >> Select(Author(), [](const Author &a) { return a.get_Country() == "USA"; })
                   >> Name());
  • 21. Tuplefication!!
      • Example: Pair the name and country of all the authors
      • Result: std::vector of boost::tuple<Name *, Country *>
      [Figure: catalog class diagram, as on slide 16]
      Using XPath:
        ???????????????????????????????
      Using LEESA (5 lines):
        Catalog croot = load_catalog("catalog.xml");
        std::vector<boost::tuple<Name *, Country *> > tuples =
          evaluate(croot,
                   Catalog()
                   >> DescendantsOf(Catalog(), Author())
                   >> MembersAsTupleOf(Author(), make_tuple(Name(), Country())));
  • 22. Using visitors
      • Gang-of-four Visitor design pattern
      • Visit methods for all Elements
      • Example: Visit catalog, books, authors, and names in that order
      • Stateful, statically typed traversal
      • Fixed depth child axis
      [Figure: catalog class diagram, as on slide 16, plus MyVisitor with +visit_Catalog(), +visit_Book(), +visit_Author(), +visit_Name(), +visit_Country(), +visit_Price()]
      [Figure: visit order over a tree of Catalog; Book1, Book2; A1 to A4; C1 to C4, visited level by level]
      Using XPath:
        ???????????????????????????????
      Using LEESA (7 lines):
        Catalog croot = load_catalog("catalog.xml");
        MyVisitor visitor;
        std::vector<Country> countries =
          evaluate(croot,
                   Catalog() >> visitor
                   >> Book() >> visitor
                   >> Author() >> visitor
                   >> Country() >> visitor);
  • 23. Using visitors (depth-first)
      • Gang-of-four Visitor design pattern
      • Visit methods for all Elements
      • Example: Visit catalog, books, authors, and names in depth-first manner
      • Stateful, statically typed traversal
      • Fixed depth child axis
      [Figure: catalog class diagram, as on slide 16, plus MyVisitor with +visit_Catalog(), +visit_Book(), +visit_Author(), +visit_Name(), +visit_Country(), +visit_Price()]
      [Figure: visit order over a tree of Catalog; Book1, Book2; A1 to A4; C1 to C4, visited depth-first]
      Using XPath:
        ???????????????????????????????
      Using LEESA (7 lines). Default precedence. No parenthesis needed.
        Catalog croot = load_catalog("catalog.xml");
        MyVisitor visitor;
        std::vector<Book> books =
          evaluate(croot,
                   Catalog() >> visitor
                   >>= Book() >> visitor
                   >>= Author() >> visitor
                   >>= Country() >> visitor);
  • 24. Visited axes (default precedence; no parenthesis needed)
      • Child axis (breadth-first):   Catalog() >> Book() >> v >> Author() >> v
      • Child axis (depth-first):     Catalog() >>= Book() >> v >>= Author() >> v
      • Parent axis (breadth-first):  Name() << v << Author() << v << Book() << v
      • Parent axis (depth-first):    Name() << v <<= Author() << v <<= Book() << v
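The difference between the breadth-first and depth-first child axes is only the visit order. A minimal sketch, assuming a plain untyped tree (`Node`, `breadth_first_order`, and `depth_first_order` are illustrative names, not LEESA API), of what the two orders look like on the Catalog/Book/Author example:

```cpp
#include <string>
#include <vector>

// Hypothetical untyped tree node, used only to illustrate visit order.
struct Node {
    std::string label;
    std::vector<Node> kids;
};

// Breadth-first (the order ">>" produces on the slides): finish one
// level of the tree before descending to the next.
std::vector<std::string> breadth_first_order(const Node& root) {
    std::vector<std::string> order;
    std::vector<const Node*> level(1, &root);
    while (!level.empty()) {
        std::vector<const Node*> next;
        for (const Node* n : level) {
            order.push_back(n->label);
            for (const Node& k : n->kids) next.push_back(&k);
        }
        level.swap(next);
    }
    return order;
}

// Depth-first (the order ">>=" produces on the slides): visit a node,
// then everything below it, before moving on to its next sibling.
void depth_first_visit(const Node& n, std::vector<std::string>& order) {
    order.push_back(n.label);
    for (const Node& k : n.kids) depth_first_visit(k, order);
}

std::vector<std::string> depth_first_order(const Node& root) {
    std::vector<std::string> order;
    depth_first_visit(root, order);
    return order;
}
```

For a Catalog with Book1 (authors A1, A2) and Book2 (author A3), the first function yields Catalog, Book1, Book2, A1, A2, A3 and the second yields Catalog, Book1, A1, A2, Book2, A3.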
  • 25. Composing named queries
      • Queries can be named, composed, and passed around as executable expressions
      • Example:
          For each book
            print(country of the author)
            print(price of the book)
      [Figure: catalog class diagram, as on slide 16, plus MyVisitor]
      Using XPath:
        ???????????????????????????????
      Using LEESA (6 lines):
        Catalog croot = load_catalog("catalog.xml");
        MyVisitor visitor;
        BOOST_AUTO(v_country, Author() >> Country() >> visitor);
        BOOST_AUTO(v_price, Price() >> visitor);
        BOOST_AUTO(members, MembersOf(Book(), v_country, v_price));
        evaluate(croot, Catalog() >>= Book() >> members);
  • 26. Using visitors (recursively)
      • Hierarchical Visitor design pattern
      • Visit and Leave methods for all elements
      • Depth awareness
      • Example: Visit everything!!
      • Stateful, statically typed traversal
      • Descendant axis = recursive
      • AroundFullTD = AroundFullTopDown
      Using XPath:
        ???????????????????????????????
      Using LEESA (3 lines!!):
        Catalog croot = load_catalog("catalog.xml");
        MyHierarchicalVisitor v;
        AroundFullTD(Catalog(), VisitStrategy(v), LeaveStrategy(v))(croot);
  • 27. What LEESA is not, and what it is
      • LEESA
          1. Is not an xml parsing library (an XML data binding tool can do this)
          2. Does not validate xml files (an XML data binding tool can do this)
          3. Does not replace/compete with XPath
          4. Does not resolve X/O impedance mismatch
              • More reading: "Revealing X/O impedance mismatch", Dr. R Lämmel
      • LEESA
          1. Is a query and traversal library for C++
          2. Validates XPath-like queries at compile-time (schema conformance)
          3. Is motivated by XPath
          4. Goes beyond XPath
          5. Simplifies typed XML programming
          6. Is an embedded DSEL (Domain-specific embedded language)
          7. Is applicable beyond xml (e.g., Google Protocol Buffers, model traversal, hand coded class hierarchies, etc.)
  • 29. The Process
      [Figure: the XML schema is input to an extended schema compiler (code generator), which generates three things: static meta-information, a type-driven data access layer, and an object-oriented data access layer, all as C++ code. LEESA expressions written by programmers (axes-oriented traversal expressions and recursive traversal via strategic programming) use that code and are checked against the static meta-information. Everything is input to the C++ compiler, which generates the executable.]
  • 30. Extended schema compiler = 4 step process
      [Figure: LEESA's gen-meta.py script. Labels in the pipeline: XML schema, schema compiler, object-oriented data access layer C++ (.h), Doxygen, XML, XSLT, meta-data XML, meta-data generator; outputs: static meta-information, type-driven data access layer C++ (.h, .cpp), visitor declarations.]
      • XML schema language (XSD) specification is huge and complex
          • Don't reinvent the wheel: xml data binding tools already process it
      • Naming conventions of xml data binding tools vary
      • Applicability beyond xml data binding
          • E.g. Google Protocol Buffers (GPB), hand written class hierarchies
      • Meta-data generator script inserts visitor declarations in the C++ classes
  • 31. Type-driven data access
      • To fix: different interface of each class
      • Generic API: "children" wrappers to navigate aggregation
      • Generated by the Python script
      • More amenable to composition
        std::vector<Book> children (Catalog &c, Book const *) {
          return c.get_Book();
        }
        std::vector<Catalog> children (Catalog &c, Catalog const *) {
          return c.get_Catalog();
        }
        std::vector<Author> children (Book &b, Author const *) {
          return b.get_Author();
        }
        Price children (Book &b, Price const *) {
          return b.get_Price();
        }
        Name children (Book &b, Name const *) {
          return b.get_Name();
        }
        Country children (Author &a, Country const *) {
          return a.get_Country();
        }
        Name children (Author &a, Name const *) {
          return a.get_Name();
        }
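The point of the uniform `children` overloads is that generic code can ask for "the children of type X" without knowing the accessor name. A minimal sketch, assuming simplified stand-ins for the generated classes (the struct layouts and the helper `count_children` are illustrative, not LEESA or data binding tool output):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for classes a data binding tool would generate.
struct Author {
    std::string name;
    std::string country;
};
struct Book {
    std::string name;
    double price;
    std::vector<Author> authors;
    const std::vector<Author>& get_Author() const { return authors; }
};
struct Catalog {
    std::vector<Book> books;
    const std::vector<Book>& get_Book() const { return books; }
};

// Uniform wrappers: the second parameter is never dereferenced; its
// type alone selects which kind of child is requested.
inline const std::vector<Book>& children(const Catalog& c, const Book*) {
    return c.get_Book();
}
inline const std::vector<Author>& children(const Book& b, const Author*) {
    return b.get_Author();
}

// Generic code names only types, never get_Book() or get_Author().
template <class Child, class Parent>
std::size_t count_children(const Parent& p) {
    return children(p, static_cast<const Child*>(nullptr)).size();
}
```

With these in place, `count_children<Book>(catalog)` and `count_children<Author>(book)` resolve to the right accessor at compile time, and a request such as `count_children<Author>(catalog)` fails to compile because no matching overload exists.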
  • 32. Ambiguity!
      • Simple elements and attributes are mapped to built-in types
      • "children" function overloads become ambiguous
      Schema:
        <xs:complexType name="Author">
          <xs:sequence>
            <xs:element name="first_name" type="xs:string" />
            <xs:element name="last_name" type="xs:string" />
          </xs:sequence>
        </xs:complexType>
      Mapping by gen-meta.py:
        std::string children (Author &a, std::string const *) {
          return a.get_first_name();
        }
        std::string children (Author &a, std::string const *) {
          return a.get_last_name();
        }
  • 33. Solution 1: Automatic schema transformation
      • Force data binding tools to generate unique C++ types
      • gen-meta.py can transform the input xsd while preserving semantics
      Input schema:
        <xs:complexType name="Author">
          <xs:sequence>
            <xs:element name="first_name" type="xs:string" />
            <xs:element name="last_name" type="xs:string" />
          </xs:sequence>
        </xs:complexType>
      After transformation (gen-meta.py):
        <xs:complexType name="Author">
          <xs:sequence>
            <xsd:element name="first_name">
              <xsd:simpleType>
                <xsd:restriction base="xsd:string" />
              </xsd:simpleType>
            </xsd:element>
            <xsd:element name="last_name">
              <xsd:simpleType>
                <xsd:restriction base="xsd:string" />
              </xsd:simpleType>
            </xsd:element>
          </xs:sequence>
        </xs:complexType>
  • 34. Solution 1 limitations: Too many types! Longer compilation times.
      • Solution 2: Generate placeholder types
          • Create unique type aliases using a template and integer literals
          • Not implemented!
      Schema:
        <xs:complexType name="Author">
          <xs:sequence>
            <xs:element name="first_name" type="xs:string" />
            <xs:element name="last_name" type="xs:string" />
          </xs:sequence>
        </xs:complexType>
      Code generation (gen-meta.py):
        namespace LEESA {
          template <class T, unsigned int I>
          struct unique_type
          {
            typedef T nested;
          };
        }
        namespace Library {
          typedef LEESA::unique_type<std::string, 1> first_name;
          typedef LEESA::unique_type<std::string, 2> last_name;
        }
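A minimal sketch of how such placeholder types would remove the overload ambiguity from slide 32. The slide marks this solution as not implemented, so everything here is hypothetical: the `value` member, the `sketch` namespace, and the `Author` layout are assumptions made only so the example compiles.

```cpp
#include <string>
#include <type_traits>

namespace sketch {
// Same shape as the unique_type on the slide, plus a stored value
// (an assumption; the slide shows only the nested typedef).
template <class T, unsigned int I>
struct unique_type {
    typedef T nested;
    T value;
};
}  // namespace sketch

namespace Library {
typedef sketch::unique_type<std::string, 1> first_name;
typedef sketch::unique_type<std::string, 2> last_name;

// Hypothetical Author whose two string members now have distinct types.
struct Author {
    first_name first;
    last_name last;
};

// The two overloads differ in their second parameter, so they no
// longer collide the way two "std::string const *" overloads did.
inline const first_name& children(const Author& a, const first_name*) {
    return a.first;
}
inline const last_name& children(const Author& a, const last_name*) {
    return a.last;
}
}  // namespace Library

// The integer literal is what makes the two aliases different types.
static_assert(!std::is_same<Library::first_name, Library::last_name>::value,
              "distinct integer tags must yield distinct types");
```

The design trade-off named on the slide still applies: each alias is a new template instantiation, but it is far cheaper than a full generated class per simple element.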
  • 35. A key idea in LEESA
      • Externalize structural meta-information using Boost.MPL
      • LEESA's meta-programs traverse the meta-information at compile-time
      [Figure: catalog class diagram, as on slide 16]
        template <class Kind>
        struct SchemaTraits
        {
          typedef mpl::vector<> Children; // Empty sequence
        };
        template <>
        struct SchemaTraits <Catalog>
        {
          typedef mpl::vector<Book, Catalog> Children;
        };
        template <>
        struct SchemaTraits <Book>
        {
          typedef mpl::vector<Name, Price, Author> Children;
        };
        template <>
        struct SchemaTraits <Author>
        {
          typedef mpl::vector<Name, Country> Children;
        };
  • 36. A key idea in LEESA
      • Externalize structural meta-information using Boost.MPL
      • Descendant meta-information is a transitive closure of Children
      [Figure: catalog class diagram, as on slide 16]
        template <class Kind> struct SchemaTraits {
          typedef mpl::vector<> Children; // Empty sequence
        };
        template <> struct SchemaTraits <Catalog> {
          typedef mpl::vector<Book, Catalog> Children;
        };
        template <> struct SchemaTraits <Book> {
          typedef mpl::vector<Name, Price, Author> Children;
        };
        template <> struct SchemaTraits <Author> {
          typedef mpl::vector<Name, Country> Children;
        };
        typedef boost::mpl::true_ True;
        typedef boost::mpl::false_ False;
        template<class A, class D> struct IsDescendant : False {};
        template<> struct IsDescendant<Catalog, Catalog> : True {};
        template<> struct IsDescendant<Catalog, Book> : True {};
        template<> struct IsDescendant<Catalog, Name> : True {};
        template<> struct IsDescendant<Catalog, Price> : True {};
        template<> struct IsDescendant<Catalog, Author> : True {};
        template<> struct IsDescendant<Catalog, Country> : True {};
        template<> struct IsDescendant<Book, Name> : True {};
        template<> struct IsDescendant<Book, Price> : True {};
        template<> struct IsDescendant<Book, Author> : True {};
        template<> struct IsDescendant<Book, Country> : True {};
        template<> struct IsDescendant<Author, Name> : True {};
        template<> struct IsDescendant<Author, Country> : True {};
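The slide lists the `IsDescendant` specializations explicitly. As an illustration of "transitive closure of Children", here is a sketch that derives the same relation from `SchemaTraits` at compile time. It is not LEESA's implementation: it uses C++11 variadic templates instead of Boost.MPL, and `type_list`, `contains`, `any_reaches`, and `is_desc_impl` are hypothetical helper names.

```cpp
#include <type_traits>

// Element kinds are used purely as tags here.
struct Catalog; struct Book; struct Author;
struct Name; struct Price; struct Country;

template <class... Ts> struct type_list {};

// Children meta-information, mirroring the slide (type_list replaces mpl::vector).
template <class Kind> struct SchemaTraits { typedef type_list<> Children; };
template <> struct SchemaTraits<Catalog> { typedef type_list<Book, Catalog> Children; };
template <> struct SchemaTraits<Book>    { typedef type_list<Name, Price, Author> Children; };
template <> struct SchemaTraits<Author>  { typedef type_list<Name, Country> Children; };

// contains<T, List>: is T a member of List?
template <class T, class List> struct contains;
template <class T> struct contains<T, type_list<> > : std::false_type {};
template <class T, class Head, class... Tail>
struct contains<T, type_list<Head, Tail...> >
    : std::integral_constant<bool, std::is_same<T, Head>::value ||
                                   contains<T, type_list<Tail...> >::value> {};

template <class A, class D, class Visited> struct is_desc_impl;

// any_reaches<List, D, Visited>: D is in List or below some member of List.
// Kinds already on the current path (Visited) are not expanded again,
// which is what makes the self-recursive Catalog terminate.
template <class List, class D, class Visited> struct any_reaches;
template <class D, class Visited>
struct any_reaches<type_list<>, D, Visited> : std::false_type {};
template <class Head, class... Tail, class D, class Visited>
struct any_reaches<type_list<Head, Tail...>, D, Visited>
    : std::integral_constant<bool,
          std::is_same<Head, D>::value ||
          std::conditional<contains<Head, Visited>::value,
                           std::false_type,
                           is_desc_impl<Head, D, Visited> >::type::value ||
          any_reaches<type_list<Tail...>, D, Visited>::value> {};

template <class A, class D, class... Visited>
struct is_desc_impl<A, D, type_list<Visited...> >
    : any_reaches<typename SchemaTraits<A>::Children, D,
                  type_list<A, Visited...> > {};

// Public trait: true when D can occur somewhere below A.
template <class A, class D>
struct IsDescendant : is_desc_impl<A, D, type_list<> > {};

static_assert(IsDescendant<Catalog, Country>::value, "Catalog reaches Country");
static_assert(!IsDescendant<Author, Price>::value, "Authors contain no Prices");
```

For this schema the computed trait agrees with the hand-listed specializations on the slide, including `IsDescendant<Catalog, Catalog>`.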
  • 37. Descendant axis: algorithm (conceptual)
        std::vector<Country> countries = DescendantsOf(Catalog(), Country())(croot);
      [Figure: type tree. Catalog has children Book and Catalog; Book has children Name, Author, Price; Author has children Country and Name.]
      1. IsDescendant<Catalog, Country>::value
      2. Find all children types of Catalog:
         SchemaTraits<Catalog>::Children = boost::mpl::vector<Book, Catalog>
      3. Iterate over Boost.MPL vector
      4. IsDescendant<Book, Country>::value
      5. Use type-driven data access on each Catalog:
         std::vector<Book> = children(Catalog&, Book*)
         For Catalogs repeat step (1)
      6. Find all children types of Book:
         SchemaTraits<Book>::Children = boost::mpl::vector<Name, Author, Price>
      7. Iterate over Boost.MPL vector
      8. IsDescendant<Name, Country>::value
      9. IsDescendant<Price, Country>::value
      10. IsDescendant<Author, Country>::value
      11. Use type-driven data access on each Book:
          std::vector<Author> = children(Book&, Author*)
      12. Find all children types of Author:
          SchemaTraits<Author>::Children = boost::mpl::vector<Country, Name>
      13. Repeat until Country objects are found
  • 38. Strategic Programming Paradigm
      • A systematic way of creating recursive tree traversal
      • Developed in 1998 as a term rewriting language: Stratego
      • Why LEESA uses strategic programming
          • Generic: LEESA can be designed without knowing the types in an xml tree
          • Recursive: LEESA can handle mutually and/or self recursive types
          • Reusable: LEESA can be reused as a library for any xsd
          • Composable: LEESA can be extended by its users using policy-based templates
      • Basic combinators
          • Identity, Fail, Sequence, Choice, All, and One
  • 39. Pre-order traversal pseudo-code (fullTopDown)
      Direct recursion:
        fullTD(node)
        {
          visit(node);
          forall children c of node
            fullTD(c);
        }
      The same traversal, with the loop factored out into All:
        fullTD(node)
        {
          visit(node);
          All(node, fullTD);
        }
        All(node, strategy)
        {
          forall children c of node
            strategy(c);
        }
      Recursive traversal (1 out of many):
        fullTD(node)
        {
          seq(node, visit, All(fullTD));
        }
      Basic combinators (2 out of 6):
        seq(node, strategy1, strategy2)
        {
          strategy1(node);
          strategy2(node);
        }
        All(node, strategy)
        {
          forall children c of node
            strategy(c);
        }
  • 40. Sequence + All = FullTD
        template <class Strategy1, class Strategy2>
        class Seq
        {
          template <class Data>
          void operator()(Data d)
          {
            Strategy1(d);
            Strategy2(d);
          }
        };
        template <class Strategy>
        class All
        {
          template <class Data>
          void operator()(Data d)
          {
            foreach T in SchemaTraits<Data>::Children   // Boost.MPL meta-information
              std::vector<T> t = children(d, (T *)0);   // Type-driven data access
              Strategy(t);
          }
        };
        template <class Strategy>
        class FullTD
        {
          template <class Data>
          void operator()(Data d)
          {
            Seq<Strategy, All<FullTD> >(d);
          }
        };
      Note: Objects and constructors omitted for brevity
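The slide's templates are pseudo-code, so here is a runnable sketch of the same composition, Sequence + All = FullTD. It deliberately swaps in a simpler technique: run-time `std::function` strategies over an untyped tree, rather than LEESA's compile-time, schema-driven templates. `Node`, `Strategy`, `seq`, `all`, `full_td`, and `preorder_labels` are illustrative names.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical untyped tree; LEESA works on typed, generated classes instead.
struct Node {
    std::string label;
    std::vector<Node> kids;
};

// A strategy is anything that can be applied to a node.
typedef std::function<void(const Node&)> Strategy;

// Sequence: apply the first strategy, then the second, to the same node.
Strategy seq(Strategy first, Strategy second) {
    return [first, second](const Node& n) { first(n); second(n); };
}

// All: apply a strategy to every immediate child of the node.
Strategy all(Strategy s) {
    return [s](const Node& n) {
        for (const Node& c : n.kids) s(c);
    };
}

// FullTD: visit the node, then recurse into all children (pre-order).
// The recursive strategy is built lazily, inside the lambda, so the
// definition terminates even though it refers to itself.
Strategy full_td(Strategy visit) {
    return [visit](const Node& n) { seq(visit, all(full_td(visit)))(n); };
}

// Usage helper: collect labels in the order FullTD visits them.
std::vector<std::string> preorder_labels(const Node& root) {
    std::vector<std::string> out;
    full_td([&out](const Node& n) { out.push_back(n.label); })(root);
    return out;
}
```

Rebuilding the strategy at every node is wasteful; it keeps the sketch close to the one-line definition `fullTD = seq(visit, All(fullTD))`. A template-based version, as on the slide, resolves the composition at compile time.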
  • 41. Schema-aware traversal
        BOOST_AUTO(prices, DescendantsOf(Catalog(), Price()));
      • LEESA uses FullTopDown<Accumulator<Price>>
      • But schema unaware recursion in every sub-structure is inefficient
      • We know that Authors do not contain Prices
      • FullTD may be inefficient; LEESA's schema-aware traversal is optimal
          • IsDescendant<Catalog, Price> = True
          • IsDescendant<Author, Price> = False
      • Bypass unnecessary sub-structures (Author) using meta-programming
      [Figure: catalog class diagram, as on slide 16]
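A sketch of how the descendant-axis algorithm and this pruning could fit together, under stated assumptions: the element classes are simplified stand-ins, `type_list` replaces Boost.MPL, the `IsDescendant` closure is written by hand as on slide 36, and `collect`, `visit_child_type`, and `descendants_of` are hypothetical names rather than LEESA API.

```cpp
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical simplified element classes.
struct Price   { double value; };
struct Author  { std::string name; };
struct Book    { Price price; std::vector<Author> authors; };
struct Catalog { std::vector<Book> books; std::vector<Catalog> catalogs; };

// Type-driven data access (all overloads return a vector for uniformity).
inline std::vector<Book>    children(const Catalog& c, const Book*)    { return c.books; }
inline std::vector<Catalog> children(const Catalog& c, const Catalog*) { return c.catalogs; }
inline std::vector<Price>   children(const Book& b, const Price*)      { return std::vector<Price>(1, b.price); }
inline std::vector<Author>  children(const Book& b, const Author*)     { return b.authors; }

// Schema meta-information and a hand-written descendant closure.
template <class... Ts> struct type_list {};
template <class K> struct SchemaTraits { typedef type_list<> Children; };
template <> struct SchemaTraits<Catalog> { typedef type_list<Book, Catalog> Children; };
template <> struct SchemaTraits<Book>    { typedef type_list<Price, Author> Children; };

template <class A, class D> struct IsDescendant : std::false_type {};
template <> struct IsDescendant<Catalog, Catalog> : std::true_type {};
template <> struct IsDescendant<Catalog, Book>    : std::true_type {};
template <> struct IsDescendant<Catalog, Price>   : std::true_type {};
template <> struct IsDescendant<Catalog, Author>  : std::true_type {};
template <> struct IsDescendant<Book, Price>      : std::true_type {};
template <> struct IsDescendant<Book, Author>     : std::true_type {};

template <class Target, class Node>
void collect(const Node& n, std::vector<Target>& out);

// Tag-dispatched helpers: the false_type overloads compile to nothing.
template <class Target, class Child>
void append_if(const Child& kid, std::vector<Target>& out, std::true_type) { out.push_back(kid); }
template <class Target, class Child>
void append_if(const Child&, std::vector<Target>&, std::false_type) {}

template <class Target, class Child>
void recurse_if(const Child& kid, std::vector<Target>& out, std::true_type) { collect<Target>(kid, out); }
template <class Target, class Child>
void recurse_if(const Child&, std::vector<Target>&, std::false_type) {}

// Relevant child type: fetch the children, keep targets, recurse where useful.
template <class Target, class Child, class Parent>
void visit_child_type(const Parent& p, std::vector<Target>& out, std::true_type) {
    for (const Child& kid : children(p, static_cast<const Child*>(nullptr))) {
        append_if<Target>(kid, out, std::is_same<Child, Target>());
        recurse_if<Target>(kid, out, IsDescendant<Child, Target>());
    }
}
// Irrelevant child type (e.g. Author when looking for Price): pruned,
// so children() is never even called for it.
template <class Target, class Child, class Parent>
void visit_child_type(const Parent&, std::vector<Target>&, std::false_type) {}

template <class Target, class Parent>
void for_each_child_type(const Parent&, std::vector<Target>&, type_list<>) {}
template <class Target, class Parent, class Head, class... Tail>
void for_each_child_type(const Parent& p, std::vector<Target>& out, type_list<Head, Tail...>) {
    visit_child_type<Target, Head>(p, out,
        std::integral_constant<bool, std::is_same<Head, Target>::value ||
                                     IsDescendant<Head, Target>::value>());
    for_each_child_type<Target>(p, out, type_list<Tail...>());
}

template <class Target, class Node>
void collect(const Node& n, std::vector<Target>& out) {
    for_each_child_type<Target>(n, out, typename SchemaTraits<Node>::Children());
}

// Entry point, loosely analogous to DescendantsOf(Root(), Target())(root).
template <class Target, class Root>
std::vector<Target> descendants_of(const Root& root) {
    std::vector<Target> out;
    collect<Target>(root, out);
    return out;
}
```

Calling `descendants_of<Price>(root)` walks Catalogs and Books only; the Author branch is discarded at compile time because `IsDescendant<Author, Price>` is false.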
  • 42. LEESA has compile-time schema conformance checking
      • LEESA queries compile only if they agree with the schema
      • Uses externalized schema and meta-programming
      • Error message using BOOST_MPL_ASSERT
      • Tries to reduce long and incomprehensible error messages
      • Shows assertion failures in terms of concepts
          • ParentChildConcept, DescendantKindConcept, etc.
          • Originally developed for C++0x concepts
      • Examples
          1. BOOST_AUTO(prices, DescendantsOf(Author(), Price()));
             DescendantKindConcept failure
          2. BOOST_AUTO(books, Catalog() >> Book() >> Book());
             ParentChildConcept failure
          3. BOOST_AUTO(countries, LevelDescendantsOf(Catalog(),_,Country()));
             LevelDescendantKindConcept failure
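The idea of rejecting a query that disagrees with the schema can be sketched with C++11 `static_assert`, which slide 51 names as a C++0x improvement for error reporting. This is not LEESA's BOOST_MPL_ASSERT machinery; `IsChild` and `ChildStep` are hypothetical names, and the element kinds are empty tag types.

```cpp
#include <type_traits>

// Element kinds used as tags.
struct Catalog {}; struct Book {}; struct Author {}; struct Price {};

// Parent/child facts of the schema, written by hand for the sketch.
template <class Parent, class Child> struct IsChild : std::false_type {};
template <> struct IsChild<Catalog, Book> : std::true_type {};
template <> struct IsChild<Book, Author>  : std::true_type {};
template <> struct IsChild<Book, Price>   : std::true_type {};

// One step of a child-axis query. Instantiating it with a pair that is
// not in the schema stops compilation with a readable message.
template <class Parent, class Child>
struct ChildStep {
    static_assert(IsChild<Parent, Child>::value,
                  "ParentChildConcept failure: Child is not a child of Parent in the schema");
};

// ChildStep<Catalog, Book> ok;   // compiles: Book is a child of Catalog
// ChildStep<Book, Book> bad;     // would not compile: Book is not a child of Book
```

The second commented line corresponds to example 2 on the slide, `Catalog() >> Book() >> Book()`, where the last step asks for Books inside a Book.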
  • 43. Country is at least 2 "steps" away from a Catalog
        LevelDescendantsOf(Catalog(),_,Country());
      Compiler output:
        1>------ Build started: Project: library, Configuration: Release Win32 ------
        1> driver.cxx
        1> using native typeof
        1>C:mySVNLEESAincludeLEESA/SP_Accumulation.cpp(112): error C2664: boost::mpl::assertion_failed : cannot convert parameter 1 from boost::mpl::failed ************LEESA::LevelDescendantKindConcept<ParentKind,DescendantKind,SkipCount,Custom>::* *********** to boost::mpl::assert<false>::type
        1> with
        1> [
        1> ParentKind=library::Catalog,
        1> DescendantKind=library::Country,
        1> SkipCount=1,
        1> Custom=LEESA::Default
        1> ]
        1> No constructor could take the source type, or constructor overload resolution was ambiguous
        1> driver.cxx(155) : see reference to class template instantiation LEESA::LevelDescendantsOp<Ancestor,Descendant,SkipCount,Custom> being compiled
        1> with
        1> [
        1> Ancestor=LEESA::Carrier<library::Catalog>,
        1> Descendant=LEESA::Carrier<library::Country>,
        1> SkipCount=1,
        1> Custom=LEESA::Default
        1> ]
        1>C:mySVNLEESAincludeLEESA/SP_Accumulation.cpp(112): error C2866: LEESA::LevelDescendantsOp<Ancestor,Descendant,SkipCount,Custom>::mpl_assertion_in_line_130 : a const static data member of a managed type must be initialized at the point of declaration
        1> with
        1> [
        1> Ancestor=LEESA::Carrier<library::Catalog>,
        1> Descendant=LEESA::Carrier<library::Country>,
        1> SkipCount=1,
        1> Custom=LEESA::Default
        1> ]
        1> Generating Code...
        ========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
  • 44. (Nearly) all LEESA queries are expression templates
      • Hand rolled. Not using Boost.Proto
        Catalog() >> Book() >> Author() >> Name()
      [Figure: expression tree, numbered in order of execution.
        3: ChainExpr( 2, GetChildren<Author, Name> )
        2: ChainExpr( 1, GetChildren<Book, Author> )
        1: ChainExpr( Catalog, GetChildren<Catalog, Book> )
        The left operand of each ChainExpr supplies LResultType.]
        template <class L, class H>
        ChainExpr<L, GetChildren<typename ExpressionTraits<L>::result_type, H> >
        operator >> (L l, H h)
        {
          typedef typename ExpressionTraits<L>::result_type LResultType;
          typedef GetChildren<LResultType, H> GC;
          return ChainExpr<L, GC>(l, h);
        }
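To make the expression-template idea concrete, here is a minimal runnable sketch in the spirit of the `operator >>` above. It is an assumption-laden simplification, not LEESA's code: queries are spelled with a hypothetical `Step<K>` tag instead of the element classes themselves, a nested `result_kind` typedef stands in for `ExpressionTraits`, and the classes are stand-ins for generated ones.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for generated data binding classes.
struct Author  { std::string name; };
struct Book    { std::vector<Author> authors; };
struct Catalog { std::vector<Book> books; };

// Type-driven data access.
inline const std::vector<Book>&   children(const Catalog& c, const Book*)   { return c.books; }
inline const std::vector<Author>& children(const Book& b,    const Author*) { return b.authors; }

// Step<K> names an element kind in a query; evaluated alone it is the identity.
template <class K>
struct Step {
    typedef K result_kind;
    std::vector<K> operator()(const std::vector<K>& in) const { return in; }
};

// Maps a sequence of parents to the concatenation of their children.
template <class Parent, class Child>
struct GetChildren {
    typedef Child result_kind;
    std::vector<Child> operator()(const std::vector<Parent>& parents) const {
        std::vector<Child> out;
        for (const Parent& p : parents) {
            const std::vector<Child>& kids = children(p, static_cast<const Child*>(nullptr));
            out.insert(out.end(), kids.begin(), kids.end());
        }
        return out;
    }
};

// Composition node: run the left expression, feed its result to the right one.
template <class L, class R>
struct ChainExpr {
    typedef typename R::result_kind result_kind;
    L left;
    R right;
    template <class In>
    std::vector<result_kind> operator()(const In& in) const { return right(left(in)); }
};

// Building the expression does no traversal; it only records types.
template <class L, class H>
ChainExpr<L, GetChildren<typename L::result_kind, H> > operator>>(L l, Step<H>) {
    typedef GetChildren<typename L::result_kind, H> GC;
    return ChainExpr<L, GC>{l, GC()};
}

// Evaluation applies the composed function object to the root.
template <class Root, class Expr>
std::vector<typename Expr::result_kind> evaluate(const Root& root, const Expr& query) {
    return query(std::vector<Root>(1, root));
}
```

With this sketch, `evaluate(root, Step<Catalog>() >> Step<Book>() >> Step<Author>())` returns every author of every book. A step that the schema does not allow, such as `Step<Catalog>() >> Step<Author>()`, builds an expression but fails to compile when evaluated, because no `children(const Catalog&, const Author*)` overload exists.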
  • 45. (Nearly) all LEESA queries are expression templates
      • Hand rolled. Not using Boost.Proto
      • Every LEESA expression becomes a unary function object
      • LEESA query = systematically composed unary function objects
        Catalog() >>= Book() >> Author() >> Name()
      [Figure: expression tree. The top-level ChainExpr combines (1) Catalog with (2) DepthFirstGetChildren<Catalog, Book>; nested under (2), a ChainExpr (2b) combines a ChainExpr (2a) of Book and GetChildren<Book, Author> with GetChildren<Author, Name>.]
  • 47. Reduction in boilerplate traversal code
      • Results from the 2009 paper in the Working Conference on Domain-Specific Languages, Oxford, UK
      • 87% reduction in traversal code
  • 48. Run-time performance
      • CodeSynthesis xsd data binding tool on the catalog xsd
      • Abstraction penalty from construction, copying, and destruction of internal containers (std::vector<T> and LEESA::Carrier<T>)
      • GNU Profiler: Highest time spent in std::vector<T>::insert and iterator dereference functions
      • 33 seconds for parsing, validating, and object model construction
      [Figure: run-time comparison chart including a "data binding" series; values not preserved in the transcript]
  • 49. Compilation time affects programmer productivity
      • Experiment
          • An XML schema containing 300 types (4 recursive)
          • gcc 4.5 (with and without variadic templates)
      [Figure: compilation-time chart including a "data binding" series; values not preserved in the transcript]
  • 50. Experiment: Total time to build an executable from an xsd on 4 compilers
      • XML schema containing 300 types (4 recursive)
      • 5 LEESA expressions (all using descendant axis)
      • Tested on Intel Core 2 Duo 2.67 GHz, 4 GB laptop
      [Figure: build-time chart. The transcript preserves only unlabeled values: 79, 44, 18, 15, 54, 126, 112, 101, 60, 95, 95, 95, 95]
  • 51. C++0x and LEESA
      • Readability improvements
          • Lambdas! LEESA actions (e.g., Select, Sort) can use C++0x lambdas
          • static_assert for improved error reporting
          • auto for naming LEESA expressions
      • Performance improvements (run-time)
          • Rvalue references and move semantics
          • Optimize away internal copies of large containers
      • Performance improvements (compile-time)
          • Variadic templates: faster schema conformance checking
          • No need to use BOOST_MPL_LIMIT_VECTOR_SIZE and Boost.Preprocessor tricks
      • Simplifying LEESA's implementation
          • Trailing return-type syntax and decltype
          • Right angle bracket syntax
  • 52. LEESA in future
      • Become a part of the Boost libraries!?
      • Extend LEESA to support
          • Google Protocol Buffers (GPB)
          • Apache Thrift
          • Or any "schema-first" data binding in C++
      • Better support from data binding tools?
      • Parallelization on multiple cores
          • Parallelize query execution on multiple cores behind LEESA's high-level declarative programming API
      • Co-routine style programming model
          • LEESA expressions return containers
          • Expression to container: expensive!
          • Expression to iterator: cheap!
          • Compute result only when needed (lazy)
      • XML literal construction
          • Checked against schema at compile-time
  • 53. LEESA: Native XML Processing Using Multi-paradigm Design in C++
      [Figure: concept map.
        XML programming concerns: representation and access to richly-typed hierarchical data; traversal (up, down, sideways), either statically fixed depth (breadth-first, depth-first) or structure-shy; static schema conformance checking.
        C++ multi-paradigm solution: object-oriented programming, generative programming, metaprogramming, generic programming, strategic programming.]