This is an MSc project dissertation on a multi-threaded XML parser written in C++, presented at the University of Oxford for the degree of MSc in Software Engineering.
The dissertation covers the elements of program optimization (such as branch and cache optimization, and the principle of locality), the elements of program concurrency (concurrent objects, lock-free synchronization, and speedup), and concurrency design patterns.
These notions are applied to design a SAX-compliant XML parser that uses concurrency to speed up parsing. The parser is written in modern C++, which offers a comprehensive library for writing multi-threaded applications.
This document provides an overview of working with JSON and XML. It discusses handling AJAX requests using jQuery methods like load(), get(), post(), and ajax(). It then covers identifying and working with JSON, including its syntax and structure. XML is introduced and differences between JSON and XML are highlighted. The document also demonstrates reading and writing data from JSON objects using JSON.parse() and JSON.stringify() methods in JavaScript.
This document outlines the history and evolution of ASP.NET, including the initial release of Active Server Pages (ASP) in 1996, ASP.NET in 2002, ASP.NET MVC in 2008, and ASP.NET Web Pages in 2010. It also mentions additional releases and features added in 2012, 2014, and beyond, such as ASP.NET Web API, SignalR, and ASP.NET 5.
Oracle Application Express (APEX) ships with several JavaScript libraries, jQuery being the best known of them. On top of these libraries, the APEX Development Team created their own. You have probably used a couple of these APIs already, like $s, $v, etc.
But there are many more, and some of them are extremely useful. First, however, you have to be aware that they exist, and secondly you have to know how to use them properly.
This session will cover the most valuable JavaScript API's with some real world examples.
Most developers stick to the standard $s and $v functions, often without knowing that there is also a $v2, and that $s can take more parameters.
The focus will be on the namespaced apex APIs, like apex.server.process and apex.event.trigger.
Vancouver style for publishing in scientific journals (Paola De Castro)
The document describes the Vancouver style, which provides international guidelines for bibliographic citation and the publication of scientific documents. The Vancouver style arose from a 1978 meeting of medical editors in Vancouver and is now maintained by the International Committee of Medical Journal Editors. It covers ethical aspects such as authorship, peer review, conflicts of interest, and the protection of patients and animals in research, and is now a widely adopted standard.
This document introduces the fundamental syntactic elements of the Delphi language. It describes tokens such as identifiers, numbers, character strings, and labels, and explains how these tokens combine to form expressions, declarations, and statements. It also defines Delphi's reserved words, directives, and comments, and describes the syntax for declaring and using these language elements.
Introduction to graphs and their ability to represent images (Anyline)
This document introduces graphs and their ability to represent images. It defines what a graph is and different types of graphs. It discusses common data structures used to store graphs like adjacency lists, adjacency matrices, and incidence matrices. It then covers how graphs can represent images by having vertices represent pixels and edges represent pixel adjacency. Region adjacency graphs and dual graphs are introduced as ways to represent image regions and relationships. Combinatorial maps are presented as another way to represent graphs and implicitly store dual graphs. Finally, applications of using graphs for image segmentation are mentioned.
Finally, in JavaScript 2015 (ES2015) we get two new built-in data structures that make our lives a little easier. In this lecture, we will explore various implementations of common data structures in JavaScript using Arrays, Objects, and the ES2015 additions: Maps and Sets.
This document discusses Oracle Application Express themes and templates. It provides an overview of Apex 4.0 improvements to themes, describes how to manage themes and customise templates, and explains common substitution variables used in templates. Template types and classes are defined. The document also discusses alternative approaches to styling and references data dictionary views related to themes and templates.
With the advent of asyncio, the Python community started to build new, performant web frameworks and servers for asynchronous backends. Out of this effort, the ASGI specification emerged as a successor to WSGI.
In this talk, we will take a closer look into the details of this new specification and consider its implementation in the uvicorn server and Starlette framework.
PHP is a server-side scripting language that can be embedded into HTML. It is used to dynamically generate client-side code sent as the HTTP response. PHP code is executed on the web server and allows variables, conditional statements, loops, functions, and arrays to dynamically output content. Key features include PHP tags <?php ?> to delimit PHP code, the echo command to output to the client, and variables that can store different data types and change types throughout a program.
REPRESENTATION OF RELATIONS AND DIGRAPHS IN THE COMPUTER (David Hernandez)
Representation of relations and digraphs in the computer.
Third semester, Systems and Computing Engineering.
Universidad Del Quindio, Armenia, 2014
The document explains the Huffman coding algorithm, which assigns variable-length binary codes to symbols based on their frequency of occurrence. First, a Huffman tree is built by ordering the symbols from lowest to highest frequency and merging the nodes. The binary code for each symbol is then obtained by traversing the tree from root to leaf. This allows text strings to be encoded more compactly.
Demystifying Docker & Kubernetes
The document provides an overview of container networking standards and models including Docker's Container Network Model (CNM) and Kubernetes' Container Networking Interface (CNI). It discusses Docker networking drivers like bridge, overlay, and host networking. It also covers Kubernetes networking fundamentals like pods, services, ingress, and network policies. The agenda includes a dive into CNM and CNI standards as well as examples of container networking in Docker and Kubernetes.
The document presents an entity-relationship model for a flower shop, describing entities such as branches, office clerks, florists, arrangements, and orders, and the relationships between them. The entities are connected through primary and foreign keys, and the relationships can be one-to-one, one-to-many, or many-to-many.
This presentation gives a detailed review of the features delivered as standard, in preview, and in incubation in the JDK 21 LTS release, coming from the OpenJDK projects:
- Amber: Record Patterns, Pattern Matching for switch, String Templates, Unnamed Patterns and Variables, Unnamed Classes and Instance Main Methods
- Loom: Virtual Threads, Structured Concurrency, Scoped Values
- and Panama: Foreign Function & Memory API, Vector API
as well as features in the Java Core APIs that are not covered by a JEP, and in the HotSpot JVM.
This document provides an overview of ASP.NET Core 1.0 and discusses its evolution from previous ASP.NET technologies. It covers the ASP.NET architecture, Model-View-Controller pattern, ASP.NET MVC and Web API project templates, tag helpers, consuming Web APIs, and using JavaScript frameworks with ASP.NET Core.
This document provides an introduction to the Java programming language. It discusses that Java is an object-oriented programming language used to write computer programs. It also describes the basic elements of the Java language including commands, variables, data types, control statements, and functions/methods. Additionally, it explains that the basic building block of Java is the class, and that a Java program or application consists of multiple classes organized into packages.
This thesis examines machine learning approaches using Hadoop in the cloud. It implements a distributed machine learning infrastructure in the cloud without dependence on distributed file systems or shared memory. This infrastructure learns and configures a distributed network of learners. The results are then filtered, fused and visualized. The thesis also develops a machine learning infrastructure using Python and compares the two approaches. It uses real-world immigration and GDP datasets from a government database to test the frameworks. The cloud-based approach is able to scale to petabytes of data with minimal configuration.
This document discusses applying machine learning techniques including text retrieval, association rule mining, and decision tree learning using R. It introduces the movie review dataset and preprocessing steps like removing stopwords and stemming. Text retrieval is performed to create a document-term matrix from the reviews. Association rules are generated from a sample of negative reviews using the Apriori algorithm. Decision trees are built on the combined document-term matrix and sentiment labels to classify review sentiment.
This document discusses issues with processing large volumes of data and proposes an enterprise data warehouse architecture capable of handling big data. It aims to explain integrating Hadoop into existing data warehouses.
The first chapter introduces challenges of increased data volume, variety and velocity. It discusses skill shortages in big data and analytics. Existing data warehouses are built for reporting but not analyzing large, unaggregated data.
The second chapter outlines requirements for a new architecture and proposes a multi-platform data warehouse environment incorporating Hadoop. It describes Hadoop components like HDFS, YARN, Hive and tools like Sqoop.
The third chapter focuses on integrating Hadoop into existing data warehouses by implementing star schemas in Hive, addressing security,
Python code for Artificial Intelligence: Foundations of Computational Agents (ADDI AI 2050)
This document summarizes Python code for artificial intelligence foundations and computational agents. It describes Python as a language well-suited for AI due to its readability and efficiency. The document provides instructions for downloading and installing Python and related libraries. It also outlines some key Python features for AI like lists, tuples, sets, dictionaries and list comprehensions and warns of potential pitfalls around side effects.
This document provides an overview of a group's work on the Build and Deployment (B&D) subproject of the Giraf project over 4 sprints. The group worked to improve the development environment, mapped dependencies between apps and libraries, changed an app into a standalone library, managed Google Play and Google Analytics, spearheaded a renaming process, and maintained the product backlog as product owners. Their efforts focused on streamlining the build process and deployment of the Giraf apps to make the project ready for future semesters.
The document is a master's thesis titled "Automated ANTLR Tree Walker Generation" that describes research into automatically generating tree walkers from ANTLR parser specifications. The thesis introduces ANTLRTG, an extension to ANTLR that allows a tree walker to be generated automatically from an ANTLR parser specification. The author developed an algorithm using a tree pattern algebra to determine a tree walker that can parse all possible trees generated by a given parser. ANTLRTG has been implemented and demonstrated through a case study implementing a compiler for the Triangle programming language.
This document provides a biography and overview of a textbook about programming on parallel machines by Norm Matloff. It discusses that the book focuses on practical parallel programming using platforms like OpenMP, CUDA and MPI. It is aimed at students who are reasonably proficient in programming and linear algebra. The book uses examples in C/C++ and R to illustrate fundamental parallelization principles.
This document summarizes a project that implements function call parallelism within the LLVM compiler framework. The project analyzes serial programs at compile time and automatically adds parallelism by running certain function calls in separate threads while speculatively continuing the main thread. This speculation is made safe using software transactional memory to roll back threads if memory conflicts occur between threads. The implementation finds suitable functions and call sites, parallelizes the calls using pthreads and STM, and includes a merging procedure to enforce correct commit ordering. Evaluation shows the implementation provides performance gains of up to 3.5x on some benchmarks.
Thesis - Nora Szepes - Design and Implementation of an Educational Support Sy... (Nóra Szepes)
This document describes the design and implementation of a new educational support system portal and thin client. It discusses the specification phase where user requirements were gathered. The Mithril JavaScript framework was chosen for implementing the student client module. The design follows a Model-View-Controller pattern. Testing was done using Cucumber, Zombie and Istanbul to validate the design and implementation.
This document provides a programmer's guide and reference for the SPiiPlus C library version 6.50. The guide describes how to use the C library to communicate with SPiiPlus motion controllers over various communication channels like serial, Ethernet, and PCI. It gives an overview of the library concepts and functions. Key functions allow opening communications, sending and receiving data, performing transactions with the controller, and closing connections. Revision details are provided for version 6.50.
This document provides a programmer's guide and reference for the SPiiPlus C library version 6.50. The guide begins with an introduction and overview of the library, describing its operation environment, communication capabilities, controller simulation support, and key features. It then covers using the library, including building applications, redistributing files, and registering the kernel mode driver. The bulk of the document is a reference for the C library functions, organized into sections for communication functions and service communication functions.
This document is a doctoral thesis that examines bringing more intelligence to the web and beyond through semantic web technologies. It discusses the motivation for more intelligent web applications and provides an overview of semantic web technologies and languages. It then presents the H-DOSE semantic platform and its logical architecture for semantic resource retrieval. Several case studies that implemented the H-DOSE platform are also described. The thesis concludes with discussions on related works and potential future directions.
This master's thesis document outlines a proposed social networking web app called "Go Green" that aims to promote environmentally friendly behaviors through gamification. The document provides background on relevant topics like gamification, recommender systems, social networks and carbon footprint analysis. It then describes the proposed "Go Green" concept and contributions, including an overview, use case diagram, entity relationship diagram, proposed game elements and design. Evaluation methods and future work are also discussed. The goal of "Go Green" is to motivate green behaviors through a gamified social app that provides personalized recommendations and tracks users' environmental impact.
This thesis describes the development of a language, toolchain, and experimental evaluation for the SubLeqXorShr (SLXS) single-instruction processor architecture. An assembly-like language was created to facilitate programming for the SLXS. A toolchain including a macro assembler and simulator was implemented to compile and run SLXS code. Two algorithms (AES and PRESENT) were implemented in the SLXS language to evaluate the toolchain. The AES implementation required a similar number of clock cycles to a lower-level version, showing the language achieves the same efficiency. While the PRESENT implementation had a sevenfold increase in cycles compared to a microcontroller, the SLXS architecture's higher clock frequency means results are promising for its use in
A multi-threaded XML parser in C++ (MSc project dissertation)
PXML
A SAX-compliant parallel XML parser
A dissertation for the degree of MSc in Software Engineering
By David Kabongo Tshiany
Supervised by
Dr. Niki Trigoni
May 2013
Kellogg College
University of Oxford
Parallel XML parser - 2 -
I dedicate this work to my dear wife Maguy,
and to our children Gaël, Ryan and Nathan,
for their patience.
Declaration
Except as acknowledged, attributed and referenced, I declare that this dissertation is my own
unaided work.
Abstract
XML is a standardized and widely adopted markup language designed for data
exchange and storage. To use data from an XML document, applications typically
need an XML parser. The parser is responsible for reading the XML file or
stream and providing the XML data and structure to the application. Many
programming APIs and frameworks for processing XML files exist; among the most
used are DOM and SAX.
In the last few years, the trend in the computer industry has been to increase the number of
processors (or cores within a processor) in computers rather than the processing speed. This
fact marks a fundamental change in software design and application development: whenever
applicable, software engineers should design programs in ways that explicitly exploit the
multiple processing resources available.
This dissertation presents the design and implementation of a SAX-compliant XML
parser. The goal is certainly faster parsing, but instead of counting on
sequential processing speed increases, the parser achieves an overall speedup
by using multiple threads to read the same XML document concurrently.
Contents
1 Introduction ....................................................................................... 7
1.1 Motivation ...................................................................................................................7
1.1.1 Computer design trend...........................................................................................7
1.1.2 The need for faster XML parsing............................................................................8
1.1.3 Parallel parsing at the rescue.................................................................................9
1.2 Objectives.................................................................................................................10
1.2.1 Throughput ...........................................................................................................10
1.2.2 Concurrency support ............................................................................................11
1.2.3 Scalability .............................................................................................................11
1.3 Challenges ...............................................................................................................11
1.3.1 Synchronization and speedup ..............................................................................12
1.3.2 Programming languages support of concurrency.................................................12
1.4 Organization of the thesis.........................................................................................13
2 Background ..................................................................................... 15
2.1 The extensible markup language (XML) ..................................................................15
2.1.1 XML document representations............................................................................15
2.1.2 XML text and logical structure ..............................................................................15
2.1.3 XML tree concepts................................................................................................17
2.1.4 Well-formedness and validation constraints.........................................................18
2.1.5 XML standard and compliance.............................................................................19
2.1.6 Character's encoding, BOM and Unicode standards............................................19
2.2 XML processing........................................................................................................20
2.2.1 The Document Object Model (DOM)....................................................................20
2.2.2 A Simple API for XML processing (SAX)..............................................................21
2.2.3 SAX specification and language binding ..............................................................22
2.3 Elements of program optimization............................................................................23
2.3.1 Computer organization and its evolution ..............................................................23
2.3.2 Computer program performance goals.................................................................24
2.3.3 Branch optimization..............................................................................................24
2.3.4 Cache optimization...............................................................................................25
2.3.5 Principle of Locality and Common case ...............................................................26
2.4 Elements of program concurrency............................................................................26
2.4.1 Threads and cores................................................................................................26
2.4.2 Synchronization and concurrent objects...............................................................27
2.4.3 Lock-free and lock-based synchronization ...........................................................28
2.4.4 Speedup and thread concurrency ........................................................................29
2.5 Design patterns ........................................................................................................30
2.5.1 Strategy pattern....................................................................................................30
2.5.2 Observer pattern...................................................................................................31
2.5.3 Active object pattern.............................................................................................31
2.5.4 Monitor pattern .....................................................................................................31
2.5.5 Thread pool pattern ..............................................................................................32
2.5.6 Thread-local storage pattern ................................................................................32
2.6 Putting it all together.................................................................................................32
3 Design and implementation ........................................................... 34
3.1 Introduction...............................................................................................................34
3.2 Fundamental concept...............................................................................................35
3.2.1 Scanner types and chunk allocation.....................................................................35
3.2.2 Parsing properties and parsing modes.................................................................37
3.2.3 From bytes to SAX events....................................................................................38
3.3 Class relationship and interaction ............................................................................39
3.3.1 SAX classes .........................................................................................................39
3.3.2 PXML classes.......................................................................................................39
3.3.3 Concurrency classes ............................................................................................41
3.3.4 Class interaction and scanning loops...................................................................41
3.4 Implementation of SAX classes................................................................................43
3.4.1 Characters, String and C++ binding .....................................................................43
3.4.2 XMLReaderImpl class ..........................................................................................44
3.5 Implementation of PXML classes .............................................................................50
3.5.1 XmlTranscoder class............................................................................................50
3.5.2 TranscoderUtf8 class............................................................................................50
3.5.3 XmlScanner class.................................................................................................51
3.5.4 ChunkingScanner and ParsingScanner classes ..................................................52
3.6 Implementation of concurrency classes ...................................................................55
3.6.1 ChunkContext class..............................................................................................55
3.6.2 ThreadSafeQueue class.......................................................................................56
3.6.3 ThreadPool class..................................................................................................58
3.7 PXmlCount test program ..........................................................................................60
4 Evaluation ........................................................................................ 61
4.1 Evaluation objectives................................................................................................61
4.1.1 Speedup and elapsed time...................................................................................61
4.1.2 Performance optimization metrics ........................................................................61
4.2 Measurement collection ...........................................................................................62
4.2.1 Profiler and test program......................................................................................62
4.2.2 Accuracy and elapsed time ..................................................................................63
4.2.3 Test files ...............................................................................................................64
4.2.4 Test platforms.......................................................................................................64
4.3 Measurement results................................................................................................65
4.3.1 Observing parsing speed improvement with the PXmlCount program.................65
4.3.2 Observing parsing speed improvement with the Intel VTune Amplifier................67
4.4 Speedup evaluation..................................................................................................69
4.4.1 EnhancementFraction ..........................................................................................69
4.4.2 ImprovementRatio ................................................................................................69
4.4.3 Speedup ...............................................................................................................70
4.5 Hotspots and bottlenecks location............................................................................71
4.5.1 Hotspots ...............................................................................................................71
4.5.2 Synchronization bottleneck...................................................................................72
4.5.3 Memory and processor bottlenecks......................................................................73
4.6 Effects of parsing properties on performance...........................................................74
4.6.1 Pool configuration.................................................................................................74
4.6.2 Chunking depth ....................................................................................................75
4.6.3 Siblings per chunk ................................................................................................76
5 Reflection and conclusions............................................................ 78
5.1 PXML library integration to XML projects .................................................................78
5.2 Lessons learned .......................................................................................................78
5.3 Conclusion................................................................................................................79
5.4 Further research directions ......................................................................................80
5.4.1 Dynamic reconfiguration of parsing properties.....................................................80
5.4.2 Parsing based on XML schema............................................................................81
5.4.3 Lock-free synchronization.....................................................................................81
6 Bibliography .................................................................................... 82
7 Appendices...................................................................................... 85
7.1 PXmlCount program.................................................................................................85
7.1.1 PXmlSpinLock.hpp ...............................................................................................85
7.1.2 PXmlSpinLock.cpp ...............................................................................................86
7.1.3 PXmlCountHandler.hpp........................................................................................87
7.1.4 PXmlCountHandler.cpp (Part I)............................................................................88
7.1.5 PXmlCountHandler.cpp (Part II)...........................................................................89
7.1.6 PXmlCount.cpp (Part I).........................................................................................90
7.1.7 PXmlCount.cpp (Part II)........................................................................................91
7.2 PXML Character and String .....................................................................................92
7.2.1 XmlChar class ......................................................................................................92
7.2.2 XmlBuffer class.....................................................................................................93
7.3 XmlScanner states enumeration ..............................................................................94
7.4 ThreadJoiner and ChunkTask classes .....................................................................95
7.5 Other XMLReaderImpl methods...............................................................................96
1 Introduction
This chapter discusses the motivation behind this work; it presents the thesis objectives and
explains the challenges around them. Finally, it describes the thesis organization for the
remaining chapters.
1.1 Motivation
1.1.1 Computer design trend
Traditionally, computer performance depended mostly on CPU clock speed
increases, execution optimization and refinements in memory organization [1 p.
665]. A faster clock speed, a new CPU optimization technique or a better memory
model meant a 'de facto' performance increase for a computer system and all its
programs.
Today that free benefit is over¹; for a decade now there has been a fundamental change in the
computer industry that has pushed processor designers toward a new approach to
improving computer performance: placing multiple processors on the same chip [2 p.
344].
One of the reasons for this turnaround was that designers could not increase the
processor clock speed further due to physical limitations: principally, the high density of
micro-components, which hinders power dissipation, and the interconnect wires, which cause
RC delay (the resistance R increases because the wires shrink in size, and the
capacitance C increases because they get closer to each other) [2 p. 19].
Although recent research indicates that these limitations can be overcome with
the arrival of better alternatives to silicon, such as graphene [3], there is
an ultimate limit to the processor clock speed: the speed of light [4].
The evolution of the memory hierarchy and caching techniques has led to placing
part of the computer memory right next to the processor [1 p. 674]. The
CPU and memory now influence each other so strongly that optimizing the
performance of one of them in isolation becomes impractical; typically, the memory speed
lags behind the processor speed. This phenomenon, called the memory wall, is another
reason that convinced designers to adopt multi-core processors [5].
As predicted by Moore's Law, the number of transistors on a single chip continued to
grow exponentially as their price decreased, leading to more complex but cheaper
CPUs. Multi-core organization quickly became the only way to build better-performing
¹ This refers to the widely cited essay "The Free Lunch Is Over" by Herb Sutter, who was among the first to
describe the shift toward exploiting parallel hardware in the software world [61].
computers, and parallelism therefore became the most cost-effective way to achieve better
program throughput.
This change has brought a number of improvements but also introduced new kinds of
complexity in program design. The industry has delivered a solution for performance
increase, but effectively obtaining that performance is today the programmer's
burden: the multiple cores must be programmed explicitly to take advantage of them.
That has turned out to be a difficult exercise.
Multiple problems arise, most of them related to the learning curve introduced by
these new concepts. The proliferation of concurrency counterparts of traditional design
patterns [6][7] is one sign that programmers need help to build
better-performing concurrent constructs.
Today, more than ever, awareness of multi-core computer architecture and of
concurrency theory and practice is crucial to software engineers, system designers
and application programmers.
This thesis illustrates this fact. The early chapters immerse the reader in the
complex and still-evolving area of multi-threading, alongside a thorough exploration of
recent advances in processor and memory optimization.
1.1.2 The need for faster XML parsing
XML is a markup language widely used today to store and exchange business critical
information. One of the reasons behind its popular adoption is its simplicity, claimed
mostly because the language is self-describing and has both human- and machine-
readable format.
The XML language is notably verbose, mostly in order to remain human-readable.
For instance, in the XML document extract below, the number of
characters used for the tags (see § 2.1.2 for tags), which give contextual meaning to the
"content", exceeds the number of characters in the "content" itself.
<tag>content</tag>
In a large XML document, the markup represents a significant part of the overall size
and introduces a substantial burden when it comes to processing that XML document by
machine.
Web Services and SOAP are examples of applications and protocols that fail to reach a
satisfactory level of their most important performance requirement, the
response time, because of their use of XML [8] [9] [10]. When transferring large XML
data using web services, an XML parser needs to process the data on the client side.
The time required to process the data affects the customer's experience of the
service as soon as the data reaches a certain size.
In this regard, JSON is a data interchange format repeatedly cited as a better
alternative to XML [11] [12]. The equivalent JSON for the above XML document
extract would contain fewer characters, and at large scale the difference becomes noticeable.
Another overhead introduced in XML processing is the validation step. An XML
document that requires conformance with a defined schema needs to be validated, on
top of being well-formed (see § 2.1.4). The validation step constitutes a considerable
burden for XML processing; many XML parser libraries, especially those claiming to be
'fast' (see Table 1-1, validation support), simply do not include a validation feature.
Generalizing the traditional relational database model to XML is widely considered
today [13 p. 240], and XML databases and XML-related database technologies are gaining
broad adoption (XQuery, Oracle XML DB [14]). However, they all have efficiency
concerns directly or indirectly linked to the overhead introduced by the processing of
XML. A long-standing objection to using XML in database technologies has been the
high overhead of processing it [15].
1.1.3 Parallel parsing at the rescue
Many researchers have considered increasing XML parsing speed through
concurrency, and many have identified the arrival of multi-core processors as an
opportunity to achieve a spectacular improvement in XML processing [16], [17], [18],
[19].
The authors of [19], among others, already suggested concurrently parsing pre-divided
(pre-parsed) chunks of an XML document, in a divide-and-conquer fashion, in order to
increase the parsing speed. They focussed on obtaining the DOM-like skeleton of the
full document (see § 2.2.1 for DOM) in the pre-parsing step, before processing the
produced chunks in parallel, whereas in this dissertation pre-parsing will take place in
parallel too.
Surprisingly, to the best of our knowledge, none of the suggested technologies has
affected the world of XML processing; among the most widely used parser libraries
today, none has apparently adopted any of the above-referenced technologies or
methods.
Today's parser libraries remain inherently single-threaded, with no direct support for
concurrency. Some of them (see Table 1-1, concurrency support) offer limited support
for adding concurrency capability, but leave the entire burden to the programmer.
Table 1-1 lists some popular XML parser libraries, their compliance with the XML, DOM
and SAX specifications, and their support for concurrency and validation.
XML parser library  Language   Style and features        XML        SAX  DOM            Validation  Concurrency
Xerces [20]         Java, C++  The most compliant        1.0 & 1.1  2.0  Up to Level 3  Yes         No direct [21]
Libxml2 [22]        C, C++     Partial SAX and DOM       1.0        No   No             Yes         No direct [23]
RapidXML [24]       C++        DOM-like, fast (in situ)  Partial    No   No             No          No
Expat [25]          C          SAX-like, popular         1.0        No   No             No          No
TinyXML [26]        C++        DOM-like, small size      Partial    No   No             No          No
Table 1-1 Popular XML parser libraries
1.2 Objectives
The objective of this work is the design and implementation of a SAX compliant XML parser
library, PXML, which uses parallel programming techniques to increase the processing
speed. The library aims to provide support to its users to take advantage of the multiple
processors available for parsing XML documents.
The developed algorithm consists of cutting an XML document into chunks and parsing the
chunks concurrently, with the cutting (or ‘chunking’) and the parsing occurring in parallel.
During the parsing, SAX events will be concurrently available to the library user, who may
use appropriate synchronization techniques in order to consume those events.
The main aims are to improve throughput, provide concurrency support and offer
scalability.
1.2.1 Throughput
Computer architecture has improved very much lately; it is not possible to perform
advanced program optimization without knowledge of concepts such as branching or
caching and their impact on a program’s performance.
Ignoring them can drastically decrease the performance of an application, without
debugging or traditional troubleshooting being of any help in finding the cause.
Conversely, mastering them may bring spectacular improvements to programs.
This work achieved a faster speed in parsing XML documents thanks to knowledge
of computer architecture and organization, primarily of the factors influencing program
performance.
1.2.2 Concurrency support
The PXML library complies with the SAX specification for parsing XML documents; it
adds the concurrency support as a set of properties (as suggested by the SAX standard
[27]) on top of the specification.
The library gives the programmer the opportunity to choose, among the three modes of
operation below, the one that best fits the application domain:
1. Single-threaded
2. Multi-threaded manual (the user explicitly sets the number of threads to use)
3. Multi-threaded automatic (the PXML library chooses how many threads to use
according to the concurrency capability of the platform).
The library abstracts the hard concepts of concurrency internally but still gives the
programmer the opportunity to tune the parser behaviour at will. The library is easy to
reason about, as it combines concurrency concepts and XML processing in a more
natural way.
1.2.3 Scalability
The number of cores ranges from two to eight in today’s personal computers and up to
32 in servers, and this number is predicted to increase. A program made for 4-core
processors may need a redesign when it comes to running on a 16-core computer, or
when 16-core processors become the standard in personal computers.
The designed parser aims to be scalable, meaning able to increase the parsing speed
seamlessly with the number of cores available on the platform where it is running,
without the programmer doing any additional coding for it.
The PXML parser library features that capability thanks to the multi-threaded automatic
mode. For the same parsing properties, the parsing improvement will be higher on a
computer with more CPU cores.
1.3 Challenges
There are a number of challenges in this project, the primary one being the difficulty of
applying concurrency correctly without destabilizing the system, and of obtaining a real
benefit when concurrency is applied successfully.
In addition, the fact that support for concurrency is not a trivial issue in most
programming languages does not make this task easy.
1.3.1 Synchronization and speedup
A concurrent program is a program made up of several entities that cooperate towards a
common goal [28 p. vi]. In doing so, they have to access shared resources on the
computer, and in order to keep these resources in a consistent state, the entities have to
synchronize their access to them. Many agree that this is hard to achieve. M.
Herlihy and N. Shavit refer to exploiting parallelism as one of the outstanding challenges
of modern Computer Science [29 p. 1].
The expected performance increase for programs is in most cases a faster program
execution or an increased throughput². However, the improvement brought by
concurrency is a potential rather than a guaranteed benefit in this regard, because the
overall performance increase (how much faster the program is) depends more on the
way concurrency has been applied to achieve it than simply on the number of
additional processors available.
Because both applying concurrency and obtaining a benefit from it are difficult, this
thesis considers synchronization and speedup the two fundamental concepts to master
when going for concurrency.
Synchronization (see § 2.4.2) is what one does to ensure program objects remain in a
consistent state. However, it has the disadvantage of hindering the overall throughput of
the application, due to the relatively high cost of implementing it.
Speedup (see § 2.4.4) is a measure of the overall improvement brought by concurrency
to the program. Whether or not the program was successfully made parallel, a speedup
equal to 1 means there was no improvement.
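As a concrete illustration (standard material, not specific to this dissertation), Amdahl's law bounds the speedup of a program of which a fraction p is parallelizable across n processors:

```latex
S(n) = \frac{T(1)}{T(n)} = \frac{1}{(1 - p) + \dfrac{p}{n}}
```

For example, with p = 0.9 and n = 8, S(8) = 1 / (0.1 + 0.9/8) ≈ 4.7: even a well-parallelized program on eight cores may fall well short of an eight-fold speedup, which is why the improvement brought by concurrency is a potential rather than a guaranteed benefit.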
1.3.2 Programming languages support of concurrency
Another challenge, common to all those trying to dig into concurrency, but especially
programmers, is the fact that support for concurrency in major programming languages
was slow to become effective; it is still in perpetual evolution today.
Java added concurrency utilities (with the java.util.concurrent and other packages) only
in Java 5 [30]; the recent Java 8 introduced further support with libraries for parallel
operations and concurrent accumulators [31]. Similarly, the Microsoft
² Another goal when applying concurrency is in the domain of “separation of concerns”, where each core
or thread is dedicated to a specific task, not tightly related to the others, for instance in the domain of
GUI programming. This aspect of concurrency is not discussed in this dissertation.
.Net framework (with the C# programming language) introduced concurrency support
only in its version 4.
The C++ programming language ignored even the existence of threads and atomic
operations until the latest edition of the C++ standard [32], where concurrency support
increased at both the language level and the library level; and further features are
expected in the coming version [33].
Today there is comprehensive support for concurrency in major programming
languages, but the learning curve remains steep and adoption is still slow. Some
important languages such as JavaScript do not contain any threading mechanism at the
core level, and limited support has only recently appeared with ‘web workers’ [34 p.
322].
Some languages have built-in support for concurrency. One of the most widely used is
the Erlang programming language, which uses a message-passing concurrency model
and claims to be easier to reason about and more robust in its implementation [35 pp.
1-14]. Unfortunately, this concurrency model does not fit lower-level tasks like processing
XML documents.
Java has traditionally been the chosen language for XML and the SAX specification, but
C++ is a lower-level language fitting the task of processing files and streams, and it
provides finer-grained control of the memory. C++ is the development language of the
major web browsers’ rendering engines, which also process XML (Gecko, Blink, Trident
and WebKit). Table 1-1 shows how C++ is the favoured language for XML parser
libraries.
With the recently added library support for concurrency, C++ features most concurrency
concepts essential to the realization of this thesis, within both the language and the
standard library, and deserves to be the chosen language for this project.
1.4 Organization of the thesis
The organization of the remaining chapters is as follows:
Chapter 2 (Background) begins with an introduction to XML and SAX concepts. For XML, it
discusses the textual and tree representations of an XML document, the XML
specifications, XML validation, XML characters and Unicode support. For SAX, it presents
the API, primarily in comparison to the DOM model, then focuses on the SAX specification
and language binding.
The chapter continues with a description of selected terms of program optimization and
of concurrency. It principally discusses the concepts, methods and techniques relevant
to this work. Finally, it briefly describes some design patterns used in the design and
implementation of the proposed parser.
Chapter 3 (Design and implementation) introduces the concept of chunking and parsing
scanners, the basis of the proposed PXML parser library. Then it discusses in detail the
design and implementation of the following parts of the parser:
- The SAX classes, which ensure conformance to the SAX specification
- The PXML classes, building blocks of the central library concepts
- The XML Reader implementation, which contains the parsing algorithm
- The concurrency classes, essential constructs of the concurrency support
Chapter 4 (Evaluation) conducts the measurements of primary aspects of the PXML
parser’s performance, essentially the speedup. It also presents the metrics used for
performance improvement. The chapter discusses hotspots, thread concurrency,
synchronization, CPU usage and memory issues.
The chapter ends with a review of the parsing properties and their effect on the parser
performance, supported with measurement results.
Chapter 5 (Reflection and conclusions) discusses some aspects of the realization of this
dissertation, such as the integration of the proposed library into XML projects and the
application of principles and algorithms from this thesis in a broader context. It also
concludes the thesis.
Bibliography and references are in chapter 6.
The appendices are in chapter 7. The first section presents the PXmlCount program, a full
program based on the PXML library and used in this work as a test program to count the
number of elements and characters of an XML document.
2 Background
2.1 The extensible markup language (XML)
XML (eXtensible Markup Language) is a framework for defining markup languages. It is a
vast subject, and a complete treatment of it is impractical in the context of this work. A full
introduction to XML is available in [36], but [13] provides a more concise description of
XML and related technologies.
The XML standard [37] provides the complete XML specification and compliance
requirements, but it is difficult to assimilate; this work recommends an XML course such
as the one given in the Software Engineering Programme [38] of the University of Oxford
for a comprehensive understanding of XML concepts.
There are thousands of technologies and applications related to XML. This thesis will
mention exclusively those related to XML document definition (DTD, Schema) and those
related to XML processing (XPath, XSLT, XQuery), discussing only concepts essential to
the understanding of this thesis.
2.1.1 XML document representations
An XML document, in its textual representation (subsequently referred to in this
dissertation as XML text), is made of a sequence of balanced and properly nested
markups and text fragments. Conceptually, however, an XML document is equivalent to
a hierarchical tree structure called the XML tree.
Listing 2-1 represents an XML text. It is a modified version of the TourAgency.xml
document found in the XML module of the SEP in Oxford [38 p. Exercises]. The XML
text includes annotations that identify key markups.
Figure 2-1 is another representation of the same XML text, but in its conceptual
form, as an XML tree.
2.1.2 XML text and logical structure
In the textual representation of the XML document, one can readily identify tags,
constituted of a name between an opening (<) and closing (>) bracket. Such tags
typically come in two flavours: a start-tag such as <hotel>, and its corresponding end-tag,
which differs from the start-tag by the presence of a slash just after the opening
bracket, such as </hotel>.
An XML element includes the start-tag, its matching end-tag, and the content between
the two. A particular type of element is the empty-element-tag, such as <flat/>.
<?xml version="1.0" encoding="UTF‐8"?> XML declaration
<!DOCTYPE MyTourAgency SYSTEM "MyTourAgency.dtd"> Document type declaration
<MyTourAgency> Root element
<rating stars="2"> Element (rating)
<pool>true</pool>
<room_service>true</room_service>
</rating>
<rating stars="3">
<pool>true</pool>
<sauna>true</sauna>
</rating>
<country name="Bulgaria">
<resort name="Borovet">
<hotel name="Rila">500</hotel>
<flat>200</flat>
<lowSeasonRent>
<nbrDay>6</nbrDay>
<banner>The whole week!</banner>
</lowSeasonRent>
</resort>
</country>
<country name="Andorra">
<resort name="Pas De La Casa"> Start‐tag (resort)
<hotel name="Bovit">300</hotel>
<flat/> Empty‐element‐tag (flat)
</resort> End‐tag (resort)
<resort name="Soldeu / El tartar">
<info><![CDATA[Best restaurant]]></info> CDATA section
<info>Serial id is Character Data
3163&lt;6475</info> Entity reference
<hotel rate="2">500</hotel>
<flat>200</flat>
</resort>
</country>
Ignorable white space
<!‐‐ countries and rating ‐‐> Comment
<?php printf("starting with ratings")?> Processing instruction
</MyTourAgency>
Listing 2-1 Textual representation of an XML document (with annotations)
Elements may have simple name/value pairs associated with them, called attributes.
They usually identify the element or give more information about it.
An XML production specifies a sequence of markups or other productions upon which
substitution can be recursively performed to generate new markup sequences³.
The entire XML text is just a production called document and defined in the XML
standard as:
document ::= prolog element Misc*
³ The term ‘production’ comes from the production rules used for grammar generation, such as
context-free grammars.
The prolog is the first production of an XML document. It contains the XML declaration
and the document type declaration productions (see the annotations in Listing 2-1). The
XML declaration specifies the XML version to which the document conforms (see § 2.1.5)
and the character encoding (see § 2.1.6) being used. The document type declaration
belongs to the built-in schema language (see § 2.1.4).
Anything between an element’s start-tag and end-tag is its content. The content of an
element consists of intermingled character data productions (CharData) and any of the
element, processing instruction (PI), comment, entity reference, character reference
or CDATA section (CDSect) productions. The XML standard defines the content as
content ::= CharData? (
(element | EntityRef | CharRef |
CDSect | PI | Comment)
CharData?)*
CDSect (also referred to as a CDATA section) is used to escape blocks of text between
the string ‘<![CDATA[‘ and the string ‘]]>‘; anything in between is pure text and must
not be processed as markup by the parser.
An entity reference allows the representation of characters that have a meaning in XML,
using appropriate escape sequences to prevent the parser from interpreting them as
markup. For instance, to represent the opening bracket ‘<’ within content, one places
the entity name for this character between the characters ‘&’ and ‘;’. The sequence
‘&lt;’ will then be interpreted by the parser as a single ‘<’ character.
A character reference allows the representation of characters by the decimal or
hexadecimal representation of their code point, between ‘&#’ (or ‘&#x’) and ‘;’. For
instance, the sequence ‘&#x3C;’ will be replaced by ‘<’ within XML content.
Comments and processing instructions are not part of the XML structure but are
intended respectively for the human reader and for the XML processor. They are XML
productions that can themselves be parts of the Misc production (see Listing 2-1).
2.1.3 XML tree concepts
In the tree representation, a node is the counterpart of the element in the XML text, and
the root node corresponds to the first element of the textual representation, the root
element.
Tree theory defines a path as a sequence of nodes connected by edges [39 p. 10].
Note that path here means the shortest path, that is, a path that does not repeat
nodes.
The depth of a node is the number of edges on its path to the root node; it is like its
‘distance’ to the root element, making the root element of depth 0. In the XML text
(Listing 2-1), the depth of an element corresponds roughly to its indentation level; in the
XML tree, all nodes of the same depth are assembled inside horizontal dotted lines.
[Figure 2-1 depicts the XML tree of Listing 2-1: the root node MyTourAgency at depth 0;
the rating and country nodes at depth 1; pool, room_service, sauna and resort nodes at
depth 2; hotel, flat and lowSeasonRent nodes at depth 3; and nbrDay and banner at
depth 4. Annotations mark the root node, edges, paths, siblings, leaves, and the
descendants of ‘Andorra’.]
Figure 2-1 Tree representation of XML document
Two nodes refer to each other as parent and child if the path between them does not
contain another node, the parent being the one closest to the root node. The child’s
depth is always the parent’s depth plus one.
A group of nodes are siblings when they are of the same depth and have a common
parent. Sibling nodes that are side by side in the tree refer to each other as
preceding-sibling (left-hand side) and following-sibling (right-hand side) [13 p. 62].
The descendants of a node are the set of nodes that have this node on their paths to
the root node. The ancestors of a node are the nodes found on the path from that node
to the root node.
2.1.4 Well-formedness and validation constraints
The World Wide Web Consortium defined the XML standard in terms of productions and
constraints. The specification describes a list of constraints called well-formedness
constraints (WFC) and validation constraints (VC).
A text document qualifies as a well-formed XML document (or simply an XML
document) when its textual representation matches the document production (see
§ 2.1.2) and it satisfies all the well-formedness constraints.
An XML document is a valid XML document if it satisfies all the validation constraints
(with respect to a given schema) in addition to being well-formed.
An XML language is a particular family of XML documents complying with additional
syntactic and semantic rules. A schema is a formal definition of the syntax and
semantics of such an XML language, and a schema language is a formal language for
expressing schemas [13 p. 92]. Validating an XML document amounts to establishing
its conformance to the syntax and semantics expressed by a schema.
The most popular schema languages are DTD and XML Schema.
Document Type Definition (DTD) is the XML built-in schema language; its definition is
part of the XML specification. The document type declaration, which optionally specifies
a type and the location of a document containing further rules for validating an XML
document, is part of the DTD language.
XML Schema is another popular schema language, considered much more elaborate
than DTD. A number of limitations have been identified in DTD [13 p. 112], and XML
Schema was specially designed to overcome them and bring other improvements. An
XML Schema document is itself an XML document and thus has the advantage of being
self-describing.
XML Schema and DTD encouraged the creation of validating parsers, but the
validation step introduces a non-negligible overhead to the overall processing of the XML
document. That overhead is the pretext for many XML parsers not to incorporate the
validation feature.
2.1.5 XML standard and compliance
The W3C standard for XML is today at version 1.1, but the previous version 1.0 is still
widely used and even recommended for general use. The main additions in version
1.1 concern compliance with the later versions of the Unicode standard [40].
Many XML libraries limit their compliance to version 1.0, as it is enough for major
applications. The proposed PXML parser also conforms to version 1.0 of the XML
specification, excluding the validation constraints.
2.1.6 Character encoding, BOM and the Unicode standard
Unicode [41] is an international encoding standard for the representation of characters,
texts and symbols. The standard assigns a unique integer value to each character so
that its use is unambiguous across multiple languages. The binary representation of a
character’s Unicode number is its encoding. The encoding of an XML text is the
Unicode encoding to use for its conversion from binary to textual form.
The XML specification does not allow all existing Unicode characters within XML
documents; it defines a number of valid XML characters and excludes the remainder, as
their introduction would affect the document’s interpretation.
Inside XML text, only subsets of the valid XML characters can appear within specific
markups. For instance, the character data production allows a larger set of characters
than the element production.
UTF-8 and UTF-16 [42] are the most popular Unicode encodings. UTF-8 is the most
widely used on the web, among other reasons because of its backward compatibility with
the old ASCII encoding; UTF-16 is widely adopted by many programming languages
(e.g. Java) and operating systems (e.g. Windows).
The first step in parsing an XML file is to transform the sequence of bytes forming the
document data into a sequence of characters in the specified encoding. Because the
byte ordering (or endianness) of a file or stream differs from one machine type to
another, XML parsers should account for it in order not to produce incorrect output.
Some files or streams use a byte order mark (BOM) as their first character, to help
XML and text processors use the right endianness when reading them.
2.2 XML processing
XML processing languages allow the extraction of XML content and structure. XML
languages such as XPath, XSLT or XQuery are the most popular for processing XML
documents.
Although these processing languages fulfil most needs, some processing may require a
particular way of parsing; XML programming interfaces help programmers define their
own processing mechanisms for XML documents.
DOM and SAX are the two most widely used such interfaces. They are intrinsically
different in the way they process XML documents and are the prime representatives of
the two principal models of XML programming.
2.2.1 The Document Object Model (DOM)
DOM is a language-independent API for XML (and HTML) documents defined by the
W3C and freely available [43]. It defines the logical structure of XML documents and
allows programs to read, manipulate, modify and create them.
When parsing an XML document, a DOM parser first reads the full document to build an
in-memory representation of it, from which it performs subsequent operations.
The DOM representation is usually similar to the XML document structure itself and
has the form of a tree. The API defines a set of interfaces, procedures and methods to
navigate over XML document elements or to create and modify them.
The number one problem with DOM is the necessity to read the full document before
any further processing, because this incurs an extra delay and consumes system
memory. The DOM processing model does not fit some needs, such as when the
parsing speed is crucial or when parsing large documents.
On the other hand, DOM is very stable. Once the DOM processor loads the XML
document into memory, the programmer can freely go back and forth throughout the
document, which is not possible in SAX-based processing.
2.2.2 A Simple API for XML processing (SAX)
There are circumstances where it is not necessary to build the full structure of the XML
document before processing it; the document can be processed while being read.
SAX is an API for processing XML documents that provides an alternative to the DOM
mechanism [44]. It is an event-based API (also referred to as a “push parser”), which
operates by reading the XML file or stream and triggering SAX events for each XML
entity that it recognizes as part of the SAX specification. It is a serial access mechanism
that processes each markup sequentially and only once.
An important consequence of SAX parsing is that it is stateless. Once it has processed
a markup, the parser may discard any information about it before proceeding with the
next markup.
This fact is both an advantage and a disadvantage for SAX users. It is a disadvantage
because the extra burden of saving state information falls on the user, with the possible
introduction of errors into the processing. It is an advantage because of the freedom to
focus only on the parts of the XML document of interest, avoiding reading the full
document into memory.
<resort name="Pas De La Casa"> startElement(resort)
<flat> startElement(flat)
300 characters(“300”)
</flat> endElement(flat)
</resort> endElement(resort)
Listing 2-2 XML portion and corresponding SAX events
Given the XML document portion in Listing 2-2, a SAX parser reading it will generate
the five indicated events (see annotations) corresponding to the XML markups it has
recognized.
SAX usually suits XML processing that focuses on information retrieval; when it
comes to manipulating the document structure, a DOM parser is often more
appropriate.
2.2.3 SAX specification and language binding
Unlike DOM, SAX did not come from the W3C; it was developed by the XML-Dev
mailing list, with the participation of many contributors [45]. Because the first
implementation of SAX used the Java programming language, and no formal
specification has existed since then, the Java implementation of the SAX API is the
‘de facto’ standard [46].
The SAX specification, currently at version 2.0, is an ensemble of Java classes and
interfaces that an implementation needs to extend to make a SAX parser, or that library
users need to implement to use the parser. These classes and interfaces fall into two
important groups:
- Parser designer interfaces: XMLReader and Attributes
- Parser user interfaces or handlers: ContentHandler, ErrorHandler, DTDHandler
and EntityResolver
There are other classes, but this thesis does not discuss them. Among them are
SAXException and SAXParseException for exception handling, InputSource for
processing XML document streams, and the LexicalHandler class, part of the SAX 2
Extensions [47], used to provide lexical information about an XML document, such as
comments and CDATA section boundaries.
An implementation of a SAX parser is required to extend the parser designer interfaces
(XMLReader and Attributes) and leave the implementation of the handlers to the library
users. The XMLReader has methods for parsing and for setting features, properties and
handlers. The proposed PXML parser will provide concurrency support as an
implementation-specific parsing property (see § 3.2.2).
Users of a SAX library have to implement the parser handlers’ callback functions in
order to use the parser.
Because programming languages differ in semantics, an implementation of SAX in a
language other than Java may provide its own language binding, its equivalent of the
SAX Java classes and interfaces.
The case of Java and C++ is notable, principally because of a fundamental difference in
their memory management styles (a garbage collector for Java; RAII and smart pointers
for C++). The C++ implementation will principally focus on providing an efficient
mechanism for string creation, destruction and manipulation (see § 3.4.1).
2.3 Elements of program optimization
The evolution of computer technology has greatly influenced the techniques of program
performance optimization. Nowadays, more than ever, knowledge of computer
organization and architecture is a prerequisite for achieving successful program
optimization.
2.3.1 Computer organization and its evolution
Computers traditionally consisted of four main structural elements: a central
processing unit (CPU), a main memory (M), input-output components (I/O) for
data movement between the computer and its external environment, and the system
interconnection [48 p. 28].
Today the hardware revolution and a number of modern functional requirements have
given rise to new waves of technologies that have modified the organization of a
computer, mainly in the CPU and memory.
Computers now have multiple processors or cores available on the same chip or
socket. The memory hierarchy and its management have been further improved; one
level of the memory hierarchy, the cache, now plays a role of uppermost importance.
The cache memory is made of different levels of memory blocks, decreasing in size as
they get closer to the core. The upper-level memory blocks are each dedicated to one
core, and all the cores share the lower-level memory blocks.
Figure 2-2 Intel core i7 block diagram (from [1 p. 56])
2.3.2 Computer program performance goals
The main function of a computer is to execute programs. A program is a set of
instructions that the processor executes. In its simplest form, a single instruction
execution consists of a fetch stage and an execution stage, constituting the
instruction cycle [48 p. 31]. The clock rate is the speed at which the processor executes
instructions.
In real implementations, one instruction involves many of these cycles; the
number of cycles per instruction (CPI) is an important metric of processor execution
performance, as it influences the CPU time, the overall time spent by the processor to
run a program.
Besides the data processing operations inside the arithmetic logic unit and control
instructions, most instructions in the execution stage, and the entire fetch stage, are
memory access operations. The access of the CPU to the memory has been identified
as a relatively expensive operation and has been the central bottleneck for
computer performance improvement.
These facts and observations have led techniques and technologies for program
performance improvement to focus on two principal goals:
- Increase throughput: decrease the number of cycles per instruction and increase the
processor clock rate for a decreased CPU time
- Decrease latency: obtain an optimal access speed of the CPU to the memory
throughout the program execution
2.3.3 Branch optimization
Branch prediction is a technique used within processors to improve the flow of
instructions, with a greater positive impact when used with a pipelined processor
(pipelining is a technique that exploits the capability of the processor to evaluate multiple
instructions in parallel [2 pp. 147, 261]). Branch prediction occurs when a program
reaches a conditional instruction (if-then-else or switch).
The processor tries to identify the branch that is most likely to be taken, prefetches its
code and speculatively executes it, discarding the results if it turns out not to be the
branch actually taken by the program.
Although prediction techniques have often proven successful and have resulted in
improved performance, there will be cases where the chosen branch is the wrong
one (branch misprediction), and the delay or penalty incurred by the program can be
considerable.
Branch optimization is a technique for reducing branch misprediction. In general,
performing this optimization before design and implementation is considered
premature optimization.
2.3.4 Cache optimization
Trade-offs in the cost, performance and size of memory technologies have led to the
appearance of the memory hierarchy [2 p. 72], in which the cache memory plays an
important role in performance improvement. The cache is a relatively fast, small and
expensive memory placed on the same chip as the processor, thus offering a
reduced latency when accessed by the CPU.
[Diagram: the memory hierarchy, from registers and caches down through main
memory to disk storage; size grows from under 1 KB to over 1 GB, access time from
about 5 ns to about 5 ms, and price per byte falls down the hierarchy.]
Figure 2-3 Trade-offs in the cost-performance-size of memory
Optimization of cache performance often brings more benefit to programs than other
optimization techniques. There are many such cache optimization techniques [2 p.
78], but they all usually influence a few important cache optimization metrics.
The hit ratio is one such metric. It represents the number of memory
references that hit the cache over the total number of memory references.
The miss rate is the equivalent metric, i.e. miss rate = 1 − hit ratio. One important class
of miss is the compulsory miss, which happens at the very first access to a
memory block, because the block has not previously been referenced.
An LLC miss is a miss that occurs at the last-level cache; it is the kind of miss
incurring the most performance degradation.
Knowing the number of cache levels, the cache size and the cache layout of a
processor can help tailor a program that will run efficiently on computers with that
processor. The problem with this approach is that the program might be inefficient when
running on a different type of processor.
Cache-oblivious optimization refers to the practice of cache optimization based on
general principles of caches, such as the principle of locality (see § 2.3.5), or on
techniques such as divide-and-conquer, rather than on a particular cache
configuration or size.
2.3.5 Principle of Locality and Common case
Programs tend to reuse data and instructions they have used recently [2 p. 45]. This
observation comes from measurements showing that a program usually spends 90% of
its execution time in only 10% of its code (the 90/10 rule); that 10% is the program's
hotspot.
There are two types of locality: temporal locality concerns code or data accessed
recently, while spatial locality (or locality of reference) concerns data or code whose
addresses are near one another.
Cache optimization techniques rely heavily on this property. An informed
programmer can increase the performance of a program simply by exploiting this
valuable property appropriately.
One such situation arises with loop-based structures. Because a program with a loop
will likely return to the same code portion many times, appropriate use of the principle of
locality will often yield substantial performance improvements.
A similar principle, referred to as the common case by [2 p. 45], is a design principle that
favours the frequent case over the infrequent case when deciding about performance
trade-offs. When optimizing an algorithm, it is often more beneficial to identify the frequent
case (such as a branch often taken in a switch statement) and focus on
optimizing that case first.
2.4 Elements of program concurrency
2.4.1 Threads and cores
In a computer, a unit of sequential processing can be a thread or a process. The process
is more ‘robust’ because it benefits from operating system support in terms of
security; it encapsulates and protects all its internal structures and executes within its own
memory space, while a thread executes in a memory space shared with other threads.
A program is a multi-process or concurrent program if it allows execution of
more than one process or thread in parallel. As the program aims at a unique goal,
these processes or threads usually have to communicate. Communication among
processes is typically achieved through message passing, while threads communicate
through shared memory, a memory location they can all access, used as a medium for
their communication.
This thesis will focus on threads and shared memory, and the term concurrent program
will be preferred to designate a multi-process program.
A computer is a multiprocessor or multi-core computer when it has more
than one processor. Concurrent programs fit well on multiprocessor computers, but they
can also run on a single-processor computer; in that case, the operating system
arranges for the processor to serve the threads successively, in turn.
A context switch occurs when the operating system switches the processor from one
thread to another. Context switches are expensive (in terms of CPU time) and often
contribute considerably to overall performance degradation. With many cores, a
context switch may still occur, although with less probability; fortunately, its impact can be
considerably reduced if the program is designed so that threads are evenly
distributed among the processors and their cache memories.
On multiprocessor computers, access to data becomes hard to manage in the presence
of concurrent programs and is often the cause of hurdles and bottlenecks. Access
contention, for instance, happens when data written by one thread is
read by another thread on another core; its impact is reflected in the cache
memory and causes problems such as false sharing.
A typical case of false sharing arises when two algorithms running in parallel on different
cores use two logically separate variables that are inadvertently placed in memory
locations near one another. The caching algorithm, by the principle of spatial locality
(see § 2.3.5), will always try to treat them together, forcing them to move from one
dedicated cache to another and thus increasing the miss rate [2 p. 366].
2.4.2 Synchronization and concurrent objects
In a concurrent program, two threads competing or cooperating towards a common goal
may need to access the same space, called shared memory, but inappropriate
access will affect the program's integrity or consistency and lead to a hazardous
situation called a race condition.
This problem, identified as the mutual exclusion problem, was solved many years ago
by E. W. Dijkstra [49], who provided synchronization as the solution to the
mutual exclusion problem.
Synchronization is a set of rules and mechanisms that allow the specification and
implementation of concurrent programs whose executions are guaranteed to be
correct [28 p. 5], or to have a degree of correctness expressed as liveness or progress
conditions [28 p. 137].
At a certain level of abstraction, a program is made of elements that participate in its
execution; in some programming paradigms they are referred to as
objects. A concurrent object is an object that can be safely accessed concurrently by
several threads without requiring explicit synchronization; such an object is said to be
thread safe.
A mutex is an example of such a concurrent object. It defines lock and unlock methods
that can be called by many threads. Once a thread calls the lock method of a mutex, it
has acquired that mutex; any other thread trying to acquire it will block (its execution is
suspended) until the mutex is released by the owning thread calling the unlock method.
A mutex can be used to ensure that only one thread at a time enters the region of
program code between its lock and unlock calls; such a region is referred to as a
critical section.
A condition variable is also a concurrent object. Its C++ specification defines the methods
wait, notify_one and notify_all, which can be called by one or many threads. Any thread that
calls a condition variable’s wait method is blocked. Threads blocked by a condition
variable are said to be waiting, as they can be unblocked only upon a defined condition
being met. The condition can be either that a predicate previously assigned to the
condition variable becomes verified, or that a release notification is sent to the
condition variable from another thread by calling its notify_one method (to unblock only
one of the waiting threads) or its notify_all method (to unblock all the waiting threads).
Mutex and condition variable are part of a type of concurrent objects called
synchronization primitives as they can be used to build synchronization constructs
(see § 2.4.3 below).
The C++ standard library provides classes for working with concurrent objects (std::mutex
and std::condition_variable) and threads (std::thread).
The std::thread class allows the creation and manipulation of threads. One of its methods
is join, used to synchronize the execution of threads. In practice this method
makes one thread wait for another thread to complete before continuing its own
execution (see the ThreadJoiner class in § 3.6.3).
2.4.3 Lock-free and lock-based synchronization
Synchronization can be implemented in terms of concurrent objects. There are two types
of synchronization, depending on the concurrent object used to implement it, each type
having a set of related progress conditions.
Lock-based synchronization consists of providing a synchronization object called a
lock that allows a zone of code to be bracketed so as to guarantee that only a single
thread at a time can execute it. It is based on mutexes and their critical sections (see § 2.4.2).
When a thread is blocked for synchronization reasons (or for other reasons such as
memory access latency), its execution is suspended; this creates an idle period
called wait time, which is usually undesirable as it wastes processor time.
There are cases where the wait time itself consumes CPU time. That is the case in the
implementation of a spin lock, where a thread repeatedly tries to acquire the lock
and so remains in a busy-wait state. Spin locks have proven to be more efficient than
traditional locks in cases where critical-section exclusivity is required only for a short
time.
Two progress conditions (see § 2.4.2) can be associated with lock-based
synchronization: deadlock-freedom and starvation-freedom. In other words, deadlock
and starvation are the two main issues of lock-based synchronization.
Lock-free synchronization is based on atomic registers or hardware-provided primitive
operations (e.g. compare-and-swap). The following progress conditions can be associated
with lock-free synchronization: obstruction-freedom, non-blocking and wait-freedom,
the last being the highest level of correctness a synchronization technique can achieve.
2.4.4 Speedup and thread concurrency
Concurrency makes programs run faster by improving their throughput. The speedup
tells how much faster a program will run with concurrency applied as opposed to its
single-threaded version [2 p. 46]; it allows estimating the benefit of applying concurrency
to a program. Let us consider two parameters:
- EnhancementFraction. The enhancement fraction, which is the fraction of the single-
threaded version that can be converted to run with multiple threads
- ImprovementRatio. The improvement ratio, which is the time of running a single-
threaded program compared to the time that will be spent for running the same
program using multiple threads.
The speedup formula is:

    Speedup = 1 / ((1 − EnhancementFraction) + EnhancementFraction / ImprovementRatio)
The speedup formula derives from Amdahl's law. The law states that the
performance improvement to be gained from using some faster mode of execution of a
program is limited by the fraction of the time the faster mode can be used [2].
For a concurrent program, one obtains the best ImprovementRatio when the maximum
number of threads is effectively running in parallel and all threads are performing
useful work.
EnhancementFraction plays an important role in the speedup equation: a small value
limits the possible improvement for any given ImprovementRatio, while a value
approaching 1 increases the potential for a better speedup.
2.5 Design patterns
The multithreading revolution has led to the identification of new design patterns [6], [7].
Some are concurrency counterparts of traditional design patterns described by the Gang of
Four [50]; others are simply new design patterns specific to parallel computing.
This section describes two traditional design patterns and four concurrency design patterns
used in this work.
2.5.1 Strategy pattern
The intent of the strategy pattern is to define a family of algorithms and make them
interchangeable; one of the motivations is that different algorithms may be appropriate
at different times [50 p. 315].
For example, consider a Context that needs to scan some text but requires different
variants of the scanning algorithm at different times. Different scan strategies can be
implemented (ScanStrategy1, ScanStrategy2 and ScanStrategy3), and the Context can
use them interchangeably through the ScanStrategy interface.
[UML diagram: a Context with a scan(ScanStrategy sc) method that delegates to
sc->scan(); an abstract ScanStrategy declaring scan(); and three concrete strategies
ScanStrategy1, ScanStrategy2 and ScanStrategy3, each overriding scan().]
Figure 2-4 The strategy pattern
A remarkable benefit of the strategy pattern is that it offers an alternative to
conditional statements [50 p. 315]: the conditional statement can be replaced by a
strategy assignment, with each branch moved into its own strategy.
2.5.2 Observer pattern
The observer pattern is needed when changes to an object, called the subject, must be
watched by other objects (called observers).
The interaction between the subject and the observers is known as publish-subscribe:
the subject is the publisher of notifications, and any number of observers can subscribe
to receive them [50 p. 294].
[UML diagram: a Subject with subscribe(Observer), unsubscribe(Observer) and notify()
methods and a List<Observer> obsList member; an Observer declaring
update(UpdateData data); ConcreteSubject and ConcreteObserver as their concrete
classes.]
Figure 2-5 The observer pattern
2.5.3 Active object pattern
The intent of the active object pattern is to have objects whose methods can be
invoked asynchronously in one thread while executing in a different thread.
The pattern decouples method invocation from method execution [51].
An active object can be an object that resides and runs in its own thread (thread of
execution), independently of the thread that created it (thread of creation).
Implementations of an active object usually provide a means for the thread of execution
to communicate the results or outcome of the execution back to the thread of creation.
2.5.4 Monitor pattern
The intent of the monitor pattern is to synchronize concurrent method execution to
ensure that only one method at a time runs within an object [51]. In addition to being
mutually exclusive within the object, method executions may also be preconditioned
on some predicate being verified.
The monitor is a higher-level concurrency construct that requires a number of
synchronization primitives to participate in its construction. An essential participant is a
mutex, a synchronization object providing mutual exclusion.
2.5.5 Thread pool pattern
Multithreading allows running multiple tasks in parallel, each task running within one
thread. The thread pool organization is needed when the number of tasks to run in
parallel is much higher than the number of available threads.
A thread pool mechanism typically consists of inserting the tasks into an internal data
structure such as a queue or stack [29 pp. 223, 245], then letting each thread fetch a task
from the queue, run it, and proceed with another task until the data structure is empty.
As the threads competing for tasks may cause race conditions, the thread pool requires
synchronization mechanisms that allow thread-safe insertion and retrieval of tasks into
and from the queue.
2.5.6 Thread-local storage pattern
In a multi-threaded environment, threads share the same memory space and need
synchronization to access a shared memory location. There are situations where each
thread requires the same functionality, which can be implemented using the same
program variable, but which does not need to be shared with other threads.
The thread-local storage pattern allows multiple threads to access a single
definition of an object; but instead of having the object instance shared by all the
threads, it arranges for each thread to have its own copy of the object instance, kept
internal to the thread.
Although the object definition appears to be global, any reference to it will be a
reference to a unique, local version internal to the thread accessing it.
2.6 Putting it all together
The operational mode of a SAX parser is similar to that of a pushdown automaton, or
PDA [39 p. 109]. The PDA has a number of states defining its stable conditions (PDA
states) and another set of states arranged in an internal stack (stack states).
An input to the PDA causes a transition if the combination of the input with the
current PDA state and the top stack state has a corresponding pair formed by a
PDA state and a stack state. The relationship between all possible input sets (the input,
the PDA state and the stack state) and their corresponding pairs (a PDA state and a
stack state) is the transition function or transition table.
The implementation of the parser will thus inevitably make use of conditional
statements to define transitions, comparing the current state, the stack state and the
input against the transition table for a possible match.
The proposed PXML parser will use the strategy pattern in its implementation to
eliminate a number of these conditional statements and replace them with strategy
algorithms. The implemented strategy pattern will achieve both a performance
optimization goal, the reduction of branch misprediction (see § 2.5.1), and an
organizational goal, dynamic polymorphism, as strategies will be interchangeable
at runtime.
As discussed in § 1.1.3, this thesis will adopt a divide-and-conquer strategy that will
consist of the parser cutting the XML document in multiple parts or chunks and parsing
them in parallel.
The number of chunks should be in reasonable proportion to the number of resources
available for parsing them in parallel, that is, the number of processor cores and the
number of threads. However, the number of chunks will typically be higher than the
number of cores or threads. The thread pool pattern will allow a balanced distribution of
chunks to the available resources (see § 2.5.5).
The parser will process each chunk within a dedicated thread; for each chunk,
the thread function and the chunk details constitute an active object, as it will
evolve in a multi-processing environment different from its thread of creation (see §
2.5.3). The active objects will need to interact, as they participate in the common
goal of parsing an XML document; the monitor pattern will allow synchronized
interaction between them (see § 2.5.4).
The most important requirement of this parser will be the support of concurrency.
In a SAX parser, events come in sequential order, as they appear in the document being
parsed. For the proposed parser, events will come in a non-sequential order. The parser
will therefore provide contextual information for each chunk to help the user reorder the
events for meaningful use. The thread-local storage pattern will help the library keep the
contextual information local to its chunk so that it is accessible safely within the chunk,
without requiring synchronization (see § 2.5.6).
3 Design and implementation
3.1 Introduction
The SAX parser that this thesis proposes has two functional requirements:
- Conformance to the SAX specification and
- Support of concurrency
Conformance to the SAX specification is the easier part. The SAX API is simple and
already has a fully compliant implementation in Java. The work in this thesis will involve
providing a C++ binding for the Java implementation.
The support of concurrency needs to be implemented without affecting conformance to
the SAX specification. That is, modification of the existing SAX implementation classes
in order to achieve concurrency must remain within the extent allowed by the standard,
and classes added for the sake of achieving parallel parsing should be properly encapsulated.
This thesis suggests grouping the classes of this design and implementation into three
groups, according to their contribution to the requirements:
- SAX classes: represent all classes coming from the SAX specification. They are
used to implement the ‘visible’ part of the SAX parser, in contrast to other classes
that will be encapsulated. These classes address one of the functional requirements;
that is the conformance to SAX. See § 3.3.1 and § 3.4 for design and
implementation.
- PXML classes: represent classes that implement the concepts of the proposed
parser, such as chunking and parsing scanner, and the algorithms used for
achieving parallel parsing, such as chunking and parsing loop. See § 3.3.2 and §
3.5 for design and implementation.
- Concurrency classes: additional classes added for concurrency support, they allow
keeping the whole system consistent in regards to the introduced multi-processing.
They address non-functional requirements such as synchronization, thread safety
and thread concurrency. See § 3.3.3 and § 3.6 for design and implementation.
This chapter presents the design and implementation of the essential classes of each of
the above-mentioned groups; it brings out the relationships among them and their
interaction in achieving the proposed objective.
Previous chapters introduced the SAX specification and concurrency principles at length;
this chapter therefore begins with the presentation of the fundamental concept of the
proposed PXML parser.
3.2 Fundamental concept
The proposed PXML parser divides the XML document into chunks so that it can parse
them in parallel using multiple threads, thereby increasing the XML parsing speed. The
“chunking” algorithm is not directly based on physical properties such as chunk size,
although the intended outcome is to have chunks of similar sizes; it is rather a
markup-aware chunking, based on the logical structure of the XML document (see § 2.1.2).
As far as markup is concerned, different parts of an XML document require different
processing effort. For instance, within an element tag no comment or CDATA section can
be expected, and the characters used for element names belong to a limited set of allowed
characters; within content, on the other hand, the vast majority of markup and productions
can be expected, including other elements. This difference means the parser needs more
effort to parse a chunk that is part of content than one that is part of an element,
even if the two are of the same physical size.
At the byte level, however, size is what matters most. Because the file is read byte by
byte from memory (or in groups of bytes if a buffer is used), the reading effort is always
directly proportional to the size of the chunk; the logical structure of the document is
meaningless at this level.
The markup-aware chunking will try to strike the right balance when performing the
chunking, taking into account the logical structure of the chunks while striving to keep
them of equal size in order to obtain a balanced repartition of the parsing effort.
Because XML documents store data, they are usually made of a sequence of elements
that, like database records, contain similar content. Considering this fact a
“common case” (see § 2.3.5), the proposed PXML parser bases its chunking algorithm on
the following assumption:
For most XML documents, there is a depth at which elements start repeating in
similar shapes, hence in similar sizes.
Once the parser identifies that depth, it performs the chunking of the XML document solely
based on its logical structure. It counts on the natural, ‘common case’ fact that the structure
of the XML document will be a repetition of elements of similar size, and so obtains chunks
that are balanced both in size and in markup.
3.2.1 Scanner types and chunk allocation
The parser has to process the XML document in some way in order to identify the right
depth and define the chunking locations. If, in addition, the parser then needs to
parse the produced chunks, there will be double processing. However, the two
processing passes have different goals, hence different parsing algorithms and thus
different speeds.
The proposed PXML parser will use two scanners, each dedicated to a particular
processing.
The chunking scanner scrutinizes the file in order to identify chunk locations. The
element is the basis on which the scanner divides chunks, so as to minimize the need for
any communication between chunks. For a given depth, the chunking scanner’s mission is
to identify the XML elements at that depth.
The parsing scanner receives chunk information (start-tag locations) from the
chunking scanner and properly parses the chunks, following the full set of XML rules.
Multiple parsing scanners can parse chunks in parallel.
The success of PXML’s algorithm relies on the chunking scanner being faster than
the parsing scanners. The chunking scanner has a much smaller set of XML rules to
comply with, does not need to parse the internal content of elements and, most importantly,
does not trigger any external events. Because the chunking scanner completes its job
earlier, the concurrent parsing of chunks compensates for the double-scanning overhead.
[Diagram: an XML document (a MyTourAgency file containing rating and country
elements) with its prolog allocated to a prolog scanner, its structure cut by the
chunking scanner, and its chunks allocated to parsing scanners running in threads
#1 to #4.]
Figure 3-1 Chunks allocation to different scanners
Figure 3-1 shows a possible allocation of chunks per scanner. Notice the presence
of another scanner, the prolog scanner, to which the prolog is allocated. Because
the prolog has specific rules that differ from those of the remainder of the file, it
requires a different parsing algorithm, and thus a different type of scanner.
3.2.2 Parsing properties and parsing modes
Three PXML parser properties help the user control the parsing: the pool
configuration, the chunking depth and the siblings per chunk.
The chunking depth property represents the depth of the element at which the
parser starts cutting the XML document into chunks. Once the parser identifies the first
element at the chunking depth, the chunking scanner starts cutting all following-sibling
elements (see § 2.1.3) until it reaches the closing tag of the root element.
[Diagram: the same XML document, with the prolog allocated to the PrologScanner
and the entire root element parsed as a single chunk by one ParsingScanner.]
Figure 3-2 Chunks allocation in the single-threaded parsing mode
The siblings per chunk property represents the number of sibling elements to include
in each chunk. Because the number of elements will typically be much higher than the
number of threads, the siblings per chunk property indirectly allows fine-grained control
over the number of chunks.
The pool configuration property determines the parsing mode of the parser (see §
1.1.2). For a given value, the pool configuration property sets the parser to the
corresponding parsing mode:
- single-threaded parsing mode (pool configuration equals -1)
- multi-threaded automatic parsing mode (pool configuration equals 0)
- multi-threaded manual parsing mode (pool configuration > 0)
In the single-threaded mode, the parser does not need multi-threading and does not use
the chunking scanner; it uses the parsing scanner with only one thread to parse the
whole document, with the root element as the only chunk, as shown in figure 3-2
above. This parsing mode is typically suited to small files.
In the multi-threaded mode, the pool configuration controls the thread concurrency of
the parser by directly setting the number of threads the parser will use. The
multi-threaded mode can be manual (number of threads controlled by the user) or
automatic (number of threads controlled by the PXML library).
3.2.3 From bytes to SAX events
From bytes to characters and from characters to SAX events, the parsing is
accomplished through the composition of two pushdown automata, or PDAs (see § 2.6):
the transcoder and the scanner.
The transcoder consumes bytes and recognizes valid XML characters (or, more
precisely, code points). Its transition function is a subset of the encoding specification
the parser is using, such as UTF-8 or UTF-16 (because not every UTF-8 or UTF-16
character is a valid XML character).
The scanner is a PDA that operates at a higher level than the transcoder; it consumes
the characters produced by the transcoder and recognizes valid markup or productions.
Here the transition function is the XML specification itself.
The reader is a filter that selects only the markup and productions that comply with the
SAX specification in order to create and trigger SAX events. For instance, the comment
production triggers an event only if a LexicalHandler is available (see § 2.2.3). The
reader has access to the SAX handler classes in order to invoke their callback methods
as SAX events.
[Diagram: bytes (0101100) → Transcoder (PDA; UTF-8/UTF-16 encoding and XML
specification rules) → characters (Abc45>g&) → Scanner (PDA; XML specification
rules) → markups (<element/>) → Reader (filter; SAX specification rules) → SAX
events (startElement()).]
Figure 3-3 Main parser components and their responsibility
3.3 Class relationship and interaction
The UML class diagram in Figure 3-4 (page 40) shows the most important PXML classes
and the relationships between them; Figure 3-5 (page 42) represents the interaction of
the PXML classes during multi-threaded parsing.
3.3.1 SAX classes
The PXML parser provides XMLReaderImpl as the implementation of XMLReader
interface and AttributesImpl as the implementation of Attributes interface; it leaves the
implementation of the SAX handlers to the library users. PXmlCountHandler is an
example SAX handler implementation of the ContentHandler interface; it is not a library
class but is part of the PXmlCount test program.
Taken in isolation, the SAX classes constitute an observer pattern (see observer pattern
in § 2.5.2), with XMLReader (Subject), XMLReaderImpl (ConcreteSubject),
ContentHandler (Observer) and PXmlCountHandler (ConcreteObserver) as
participants.
The XMLReaderImpl is a central class of the parser concept. It contains the parse and
the concurrentParse methods that define the chunking algorithm and the concurrent
parsing algorithm respectively. Its C++ implementation is discussed in § 3.4.2.
3.3.2 PXML classes
The abstract classes XmlTranscoder and XmlScanner are generalizations of the
central PXML concepts of transcoder and scanner (see § 3.2.3).
The XmlTranscoder defines an interface for converting from bytes to characters. Its
subclasses TranscoderUtf8 and TranscoderUtf16 implement the convert_bytes
method.
The XmlScanner defines a family of algorithms for converting from characters to
markups. Its subclasses PrologScanner, ChunkingScanner and ParsingScanner,
provide their implementation of the consume_char method.
The relationship between these classes is visible in the UML diagram as a cascade of
two strategy patterns (see strategy pattern in § 2.5.1).
The strategy pattern for conversion from bytes has XMLReaderImpl (Context),
XmlTranscoder (Strategy), TranscoderUtf8 (ConcreteStrategy) and TranscoderUtf16
(ConcreteStrategy) as participants.
3.3.3 Concurrency classes
The ThreadPool's responsibility is to create and maintain an appropriate number of
threads, each used for parsing a chunk of the XML document; for this it uses the
classes ThreadSafeQueue, ChunkTask and ThreadJoiner. The UML class diagram in
Figure 3-4 shows that a composite aggregation links them, with the ThreadPool
owning the other classes.
The ThreadSafeQueue is the internal container used to store chunking data for each
chunk that the parser creates. Chunking data means all the information needed to parse
that chunk independently; ChunkTask is the class that gathers this information. So the
ThreadSafeQueue is a parameterized thread-safe container of ChunkTask.
The ThreadJoiner class is used to ensure cooperation between the thread containing
the chunking scanner and the parsing scanner threads.
The ChunkContext does not participate in the thread pool organization; it is used to
provide chunk information to the library user. It is created by the ParsingScanner class,
and both are attached to a particular chunk. It is a member of the ContentHandler
so that it is available to the handler callback functions, which are accessible to the library
users.
3.3.4 Class interaction and scanning loops
Figure 3-5 represents the interactions between the PXML classes during parsing in
multi-threaded mode with two threads.
The sequence diagram starts with an instance of ContentHandler and an instance of
XMLReaderImpl, the reader. Upon the call of its parse method, the reader creates
XmlTranscoder and PrologScanner instances, and then sets the scanner to receive
characters from the transcoder.
The transcoder-scanner tandem performs a prolog loop, which consists of the
transcoder calling convert_bytes and the prolog scanner calling consume_char in a
loop until the scanner recognizes the root element and notifies the reader by setting
isRootElementFound to true.
Because the parsing mode is multi-threaded (pool_config = 2), the reader creates the
ThreadPool, which creates two threads and waits for the reader to submit ChunkTask
instances for parallel parsing.
The reader continues with a ChunkingScanner, which replaces the prolog scanner in
the transcoder, forming what is now a chunking loop. Whenever the scanner recognizes
a chunk position, according to the parsing properties, it notifies the reader by setting
isChunkPosition to true. The reader then collects the chunking information into a
ChunkTask and submits it to the ThreadPool.
[Figure: UML sequence diagram (sd parse) with lifelines handler:ContentHandler, reader:XmlReader, :XmlTranscoder, prolog:PrologScanner, :ThreadPool and chunking:ChunkingScanner, plus two parallel fragments (thread #1 and thread #2), each running a concurentParse activation with its own :XmlTranscoder and scanner:ElementScanner. After the new() calls, the reader binds the scanners with setXmlScanner(prolog) and later setXmlScanner(chunking), drives the *convert_bytes()/*consume_char() loops, and, on [isRootElementFound=true] and then each [isChunkPosition=true], calls submitTask() on the ThreadPool ([pool_config>=0]); the threads loop until [isEndOfChunk=true]. The SAX callbacks startDocument(), startElement(), endElement() and endDocument() are triggered on the handler.]
Parsing properties: pool_config = 2, chunking_depth = 1, siblings_per_chunk = 1.
Remark: the convert_bytes() and consume_char() operations are marked with * to denote that they are iterative operations within a loop; the loop fragment is not represented for the sake of diagram clarity.
Figure 3-5 PXML sequence diagram
The reader continues with the scanning loop until it reaches the end of the document
element, then waits for the ThreadPool to finish its tasks before acknowledging the end
of the document to the user with the SAX endDocument callback.
When a thread within the ThreadPool receives a ChunkTask, it creates XmlTranscoder
and ParsingScanner instances to form a transcoder-scanner tandem performing a
parsing loop. The loop consists of the transcoder calling convert_bytes and the parsing
scanner calling consume_char in a loop until the scanner reaches the end of the chunk
and notifies the thread by setting isEndOfChunk to true.
PXML considers the prolog loop, the chunking loop and the parsing loop as
specializations of the general concept of a scanning loop, each having a loop breaker
(isRootElementFound, isChunkPosition and isEndOfChunk respectively).
The sequence diagram illustrates the definition of the PXML algorithm:
The PXML algorithm consists of a prolog loop, then a chunking loop and one or
many parsing loops running in parallel.
3.4 Implementation of SAX classes
3.4.1 Characters, String and C++ binding
Because the PXML implementation of the SAX parser is not in Java, the library has to
provide a binding: the C++ equivalents of the objects used in Java.
The main difference lies in the string type. The C++ language provides an equivalent
of Java's String (namely std::string), but in order to optimize the performance of the
parser and, most importantly, to provide the right representation of the Unicode
encodings (which C++ does not conveniently provide), the library defines its own string type.
PXML defines the following type and classes:
- XmlCh: an XML character type, or more precisely an XML code point [42]
- XmlChar: a class for XML character or code point manipulation
- XmlBuffer: a class that represents a dynamic string
UTF-8 and UTF-16 allow the representation of all their symbols using at most 4 bytes of
memory storage: UTF-8 as one to four 1-byte code units, and UTF-16 as either a single
2-byte code unit or a surrogate pair of two 2-byte code units [42].