This document evaluates several XML compression tools. It examines 11 compressors, including 3 general text compressors and 8 XML-aware compressors. The compressors are evaluated on 57 XML documents from different domains, measuring compression ratio, compression time, and decompression time on two different computing environments. The results show that XML-aware compressors generally achieve better compression ratios than general text compressors, with ratios ranging from 0.07 to 0.24 on structural documents and 0.12 to 0.22 on original documents.
Het bevat alle basis informatie dat u moet weten over Koongo in één en makkelijk te lezen flyer. Deze flyer is voornamelijk van waarde voor onze marketing specialisten en mogelijke partners.
Slides accompanying a talk by Jack Thurston of farmsubsidy.org, a project bringing together researchers, activists and journalists pushing for more transparency in Europe's farm subsidy system.
Performance analysis in a multitenant cloud environment Using Hadoop Cluster ...Orgad Kimchi
Analyzing the performance of a virtualized multitenant cloud environment can be challenging because of the layers of abstraction. This article shows how to use Oracle Solaris 11 to overcome those limitations.
For more information see:
http://www.oracle.com/technetwork/articles/servers-storage-admin/perf-analysis-multitenant-cloud-2082193.html
Het bevat alle basis informatie dat u moet weten over Koongo in één en makkelijk te lezen flyer. Deze flyer is voornamelijk van waarde voor onze marketing specialisten en mogelijke partners.
Slides accompanying a talk by Jack Thurston of farmsubsidy.org, a project bringing together researchers, activists and journalists pushing for more transparency in Europe's farm subsidy system.
Performance analysis in a multitenant cloud environment Using Hadoop Cluster ...Orgad Kimchi
Analyzing the performance of a virtualized multitenant cloud environment can be challenging because of the layers of abstraction. This article shows how to use Oracle Solaris 11 to overcome those limitations.
For more information see:
http://www.oracle.com/technetwork/articles/servers-storage-admin/perf-analysis-multitenant-cloud-2082193.html
C* Summit 2013: Time is Money Jake Luciani and Carl YeksigianDataStax Academy
This session will focus on our approach to building a scalable TimeSeries database for financial data using Cassandra 1.2 and CQL3. We will discuss how we deal with a heavy mix of reads and writes as well as how we monitor and track performance of the system.
Joget Workflow Clustering and Performance Testing on Amazon Web Services (AWS)Joget Workflow
Joget Workflow is an open source web-based workflow software to develop workflow and BPM applications. It is also a rapid application development platform that offers full-fledged agile development capabilities (consisting of processes, forms, lists, CRUD and UI), not just back-end EAI/orchestration/integration or the task-based interface.
This document is intended to describe and analyze the results of performance tests on a clustered deployment of Joget Workflow on Amazon Web Services (AWS). This document also demonstrates the baseline performance of the Joget Workflow platform for a basic app and shows how horizontal and vertical scaling can be used to support larger deployments.
More information on Joget Workflow at http://www.joget.org
The post release technologies of Crysis 3 (Slides Only) - Stewart NeedhamStewart Needham
For AAA games now there is a consumer expectation that the developer has a post release strategy. This strategy goes beyond just DLC content. Users expect to receive bug fixes, balancing updates, gamemode variations and constant tuning of the game experience. So how can you architect your game technology to facilitate all of this? Stewart explains the unique patching system developed for Crysis 3 Multiplayer which allowed the team to hot-patch pretty much any asset or data used by the game. He also details the supporting telemetry, server and testing infrastructure required to support this along with some interesting lessons learned.
YOW2018 Cloud Performance Root Cause Analysis at NetflixBrendan Gregg
Keynote by Brendan Gregg for YOW! 2018. Video: https://www.youtube.com/watch?v=03EC8uA30Pw . Description: "At Netflix, improving the performance of our cloud means happier customers and lower costs, and involves root cause
analysis of applications, runtimes, operating systems, and hypervisors, in an environment of 150k cloud instances
that undergo numerous production changes each week. Apart from the developers who regularly optimize their own code
, we also have a dedicated performance team to help with any issue across the cloud, and to build tooling to aid in
this analysis. In this session we will summarize the Netflix environment, procedures, and tools we use and build t
o do root cause analysis on cloud performance issues. The analysis performed may be cloud-wide, using self-service
GUIs such as our open source Atlas tool, or focused on individual instances, and use our open source Vector tool, f
lame graphs, Java debuggers, and tooling that uses Linux perf, ftrace, and bcc/eBPF. You can use these open source
tools in the same way to find performance wins in your own environment."
Environment Canada's Data Management ServiceSafe Software
A brief history in TimeSeries data at Environment Canada. An Enterprise view of how FME can be integrated into departmental data management activities.
Experience Mazda Zoom Zoom Lifestyle and Culture by Visiting and joining the Official Mazda Community at http://www.MazdaCommunity.org for additional insight into the Zoom Zoom Lifestyle and special offers for Mazda Community Members. If you live in Arizona, check out CardinaleWay Mazda's eCommerce website at http://www.Cardinale-Way-Mazda.com
Data-intensive tasks call for distributed computation. By using one of the many available libraries based on the Hadoop Spark engine, even a newcomer of distributed computing can take advantage of parallelism using familiar programming languages without having to worry about the underlying data and task distribution. We present demos of SparkR (the distributed version of R), Koalas (similar to Pandas) and SparkSQL.
Presentation at ASHPC22, the Austrian-Slovenian HPC Meeting - 31 May – 2 June 2022 (https://vsc.ac.at/research/conferences/ashpc22/).
Delivering Micro-Credentials in Technical and Vocational Education and TrainingAG2 Design
Explore how micro-credentials are transforming Technical and Vocational Education and Training (TVET) with this comprehensive slide deck. Discover what micro-credentials are, their importance in TVET, the advantages they offer, and the insights from industry experts. Additionally, learn about the top software applications available for creating and managing micro-credentials. This presentation also includes valuable resources and a discussion on the future of these specialised certifications.
For more detailed information on delivering micro-credentials in TVET, visit this https://tvettrainer.com/delivering-micro-credentials-in-tvet/
Introduction to AI for Nonprofits with Tapp NetworkTechSoup
Dive into the world of AI! Experts Jon Hill and Tareq Monaur will guide you through AI's role in enhancing nonprofit websites and basic marketing strategies, making it easy to understand and apply.
A review of the growth of the Israel Genealogy Research Association Database Collection for the last 12 months. Our collection is now passed the 3 million mark and still growing. See which archives have contributed the most. See the different types of records we have, and which years have had records added. You can also see what we have for the future.
Acetabularia Information For Class 9 .docxvaibhavrinwa19
Acetabularia acetabulum is a single-celled green alga that in its vegetative state is morphologically differentiated into a basal rhizoid and an axially elongated stalk, which bears whorls of branching hairs. The single diploid nucleus resides in the rhizoid.
How to Build a Module in Odoo 17 Using the Scaffold MethodCeline George
Odoo provides an option for creating a module by using a single line command. By using this command the user can make a whole structure of a module. It is very easy for a beginner to make a module. There is no need to make each file manually. This slide will show how to create a module using the scaffold method.
A Strategic Approach: GenAI in EducationPeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
This presentation includes basic of PCOS their pathology and treatment and also Ayurveda correlation of PCOS and Ayurvedic line of treatment mentioned in classics.
This slide is special for master students (MIBS & MIFB) in UUM. Also useful for readers who are interested in the topic of contemporary Islamic banking.
How to Add Chatter in the odoo 17 ERP ModuleCeline George
In Odoo, the chatter is like a chat tool that helps you work together on records. You can leave notes and track things, making it easier to talk with your team and partners. Inside chatter, all communication history, activity, and changes will be displayed.
"Protectable subject matters, Protection in biotechnology, Protection of othe...
XML Compression Benchmark
1. An Empirical Evaluation of XML Compression Tools
Sherif Sakr
School of Computer Science and Engineering
University of New South Wales
1st International Workshop on Benchmarking of XML and Semantic Web Applications
(BenchmarX’09)
20 April 2009
S. Sakr (CSE, UNSW) BenchmarX’09 20 April 2009 1 / 20
2. XML Compression: Why?
XML has become a popular standard with many useful applications.
XML is often referred as self-describing data.
On one hand, this self-describing feature grants the XML great
flexibility.
On the other hand, it introduces the main problem of verbosity.
XML compression has many advantages such as:
Reducing the network bandwidth required for data exchange.
Reducing the disk space required for storage.
Minimizing the main memory requirements of processing and querying
XML documents.
S. Sakr (CSE, UNSW) BenchmarX’09 20 April 2009 2 / 20
3. XML Compressors: Classifications I
With respect to the awareness of the structure of the XML documents:
General Text Compressors: They are XML-Blind, treats XML
documents as usual plain text documents and applies the traditional
text compression techniques.
XML Conscious Compressors: They are designed to take the
advantage of the awareness of the XML document structure to
achieve better compression ratios over the general text compressors.
Schema dependent compressors: Both of the encoder and decoder
must have access to the document schema information achieve the
compression process. They are not commonly used in practice.
Schema independent compressors: The availability of the schema
information is not required to achieve the encoding and decoding
processes.
S. Sakr (CSE, UNSW) BenchmarX’09 20 April 2009 3 / 20
4. XML Compressors: Classifications II
With respect to the ability of supporting queries:
Non-Queriable (Archival) XML Compressors: They do not allow
any queries to be processed over the compressed format. They are
mainly focusing to achieve the highest compression ratio.
By default, general purpose text compressors belong to the
non-queriable group of compressors.
Queriable XML Compressors: They allow queries to be processed
over their compressed formats. The compression ratio is usually worse
than that of the archival XML compressors. The main focus is to
avoid full document decompression during query execution.
The ability to perform direct queries on compressed XML formats is
important for many applications which are hosted on resource-limited
computing devices such as: mobile devices and GPS systems .
By default, all queriable compressors are XML conscious compressors
as well.
S. Sakr (CSE, UNSW) BenchmarX’09 20 April 2009 4 / 20
5. Examination Criteria
In our study we considered, to the best of our knowledge, all XML
compressors which are fulfilling the following conditions:
Is publicly and freely available either in the form of open source codes
or binary versions.
Is schema-independent.
Be able to run under our Linux version of operating system.
Ubuntu 7.10 (Linux 2.6.20 Kernel)
Ubuntu 7.10 (Linux 2.6.22 Kernel)
S. Sakr (CSE, UNSW) BenchmarX’09 20 April 2009 5 / 20
6. Examined Compressors
XML Compressors List
Compressor Features Code Compressor Features Code
Available Available
GZIP (1.3.12) GAI Y XGrind SQI Y
BZIP2 (1.0.4) GAI Y XBzip SQI N
PPM (j.1) GAI Y XQueC SQI N
XMill (0.7) SAI Y XCQ SQI N
XMLPPM (0.98.3) SAI Y XPress SQI N
SCMPPM (0.93.3) SAI Y XQzip SQI N
XWRT (3.2) SAI Y XSeq SQI N
Exalt (0.1.0) SAI Y QXT SQI N
AXECHOP SAI Y ISX SQI N
DTDPPM SAD Y XAUST SAD Y
rngzip SQD Y Millau SAD N
Symbols list of XML compressors features
Symbol Description Symbol Description
G General Text Compressor S Specific XML Compressor
D Schema dependent Compressor I Schema Independent Compressor
A Archival XML Compressor Q Queriable XML Compressor
S. Sakr (CSE, UNSW) BenchmarX’09 20 April 2009 6 / 20
7. Testing Corpus (Data Sets)
Determining the XML files that should be used for evaluating the set
of XML compression tools is not a simple task.
The documents of our corpus are classified into four categories:
Regular Documents: Regular document structure and short data
contents. They reflect the XML view of relational data. The data ratio
of these documents is in the range of between 40% and 60%.
Irregular documents: Very deep, complex and irregular structure.
More challenging in terms of compression efficiency.
Textual documents: Simple structure and high ratio of the contents
is preserved to the data values. The ratio of the data contents of these
documents represent more than 70% of the document size.
Structural documents: No data contents at all. 100% of each
document size is preserved to its structure information. They are used
to assess the claim of XML conscious compressors on using the well
known structure of XML documents for achieving higher compression
ratios on the structural parts of XML documents.
S. Sakr (CSE, UNSW) BenchmarX’09 20 April 2009 7 / 20
9. Testing Environments
To ensure the consistency of the performance behaviors of the
evaluated XML compressors, we ran our experiments on two different
environments.
One environment with high computing resources and the other with
considerably limited computing resources.
High Resources Setup Limited Resources Setup
OS Ubuntu 7.10 (Linux 2.6.22 Kernel) Ubuntu 7.10 (Linux 2.6.20 Kernel)
CPU Intel Core 2 Duo E6850 Intel Pentium 4
3.00 GHz, FSB 1333MHz 2.66GHz, FSB 533MHz
4MB L2 Cache 512KB L2 Cache
HD Seagate ST3250820AS - 250 GB Western Digital WD400BB - 40 GB
RAM 4 GB 512 MB
S. Sakr (CSE, UNSW) BenchmarX’09 20 April 2009 9 / 20
10. Performance Criteria
We measure and compare the performance of the XML compression tools
using the following metrics:
Compression Ratio: represents the ratio between the sizes of
compressed and uncompressed XML documents.
Compression Ratio = (Compressed Size) / (Uncompressed Size)
Compression Time: represents the elapsed time during the
compression process.
Decompression Time: represents the elapsed time during the
decompression process.
For all metrics: the lower the metric value, the better the compressor.
S. Sakr (CSE, UNSW) BenchmarX’09 20 April 2009 10 / 20
11. Experimental Framework
We evaluated 11 XML compressors: 3 general purpose text
compressors and 8 XML conscious compressors.
Our corpus consists of 57 documents: 27 original documents, 27
structural copies and 3 randomly generated structural documents.
We run the experiments on two different platforms.
For each combination of an XML test document and an XML
compressor, we run two different operations (compression -
decompression).
To ensure accuracy, all reported numbers for our time metrics are the
average of five executions with the highest and the lowest values
removed.
We created our own mix of Unix shell and Perl scripts to run and
collect the results of these huge number of runs.
The web page of this study provides access to the test files, examined
XML compressors and the detailed results of this study.
http://xmlcompbench.sourceforge.net/.
S. Sakr (CSE, UNSW) BenchmarX’09 20 April 2009 11 / 20
14. Experimental Results : Average Compression Ratios
0.24
0.07
0.06 0.22
Avergae Compression Ratio
Average Compression Ratio
0.05 0.20
0.04 0.18
0.03
0.16
0.02
0.14
0.01
0.12
0.00
XMillPPM
XMillGzip
XMillBzip2
XMLPPM
Bzip2
Axechop
Gzip
SCMPPM
PPM
Exalt
XWRT
SCMPPM
XMLPPM
XMillBzip2
XMillPPM
Bzip2
PPM
XMillGzip
Gzip
XWRT
(a) Structural documents. (b) Original documents.
S. Sakr (CSE, UNSW) BenchmarX’09 20 April 2009 14 / 20
15. Overall Performance of XML Compressors
7
6
5 Bzip2
Gzip
PPM
4 XMillBzip2
XMillGzip
XMillPPM
3 XMLPPM
XWRT
SCMPPM
2
1
0
Compression Ratio Compression Time Decompression Time
S. Sakr (CSE, UNSW) BenchmarX’09 20 April 2009 15 / 20
16. Proposed Ranking of XML Compressors
The results of our experiments have not shown a clear winner.
Different ranking methods and different weights for the factors could
be used for this task. Deciding the weight of each metric is mainly
dependant on the scenarios and requirements of the applications
where these compression tools could be used.
We used three ranking functions which give different weights for our
performance metrics:
WF1 = (1/3 * CR) + (1/3 * CT) + (1/3 * DCT).
WF2 = (1/2 * CR) + (1/4 * CT) + (1/4 * DCT).
WF3 = (3/5 * CR) + (1/5 * CT) + (1/5 * DCT).
where CR represents the compression ratio metric, CT represents the
compression time metric and DCT represents the decompression time
metric.
S. Sakr (CSE, UNSW) BenchmarX’09 20 April 2009 16 / 20
17. Proposed Ranking of XML Compressors
3.0
2.5
Bzip2
Gzip
2.0 PPM
XMillBzip2
XMillGzip
XMillPPM
1.5 XMLPPM
XWRT
SCMPPM
1.0
0.5
0.0
WF1 WF2 WF3
S. Sakr (CSE, UNSW) BenchmarX’09 20 April 2009 17 / 20
18. Conclusions
The primary innovation in the XML compression mechanisms was
introduced in XMill by separating the structural part of the XML
document from the data part and then group the related data items
into homogenous containers that can be compressed separably. Most
of the following XML compressors have simulated this idea in
different ways.
The dominant practice in most of the XML compressors is to utilize
the well-known structure of XML documents for applying a
pre-processing encoding step and then forwarding the results of this
step to general purpose compressors.
There are no publicly available solid implementations for
grammar-based XML compression techniques and queriable XML
compressors.
The authors of the XML compressors should provide more attention
to provide the source code of their implementations available.
S. Sakr (CSE, UNSW) BenchmarX’09 20 April 2009 18 / 20
19. Conclusions
We believe that this paper could be valuable for both the developers of
new XML compression tools and interested users as well.
For developers, they can use the results of this paper to effectively
decide on the points which can be improved in order to make an
effective contribution.
We recommend tackling the area of developing stable efficient
queriable XML compressors. Although there has been a lot of literature
presented in this domain, we are still missing efficient, scalable and
stable implementations in this domain.
For users, this study could be helpful for making an effective decision
to select the suitable compressor for their requirements.
For users with highest compression ratio requirement, the results of our
experiments recommend the usage of either the PPM compressor with
the highest level of compression parameter or the XWRT compressor
with the highest level of compression parameter.
For users with fastest compression time and moderate compression ratio
requirements, gzip and XMillGzip are considered to be the best choice.
S. Sakr (CSE, UNSW) BenchmarX’09 20 April 2009 19 / 20
20. The End
Thank You
S. Sakr (CSE, UNSW) BenchmarX’09 20 April 2009 20 / 20