In this paper, we propose new models and algorithms to perform practical computations on W3C XML Schemas, namely schema minimization, schema equivalence testing, subschema testing, and subschema extraction. We have conducted experiments on an e-commerce standard XSD called xCBL to demonstrate the effectiveness of our algorithms. One experiment has refuted the claim that the xCBL 3.5 XSD is compatible with the xCBL 3.0 XSD. Another experiment has shown that the xCBL XSDs can be effectively trimmed into small subschemas for specific applications, which significantly reduces schema processing time.
XML Schema Computations: Schema Compatibility Testing and Subschema Extraction
1. XML Schema Computations: Schema Compatibility Testing and Subschema Extraction
Thomas Y.T. LEE and David W.L. Cheung
Department of Computer Science
The University of Hong Kong
October 28, 2010
CIKM 2010
Toronto, Canada
2. Outline
Introduction and motivation
Formal models for XML data and schemas
Schema computational algorithms
Experiments and conclusions
3. Outline
Introduction and motivation
Formal models for XML data and schemas
Schema computational algorithms
Experiments and conclusions
4. Data interoperability on web services
In order for two web services to be interoperable, the XML schema on the message-receiving end must accept all possible XML messages from the sending end. That is, the sending schema must be a subschema of the receiving schema. (A toy illustration follows the figure.)
[Diagram: Web Service A sends XML instances to Web Service B; the set of instances of Schema A is contained in (⊆) the set of instances of Schema B.]
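To make the containment condition concrete, here is a toy sketch in Python (not the paper's algorithm): treating each schema simply as the set of instances it accepts, interoperability from A to B is plain set containment. The instance sets below are hypothetical.

# Toy illustration: a schema denotes a set of XML instances, and Web
# Service A can safely send to Web Service B exactly when every instance
# A may emit is accepted by B (A's schema is a subschema of B's).

def is_interoperable(instances_a, instances_b):
    return set(instances_a) <= set(instances_b)

# Hypothetical instance sets of two tiny schemas.
schema_a = {"<Quote/>", "<Order/>"}
schema_b = {"<Quote/>", "<Order/>", "<Invoice/>"}

print(is_interoperable(schema_a, schema_b))  # True: A is a subschema of B
print(is_interoperable(schema_b, schema_a))  # False: B accepts more than A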
5. W3C XML Schema and data standards
1. W3C XML Schema (XSD) is the most popular schema language for defining data standards.
2. For a new version of an XSD to be backward-compatible with the old version, the new version must be a superschema of the old version: the new schema must accept every instance of the old schema.
3. However, a typical e-commerce standard XSD contains thousands of types and elements, which makes manual verification of compatibility practically impossible.
4. When an XSD is too large, how can we extract a smaller subschema that is just enough for processing by a specific application?
6. Schema compatibility problems
1. Given two XSDs, how can we verify that they are equivalent, or that one is a subschema of the other?
2. Given an XSD A, how can we extract a smaller subschema B of A such that B recognizes only a subset of the elements recognized by A?
3. In this research, we have developed formal models for XML data and schemas, as well as algorithms to solve these problems.
7. Outline
Introduction and motivation
Formal models for XML data and schemas
Schema computational algorithms
Experiments and conclusions
8. Data Tree (DT) to model XML data
A DT is a tree where edges represent elements and nodes represent their contents.

Example XML document:

<Quote>
  <Line>
    <Desc>hPhone</Desc>
    <Price>499.9</Price>
  </Line>
  <Line>
    <Desc>iMat</Desc>
    <Price>999.9</Price>
  </Line>
</Quote>

Corresponding DT (ε denotes element-only content):
n0 -<Quote>-> n1:ε
n1 -<Line>-> n2:ε and n1 -<Line>-> n3:ε
n2 -<Desc>-> n4:"hPhone", n2 -<Price>-> n5:"499.9"
n3 -<Desc>-> n6:"iMat", n3 -<Price>-> n7:"999.9"
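The DT construction can be sketched in code. The following is a minimal, hypothetical rendering of the model above (not the paper's implementation): edges carry element names, nodes carry content, and node numbering is depth-first, so it may differ from the figure.

import xml.etree.ElementTree as ET

def build_dt(xml_text):
    # nodes: id -> text content ('' for element-only content, i.e. epsilon)
    # edges: (parent_id, element_name, child_id) triples
    nodes, edges = {0: ""}, []  # n0 is the root node above the document element

    def visit(elem, parent_id):
        child_id = len(nodes)
        nodes[child_id] = (elem.text or "").strip() if len(elem) == 0 else ""
        edges.append((parent_id, elem.tag, child_id))
        for sub in elem:
            visit(sub, child_id)

    visit(ET.fromstring(xml_text), 0)
    return nodes, edges

doc = ("<Quote>"
       "<Line><Desc>hPhone</Desc><Price>499.9</Price></Line>"
       "<Line><Desc>iMat</Desc><Price>999.9</Price></Line>"
       "</Quote>")

nodes, edges = build_dt(doc)
for parent, name, child in edges:
    print(f"n{parent} -<{name}>-> n{child}:{nodes[child] or 'ε'}")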
9. Schema Automaton (SA) to model XML schemas
1. An SA is a deterministic finite automaton (DFA) where each state is associated with a regular expression (RE) and a set of values called a value domain (VDom).
2. The DFA, called the vertical language (VLang), defines how symbols are arranged along the paths from the root to the leaves.
   2.1 Each state represents an XSD data type and each symbol represents an element name.
3. The RE of a state, called the horizontal language (HLang), defines how child elements can be arranged under an XSD data type, i.e., the content model.
4. The value domain defines the set of all possible values an element can contain. (A sketch of these components follows.)
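The four components above map naturally onto a small data structure. The sketch below is an illustrative Python encoding, assuming REs and value domains are kept as opaque labels; the field names are ours, not the paper's.

from dataclasses import dataclass

@dataclass
class SchemaAutomaton:
    states: set  # each state corresponds to an XSD data type
    start: str   # initial state (type of the document root)
    delta: dict  # VLang DFA transitions: (state, element_name) -> state
    hlang: dict  # state -> content-model RE over element names
    vdom: dict   # state -> value domain (allowed text values)

# The quote document type from the DT slide, written as an SA:
sa = SchemaAutomaton(
    states={"q0", "q1", "q2", "q3", "q4"},
    start="q0",
    delta={("q0", "Quote"): "q1", ("q1", "Line"): "q2",
           ("q2", "Desc"): "q3", ("q2", "Price"): "q4"},
    hlang={"q0": "Quote", "q1": "Line+", "q2": "Desc Price",
           "q3": "{ }", "q4": "{ }"},  # { } = empty content model (leaf)
    vdom={"q0": "{ }", "q1": "{ }", "q2": "{ }",
          "q3": "STRINGS", "q4": "DECIMALS"},
)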
11. Outline
Introduction and motivation
Formal models for XML data and schemas
Schema computational algorithms
Experiments and conclusions
12. Schema compatibility testing
1. Two computations: schema equivalence testing and subschema testing.
2. A schema minimization step is involved.
   2.1 All useless states (data types) are removed first. A useless state is a state that is inaccessible or that does not recognize any element with a finite number of descendants.
   2.2 The process is like DFA minimization, but the HLang and VDom of each state are considered when deciding whether two states can be merged (see the sketch after this slide).
3. We have proved that two SAs (XSDs) are equivalent iff their minimized forms have isomorphic VLang DFAs and all corresponding HLangs and VDoms are equivalent.
4. We have also developed an algorithm to verify whether an SA is a subschema of another SA.
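A rough sketch of the minimization step in point 2.2, assuming a Moore-style partition refinement whose initial partition groups states by (HLang, VDom) rather than by DFA acceptance. Plain equality of the hlang/vdom labels stands in for the real RE and value-domain equivalence tests; this is our reading of the slide, not the paper's code.

def minimize(states, alphabet, delta, hlang, vdom):
    # Initial partition: states are candidates for merging only if their
    # HLang and VDom agree (hashable labels; equality is a stand-in for
    # the real RE/value-domain equivalence tests).
    blocks = {}
    for q in states:
        blocks.setdefault((hlang[q], vdom[q]), set()).add(q)
    partition = [frozenset(b) for b in blocks.values()]

    # Refinement: split a block whenever two of its states reach different
    # blocks on some symbol, exactly as in Moore's DFA minimization.
    changed = True
    while changed:
        changed = False
        block_of = {q: b for b in partition for q in b}
        refined = []
        for block in partition:
            groups = {}
            for q in block:
                sig = tuple(block_of.get(delta.get((q, a)))
                            for a in sorted(alphabet))
                groups.setdefault(sig, set()).add(q)
            refined.extend(frozenset(g) for g in groups.values())
            changed |= len(groups) > 1
        partition = refined
    return partition  # each block is a set of mergeable states

Two SAs are then equivalent iff, after minimization, their VLang DFAs are isomorphic and the corresponding HLangs and VDoms are equivalent (point 3 above).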
13. Useless states
[State diagram omitted: an SA with states q0–q9 and transitions labelled A, B, and C.]

q    HLang(q)    VDom(q)
q0   A{2,5}BC?   STRINGS
q1   C*          STRINGS
q2   { }         INTEGERS
q3   A*          STRINGS
q4   B+          STRINGS
q5   C           STRINGS
q6   A+B*        INTEGERS
q7   A?          STRINGS
q8   B*          STRINGS
q9   { }         DECIMALS
1. q7 and q8 are inaccessible.
2. q5 and q6 are irrational because they generate infinitely many descendants.
3. q9 is useless because it is blocked by irrational states.
4. q4 is useless because it must lead to an irrational state. (A pruning sketch follows this slide.)
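The pruning on this slide can be sketched as two passes, assuming a helper can_derive(re, syms) that decides whether a content-model RE admits some word using only the symbols in syms; this helper, and the single round of passes, are simplifications of the paper's procedure.

def remove_useless(states, start, delta, hlang, can_derive):
    # Pass 1 -- accessible states: forward reachability from the start
    # state (q7 and q8 above fail this test).
    accessible, frontier = {start}, [start]
    while frontier:
        q = frontier.pop()
        for (p, a), r in delta.items():
            if p == q and r not in accessible:
                accessible.add(r)
                frontier.append(r)

    # Pass 2 -- rational states, as a least fixpoint: q is rational if its
    # content model can be satisfied using only child elements that lead to
    # states already known rational. States like q5 and q6, whose every
    # derivation forces infinitely many descendants, never enter the set.
    rational, changed = set(), True
    while changed:
        changed = False
        for q in accessible - rational:
            ok = {a for (p, a), r in delta.items() if p == q and r in rational}
            if can_derive(hlang[q], ok):
                rational.add(q)
                changed = True

    keep = accessible & rational
    # A full implementation repeats the passes until a fixpoint, since
    # removing irrational states can strand others (q9 above).
    new_delta = {(p, a): r for (p, a), r in delta.items()
                 if p in keep and r in keep}
    return keep, new_delta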
14. Schema minimization and equivalence
Schema A:

q    HLang(q)       VDom(q)
q0   Quote | Order  { }
q1   Line+          { }
q2   Line+          { }
q3   Desc Price     { }
q4   Product Qty    { }
q5   { }            STRS
q6   { }            DECS
q7   Desc Price     { }
q8   { }            INTS

1. q3 and q7 can be merged into q9.
2. The two SAs are equivalent.

Schema B:

q    HLang(q)       VDom(q)
q0   Quote | Order  { }
q1   Line+          { }
q2   Line+          { }
q9   Desc Price     { }
q4   Product Qty    { }
q5   { }            STRS
q6   { }            DECS
q8   { }            INTS

[State diagrams omitted: the transitions carry the element names <Quote>, <Order>, <Line>, <Desc>, <Price>, <Product>, and <Qty>; Schema B is Schema A with q3 and q7 merged into q9.]
15. Subschema testing
Schema A:

q    HLang(q)       VDom(q)
q0   Quote | Order  { }
q1   Line+          { }
q2   Line+          { }
q9   Desc Price     { }
q4   Product Qty    { }
q5   { }            STRS
q6   { }            DECS
q8   { }            INTS

B is a subschema of A:
1. HLang(q0B) ⊆ HLang(q0A) and VDom(q0B) = VDom(q0A).
2. HLang(q6B) = HLang(q6A) and VDom(q6B) ⊆ VDom(q6A).
3. HLang(qiB) = HLang(qiA) and VDom(qiB) = VDom(qiA), for i = 1, 5, 9.

Schema B:

q    HLang(q)       VDom(q)
q0   Quote          { }
q1   Line+          { }
q9   Desc Price     { }
q5   { }            STRS
q6   { }            INTS

[State diagrams omitted: Schema B keeps only the <Quote> branch of Schema A, and its q6 allows integers rather than decimals. A sketch of the pairwise check follows this slide.]
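The pairwise check can be sketched as a lockstep walk over the two SAs, assuming helper predicates hlang_subset (RE-language containment) and vdom_subset (value-domain containment) that stand in for the paper's actual tests. It pairs each state of the candidate subschema B with the corresponding state of A, as q0B/q0A, q6B/q6A, ... are paired above; the sketch checks the necessary per-pair conditions rather than reproducing the full algorithm.

def is_subschema(saB, saA, hlang_subset, vdom_subset):
    # saX = (start, delta, hlang, vdom), delta: (state, symbol) -> state
    startB, deltaB, hlangB, vdomB = saB
    startA, deltaA, hlangA, vdomA = saA
    pairs, todo = {(startB, startA)}, [(startB, startA)]
    while todo:
        qB, qA = todo.pop()
        # Each paired state of B must have a contained content model and a
        # contained value domain (equality is a special case of both).
        if not (hlang_subset(hlangB[qB], hlangA[qA]) and
                vdom_subset(vdomB[qB], vdomA[qA])):
            return False
        # Follow B's transitions; A must match each one, and the two
        # targets form the next pair to check.
        for (p, a), rB in deltaB.items():
            if p != qB:
                continue
            rA = deltaA.get((qA, a))
            if rA is None:  # B uses an element name that A's type forbids
                return False
            if (rB, rA) not in pairs:
                pairs.add((rB, rA))
                todo.append((rB, rA))
    return True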
16. Subschema extraction
We have developed the subschema extraction algorithm: given an SA (XSD) A and a set of symbols (element names) Z, compute an SA which accepts all instances (XML documents) of A except those containing some symbol not in Z. (An outline in code follows this slide.)
[State diagram: q0 −<Quote>→ q1 −<Line>→ q2, with q2 −<Desc>→ q4 and q2 −<Price>→ q5; q0 −<Order>→ q7 −<Line>→ q3, with q3 −<Qty>→ q6 and a transition on <Product>.]

q    HLang(q)          VDom(q)
q0   <Quote>|<Order>   { }
q1   <Line>+           { }
q2   <Desc><Price>     { }
q3   <Product><Qty>    { }
q7   <Line>+           { }
q4   { }               STRINGS
q5   { }               DECIMALS
q6   { }               INTEGERS

Z = {<Quote>, <Line>, <Desc>, <Price>, <Order>, <Qty>}, where <Product> is excluded.
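An outline of the extraction, assuming a helper restrict_re(re, Z) that restricts a content-model RE to the words over symbols in Z (i.e., intersects its language with Z*). Handling content models that become unsatisfiable, such as <Product><Qty> once <Product> is excluded, is exactly what the useless-state pruning from the earlier slide resolves; this sketch defers to it.

def extract_subschema(states, start, delta, hlang, vdom, Z, restrict_re):
    # Drop transitions on excluded element names ...
    new_delta = {(p, a): r for (p, a), r in delta.items() if a in Z}
    # ... and restrict every content model to words over Z only.
    new_hlang = {q: restrict_re(hlang[q], Z) for q in states}
    # Restriction can make a content model unsatisfiable (a required child
    # was excluded); removing the resulting useless states, as on the
    # 'Useless states' slide, then yields the SA that accepts exactly the
    # instances of A containing no element name outside Z.
    return states, start, new_delta, new_hlang, dict(vdom)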
17. Outline
Introduction and motivation
Formal models for XML data and schemas
Schema computational algorithms
Experiments and conclusions
18. xCBL compatibility testing experiment
1. Data sets: XML Common Business Library
XSD        file size   no. of files   data types   element names   doc. types
xCBL 3.0   1.8MB       413            1,290        3,728           42
xCBL 3.5   2.0MB       496            1,476        4,473           51

2. The subschema testing program has disproved the claim on xCBL.org:
   "The only modifications allowed to xCBL 3.0 documents were the additions of new optional elements and additions to code lists; to maintain interoperability between the two versions. An xCBL 3.0 instance of a document is also a valid instance in xCBL 3.5."
3. xCBL 3.5 is not a superschema of xCBL 3.0.
4. The experiment took only 272ms when the quick RE test was applied.
   Machine: Q6600@2.40GHz, 4GB RAM, Linux OS
19. Schema size reduction by subschema extraction
1. The subschema extraction program was run to extract different subschemas from xCBL. Each subschema recognizes a different element subset for a specific application, e.g., order, invoice, etc.
2. The schema size was reduced to 6–32% of the original size.
3. The time required by XMLBeans to compile a subschema was reduced to 34–50% of the time originally required.
4. The time to extract such a subschema was only 2–3s.
[Chart omitted: numbers of element names, types, and element declarations (left axis, 0–5000) and XMLBeans compilation time in seconds (right axis, 0–35) for the original xCBL 3.5 schema and the invoice, order, quote, auction, and catalog subschemas. Caption: Subschema extraction from xCBL 3.5.]
20. Conclusions
1. We have developed:
   formal models for XML and XSD, and
   algorithms for schema equivalence testing, subschema testing, and subschema extraction.
2. These schema computations are PSPACE-complete because they involve comparisons of regular expressions.
   We have developed a heuristic (the quick RE test) to make these algorithms run fast on very large schemas.
3. Our experiments:
   have proved that xCBL 3.5 is in fact not backward-compatible with xCBL 3.0, and
   have extracted small subschemas from xCBL for different instance subsets, which greatly reduces the processing time on these subschemas.
4. These models can be extended for other applications:
   a web service adaptor for legacy systems (text-to-XML transformation), and
   a schema inferrer from XML instances.