An introduction to the XML query possibilities of XPath. In particular, there is a focus on the abbreviations that make XPath efficient to use. A larger section is devoted to explaining and illustrating the use of axes in XPath.
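As a rough illustration of the abbreviated syntax mentioned above, the sketch below uses Python's standard-library `xml.etree.ElementTree`, which supports only a subset of XPath. The sample document, its element names, and its `id` values are made up for the example; the comments give the unabbreviated axis form that each abbreviation stands for in XPath 1.0.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
doc = ET.fromstring(
    "<library>"
    "<book id='b1'><title>XML Basics</title></book>"
    "<book id='b2'><title>XPath Axes</title></book>"
    "</library>"
)

# '//' abbreviates /descendant-or-self::node()/
# so './/title' finds every title element below the context node.
titles = [t.text for t in doc.findall(".//title")]

# '@id' abbreviates attribute::id; here it is used inside a predicate.
second_book = doc.find("./book[@id='b2']")

# '..' abbreviates parent::node(): step from each title up to its book.
parent_ids = [p.get("id") for p in doc.findall(".//title/..")]

print(titles)
print(second_book.find("title").text)
print(parent_ids)
```

A full XPath engine exposes the remaining axes as well (for example `ancestor::` or `following-sibling::`), which `ElementTree` does not implement.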
More and more organizations are moving their ETL workloads to a Hadoop-based ELT grid architecture. Hadoop's inherent capabilities, especially its ability to do late binding, address some of the key challenges with traditional ETL platforms. In this presentation, attendees will learn the key factors, considerations, and lessons around ETL for Hadoop. Topics include the pros and cons of different extract and load strategies, the best ways to batch data, buffering and compression considerations, leveraging HCatalog, data transformation, integration with existing data transformations, the advantages of different ways of exchanging data, and leveraging Hadoop as a data integration layer. This is an extremely popular presentation around ETL and Hadoop.
Data Quality With or Without Apache Spark and Its Ecosystem (Databricks)
Few solutions exist in the open-source community either in the form of libraries or complete stand-alone platforms, which can be used to assure a certain data quality, especially when continuous imports happen. Organisations may consider picking up one of the available options – Apache Griffin, Deequ, DDQ and Great Expectations. In this presentation we’ll compare these different open-source products across different dimensions, like maturity, documentation, extensibility, features like data profiling and anomaly detection.
This Hadoop Hive tutorial gives a complete introduction to Hive, covering Hive architecture, Hive commands, Hive fundamentals, and HiveQL. In addition, the fundamental concepts of Big Data and Hadoop are covered extensively.
By the end, you'll have a strong grasp of Hadoop Hive basics.
PPT Agenda
✓ Introduction to BIG Data & Hadoop
✓ What is Hive?
✓ Hive Data Flows
✓ Hive Programming
----------
What is Apache Hive?
Apache Hive is a data warehousing infrastructure built on top of Hadoop and targeted at SQL programmers. Hive lets SQL programmers enter the Hadoop ecosystem directly, without any prerequisites in Java or other programming languages. HiveQL is similar to SQL and is used to run Hadoop and MapReduce operations for managing and querying data.
----------
Hive has the following 5 Components:
1. Driver
2. Compiler
3. Shell
4. Metastore
5. Execution Engine
----------
Applications of Hive
1. Data Mining
2. Document Indexing
3. Business Intelligence
4. Predictive Modelling
5. Hypothesis Testing
----------
Skillspeed is a live e-learning company focusing on high-technology courses. We provide live instructor led training in BIG Data & Hadoop featuring Realtime Projects, 24/7 Lifetime Support & 100% Placement Assistance.
Email: sales@skillspeed.com
Website: https://www.skillspeed.com
Large-Scale Data Processing with Apache Spark (Software Guru)
Apache Spark is a framework for parallel data processing that allows the data to be processed in memory. It is up to 100x faster than Apache Hadoop. Today's applications are designed around data workflows, and Spark lets you interact with that data in either Scala or Python. You can also apply a series of transformations to the data, as well as graph processing (GraphX) and machine learning (MLlib).
*** Apache Spark and Scala Certification Training: https://www.edureka.co/apache-spark-scala-training ***
This Edureka PPT on "RDD Using Spark" provides detailed and comprehensive knowledge about RDDs, which are considered the backbone of Apache Spark. You will learn about the various transformations and actions that can be performed on RDDs. This PPT covers the following topics:
Need for RDDs
What are RDDs?
Features of RDDs
Creation of RDDs using Spark
Operations performed on RDDs
RDDs using Spark: Pokemon Use Case
Blog Series: http://bit.ly/2VRogGx
Complete Apache Spark and Scala playlist: http://bit.ly/2In8IXD
Follow us to never miss an update in the future.
YouTube: https://www.youtube.com/user/edurekaIN
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
These are the slides from my talk at Data Day Texas 2016 (#ddtx16).
The world of data warehousing has changed! With the advent of Big Data, Streaming Data, IoT, and The Cloud, what is a modern data management professional to do? It may seem to be a very different world with different concepts, terms, and techniques. Or is it? Lots of people still talk about having a data warehouse or several data marts across their organization. But what does that really mean today in 2016? How about the Corporate Information Factory (CIF), the Data Vault, an Operational Data Store (ODS), or just star schemas? Where do they fit now (or do they)? And now we have the Extended Data Warehouse (XDW) as well. How do all these things help us bring value and data-based decisions to our organizations? Where do Big Data and the Cloud fit? Is there a coherent architecture we can define? This talk will endeavor to cut through the hype and the buzzword bingo to help you figure out what part of this is helpful. I will discuss what I have seen in the real world (working and not working!) and a bit of where I think we are going and need to go in 2016 and beyond.
Spark is fast becoming a critical part of Customer Solutions on Azure. Databricks on Microsoft Azure provides a first-class experience for building and running Spark applications. The Microsoft Azure CAT team engaged with many early adopter customers helping them build their solutions on Azure Databricks.
In this session, we begin by reviewing typical workload patterns, integration with other Azure services like Azure Storage, Azure Data Lake, IoT / Event Hubs, SQL DW, PowerBI etc. Most importantly, we will share real-world tips and learnings that you can take and apply in your Data Engineering / Data Science workloads
Binary Similarity: Theory, Algorithms and Tool Evaluation (Liwei Ren任力偉)
Similarity digesting is a class of algorithms and technologies that generate hashes from files while preserving file similarity. These algorithms find applications in various areas across the security industry: malware variant detection, spam filtering, computer forensic analysis, data loss prevention, and so on. A few schemes and tools are available, including ssdeep, sdhash and TLSH. While useful for detecting file similarity, they define similarity from different perspectives. In other words, they take different approaches to describing what file similarity is about. To compare these tools on a sounder basis, we introduce a simple mathematical model of similarity that covers all three schemes and beyond. This model enables us to establish a theoretical framework for analyzing the essential differences between various similarity digesting algorithms and tools. As a result, a few tools are found to be complementary to each other, so that we can use them in a hybrid approach in practice. Experimental results are provided to support the theoretical analysis. In addition, we introduce a novel similarity digesting scheme that was designed based on the mathematical model.
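To make the idea of a similarity-preserving hash concrete, here is a toy sketch. It is not ssdeep, sdhash, or TLSH, and it is not the model or the novel scheme from the talk. It assumes one simple definition of similarity: the Jaccard overlap between the sets of hashed 4-byte windows of two inputs. The function names and sample inputs are hypothetical.

```python
import hashlib


def toy_digest(data: bytes, n: int = 4) -> frozenset:
    """Toy digest: the set of truncated hashes of every n-byte window."""
    features = set()
    for i in range(len(data) - n + 1):
        window = data[i:i + n]
        # Keep only 4 bytes of each hash so a feature is a small integer.
        features.add(int.from_bytes(hashlib.sha1(window).digest()[:4], "big"))
    return frozenset(features)


def similarity(d1: frozenset, d2: frozenset) -> float:
    """Jaccard similarity of two toy digests, between 0.0 and 1.0."""
    if not d1 and not d2:
        return 1.0  # two empty digests are treated as identical
    return len(d1 & d2) / len(d1 | d2)


# Hypothetical inputs: an original, a lightly edited copy, unrelated data.
original = b"The quick brown fox jumps over the lazy dog"
edited = b"The quick brown fox jumps over the lazy cat"
unrelated = b"Completely different content with nothing shared"

print(similarity(toy_digest(original), toy_digest(edited)))
print(similarity(toy_digest(original), toy_digest(unrelated)))
```

Unlike this sketch, whose digest grows with the input, practical tools compress the features into a compact digest, and each tool picks its own features and comparison function. That difference in choices is why they can disagree on how similar two files are.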
How to describe a dataset. Interoperability issues (Valeria Pesce)
Presented by Valeria Pesce during the pre-meeting of the Agricultural Data Interoperability Interest Group (IGAD) of the Research Data Alliance (RDA), held on 21 and 22 September 2015 in Paris at INRA.
This presentation introduces NoSQL databases: their types with examples, how they differ from 40-year-old relational database management systems, their usage, and why we should use them.
Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-bas... (Willy Lulciuc)
At WeWork, it's critical that we understand the complete context for all datasets. We also want to be able to explore dependencies between jobs and the datasets they produce and consume. To do this, WeWork needs metadata. In this talk I will focus on Marquez, a core service for the collection, aggregation and visualization of a data ecosystem's metadata. Marquez maintains the provenance of how datasets are consumed and produced while providing global visibility into job runtime.
LDM Slides: Data Modeling for XML and JSON (DATAVERSITY)
Data modeling has traditionally focused on relational database systems. But in the age of the internet, technologies such as XML and JSON have evolved to provide structure and definition to “data in motion”. Have data modeling technologies evolved to support these technologies? Can we use traditional approaches to model data in XML and JSON? Or are new tools and methodologies required? Join this webinar to discuss:
- XML & JSON vs. Relational Database Modeling
- Techniques & Tools for Data Modeling for XML
- Techniques & Tools for Data Modeling for JSON
- Use Cases & Opportunities for XML and JSON Data Modeling
Les Hazlewood, Stormpath co-founder and CTO and the Apache Shiro PMC Chair demonstrates how to design a beautiful REST + JSON API. Includes the principles of RESTful design, how REST differs from XML, tips for increasing adoption of your API, and security concerns.
Presentation video: https://www.youtube.com/watch?v=5WXYw4J4QOU
More info: http://www.stormpath.com/blog/designing-rest-json-apis
Further reading: http://www.stormpath.com/blog
Sign up for Stormpath: https://api.stormpath.com/register
Stormpath is a user management and authentication service for developers. By offloading user management and authentication to Stormpath, developers can bring applications to market faster, reduce development costs, and protect their users. Easy and secure, the flexible cloud service can manage millions of users with a scalable pricing model.
This slide show is from my presentation on what JSON and REST are. It aims to provide a number of talking points by comparing apples and oranges (JSON vs. XML and REST vs. web services).
A comparison of a database table to an XML document. There is an overview of basic XML concepts such as attribute, element, entity, and tag. Data-centric and document-centric XML documents are covered.
An introduction to the use of DTDs in connection with XML documents. Elements and attributes are introduced in detail. The use of ID, IDREF, and IDREFS for uniqueness and for referring to elements is illustrated with a number of examples.
Alannah Fitzgerald The TOETOE project planning for impact (LORO)
Slides of the morning presentation by Alannah Fitzgerald for the event "Does it make a difference? The impact of repositories and OERs on teaching and learning", March 2011.
Sentiment analysis using naive Bayes classifier (Dev Sahu)
This PPT contains a short description of the naive Bayes classifier algorithm, a machine learning approach to sentiment detection and text classification.
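A minimal sketch of how a naive Bayes sentiment classifier can work, assuming a multinomial bag-of-words model with add-one (Laplace) smoothing. This is one common variant and not necessarily the one used in the slides. The training sentences, labels, and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Count words per label. `samples` is a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in samples:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior for `text`."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior: the fraction of training documents with this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen during training
            # Add-one smoothing keeps unseen (word, label) pairs above zero.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set.
model = train([
    ("great movie loved it", "pos"),
    ("wonderful acting great plot", "pos"),
    ("terrible movie hated it", "neg"),
    ("boring plot awful acting", "neg"),
])

print(predict(model, "great plot"))
print(predict(model, "awful boring movie"))
```

The "naive" part is the assumption that words are independent given the label, which is what lets the per-word log probabilities simply be added.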
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies (Max Irwin)
Presentation as given to the Haystack Conference, which outlines research and techniques for automatic extraction of keywords, concepts, and vocabularies from text corpora.
Looking ahead in your DBA program, the final step in the course sequ.docx (washingtonrosy)
Looking ahead in your DBA program, the final step in the course sequence is a comprehensive exam. Students must successfully pass the comprehensive exam before moving on to the dissertation phase.
Your preparation for the comprehensive exam should have already started via your study in previous courses, self-documentation of your learning, and completion of the reflection journals with your mentor.
For the Week 3 paper, you will provide an annotated outline that:
Segments the OB field into at least three broad categories. You may have more than three categories, but the minimum for this assignment is three. For example:
Individual
Group/Team
Organizational/Environmental
Provides a set of core topics beneath each of the broad categories. At this point in your doctoral development, you must have at least five core topics for each category (thus, a minimum of fifteen total). You may have more than five topics for each category (and eventually you will in your preparation for the exam!) but the minimum for this assignment is five topics for each of your categories (or, at least a total of fifteen topics if you have more than three categories).
Integrates one annotated resource (key author—with brief description or bullet points of important points the author makes about the topic) for your fifteen core topics. You may provide more than one resource per topic, but you must have at least one resource each for a minimum of fifteen topics.
Synthesizes key learning from working on the guide, next steps you will take working on your guide, and potential benefits beyond comprehensive exam preparation.
Sample Structure:
Summary assessment of your three (or more) part schema. What is it? Why is it an appropriate way to view the OB field?
First Category (e.g., Individual Level of Analysis)
Topic 1:
Brief definition, description, and overview (2–3 sentences)
Key resource:
APA formatted reference
Summary of key insights about the topic from the source
Topic 2:
Brief definition, description, and overview (2–3 sentences)
Key resource:
APA formatted reference
Summary of key insights about the topic from the source
Topic 3:
Brief definition, description, and overview (2–3 sentences)
Key resource:
APA formatted reference
Summary of key insights about the topic from the source
Topic 4:
Brief definition, description, and overview (2–3 sentences)
Key resource:
APA formatted reference
Summary of key insights about the topic from the source
Topic 5:
Brief definition, description, and overview (2–3 sentences)
Key resource:
APA formatted reference
Summary of key insights about the topic from the source
Second Category (e.g., Group/Team Level of Analysis)
Topic 1:
Brief definition, description, and overview (2–3 sentences)
Key resource:
APA formatted reference
Summary of key insights about the topic from the source.
Clustering the results of a search helps the user get an overview of the information returned. In this paper, we treat the clustering task as cataloguing the search results. By catalogue we mean a structured label list that helps the user make sense of the labels and the search results. Cluster labelling is crucial because meaningless or confusing labels may mislead users into checking the wrong clusters for a query and losing time. Additionally, labels should accurately reflect the contents of the documents within the cluster. To label clusters effectively, a new cluster labelling method is introduced. In addition to discovering document clusters, more emphasis is given to producing comprehensible and accurate cluster labels. We also present a new metric for assessing the success of cluster labelling. We adopt a comparative evaluation strategy to derive the relative performance of the proposed method with respect to two prominent search result clustering methods: Suffix Tree Clustering and Lingo. We perform the experiments using the publicly available datasets Ambient and ODP-239.
In the age of information technology, the volume of textual documents is growing rapidly across the internet, e-mail, web pages, offline and online reports, journals, and articles, all stored in electronic database formats. Millions of new text files are created every day, but without proper classification people miss a vast amount of information that would be useful for everyday challenges. Maintaining and accessing these documents is very difficult without adequate categorization, and classification performed without any provided label information is called clustering. K-means and other older clustering algorithms do not perform as well as expected on natural language. Because text is high-dimensional and contains logical structure clues, novel segmentation techniques have taken advantage of advances in generative topic modeling algorithms, which are specifically designed to identify topics within text and compute word–topic distributions. With these challenges in mind, this thesis proposes a semantic document clustering framework, developed on the Python platform, with each step tested. The preprocessing steps are tag elimination, stop-word removal according to the Oxford dictionary, and lemmatization using WordNet semantic information and the synsets of each individual word in the raw text. Given the limitations of K-means and other older algorithms, the COBB conceptual clustering algorithm is applied to the preprocessed data. Cluster quality and accuracy are among the most significant contributions of this research. To ensure the accuracy of the clusters, the F-measure was selected to evaluate them and report their accuracy. The F-measure returns the accuracy of the clusters and also indicates the purity of the clustering process.
The framework was tested on 20 samples of 20 different articles, with the minimum accuracy taken as the accuracy of the clusters; the developed system achieved 71.42% accuracy. Several challenges, such as synonymy, high dimensionality, extracting core semantics from texts, and assigning appropriate descriptions to the generated clusters, need further experimentation. This research aims to find an accurate way to cluster text documents by semantic meaning with the help of the WordNet database.
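The abstract does not say which form of the F-measure was used. As a sketch under that caveat, the code below implements one common clustering variant: for each true class, take the best F1 score over all clusters, then average those scores weighted by class size. The function name and the sample labels are hypothetical.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_ids):
    """Class-size-weighted best-match F-measure of a clustering."""
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    # overlap[(cls, clu)] = number of items of class `cls` in cluster `clu`
    overlap = Counter(zip(true_labels, cluster_ids))

    total = 0.0
    for cls, cls_size in class_sizes.items():
        best = 0.0
        for clu, clu_size in cluster_sizes.items():
            common = overlap[(cls, clu)]
            if common == 0:
                continue
            precision = common / clu_size  # share of the cluster that is cls
            recall = common / cls_size     # share of cls captured by the cluster
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (cls_size / n) * best
    return total


# Hypothetical example: six documents, two true topics, two clusters.
truth = ["a", "a", "a", "b", "b", "b"]
clusters = [0, 0, 1, 1, 1, 1]
print(round(clustering_f_measure(truth, clusters), 4))  # 0.8286
```

A perfect clustering scores 1.0. In the example, one document of topic "a" lands in the wrong cluster, which lowers both the recall for "a" and the precision for "b".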
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Building RAG with self-deployed Milvus vector database and Snowpark Container...Zilliz
This talk will give hands-on advice on building RAG applications with an open-source Milvus database deployed as a docker container. We will also introduce the integration of Milvus with Snowpark Container Services.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Introduction to XPath
1. Introduction to XPath
Kristian Torp
Department of Computer Science
Aalborg University
people.cs.aau.dk/~torp
torp@cs.aau.dk
November 3, 2015
daisy.aau.dk
Kristian Torp (Aalborg University) Introduction to XPath November 3, 2015 1 / 59
2. Outline
1 Introduction
2 Tree Terminology
3 Location Path and Steps
4 XPath Path Expressions
5 Axes
6 Summary
3. Learning Goals and Focus
Learning Goals
Understand the XPath data model
Know the basic tree terminology
Good at querying XML documents using XPath
Know the abbreviations used in XPath
Very handy to know in practice
Compact and quite readable!
Database Focus
All XML technologies are presented from a database perspective, also called a data focus (i.e., not a document focus)!
5. Introduction
Example
Find all courses: /coursecatalog/course
Find the semesters: //semester/text()
Overview
A language for
finding/addressing information in XML documents
navigating through elements and attributes in an XML document
Used in many XML technologies, e.g., XQuery and XPointer
A part of the XSLT recommendation
Microsoft/Visual Studio makes heavy use of XSLT
The data model is an abstract, logical structure of an XML document
Called a node tree
6. The Node Tree
Terminology
Document node: The entire XML document
Also called the document root or the root node
Element node: An XML element
A special one is the document element or root element
Text node: The text strings in an element node
Attribute node: An attribute
Example (A Node Tree)
/
  coursecatalog
    course
      id=4
      name
        OOP
      semester
        3
      desc
        snip
    course
      id=2
      name
        DB
      semester
        7
      desc
        snip
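The four node kinds can be inspected with a small Python sketch. It uses xml.dom.minidom from the standard library, and the XML markup is an assumption reconstructed from the node tree above (written without whitespace so that no extra text nodes appear).

```python
from xml.dom import minidom, Node

# Assumed markup for the course catalog; reconstructed from the node tree.
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)

doc = minidom.parseString(XML)            # document node
root = doc.documentElement                # document element (coursecatalog)
first_course = root.firstChild            # element node
id_attr = first_course.getAttributeNode("id")   # attribute node
name_text = first_course.firstChild.firstChild  # text node below name

print(doc.nodeType == Node.DOCUMENT_NODE)
print(root.nodeType == Node.ELEMENT_NODE, root.tagName)
print(id_attr.nodeType == Node.ATTRIBUTE_NODE, id_attr.value)
print(name_text.nodeType == Node.TEXT_NODE, name_text.data)
```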
7. Example: Find the Courses
Example (Document)
/
  coursecatalog
    course
      id=4
      name
        OOP
      semester
        3
      desc
        snip
    course
      id=2
      name
        DB
      semester
        7
      desc
        snip
Query
/coursecatalog/course
Result
course
  id=4
  name
    OOP
  semester
    3
  desc
    snip
course
  id=2
  name
    DB
  semester
    7
  desc
    snip
8. Example: Find the Semesters
Example (Document)
/
  coursecatalog
    course
      id=4
      name
        OOP
      semester
        3
      desc
        snip
    course
      id=2
      name
        DB
      semester
        7
      desc
        snip
Query
//semester/text()
Result
3 7
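A minimal Python sketch of the two queries (find the courses, find the semesters). The XML markup is an assumption reconstructed from the node tree. The standard library module xml.etree.ElementTree supports only a subset of XPath and starts at the document element rather than the document node, so the paths are written relative to coursecatalog and text() is replaced by the .text attribute.

```python
import xml.etree.ElementTree as ET

# Assumed markup for the course catalog; reconstructed from the node tree.
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)

root = ET.fromstring(XML)  # the coursecatalog element

# Corresponds to /coursecatalog/course
courses = root.findall("./course")

# Corresponds to //semester/text()
semesters = [s.text for s in root.iter("semester")]

print(len(courses), semesters)
```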
9. Major Components
Components
Nodes
XML document treated as a tree of nodes
Examples: Elements, attributes, and comments
Path expressions
Select a set of nodes in an XML document
Examples: /, /coursecatalog/course
Standard functions
Approximately 100 built-in functions
Examples: concat('a', 'b'), round(1.5)
10. Quiz
Example (Document)
/
  coursecatalog
    course
      id=4
      name
        OOP
      semester
        3
      desc
        snip
    course
      id=2
      name
        DB
      semester
        7
      desc
        snip
Questions
Who is the parent of the document element?
How many document elements are there in an XML document?
How many elements can there be in an XML document?
Are elements and attributes the same node type?
15. Tree Terminology
Example (A Node Tree)
1
2
3 4
5
6
7
8
9 A B
Children of 1
Quiz
Who are the children of 3?
16. Tree Terminology
Example (A Node Tree)
1
2
3 4
5
6
7
8
9 A B
Siblings of 9
Quiz
Who are the siblings of 3?
17. Tree Terminology
Example (A Node Tree)
1
2
3 4
5
6
7
8
9 A B
Ancestors of 6
Quiz
Who are the ancestors of 9?
18. Tree Terminology
Example (A Node Tree)
1
2
3 4
5
6
7
8
9 A B
Parent of 8
Quiz
Who are the parents of 4?
19. Tree Terminology
Example (A Node Tree)
1
2
3 4
5
6
7
8
9 A B
Descendants of 1
Quiz
Who are the descendants of 5?
20. Quiz
Example (Another Node Tree)
1
2
3 4
5 6 7
8
9
A
B
C
D E F
G
H I
J
Questions
Parent of E?
Children of 2?
Descendants of 2?
22. Location Path and Location Step I
Definition (Location Path)
A location path evaluates to a sequence of nodes
Example (Location Path)
/child::coursecatalog/child::course[name='OOP' or name='DB'][@id<10]
Definition (Location Step)
A location path consists of a number of location steps.
Example (Location Steps)
child::coursecatalog
child::course[name='OOP' or name='DB'][@id<10]
23. Location Path and Location Step II
Definition
A location step consists of an axis, a node test, and a set of predicates
Example (One)
child::coursecatalog
Axis: child
Node test: coursecatalog
Predicates: empty
Example (Two)
child::course[name='OOP' or name='DB'][@id<10]
Axis: child
Node test: course
Predicates: [name='OOP' or name='DB'][@id<10]
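The decomposition of a location step can be sketched in Python. The helper split_step is hypothetical and deliberately simple: it assumes an unabbreviated step (with an explicit axis) and predicates that contain no nested brackets.

```python
import re

def split_step(step):
    """Split an unabbreviated location step into (axis, node test, predicates).

    Hypothetical helper: assumes 'axis::nodetest[pred]...' with no nested brackets.
    """
    axis, _, rest = step.partition("::")
    node_test = rest.split("[", 1)[0]
    predicates = re.findall(r"\[([^\]]*)\]", rest)
    return axis, node_test, predicates

print(split_step("child::coursecatalog"))
print(split_step("child::course[name='OOP' or name='DB'][@id<10]"))
```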
24. Abbreviations
Most Used
Abbreviation      Meaning
.                 self::node()
..                parent::node()
//coursecatalog   /descendant-or-self::node()/child::coursecatalog
course            child::course
Example (Abbreviations in Action)
Abbreviation           Meaning
//name                 /descendant-or-self::node()/child::name
//name/..              /descendant-or-self::node()/child::name/parent::node()
/coursecatalog/course  /child::coursecatalog/child::course
Note
Abbreviations make expressions more readable
Sometimes abbreviations can make it hard to guess the result
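How abbreviations expand can be sketched with a small, hypothetical Python helper. It follows the W3C definition that // is short for /descendant-or-self::node()/ and that a step without an axis uses the child axis. It is only a sketch: it assumes paths without predicates that themselves contain / or . characters.

```python
def expand(path):
    """Expand the most common XPath abbreviations (hypothetical helper).

    Assumes no predicate contains '/' and no step other than '.' or '..'
    starts with a dot.
    """
    path = path.replace("//", "/descendant-or-self::node()/")
    steps = []
    for step in path.split("/"):
        if step == "..":
            steps.append("parent::node()")
        elif step == ".":
            steps.append("self::node()")
        elif step.startswith("@"):
            steps.append("attribute::" + step[1:])
        elif step == "" or "::" in step:
            steps.append(step)          # leading '/' or already unabbreviated
        else:
            steps.append("child::" + step)
    return "/".join(steps)

print(expand("//name/.."))
print(expand("/coursecatalog/course"))
```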
25. Evaluation of Location Path I
Example XML Document
/
  coursecatalog
    course
      id=4
      name
        OOP
      semester
        3
      desc
        snip
    course
      id=2
      name
        DB
      semester
        7
      desc
        snip
Evaluate the Location Path
/child::coursecatalog/child::course[name='OOP' or name='DB'][@id<10]/name
26. Evaluation of Location Path II
The Steps in the Evaluation
1 The path starts with /, so the context node is set to the root node
2 Evaluate the location step child::coursecatalog
3 Result is the coursecatalog root element node
4 Set context to root element node
5 Evaluate the location step child::course[name='OOP' or name='DB'][@id<10]
6 The result is the two course element nodes
7 Set context to the OOP course element node
8 Evaluate the location step child::name
9 Results in the name element node which is the first part of the result
10 Set context to the DB course element node
11 Evaluate the location step child::name
12 Results in the name element node which is the last part of the result
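The evaluation steps can be mimicked by hand in Python. This is only a sketch of the idea, not how an XPath engine is implemented: each location step is applied to every node of the current context, and the predicates are written as plain Python conditions. The XML markup is an assumption reconstructed from the node tree.

```python
import xml.etree.ElementTree as ET

# Assumed markup for the course catalog; reconstructed from the node tree.
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)

root = ET.fromstring(XML)

# Steps 1-4: child::coursecatalog, evaluated from the root node.
context = [root] if root.tag == "coursecatalog" else []

# Steps 5-6: child::course[name='OOP' or name='DB'][@id<10]
courses = [
    child
    for node in context
    for child in node                      # iterating an element yields its children
    if child.tag == "course"
    and child.findtext("name") in ("OOP", "DB")
    and int(child.get("id")) < 10
]

# Steps 7-12: child::name, once per course in the context.
result = [child for course in courses for child in course if child.tag == "name"]

print([n.text for n in result])
```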
27. Context
Definition (Context)
A context node (a node in the node tree)
A context size and context position
A set of variable bindings
A function library
A set of namespace declarations
Definition (Context Size)
The context size is the length of the sequence of nodes returned by the previous location step
Definition (Context Position)
The context position is the position of the context node in the sequence being evaluated
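Context size and context position can be illustrated with a short Python sketch, assuming the course catalog markup below (reconstructed from the node tree). Positions are 1-based, as in XPath. The predicates [1] and [last()] are part of the XPath subset that xml.etree.ElementTree supports.

```python
import xml.etree.ElementTree as ET

# Assumed markup for the course catalog; reconstructed from the node tree.
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)

root = ET.fromstring(XML)
courses = root.findall("course")     # sequence returned by the previous step
size = len(courses)                  # context size

# (context position, context size, name of the context node)
contexts = [(pos, size, c.findtext("name")) for pos, c in enumerate(courses, start=1)]

first = root.find("course[1]")       # context position = 1
last = root.find("course[last()]")   # context position = context size

print(contexts, first.get("id"), last.get("id"))
```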
29. Compact Notation for Node Tree
Example (The Node Tree)
/
  coursecatalog
    course
      id=4
      name
        OOP
      semester
        3
      desc
        snip
    course
      id=2
      name
        DB
      semester
        7
      desc
        snip
Example (The Equivalent Compact Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
30. Example: Find the Courses
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Query
/coursecatalog/course
Result
course:OOP
id=4 name:OOP sem:3 dsc
course:DB
id=2 name:DB sem:7 dsc
31. Example: Find Elements That Do Not Exist
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Query
/coursecatalog/name
Result
Empty: there is no name element directly below coursecatalog!
Note that it is not an error!
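The same behaviour can be observed in Python with xml.etree.ElementTree (a sketch, assuming the markup below, which is reconstructed from the node tree): a path that matches nothing yields an empty list and raises no exception, while a descendant search finds the name elements.

```python
import xml.etree.ElementTree as ET

# Assumed markup for the course catalog; reconstructed from the node tree.
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)

root = ET.fromstring(XML)

direct = root.findall("name")      # corresponds to /coursecatalog/name
nested = root.findall(".//name")   # corresponds to /coursecatalog//name

print(direct, [n.text for n in nested])
```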
32. Example: Find the Course Names
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Query
/coursecatalog//name
Result
name:OOP name:DB
33. Examples: Find the OOP Course
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Query
/coursecatalog/course[name="OOP"]
Result
course:OOP
id=4 name:OOP sem:3 dsc
34. Example: Find a Course Based on ID
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Query
/coursecatalog/course[@id="2"]
Result
course:DB
id=2 name:DB sem:7 dsc
35. Example: Filter on an Attribute
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Query
/coursecatalog/course[@id="2"]/name
Result
name:DB
36. Example: Get the Name of a Course as a String
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Query
/coursecatalog/course[@id="2"]/name/text()
Result
The string DB
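The predicate queries from the last few examples can be tried in Python. This is a sketch using xml.etree.ElementTree, which supports predicates on child element text and on attribute values but not text(), so the string is read from the .text attribute instead. The markup is an assumption reconstructed from the node tree.

```python
import xml.etree.ElementTree as ET

# Assumed markup for the course catalog; reconstructed from the node tree.
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)

root = ET.fromstring(XML)

oop = root.find("course[name='OOP']")        # /coursecatalog/course[name="OOP"]
db = root.find("course[@id='2']")            # /coursecatalog/course[@id="2"]
db_name = root.find("course[@id='2']/name")  # /coursecatalog/course[@id="2"]/name
db_name_text = db_name.text                  # .../name/text()

print(oop.get("id"), db.findtext("semester"), db_name_text)
```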
37. Example: Use Parent Axis
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Query
//course[@id="2"]/parent::node()
Result
The coursecatalog element node, i.e., the document element with the entire tree below it
38. Example: Use Child Axis
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Query
/coursecatalog/child::node()
Result
course:OOP
id=4 name:OOP sem:3 dsc
course:DB
id=2 name:DB sem:7 dsc
39. Example: Use Descendant Axis
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Query
/coursecatalog/descendant::node()
Result
8 element nodes
6 text nodes
40. Example: Use Functions
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Query
concat("hello, ", "world!")
Result
The string ’hello, world!’
41. Example: Functions and XPath Expressions
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Query
concat("hello ", /coursecatalog/course[@id="2"]/name/text())
Result
The string ’hello DB’
42. Most used Path Expressions
Often Used Expressions
Path Expression        Description
/                      select from the root node
//NodeName             select all NodeName element nodes in the document
.                      select the current node
..                     select the parent of the current node
/NodeName[@id>7]       select based on an attribute node
/NodeName[Node2='H']   select based on a child element node
/NodeName/text()       select the text nodes
/NodeName/attribute()  select the attribute nodes
/NodeName[1]           select the first NodeName element node
/NodeName[last()]      select the last NodeName element node
Note
Almost like Linux/Unix directory navigation
43. Quiz
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Questions
/coursecatalog/course/name returns?
/coursecatalog/teacher returns?
/coursecatalog is the same as /?
/coursecatalog/course/../course/../course returns?
/coursecatalog/course[@id<11]/name/text() returns?
45. Node Numbering
Example (Node Tree)
1
2
3 4
5
6
7
8 9
10 11
12 13 14
Note
Depth-first numbering of nodes
Used for relative access to other nodes
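Depth-first numbering (document order) can be reproduced in Python. This sketch uses the course catalog rather than the abstract numbered tree, and the markup is an assumption reconstructed from the earlier node tree. Element.iter() visits element nodes depth first, so numbering its output gives the document order of the elements (text and attribute nodes are not numbered here).

```python
import xml.etree.ElementTree as ET

# Assumed markup for the course catalog; reconstructed from the node tree.
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)

root = ET.fromstring(XML)

# Number the element nodes in document order, starting at 1.
numbered = [(number, element.tag) for number, element in enumerate(root.iter(), start=1)]

for number, tag in numbered:
    print(number, tag)
```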
46. Forward and Backward Axes
Definition (Axis)
An axis is a sequence of nodes located relative to the context node.
Definition (Forward Axis)
A forward axis can only return the context node or nodes that are after it in document order.
Definition (Backward Axis)
A backward axis can only return the context node or nodes that are before it in document order.
47. The Axes
Axis Name           Direction  Description
attribute           forward    All my attributes
self                forward    My self
child               forward    All my children
descendant          forward    All my children, grandchildren, etc.
parent              backward   My unique parent
ancestor            backward   My parent, grandparent, etc.
following           forward    All after me that are not descendants
preceding           backward   All before me that are not ancestors
following-sibling   forward    My “younger” siblings
preceding-sibling   backward   My “elder” siblings
descendant-or-self  forward    My self and all my descendants
ancestor-or-self    backward   My self and all my ancestors
48. Child
Finds
Immediate descendants of the current node.
Numbering
1
2
3 4
5
6
7
8 9
10 11
12 13 14
Example
cur
1 2 3
Quiz
What is the direction of the child axis (and why)?
49. Child Examples
Example (Document Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Queries
/coursecatalog/child::node()
Result: the two course nodes
/coursecatalog/course/child::node()
Result: six element nodes
/coursecatalog/course/attribute()
Result: two attribute nodes
/coursecatalog/course/semester/child::node()
Result: two text nodes
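The node counts above can be checked with a Python sketch based on xml.dom.minidom, whose childNodes list plays the role of the child axis. The markup is an assumption reconstructed from the node tree and is written without whitespace, so no extra text nodes appear.

```python
from xml.dom import minidom, Node

# Assumed markup for the course catalog; reconstructed from the node tree.
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)

doc = minidom.parseString(XML)
catalog = doc.documentElement

# /coursecatalog/child::node()
courses = list(catalog.childNodes)

# /coursecatalog/course/child::node()
course_children = [child for course in courses for child in course.childNodes]

# /coursecatalog/course/attribute(): attributes are not children
attributes = [
    course.attributes.item(i)
    for course in courses
    for i in range(course.attributes.length)
]

# /coursecatalog/course/semester/child::node()
semester_children = [
    child
    for semester in doc.getElementsByTagName("semester")
    for child in semester.childNodes
]

print(len(courses), len(course_children), len(attributes), len(semester_children))
```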
50. Parent
Finds
The one node immediately above
Numbering
1
2
3 4
5
6
7
8 9
10 11
12 13 14
Example
1
cur
Quiz
What is the direction of the parent axis (and why)?
51. Parent Examples
Example (Document Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Queries
/coursecatalog/course[@id=’2’]/name/parent::node()
Result: the course element node with id = 2
/coursecatalog/course/name/parent::node()
Result: the two course element nodes
/coursecatalog/parent::node()
Result: the document root
/parent::node()
Result: empty
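The parent axis corresponds to the parentNode attribute in xml.dom.minidom. A sketch of the queries above, assuming the markup below (reconstructed from the node tree): the document element has the document node as parent, and the document node itself has no parent.

```python
from xml.dom import minidom

# Assumed markup for the course catalog; reconstructed from the node tree.
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)

doc = minidom.parseString(XML)
catalog = doc.documentElement

# /coursecatalog/course/name/parent::node()
names = doc.getElementsByTagName("name")
name_parents = [name.parentNode for name in names]

print([p.getAttribute("id") for p in name_parents])
print(catalog.parentNode is doc)   # /coursecatalog/parent::node()
print(doc.parentNode is None)      # /parent::node()
```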
52. Descendant
Finds
Children all the way down the tree
Numbering
1
2
3 4
5
6
7
8 9
10 11
12 13 14
Example
cur
1
2 3
4 5
53. Descendant Examples
Example (Document Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Queries
/coursecatalog/descendant::node()
Result: 8 element nodes + 6 text nodes
/coursecatalog/course[name="OOP"]/descendant::node()
Result: 3 element nodes + 3 text nodes
/coursecatalog/course[name="OOP"]/descendant::node()/attribute()
Result: 2 attribute nodes
/coursecatalog/course/name/descendant::node()
Result: two text nodes
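The descendant axis can be sketched as a recursive walk over childNodes in xml.dom.minidom. The helper names descendants and kinds are illustrative, and the markup is an assumption reconstructed from the node tree. Attribute nodes are not children, so they are never visited by the walk.

```python
from xml.dom import minidom, Node

# Assumed markup for the course catalog; reconstructed from the node tree.
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)

def descendants(node):
    """Yield children, grandchildren, etc. in document order."""
    for child in node.childNodes:
        yield child
        yield from descendants(child)

def kinds(nodes):
    """Return (number of element nodes, number of text nodes)."""
    nodes = list(nodes)
    elements = sum(1 for n in nodes if n.nodeType == Node.ELEMENT_NODE)
    texts = sum(1 for n in nodes if n.nodeType == Node.TEXT_NODE)
    return elements, texts

doc = minidom.parseString(XML)
catalog = doc.documentElement
oop = catalog.firstChild

print(kinds(descendants(catalog)))         # /coursecatalog/descendant::node()
print(kinds(descendants(oop)))             # course[name="OOP"]/descendant::node()
print(kinds(descendants(oop.firstChild)))  # descendants of the OOP name element
```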
54. Ancestor
Finds
Parents all the way up the tree
Numbering
1
2
3 4
5
6
7
8 9
10 11
12 13 14
Example
4
3
2
1
cur
56. Following
Finds
All nodes that follow, excluding descendants
Numbering
1
2
3 4
5
6
7
8 9
10 11
12 13 14
Example
cur 1 2
3 4 5
57. Following Examples
Example (Document Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Queries
/coursecatalog/course[@id="4"]/following::node()
Result: 4 element nodes + 3 text nodes
/coursecatalog/course[@id="2"]/following::node()
Result: empty
/coursecatalog/course[@id="4"]/name/text()/following::node()
Result: 6 element nodes and 5 text nodes
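The following axis can be sketched in Python as: list all nodes in document order, keep those after the context node, and drop the context node's descendants. The helpers document_order, following, and kinds are illustrative names, and the markup is an assumption reconstructed from the node tree.

```python
from xml.dom import minidom, Node

# Assumed markup for the course catalog; reconstructed from the node tree.
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)

def document_order(node):
    """Yield node and everything below it, depth first (attributes skipped)."""
    yield node
    for child in node.childNodes:
        yield from document_order(child)

def following(doc, context):
    """Nodes after context in document order, excluding its descendants."""
    nodes = list(document_order(doc))
    subtree = {id(n) for n in document_order(context)}
    start = next(i for i, n in enumerate(nodes) if n is context)
    return [n for n in nodes[start + 1:] if id(n) not in subtree]

def kinds(nodes):
    """Return (number of element nodes, number of text nodes)."""
    elements = sum(1 for n in nodes if n.nodeType == Node.ELEMENT_NODE)
    texts = sum(1 for n in nodes if n.nodeType == Node.TEXT_NODE)
    return elements, texts

doc = minidom.parseString(XML)
oop, db = doc.documentElement.childNodes
oop_name_text = oop.firstChild.firstChild

print(kinds(following(doc, oop)))
print(following(doc, db))
print(kinds(following(doc, oop_name_text)))
```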
58. Preceding
Finds
All preceding nodes excluding ancestors
Numbering
1
2
3 4
5
6
7
8 9
10 11
12 13 14
Example
3
2 1
cur
59. Preceding Examples
Example (Document Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Queries
/coursecatalog/course[@id="4"]/semester/text()/preceding::node()
Result: 1 element node + 1 text node; the root element is an ancestor
/coursecatalog/course/preceding::node()
Result: the OOP course subtree, i.e., 4 element nodes + 3 text nodes
/coursecatalog/course[name="OOP"]/preceding::node()
Result: empty
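The preceding axis mirrors the following axis: list all nodes in document order, keep those before the context node, and drop its ancestors. A Python sketch with illustrative helper names (document_order, ancestors, preceding, kinds); the markup is an assumption reconstructed from the node tree. The sketch returns the nodes in document order.

```python
from xml.dom import minidom, Node

# Assumed markup for the course catalog; reconstructed from the node tree.
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)

def document_order(node):
    """Yield node and everything below it, depth first (attributes skipped)."""
    yield node
    for child in node.childNodes:
        yield from document_order(child)

def ancestors(node):
    """Yield parent, grandparent, etc. up to the document node."""
    while node.parentNode is not None:
        node = node.parentNode
        yield node

def preceding(doc, context):
    """Nodes before context in document order, excluding its ancestors."""
    nodes = list(document_order(doc))
    above = {id(n) for n in ancestors(context)}
    start = next(i for i, n in enumerate(nodes) if n is context)
    return [n for n in nodes[:start] if id(n) not in above]

def kinds(nodes):
    """Return (number of element nodes, number of text nodes)."""
    elements = sum(1 for n in nodes if n.nodeType == Node.ELEMENT_NODE)
    texts = sum(1 for n in nodes if n.nodeType == Node.TEXT_NODE)
    return elements, texts

doc = minidom.parseString(XML)
oop, db = doc.documentElement.childNodes
oop_semester_text = oop.childNodes[1].firstChild

print(kinds(preceding(doc, oop_semester_text)))
print(kinds(preceding(doc, db)))
print(preceding(doc, oop))
```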
60. Following Sibling
Finds
All following sibling nodes
Numbering
1
2
3 4
5
6
7
8 9
10 11
12 13 14
Example
cur 1 2 3
61. Following Sibling Examples
Example (Document Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Queries
/coursecatalog/course/following-sibling::node()
Result: 1 element node (the DB course)
/coursecatalog/course[@id="2"]/following-sibling::node()
Result: empty
/coursecatalog/course/semester/following-sibling::node()
Result: 2 element nodes (descriptions)
/coursecatalog/course/@id/following-sibling::node()
Result: empty
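The following-sibling axis can be sketched by walking the nextSibling chain in xml.dom.minidom. The helper following_siblings is an illustrative name, and the markup is an assumption reconstructed from the node tree. In minidom an attribute node has no siblings, which matches the empty result for @id.

```python
from xml.dom import minidom

# Assumed markup for the course catalog; reconstructed from the node tree.
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)

def following_siblings(node):
    """Yield the siblings after node, nearest first."""
    node = node.nextSibling
    while node is not None:
        yield node
        node = node.nextSibling

doc = minidom.parseString(XML)
oop, db = doc.documentElement.childNodes

after_oop = list(following_siblings(oop))
after_db = list(following_siblings(db))
after_semesters = [
    sibling
    for semester in doc.getElementsByTagName("semester")
    for sibling in following_siblings(semester)
]
after_id = list(following_siblings(oop.getAttributeNode("id")))

print([n.getAttribute("id") for n in after_oop], after_db)
print([n.tagName for n in after_semesters], after_id)
```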
62. Preceding Sibling
Finds
All preceding sibling nodes
Numbering
1
2
3 4
5
6
7
8 9
10 11
12 13 14
Example
2 1 cur
63. Preceding Sibling Examples
Example (Document Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Queries
/coursecatalog/course/preceding-sibling::node()
Result: 1 element node (the OOP course)
/coursecatalog/course[@id="2"]/preceding-sibling::node()
Result: 1 element node (the OOP course)
/coursecatalog/course/semester/preceding-sibling::node()
Result: 2 element nodes (names)
/coursecatalog/course/desc/preceding-sibling::node()
Result: 4 element nodes (0 attribute nodes)
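The preceding-sibling axis can be sketched by walking the previousSibling chain in xml.dom.minidom. The helper preceding_siblings is an illustrative name, and the markup is an assumption reconstructed from the node tree. Attributes are not siblings of elements, so they never show up in the result.

```python
from xml.dom import minidom

# Assumed markup for the course catalog; reconstructed from the node tree.
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)

def preceding_siblings(node):
    """Yield the siblings before node, nearest first."""
    node = node.previousSibling
    while node is not None:
        yield node
        node = node.previousSibling

doc = minidom.parseString(XML)
oop, db = doc.documentElement.childNodes

before_db = list(preceding_siblings(db))
before_semesters = [
    sibling
    for semester in doc.getElementsByTagName("semester")
    for sibling in preceding_siblings(semester)
]
before_descs = [
    sibling
    for desc in doc.getElementsByTagName("desc")
    for sibling in preceding_siblings(desc)
]

print([n.getAttribute("id") for n in before_db])
print([n.tagName for n in before_semesters])
print([n.tagName for n in before_descs])
```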
65. Summary: XPath
Main Points
XPath is widely used
Not an XML syntax!
XPath is used for many purposes in related XML technologies
XQuery
XSLT
SQL/XML
W3C Recommendation November 1999 www.w3.org/TR/xpath
Note
Very good idea to get familiar with XPath
XPath is the foundation for understanding other XML technologies
66. Additional Information
Web Sites
www.w3schools.com/XPath/xpath_intro.asp: W3Schools is always a good place to start
www.stylusstudio.com/w3c/xpath/: A very good and quite elaborate tutorial
www.devarticles.com/c/a/XML/Introduction-to-XPath/: A good 4-page tutorial
pierre.senellart.com/wdmd/chap-xpath.pdf: A description of the XPath data model
Tools
pgfearo.googlepages.com/: A very good tool for playing around with XPath
There is an introduction screencast
http://www.bit-101.com/xpath/: A good online tool