XML and Web Data discusses XML, its characteristics, elements, and schemas. XML is used to simplify data exchange between software agents. It uses tags like HTML but is extensible and has no predefined semantics. XML documents must be well-formed with a root element and properly nested tags. Namespaces are used to avoid naming conflicts. XML schemas define rules for XML documents and can specify data types and constraints.
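As a rough illustration of the well-formedness and namespace rules mentioned above, the following sketch uses Python's standard xml.etree.ElementTree module. The sample documents, the helper name is_well_formed, and the urn:example:... namespace URIs are made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One root element with properly nested tags.
good = "<order><item>pen</item><qty>2</qty></order>"
# Overlapping tags: <order> is closed before its child <item>.
bad_nesting = "<order><item>pen</order></item>"
# Two top-level elements, so there is no single root.
two_roots = "<order/><order/>"

# Namespaces keep two unrelated <title> elements apart.
doc = ET.fromstring(
    '<lib xmlns:b="urn:example:book" xmlns:p="urn:example:person">'
    "<b:title>XML Basics</b:title><p:title>Dr</p:title></lib>"
)
ns = {"b": "urn:example:book", "p": "urn:example:person"}
book_title = doc.find("b:title", ns).text
person_title = doc.find("p:title", ns).text
```

The parser rejects the second and third strings outright, while the two title elements are told apart only by the namespace their prefix is bound to.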
This document outlines the objectives and syllabus of an IT course on Service Oriented Architecture. The objectives include learning XML fundamentals, building XML-based applications, understanding SOA principles and web services technologies, and building SOA-based applications. It lists the textbook and reference books and the topics covered in each unit, such as XML document structure, building XML applications, SOA, and web services.
This document provides an overview of XML programming and XML documents. It discusses the physical and logical views of an XML document, document structure including the root element, and how XML documents are commonly stored as text files. It also summarizes how an XML parser reads and validates an XML document by checking its syntax and structure. The document then covers various XML components in more detail, such as elements, attributes, character encoding, entities, processing instructions, well-formedness, validation via DTDs, and document modeling.
XML documents can be represented and stored in memory as tree structures using models like DOM and XDM. XPath is an expression language used to navigate and select parts of an XML tree. It allows traversing elements and their attributes, filtering nodes by properties or position, and evaluating paths relative to a context node. While XPath expressions cannot modify the document, they are commonly used with languages like XSLT and XQuery which can transform or extract data from XML trees.
This document discusses the structure and components of an XML document. It explains that an XML document consists of elements, attributes, comments, processing instructions, and a document type declaration. It describes each of these components in detail, including their purpose and general syntax. The document type declaration identifies the document and can reference an internal or external DTD that defines the valid elements and attributes.
This document discusses style sheet languages like CSS that are used to control the presentation of XML documents. CSS allows one to specify things like fonts, colors, spacing etc. for different elements in an XML file. A single XML file can then be formatted in multiple ways just by changing the associated CSS stylesheet without modifying the XML content. The document provides examples of using CSS selectors, rules and properties to style elements in an XML file and controlling presentation aspects like layout of elements on a page. It also discusses how to link the CSS stylesheet to an XML file using processing instructions.
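The linking step can be sketched as follows. This is a minimal example using Python's standard xml.dom.minidom module; the note document, the CSS rules, and the file name note.css are hypothetical and only illustrate the xml-stylesheet processing instruction.

```python
from xml.dom import minidom

# A hypothetical stylesheet: CSS selectors match XML element names directly.
NOTE_CSS = """
to   { display: block; font-weight: bold; }
body { display: block; margin-top: 1em; color: navy; }
"""

doc = minidom.parseString("<note><to>Ann</to><body>Hello</body></note>")

# The xml-stylesheet processing instruction goes before the root element.
pi = doc.createProcessingInstruction(
    "xml-stylesheet", 'type="text/css" href="note.css"'
)
doc.insertBefore(pi, doc.documentElement)

styled = doc.toxml()
```

Pointing href at a different stylesheet changes the presentation without touching the note element or its children.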
The document discusses XPath, which is a language for finding information in an XML document. It defines XPath syntax using path expressions to select nodes. It describes XPath terminology like nodes, relationships between nodes, and functions. Examples are provided to demonstrate XPath expressions for selecting elements, attributes, and filtering nodes. Predicates are also described for finding specific nodes or values.
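A few path expressions and predicates of the kind described above, sketched with Python's standard xml.etree.ElementTree module. Note that ElementTree implements only a subset of XPath, so this is an approximation rather than a full XPath engine; the catalog data is invented for the example.

```python
import xml.etree.ElementTree as ET

catalog = ET.fromstring("""
<catalog>
  <book id="b1" lang="en"><title>Learning XML</title><price>30</price></book>
  <book id="b2" lang="de"><title>XML Kompakt</title><price>25</price></book>
  <book id="b3" lang="en"><title>XPath in Practice</title></book>
</catalog>
""")

# Path expression: every <title> that is a child of a <book>.
titles = [t.text for t in catalog.findall("./book/title")]

# Attribute predicate: books whose lang attribute equals 'en'.
english = [b.get("id") for b in catalog.findall("./book[@lang='en']")]

# Child predicate: books that have a <price> child.
priced = [b.get("id") for b in catalog.findall("./book[price]")]

# Position predicates are 1-based; last() selects the final match.
second = catalog.find("./book[2]").get("id")
last = catalog.find("./book[last()]").get("id")
```

Each expression is evaluated relative to a context node, here the catalog root element.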
XML (eXtensible Markup Language) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. It is designed to transport and store data with a focus on what data is. XML has several advantages over HTML such as being extensible, content-oriented, and providing a standard data infrastructure and data validation capabilities. XML documents form a tree structure with properly nested elements. XML uses tags to mark elements and attributes to provide additional information about elements.
An attribute declaration specifies attributes for elements in a DTD. It defines the attribute name, data type or permissible values, and required behavior. For example, an attribute may have a default value if not provided, be optional, or require a value. Notations can label non-XML data types and unparsed entities can import binary files. Together DTDs and entities provide a schema to describe document structure and relationships.
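To make the three behaviours concrete (required, defaulted, optional), here is a small sketch using Python's standard xml.parsers.expat module. The orders document and its attribute names are invented. Expat is a non-validating parser: it reads the declarations and supplies declared defaults, but it would not reject a document that omits a #REQUIRED attribute.

```python
import xml.parsers.expat

SOURCE = """<!DOCTYPE orders [
  <!ELEMENT orders (order*)>
  <!ELEMENT order (#PCDATA)>
  <!ATTLIST order
      id     ID              #REQUIRED
      status (open|shipped)  "open"
      note   CDATA           #IMPLIED>
]>
<orders>
  <order id="o1">pen</order>
  <order id="o2" status="shipped" note="gift">ink</order>
</orders>"""

declarations = {}  # attribute name -> (default value, is_required)
orders = {}        # order id -> attributes reported by the parser


def on_attlist(elname, attname, type_, default, required):
    # Called once per attribute declared in the ATTLIST.
    declarations[attname] = (default, bool(required))


def on_start(name, attrs):
    # attrs includes values supplied from DTD defaults.
    if name == "order":
        orders[attrs["id"]] = dict(attrs)


parser = xml.parsers.expat.ParserCreate()
parser.AttlistDeclHandler = on_attlist
parser.StartElementHandler = on_start
parser.Parse(SOURCE, True)
```

The first order never mentions status, yet the parser reports it with the declared default, while the optional note attribute is simply absent.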
An XML schema defines the structure and elements of an XML document. It is an XML-based alternative to DTDs that allows defining element types, attributes, data types, defaults and restrictions. Schemas support namespaces, data types, extensibility and are written in XML, allowing the use of XML tools. Complex elements can contain child elements, text or both.
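Because a schema is itself an XML document, ordinary XML tools can read it. The sketch below parses a small, made-up schema with Python's standard xml.etree.ElementTree module and lists what it declares. It only inspects the schema; actually validating an instance document against it needs a schema-aware validator, which the Python standard library does not include.

```python
import xml.etree.ElementTree as ET

# A hypothetical schema: a complex element with two children and an attribute.
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="age" default="0">
          <xs:simpleType>
            <xs:restriction base="xs:integer">
              <xs:minInclusive value="0"/>
              <xs:maxInclusive value="150"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

XS = {"xs": "http://www.w3.org/2001/XMLSchema"}
schema = ET.fromstring(XSD)

# Element and attribute declarations, in document order.
elements = [e.get("name") for e in schema.findall(".//xs:element", XS)]
attributes = [a.get("name") for a in schema.findall(".//xs:attribute", XS)]

# The restriction on <age>: an integer between 0 and 150, defaulting to 0.
age = schema.find(".//xs:element[@name='age']", XS)
age_default = age.get("default")
age_min = age.find(".//xs:minInclusive", XS).get("value")
age_max = age.find(".//xs:maxInclusive", XS).get("value")
```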
The document discusses XML document structure and XML schema. It provides information on the key components of an XML document including the XML declaration, document type declaration, element data, attribute data, and character data. It then describes XML schema in detail, explaining that it defines the structure of an XML document. Key aspects of XML schema covered include elements, attributes, simple vs complex types, and restrictions.
The document discusses XML schemas, explaining that they define elements, attributes, and data types that can be used in XML documents. It covers creating simple and complex elements, declaring data types, and grouping elements using sequences, groups, and choices. The document also provides examples of how to define attributes and create user-defined data types in an XML schema.
This document provides an overview of XML DTD and Schema. It defines key terms like well-formed, valid, DTD and describes how a DTD is used to define element types and attributes. It also explains different DTD rules like EMPTY, ANY, #PCDATA. The document then covers XML Schema elements, data types, and how to declare elements, attributes and complex/simple types in a schema.
The document discusses the fundamentals of XML including XML document structure, elements, attributes, character data, the XML declaration, document type declaration, and XML content model. It also covers XML rules for structure, namespaces, and the differences between well-formed and valid XML documents.
The document describes what an XML Schema is and its key components and purposes. It defines an XML Schema as describing the structure of an XML document, and that it can define elements, attributes, element sequence and number, data types, and default values. It compares XML Schemas to DTDs, noting schemas are more powerful and support namespaces and data types. The document provides examples of using XML Schema to define simple and complex elements, attributes, and restrictions.
XML is a markup language similar to HTML but designed for carrying data rather than displaying it. It allows users to define their own elements and tags. XML documents use tags to describe and structure information and can be displayed using CSS or transformed using XSL. Key benefits of XML include its ability to describe hierarchical data, separate data from presentation, and enable data sharing across different systems.
XML Schema defines rules for encoding documents in a machine-readable format. It allows data exchange between systems independently of programming languages. XML Schema defines elements, attributes, and data types to structure XML documents. It provides more data typing capabilities than DTDs. Namespaces are used to avoid element name conflicts between different XML vocabularies. User-defined types can restrict built-in types or create new complex types from simple types to structure application-specific data.
This document provides an overview of XML schemas, including:
- The objectives of learning about XML schemas, which include explaining schemas, advantages over DTDs, defining elements, creating simple and complex types, applying restrictions, and creating reusable schemas.
- An introduction to XML schemas, including how schemas address issues with large DTDs and an example of creating a simple schema.
- The advantages of XML schemas over DTDs, such as supporting data types, defining element order, and extending schemas.
- How to define elements and attributes in a schema, including using built-in data types.
- The differences between simple and complex element types and examples of each.
The document discusses schemas and their purpose in specifying the structure and constraints of an XML document. It provides examples of things that cannot be done with DTDs but can be done with schemas, such as constraining text values. The document outlines the components of a schema, including elements, attributes, and data types. It provides an example of defining a schema in IE5 and the steps involved, including declaring element types, specifying content models, and using data types.
XML Schema provides a way to formally define and validate the structure and content of XML documents. It allows defining elements, attributes, and data types, as well as restrictions like length, pattern, and value ranges. DTD is more limited and cannot validate data types. XML Schema is written in XML syntax, uses XML namespaces, and provides stronger typing capabilities compared to DTD. It allows defining simple and complex element types, attributes, and restrictions to precisely describe the expected structure and values within XML documents.
The document defines the XML Information Set, which provides a consistent set of definitions for the information contained in a well-formed XML document. It describes 11 types of information items that comprise the information set, including document, element, attribute, and others. Each information item has defined properties. The information set represents the same information as the tree structure of an XML document.
The document provides an overview of XML basics including XML concepts, technologies for describing XML like DTD and XML Schema, and how to parse XML in Java using SAX, DOM, and JAXP. It introduces XML, elements, attributes, namespaces, validation with DTD and XML Schema. It describes parsing XML with SAX, which is event-driven and does not store the parsed data, and DOM, which parses the entire XML into an in-memory tree structure that allows random access.
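The contrast between the two parsing styles can be sketched briefly. The summarized document uses Java; the example below instead uses Python's standard xml.sax and xml.dom.minidom modules, which follow the same two models. The inventory data and the class name SkuCollector are invented for illustration.

```python
import xml.sax
from xml.dom import minidom

XML = "<inventory><item sku='a1'>pen</item><item sku='b2'>ink</item></inventory>"


class SkuCollector(xml.sax.ContentHandler):
    """SAX: react to events as they stream past; nothing is stored for us."""

    def __init__(self):
        super().__init__()
        self.skus = []

    def startElement(self, name, attrs):
        if name == "item":
            self.skus.append(attrs.getValue("sku"))


handler = SkuCollector()
xml.sax.parseString(XML.encode("utf-8"), handler)
sax_skus = handler.skus

# DOM: the whole document becomes an in-memory tree with random access.
dom = minidom.parseString(XML)
items = dom.getElementsByTagName("item")
dom_skus = [item.getAttribute("sku") for item in items]
second_text = items[1].firstChild.data  # jump straight to the second item
```

With SAX the handler must keep whatever state it needs, whereas the DOM tree can be revisited in any order at the cost of holding the full document in memory.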
This document provides an overview of XML (eXtensible Markup Language). It defines XML as a meta markup language for representing text documents and data. XML allows users to define their own tags to represent different types of information. The document discusses how XML documents form a tree structure with a root element and nested elements. It also covers XML syntax rules and parsing methods like SAX and DOM that can be used to read and manipulate XML documents.
This document provides an overview of XML including:
- XML is a meta markup language used to define text document structures and represent textual data.
- XML examples show how it can be used to easily represent structured data for both human and machine readability.
- XML schemas are used to define the rules and structure for XML documents and provide data type definitions.
- XML documents form a tree structure with a single root element and hierarchical branching.
This document discusses XML and semi-structured data models. Some key points:
- XML is a graph-based data model that can represent both regular and irregular data. It supports self-describing data, flexible typing, and serialization.
- Semi-structured data models like XML and JSON allow for missing attributes, duplicates, and variations in structure compared to relational models.
- XML represents semi-structured data by annotating each data item with its description. This serialization allows the data to be transmitted while maintaining interoperability.
This document provides an overview of XML including:
- XML stands for Extensible Markup Language and is used to carry data, not display it. Tags are user-defined.
- An XML example shows a simple note marked up with user-defined tags.
- XML schemas define valid elements, attributes, structure and data types for XML documents.
- XML documents form a tree structure with elements nested within a root element. Syntax rules ensure documents are well-formed.
- XML parsers like SAX and DOM are used to read and build a model of an XML document programmatically.
This document discusses XML and semi-structured data models. Some key points:
- XML represents semi-structured data using a graph-based data model where data comes with its own description. This is different from relational models where schema and content are separate.
- Semi-structured data allows for flexible typing, duplicates, missing elements, etc. making it suitable for irregular data.
- XML serializes the graph into a byte stream for transmission. This provides interoperability but wastes space.
- XML documents can be represented as trees in memory using DOM. The serialized XML form maps to this tree representation.
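The points above about missing and duplicate elements can be illustrated with a short sketch using Python's standard xml.etree.ElementTree module. The people data is invented; each record carries its own field names, so no fixed schema is needed to read it.

```python
import xml.etree.ElementTree as ET

# Irregular records: repeated <phone>, missing <email>, and so on.
people = ET.fromstring("""
<people>
  <person><name>Ann</name><phone>555-0100</phone><phone>555-0101</phone></person>
  <person><name>Bob</name></person>
  <person><name>Cy</name><email>cy@example.org</email></person>
</people>
""")

records = []
for person in people.findall("person"):
    record = {}
    for field in person:
        # The tag describes the value, so unknown or repeated fields are fine.
        record.setdefault(field.tag, []).append(field.text)
    records.append(record)

# A relational-style column has to pad the gaps with an explicit "missing".
emails = [r.get("email") for r in records]
```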
XML stands for eXtensible Markup Language. It is used to store and transport structured data. XML allows users to define their own tags for marking up data with a tree structure, with one root element. Key features of XML include being extensible, using markup tags, and describing data in a human- and computer-friendly format. XML is commonly used for transporting data between systems and long-term storage of structured data.
XML is a markup language used for storing and transferring data. It allows data to be shared across different systems even if they have different hardware/software. XML uses tags to structure the data and is readable by both humans and machines. XML documents can be validated using DTDs or XML schemas to ensure they follow the defined structure and syntax rules. When parsing an XML document, DOM reads the entire document into memory while SAX reads nodes sequentially without storing the entire document in memory.
This document provides an introduction to XML, including:
- XML stands for eXtensible Markup Language and allows users to define their own tags to provide structure and meaning to data.
- XML documents use elements with start and end tags to organize content in a hierarchical, tree-like structure. Elements can contain text or other nested elements.
- Attributes within start tags provide additional metadata about elements. XML documents must follow the syntax rules to be well-formed.
This document discusses XML principles for data integration and exchange. It provides an overview of XML, including its data model, schema languages like DTDs and XML Schema, and querying languages like XPath and XQuery. XML allows hierarchical and semi-structured data to be encoded and exchanged in a standard format. Schema languages provide structure and typing, while query languages like XPath allow selecting subsets of XML documents.
XML is a markup language similar to HTML but designed for structured data rather than web pages. It uses tags to define elements and attributes, and can be validated using DTDs or XML schemas. XML documents can be transformed and queried using XSLT and XPath respectively. SAX is an event-based parser that reads XML sequentially while DOM loads the entire document into memory for random access.
This document provides an overview of XML (Extensible Markup Language). It defines XML and notes that it is derived from SGML but is simpler. It describes the structure of XML data, including tags, elements, attributes, and proper nesting. It discusses schema languages such as DTDs, which constrain document structure, and the storage of XML data in databases or files. It also summarizes XML applications such as data exchange, querying XML with XPath and XQuery, and advantages of XML such as being human-readable and supporting integration.
- XML (eXtensible Markup Language) is a markup language that is designed to store and transport data. It was released in the late 1990s and became a W3C recommendation in 1998.
- XML is not meant to display data like HTML, but rather to carry data. It is designed to be self-descriptive, platform independent, and language independent. Tags are defined by the user rather than being predefined.
- Markup originally meant highlighting or underlining parts of a document. Modern markup languages like XML use tags in place of highlighting and underlining.
This document provides an introduction to XML, including an overview of its components and structure. It discusses the XML prolog, tags, attributes, entities, comments, and processing instructions that make up an XML document. It also describes the XML document type definition (DTD) that defines the allowable tags and syntax of an XML language. Key points covered include XML being extensible and separating content from presentation, as well as examples of basic XML code structure and syntax rules.
XML Introduction, Syntax of XML, Well-formed XML Documents, XML Document Structure, Document Type Definitions, XML Namespace, XML Schemas, DOM (Document Object Model)
XML (eXtensible Markup Language) is a meta markup language that allows defining custom markup languages. It became a W3C recommendation in 1998 and uses a tag-based syntax similar to HTML. XML allows defining tags to represent different types of text documents and data in a well-structured, machine-readable format. It is not a replacement for other technologies but can be converted to and used with many formats and languages.
Data interchange integration, HTML XML Biological XML DTD (AnushaMahmood)
2. Facts about the Web
• Growing fast
• Popular
• Semi-structured data
– Data is presented for ‘human’-processing
– Data is often ‘self-describing’ (including name
of attributes within the data fields)
5. Vision for Web data
• Object-like – it can be represented as a
collection of objects of the form described
by the conceptual data model
• Schemaless – does not conform to any type
structure
• Self-describing – necessary for machine
readable data
7. XML – Overview
• Simplifying the data exchange between
software agents
• Popular thanks to the involvement of W3C
(World Wide Web Consortium –
independent organization
www.w3c.org)
8. XML – Characteristics
• Simple, open, widely accepted
• HTML-like (tags) but extensible by users
(no fixed set of tags)
• No predefined semantics for the tags
(because XML was not developed for
display purposes)
• Semantics is defined by stylesheet (later)
10. XML Documents
• User-defined tags:
<tag> info </tag>
• Properly nested: <tag1> … <tag2> … </tag1> </tag2>
is not valid
• Root element: an element that contains all other
elements
• Processing instructions <?command … ?>
• Comments <!-- comment -->
• CDATA type
• DTD
11. XML element
• Begins with an opening tag of the form
<XML_element_name>
• Ends with a closing tag
</XML_element_name>
• The text between the beginning tag and the
closing tag is called the content of the
element
12. XML element
<PersonList Type="Student">
<Student StudentID="123">
<Name> <First>"XYZ"</First>
<Last>"PQR"</Last> </Name>
<CrsTaken CrsName="CS582" Grade="A"/>
</Student>
…
</PersonList>
(Slide callouts label an attribute and the value of the attribute.)
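The element and attribute structure above can be inspected programmatically. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the choice of Python, the straight quotes, and the omission of the literal quote marks inside the element content are assumptions made for illustration, not part of the slides.

```python
import xml.etree.ElementTree as ET

# The slide's example, with the elided "…" left out and straight quotes used.
doc = """
<PersonList Type="Student">
  <Student StudentID="123">
    <Name><First>XYZ</First><Last>PQR</Last></Name>
    <CrsTaken CrsName="CS582" Grade="A"/>
  </Student>
</PersonList>
"""

root = ET.fromstring(doc)                  # root element: PersonList
student = root.find("Student")             # first Student child of the root
first_name = student.find("Name/First").text
course = student.find("CrsTaken").attrib   # attributes of the empty element

print(root.tag, root.attrib)
print(student.get("StudentID"), first_name, course)
```

Attributes come back as a plain mapping from names to strings, while the content of an element is reached through its children and text.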
13. Relationship between XML elements
• Child-parent relationship
– Elements nested directly in an element are the
children of this element (Student is a child of
PersonList, Name is a child of Student, etc.)
• Ancestor/descendant relationship: important
for querying XML documents (extending
the child/parent relationship)
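As a rough illustration of the two relationships, the sketch below lists children, descendants, and ancestors with Python's xml.etree.ElementTree. The parent map is built by hand because ElementTree elements do not store a link to their parent; the variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<PersonList><Student><Name><First>XYZ</First></Name></Student></PersonList>"
)

# Child/parent: iterating an element yields only its direct children.
children = [child.tag for child in root]

# Ancestor/descendant: iter() walks the whole subtree, starting with root itself.
descendants = [element.tag for element in root.iter()][1:]

# Map every element to its parent so that ancestors can be followed upwards.
parent_of = {child: parent for parent in root.iter() for child in parent}

node = root.find(".//First")
ancestors = []
while node in parent_of:
    node = parent_of[node]
    ancestors.append(node.tag)

print(children, descendants, ancestors)
```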
14. XML elements & Database Objects
• XML elements can be converted into
objects by
– treating the tag names of the children as
attributes of the object
– applying the process recursively
<Student StudentID="123">
<Name> "XYZ PQR" </Name>
<CrsTaken>
<CrsName>CS582</CrsName>
<Grade>"A"</Grade> </CrsTaken>
</Student>
(#099,
Name: "XYZ PQR"
CrsTaken:
<CrsName>"CS582"</CrsName>
<Grade>"A"</Grade>
)
Partially converted object
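The recursive conversion can be sketched in a few lines of Python. The helper name to_object and the choice of a dict as the "object" are assumptions for illustration; the sketch also assumes child tag names are unique within a parent (a repeated tag would overwrite the earlier one) and it drops attributes of leaf elements.

```python
import xml.etree.ElementTree as ET

def to_object(element):
    """Recursively convert an element into a dict (hypothetical helper)."""
    children = list(element)
    if not children:
        # Leaf element: its text content becomes the attribute value.
        return (element.text or "").strip()
    obj = dict(element.attrib)            # XML attributes become object attributes
    for child in children:
        obj[child.tag] = to_object(child)  # child tag name -> attribute name
    return obj

student = ET.fromstring(
    '<Student StudentID="123">'
    "<Name>XYZ PQR</Name>"
    "<CrsTaken><CrsName>CS582</CrsName><Grade>A</Grade></CrsTaken>"
    "</Student>"
)
converted = to_object(student)
print(converted)
```

Stopping the recursion one level early would give the partially converted object shown on the slide, where CrsTaken still holds raw XML.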
15. XML elements & Database Objects
• Differences: Additional text within XML
elements
<Student StudentID="123">
<Name> "XYZ PQR" </Name>
has taken the following course
<CrsTaken>
Database management system II
<CrsName>CS582</CrsName>
with the grade
<Grade>"A"</Grade> </CrsTaken>
</Student>
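This additional text is what makes a direct mapping to objects awkward. As a sketch, assuming Python's xml.etree.ElementTree, the free text shows up in the text and tail properties of the elements rather than as named attributes:

```python
import xml.etree.ElementTree as ET

student = ET.fromstring(
    '<Student StudentID="123">'
    "<Name>XYZ PQR</Name> has taken the following course "
    "<CrsTaken>Database management system II"
    "<CrsName>CS582</CrsName> with the grade <Grade>A</Grade>"
    "</CrsTaken>"
    "</Student>"
)

name = student.find("Name")
course = student.find("CrsTaken")

# Text that follows an element's closing tag is stored as that element's tail.
text_after_name = name.tail
# Text before the first child is stored as the parent's text.
text_in_course = course.text
text_after_crsname = course.find("CrsName").tail

print(repr(text_after_name), repr(text_in_course), repr(text_after_crsname))
```

None of these three strings has a tag name, so a conversion that maps child tags to object attributes has no attribute to put them in.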
16. XML elements & Database Objects
• Differences: XML elements are ordered
<CrsTaken>
<CrsName>"CS582"</CrsName>
<Grade>"A"</Grade>
</CrsTaken>
<CrsTaken>
<Grade>"A"</Grade>
<CrsName>"CS582"</CrsName>
</CrsTaken>
{#901, Grade: "A", CrsName: "CS582"}
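The difference can be checked directly. In this sketch (Python with xml.etree.ElementTree, with a dict standing in for the database object, both assumptions for illustration) the two elements differ in child order, yet they convert to equal objects:

```python
import xml.etree.ElementTree as ET

a = ET.fromstring("<CrsTaken><CrsName>CS582</CrsName><Grade>A</Grade></CrsTaken>")
b = ET.fromstring("<CrsTaken><Grade>A</Grade><CrsName>CS582</CrsName></CrsTaken>")

# As XML, the sequence of children is part of the document.
order_a = [child.tag for child in a]
order_b = [child.tag for child in b]

# As objects, attributes are looked up by name and their order is irrelevant.
object_a = {child.tag: child.text for child in a}
object_b = {child.tag: child.text for child in b}

print(order_a != order_b, object_a == object_b)
```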
17. XML Attributes
• Can occur within an element (arbitrarily many
attributes, order unimportant, each attribute at most
once)
• Allow a more concise representation
• Could be replaced by elements
• Less powerful than elements (only string values, no
children)
• Can be declared to have unique value, good for
integrity constraint enforcement (next slide)
18. XML Attributes
• Can be declared to be of type ID,
IDREF, or IDREFS
• ID: unique value throughout the document
• IDREF: refer to a valid ID declared in the
same document
• IDREFS: space-separated list of strings of
references to valid IDs
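The three constraints can be checked by hand with a short Python sketch. `xml.etree.ElementTree` does not read attribute types from a DTD, so the roles of the attributes are assumed here: in this hypothetical document StudId plays the role of an ID attribute, Advisee of an IDREF attribute, and Members of an IDREFS attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element and attribute names are illustrative only.
doc = ET.fromstring(
    '<Report>'
    '<Student StudId="s1"/><Student StudId="s2"/>'
    '<Mentor Advisee="s1"/>'
    '<Team Members="s1 s2 s9"/>'
    '</Report>'
)

# ID: every value must be unique throughout the document.
ids = [e.get("StudId") for e in doc.iter() if e.get("StudId") is not None]
ids_are_unique = len(ids) == len(set(ids))

def dangling(attr):
    """Return the referenced values of `attr` that match no declared ID."""
    refs = []
    for e in doc.iter():
        # IDREFS is a space-separated list; a single IDREF is a list of one.
        refs.extend((e.get(attr) or "").split())
    return [r for r in refs if r not in set(ids)]

dangling_idref = dangling("Advisee")
dangling_idrefs = dangling("Members")
```

In this example the reference s9 has no matching ID, so a validating parser would be expected to reject the document.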
19. A report document with cross-references.
[Figure: report document with its ID and IDREF attributes marked]
21. Well-formed XML Document
• It has a root element
• Every opening tag is followed by a
matching closing tag, elements are properly
nested
• Any attribute can occur at most once in a
given opening tag, its value must be
provided, quoted
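A non-validating parser checks exactly these rules. The Python sketch below uses the standard-library parser, which raises `ParseError` on any document that is not well-formed; the helper name `is_well_formed` is illustrative.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

ok = is_well_formed('<a x="1"><b/></a>')
bad_nesting = is_well_formed('<a><b></a></b>')          # tags not properly nested
duplicate_attribute = is_well_formed('<a x="1" x="2"/>')  # attribute occurs twice
unquoted_value = is_well_formed('<a x=1/>')              # value not quoted
two_roots = is_well_formed('<a/><b/>')                   # more than one root element
```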
22. So far
• Why XML?
• XML elements
• XML attributes
• Well-formed XML document
24. Namespaces
• For avoiding naming conflicts
• Name of every XML tag must have two parts:
– namespace: a string in the form of a uniform resource
identifier (URI) or a uniform resource locator (URL)
– local name: as regular XML tag but cannot contain ‘:’
• Structure of an XML tag:
namespace:local_name
25. Namespaces
• An XML namespace is a collection of names,
identified by a URI reference, which are used in
XML documents as element types and attribute
names. XML namespaces differ from the
"namespaces" conventionally used in computing
disciplines in that the XML version has internal
structure and is not, mathematically speaking, a
set.
Source: www.w3c.org
26. Uniform Resource Identifier
• URI references which identify namespaces are
considered identical when they are exactly the
same character-for-character. Note that URI
references which are not identical in this sense
may in fact be functionally equivalent. Examples
include URI references which differ only in case,
or which are in external entities which have
different effective base URIs.
Source: www.w3c.org
27. Namespace - Example
<item xmlns="http://www.acmeinc.com/jp#supplies"
xmlns:toy="http://www.acmeinc.com/jp#toys">
<name> backpack </name>
<feature> <toy:item>
<toy:name>cyberpet</toy:name>
</toy:item> </feature>
</item>
Two namespaces are used, identified by the two URLs:
xmlns defines the default namespace,
xmlns:toy defines the second namespace
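A short Python sketch shows how a namespace-aware parser sees this example. `xml.etree.ElementTree` expands every tag to the form `{namespace}local_name`, so the two item elements get different names. The prefixes `s` and `t` used for the lookups are arbitrary choices of this sketch; only the URIs have to match.

```python
import xml.etree.ElementTree as ET

item = ET.fromstring(
    '<item xmlns="http://www.acmeinc.com/jp#supplies" '
    'xmlns:toy="http://www.acmeinc.com/jp#toys">'
    '<name>backpack</name>'
    '<feature><toy:item><toy:name>cyberpet</toy:name></toy:item></feature>'
    '</item>'
)

# Expanded names in document order.
tags = [e.tag for e in item.iter()]

# Lookups use their own prefix-to-URI map, independent of the document's prefixes.
ns = {
    "s": "http://www.acmeinc.com/jp#supplies",
    "t": "http://www.acmeinc.com/jp#toys",
}
supply_name = item.find("s:name", ns).text
toy_name = item.find(".//t:name", ns).text
```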
28. Namespace declaration
• Defined by
xmlns:prefix="declaration"
• Tags belonging to a namespace should be
prefixed with "prefix:"
• Tags belonging to the default namespace do
not need a prefix
• Each declaration has its own scope
30. Document Type Definition
• Set of rules (by the user) for structuring an XML
document
• Can be part of the document itself, or can be
specified via a URL where the DTD can be found
• A document that conforms to a DTD is said to be
valid
• Viewed as a grammar that specifies a legal XML
document, based on the tags used in the document
31. DTD Components
• A name – must coincide with the tag of the root
element of the document conforming to the DTD
• A set of ELEMENTs – one ELEMENT for each
allowed tag, including the root tag
• ATTLIST statements – specify the allowed
attributes and their types for each tag
• *, +, ? – like in grammar definition
– * : zero or more (finitely many)
– + : at least one
– ? : zero or one
32. DTD Components – Element
<!ELEMENT Name definition>
– Name: name of the element
– definition: type, element list, etc.
definition can be: EMPTY, (#PCDATA), or an element
list (e1,e2,…,en), where the list (e1,e2,…,en) can
be shortened using grammar-like notation
33. DTD Components – Element
<!ELEMENT Name (e1,…,en)>
– Name: name of the element
– e1: 1st child element
– en: nth child element
<!ELEMENT PersonList (Title,Contents)>
<!ELEMENT Contents (Person*)>
34. DTD Components – Element
<!ELEMENT Name EMPTY>
no child for the element Name
<!ELEMENT Name (#PCDATA)>
value of Name is a character string
<!ELEMENT Title EMPTY>
<!ELEMENT Id (#PCDATA)>
35. DTD Components – Attribute List
<!ATTLIST EName Att {Type} Property>
where
- EName – name of an element defined in the DTD
- Att – name of an attribute allowed to occur in the
opening tag of EName
- {Type} – may or may not be present; specifies the type
of the attribute (CDATA, ID, IDREF, IDREFS)
- Property – either #REQUIRED or #IMPLIED
37. DTD as Data Definition Language?
• Can specify exactly what is allowed on the
document
• XML elements can be converted into
objects
• Can specify integrity constraints on the
elements
• Is it good enough?
38. Inadequacy of DTD as a
Data Definition Language
• Goal of XML: for specifying documents
that can be exchanged and automatically
processed by software agents
• DTD provides the possibility of querying
Web documents but has many limitations
(next slide)
39. Inadequacy of DTD as a
Data Definition Language
• Designed without namespaces in mind
• Syntax is very different from that of XML
• Limited basic types
• Limited means for expressing data consistency
constraints
• Enforces referential integrity for attributes but
not for elements
• XML data is ordered; database data is not
• Element definitions are global to the entire
document
41. XML Schema – Main Features
• Same syntax as XML
• Integration with the namespace mechanism
(different schemas can be imported from different
namespaces and integrated into one)
• Built-in types (similar to SQL)
• Mechanism for defining complex types from
simple types
• Support keys and referential integrity constraints
• Better mechanism for specifying documents where
the order of element types does not matter
42. XML Document and Schema
A document that conforms to a schema is called
an instance of this schema and is said to be
schema valid.
An XML processor does not check for schema
validity
43. XML Schema and Namespaces
• Describes the structure of other XML
documents
• Begins with a declaration of the namespaces
to be used in the schema, including
– http://www.w3.org/2001/XMLSchema
– http://www.w3.org/2001/XMLSchema-instance
– targetnamespace (user-defined namespace)
44. http://www.w3.org/2001/XMLSchema
• Identifies the names of tags and attributes
used in a schema (names defined by the
XML Schema Specification, e.g., schema,
attribute, element)
• Understood by all schema-aware XML
processors
• These tags and attributes describe structural
properties of documents in general
46. http://www.w3.org/2001/XMLSchema-instance
• Used in conjunction with the XMLSchema
namespace
• Identifies some other special names which
are defined in the XML Schema
Specification but are used in the instance
documents
48. Target namespace
• identifies the set of names defined by a
particular schema document
• is an attribute of the schema element
(targetNamespace) whose value is the
namespace containing all the names defined
by the schema
53. Built-in Datatypes (cont.)
• Primitive datatypes (atomic, built-in)
– gDay: format ---DD (note the 3 dashes)
– gMonth: format --MM--
– hexBinary: a hex string
– base64Binary: a base64 string
– anyURI: e.g., http://www.xfront.com
– QName: a namespace-qualified name
– NOTATION: a NOTATION from the XML spec
54. Built-in Datatypes (cont.)
• Derived types (each a subtype of a primitive datatype)
– normalizedString: a string without tabs, line feeds, or carriage returns
– token: a string without tabs, line feeds, leading/trailing spaces, or consecutive spaces
– language: any valid xml:lang value, e.g., EN, FR, ...
– IDREFS: must be used only with attributes
– ENTITIES: must be used only with attributes
– NMTOKEN: must be used only with attributes
– NMTOKENS: must be used only with attributes
– Name
– NCName: part (no namespace qualifier)
– ID: must be used only with attributes
– IDREF: must be used only with attributes
– ENTITY: must be used only with attributes
– integer: e.g., 456
– nonPositiveInteger: negative infinity to 0
55. Built-in Datatypes (cont.)
• Derived types (each a subtype of a primitive datatype)
– negativeInteger: negative infinity to -1
– long: -9223372036854775808 to 9223372036854775807
– int: -2147483648 to 2147483647
– short: -32768 to 32767
– byte: -128 to 127
– nonNegativeInteger: 0 to infinity
– unsignedLong: 0 to 18446744073709551615
– unsignedInt: 0 to 4294967295
– unsignedShort: 0 to 65535
– unsignedByte: 0 to 255
– positiveInteger: 1 to infinity
Note: the following types can only be used with attributes (which we will discuss later):
ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, ENTITY, and ENTITIES.
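The bounds of the fixed-width integer types follow from their bit widths (64, 32, 16, and 8 bits, signed or unsigned). A small Python sketch that derives them; the helper names are illustrative:

```python
def signed_range(bits):
    """Range of a two's-complement signed integer with the given width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def unsigned_range(bits):
    """Range of an unsigned integer with the given width."""
    return 0, 2 ** bits - 1

ranges = {
    "long": signed_range(64),
    "int": signed_range(32),
    "short": signed_range(16),
    "byte": signed_range(8),
    "unsignedLong": unsigned_range(64),
    "unsignedInt": unsigned_range(32),
    "unsignedShort": unsigned_range(16),
    "unsignedByte": unsigned_range(8),
}
```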
56. Simple types
• Primitive types (see built-in)
• Type constructors:
– List: <simpleType name="myIdrefs">
<list itemType="IDREF"/>
</simpleType>
– Union: <simpleType name="myIdrefs">
<union memberTypes="phone7digits phone10digits"/>
</simpleType>
– Restriction: <simpleType name="phone7digits">
<restriction base="integer">
<minInclusive value="1000000"/>
<maxInclusive value="9999999"/>
</restriction>
</simpleType>
In each definition, the name attribute gives the name of the type;
the facets (here minInclusive and maxInclusive) give the possible values
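What the three constructors accept can be sketched as plain Python predicates. This is only an approximation of schema validation. The definition of phone10digits is not given on the slide, so the one below (an integer with exactly ten digits) is an assumption, and all function names are hypothetical.

```python
import re

def in_integer_range(text, low, high):
    """Restriction of integer: the lexical value must be an integer in [low, high]."""
    if not re.fullmatch(r"[+-]?[0-9]+", text.strip()):
        return False
    return low <= int(text) <= high

def is_phone7digits(text):
    return in_integer_range(text, 1000000, 9999999)

def is_phone10digits(text):
    # Assumed definition, not taken from the slides.
    return in_integer_range(text, 1000000000, 9999999999)

def is_union(text, member_checks):
    """Union: the value must be valid for at least one member type."""
    return any(check(text) for check in member_checks)

def is_list(text, item_check):
    """List: a whitespace-separated sequence in which every item is valid."""
    return all(item_check(item) for item in text.split())
```

For example, `is_union("5551234567", [is_phone7digits, is_phone10digits])` accepts a ten-digit number that the seven-digit restriction alone would reject.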
60. Type Declaration for Elements & Attributes
• Type declaration for simple elements and
attributes
<element name="CrsName" type="string"/>
Specifies that CrsName has a value of type string
61. Type Declaration for Elements & Attributes
• Type declaration for simple elements and
attributes
<element name="status" type="adm:studentStatus"/>
Specifies that status has a value of type
studentStatus, which will be defined in the
document
62. Example for the type studentStatus
<simpleType name="studentStatus">
<restriction base="string">
<enumeration value="U1"/>
<enumeration value="U2"/>
…
<enumeration value="G5"/>
</restriction>
</simpleType>
63. Complex Types
• Used to specify the type of elements with
children or attributes
• Opening tag: complexType
• Can be associated to a name in the same
way a simple type is associated to a name
64. Complex Types
• Special case: an element with simple content and some
attributes, or with no children and some attributes
<complexType name="CourseTakenType">
<attribute name="CrsCode" type="adm:courseRef"/>
<attribute name="Semester" type="string"/>
</complexType>
65. Complex Types
• Combining elements into a group: <all>
<complexType name="AddressType">
<all>
<element name="StreetName" type="string"/>
<element name="StreetNumber" type="string"/>
<element name="City" type="string"/>
</all>
</complexType>
The three elements can appear in arbitrary order! (NOTE:
<all> requires special care: it must occur directly inside
<complexType>; see the book for invalid situations)
66. Complex Types
• Combining elements into a group: <sequence>
<complexType name="NameType">
<sequence>
<element name="First" type="string"/>
<element name="Last" type="string"/>
</sequence>
</complexType>
The two elements must appear in this order
67. Complex Types
• Combining elements into a group: <choice>
<complexType name="addressType">
<choice>
<element name="POBox" type="string"/>
<sequence><element name="Name" type="string"/>
<element name="Number" type="string"/>
</sequence>
</choice> ….
</complexType>
Either POBox, or Name followed by Number, is required
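The difference between the three grouping constructs can be sketched as checks on the list of child tag names. This Python sketch assumes every declared element occurs exactly once and ignores occurrence constraints; the function names are hypothetical.

```python
def matches_sequence(tags, expected):
    """<sequence>: the children must appear exactly in the declared order."""
    return list(tags) == list(expected)

def matches_all(tags, expected):
    """<all>: the same children, in any order."""
    return sorted(tags) == sorted(expected)

def matches_choice(tags, alternatives):
    """<choice>: the children must match exactly one of the alternatives."""
    return any(list(tags) == list(alt) for alt in alternatives)

ADDRESS = ["StreetName", "StreetNumber", "City"]   # AddressType, grouped by <all>
NAME = ["First", "Last"]                           # NameType, grouped by <sequence>
ADDRESS_CHOICE = [["POBox"], ["Name", "Number"]]   # addressType, grouped by <choice>
```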
68. Complex Types
• Types can also be declared locally, allowing
different elements to have children with the
same name (next slides)
[studentType – courseType] both have the
"Name" element
[studentType – personNameType] both have
the "Name" element
71. Complex Types
• Importing schema: like include but does not
require schemaLocation
instead of
<include schemaLocation=“http://xyz.edu/CoursTypes”/>
we can use
<import namespace=“http://xyz.edu/CoursTypes”/>
72. Complex Types
• Deriving new complex types by extension and
restriction (for modifying imported schema)
….
<import namespace=“http://xyz.edu/CoursTypes”/>
…..
<complexType name="courseType">
<complexContent> <extension base="..">
<sequence>
<element name="syllabus" type="string"/>
</sequence>
</extension>
</complexContent></complexType>
The base attribute names the type that is going
to be extended
76. Integrity Constraints
• ID, IDREF, IDREFS can still be used
• Specified using the attribute xpath (next)
• XML keys, foreign keys
• Keys are associated with collections of
objects, not with types
77. Integrity Constraints - Keys
<key name="PrimaryKeyForClass">
<selector xpath="Classes/Class"/>
<field xpath="CrsCode"/>
<field xpath="Semester"/>
</key>
The key comprises two elements (CrsCode and
Semester); both are children of Class
The selector gives the collection of elements that are associated with
the key
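What the key constraint demands can be checked by hand in Python: the selector picks the collection, the fields build a tuple for each selected element, and no tuple may occur twice. The sample document, its Semester values, and the function name `key_violations` are hypothetical. A real schema validator would also require every field to be present.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical data with one deliberate duplicate.
doc = ET.fromstring(
    '<Report><Classes>'
    '<Class><CrsCode>CS582</CrsCode><Semester>F2002</Semester></Class>'
    '<Class><CrsCode>CS582</CrsCode><Semester>S2003</Semester></Class>'
    '<Class><CrsCode>CS582</CrsCode><Semester>F2002</Semester></Class>'
    '</Classes></Report>'
)

def key_violations(context, selector, fields):
    """Return the field tuples that occur more than once in the selected collection."""
    values = [
        tuple(node.findtext(field) for field in fields)
        for node in context.findall(selector)
    ]
    return [value for value, count in Counter(values).items() if count > 1]

violations = key_violations(doc, "Classes/Class", ["CrsCode", "Semester"])
```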
78. Integrity Constraints - Foreign key
<keyref name="XXX" refer="adm:PrimaryKeyForClass">
<selector xpath="Students/Student/CrsTaken"/>
<field xpath="@CrsCode"/>
<field xpath="@Semester"/>
</keyref>
Source collection (the selector): the elements that must
satisfy the key specified by the “Prim … Class”
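The foreign-key check can be sketched the same way: collect the key tuples from the Class elements, then report every CrsTaken whose (CrsCode, Semester) pair matches none of them. The document contents and helper names are hypothetical. Fields starting with `@` are read as attributes, because ElementTree's path syntax cannot select attribute values directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; the second CrsTaken refers to a class that does not exist.
doc = ET.fromstring(
    '<Report>'
    '<Classes>'
    '<Class><CrsCode>CS582</CrsCode><Semester>F2002</Semester></Class>'
    '</Classes>'
    '<Students><Student>'
    '<CrsTaken CrsCode="CS582" Semester="F2002"/>'
    '<CrsTaken CrsCode="MA123" Semester="F2002"/>'
    '</Student></Students>'
    '</Report>'
)

def field_value(node, field):
    """Read a field: '@x' is the attribute x, anything else is a child element."""
    if field.startswith("@"):
        return node.get(field[1:])
    return node.findtext(field)

def field_tuples(context, selector, fields):
    return [
        tuple(field_value(node, field) for field in fields)
        for node in context.findall(selector)
    ]

keys = set(field_tuples(doc, "Classes/Class", ["CrsCode", "Semester"]))
refs = field_tuples(doc, "Students/Student/CrsTaken", ["@CrsCode", "@Semester"])
dangling_refs = [ref for ref in refs if ref not in keys]
```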
79. Figure 15.12
Course types at http://xyz.edu/CourseTypes.xsd.
Example of type definitions
Complex type
with only att’s
Complex type
with sequence
Simple type
with restriction
80. Figure 17.10A
Part of a schema with a key and a foreign-key constraint.
(continued on next slide)
Similarly to courseTakenType: the type for classOfferings is
a sequence of classes whose type is classType
81. Figure 17.10B
Part of a schema with a key and a foreign-key constraint.
KEY: 2 children CrsCode
and Semester of Class
FOREIGN KEY:
2 attributes CrsCode
and Semester of CrsTaken
82. XML Query Languages
• Market, convenience, …
• XPath, XSLT, XQuery: three query
languages for XML
• XPath – simple & efficient
• XSLT – full-featured programming language,
powerful query capabilities
• XQuery – SQL style query language – most
powerful query capabilities
83. XPath
• Idea comes from path expression of OQL in object
databases
• Extends the path expressions with query facilities
by allowing search condition to occur in path
expressions
• XPath data model: view documents as trees (see
picture), providing operators for tree traversing,
use absolute and relative path expression
• An XPath expression takes a document tree and returns
a set of nodes in the tree
85. XPath Expression - Examples
/Students/Student/CrsTaken – returns the set of
references to the nodes that correspond to the
elements CrsTaken
First or ./First refers to the node corresponding to the
child element First when the current node is
Name
/Students/Student/CrsTaken/@CrsCode – the set
of values of attributes CrsCode
/Students/Student/Name/First/text() – the set of
contents of element First
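These expressions can be tried out with the limited XPath subset of Python's `xml.etree.ElementTree`. Two differences from full XPath are assumptions of this sketch: paths are evaluated relative to the root element, so the leading `/Students` is dropped, and `@CrsCode` and `text()` steps are not supported, so attribute values and text are read from the selected elements instead. The sample data is hypothetical.

```python
import xml.etree.ElementTree as ET

students = ET.fromstring(
    '<Students>'
    '<Student><Name><First>John</First><Last>Doe</Last></Name>'
    '<CrsTaken CrsCode="CS308" Semester="F1997"/>'
    '<CrsTaken CrsCode="MAT123" Semester="F1997"/></Student>'
    '<Student><Name><First>Bart</First><Last>Simpson</Last></Name>'
    '<CrsTaken CrsCode="CS308" Semester="F1994"/></Student>'
    '</Students>'
)

# /Students/Student/CrsTaken
crs_taken = students.findall("Student/CrsTaken")

# /Students/Student/CrsTaken/@CrsCode
crs_codes = [c.get("CrsCode") for c in crs_taken]

# /Students/Student/Name/First/text()
first_names = [f.text for f in students.findall("Student/Name/First")]

# First and ./First, evaluated with Name as the current node
name = students.find("Student/Name")
first_relative = name.find("First")
first_dot = name.find("./First")
```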
86. Advanced Navigation
/Students/Student[1]/CrsTaken[2] – first Student
node, second CrsTaken node
//CrsTaken – all CrsTaken elements in the tree
(descendant-or-self)
Student/* - all element children of the Student children of
the current node
/Students/Student[search_expression] – all Student
nodes satisfying the expression; see what
search_expression can be in the book!
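Positions, the descendant step, wildcards, and simple search expressions are also available in ElementTree's XPath subset. The sketch below again evaluates paths relative to the root element (an assumption of the sketch), uses hypothetical data, and shows only one form of search expression, an attribute test.

```python
import xml.etree.ElementTree as ET

students = ET.fromstring(
    '<Students>'
    '<Student><Name>John Doe</Name>'
    '<CrsTaken CrsCode="CS308" Semester="F1997"/>'
    '<CrsTaken CrsCode="MAT123" Semester="F1997"/></Student>'
    '<Student><Name>Bart Simpson</Name>'
    '<CrsTaken CrsCode="CS308" Semester="F1994"/></Student>'
    '</Students>'
)

# /Students/Student[1]/CrsTaken[2]: positions count from 1
second_course = students.find("Student[1]/CrsTaken[2]")

# //CrsTaken: all CrsTaken elements anywhere in the tree
all_courses = students.findall(".//CrsTaken")

# Student/*: all element children of the Student children
student_children = [e.tag for e in students.findall("Student/*")]

# A search expression that tests an attribute value
fall_1997 = students.findall("Student/CrsTaken[@Semester='F1997']")
```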
87. XPointer
• Use the features of XPath to navigate within
an XML document
• Syntax:
someURL#xpointer(XPathExpr1)xpointer(XPathExpr2)…
• Example:
http://www.foo.edu/Report.xml#xpointer(//Student[…])
88. XSLT
• Part of XSL – an extensible stylesheet language of
XML, a transformation language for XML:
converting XML documents into any type of
documents (HTML, XML, etc)
• A functional programming language
• XML syntax
• Provide instructions for converting/extracting
information
• Output XML
89. XSLT Basics
• Stylesheet: specifies a transformation of one type
of document into another type
• Specified by a processing instruction in the XML document
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl"
href="http://xyz.edu/Report/report.xsl"?>
<Report Date="2002-03-01">
….
</Report>
– type: what parser should be used
– href: location of the stylesheet
94. XSLT – Template
• Recursive traversal of the structures of the
document
• Often defined recursively
• Algorithm for processing an XSLT template
(book)
99. XQuery - Example
FOR $t IN
document("http://xyz.edu/transcripts.xml")
//Transcript
WHERE $t/CrsTaken/@CrsCode = "MA123"
RETURN $t/Student
– FOR declares $t and its range
– WHERE finds all transcripts containing "MA123"
– RETURN returns the set of Student elements of those
transcripts
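The FOR / WHERE / RETURN structure reads much like a list comprehension. The Python sketch below is an analogy only, not XQuery. It works on an inline, hypothetical transcripts document instead of fetching the URL, and the StudID and Grade attributes are assumptions about the data.

```python
import xml.etree.ElementTree as ET

transcripts = ET.fromstring(
    '<Transcripts>'
    '<Transcript><Student StudID="111"/>'
    '<CrsTaken CrsCode="CS308" Semester="F1997" Grade="A"/>'
    '<CrsTaken CrsCode="MA123" Semester="F1997" Grade="B"/></Transcript>'
    '<Transcript><Student StudID="222"/>'
    '<CrsTaken CrsCode="CS308" Semester="F1997" Grade="B"/></Transcript>'
    '</Transcripts>'
)

result = [
    t.find("Student")                          # RETURN $t/Student
    for t in transcripts.iter("Transcript")    # FOR $t IN ...//Transcript
    # WHERE: true if at least one CrsTaken of $t has CrsCode "MA123"
    if any(c.get("CrsCode") == "MA123" for c in t.findall("CrsTaken"))
]
student_ids = [s.get("StudID") for s in result]
```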
101. Putting it in well-formed XML
<StudentList>
(FOR $t IN
document("http://xyz.edu/transcripts.xml")
//Transcript
WHERE $t/CrsTaken/@CrsCode = "MA123"
RETURN $t/Student
)
</StudentList>
102. Figure 15.21
Construction of class rosters from transcripts: first try.
For each class $c, find the students attending the class and output
their information.
This outputs one class roster for each CrsTaken node, so a class may get more
than one roster if different students receive different grades
103. Fix ?
• Assume that the list of classes is available –
write a different query
• Use the filter operation
106. FOR $c IN document("http://xyz.edu/classes.xml")//Class
RETURN
<ClassRoster CrsCode=$c/@CrsCode Semester=$c/@Semester>
$c/CrsName $c/Instructor
(FOR $t IN document("http://xyz.edu/transcripts.xml")//Transcript
WHERE $t/CrsTaken/@CrsCode = $c/@CrsCode
RETURN $t/Student
SORTBY($t/Student/@StudID)
)
</ClassRoster>
SORTBY($c/@CrsCode)
Gives the "correct" result:
all ClassRoster elements, each only once
107. Filtering
• Syntax: filter(argument1, argument2)
• Meaning: return a document fragment
obtained by
– deleting from the set of nodes specified by
argument1 the nodes that do not occur in
argument2
– reconnecting the remaining nodes according to
the child-parent relationship of the document
specified by argument1
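A rough Python sketch of this behaviour, limited to element nodes. The function name `filter_fragment` and the sample data are hypothetical. Two assumptions go beyond the definition above: a kept element keeps its own text, and attributes are dropped because they would have to be listed in argument2 to survive.

```python
import xml.etree.ElementTree as ET

def filter_fragment(roots, keep):
    """Copy the trees under `roots`, keeping only the nodes listed in `keep`.

    Each kept node is attached to its nearest kept ancestor, which preserves
    the original child-parent relationship among the remaining nodes.
    """
    keep_ids = {id(node) for node in keep}
    fragment = []

    def visit(node, kept_parent):
        if id(node) in keep_ids:
            copy = ET.Element(node.tag)
            copy.text = (node.text or "").strip() or None
            if kept_parent is None:
                fragment.append(copy)      # new top-level node of the fragment
            else:
                kept_parent.append(copy)   # reconnect below nearest kept ancestor
            kept_parent = copy
        for child in node:
            visit(child, kept_parent)

    for root in roots:
        visit(root, None)
    return fragment

classes = ET.fromstring(
    '<Classes>'
    '<Class CrsCode="MGT123"><CrsName>Market Analysis</CrsName>'
    '<Instructor>A. Smith</Instructor></Class>'
    '<Class CrsCode="EE101"><CrsName>Electronic Circuits</CrsName>'
    '<Instructor>B. Jones</Instructor></Class>'
    '</Classes>'
)

# filter(//Class, //Class|//Class/CrsName)
all_classes = classes.findall(".//Class")
keep = all_classes + classes.findall(".//Class/CrsName")
result = [ET.tostring(e, encoding="unicode") for e in filter_fragment(all_classes, keep)]
```

The Instructor elements and the CrsCode attributes disappear because they are not in the second argument.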
109. Result of: filter(//Class, //Class|//Class/CrsName)
[Figure: document tree with Root, Classes, Class, and CrsName nodes, marking the fragment specified by //Class and the nodes selected by //Class and //Class/CrsName]
Result:
<Class><CrsName>Market Analysis</CrsName></Class>
<Class><CrsName>Electronic Circuits </CrsName></Class>
…….
110. LET $trs:=document("http://xyz.edu/transcripts.xml")//Transcript
LET $ct:=$trs/CrsTaken
FOR $c IN distinct(filter($ct, $ct|$ct/@CrsCode|$ct/@Semester))
RETURN
<ClassRoster CrsCode=$c/@CrsCode Semester=$c/@Semester>
(FOR $t IN $trs
WHERE $t/CrsTaken/@CrsCode = $c/@CrsCode AND
$t/CrsTaken/@Semester = $c/@Semester
RETURN $t/Student
SORTBY($t/Student/@StudID))
</ClassRoster>
SORTBY($c/@CrsCode)
Gives the "correct" result:
all ClassRoster elements, each only once
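The idea behind the fix, grouping by the distinct (CrsCode, Semester) pairs and leaving the grade out, can be sketched in Python. This is an analogy rather than a translation of the query. The transcripts data is hypothetical, and the sketch takes both values from the same CrsTaken node.

```python
import xml.etree.ElementTree as ET

transcripts = ET.fromstring(
    '<Transcripts>'
    '<Transcript><Student StudID="222"/>'
    '<CrsTaken CrsCode="CS308" Semester="F1997" Grade="B"/></Transcript>'
    '<Transcript><Student StudID="111"/>'
    '<CrsTaken CrsCode="CS308" Semester="F1997" Grade="A"/>'
    '<CrsTaken CrsCode="MA123" Semester="F1997" Grade="B"/></Transcript>'
    '</Transcripts>'
)

rosters = {}
for t in transcripts.iter("Transcript"):
    student_id = t.find("Student").get("StudID")
    for c in t.findall("CrsTaken"):
        # The grade is deliberately not part of the key, so students with
        # different grades end up in the same roster.
        key = (c.get("CrsCode"), c.get("Semester"))
        rosters.setdefault(key, []).append(student_id)

# One roster per class, classes sorted by course code, students by ID.
class_rosters = [(key, sorted(ids)) for key, ids in sorted(rosters.items())]
```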