This document introduces JSON-LD (JSON for Linking Data), which addresses two problems with standard JSON: ambiguity and lack of linking. JSON-LD adds context and identifiers to make JSON data unambiguous and able to link to other related data. It defines JSON documents as labeled, directed graphs. Examples show how JSON-LD can clarify data meaning through context, link related entities through identifiers, and support applications like search engine optimization, APIs, and Gmail actions.
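As a minimal illustration of that idea (the names and IRIs below are hypothetical, loosely modelled on the JSON-LD spec's own examples), a plain JSON object becomes unambiguous once a `@context` maps its keys to IRIs and an `@id` makes the node linkable:

```python
import json

# Plain JSON: "name" and "homepage" are just strings with no agreed meaning.
plain = {"name": "Manu Sporny", "homepage": "http://manu.sporny.org/"}

# JSON-LD: the @context maps each term to an unambiguous IRI, and @id
# turns the object into a node that other documents can link to.
linked = {
    "@context": {
        "name": "http://schema.org/name",
        "homepage": {"@id": "http://schema.org/url", "@type": "@id"},
    },
    "@id": "http://me.example.com/",
    "name": "Manu Sporny",
    "homepage": "http://manu.sporny.org/",
}

# Both documents are ordinary JSON and round-trip through a standard parser.
assert json.loads(json.dumps(linked)) == linked
```

Nothing about the format changes; the context is just more JSON, which is why existing tooling keeps working.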
JSON is a lightweight data-interchange format that is easy for humans to read and write and for machines to parse and generate. It is built on two structures: a collection of name/value pairs and an ordered list of values. JSON is widely used to transmit data between a server and a web application, and has largely become the default format for asynchronous browser/server communication.
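The two structures map directly onto the standard parse/generate round trip; a small Python sketch (the example data is invented):

```python
import json

# The two JSON structures: an object (a collection of name/value pairs)
# and an array (an ordered list of values), combined in one document.
text = '{"user": "ada", "scores": [90, 85, 77]}'

data = json.loads(text)            # parse: text -> native structures
data["scores"].append(95)          # arrays keep their order
round_trip = json.dumps(data)      # generate: structures -> text
```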
Slides for Tom Marrs' BJUG talk on 2/12/2013. See http://boulderjug.org/2013/01/tuesday-february-12-2013-a-night-with-tom-marrs-covering-json-and-rest.html
The document discusses issues with Slovakia's judicial data and an effort to create an open data project to address them. It notes that the justice.gov.sk search is useless, and that data is scattered across sites and contains errors. The project aims to create a unified dataset by matching judges, normalizing values, fixing links, and handling download challenges. Over 400GB of judicial texts and images from various sources would be centralized, standardized through a Rails application, and made openly available.
This document discusses mapping and analysis in ElasticSearch. It explains that mapping defines how documents are indexed and stored, including specifying field types and custom analyzers. Different analyzers, like standard, simple, and language-specific analyzers, tokenize and normalize text differently. Inner objects and arrays in documents are flattened during indexing for search. The document provides examples of mapping definitions and using the _analyze API to test analyzers.
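ElasticSearch's actual analyzers are implemented in Lucene, but the behaviour of the "simple" analyzer can be sketched in a few lines (an illustration of what analysis does, not the real implementation):

```python
import re

def simple_analyze(text):
    # Roughly what ElasticSearch's "simple" analyzer does: split on
    # anything that is not a letter, then lowercase each token.
    return [t.lower() for t in re.split(r"[^a-zA-Z]+", text) if t]

tokens = simple_analyze("Set the shape to semi-transparent")
# Hyphenated words are split and everything is lowercased, which is
# exactly the behaviour the _analyze API lets you inspect per analyzer.
```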
This document discusses sorting and relevance in ElasticSearch. It provides examples of sorting search results by date or score. It also covers multilevel sorting, sorting on multivalue fields, and sorting on string fields after analyzing or not analyzing text. The document explains what determines relevance in ElasticSearch, including term frequency, inverse document frequency, and field length norm. It shows how to get explain plans and failure messages for queries. Finally, it provides a brief introduction to doc values in ElasticSearch and references a book for further information.
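Those relevance factors can be made concrete with a toy TF-IDF calculation; the formulas below follow the shape of Lucene's classic similarity (square-root term frequency, log-damped inverse document frequency), and the documents are invented:

```python
import math

def tf(term, doc):
    # Term frequency: how often the term appears in the field,
    # dampened by a square root as in Lucene's classic scoring.
    return math.sqrt(doc.count(term))

def idf(term, docs):
    # Inverse document frequency: terms that appear in few documents
    # contribute more to the score than common ones.
    n = sum(1 for d in docs if term in d)
    return 1 + math.log(len(docs) / (n + 1))

docs = [["quick", "brown", "fox"],
        ["quick", "quick", "dog"],
        ["lazy", "dog"]]
score = tf("quick", docs[1]) * idf("quick", docs)
```

The field-length norm the summary mentions would further divide the score for longer fields; it is omitted here to keep the sketch short.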
Aligning Web Services with the Semantic Web to Create a Global Read-Write Graph of Data (Markus Lanthaler)
Presentation of the paper "Aligning Web Services with the Semantic Web to Create a Global Read-Write Graph of Data" given at the 9th IEEE European Conference on Web Services (ECOWS 2011) in Lugano, Switzerland.
Despite significant research and development efforts, the vision of the Semantic Web yielding a Web of Data has not yet become reality. Even though initiatives such as Linking Open Data have gained traction recently, the Web of Data is still clearly outpaced by the growth of the traditional, document-based Web. Instead of releasing data in the form of RDF, many publishers choose to publish their data in the form of Web services. The reasons for this are manifold. Given that RESTful Web services closely resemble the document-based Web, they are not only perceived as less complex and disruptive, but also provide read-write interfaces to the underlying data. In contrast, the current Semantic Web is essentially read-only, which clearly inhibits network effects and engagement of the crowd. On the other hand, the prevalent use of proprietary schemas to represent the data published by Web services prevents generic browsers or crawlers from accessing and understanding this data; the consequence is islands of data instead of a global graph of data forming the envisioned Semantic Web. We thus propose a novel approach to integrate Web services into the Web of Data by introducing an algorithm to translate SPARQL queries to HTTP requests. The aim is to create a global read-write graph of data and to standardize the mashup development process. We try to keep the approach as familiar and simple as possible to lower the entry barrier and foster adoption. Thus, we based our proposal on SEREDASj, a semantic description language for RESTful data services, to make proprietary JSON service schemas accessible.
This document summarizes research into discovering lost web pages using techniques from digital preservation and information retrieval. Key points include:
- Web pages are frequently lost due to broken links or content being moved/removed, but copies may still exist in search engine caches or archives.
- Techniques like lexical signatures (representing a page's content in a few keywords) and analyzing page titles, tags and link neighborhoods can help characterize lost pages and find similar replacement content.
- Experiments showed that lexical signatures degrade over time but page titles are more stable, and combining techniques improves performance in locating replacement content. The goal is to develop a browser extension to help users find lost web pages.
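A lexical signature can be approximated very crudely as the top-k most frequent non-stopword terms of a page; the sketch below uses raw frequency, whereas the research summarized above weights terms against a corpus (the stopword list and example text are invented):

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "are"}

def lexical_signature(text, k=5):
    # A crude lexical signature: the k most frequent non-stopword
    # terms. Published systems typically rank terms by TF-IDF rather
    # than raw frequency, which makes signatures more discriminative.
    words = [w.lower().strip(".,;:") for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return [term for term, _ in counts.most_common(k)]

sig = lexical_signature(
    "web pages are lost and web archives keep copies of lost pages", k=3)
```

Feeding such a signature back into a search engine is how candidate replacement pages are found.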
A Semantic Description Language for RESTful Data Services to Combat Semaphobia (Markus Lanthaler)
The document proposes a semantic description language (SEREDASj) to provide machine-readable descriptions of RESTful web services. It aims to address the lack of standards for describing REST APIs and help combat "semaphobia", the fear of semantics. The language builds on previous work but is tailored specifically for REST by focusing on simplicity and supporting many use cases including discovery and composition of RESTful services.
JSON is a lightweight data-interchange format that is easy for humans to read and write and for machines to parse and generate. It is built on two structures: a collection of name/value pairs and an ordered list of values. JSON is text-based and language independent, yet closely resembles JavaScript object syntax. It is used primarily to transmit data between a server and web application, serving as an alternative to XML. Compared to XML, JSON is simpler, faster and easier to use.
MongoDB is a non-relational database that uses a document-based data model. It is an alternative to traditional relational databases and is optimized for storing large amounts of unstructured and semi-structured data. MongoDB does not require a predefined schema and allows flexible, dynamic queries against documents using JavaScript. While relational databases are better suited for transactions, MongoDB is designed for horizontal scalability, faster queries, and flexible data modeling.
MongoDB Europe 2016 - Graph Operations with MongoDB (MongoDB)
The popularity of dedicated graph technologies has risen greatly in recent years, at least partly fuelled by the explosion in social media and similar systems, where a friend network or recommendation engine is often a critical component when delivering a successful application. MongoDB 3.4 introduces a new Aggregation Framework graph operator, $graphLookup, to enable some of these types of use cases to be built easily on top of MongoDB. We will see how semantic relationships can be modelled inside MongoDB today, how the new $graphLookup operator can help simplify this in 3.4, and how $graphLookup can be used to leverage these relationships and build a commercially focused news article recommendation system.
This document describes Doc2Graph, an open source tool that transforms JSON documents into a graph database. It discusses how Doc2Graph works, including converting JSON trees into a graph and reusing existing nodes. It also provides examples of using Doc2Graph with CouchbaseDB, MongoDB, and the Spotify API to import music data into Neo4j. The document concludes with information on Doc2Graph's configuration options.
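The core transformation can be sketched as a recursive walk that emits one node per JSON value and one labelled edge per object key or array index; this illustrates the idea rather than Doc2Graph's actual code (which, as noted, also reuses identical existing nodes):

```python
def json_to_graph(value, node_id=0, label="root"):
    # Walk a JSON tree, assigning each value a node id and recording
    # one (parent, label, child) edge per key or index.
    nodes, edges, counter = {}, [], [node_id]

    def walk(v, parent, lbl):
        nid = counter[0]
        counter[0] += 1
        # Containers get a type label; scalars keep their value.
        nodes[nid] = type(v).__name__ if isinstance(v, (dict, list)) else v
        if parent is not None:
            edges.append((parent, lbl, nid))
        if isinstance(v, dict):
            for k, child in v.items():
                walk(child, nid, k)
        elif isinstance(v, list):
            for i, child in enumerate(v):
                walk(child, nid, str(i))
        return nid

    root = walk(value, None, label)
    return nodes, edges, root
```

The resulting node and edge lists map directly onto a property-graph store such as Neo4j.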
JSON is an important data format for transporting data between servers and many modern applications. Postgres has been at the forefront of bringing these capabilities into the hands of database users. The JSONB data type allows for faster operations within PostgreSQL.
At this webinar we will look at:
- How to use JSON from applications
- How to store it in the database
- How to index JSON data
- Tips and tricks to optimize usage
We will then close with a review of the roadmap for new PostgreSQL features for JSON and JSON standards compliance.
The document discusses NoSQL databases and MongoDB. It defines NoSQL, describes different types of NoSQL databases like key-value, document, graph and column-oriented. It then focuses on MongoDB, explaining its advantages like high performance, flexibility and rich queries. It covers MongoDB concepts like collections and documents. It also discusses JSON structure, MongoDB methods for insertion, querying and removal of documents. Finally, it provides examples of when to use MongoDB, such as for unstructured data, high volume read/writes and changing schemas.
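The document-matching behaviour behind MongoDB's find() can be illustrated with a toy equality-only filter in plain Python (no driver involved; $-operators and nested field paths are deliberately omitted):

```python
def matches(doc, query):
    # A toy version of MongoDB's find() filter: every key in the
    # query must equal the corresponding field in the document.
    return all(doc.get(k) == v for k, v in query.items())

collection = [
    {"_id": 1, "type": "article", "views": 100},
    {"_id": 2, "type": "video", "views": 250},
]

results = [d for d in collection if matches(d, {"type": "video"})]
```

Insertion and removal in this model are just list append and filtered rebuild, which is why a schemaless document store feels so close to working with native data structures.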
FIWARE Wednesday Webinars - Introduction to NGSI-LD (FIWARE)
Introduction to NGSI-LD Webinar - 27th May 2020
Corresponding webinar recording: https://youtu.be/rZ13IyLpAtA
A data-model-driven and linked-data-first introduction for developers to NGSI-LD and JSON-LD.
Chapter: Core
Difficulty: 3
Audience: Any Technical
Presenter: Jason Fox (Senior Technical Evangelist, FIWARE Foundation)
Streams of information - Chicago crystal language monthly meetup (Brian Cardiff)
* Let's review and compare a couple of scenarios where data flows in and out of the system.
* What should we look at for better resource utilization?
* What has the Crystal std-lib done up until now?
* What are the open questions for future work?
This document discusses techniques for improving JSON parsing performance on Android. It begins by introducing the author and describing LinkedIn's mobile app ecosystem. It then analyzes factors that affect JSON parsing like memory usage and parsing approaches. The document evaluates various JSON parsing libraries and binary formats. It proposes optimizations like using code generation, streaming parsing, removing JSON key comparisons via a trie, and leveraging known data schemas to further optimize parsing. Profiling revealed additional gains from eliminating byte to char conversions and temporary string allocations during parsing. The goal is to close the performance gap with highly optimized binary parsers.
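The trie optimization mentioned above can be sketched as follows: known field names are loaded into a character trie, so the parser resolves a key in a single pass over its characters instead of comparing it against every candidate string (an illustration of the technique, not LinkedIn's implementation):

```python
class KeyTrie:
    # Maps a fixed set of JSON keys through a character trie, so key
    # resolution is one walk per key rather than N string comparisons.
    def __init__(self, keys):
        self.root = {}
        for key in keys:
            node = self.root
            for ch in key:
                node = node.setdefault(ch, {})
            node["$end"] = key  # marker: a complete key ends here

    def lookup(self, key):
        node = self.root
        for ch in key:
            node = node.get(ch)
            if node is None:
                return None  # unknown key: bail out early
        return node.get("$end")

trie = KeyTrie(["firstName", "lastName", "headline"])
```

With a known schema the trie can map each key straight to a field setter, which is one of the ways generated parsers close the gap with binary formats.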
Back to Basics Webinar 3: Schema Design Thinking in Documents (MongoDB)
This is the third webinar of a Back to Basics series that will introduce you to the MongoDB database. This webinar will explain the architecture of document databases.
A brief introduction to JSON and how it can be used with PHP and modern template engines such as Handlebars, AngularJS and Mustache - Source Code / Live Demo - http://r1.my/klmug/11/
The document discusses JSON support in Java EE 8, including the JSON Processing (JSON-P) and JSON Binding APIs. It provides an overview of the JSON-P API for parsing, generating, and manipulating JSON, including the streaming JsonParser and JsonGenerator classes and object model classes. It also discusses the upcoming JSON-P 1.1 specification's support for JSON Pointer, JSON Patch, and JSON Merge Patch standards.
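Of those standards, JSON Merge Patch (RFC 7386) has the simplest algorithm, shown here in Python rather than through the Java API: objects merge recursively, a null value deletes a member, and any non-object patch replaces the target outright.

```python
def merge_patch(target, patch):
    # JSON Merge Patch, RFC 7386. A non-object patch replaces the
    # target; within objects, None (JSON null) removes a member and
    # everything else merges recursively.
    if not isinstance(patch, dict):
        return patch
    if not isinstance(target, dict):
        target = {}
    result = dict(target)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result

patched = merge_patch(
    {"a": "b", "c": {"d": "e", "f": "g"}},
    {"a": "z", "c": {"f": None}},
)
```

The example patch above is the one from the RFC itself: it changes "a", deletes "c.f", and leaves "c.d" untouched.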
Back to Basics Webinar 3 - Thinking in Documents (Joe Drumgoole)
- The document discusses modeling data in MongoDB based on cardinality and access patterns.
- It provides examples of embedding related data for one-to-one and one-to-many relationships, and references for large collections.
- The document recommends considering read/write patterns and embedding objects for efficient access, while breaking out data if it grows too large.
JSON Schema is an extremely powerful, yet easily approachable, tool for describing data structures. In fact, the OpenAPI specification has embraced JSON Schema and currently uses it for describing the inputs and outputs of your APIs. JSON Schema is often misunderstood and often used in ways that leave people scratching their heads when it does not work the way they expected. This talk will introduce JSON Schema from the ground up, complete with gotchas and best practices. In the end, the hope is that attendees will see the value of JSON Schema and understand it well enough to use it in their OpenAPI documents and even their own applications.
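To make the idea concrete, here is a deliberately tiny validator covering only the "type", "required", and "properties" keywords (a sketch of the mechanism; real validators such as the jsonschema package implement the full specification):

```python
def validate(instance, schema):
    # Minimal JSON Schema subset: check "type", then "required"
    # members, then recurse into "properties" that are present.
    types = {"object": dict, "array": list, "string": str,
             "number": (int, float), "boolean": bool}
    if "type" in schema and not isinstance(instance, types[schema["type"]]):
        return False
    for key in schema.get("required", []):
        if key not in instance:
            return False
    for key, sub in schema.get("properties", {}).items():
        if key in instance and not validate(instance[key], sub):
            return False
    return True

person = {"type": "object", "required": ["name"],
          "properties": {"name": {"type": "string"},
                         "age": {"type": "number"}}}
```

Note the classic gotcha visible even here: a property listed under "properties" but not under "required" may be absent and the instance still validates.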
The document discusses using JSON-LD and RDF to add semantic meaning to web APIs while maintaining compatibility with existing JSON formats. It explains how RDF uses triples to make statements about resources, and how JSON-LD allows embedding RDF semantics in JSON without changing the format. This allows merging data from multiple sources and facilitates data interchange and evolution of schemas over time.
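The triple model can be illustrated by expanding a severely restricted, flat JSON-LD-style object into (subject, predicate, object) tuples; full expansion is defined by the JSON-LD processing algorithms, and the document below is invented:

```python
def to_triples(doc, context):
    # Every non-@ key becomes one RDF-style triple: the @id is the
    # subject, the context supplies the predicate IRI, and the JSON
    # value is the object. (Flat documents only; no nesting.)
    subject = doc["@id"]
    return [(subject, context[key], value)
            for key, value in doc.items() if not key.startswith("@")]

context = {"name": "http://schema.org/name"}
doc = {"@id": "http://example.com/people/1", "name": "Ada"}
triples = to_triples(doc, context)
```

Because every fact is a self-describing triple, triples produced from different sources can simply be concatenated, which is the merging property the summary refers to.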
EWD 3 Training Course Part 18: Modelling NoSQL Databases using Global Storage (Rob Tweed)
This presentation is Part 18 of the EWD 3 Training Course. It examines how the 4 main NoSQL database types can be modelled using a Global Storage Database
Complex queries in a distributed multi-model database (Max Neunhöffer)
A multi-model database is a document store, a graph database as well as a key/value store. To allow for convenient and powerful querying such a database needs a query language that understands all three data models and allows to mix these models in queries. For example, it should be possible to find some documents in a collection according to some criteria, then follow some edges in a graph in which the documents represent vertices, and finally join the results with documents from yet another collection.
In this talk I will explain how a query engine for such a language works, giving an overview of the life of a query from parsing, through translation into an execution plan and the optimisation phase, to the final execution. I will show what distributed query execution plans look like, how the query optimiser reasons about them, and how distributed execution works.
Presentation on various definitions for JSON, including JSON-RPC, JSPON, JSON Schema and JSONP, and tools for working with these definitions, including the Persevere client and server.
Webinar: Working with Graph Data in MongoDB (MongoDB)
With the release of MongoDB 3.4, the number of applications that can take advantage of MongoDB has expanded. In this session we will look at using MongoDB for representing graphs and how graph relationships can be modeled in MongoDB.
We will also look at a new aggregation operation that we recently implemented for graph traversal and computing transitive closure. We will include an overview of the new operator and provide examples of how you can exploit this new feature in your MongoDB applications.
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You... (Aggregage)
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
End-to-end pipeline agility - Berlin Buzzwords 2024 (Lars Albertsson)
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long does it take for all downstream pipelines to be adapted to an upstream change?", the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
The Ipsos - AI - Monitor 2024 Report.pdfSocial Samosa
According to Ipsos AI Monitor's 2024 report, 65% Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Analysis insight about a Flyball dog competition team's performanceroli9797
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
2. Outline
1 Why Use JSON-LD
2 How To Deal With Ambiguity
3 How To Deal With Linking
4 What Is JSON-LD
5 Applications of JSON-LD
6 Conclusion
2 / 29
Introduction to JSON-LD
3. Why Use JSON-LD
4. Why Use JSON-LD
What's the problem with JSON?

{
  "id": 3,
  "number": 4,
  "value": 5,
  "count": 6
}

What's the meaning of this JSON?
The first problem is the ambiguity of JSON!
7. Why Use JSON-LD
What's the meaning of this JSON?

{
  "name": "Bob",
  "number": 17,
  "percentage": 32.3,
  "steal": 2.2,
  "assist": 7.5
}

Is it describing a student named Bob?
Is it describing a basketball player named Bob?
The same JSON can mean different things.
11. Why Use JSON-LD
Can different JSON mean the same thing?

{
  "player": "Bob",
  "Id": 17,
  "shooting": 32.3,
  "ST": 2.2,
  "AST": 7.5
}

This may also describe a basketball player named Bob.
12. How To Deal With Ambiguity
13. How To Deal With Ambiguity
Be Specific
Specify the exact definition of each property with a URI.
http://schema.org provides a shared common vocabulary.

{
  "http://schema.org/name": "Bob",
  "http://example.com/JerseyNumber": 17,
  "http://example.com/FieldGoalPercentage": 32.3
}

Specific, but too complicated?
15. How To Deal With Ambiguity
Be Concise
Concise, but ambiguous?

{
  "player": "Bob",
  "Id": 17,
  "shooting": 32.3,
  "ST": 2.2,
  "AST": 7.5
}
16. How To Deal With Ambiguity
Be Specific And Concise
With "@context", we can be both specific and concise!

{
  "@context": "basketball",
  "player": "Bob",
  "Id": 17,
  "shooting": 32.3,
  "ST": 2.2,
  "AST": 7.5
}
17. How To Deal With Ambiguity
Be Specific And Concise
With "@context", we can be both concise and specific!

{
  "@context": {
    "name": "http://schema.org/name",
    "number": "http://example.com/JerseyNumber",
    "FGP": "http://example.com/FieldGoalPercentage"
  },
  "name": "Bob",
  "number": 17,
  "FGP": 32.3
}

JSON-LD uses "@context" to define the document context.
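A context of this form is just a term-to-IRI mapping. As a rough illustration only (not a real JSON-LD processor, which handles many more cases), term expansion can be sketched with nothing but the standard library, using the deck's example context:

```python
import json

# Toy sketch of JSON-LD term expansion: replace each compact property
# name with the IRI its @context maps it to.
doc = json.loads("""{
  "@context": {
    "name": "http://schema.org/name",
    "number": "http://example.com/JerseyNumber",
    "FGP": "http://example.com/FieldGoalPercentage"
  },
  "name": "Bob",
  "number": 17,
  "FGP": 32.3
}""")

context = doc.pop("@context")
expanded = {context.get(term, term): value for term, value in doc.items()}
print(expanded["http://schema.org/name"])  # Bob
```

After expansion, two documents that used different compact names but the same context map to identical IRIs, which is exactly how ambiguity is removed.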
18. How To Deal With Ambiguity
Context
"@context" defines:
1 What exactly does each property mean?
2 What type is the value?
For example, what type is "02.15"?
1 A date (February 15)?
2 A time (a quarter past two)?
3 A decimal (2.15)?
4 A percentage (2.15%)?
19. How To Deal With Ambiguity
Be Specific And Concise
Define the type of a value:

{
  "@context": { ... },
  "name": "Bob",
  "number": {
    "@value": "17",
    "@type": "http://www.w3.org/2001/XMLSchema#integer"
  },
  "FGP": {
    "@value": "32.3",
    "@type": "http://www.w3.org/2001/XMLSchema#decimal"
  }
}
20. How To Deal With Ambiguity
Be Specific And Concise
Define the type in "@context":

{
  "@context": {
    "name": "http://schema.org/name",
    "number": {
      "@id": "http://example.com/JerseyNumber",
      "@type": "http://www.../XMLSchema#integer"
    },
    "FGP": {
      "@id": "http://example.com/FieldGoalPe...",
      "@type": "http://www.../XMLSchema#decimal"
    }
  },
  "name": "Bob",
  "number": "17",
  "FGP": "32.3"
}
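A consumer can use those @type annotations to turn string literals into native values. A minimal sketch, handling only the two XSD types from the slides (a real processor covers the full XSD datatype map):

```python
import json
from decimal import Decimal

# Sketch: coerce "@value" string literals to native Python types
# based on their XSD "@type" annotation.
XSD = "http://www.w3.org/2001/XMLSchema#"
COERCERS = {XSD + "integer": int, XSD + "decimal": Decimal}

def coerce(node):
    if isinstance(node, dict) and "@value" in node:
        fn = COERCERS.get(node.get("@type"), str)
        return fn(node["@value"])
    return node

doc = json.loads("""{
  "name": "Bob",
  "number": {"@value": "17",
             "@type": "http://www.w3.org/2001/XMLSchema#integer"}
}""")

typed = {k: coerce(v) for k, v in doc.items()}
print(typed["number"] + 1)  # 18
```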
21. How To Deal With Ambiguity
Be Specific And Concise
The context can be separated into its own document:

{
  "@context": "http://json-ld.org/contexts/basketball.jsonld",
  "name": "Bob",
  "number": "17",
  "FGP": "32.3"
}

The context can also be supplied out of band via an HTTP Link header.
JSON-LD is as simple as JSON and compatible with ordinary JSON documents.
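For an ordinary application/json response, the JSON-LD 1.0 specification lets a server point clients at a context with a Link header along these lines (the URL here is the deck's example context):

```
Link: <http://json-ld.org/contexts/basketball.jsonld>;
      rel="http://www.w3.org/ns/json-ld#context";
      type="application/ld+json"
```

This way existing JSON APIs can become JSON-LD without changing a single response body.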
22. How To Deal With Ambiguity
Be Specific And Concise
Which Bob is this?

{
  "@context": "http://json-ld.../basketball.jsonld",
  "@id": "http://example.com/basketball/players/bob",
  "@type": "http://schema.org/Person",
  "name": "Bob",
  "number": "17",
  "FGP": "32.3"
}

JSON-LD gives your data an identifier.
23. How To Deal With Linking
24. How To Deal With Linking
Why Use JSON-LD
The second problem is how to link JSON!

{
  "name": "Alice",
  "number": "27",
  "FGP": "50.3"
}

{
  "name": "Bob",
  "number": "17",
  "FGP": "32.3",
  "help": ["Alice", ...]
}

Which Alice is this?
JSON has no built-in support for hyperlinks.
26. How To Deal With Linking
Linking By Identifier
JSON-LD gives your data an identifier, which makes linking convenient.

{
  "@context": "http://json-ld.../basketball.jsonld",
  "@id": "http://example.com/basketball/players/alice",
  "name": "Alice"
}

{
  "@context": "http://json-ld.../basketball.jsonld",
  "@id": "http://example.com/basketball/players/bob",
  "name": "Bob",
  "help": [
    {"@id": "http://example.com/basketball/players/alice"},
    ...
  ]
}
27. How To Deal With Linking
Linking By Identifier
JSON-LD serializes a labeled, directed graph.
JSON-LD can describe almost anything.
Without ambiguity, JSON-LD is machine-readable data.
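Because every node carries an @id, documents like the ones above can be read directly as edges of a directed graph. A small sketch, reusing the deck's hypothetical player URLs and "help" property:

```python
import json

# Sketch: collect "help" links between players into a directed graph,
# keyed by each node's @id (documents follow the deck's example shape).
docs = [json.loads(s) for s in ("""
{"@id": "http://example.com/basketball/players/alice",
 "name": "Alice"}
""", """
{"@id": "http://example.com/basketball/players/bob",
 "name": "Bob",
 "help": [{"@id": "http://example.com/basketball/players/alice"}]}
""")]

graph = {d["@id"]: [t["@id"] for t in d.get("help", [])] for d in docs}
print(graph["http://example.com/basketball/players/bob"])
```

The node identifiers double as URLs, so a link whose target is not in the local dataset can be dereferenced to fetch the missing node.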
28. What Is JSON-LD
29. What Is JSON-LD
JSON for Linking Data

JSON
a simple property/value-pair data format used to transmit data between websites

Linking Data
a way to create a network of standards-based, machine-readable data across websites

JSON-LD
a lightweight syntax to serialize Linking Data, based on JSON
JSON-LD 1.0 specification: http://www.w3.org/TR/json-ld/
32. Applications of JSON-LD
33. Applications of JSON-LD
Search Engine Optimization
Embed JSON-LD in an HTML document:

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Person",
  "name": "Bo-Kai",
  "age": "25"
}
</script>

This lets search engines understand the meaning of the data.
34. Applications of JSON-LD
Gmail
Tag action information in a Gmail message:

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "RestaurantReserveAction",
  "location": "Taipei",
  "participants": [{"@id": "http://..."}, ...]
}
</script>

This makes active services possible.
36. Conclusion
37. Conclusion
JSON-LD
1 makes data machine-readable without ambiguity
2 links data together
3 is applied in SEO, Gmail, and API documentation