Azure Cosmos DB is a globally distributed, multi-model NoSQL database service designed for scalable, high-performance modern applications. Cosmos DB is delivered as a fully managed service with an enterprise-grade SLA. It supports querying documents with a familiar SQL dialect over hierarchical JSON documents. Azure Cosmos DB is a superset of the DocumentDB service and lets you store and query NoSQL data, regardless of schema. In this presentation, you will learn:
• How to get started by provisioning a new database account
• How to index documents
• How to create applications with Cosmos DB (using the REST API or programming libraries for several popular languages)
• Best practices for designing applications with Cosmos DB
• Best practices for creating queries
6. What is Azure Cosmos DB
Global Distribution
Worldwide presence
Automatic multi-region replication
Multi-homing APIs
Manual and automatic failovers
7. What is Azure Cosmos DB
Five Consistency Models
Helps navigate Brewer's CAP theorem
Intuitive Programming
• Tunable well-defined consistency levels
• Override on per-request basis
Clear PACELC tradeoffs
• Partition – Availability vs Consistency
• Else – Latency vs Consistency
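A minimal sketch of how these tunable levels surface in application code, assuming the .NET DocumentDB SDK v2 used later in this deck (account, key, and names are hypothetical):

using System;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

// Account-wide default consistency is chosen when the client is created...
DocumentClient client = new DocumentClient(
    new Uri("https://myaccount.documents.azure.com:443/"), // hypothetical account
    "<auth-key>",
    new ConnectionPolicy(),
    ConsistencyLevel.Session);

// ...and can be overridden on a per-request basis, e.g. an eventually
// consistent read (overrides can only weaken the default, not strengthen it):
ResourceResponse<Document> doc = await client.ReadDocumentAsync(
    UriFactory.CreateDocumentUri("mydb", "mycoll", "doc1"),
    new RequestOptions { ConsistencyLevel = ConsistencyLevel.Eventual });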
8. What is Azure Cosmos DB
Comprehensive SLAs
99.99% availability
Durable quorum committed writes
Latency, consistency, and throughput also covered by financially backed SLAs
Made possible with a highly redundant architecture
Availability SLA (%) by operation type:
Operation type | Single region | Multi-region (single-region writes) | Multi-region (multi-region writes)
Writes | 99.99 | 99.99 | 99.999
Reads | 99.99 | 99.999 | 99.999
11. Cosmos DB Data Formats
• The query syntax (Gremlin, in the graph API) is geared at navigating graphs – you could say e.g. .has('person', 'name', 'Thomas').outE('Knows') to find the people Thomas knows.
25. Cosmos DB Resources
• Invited into an existing Azure account
• Allows you to set permissions on each concept of the database
26. Cosmos DB Resources
• Authorization token
• Associated with a user
• Grants access to a given resource
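A brief sketch of creating these resources with the .NET SDK v2 (same usings and client as the earlier sketch; database, user, and permission names are hypothetical, and collection is assumed to be a previously created DocumentCollection):

// Create a user inside a database...
ResourceResponse<User> user = await client.CreateUserAsync(
    UriFactory.CreateDatabaseUri("mydb"),
    new User { Id = "appUser" });

// ...then grant it read-only access to one collection. The permission
// carries a resource token that can be handed to a client app in place
// of the account's master key.
ResourceResponse<Permission> permission = await client.CreatePermissionAsync(
    UriFactory.CreateUserUri("mydb", "appUser"),
    new Permission
    {
        Id = "readOrders",
        PermissionMode = PermissionMode.Read, // read-only
        ResourceLink = collection.SelfLink    // the resource being shared
    });
string resourceToken = permission.Resource.Token;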
27. Cosmos DB Resources
• Most like a “table”
• Structure is not defined
• Dynamic shapes based on what you put in it
28. Cosmos DB Resources
• A blob of JSON representing your data
• Can be a deeply nested shape
• No specialty types
• No specific encoding types
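For illustration, a minimal sketch of storing such a document (same client as above; collection name and data are hypothetical):

// Any JSON-serializable shape can be stored as-is, however deeply nested.
await client.CreateDocumentAsync(
    UriFactory.CreateDocumentCollectionUri("mydb", "mycoll"),
    new
    {
        id = "order-1001",
        customer = "Contoso",
        shipping = new { city = "Seattle", country = "USA" }, // nested object
        items = new[] { new { sku = "X1", qty = 2 } }         // nested array
    });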
29. Cosmos DB Resources
• Think media – at the document level!
The maximum size for a document and its attachments in Cosmos DB is now 2 MB;
in DocumentDB, the maximum size of a document and its attachments was 512 KB.
Media size – 2 GB
30. Cosmos DB Resources
• Written in JavaScript!
• Is transactional
• Executed by the database engine
• Can live in the store
• Can be sent over the wire
31. Cosmos DB Resources
• Can be Pre or Post (before or after)
• Can operate on the following actions
• Create
• Replace
• Delete
• All
• Also written in JavaScript!
+ Azure Functions
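A hedged sketch of registering and invoking a pre-trigger (the trigger body is JavaScript, as the slide notes; registration uses the .NET SDK v2 with the same client as above and hypothetical names):

// Register a JavaScript pre-trigger that runs before Create operations.
Trigger validate = new Trigger
{
    Id = "validateDocument",
    TriggerType = TriggerType.Pre,              // Pre = before the operation
    TriggerOperation = TriggerOperation.Create, // fire only on document creates
    Body = @"function validate() {
                 var doc = getContext().getRequest().getBody();
                 if (!doc.id) throw new Error('id is required');
             }"
};
await client.CreateTriggerAsync(collection.SelfLink, validate);

// Triggers are opt-in: name them on the request that should fire them.
await client.CreateDocumentAsync(collection.SelfLink, newOrder,
    new RequestOptions { PreTriggerInclude = new List<string> { "validateDocument" } });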
32. Cosmos DB Resources
• Can only be run in a query
• Modifies the result of a given query
• Example: mathSqrt()
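A sketch of registering and calling such a UDF (the body is JavaScript; registration and querying use the .NET SDK v2 with the same client as above, and c.score is a hypothetical property):

// Register the UDF body once per collection...
await client.CreateUserDefinedFunctionAsync(
    collection.SelfLink,
    new UserDefinedFunction
    {
        Id = "mathSqrt",
        Body = "function mathSqrt(x) { return Math.sqrt(x); }"
    });

// ...then call it from SQL with the udf. prefix (ToList needs System.Linq).
var roots = client.CreateDocumentQuery<double>(
    collection.SelfLink,
    "SELECT VALUE udf.mathSqrt(c.score) FROM c",
    new FeedOptions { EnableCrossPartitionQuery = true }).ToList();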
34. Cosmos DB Resources
The JSON array {"exports": [{"city": "Moscow"}, {"city": "Athens"}]} corresponds to the paths /"exports"/0/"city"/"Moscow" and /"exports"/1/"city"/"Athens".
38. Cosmos DB Resources
Property | User settable or system generated? | Purpose
_rid | System generated | Unique and hierarchical identifier of the resource.
_etag | System generated | etag of the resource, required for optimistic concurrency control.
_ts | System generated | Last updated timestamp of the resource.
_self | System generated | Unique addressable URI of the resource.
id | User settable | User-defined unique name of the resource.
39. Cosmos DB Resources
Value of the _self | Description
/dbs | Feed of databases under a database account.
/dbs/{_rid-db} | Database with the unique id property with the value {_rid-db}.
/dbs/{_rid-db}/colls/ | Feed of collections under a database.
/dbs/{_rid-db}/colls/{_rid-coll} | Collection with the unique id property with the value {_rid-coll}.
/dbs/{_rid-db}/users/ | Feed of users under a database.
/dbs/{_rid-db}/users/{_rid-user} | User with the unique id property with the value {_rid-user}.
/dbs/{_rid-db}/users/{_rid-user}/permissions | Feed of permissions under a user.
/dbs/{_rid-db}/users/{_rid-user}/permissions/{_rid-permission} | Permission with the unique id property with the value {_rid-permission}.
41. Request Units
Request Units (RUs) are a rate-based currency – e.g. 1000 RU/second
Abstracts the physical resources (% IOPS, % CPU, % memory) consumed in serving requests
43. Request Units
Each request (GET, POST, PUT, query, …) consumes a number of RUs:
Approx. 1 RU = 1 read of a 1 KB document
Approx. 5 RU = 1 write of a 1 KB document
Query: depends on the query and the documents involved
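For calibration, a minimal sketch of reading the actual charge off a response (same client as above, hypothetical names):

ResourceResponse<Document> response = await client.ReadDocumentAsync(
    UriFactory.CreateDocumentUri("mydb", "mycoll", "doc1"),
    new RequestOptions { PartitionKey = new PartitionKey("some-pk-value") });

// Every response reports the RUs the operation actually consumed.
Console.WriteLine($"Read consumed {response.RequestCharge} RU");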
44. Request Units – Provisioned Throughput
Provisioned in terms of RU/sec – e.g. 1000 RU/s
Billed for highest RU/s in 1 hour
Easy to increase and decrease on demand
Rate limiting based on amount of throughput provisioned
Background processes like TTL expiration and index transformations are scheduled when the system is quiescent
[Chart: incoming requests vs. provisioned min/max RU/sec – while demand stays below the provisioned rate there is no rate limiting and background operations are processed; above it, requests are rate limited and the SDK retries]
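A sketch of what "SDK retry" means at the client (the SDK already retries throttled requests automatically per its RetryOptions; this manual handler, with hypothetical names, only covers requests that exhaust those retries; requires using System.Net and System.Threading.Tasks):

try
{
    await client.CreateDocumentAsync(collection.SelfLink, order);
}
catch (DocumentClientException ex) when (ex.StatusCode == (HttpStatusCode)429)
{
    // Throttled: the service indicates how long to back off before retrying.
    await Task.Delay(ex.RetryAfter);
    await client.CreateDocumentAsync(collection.SelfLink, order);
}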
45. Cosmos DB Resources
 | Partitioned collection | Single partition collection (only via SDK v.2) | S1 | S2 | S3
Maximum throughput | Unlimited | 10K RU/s | 250 RU/s | 1K RU/s | 2.5K RU/s
Minimum throughput | 2.5K → 400 RU/s | 400 RU/s | 250 RU/s | 1K RU/s | 2.5K RU/s
Maximum storage | Unlimited | 10 GB | 10 GB | 10 GB | 10 GB
Price | Throughput: $6 / 100 RU/s; Storage: $0.25/GB | Throughput: $6 / 100 RU/s; Storage: $0.25/GB | $25 USD | $50 USD | $100 USD
53. Partitioning
Logical partition: stores all data associated with the same partition key value
Physical partition: a fixed amount of reserved SSD-backed storage + compute
Cosmos DB distributes logical partitions among a smaller number of physical partitions.
From the user’s perspective: define 1 partition key per container
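For illustration, a sketch of defining that single partition key when creating a container (.NET SDK v2, same client as above, hypothetical names):

// Create a container partitioned on /deviceId with dedicated throughput.
// Cosmos DB then maps deviceId values (logical partitions) onto physical
// partitions automatically.
DocumentCollection devices = new DocumentCollection { Id = "devices" };
devices.PartitionKey.Paths.Add("/deviceId");

await client.CreateDocumentCollectionAsync(
    UriFactory.CreateDatabaseUri("mydb"),
    devices,
    new RequestOptions { OfferThroughput = 1000 }); // RU/s provisioned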
54. Partition Key Storage Limits
Containers support unlimited storage by dynamically allocating additional physical partitions.
Storage for a single partition key value (logical partition) is capped at 10 GB.
When a partition key reaches its provisioned storage limit, requests to create new resources will return an HTTP status code of 403 (Forbidden).
Azure Cosmos DB will automatically add partitions, and may also return a 403 if:
• An authorization token has expired
• A programmatic element (UDF, stored procedure, trigger) has been flagged for repeated violations
66. Querying Cosmos DB SQL API
// A stored procedure is a JavaScript function that runs inside the
// database engine, scoped to a collection.
var createDocumentStoredProc = {
    id: "createCustomDocument",
    body: function createCustomDocument(documentToCreate) {
        var context = getContext();
        var collection = context.getCollection();
        // Queue the asynchronous create; the callback fires on completion.
        var accepted = collection.createDocument(
            collection.getSelfLink(),
            documentToCreate,
            function (err, documentCreated) {
                if (err) throw new Error('Error ' + err.message);
                // Return the new document's id to the caller.
                context.getResponse().setBody(documentCreated.id);
            });
        // If the request was not accepted (out of time/RU budget), give up.
        if (!accepted) return;
    }
};
67. Querying Cosmos DB SQL API
// Execute the stored procedure from the .NET client, passing the document
// to create as its parameter (documentToCreate is hypothetical here):
StoredProcedureResponse<string> result =
    await client.ExecuteStoredProcedureAsync<string>(
        createdStoredProcedure.SelfLink,
        documentToCreate);
74. Cosmos DB: Table API
 | Azure Table Storage | Azure Cosmos DB: Table storage (preview)
Latency | Fast, but no upper bounds on latency | Single-digit millisecond latency for reads and writes, backed with <10-ms latency reads and <15-ms latency writes at the 99th percentile, at any scale, anywhere in the world
Throughput | Variable throughput model. Tables have a scalability limit of 20,000 operations/s | Highly scalable with dedicated reserved throughput per table, backed by SLAs. Accounts have no upper limit on throughput and support >10 million operations/s per table
Global Distribution | Single region with one optional readable secondary read region for HA. You cannot initiate failover | Turn-key global distribution from one to 30+ regions. Support for automatic and manual failovers at any time, anywhere in the world
Indexing | Only primary index on PartitionKey and RowKey. No secondary indexes | Automatic and complete indexing on all properties, no index management
75. Cosmos DB: Table API
 | Azure Table Storage | Azure Cosmos DB: Table storage (preview)
Query | Query execution uses index for primary key, and scans otherwise | Queries can take advantage of automatic indexing on properties for fast query times. Azure Cosmos DB's database engine is capable of supporting aggregates, geo-spatial, and sorting
Consistency | Strong within primary region, eventual with secondary region | Five well-defined consistency levels to trade off availability, latency, throughput, and consistency based on your application needs
Pricing | Storage-optimized | Throughput-optimized
SLAs | 99.9% availability | 99.99% availability within a single region, and ability to add more regions for higher availability. Industry-leading comprehensive SLAs on general availability
86. Three different ways to use the Change Feed
Implementation | Use Case | Advantages
Azure Functions | Serverless applications | Easy to implement. Used as a trigger, input or output binding to an Azure Function.
Change Feed Processor Library | Distributed applications | Ability to distribute the processing of events across multiple clients. Requires a “leases collection”.
SQL API SDK for .NET or Java | Not recommended | Requires manual implementation in a .NET or Java application.
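As a sketch of the first option, assuming the Cosmos DB trigger binding from the Azure Functions extension (database, collection, and connection names are hypothetical):

using System.Collections.Generic;
using Microsoft.Azure.Documents;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class ChangeFeedListener
{
    // The binding manages the change feed for us; the leases collection
    // tracks how far each consumer has read.
    [FunctionName("ChangeFeedListener")]
    public static void Run(
        [CosmosDBTrigger(
            databaseName: "mydb",
            collectionName: "orders",
            ConnectionStringSetting = "CosmosDBConnection",
            LeaseCollectionName = "leases",
            CreateLeaseCollectionIfNotExists = true)] IReadOnlyList<Document> changes,
        ILogger log)
    {
        foreach (Document doc in changes)
        {
            log.LogInformation($"Changed document: {doc.Id}");
        }
    }
}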
100. Billing Model
Two components: Consumed Storage + Provisioned Throughput
You are billed on consumed storage and provisioned throughput
Containers in a database can share throughput
Unit | Price (for most Azure regions)
SSD Storage (per GB) | $0.25 per month
Provisioned Throughput (single region writes) | $0.008/hour per 100 RU/s
Provisioned Throughput (multi-region writes) | $0.016/hour per 100 multi-region write RU/s
* pricing may vary by region; for up-to-date pricing, see: https://azure.microsoft.com/pricing/details/cosmos-
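As a rough worked example at the single-region rate above: a container provisioned at 1,000 RU/s is 10 units of 100 RU/s, so throughput costs 10 × $0.008 = $0.08 per hour, or about $58.40 over a 730-hour month; adding, say, 100 GB of storage (100 × $0.25 = $25) brings the total to roughly $83 per month.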
101. Billing Model
Automatically configuring provisioned throughput with Autopilot (Preview)
You are billed on consumed storage and provisioned throughput
Containers in a database can share throughput
Autopilot Throughput – Unit (100 RU/s per hour) | Price
100 Autopilot RU/s, single-region account | $0.012/hour
100 Autopilot RU/s, multi-region, single-master account with N regions | N regions x $0.012/hour, where N > 1
100 RU/s, multi-region, multi-master account with N regions | N regions x $0.016/hour, where N > 1
* pricing may vary by region; for up-to-date pricing, see: https://azure.microsoft.com/pricing/details/cosmos-
102. Billing Model
Reserved capacity for provisioned throughput – price per 100 RU/s (savings over pay-as-you-go):
Throughput | 1-year, single region write | 1-year, multiple region write | 3-year, single region write | 3-year, multiple region write
First 50K RU/s | $0.0068 (~15%) | $0.0128 (~20%) | $0.006 (~25%) | $0.0112 (~30%)
Next 450K RU/s | $0.006 (~25%) | $0.0112 (~30%) | $0.0052 (~35%) | $0.0096 (~40%)
Next 2.5M RU/s | $0.0056 (~30%) | $0.0104 (~35%) | $0.0044 (~45%) | $0.008 (~50%)
Over 3M RU/s | $0.0044 (~45%) | $0.008 (~50%) | $0.0032 (~60%) | $0.0056 (~65%)
* pricing may vary by region; for up-to-date pricing, see: https://azure.microsoft.com/pricing/details/cosmos-
103. Billing Model
Free Cosmos DB Tier
Azure Cosmos DB Free Tier: develop and test applications, or run small production workloads, free within the Azure environment.
Get started: enable Free Tier on a new account to receive 400 RU/s of throughput and 5 GB of storage free each month for the life of your account.
* pricing may vary by region; for up-to-date pricing, see: https://azure.microsoft.com/pricing/details/cosmos-