- Data modeling for NoSQL databases is different than relational databases and requires designing the data model around access patterns rather than object structure. Key differences include not having joins so data needs to be duplicated and modeling the data in a way that works for querying, indexing, and retrieval speed.
- The data model should focus on making the most of features like atomic updates, inner indexes, and unique identifiers. It's also important to consider how data will be added, modified, and retrieved factoring in object complexity, marshalling/unmarshalling costs, and index maintenance.
- The _id field can be tailored to the access patterns, such as using dates for time-series data to keep recent
“not only SQL.”
NoSQL databases are databases store data in a format other than relational tables.
NoSQL databases or non-relational databases don’t store relationship data well.
Understanding the architecture of MariaDB ColumnStoreMariaDB plc
MariaDB ColumnStore extends MariaDB Server, a relational database for transaction processing, with distributed columnar storage and parallel query processing for scalable, high-performance analytical processing. This session helps MariaDB users understand how MariaDB ColumnStore works and why it’s needed for more demanding analytical workloads, and covers:
Use cases
Query processing
Bulk data insertion
Distributed partitions
Query optimization
“not only SQL.”
NoSQL databases are databases store data in a format other than relational tables.
NoSQL databases or non-relational databases don’t store relationship data well.
Understanding the architecture of MariaDB ColumnStoreMariaDB plc
MariaDB ColumnStore extends MariaDB Server, a relational database for transaction processing, with distributed columnar storage and parallel query processing for scalable, high-performance analytical processing. This session helps MariaDB users understand how MariaDB ColumnStore works and why it’s needed for more demanding analytical workloads, and covers:
Use cases
Query processing
Bulk data insertion
Distributed partitions
Query optimization
Modeling Data and Queries for Wide Column NoSQLScyllaDB
Discover how to model data for wide column databases such as ScyllaDB and Apache Cassandra. Contrast the differerence from traditional RDBMS data modeling, going from a normalized “schema first” design to a denormalized “query first” design. Plus how to use advanced features like secondary indexes and materialized views to use the same base table to get the answers you need.
Every business today wants to leverage data to drive strategic initiatives with machine learning, data science and analytics — but runs into challenges from siloed teams, proprietary technologies and unreliable data.
That’s why enterprises are turning to the lakehouse because it offers a single platform to unify all your data, analytics and AI workloads.
Join our How to Build a Lakehouse technical training, where we’ll explore how to use Apache SparkTM, Delta Lake, and other open source technologies to build a better lakehouse. This virtual session will include concepts, architectures and demos.
Here’s what you’ll learn in this 2-hour session:
How Delta Lake combines the best of data warehouses and data lakes for improved data reliability, performance and security
How to use Apache Spark and Delta Lake to perform ETL processing, manage late-arriving data, and repair corrupted data directly on your lakehouse
SQL vs NoSQL, an experiment with MongoDBMarco Segato
A simple experiment with MongoDB compared to Oracle classic RDBMS database: what are NoSQL databases, when to use them, why to choose MongoDB and how we can play with it.
This presentation explains the major differences between SQL and NoSQL databases in terms of Scalability, Flexibility and Performance. It also talks about MongoDB which is a document-based NoSQL database and explains the database strutre for my mouse-human research classifier project.
Materialized Column: An Efficient Way to Optimize Queries on Nested ColumnsDatabricks
In data warehouse area, it is common to use one or more columns in complex type, such as map, and put many subfields into it. It may impact the query performance dramatically because: 1) It is a waste of IO. The whole column (in map), which may contain tens of subfields, need to be read. And Spark will traverse the whole map and get the value of the target key. 2) Vectorized read can not be exploit when nested type column is read. 3) Filter pushdown can not be utilized when nested columns is read. Over the last year, we have added a series of optimizations in Apache Spark to solve the above problems for Parquet.
This presentation contains the introduction to NOSQL databases, it's types with examples, differentiation with 40 year old relational database management system, it's usage, why and we should use it.
Modeling Data and Queries for Wide Column NoSQLScyllaDB
Discover how to model data for wide column databases such as ScyllaDB and Apache Cassandra. Contrast the differerence from traditional RDBMS data modeling, going from a normalized “schema first” design to a denormalized “query first” design. Plus how to use advanced features like secondary indexes and materialized views to use the same base table to get the answers you need.
Every business today wants to leverage data to drive strategic initiatives with machine learning, data science and analytics — but runs into challenges from siloed teams, proprietary technologies and unreliable data.
That’s why enterprises are turning to the lakehouse because it offers a single platform to unify all your data, analytics and AI workloads.
Join our How to Build a Lakehouse technical training, where we’ll explore how to use Apache SparkTM, Delta Lake, and other open source technologies to build a better lakehouse. This virtual session will include concepts, architectures and demos.
Here’s what you’ll learn in this 2-hour session:
How Delta Lake combines the best of data warehouses and data lakes for improved data reliability, performance and security
How to use Apache Spark and Delta Lake to perform ETL processing, manage late-arriving data, and repair corrupted data directly on your lakehouse
SQL vs NoSQL, an experiment with MongoDBMarco Segato
A simple experiment with MongoDB compared to Oracle classic RDBMS database: what are NoSQL databases, when to use them, why to choose MongoDB and how we can play with it.
This presentation explains the major differences between SQL and NoSQL databases in terms of Scalability, Flexibility and Performance. It also talks about MongoDB which is a document-based NoSQL database and explains the database strutre for my mouse-human research classifier project.
Materialized Column: An Efficient Way to Optimize Queries on Nested ColumnsDatabricks
In data warehouse area, it is common to use one or more columns in complex type, such as map, and put many subfields into it. It may impact the query performance dramatically because: 1) It is a waste of IO. The whole column (in map), which may contain tens of subfields, need to be read. And Spark will traverse the whole map and get the value of the target key. 2) Vectorized read can not be exploit when nested type column is read. 3) Filter pushdown can not be utilized when nested columns is read. Over the last year, we have added a series of optimizations in Apache Spark to solve the above problems for Parquet.
This presentation contains the introduction to NOSQL databases, it's types with examples, differentiation with 40 year old relational database management system, it's usage, why and we should use it.
Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014NoSQLmatters
Sebastian Cohnen – Building a Startup with NoSQL
At StormForger we use several NoSQL systems to handle all kinds of different data. We have a lot of time series data based on the fact, that we do load testing and performance analysis of HTTP-based infrastructure and services. For time series data, we use InfluxDB. We also use several Redis instances for caching and storing structured data, that needs to be fast on read and write access. Lately we also started to integrate ArangoDB into our architecture, which is a perfect fit for storing and working with our complex test case definition data structures. In this talk I’d like to present how we build our startup on the foundation provided by several NoSQL databases, how we came to choose those systems and how we use them.
Wordnik's technical co-founder Tony Tam describes the reason for going NoSQL. During his talk Tony will discuss the selection criteria, testing + evaluation and successful, zero-downtime migration to MongoDB. Additionally details on Wordnik's speed and stability will be covered as well as how NoSQL technologies have changed the way Wordnik scales.
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...Lucas Jellema
This presentation gives an brief overview of the history of relational databases, ACID and SQL and presents some of the key strentgths and potential weaknesses. It introduces the rise of NoSQL - why it arose, what is entails, when to use it. The presentation focuses on MongoDB as prime example of NoSQL document store and it shows how to interact with MongoDB from JavaScript (NodeJS) and Java.
PostgreSQL is a well-known relational database. But in the last few years, it has gained capabilities that previously belonged only to "NoSQL" databases. In this talk, I describe several of PostgreSQL that give it such capabilities.
Solr cloud the 'search first' nosql database extended deep divelucenerevolution
Presented by Mark Miller, Software Engineer, Cloudera
As the NoSQL ecosystem looks to integrate great search, great search is naturally beginning to expose many NoSQL features. Will these Goliath's collide? Or will they remain specialized while intermingling – two sides of the same coin.
Come learn about where SolrCloud fits into the NoSQL landscape. What can it do? What will it do? And how will the big data, NoSQL, Search ecosystem evolve. If you are interested in Big Data, NoSQL, distributed systems, CAP theorem and other hype filled terms, than this talk may be for you.
Reuven Lerner's first talk from Open Ruby Day, at Hi-Tech College in Herzliya, Israel, on June 27th 2010. An overview of what makes Rails a powerful framework for Web development -- what attracted Reuven to it, what are the components that most speak to him, and why others should consider Rails for their Web applications.
Slides for the talk at AI in Production meetup:
https://www.meetup.com/LearnDataScience/events/255723555/
Abstract: Demystifying Data Engineering
With recent progress in the fields of big data analytics and machine learning, Data Engineering is an emerging discipline which is not well-defined and often poorly understood.
In this talk, we aim to explain Data Engineering, its role in Data Science, the difference between a Data Scientist and a Data Engineer, the role of a Data Engineer and common concepts as well as commonly misunderstood ones found in Data Engineering. Toward the end of the talk, we will examine a typical Data Analytics system architecture.
A Tasty deep-dive into Open API Specification LinksTony Tam
From the March APICraft meetup in San Francisco, we dive into the details of one of the newest features of the Open API Specification (fka Swagger Specification) called links.
While not intended as a replacement for Hypermedia, the OAS 3.0 Links feature provides design-time designation for rich traversals between operations
Presented at JavaOne 2016.
Using Swagger has become the most popular way to describe REST APIs across the web, enabling people to more quickly understand and communicate with services, with developer-friendly documentation and rich, autogenerated client SDKs. As the API has moved more into being one of the most important aspects of a service, the Swagger definition has become increasingly more important and essential to the design phase. This presentation explains how the Swagger definition can be used to streamline the iteration process and enable client and server engineers to develop concurrently with complex APIs.
Supporting slide deck for Tony Tam's presentation at I Love APIs 2015. Covers the new swagger project, Swagger Inflector, which allows an API-first definition for REST APIs.
Writer APIs in Java faster with Swagger InflectorTony Tam
Swagger provides a clean contract for your REST API. Swagger Inflector is a project which uses Swagger as the language of the API, automatically wiring REST endpoints directly to controllers in the Jersey 2.x framework. By doing so, the specification and code are always up to date, removing potentially error-prone redundant code and bringing development on the JDK up to speed with typeless languages.
Presentation by Tony Tam on using the Scalatra micro web framework with native support for Swagger. This gives the fastest possible server-to-mobile integration with Scala
Swagger APIs for Humans and Robots (Gluecon)Tony Tam
Presentation to Gluecon 2014 about Swagger for API development and adoption of services. Reverb also announced the Swagger 2.0 Working Group, with Apigee as a founding member
A deck on the practical reasons why Wordnik moved to the Scala programming language. Also covered is the Swagger REST API framework which is available at http://swagger.wordnik.com
A presentation on the selection criteria, testing + evaluation and successful, zero-downtime migration to MongoDB. Additionally details on Wordnik's speed and stability are covered as well as how NoSQL technologies have changed the way Wordnik scales.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
3. Why Modeling Matters
• NoSQL => no joins
• What replaces joins?
• Hierarchy
• Duplication of data
• Different models for querying, indexing
• Your optimal data model is (probably) very
different than with relational
• Simpler
• More like you develop
10. Hierarchy with NoSQL
• Write operations
• Atomic upsert (create, update or fail)
• Saves all levels of object atomically
• Reduces need for transactions
11. Hierarchy with NoSQL
• Write operations
• Atomic upsert (create, update or fail)
• Saves all levels of object atomically
• Reduces need for transactions
Convenienc
e not magic
All or
nothing
12. Unique Identifiers in your Data
• Relational design => PK/FK
• Often not “meaningful” identifiers for data
• User Data Model
13. Unique Identifiers in your Data
• Relational design => PK/FK
• Often not “meaningful” identifiers for data
Unique by
• User Data Model username
15. Data Duplication
• Without Joins, what about SQL lookup
tables?
• Duplication of data in NoSQL is required
• Trade storage for speed
16. Data Duplication
…Can
• Without Joins, what about SQL lookup
move logic
tables? to app
• Duplication of data in NoSQL is required
• Trade storage for speed
17. Data Duplication
• Many fields don’t change, ever
• But… many do
• New decisions for the developer!
• Often background updates
18. Data Duplication
How often
• Many fields don’t change, ever does this
• But… many do
change?
• New decisions for the developer!
• Often background updates
20. Reaching into Objects
• Incredible feature of MongoDB
• Dot syntax safely** traverses the object graph
21. Inner Indexes
• Convenience at a cost
• No index => table scan
• No value? => table scan
• No child value? => table scan
• Table scan with big collection?
• Can’t index everything!
96GB of
Indexes?
22. Inner Indexes
• This will should drive your Data Model
• Sparse Data test
Even with only
2000 non-
empty values!
23. Adding & Modifying
• Append in mongo is blazing fast
• “tail” of data is always in memory
• Pre-allocated data files
• Main expense is “index maintenance”
• Some marshalling/unmarshalling cost**
• Modifying? Object growth
• Pre-allocation of space built in collection design
24. Adding & Modifying
• Each object has allocated space
• Exceed that space, need to relocate object
• Leaves “hole” in collection
• Large increases to documents hurts your
overall performance
• Your data model should strive for equally-
sized objects as much as possible
25. Retrieval
• Many same rules apply as relational
• Indexes
• complex/inner or not
• Indexes in RAM? Yes
• Cardinality matters
• New(ish) considerations
• Complex hierarchy not free
• Marshalling unmarshalling
27. Marshalling & Unmarshalling
• All you can eat from your Data Model?
• Techniques have tremendous impact
• Development ease until it matters
• 50% speed bump with manual mapping
Only demand
what you can
consume!
28. Making the most of _id
• Indexes matter
• Tailor your _id to be meaningful by access
pattern
• It’s your first defense when auto-sharding
• Date-driven data?
• Monotonically _id value
• Ensures recent data is “hot”
29. Making the most of _id
• Other time-based data techniques
• Flexibility in querying
30. Making the most of _id
• Other time-based data techniques
• Flexibility in querying
Case-
sensitive
REGEX is
your pal
31. Making the most of _id
• Hot indexes are happy indexes
• Access should strive for right bias
• Random access with large indexes hit disk
1
7
1 2
5 7
32. Your Data Model
• NoSQL gets you started faster
• Many relational pain points are gone
• New considerations (easier?)
• Migration should be real effort
• Designed by access patterns over object
structure
• Don’t prematurely optimize, but know
where the knobs are