This document discusses how to scale relational databases by sharding: splitting a large database across multiple machines to improve performance and allow potentially unlimited scaling. The key aspects of sharding covered are:
- Sharding partitions data into independent chunks that can be placed on separate hardware resources. Each partition functions as its own database.
- Several sharding techniques exist, such as sharding by a hash of the primary key, by ranges of primary keys, or by other columns (for example, the author name). The choice depends on query patterns: the shard key should let the most common queries hit a single shard.
- Sharding allows scaling by adding more hardware resources without replacing existing systems. It also limits the impact of failures: if one shard is down, most of the system stays up, although running more machines makes individual failures more likely.
29. Let’s “shard” a simple table
CREATE TABLE books (
id number PRIMARY KEY,
title varchar2(200),
author varchar2(200)
);
30. Let’s “shard” a simple table
CREATE TABLE books (
id number PRIMARY KEY,
title varchar2(200),
author varchar2(200)
) SHARD BY <method> (<shard_key>) (
SPLIT SIZE evenly
SPLIT LOAD evenly
PREFER SINGLE SHARD ACCESS
DISCOURAGE DATA MOVE
USING <N> DATABASES
);
Not a “real” ORACLE command (yet)
31. Hey, let’s shard it by “name” range
SHARD BY LIST (first_letter(author))
(
…
SPLIT SIZE evenly
);
A-G H-M N-T U-Z
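The letter-range split on this slide can be sketched as a small routing function. This is only an illustration in Python, not part of the original deck: the shard names and the `shard_for_author` helper are hypothetical, and the four ranges are copied from the slide.

```python
# Letter ranges from the slide; the shard names are made up for illustration.
LETTER_SHARDS = [
    ("A", "G", "shard_0"),
    ("H", "M", "shard_1"),
    ("N", "T", "shard_2"),
    ("U", "Z", "shard_3"),
]


def shard_for_author(author: str) -> str:
    """Return the shard that owns an author, based on the first letter."""
    first = author.strip()[:1].upper()
    for low, high, shard in LETTER_SHARDS:
        if low <= first <= high:
            return shard
    # Names that do not start with A-Z need an explicit policy.
    raise ValueError(f"no shard configured for author {author!r}")


print(shard_for_author("Isaac Asimov"))
```

Names are rarely spread evenly over the alphabet, so a split like this probably needs tuning against real data before it satisfies "SPLIT SIZE evenly".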
32. Hey, let’s shard it by “id” range
SHARD BY RANGE (id) (
…
SPLIT LOAD evenly
);
1-100 101-200 201-300 301-400
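A minimal sketch of range routing for the id ranges shown above, assuming the four ranges on the slide and zero-based shard numbers. The `shard_for_id` helper is hypothetical; it uses a binary search over the inclusive upper bound of each range.

```python
from bisect import bisect_left

# Inclusive upper bound of each id range on the slide: 1-100, 101-200, ...
UPPER_BOUNDS = [100, 200, 300, 400]


def shard_for_id(book_id: int) -> int:
    """Return the zero-based shard number whose range contains book_id."""
    if book_id < 1 or book_id > UPPER_BOUNDS[-1]:
        raise ValueError(f"id {book_id} is outside every configured range")
    # bisect_left finds the first range whose upper bound is >= book_id.
    return bisect_left(UPPER_BOUNDS, book_id)


print(shard_for_id(150))
```

If ids are assigned sequentially, every new row lands in the last range, so size can be split evenly while load is not. That is one reason "SPLIT LOAD evenly" is harder to satisfy with ranges.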
33. Hashes are your friend
SHARD BY HASH (id) (
SPLIT SIZE evenly
SPLIT LOAD evenly
);
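Hash-based routing can be sketched in a few lines. The example below is an assumption-laden illustration, not the deck's implementation: it assumes 4 shards, and `stable_hash` and `shard_for_key` are hypothetical helpers. MD5 is used only as a stable, well-mixed hash, not for security.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # assumption for this sketch


def stable_hash(key) -> int:
    """Hash a key to an integer that is identical across processes."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for_key(key, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a zero-based shard number."""
    return stable_hash(key) % num_shards


# Count how 10,000 sequential ids spread over the shards.
distribution = Counter(shard_for_key(book_id) for book_id in range(1, 10_001))
print(sorted(distribution.items()))
```

Python's built-in `hash()` is avoided here because string hashes are randomized per process by default, which would route the same key to different shards after a restart.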
34. But (especially for OLTP) be sure to choose the right hash column
SHARD BY HASH (id) (
PREFER SINGLE SHARD ACCESS
);
SELECT title FROM books
WHERE id = 34567876;
35. But (especially for OLTP) be sure to choose the right hash column
SHARD BY HASH (id) (
PREFER SINGLE SHARD ACCESS
);
SELECT title FROM books
WHERE author = 'Isaac Asimov'
ORDER BY title;
36. But (especially for OLTP) be sure to choose the right hash column
SHARD BY HASH (author) (
PREFER SINGLE SHARD ACCESS
);
0 1 2 3
SELECT title FROM books
WHERE author = 'Isaac Asimov'
ORDER BY title;
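The effect of the hash column on a query by author can be illustrated with in-memory "shards". Everything here is a sketch under stated assumptions: 4 shards, a handful of sample rows, and hypothetical helpers `build_shards` and `titles_by_author`. When rows are placed by hash of `id`, the author query has to visit every shard and merge the results; when they are placed by hash of `author`, one shard is enough.

```python
import hashlib

NUM_SHARDS = 4  # assumption for this sketch

# Sample rows (id, title, author), for illustration only.
BOOKS = [
    (1, "Foundation", "Isaac Asimov"),
    (2, "I, Robot", "Isaac Asimov"),
    (3, "Dune", "Frank Herbert"),
    (4, "The Caves of Steel", "Isaac Asimov"),
    (5, "Neuromancer", "William Gibson"),
]


def stable_hash(key) -> int:
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_shards(key_index: int):
    """Place each row on a shard by hashing the column at key_index."""
    shards = [[] for _ in range(NUM_SHARDS)]
    for book in BOOKS:
        shards[stable_hash(book[key_index]) % NUM_SHARDS].append(book)
    return shards


def titles_by_author(shards, author: str, sharded_by_author: bool):
    """Return (sorted titles, number of shards visited)."""
    if sharded_by_author:
        targets = [stable_hash(author) % NUM_SHARDS]  # single shard access
    else:
        targets = list(range(NUM_SHARDS))  # scatter to all shards, then gather
    titles = [title for s in targets
              for (_id, title, row_author) in shards[s]
              if row_author == author]
    return sorted(titles), len(targets)


by_id = titles_by_author(build_shards(0), "Isaac Asimov", sharded_by_author=False)
by_author = titles_by_author(build_shards(2), "Isaac Asimov", sharded_by_author=True)
print(by_id)
print(by_author)
```

Both calls return the same titles, but the first visits all four shards and the second visits one. The trade-off is that a lookup by `id` would then need the fan-out instead, so the choice depends on which query dominates.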
37. Think about eventual re-sharding
SHARD BY hash(author) (
DISCOURAGE DATA MOVE
USING 4 DATABASES
);
0 1 2 3
38. Think about eventual re-sharding
SHARD BY mod(hash(author), 4) (
DISCOURAGE DATA MOVE
);
0 1 2 3
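Why re-sharding deserves early thought can be seen by counting how many keys change shards when the shard count changes under plain `mod(hash(key), N)`. The sketch below uses made-up author keys and hypothetical helpers (`stable_hash`, `moved_fraction`); it is an illustration of the arithmetic, not the deck's method.

```python
import hashlib


def stable_hash(key) -> int:
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose shard number changes from old_n to new_n shards."""
    moved = sum(1 for k in keys
                if stable_hash(k) % old_n != stable_hash(k) % new_n)
    return moved / len(keys)


# Synthetic keys, for illustration only.
keys = [f"author-{i}" for i in range(10_000)]

grow_by_one = moved_fraction(keys, 4, 5)
double = moved_fraction(keys, 4, 8)
print(f"4 -> 5 shards: {grow_by_one:.0%} of keys move")
print(f"4 -> 8 shards: {double:.0%} of keys move")
```

For a uniform hash, going from 4 to 5 shards is expected to move about 80% of the keys, while doubling from 4 to 8 moves about half, and each key on old shard `s` either stays on `s` or moves to `s + 4`. Growing by doubling is therefore one way to honor "DISCOURAGE DATA MOVE" with modulo hashing.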
54. Why shards are awesome
• (potentially) Unlimited scaling
– 100s or 1000s of shards “in range”
• Once routed in, “it’s pure ORACLE”:
– Transactions, ACID, foreign keys etc
• Better maintenance:
– Smaller data, smaller load
• Eggs not in one basket:
– Even if a shard is down, “most of the system” is still up
• “Apples to apples comparison” with other shards
55. Why shards are NOT so great
• More systems
– Power, rack space etc
– Needs automation … bad
– More likely to fail overall
• Some operations become difficult:
– Transactions across shards
– Foreign keys across shards
• More work:
– Applications, developers, DBAs
– High skill, DIY everything