Outrageous ideas for Graph Databases
Almost every graph database vendor raised money in 2021. I am glad they did, because they are going to need the money. Our current Graph Databases are terrible and need a lot of work. There, I said it. It's the ugly truth in our little niche industry. That's why, despite waiting for over a decade for the "Year of the Graph" to come, we still haven't set the world on fire. Graph databases can be painfully slow, they can't handle non-graph workloads, their APIs are clunky, and their query languages are either hard to learn or hard to scale. Most graph projects require expert shepherding to succeed. 80% of the work takes 20% of the time, but that last 20% takes forever. The graph database vendors optimize for new users, not grizzled veterans. They optimize for sales, not solutions. Come listen to a rant by an industry OG on where we could go from here if we took the time to listen to the users that haven't given up on us yet.
"The boom, the bust, the adjust and the unknown"
The industry around us changes at a faster pace than ever before.
This will force the different stakeholders to reevaluate their strategy and how they will decide to move forward.
#Zer0Con2024
Scaling ScyllaDB Storage Engine with State-of-Art CompactionScyllaDB
Log Structured Merge (LSM) tree storage engines are known for very fast writes. This LSM tree structure is used by ScyllaDB to immutable Sorted Strings Tables (SSTables) on disk. These fast writes come with a tradeoff in terms of read and space amplification. While compaction processes can help mitigate this, the RUM conjecture states that only two amplification factors can be optimized at the extent of a third. Learn how ScyllaDB leverages RUM conjecture and controller theory, to deliver a state-of-art LSM-tree compaction for its users.
URP? Excuse You! The Three Kafka Metrics You Need to KnowTodd Palino
What do you really know about how to monitor a Kafka cluster for problems? Is your most reliable monitoring your users telling you there’s something broken? Are you capturing more metrics than the actual data being produced? Sure, we all know how to monitor disk and network, but when it comes to the state of the brokers, many of us are still unsure of which metrics we should be watching, and what their patterns mean for the state of the cluster. Kafka has hundreds of measurements, from the high-level numbers that are often meaningless to the per-partition metrics that stack up by the thousands as our data grows.
We will thoroughly explore three key monitoring concepts in the broker, that will leave you an expert in identifying problems with the least amount of pain:
Under-replicated Partitions: The mother of all metrics
Request Latencies: Why your users complain
Thread pool utilization: How could 80% be a problem?
We will also discuss the necessity of availability monitoring and how to use it to get a true picture of what your users see, before they come beating down your door!
MyRocks is an open source LSM based MySQL database, created by Facebook. This slides introduce MyRocks overview and how we deployed at Facebook, as of 2017.
DevOps for Applications in Azure Databricks: Creating Continuous Integration ...Databricks
Working with our customers, developers and partners around the world, it's clear DevOps has become increasingly critical to a team's success. Continuous integration (CI) and continuous delivery (CD) which is part of DevOps, embody a culture, set of operating principles, and collection of practices that enable application development teams to deliver code changes more frequently and reliably. In this session, we will cover how you can automate your entire process from code commit to production using CI/CD pipelines in Azure DevOps for Azure Databricks applications. Using CI/CD practices, you can simplify, speed and improve your cloud development to deliver features to your customers as soon as they're ready.
Amazon AWS basics needed to run a Cassandra Cluster in AWSJean-Paul Azar
There is a lot of advice on how to configure a Cassandra cluster on AWS. Not every configuration meets every use case.
Best way to know how to deploy Cassandra on AWS is to know the basics of AWS. Part 1: We start covering AWS (as it applies to Cassandra). Later we go into detail with AWS Cassandra specifics.
"The boom, the bust, the adjust and the unknown"
The industry around us changes at a faster pace than ever before.
This will force the different stakeholders to reevaluate their strategy and how they will decide to move forward.
#Zer0Con2024
Scaling ScyllaDB Storage Engine with State-of-Art CompactionScyllaDB
Log Structured Merge (LSM) tree storage engines are known for very fast writes. This LSM tree structure is used by ScyllaDB to immutable Sorted Strings Tables (SSTables) on disk. These fast writes come with a tradeoff in terms of read and space amplification. While compaction processes can help mitigate this, the RUM conjecture states that only two amplification factors can be optimized at the extent of a third. Learn how ScyllaDB leverages RUM conjecture and controller theory, to deliver a state-of-art LSM-tree compaction for its users.
URP? Excuse You! The Three Kafka Metrics You Need to KnowTodd Palino
What do you really know about how to monitor a Kafka cluster for problems? Is your most reliable monitoring your users telling you there’s something broken? Are you capturing more metrics than the actual data being produced? Sure, we all know how to monitor disk and network, but when it comes to the state of the brokers, many of us are still unsure of which metrics we should be watching, and what their patterns mean for the state of the cluster. Kafka has hundreds of measurements, from the high-level numbers that are often meaningless to the per-partition metrics that stack up by the thousands as our data grows.
We will thoroughly explore three key monitoring concepts in the broker, that will leave you an expert in identifying problems with the least amount of pain:
Under-replicated Partitions: The mother of all metrics
Request Latencies: Why your users complain
Thread pool utilization: How could 80% be a problem?
We will also discuss the necessity of availability monitoring and how to use it to get a true picture of what your users see, before they come beating down your door!
MyRocks is an open source LSM based MySQL database, created by Facebook. This slides introduce MyRocks overview and how we deployed at Facebook, as of 2017.
DevOps for Applications in Azure Databricks: Creating Continuous Integration ...Databricks
Working with our customers, developers and partners around the world, it's clear DevOps has become increasingly critical to a team's success. Continuous integration (CI) and continuous delivery (CD) which is part of DevOps, embody a culture, set of operating principles, and collection of practices that enable application development teams to deliver code changes more frequently and reliably. In this session, we will cover how you can automate your entire process from code commit to production using CI/CD pipelines in Azure DevOps for Azure Databricks applications. Using CI/CD practices, you can simplify, speed and improve your cloud development to deliver features to your customers as soon as they're ready.
Amazon AWS basics needed to run a Cassandra Cluster in AWSJean-Paul Azar
There is a lot of advice on how to configure a Cassandra cluster on AWS. Not every configuration meets every use case.
Best way to know how to deploy Cassandra on AWS is to know the basics of AWS. Part 1: We start covering AWS (as it applies to Cassandra). Later we go into detail with AWS Cassandra specifics.
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseEdureka!
NoSQL includes a wide range of different database technologies and were developed as a result of surging volume of data stored. Relational databases are not capable of coping with this huge volume and faces agility challenges. This is where NoSQL databases have come in to play and are popular because of their features. The session covers the following topics to help you choose the right NoSQL databases:
Traditional databases
Challenges with traditional databases
CAP Theorem
NoSQL to the rescue
A BASE system
Choose the right NoSQL database
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaScyllaDB
Piotr Sarna, Software Engineer at Turso
Understanding and tapping your driver’s performance potential.
Piotr Sarna discusses how to get the most out of a driver, particularly from the performance perspective, and select a driver that’s a good fit for your needs.
ScyllaDB recently launched our Scylla Cloud database as a service, which combines the speed and power of the Scylla NoSQL database with the ease of a fully managed cloud service. Scylla Cloud relieves your team of day-to-day cluster management so you can focus on creating modern, interactive applications that respond to queries in milliseconds.
Join us for an overview of Scylla Cloud, including a live demo of how to launch and connect to a cluster, how to create and query a table, and how to run a few operations, all in minutes.
Storing time series data with Apache CassandraPatrick McFadin
If you are looking to collect and store time series data, it's probably not going to be small. Don't get caught without a plan! Apache Cassandra has proven itself as a solid choice now you can learn how to do it. We'll look at possible data models and the the choices you have to be successful. Then, let's open the hood and learn about how data is stored in Apache Cassandra. You don't need to be an expert in distributed systems to make this work and I'll show you how. I'll give you real-world examples and work through the steps. Give me an hour and I will upgrade your time series game.
Building Open Data Lakes on AWS with Debezium and Apache HudiGary Stafford
Build a simple open data lake on AWS using a combination of open-source software (OSS), including Red Hat’s Debezium, Apache Kafka, and Kafka Connect for change data capture (CDC), and Apache Hive, Apache Spark, Apache Hudi, and Hudi’s DeltaStreamer for managing our data lake. We will use fully-managed AWS services to host the open data lake components, including Amazon RDS, Amazon MKS, Amazon EKS, and EMR.
Link to the blog post and video: https://garystafford.medium.com/building-open-data-lakes-with-debezium-and-apache-hudi-c3370d3f86fb
Modern Data Warehousing with the Microsoft Analytics Platform SystemJames Serra
The traditional data warehouse has served us well for many years, but new trends are causing it to break in four different ways: data growth, fast query expectations from users, non-relational/unstructured data, and cloud-born data. How can you prevent this from happening? Enter the modern data warehouse, which is able to handle and excel with these new trends. It handles all types of data (Hadoop), provides a way to easily interface with all these types of data (PolyBase), and can handle “big data” and provide fast queries. Is there one appliance that can support this modern data warehouse? Yes! It is the Analytics Platform System (APS) from Microsoft (formally called Parallel Data Warehouse or PDW) , which is a Massively Parallel Processing (MPP) appliance that has been recently updated (v2 AU1). In this session I will dig into the details of the modern data warehouse and APS. I will give an overview of the APS hardware and software architecture, identify what makes APS different, and demonstrate the increased performance. In addition I will discuss how Hadoop, HDInsight, and PolyBase fit into this new modern data warehouse.
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...ScyllaDB
Pavel Emelyanov, Principal Engineer at ScyllaDB
Botond Denes, C++ Developer at ScyllaDB
What performance-minded engineers need to know.
Hear from Pavel Emelyanov and Botond Dénes on the impact of database internals – specifically, what to look for if you need latency and/or throughput improvements.
Build the foundation for success with ScyllaDB
Ready to try out ScyllaDB and want to make sure you’re “doing it right?” We’ll help you get up and running, fast. Spend an hour with our architects for a crash course in what ScyllaDB is all about, the core concepts you need to know, and a step-by-step demonstration of how to get started.
During the live, interactive one-hour session, you will learn:
- Critical considerations for designing a NoSQL system and NoSQL data model
- The technology underlying ScyllaDB’s high performance, availability, and scalability – and best practices for taking advantage of it
- How to install, deploy and operate a full working ScyllaDB system, including multi-data center deployment, monitoring, and connecting an application to the ScyllaDB cluster
By the end of the session, you’ll have the knowledge and tools you need to get ScyllaDB running on your laptop, connect your application to it, and see what it’s like to use ScyllaDB for your specific use case.
Ceph is an open source project, which provides software-defined, unified storage solutions. Ceph is a distributed storage system which is massively scalable and high-performing without any single point of failure. From the roots, it has been designed to be highly scalable, up to exabyte level and beyond while running on general-purpose commodity hardware.
Big Graph Analytics on Neo4j with Apache SparkKenny Bastani
In this talk I will introduce you to a Docker container that provides you an easy way to do distributed graph processing using Apache Spark GraphX and a Neo4j graph database. You'll learn how to analyze big data graphs that are exported from Neo4j and consequently updated from the results of a Spark GraphX analysis. The types of analysis I will be talking about are PageRank, connected components, triangle counting, and community detection.
Database technologies have evolved to be able to store big data, but are largely inflexible. For complex graph data models stored in a relational database there may be tedious transformations and shuffling around of data to perform large scale analysis.
Fast and scalable analysis of big data has become a critical competitive advantage for companies. There are open source tools like Apache Hadoop and Apache Spark that are providing opportunities for companies to solve these big data problems in a scalable way. Platforms like these have become the foundation of the big data analysis movement.
Speakers
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseEdureka!
NoSQL includes a wide range of different database technologies and were developed as a result of surging volume of data stored. Relational databases are not capable of coping with this huge volume and faces agility challenges. This is where NoSQL databases have come in to play and are popular because of their features. The session covers the following topics to help you choose the right NoSQL databases:
Traditional databases
Challenges with traditional databases
CAP Theorem
NoSQL to the rescue
A BASE system
Choose the right NoSQL database
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaScyllaDB
Piotr Sarna, Software Engineer at Turso
Understanding and tapping your driver’s performance potential.
Piotr Sarna discusses how to get the most out of a driver, particularly from the performance perspective, and select a driver that’s a good fit for your needs.
ScyllaDB recently launched our Scylla Cloud database as a service, which combines the speed and power of the Scylla NoSQL database with the ease of a fully managed cloud service. Scylla Cloud relieves your team of day-to-day cluster management so you can focus on creating modern, interactive applications that respond to queries in milliseconds.
Join us for an overview of Scylla Cloud, including a live demo of how to launch and connect to a cluster, how to create and query a table, and how to run a few operations, all in minutes.
Storing time series data with Apache CassandraPatrick McFadin
If you are looking to collect and store time series data, it's probably not going to be small. Don't get caught without a plan! Apache Cassandra has proven itself as a solid choice now you can learn how to do it. We'll look at possible data models and the the choices you have to be successful. Then, let's open the hood and learn about how data is stored in Apache Cassandra. You don't need to be an expert in distributed systems to make this work and I'll show you how. I'll give you real-world examples and work through the steps. Give me an hour and I will upgrade your time series game.
Building Open Data Lakes on AWS with Debezium and Apache HudiGary Stafford
Build a simple open data lake on AWS using a combination of open-source software (OSS), including Red Hat’s Debezium, Apache Kafka, and Kafka Connect for change data capture (CDC), and Apache Hive, Apache Spark, Apache Hudi, and Hudi’s DeltaStreamer for managing our data lake. We will use fully-managed AWS services to host the open data lake components, including Amazon RDS, Amazon MKS, Amazon EKS, and EMR.
Link to the blog post and video: https://garystafford.medium.com/building-open-data-lakes-with-debezium-and-apache-hudi-c3370d3f86fb
Modern Data Warehousing with the Microsoft Analytics Platform SystemJames Serra
The traditional data warehouse has served us well for many years, but new trends are causing it to break in four different ways: data growth, fast query expectations from users, non-relational/unstructured data, and cloud-born data. How can you prevent this from happening? Enter the modern data warehouse, which is able to handle and excel with these new trends. It handles all types of data (Hadoop), provides a way to easily interface with all these types of data (PolyBase), and can handle “big data” and provide fast queries. Is there one appliance that can support this modern data warehouse? Yes! It is the Analytics Platform System (APS) from Microsoft (formally called Parallel Data Warehouse or PDW) , which is a Massively Parallel Processing (MPP) appliance that has been recently updated (v2 AU1). In this session I will dig into the details of the modern data warehouse and APS. I will give an overview of the APS hardware and software architecture, identify what makes APS different, and demonstrate the increased performance. In addition I will discuss how Hadoop, HDInsight, and PolyBase fit into this new modern data warehouse.
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...ScyllaDB
Pavel Emelyanov, Principal Engineer at ScyllaDB
Botond Denes, C++ Developer at ScyllaDB
What performance-minded engineers need to know.
Hear from Pavel Emelyanov and Botond Dénes on the impact of database internals – specifically, what to look for if you need latency and/or throughput improvements.
Build the foundation for success with ScyllaDB
Ready to try out ScyllaDB and want to make sure you’re “doing it right?” We’ll help you get up and running, fast. Spend an hour with our architects for a crash course in what ScyllaDB is all about, the core concepts you need to know, and a step-by-step demonstration of how to get started.
During the live, interactive one-hour session, you will learn:
- Critical considerations for designing a NoSQL system and NoSQL data model
- The technology underlying ScyllaDB’s high performance, availability, and scalability – and best practices for taking advantage of it
- How to install, deploy and operate a full working ScyllaDB system, including multi-data center deployment, monitoring, and connecting an application to the ScyllaDB cluster
By the end of the session, you’ll have the knowledge and tools you need to get ScyllaDB running on your laptop, connect your application to it, and see what it’s like to use ScyllaDB for your specific use case.
Ceph is an open source project, which provides software-defined, unified storage solutions. Ceph is a distributed storage system which is massively scalable and high-performing without any single point of failure. From the roots, it has been designed to be highly scalable, up to exabyte level and beyond while running on general-purpose commodity hardware.
Big Graph Analytics on Neo4j with Apache SparkKenny Bastani
In this talk I will introduce you to a Docker container that provides you an easy way to do distributed graph processing using Apache Spark GraphX and a Neo4j graph database. You'll learn how to analyze big data graphs that are exported from Neo4j and consequently updated from the results of a Spark GraphX analysis. The types of analysis I will be talking about are PageRank, connected components, triangle counting, and community detection.
Database technologies have evolved to be able to store big data, but are largely inflexible. For complex graph data models stored in a relational database there may be tedious transformations and shuffling around of data to perform large scale analysis.
Fast and scalable analysis of big data has become a critical competitive advantage for companies. There are open source tools like Apache Hadoop and Apache Spark that are providing opportunities for companies to solve these big data problems in a scalable way. Platforms like these have become the foundation of the big data analysis movement.
Speakers
What does OOP stand for?
When Object Oriented Programming(OOP) is taught so extensively, do computer programmers, specifically within games development, realise what it's possibly doing to productivity and performance? I explain my own view from experience in personal projects and professional work.
This talk was given to the Edinburgh meet of IGDA Scotland, on 2011/07/27.
Rental Cars and Industrialized Learning to Rank with Sean DownesDatabricks
Data can be viewed as the exhaust of online activity. With the rise of cloud-based data platforms, barriers to data storage and transfer have crumbled. The demand for creative applications and learning from those datasets has accelerated. Rapid acceleration can quickly accrue disorder, and disorderly data design can turn the deepest data lake into an impenetrable swamp.
In this talk, I will discuss the evolution of the data science workflow at Expedia with a special emphasis on Learning to Rank problems. From the heroic early days of ad-hoc Spark exploration to our first production sort model on the cloud, we will explore the process of industrializing the workflow. Layered over our story, I will share some best practices and suggestions on how to keep your data productive, or even pull your organization out of the data swamp.
A production of software stacks is an important part of a healthy software ecosystem. This talk is about most advanced open technology for the software stacks creation and validation, provided by Apache BigTop (incubating). I am going to discuss the advantages of the project, challenges our project and community is facing, and future plans.
Presenter: Konstantin Boudnik, PhD
Linked Data: The Real Web 2.0 (from 2008)Uche Ogbuji
"Linking Open Data (LOD) is a community initiative moving the Web from the idea of separated documents to a wide information space of data. The key principles of LOD are that it is simple, readily adaptable by Web developers, and complements many other popular Web trends. Linked, open data is the real substance of Web 2.0, and not flashy AJAX effects. Learn how to make your data more widely used by making its components easier to discover, more valuable, and easier for people to reuse—in ways you might not anticipate."
business model, business model canvas, mission model, mission model canvas, customer development, lean launchpad, lean startup, stanford, startup, steve blank, entrepreneurship, I-Corps, Stanford
PyData Frankfurt - (Efficient) Data Exchange with "Foreign" EcosystemsUwe Korn
As a Data Scientist/Engineer in Python, we focus in our work to solve problems with large amounts of data but still stay in Python. This is where we are the most effective and feel comfortable. Libraries like Pandas and NumPy provide us with efficient interfaces to deal with this data while still getting optimal performance. The main problem appears when we have to deal with systems outside of our comfort ecosystem. We need to write cumbersome and mostly slow conversion code that ingests data from there into our pipeline until we can work efficiently. Using Apache Arrow and Parquet as base technologies, we get a set of tools that eases this interaction and also brings us a huge performance improvement. As part of the talk we will show a basic problem where we take data coming from a Java application through Python into using these tools.
Dark Data: A Data Scientists Exploration of the Unknown by Rob Witoff PyData ...PyData
Modern Data Science is enabling NASA's engineers uncover actionable information from our "dark" data coffers. From starting small to operating at scale, Rob will discuss applications in telemetry, workforce analytics and liberating data from the Mars Rovers. Tools include iPython, Pandas, Boto and more.
The need to process huge data is increasing day by day. Processing huge data involves compute, network and storage. In terms of Big Data, What it takes to innovate and what is innovation at the end? This talk provide high level details on the need of big data and capabilities of Mapr converged data platform.
Speaker: Vijaya Saradhi Uppaluri, Technical Director at MapR Technologies
Outrageous ideas for Graph Databases
Almost every graph database vendor raised money in 2021. I am glad they did, because they are going to need the money. Our current Graph Databases are terrible and need a lot of work. There I said it. It's the ugly truth in our little niche industry. That's why despite waiting for over a decade for the "Year of the Graph" to come we still haven't set the world on fire. Graph databases can be painfully slow, they can't handle non-graph workloads, their APIs are clunky, their query languages are either hard to learn or hard to scale. Most graph projects require expert shepherding to succeed. 80% of the work takes 20% of the time, but that last 20% takes forever. The graph database vendors optimize for new users, not grizzly veterans. They optimize for sales not solutions. Come listen to a Rant by an industry OG on where we could go from here if we took the time to listen to the users that haven't given up on us yet.
Los estafadores ahora están utilizando métodos más sofisticados y dinámicos con tarjetas de crédito, el blanqueo de dinero y otros tipos de fraude. El aprovechamiento de la tecnología gráfica le permitirá ver más allá de los puntos de datos individuales y descubrir patrones difíciles de detectar.
How to Make a Field invisible in Odoo 17Celine George
It is possible to hide or invisible some fields in odoo. Commonly using “invisible” attribute in the field definition to invisible the fields. This slide will show how to make a field invisible in odoo 17.
Operation “Blue Star” is the only event in the history of Independent India where the state went into war with its own people. Even after about 40 years it is not clear if it was culmination of states anger over people of the region, a political game of power or start of dictatorial chapter in the democratic setup.
The people of Punjab felt alienated from main stream due to denial of their just demands during a long democratic struggle since independence. As it happen all over the word, it led to militant struggle with great loss of lives of military, police and civilian personnel. Killing of Indira Gandhi and massacre of innocent Sikhs in Delhi and other India cities was also associated with this movement.
The Roman Empire A Historical Colossus.pdfkaushalkr1407
The Roman Empire, a vast and enduring power, stands as one of history's most remarkable civilizations, leaving an indelible imprint on the world. It emerged from the Roman Republic, transitioning into an imperial powerhouse under the leadership of Augustus Caesar in 27 BCE. This transformation marked the beginning of an era defined by unprecedented territorial expansion, architectural marvels, and profound cultural influence.
The empire's roots lie in the city of Rome, founded, according to legend, by Romulus in 753 BCE. Over centuries, Rome evolved from a small settlement to a formidable republic, characterized by a complex political system with elected officials and checks on power. However, internal strife, class conflicts, and military ambitions paved the way for the end of the Republic. Julius Caesar’s dictatorship and subsequent assassination in 44 BCE created a power vacuum, leading to a civil war. Octavian, later Augustus, emerged victorious, heralding the Roman Empire’s birth.
Under Augustus, the empire experienced the Pax Romana, a 200-year period of relative peace and stability. Augustus reformed the military, established efficient administrative systems, and initiated grand construction projects. The empire's borders expanded, encompassing territories from Britain to Egypt and from Spain to the Euphrates. Roman legions, renowned for their discipline and engineering prowess, secured and maintained these vast territories, building roads, fortifications, and cities that facilitated control and integration.
The Roman Empire’s society was hierarchical, with a rigid class system. At the top were the patricians, wealthy elites who held significant political power. Below them were the plebeians, free citizens with limited political influence, and the vast numbers of slaves who formed the backbone of the economy. The family unit was central, governed by the paterfamilias, the male head who held absolute authority.
Culturally, the Romans were eclectic, absorbing and adapting elements from the civilizations they encountered, particularly the Greeks. Roman art, literature, and philosophy reflected this synthesis, creating a rich cultural tapestry. Latin, the Roman language, became the lingua franca of the Western world, influencing numerous modern languages.
Roman architecture and engineering achievements were monumental. They perfected the arch, vault, and dome, constructing enduring structures like the Colosseum, Pantheon, and aqueducts. These engineering marvels not only showcased Roman ingenuity but also served practical purposes, from public entertainment to water supply.
Acetabularia Information For Class 9 .docxvaibhavrinwa19
Acetabularia acetabulum is a single-celled green alga that in its vegetative state is morphologically differentiated into a basal rhizoid and an axially elongated stalk, which bears whorls of branching hairs. The single diploid nucleus resides in the rhizoid.
Embracing GenAI - A Strategic ImperativePeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
Model Attribute Check Company Auto PropertyCeline George
In Odoo, the multi-company feature allows you to manage multiple companies within a single Odoo database instance. Each company can have its own configurations while still sharing common resources such as products, customers, and suppliers.
Biological screening of herbal drugs: Introduction and Need for
Phyto-Pharmacological Screening, New Strategies for evaluating
Natural Products, In vitro evaluation techniques for Antioxidants, Antimicrobial and Anticancer drugs. In vivo evaluation techniques
for Anti-inflammatory, Antiulcer, Anticancer, Wound healing, Antidiabetic, Hepatoprotective, Cardio protective, Diuretics and
Antifertility, Toxicity studies as per OECD guidelines
Read| The latest issue of The Challenger is here! We are thrilled to announce that our school paper has qualified for the NATIONAL SCHOOLS PRESS CONFERENCE (NSPC) 2024. Thank you for your unwavering support and trust. Dive into the stories that made us stand out!
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...Levi Shapiro
Letter from the Congress of the United States regarding Anti-Semitism sent June 3rd to MIT President Sally Kornbluth, MIT Corp Chair, Mark Gorenberg
Dear Dr. Kornbluth and Mr. Gorenberg,
The US House of Representatives is deeply concerned by ongoing and pervasive acts of antisemitic
harassment and intimidation at the Massachusetts Institute of Technology (MIT). Failing to act decisively to ensure a safe learning environment for all students would be a grave dereliction of your responsibilities as President of MIT and Chair of the MIT Corporation.
This Congress will not stand idly by and allow an environment hostile to Jewish students to persist. The House believes that your institution is in violation of Title VI of the Civil Rights Act, and the inability or
unwillingness to rectify this violation through action requires accountability.
Postsecondary education is a unique opportunity for students to learn and have their ideas and beliefs challenged. However, universities receiving hundreds of millions of federal funds annually have denied
students that opportunity and have been hijacked to become venues for the promotion of terrorism, antisemitic harassment and intimidation, unlawful encampments, and in some cases, assaults and riots.
The House of Representatives will not countenance the use of federal funds to indoctrinate students into hateful, antisemitic, anti-American supporters of terrorism. Investigations into campus antisemitism by the Committee on Education and the Workforce and the Committee on Ways and Means have been expanded into a Congress-wide probe across all relevant jurisdictions to address this national crisis. The undersigned Committees will conduct oversight into the use of federal funds at MIT and its learning environment under authorities granted to each Committee.
• The Committee on Education and the Workforce has been investigating your institution since December 7, 2023. The Committee has broad jurisdiction over postsecondary education, including its compliance with Title VI of the Civil Rights Act, campus safety concerns over disruptions to the learning environment, and the awarding of federal student aid under the Higher Education Act.
• The Committee on Oversight and Accountability is investigating the sources of funding and other support flowing to groups espousing pro-Hamas propaganda and engaged in antisemitic harassment and intimidation of students. The Committee on Oversight and Accountability is the principal oversight committee of the US House of Representatives and has broad authority to investigate “any matter” at “any time” under House Rule X.
• The Committee on Ways and Means has been investigating several universities since November 15, 2023, when the Committee held a hearing entitled From Ivory Towers to Dark Corners: Investigating the Nexus Between Antisemitism, Tax-Exempt Universities, and Terror Financing. The Committee followed the hearing with letters to those institutions on January 10, 202
Honest Reviews of Tim Han LMA Course Program.pptxtimhan337
Personal development courses are widely available today, with each one promising life-changing outcomes. Tim Han’s Life Mastery Achievers (LMA) Course has drawn a lot of interest. In addition to offering my frank assessment of Success Insider’s LMA Course, this piece examines the course’s effects via a variety of Tim Han LMA course reviews and Success Insider comments.
Synthetic Fiber Construction in lab .pptxPavel ( NSTU)
Synthetic fiber production is a fascinating and complex field that blends chemistry, engineering, and environmental science. By understanding these aspects, students can gain a comprehensive view of synthetic fiber production, its impact on society and the environment, and the potential for future innovations. Synthetic fibers play a crucial role in modern society, impacting various aspects of daily life, industry, and the environment. ynthetic fibers are integral to modern life, offering a range of benefits from cost-effectiveness and versatility to innovative applications and performance characteristics. While they pose environmental challenges, ongoing research and development aim to create more sustainable and eco-friendly alternatives. Understanding the importance of synthetic fibers helps in appreciating their role in the economy, industry, and daily life, while also emphasizing the need for sustainable practices and innovation.
1. Outrageous Ideas
Data Day Texas - January 28, 2023
For Graph Databases
Welcome to this talk on Outrageous Ideas for Graph Databases.
2. @maxdemarzi
maxdemarzi.com
GitHub.com/maxdemarzi
Max De Marzi
My name is Max De Marzi. Follow me on Twitter at maxdemarzi, check out my blog at maxdemarzi.com, or read my bad code on github.com slash maxdemarzi. I've spent the last part of my career teaching people about graph databases.
3. In fact, if you go to the earliest blog post and check out the date, you'll see… January 2012.
4. Ten Years
In the Graph game.
That’s 10 years, in the graph game telling people about graphs.
5. But I am not an ivory tower academic, writing about stuff I don't actually have first-hand experience with.
6. I work the field. I write code. I get my hands dirty and I am the one getting yelled at when the stuff doesn't work.
7. But I’m not here to talk about me, I’m here to talk about Graphs… and that ladies, gentlemen and everyone else, is called a chart not a graph. That chart tells us that
Graph Databases have grown in popularity more than the other categories.
8. 1.8%
However, even after 10 years, we haven't broken past 2%. Look at Document Stores, which broke into double digits, and we can't break 2 percent.
12. Ideas are Wrong
• Too Many Back-ends (aka Tinkerpop is wrong)
• No lessons applied from Relational Databases
• API is incomplete (bulk)
• Query Languages are Incompetent
- Peter Boncz 2018
He started off saying that the ideas were wrong. Tinkerpop as a front end has too many back-end systems (we'll get to that). That we learned nothing from relational databases. That we provided an incomplete API, specifically APIs to do bulk operations. And then he said the query languages were incompetent.
13. Implementation is Wrong
• Nodes as Objects sucks
• No internal algebras
• Incompetent Query Optimizers
• Incompetent Query Executors
• Incompetent Engineering
• Incompetent Engineers (allegedly)
- Peter Boncz 2018
It wasn't just the ideas he was criticizing, it was the implementations too. Representing nodes as full-on Java Objects takes way too much memory: no compression, no way to do fast scans, no internal algebras, incompetent query optimizers, incompetent query executors, incompetent engineering. The word of that day was incompetence… and yes, I added that last one, because he might as well have said it at this point. He couldn't hurt anyone's feelings any harder.
14. Last year, Peter Boncz was back to talk about: The Sorry State of Graph Database Systems. Had we learned nothing in 4 years?
18. 1. Data that should be accessed together is all over the place. You see it in Triple Stores and in the way Schema-less Graphs store property chains of Nodes and
Relationships.
19. 2. Too many joins. You have to go chase this data down, which means your query planner has to work very hard instead of just scanning a record.
20. 3. Triple stores have no concept of Objects so the query optimizer treats each property independently
21. 4. Graph Databases should stop being special little snowflakes and be more like Relational Databases.
22. 5. Graph Databases built on Key Value stores can’t do bulk operations and the API overhead will kill you.
23. 6. The query languages are a trap. If the optimizer can't do it, you'll have to forget the query language. This is how Cypher betrays you.
25. It is at this point that we stand at a fork in the road ahead of us. There are two directions in which we could go. We could explore some of Peter’s suggestions. But then
this talk would be called something like….
26. Completely Sensible and
Utterly Boring Ideas
Data Day Texas - January 28, 2023
For Graph Databases
Completely Sensible and Utterly Boring Ideas for Graph Databases. But it’s not.
27. Outrageous Ideas
Data Day Texas - January 28, 2023
For Graph Databases
It’s called outrageous ideas, so let’s get on with it.
31. We’re going to 1969. October 1969. So we’ll catch a ride with Bill and Ted instead.
32. We are going back to our roots. The Codasyl Model. The Network Model… and in keeping with our theme of 1969 we are dropping acid.
33. Drop ACID
Idea One
First idea, is to drop ACID because in almost all use cases, we are NOT the primary database.
34. We are the Robin to the Batman. We are a sidekick.
35. We are the Emotional Support Database. We help keep it together, but we are not the primary database of record.
36. We are the Mini-Me to the Dr. Evil. We complete them, and as much as we may try to look like them, we aren’t.
37. Vendor: I bet they are thinking about buying a Graph Database
Customer: Why did someone take a photo of us trying to sleep?
No Customer lies in bed at night thinking about buying a graph database. Let’s face it. They already have a database. But it can’t satisfy all their needs. They already tried
some kinky solutions like denormalizing data, adding materialized views, and clustered indexes, but it didn’t do the trick and now they need something new to spice
things up. But we’re there to help, not take over.
46. Let’s talk about A1, this is a 2020 paper about a distributed in memory graph database from Microsoft. I’ll skip the details and jump right into the performance testing for
which they went all out. They built a cluster of 245 Machines with Intel E5-2673 processors. I had to look that one up.
49. 12 Cores x 2 x 245 Servers = 5,880 Cores
So 5,880 cores. Almost 6,000 cores on this cluster. This is the ultimate dream for a lot of people. A massively distributed in-memory graph database. Can you imagine what kind of performance they got? Well, you don't have to imagine, because the paper tells us.
50. 2 hop Query
They performed a two-hop query: start with Steven Spielberg, go to the movies he directed and then to the actors who were in those movies, and get a count. They managed 20,000 queries per second.
53. They distributed the nodes randomly across the cluster. Can you imagine? Every single time they traverse a relationship they have to take a network hit. My mind is blown, hope yours is too.
54. Distribute On Cores
not on Servers
Idea Two
So idea number two. Distribute on Cores, and not on Servers.
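As a rough illustration of the idea (a minimal sketch, not the actual partitioning scheme of any particular product), the shard a node lives on can simply be a function of its id, so a traversal step becomes a method call or an in-box message to another core rather than a network hop to another server:
-- Minimal sketch in Lua, assuming a box with 64 cores and one shard pinned per core.
local CORES = 64
-- Hypothetical shard assignment: the owning shard is derived from the node id itself,
-- so finding a node's owner never requires asking another machine.
local function shard_for(node_id)
  return node_id % CORES
end
-- Crossing shards is message passing between cores inside one server,
-- not a round trip across the data center network.
print(shard_for(12345)) --> 57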
55. Why are we here? It’s the Big Question. We aren’t here to have an existential crisis. I’m talking about why are you here at this tech conference? I’ll tell you why.
56. To prepare for the future. To do that, we have to answer one simple question:
57. Before the future comes the present and today Intel Xeon processors have up to 60 cores.
58. Who knows how many cores they will have in the future?
59. The internet knows. Late this year we get 64 cores, in 2024 we're getting 128 cores, and soon thereafter at least 344 cores, with a potential for 512 or 528 cores according to internal leaks at Intel. https://www.youtube.com/watch?v=h20inMLeDnE
64. 64 cores in the cloud.
Hold on you say. You need big RAM to feed all these cores?
65. Take a look at this beauty. Oh not this kind of RAM?
66. 4TB
Computer RAM, ok. How about 4TB today on a single socket? Is your graph bigger than 4TB?
Tomorrow that will be 8TB and before you know it 32 and 64TB.
68. …and that’s not all. Much like SANs today can let you use a scalable shared pool of hard drive space across a network, CXL technology will let you use a scalable shared
pool of memory across a network.
69. If you want to learn more, watch this presentation from Gustavo Alonso.
https://www.youtube.com/watch?v=KekKAKI0Aho
70. At Google, 90% of all analytics workloads operate on less than 1 TB of data.
Dr. Hannes Mühleisen, creator of DuckDB, reminding us that at Google, 90% of all analytics workloads operate on less than 1 terabyte of data.
71. Does your data fit in a single server today? Will it fit in a single server tomorrow?
73. You don’t have a single gremlin, you have many of them. The Groovy one, the Python one, the Ruby one, the Scala one, the Rust one, they all look similar but they aren’t
the same.
75. Tinkerpop Standard?
Around 100 vendor-dependent features
Do they allow Lambdas?
What kind of Indexing?
But is it the Standard? No way. Each vendor sets which combination of 100 features they support, along with a bunch of other differences amongst them, like allowing lambdas and the indexing behind the scenes. This is what Peter was complaining about earlier. What I know is that Gremlin is good at two things:
76. One is giving developers impostor syndrome: because it is so hard to learn, it turns many people away from graphs.
77. The second thing Gremlin is good at is allowing those that do make it through the learning curve to start thinking in paths. Start thinking "depth first", which is an important concept to understand when it comes to graph queries. So it's not all bad.
78. Then we have Cypher. Here he is eating the juicy steak in the matrix. It tastes so good, but you know it’s not real.
79. Customer Workloads
• Between a Dozen and a Hundred Trivial Queries
• Between 0 and a Dozen Non-Trivial Queries
• A lucky few have All Trivial Queries
• Most have 1 Non-Trivial Query and small variations
Cypher can handle the Trivial queries just fine. Some customers have all trivial queries and are blissfully happy. But most have at least 1 big non-trivial query. That recommendation engine, that shortest-path-finding query, that multi-source bi-directional weighted traversal, etc. This is where Cypher dies. Literally. He gets electrocuted by Tank.
80. So when that happens, we have APOC! Awesome Procedures on Cypher. A library of 450 plus Java Stored Procedures that actually make Cypher usable out of the
matrix and in the real world.
81. Wait… What about GSQL?
use graph ldbc
drop query i_short_2
create query i_short_2(INT vid) for graph ldbc {
SetAccum<INT> @@postSet;
SetAccum<INT> @@commentsSet;
SetAccum<INT> @@creatorSet;
SetAccum<INT> @@messageSet;
SetAccum<INT> @@replySet;
SetAccum<INT> @@postFromReplySet;
SetAccum<INT> @@replyToPostSet;
SumAccum<INT> @@current;
SetAccum<INT> @@resultID;
SetAccum<INT> @@visitedSet;
SumAccum<INT> @postID;
SumAccum<INT> @creatorID;
SumAccum<STRING> @creatorFirst;
SumAccum<STRING> @creatorLast;
INT tempMessageID;
INT tempCreator;
STRING tempFirst;
STRING tempLast;
INT postID;
INT tempPostID;
INT length;
INT size;
INT cur;
Person = {person.*};
Creator = {person.*};
Message = {post.*, comments.*};
Prev = {comments.*};
Post = {post.*};
Comments ={comments.*};
Reply = {comments.*};
Reply1 ={comments.*};
ReplyToPost = {comments.*};
Result = {post.*, comments.*};
CurrentReply = {comments.*};
length = Comments.size();
//get person from vid
Person = SELECT s
FROM Person:s
WHERE s.id == vid;
//get latest message
Message = SELECT s
FROM Message:s-((post_hasCreator_person|comments_hasCreator_person):e)->person:t
WHERE t.id == vid
ORDER BY s.creationDate DESC
LIMIT 10;
Message = SELECT s FROM Message:s
ACCUM @@messageSet += s.id,
@@visitedSet += s.id;
PostSet = SELECT s FROM Message:s-(post_hasCreator_person)->:t
ACCUM @@postSet += s.id;
// PRINT PostTest;
//get comment in message
Reply = SELECT s
FROM Message:s-(comments_hasCreator_person)->:t
WHERE t.id == vid
ACCUM @@replySet += s.id,
@@visitedSet += s.id;
Reply1 = SELECT s FROM Comments:s WHERE s.id IN @@replySet;
// PRINT Reply1, @@replySet;
ReplyToPost = SELECT s FROM Reply1:s-(comments_replyOf_post)->:t
ACCUM @@replyToPostSet += s.id,
@@visitedSet += s.id;
// PRINT @@replyToPostSet;
// PRINT ReplyToPost, @@replyToPostSet;
// //for each comment in message, get 1 hop comment to post
FOREACH item IN @@replySet DO
IF item != -1 THEN
CurrentReply = SELECT s FROM Reply1:s WHERE s.id == item;
//PRINT CurrentReply;
size = CurrentReply.size();
WHILE size != 0 LIMIT 100 DO
Prev = SELECT s FROM CurrentReply:s ACCUM cur = s.id;
CurrentReply = SELECT t
FROM Comments:s-(comments_replyOf_comments)->:t
WHERE s.id == cur
ACCUM @@visitedSet += t.id;
size = CurrentReply.size();
IF size == 0 THEN BREAK; END;
END;
CurrentReply = SELECT s
FROM Prev:s
ACCUM @@replyToPostSet += s.id;
//PRINT CurrentReply;
END;
END;
// PRINT @@replyToPostSet;
//
//get post from 1 hop comment
Post = SELECT s
FROM Post:s-(comments_replyOf_post_reverse)->:t
WHERE t.id IN @@replyToPostSet
ACCUM @@postFromReplySet += s.id;
// PRINT Post;
//get post creator info
Post = SELECT s
FROM Post:s-(post_hasCreator_person)->:t
ACCUM s.@creatorID = t.id,
s.@creatorFirst = t.firstName,
s.@creatorLast = t.lastName;
ACCUM @@postFromReplySet += s.id;
// PRINT Post;
//get post creator info
Post = SELECT s
FROM Post:s-(post_hasCreator_person)->:t
ACCUM s.@creatorID = t.id,
s.@creatorFirst = t.firstName,
s.@creatorLast = t.lastName;
// PRINT Post;
//pass person info and postID to 1 hop comment
ReplyToPost = SELECT t
FROM Post:s-(comments_replyOf_post_reverse)->:t
ACCUM t.@postID = s.id,
t.@creatorID = s.@creatorID,
t.@creatorFirst = s.@creatorFirst,
t.@creatorLast = s.@creatorLast,
@@replyToPostSet += t.id;
// PRINT ReplyToPost;
// //the foreach block pass person info and postID to visited comments in post
FOREACH item IN @@replyToPostSet DO
IF item != 0 THEN
Temp = SELECT s FROM ReplyToPost:s WHERE s.id == item
ACCUM tempMessageID = s.id,
tempCreator = s.@creatorID,
tempFirst = s.@creatorFirst,
tempLast = s.@creatorLast,
tempPostID = s.@postID;
//
//// //save person info and PostID from 1 kop comments to message set
Result = SELECT s
FROM Result:s
WHERE s.id IN @@visitedSet
ACCUM CASE WHEN s.id == item THEN
s.@creatorID = tempCreator,
s.@creatorFirst = tempFirst,
s.@creatorLast = tempLast,
s.@postID = tempPostID,
@@resultID += s.id
END;
size = Temp.size();
//filter result set by visited comments
Result = SELECT s FROM Result:s WHERE s.id IN @@visitedSet;
// PRINT tempCreator;
// PRINT "-----------------debug--------------------";
//
// PRINT Result;
//
// PRINT "------debug-----";
//pass post creator info to all visited comment
WHILE size != 0 LIMIT 100 DO
TempReplyTemp= SELECT t
FROM Temp:s-(comments_replyOf_comments_reverse)->:t
ACCUM tempMessageID = s.@creatorID,
tempFirst = s.@creatorFirst,
tempLast = s.@creatorLast,
tempPostID = s.@postID;
IF TempReplyTemp.size() == 1 THEN
Result = SELECT s
FROM Result:s
ACCUM CASE WHEN s.id == tempMessageID THEN
s.@creatorID = tempCreator,
s.@creatorFirst = tempFirst,
s.@creatorLast = tempLast,
s.@postID = postID
END;
size = TempReplyTemp.size();
END;
END;
END;
END;
//
//
//
// PRINT "---------------Result-------------------------";
//pass post creator to post in message set
FOREACH item IN @@postSet DO
IF item != -1 THEN
TempPost = SELECT s
FROM Result:s-(post_hasCreator_person)->:t
WHERE s.id == item
ACCUM tempCreator = t.id,
tempFirst = t.firstName,
tempLast = t.lastName;
Result = SELECT s FROM Result:s
ACCUM CASE WHEN s.id IN @@postSet THEN
s.@postID = s.id,
s.@creatorID = tempCreator,
s.@creatorFirst = tempFirst,
s.@creatorLast = tempLast
END;
END;
END;
Result = SELECT s FROM Result:s
WHERE s.id IN @@messageSet
Order by s.creationDate DESC, s.id DESC;
PRINT Result.id, Result.content, Result.imageFile, Result.creationDate, Result.@postID,
Result.@creatorID, Result.@creatorFirst, Result.@creatorLast;
}
install query i_short_2
GSQL can't decide if it's a query language or a programming language, so it just kind of accumulates a lot of lines of code, and it's a pain to work with for all but the people who get paid by the hour to write this stuff.
82. So GQL? That's the new standard, like SQL, that the vendors have been building? The problem here is that it will still need APOC, or APOG I guess, and then you can kiss your standard goodbye.
83. Programming Languages
instead of Query Languages
Idea Three
Idea Three is to use actual programming languages instead of query languages.
84. There is a blog post from Ted Neward called the “Vietnam of Computer Science” talking about the war of ORMs and Relational Databases. This is my spin on the subject
about Declarative Query Languages.
85. The Lie
• In Declarative Query Languages (like SQL, Cypher, GQL, etc.) developers are supposed to:
• specify what is to be done
• instead of how to do it.
Let's start off with the L I E. Can you spot it? It's subtle. It says "In Declarative Query Languages developers are supposed to specify what is to be done instead of how to do it".
86. The Problem: A "simple" query
• Find the customers who decreased their purchase amounts on their most recent order
• A contest for who could beat Joe Celko performance-wise on 10k rows of data
Let's look at an example. The problem is a simple query: find the customers who ordered less on their most recent order than the one before that. This was the subject of a contest Joe Celko ran back in the day to see who could write a faster query on 10k rows of data. Look at that horrible mess, that was Joe's query. https://www.red-gate.com/simple-talk/databases/sql-server/t-sql-programming-sql-server/celkos-sql-stumper-the-data-warehouse-problem/
87. 44 Different Queries
• There are at least 44 different ways to write: "Find the customers who decreased their purchase amounts on their most recent order"
• 30 Unique Timings
• At least 30 ways for the Query Planner and Optimizer to execute
I remember this challenge because I entered two queries. There were 44 in total: 44 different ways to write that sentence in SQL, and 30 unique timings to go with them. So at least 30 ways for the query planner and query optimizer to execute those queries. The queries range in performance from 46ms to 10 seconds, just on 10 thousand rows of data. Can you imagine the timing range on 10 million rows of data? The fastest queries are 10x faster than the middle of the pack and 20x faster than all but the worst, which we will ignore because Ramesh was probably trolling.
88. You end up not only having to be an expert in the query language, but also how to manipulate the query planner and query optimizer to take full advantage of the
mechanical sympathy of the database engine to run your queries optimally. This is worse than just telling the database how to execute the query.
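To make this concrete, here is a minimal sketch, in plain Lua with a made-up table layout, of what "just telling the database how" could look like for Celko's problem: walk each customer's orders once and compare the last two amounts, with no query planner to out-guess.
-- Minimal sketch, plain Lua. Assumed data shape (hypothetical, for illustration only):
-- orders_by_customer[customer_id] is a list of { amount = ... } sorted oldest first.
local function customers_who_decreased(orders_by_customer)
  local result = {}
  for customer_id, orders in pairs(orders_by_customer) do
    local n = #orders
    -- The "how" is explicit: look only at the last two orders of each customer.
    if n >= 2 and orders[n].amount < orders[n - 1].amount then
      result[#result + 1] = customer_id
    end
  end
  return result
end
-- Tiny usage example with made-up data.
local sample = {
  alice = { { amount = 100 }, { amount = 80 } }, -- decreased on the most recent order
  bob   = { { amount = 50 }, { amount = 70 } },  -- increased
}
for _, id in ipairs(customers_who_decreased(sample)) do print(id) end --> alice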
89. It’s not the fossil fuel industry killing the planet, its all those ine
ffi
cient database queries running on ever growing data that will doom us all.
91. Idea Four
No More Database Drivers
Idea Four is No More Database Drivers. It’s just one more thing to get in the way. You’ll spend your time answering “oh sorry we don’t have a Go Driver or Rust Driver or
Zig Driver or Julia Driver or whatever the cool kids are using this month”…and you’ll have to hire a bunch of people to build and maintain these things. It’s going to cost a
lot of money and be a royal pain. Trust me on this one.
92. Some of Peter’s Ideas
Schema, Vectorization, JIT, SIMD
A sprinkle of Peter's ideas: actual Schema, vectorized query execution where possible, Just-In-Time query compilation, taking advantage of SIMD where possible. I mean sure, why not, these aren't bad ideas.
93. Never trust vendor Benchmarks
Before I say anything more, please remember to never trust vendor benchmarks. Never ever.
94. Anyway, one day I got really mad at the performance I was getting. And I do mean really mad. Mad enough to write a few thousand lines of C code.
95. 8.3m vs 330m r/s/c*
3m vs 175m r/s/c*
*Relationships Traversed Per Second Per Core
40-60x Faster
So I wrote the bare in-memory data structures needed to duplicate what Neo4j was doing in C and compared a couple of traversals. The top one goes through 50 million relationships per query; the second does the same, but checks a property on those relationships before traversing. From 8 million to 330 million. From 3 million to 174 million. That's 40 to 60 times faster.
96. But I’m comparing apples and oranges. One is a database meant to handle any workload. The other is handcrafted code meant to handle two queries that we have
complete control over.
97. So does that mean everyone should just do a couple of shots and build their own handmade graph services? Not really. What it means is that there is plenty of room to
make the current databases better and build new and faster databases.
98. I got no patience and I hate waiting
Just like Jay-Z. I have no patience and I hate waiting.
102. Sorry, I meant Selfish. Graphs are the only thing I know, and if the current vendors don't fix their offerings then I might be in the same sinking ship as the Hadoop Experts.
103. I want to build 4 me
A graph db that has:
• Better performance
• A lot faster (hopefully)
• Can handle diverse workloads
• Properties in Traversals
• An easy interface
• HTTP + JSON
• A programming language
• For complex queries
I want to build for my needs. A graph database that is Faster, Better, Easier, and more Flexible by following some of the hardware trends we talked about in this presentation.
104. "You can have a second computer once you've shown you know how to use the first one." —Paul Barham
And planning for a Scale-Up System using Lots of RAM, Lots of Cores, on a Single Server. Replicated (eventually) but not Distributed.
105. Seastar Framework
• Shared Nothing Multicore
• "Server per core"
• Message Passing
• Futures and Promises
• High Performance Networking
Using the Seastar framework with its "server per core", futures and promises, and high-performance networking.
106. We avoid shared memory and locking; think of each core as a server, message-passing events within the physical box instead of via the network. No ACID needed (maybe).
107. On 4 Cores
190k Requests / Second
Stupid fast, with latencies low enough for AdTech use cases.
108. On 4 Cores with DPDK
280k Requests / Second
We can use DPDK (Data Plane Development Kit) to go even faster skipping the network driver and talking to the network card directly… even on the Cloud. Yes. I’m only
getting an empty node, but the other graph databases can’t even say hello that fast.
109. Schema: Not Optional
• Nodes have a single Type
• No multiple labels
• Properties have a Type
• Bool, Int, Double, String, List
• Nodes of the same Type have the same properties
• Like any sane database
With a Schema, because in the real world, data has schema. A single type for Nodes and Relationships, because multiple labels were a terrible mistake. Let's make things sane again.
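As a sketch only, defining such a schema from a query might look something like this; the helper names (NodeTypeInsert, NodePropertyTypeAdd, RelationshipTypeInsert) are assumptions for illustration, not a documented API:
-- Hypothetical schema helpers; names and signatures are assumptions.
NodeTypeInsert("User")                          -- every node gets exactly one type
NodePropertyTypeAdd("User", "name", "string")   -- every property has a declared type
NodePropertyTypeAdd("User", "age", "integer")
NodePropertyTypeAdd("User", "tags", "list")
RelationshipTypeInsert("FOLLOWS")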
110. HTTP + JSON: Universal
• You can talk to it from your browser
• You can talk to it from any programming language
• No drivers needed, no custom protocol
Let's talk via HTTP and JSON, from any language, no drivers needed, no custom binary protocols, you can even talk to it from your browser window.
111. Lua: "Moon" in Portuguese
• Proven
• Used in embedded systems and games
• Fast
• Fastest scripting language I know of, and using LuaJIT
• Powerful, small and free (MIT)
Using Lua as the Query Language because it's proven in the field and used in embedded systems and games where performance matters. Using LuaJIT, the fastest scripting language I know of.
112. Lua as a Query Language
• Simple Queries
We'll take whatever the last line of the query is and turn it into JSON. For example, getting a node.
113. Lua as a Query Language
• Simple Queries
• Pipelined Queries
Or doing a bunch of stuff, related or unrelated, in a pipeline or batch.
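Again with assumed helper names, a pipelined query is just several statements batched into one request, with the last line's value returned:
-- Several operations, related or not, in one request (hypothetical helpers).
local max = NodeAdd("User", "max", '{"name":"Max"}')
local jaws = NodeAdd("Movie", "jaws", '{"title":"Jaws"}')
RelationshipAdd("LIKES", "User", "max", "Movie", "jaws") -- last line, returned as JSON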
114. Lua as a Query Language
• Simple Queries
• Pipelined Queries
• Complex Queries
You have a real programming language to do complex queries, plus helper functions for accessing the database and soon-to-come vectorized procedures for faster data processing.
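For a complex query you get loops, conditionals and local state; here is a sketch of a deduplicated two-hop "friends of friends" traversal, again with made-up helper names:
-- Sketch of a complex query: friends-of-friends, deduplicated (hypothetical helpers).
local me = NodeGetId("User", "max")
local seen, second_hop = {}, {}
for _, friend in ipairs(NodeGetNeighborIds(me, "FOLLOWS")) do
  for _, fof in ipairs(NodeGetNeighborIds(friend, "FOLLOWS")) do
    if fof ~= me and not seen[fof] then
      seen[fof] = true
      second_hop[#second_hop + 1] = fof
    end
  end
end
NodesGet(second_hop) -- last line: the resolved nodes come back as JSON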
115. Look at that pretty UI. I built that myself. Let’s traverse 50M relationships in 10 seconds. Too Slow?
116. Remember about 100 slides ago when Peter Boncz was complaining about graph databases not having bulk APIs? Turns out he was right. Here we can go about 5x faster by traversing in bulk instead of one at a time. It makes the query simpler too.
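A rough before-and-after of what that means, with the same caveat that the helper names are assumptions: the bulk form hands the whole frontier to the engine in one call instead of making one call per node:
-- One-at-a-time: a separate helper call for every node on the frontier (hypothetical helpers).
local start = NodeGetId("User", "max")
local total = 0
for _, id in ipairs(NodeGetNeighborIds(start, "KNOWS")) do
  total = total + #NodeGetNeighborIds(id, "KNOWS")
end
-- Bulk: hand the whole frontier to the engine in one call and let it batch the work.
local frontier = NodeGetNeighborIds(start, "KNOWS")
NodesGetNeighborIds(frontier, "KNOWS") -- last line: all second-hop ids, returned at once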
117. Oh hey I forgot to talk about Dgraph and GraphQL. Do we really need it here? We are already returning JSON and can return it in any way we want. A single request can
be one query or one hundred, related or not.
118. SIMD: For Vectorized Execution
• Already in Find with Predicate
• Will be added to Math and Data Manipulation Functions
• Sprinkled in wherever it can to speed things up
Borrowing the EVE library for SIMD vectorized execution. Already making finding nodes and relationships with a predicate faster; will be adding math and data manipulation functions, as well as sprinkling it in wherever we can.
119. 4 Layer Design
HTTP
Lua (in Thread)
Peered
Shard
A very simple 4-layer design: HTTP in the front, Lua (if needed) in thread, a Peered layer to coordinate multi-shard requests, and a Shard layer to actually work with the data.
120. Blog Posts
maxdemarzi.com
I’ve been writing my progress on my blog at maxdemarzi.com so you don’t walk blind into a 20,000 line C++ codebase. A little behind on where the code base is, but will
catch up soon.
126. Todos (means all of us)
• C++ Dev: ragedb
• Java Dev: rage-assured
• Scala Dev: benchmarks
• JavaScript Dev: UI
• DevRel: Home Page
• DevOps: Docker + Packaging
• Anyone: Use it, report bugs, request features
Just remember that "todos" in Spanish means all of us; whatever your skill set is, I have something you can help with.