MySQL 8.0 makes it possible to write queries that do more. MySQL can now traverse hierarchies, analyze data in new ways, and combine JSON and spatial data with traditional types — all in the same query.
In this presentation, we'll look at common table expressions (CTEs), window functions, geography support and JSON functionality, and how these can be used to do things no MySQL query has ever done before.
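You can try both headline features without a MySQL server. The sketch below is an illustration, not material from the talk. It uses Python's built-in sqlite3 module (window functions need SQLite 3.25 or later) and a hypothetical `staff` table. A recursive CTE walks a management hierarchy, and a window function ranks salaries within each department. MySQL 8.0 accepts essentially the same `WITH RECURSIVE` and `OVER (PARTITION BY ...)` syntax for these two queries.

```python
import sqlite3

# Hypothetical staff table: manager_id points at another row (NULL for the top of the tree).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff (
  id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER, dept TEXT, salary INTEGER
);
INSERT INTO staff VALUES
  (1, 'Ada',   NULL, 'Eng',   200),
  (2, 'Brian', 1,    'Eng',   150),
  (3, 'Chen',  2,    'Eng',   120),
  (4, 'Dora',  1,    'Sales', 130),
  (5, 'Eli',   4,    'Sales', 110);
""")

# Recursive CTE: seed with the root, then repeatedly add rows whose manager is already in the result.
hierarchy = conn.execute("""
WITH RECURSIVE chain (id, name, depth) AS (
  SELECT id, name, 0 FROM staff WHERE manager_id IS NULL
  UNION ALL
  SELECT s.id, s.name, c.depth + 1
  FROM staff AS s JOIN chain AS c ON s.manager_id = c.id
)
SELECT name, depth FROM chain ORDER BY depth, name
""").fetchall()

# Window function: rank salaries inside each department while keeping every row.
ranked = conn.execute("""
SELECT name, dept, salary,
       RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS dept_rank
FROM staff
ORDER BY dept, dept_rank
""").fetchall()

print(hierarchy)
print(ranked)
```

A GROUP BY would collapse each department into one row. The window function instead keeps every employee row and adds the rank alongside it.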
MySQL 8.0.17 can create indexes over JSON arrays to speed up your queries. This presentation shows examples and explains all you need to know about this new feature: How are array indexes created? How do they work? When are they used? Are there any limitations?
Since version 8.0.14, MySQL supports LATERAL derived tables, sometimes called the "for each" loop of SQL. What are they? How do they work? Why do you need them? What can they do? How can you use them? Should you use them? What is all this talk about "for each" loops?
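The "for each" description can be made concrete. For every row of the outer table, the derived table is evaluated with that row's values in scope, and its rows are joined back to that outer row. SQLite has no LATERAL keyword, so this sketch spells the loop out in Python with the built-in sqlite3 module. The `dept` and `emp` tables and the "top two salaries per department" question are hypothetical. In MySQL 8.0.14 or later, the same question could be written roughly as `SELECT ... FROM dept, LATERAL (SELECT name, salary FROM emp WHERE emp.dept_id = dept.id ORDER BY salary DESC LIMIT 2) AS top2`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE emp (name TEXT, dept_id INTEGER, salary INTEGER);
INSERT INTO dept VALUES (1, 'Eng'), (2, 'Sales');
INSERT INTO emp VALUES
  ('Ada', 1, 200), ('Brian', 1, 150), ('Chen', 1, 120),
  ('Dora', 2, 130), ('Eli', 2, 110);
""")

def lateral_top2():
    """Emulate FROM dept, LATERAL (...): run the inner query once per outer row."""
    result = []
    for dept_id, dept_name in conn.execute("SELECT id, name FROM dept ORDER BY id").fetchall():
        # The inner query refers to the current outer row (dept_id), which an
        # ordinary, non-lateral derived table is not allowed to do.
        inner = conn.execute(
            "SELECT name, salary FROM emp WHERE dept_id = ? ORDER BY salary DESC LIMIT 2",
            (dept_id,),
        ).fetchall()
        for emp_name, salary in inner:
            result.append((dept_name, emp_name, salary))
    return result

print(lateral_top2())
```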
How to Take Advantage of Optimizer Improvements in MySQL 8.0 (Norvald Ryeng)
MySQL 8.0 introduces several improvements to the query optimizer that may give improved performance for your queries. This presentation looks at what kind of queries the different improvements apply to, and the focus is on what you can do to get the most out of the optimizer improvements. The main topics are changes to the optimizer cost model, histograms, and new optimizer hints, but other improvements to how MySQL executes queries are also covered. The presentation includes many practical examples of how you can get a significant speedup for your MySQL queries.
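Histograms matter for cost estimates because real columns are often skewed. The sketch below builds a histogram with one bucket per distinct value, each storing a cumulative frequency. That is the general idea behind MySQL's singleton histograms, which you would create with `ANALYZE TABLE ... UPDATE HISTOGRAM ON ...`. The sketch then compares its equality estimate against a naive uniform-distribution guess. The function names and the order-status data are hypothetical. This is not MySQL's internal format or its exact estimation logic.

```python
from collections import Counter

def singleton_histogram(values):
    """One bucket per distinct value, stored as (value, cumulative_fraction) in value order."""
    counts = Counter(values)
    total = len(values)
    running = 0
    buckets = []
    for value in sorted(counts):
        running += counts[value]
        buckets.append((value, running / total))
    return buckets

def estimate_equal(buckets, x):
    """Estimated selectivity of `col = x`: this bucket's cumulative fraction minus the previous one."""
    previous = 0.0
    for value, cumulative in buckets:
        if value == x:
            return cumulative - previous
        previous = cumulative
    return 0.0  # value was not present when the histogram was built

# Hypothetical, heavily skewed column: most orders are already shipped.
status = ["shipped"] * 90 + ["pending"] * 8 + ["returned"] * 2
hist = singleton_histogram(status)
naive = 1 / len(set(status))  # uniform guess: every distinct value equally likely
print(hist)
print(naive, estimate_equal(hist, "shipped"), estimate_equal(hist, "returned"))
```

The uniform guess puts every status at about one third of the rows. The histogram says "shipped" matches about 90% of the rows and "returned" about 2%. That difference can change which join order or access method looks cheapest.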
Is there a perfect data-parallel programming language? (Experiments with More...) (Julian Hyde)
The perfect data-parallel language has not yet been invented. SQL queries can achieve great performance and scale, but there are many general-purpose algorithms that SQL cannot express. In Morel, we build on the functional and relational roots of MapReduce in an elegant and strongly-typed general-purpose programming language. But Morel is, in a real sense, a query language; programs are executed on relational frameworks such as Google BigQuery and Spark.
In this talk, we describe the principles that drove Morel’s design, the problems that we had to solve in order to implement a hybrid functional/relational language, and how Morel can be applied to implement data-intensive systems.
We also introduce Apache Calcite, the popular open source framework for query planning, and describe how Morel's compiler uses Calcite's relational algebra and rewrite rules to generate efficient plans.
Overview of a few ways to group and summarize data in R using sample airfare data from DOT/BTS's O&D Survey.
Starts with a naive approach using subset() and loops, then shows base R's tapply() and aggregate(), and highlights the doBy and plyr packages.
Presented at the March 2011 meeting of the Greater Boston useR Group.
UKOUG Tech14 - Using Database In-Memory Column Store with Complex Datatypes (Marco Gralike)
Presentation used during the UKOUG Tech14 conference in Liverpool (UK), discussing how the In-Memory Column Store option of the Oracle 12.1.0.2 database can be used with the XMLType storage options common to Oracle 12.1.
These slides give a full overview of reading different types of data into R. They are based on the Coursera course material, which was very comprehensive, with some additional changes of my own.
UKOUG Tech14 - Getting Started With JSON in the Database (Marco Gralike)
Presentation used during the UKOUG Tech14 conference in Liverpool (UK), explaining the new JSON database functionality in the Oracle 12.1.0.2 database and discussing how it can be used.
In this talk I will show Visualbox, a "visualization server" based on LODSPeaKr that makes it easy for non-JavaScript experts to create simple but meaningful visualizations.
Analysis of data in Python with SciPy and pandas: Ubuntu installation, PyCharm configuration, Series, DataFrame, big data, medical data, merging data, groupby, graphing data, IPython using Wakari.io, and analyzing stock prices of US automakers including Ford and Tesla. As presented at Penguicon 2016.
An overview of two types of graph databases: property databases and knowledge/RDF databases, together with their dominant respective query languages, Cypher and SPARQL. Also a quick look at some property DB frameworks, including TinkerPop and its query language, Gremlin.
RELATIONAL DATABASES & Database design
CIS276
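Tables

Employee (table name)

| EmployeeNum | FirstName | LastName | DeptNum |
|---|---|---|---|
| 2173 | Barbara | Hennessey | 27 |
| 4519 | Lee | Noordsy | 31 |
| 8005 | Pat | Amidon | 27 |

Field names: the column headings (EmployeeNum, FirstName, LastName, DeptNum)
Records (rows or tuples): each row of data, one per employee
Fields (columns or attributes): each column of data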
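Keys

State

| StateAbbrev | StateName | EnterUnionOrder | StateBird | StatePopulation |
|---|---|---|---|---|
| CT | Connecticut | 5 | Robin | 3,590,347 |
| MI | Michigan | 26 | Robin | 9,883,360 |
| SD | South Dakota | 40 | Pheasant | 833,354 |

Primary key: StateAbbrev
Alternate keys: other columns whose values are also unique to each state, such as StateName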
Keys

State: same columns and rows as the State table above (StateAbbrev, StateName, EnterUnionOrder, StateBird, StatePopulation)

City

| StateAbbrev | CityName | CityPopulation |
|---|---|---|
| CT | Hartford | 124,062 |
| CT | Madison | 18,803 |
| CT | Portland | 9,551 |
| MI | Lansing | 119,128 |
| SD | Madison | 6,482 |
| SD | Pierre | 13,899 |

Primary key (State table): StateAbbrev
Composite primary key (City table): StateAbbrev + CityName
Foreign key: City.StateAbbrev, which refers to State.StateAbbrev
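A DBMS can enforce the key rules on these slides. The following sketch uses SQLite through Python's built-in sqlite3 module as a stand-in for whatever DBMS the course uses. Only some of the State columns are modeled, and the column types are assumptions. The sketch shows why City needs a composite primary key (there are two cities named Madison). It also shows the foreign key rejecting a city whose state does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
CREATE TABLE State (
  StateAbbrev TEXT NOT NULL PRIMARY KEY,  -- primary key
  StateName   TEXT NOT NULL UNIQUE        -- an alternate (candidate) key
);
CREATE TABLE City (
  StateAbbrev    TEXT NOT NULL REFERENCES State (StateAbbrev),  -- foreign key
  CityName       TEXT NOT NULL,
  CityPopulation INTEGER,
  PRIMARY KEY (StateAbbrev, CityName)                           -- composite primary key
);
INSERT INTO State VALUES ('CT', 'Connecticut'), ('MI', 'Michigan'), ('SD', 'South Dakota');
INSERT INTO City VALUES
  ('CT', 'Hartford', 124062), ('CT', 'Madison', 18803), ('CT', 'Portland', 9551),
  ('MI', 'Lansing', 119128), ('SD', 'Madison', 6482), ('SD', 'Pierre', 13899);
""")

def try_insert_city(state, city):
    """Return True if the City row is accepted, False if a key constraint rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO City VALUES (?, ?, NULL)", (state, city))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert_city("CT", "Madison"))  # False: (CT, Madison) already exists
print(try_insert_city("XX", "Nowhere"))  # False: no State row with StateAbbrev 'XX'
```

CityName alone could not be the key, because Madison appears in both CT and SD. Only the pair (StateAbbrev, CityName) is unique.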
Relationships - One to Many

Employee: same columns and rows as the Employee table above (EmployeeNum, FirstName, LastName, DeptNum)

Department

| DeptNum | DeptName | DeptHead |
|---|---|---|
| 24 | Finance | 8112 |
| 27 | Marketing | 2173 |
| 31 | Technology | 4519 |

Primary key: Employee.EmployeeNum
Primary key for the one-to-many relationship: Department.DeptNum
Foreign key for the one-to-many relationship: Employee.DeptNum
1:M or 1:N
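A one-to-many relationship lets a join fan out from one department to its employees. This is a minimal sketch in SQLite via Python's sqlite3 module, built from the slide's Employee and Department rows, with column types assumed. A LEFT JOIN counts employees per department, including Finance, which has none in the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Department (
  DeptNum  INTEGER NOT NULL PRIMARY KEY,  -- the "one" side
  DeptName TEXT,
  DeptHead INTEGER                        -- left without a foreign key: 8112 is not a sample employee
);
CREATE TABLE Employee (
  EmployeeNum INTEGER NOT NULL PRIMARY KEY,
  FirstName   TEXT,
  LastName    TEXT,
  DeptNum     INTEGER REFERENCES Department (DeptNum)  -- the "many" side
);
INSERT INTO Department VALUES (24, 'Finance', 8112), (27, 'Marketing', 2173), (31, 'Technology', 4519);
INSERT INTO Employee VALUES
  (2173, 'Barbara', 'Hennessey', 27), (4519, 'Lee', 'Noordsy', 31), (8005, 'Pat', 'Amidon', 27);
""")

# One department row can match many employee rows; LEFT JOIN keeps departments with none.
rows = conn.execute("""
SELECT d.DeptName, COUNT(e.EmployeeNum) AS headcount
FROM Department AS d LEFT JOIN Employee AS e ON e.DeptNum = d.DeptNum
GROUP BY d.DeptNum, d.DeptName
ORDER BY d.DeptNum
""").fetchall()
print(rows)
```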
Relationships - One to One

Employee: same columns and rows as the Employee table above

Credential

| EmployeeNum | UserName | Password |
|---|---|---|
| 2173 | bhennessey | `********` |
| 4519 | lnoordsy | `********` |
| 8005 | Pamidon | `********` |

Primary key for the one-to-one relationship: Employee.EmployeeNum
Foreign key for the one-to-one relationship: Credential.EmployeeNum
1:1
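One common way to enforce a 1:1 relationship is to make the foreign key in the second table its primary key as well. A UNIQUE constraint on the foreign key also works. The sketch below again uses SQLite via Python's sqlite3, with assumed column types and placeholder password values. It shows the database rejecting a second credential for the same employee, and a credential for an employee who does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Employee (
  EmployeeNum INTEGER NOT NULL PRIMARY KEY,
  LastName    TEXT
);
-- The foreign key is also the primary key, so each employee can have at most one credential.
CREATE TABLE Credential (
  EmployeeNum INTEGER NOT NULL PRIMARY KEY REFERENCES Employee (EmployeeNum),
  UserName    TEXT NOT NULL UNIQUE,
  Password    TEXT NOT NULL
);
INSERT INTO Employee VALUES (2173, 'Hennessey'), (4519, 'Noordsy'), (8005, 'Amidon');
-- Placeholder password values; the slide masks them.
INSERT INTO Credential VALUES
  (2173, 'bhennessey', 'x'), (4519, 'lnoordsy', 'x'), (8005, 'Pamidon', 'x');
""")

def add_credential(employee_num, user_name):
    """Return True if the credential is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO Credential VALUES (?, ?, 'x')", (employee_num, user_name))
        return True
    except sqlite3.IntegrityError:
        return False

print(add_credential(2173, "barbara2"))  # False: 2173 already has a credential
print(add_credential(9999, "ghost"))     # False: no such employee
```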
Relationships - Many to Many

Employee: same columns and rows as the Employee table above

Position

| PositID | PositDesc | PayGrade |
|---|---|---|
| 1 | Director | 45 |
| 2 | Manager | 40 |
| 3 | Analyst | 30 |

Employment

| EmployeeNum | PositID | StartDate | EndDate |
|---|---|---|---|
| 2173 | 2 | 12/14/2011 | |
| 4519 | 1 | 04/23/2013 | |
| 4519 | 3 | 11/11/2007 | 04/22/2013 |
| 8005 | 3 | 06/05/2012 | 08/25/2013 |
| 8005 | 2 | 07/02/2010 | 06/04/2012 |

Primary key (Employee table): EmployeeNum
Primary key (Position table): PositID
Composite primary key of the join table (Employment): EmployeeNum + PositID
Foreign keys related to the Employee and Position tables: Employment.EmployeeNum and Employment.PositID
M:N
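A join table turns one M:N relationship into two 1:M relationships. This sketch uses SQLite via Python's sqlite3, with column types assumed. The slide's dates are stored as ISO `YYYY-MM-DD` text so that they sort correctly. The Employment table is queried from both sides: the positions one employee has held, and the employees who have held one position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Employee (EmployeeNum INTEGER NOT NULL PRIMARY KEY, LastName TEXT);
CREATE TABLE Position (PositID INTEGER NOT NULL PRIMARY KEY, PositDesc TEXT, PayGrade INTEGER);
-- Join table: one row per employee/position pairing, keyed by both foreign keys.
CREATE TABLE Employment (
  EmployeeNum INTEGER NOT NULL REFERENCES Employee (EmployeeNum),
  PositID     INTEGER NOT NULL REFERENCES Position (PositID),
  StartDate   TEXT,
  EndDate     TEXT,  -- NULL: no end date recorded
  PRIMARY KEY (EmployeeNum, PositID)
);
INSERT INTO Employee VALUES (2173, 'Hennessey'), (4519, 'Noordsy'), (8005, 'Amidon');
INSERT INTO Position VALUES (1, 'Director', 45), (2, 'Manager', 40), (3, 'Analyst', 30);
INSERT INTO Employment VALUES
  (2173, 2, '2011-12-14', NULL),
  (4519, 1, '2013-04-23', NULL),
  (4519, 3, '2007-11-11', '2013-04-22'),
  (8005, 3, '2012-06-05', '2013-08-25'),
  (8005, 2, '2010-07-02', '2012-06-04');
""")

# Employee -> positions: go from one side of the M:N relationship through the join table.
amidon_history = conn.execute("""
SELECT p.PositDesc, m.StartDate
FROM Employment AS m
JOIN Employee AS e ON e.EmployeeNum = m.EmployeeNum
JOIN Position AS p ON p.PositID = m.PositID
WHERE e.LastName = 'Amidon'
ORDER BY m.StartDate
""").fetchall()

# Position -> employees: the same join table answers the reverse question.
analysts = conn.execute("""
SELECT e.LastName
FROM Employment AS m JOIN Employee AS e ON e.EmployeeNum = m.EmployeeNum
WHERE m.PositID = 3
ORDER BY e.LastName
""").fetchall()

print(amidon_history, analysts)
```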
Integrity Constraints
- Entity integrity constraint: a primary key cannot be null
- Referential integrity: each non-null foreign key value must match a primary key value in the primary table
- Domain integrity constraint: a domain is a set of values from which one or more fields draw their actual values; a domain integrity constraint is a rule you specify for a field (text size, validation rule, etc.)
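All three kinds of constraint can be declared in SQL and enforced by the database. This sketch uses SQLite via Python's sqlite3. The PayGrade range of 1 to 50 is a hypothetical domain rule chosen for illustration. Two SQLite-specific details matter here. First, SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`. Second, for historical reasons, SQLite allows NULL in most PRIMARY KEY columns unless they are declared NOT NULL, so the sketch declares NOT NULL explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # referential integrity is off by default in SQLite
conn.executescript("""
CREATE TABLE Position (
  PositID  INTEGER NOT NULL PRIMARY KEY,
  PayGrade INTEGER CHECK (PayGrade BETWEEN 1 AND 50)  -- domain rule (hypothetical range)
);
CREATE TABLE Employment (
  EmployeeNum INTEGER NOT NULL,  -- NOT NULL spelled out: entity integrity
  PositID     INTEGER NOT NULL REFERENCES Position (PositID),
  PRIMARY KEY (EmployeeNum, PositID)
);
INSERT INTO Position VALUES (1, 45), (2, 40), (3, 30);
""")

def violates(sql, params):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        with conn:
            conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True

checks = {
    "entity: null key": violates("INSERT INTO Employment VALUES (?, ?)", (None, 1)),
    "referential: unknown PositID": violates("INSERT INTO Employment VALUES (?, ?)", (2173, 99)),
    "domain: PayGrade out of range": violates("INSERT INTO Position VALUES (?, ?)", (4, 99)),
}
print(checks)
```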
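Dependencies and Determinants

| EmployeeNum | PositID | LastName | PositDesc | StartDate | HealthPlan | PlanDesc |
|---|---|---|---|---|---|---|
| 2173 | 2 | Hennessey | Manager | 12/14/2011 | B | Managed HMO |
| 4519 | 1 | Noordsy | Director | 04/23/2013 | A | Managed PPO |
| 4519 | 3 | Noordsy | Analyst | 11/11/2007 | A | Managed PPO |
| 8005 | 3 | Amidon | Analyst | 06/05/2012 | C | Health Savings |
| 8005 | 4 | Amidon | Clerk | 07/02/2010 | C | Health Savings |

Composite key: EmployeeNum + PositID
Dependencies:
- EmployeeNum + PositID → StartDate
- EmployeeNum → LastName, HealthPlan
- PositID → PositDesc
- HealthPlan → PlanDesc (transitive dependency: EmployeeNum → HealthPlan → PlanDesc)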
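You can check mechanically whether a dependency holds in the table's rows. The sketch below is plain Python with a hypothetical `holds` helper. It tests a few candidate dependencies against the five sample rows, including HealthPlan determining PlanDesc. A sample can only show that a dependency is violated, or not violated so far. It cannot prove that the dependency holds for all future data, so the design still rests on the business rules.

```python
COLUMNS = ("EmployeeNum", "PositID", "LastName", "PositDesc", "StartDate", "HealthPlan", "PlanDesc")
ROWS = [
    (2173, 2, "Hennessey", "Manager", "12/14/2011", "B", "Managed HMO"),
    (4519, 1, "Noordsy", "Director", "04/23/2013", "A", "Managed PPO"),
    (4519, 3, "Noordsy", "Analyst", "11/11/2007", "A", "Managed PPO"),
    (8005, 3, "Amidon", "Analyst", "06/05/2012", "C", "Health Savings"),
    (8005, 4, "Amidon", "Clerk", "07/02/2010", "C", "Health Savings"),
]

def holds(lhs, rhs, rows=ROWS, columns=COLUMNS):
    """True if, in these rows, equal values of the lhs columns always come with equal rhs values."""
    idx_l = [columns.index(c) for c in lhs]
    idx_r = [columns.index(c) for c in rhs]
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in idx_l)
        val = tuple(row[i] for i in idx_r)
        if seen.setdefault(key, val) != val:  # same determinant, different dependent value
            return False
    return True

print(holds(["EmployeeNum"], ["LastName", "HealthPlan"]))  # depends on only part of the key
print(holds(["PositID"], ["PositDesc"]))
print(holds(["HealthPlan"], ["PlanDesc"]))                 # the transitive step
print(holds(["EmployeeNum"], ["PositDesc"]))               # Noordsy and Amidon each hold two positions
```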
Anomalies

(Same table as in Dependencies and Determinants; composite key: EmployeeNum + PositID)
Insertion anomal ...
Dives into how MySQL indexes work under the hood, and provides strategies for efficiently indexing your data to reduce query times.
Presented at the Western Slope Tech Meetup in Montrose, CO, on 3/29/17.
Meetup - Exabyte Big Data - HPCC Systems - SQL to ECL (Fujio Turner)
How to write SQL-like queries in ECL (Enterprise Control Language)
Install HPCC and get started in a few minutes.
"How to install HPCC in 5 minutes" YouTube video on the last slide.
https://www.youtube.com/watch?v=8SV43DCUqJg
SQL (Structured Query Language) is a domain-specific language used for managing and manipulating relational databases. It serves as a standardized way to communicate and interact with databases, enabling users to create, modify, retrieve, and manipulate data stored in structured tables.
Key points about SQL:
1. **Database Management**: SQL is used to manage databases, which are organized collections of structured data stored in tables with rows and columns.
2. **Data Manipulation**: SQL provides commands for querying, inserting, updating, and deleting data within databases. This allows users to extract specific information from large datasets.
3. **Data Definition**: SQL is also used to define the structure of the database, including creating tables, defining their attributes (columns), specifying data types, and setting constraints.
4. **Querying**: The core strength of SQL lies in its querying capabilities. Users can write complex queries to filter, sort, join, and aggregate data to generate meaningful insights.
5. **Relational Databases**: SQL is primarily associated with relational databases, where data is organized into tables with defined relationships between them. Combined with constraints and good schema design, this structure helps ensure data integrity and reduce redundancy.
6. **Standardized Language**: SQL is an industry-standard language, supported by most relational database management systems (RDBMS), including MySQL, PostgreSQL, Microsoft SQL Server, Oracle, and SQLite.
7. **SQL Variants**: While the core syntax remains consistent across databases, different database systems might have slight variations and extensions in their SQL implementations.
8. **Data Integrity**: SQL allows the specification of various constraints, such as primary keys, foreign keys, and unique constraints, to maintain data integrity and enforce relationships between tables.
9. **Transactions**: SQL supports transaction management, allowing users to group multiple operations into a single transaction that either takes effect as a whole (commit) or is undone entirely (rollback) if an error occurs.
10. **Administration**: Database administrators (DBAs) use SQL to manage database users, permissions, backups, and other administrative tasks.
11. **Non-Relational Databases**: While SQL is mainly associated with relational databases, some non-relational (NoSQL) systems offer SQL-like query languages, such as the Cassandra Query Language (CQL).
In summary, SQL is a powerful language used to interact with and manage structured data stored in databases. It's essential for tasks ranging from data retrieval to database design and administration.
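The transaction point (item 9) is easiest to see when a unit of work fails halfway through. This sketch uses SQLite through Python's sqlite3. Used as a context manager, the sqlite3 connection commits on success and rolls back if an exception escapes. The `account` table and the transfer amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0));
INSERT INTO account VALUES ('alice', 100), ('bob', 20);
""")

def transfer(src, dst, amount):
    """Move money as one unit of work: both updates commit, or neither does."""
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok_first = transfer("alice", "bob", 30)    # succeeds
ok_second = transfer("bob", "alice", 500)  # CHECK fails on the second UPDATE; the first is undone too
print(ok_first, ok_second, dict(conn.execute("SELECT id, balance FROM account")))
```

Without the transaction, the failed transfer would have left alice credited with 500 that was never debited from bob.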
EXPLAIN ANALYZE is a new query profiling tool first released in MySQL 8.0.18. This presentation covers how this new feature works, both on the surface and on the inside, and how you can use it to better understand your queries, to improve them and make them go faster.
This presentation is for everyone who has ever had to understand why a query is executed slower than anticipated, and for everyone who wants to learn more about query plans and query execution in MySQL.
MySQL 8.0.18 latest updates: Hash join and EXPLAIN ANALYZE (Norvald Ryeng)
This presentation focuses on two of the new features in MySQL 8.0.18: hash joins and EXPLAIN ANALYZE. It covers how these features work, both on the surface and on the inside, and how you can use them to improve your queries and make them go faster.
Both features are the result of major refactoring of how the MySQL executor works. In addition to explaining and demonstrating the features themselves, the presentation looks at how the investment in a new iterator based executor prepares MySQL for a future with faster queries, greater plan flexibility and even more SQL features.
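The idea behind a hash join fits in a few lines. First, build a hash table on the join key of one input. Then probe it once per row of the other input. A naive nested-loop join would instead scan one input for every row of the other. The sketch below is a generic in-memory illustration in Python, not MySQL's executor code. It ignores what happens when the build input does not fit in memory. The sample rows reuse the Employee/Department data from the relational database slides.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """In-memory equi-join: hash one input on its join key, then probe with the other."""
    # Build phase: index the (ideally smaller) input by its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)
    # Probe phase: one hash lookup per probe row; .get avoids adding empty buckets.
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            yield {**match, **row}

departments = [
    {"DeptNum": 24, "DeptName": "Finance"},
    {"DeptNum": 27, "DeptName": "Marketing"},
    {"DeptNum": 31, "DeptName": "Technology"},
]
employees = [
    {"EmployeeNum": 2173, "LastName": "Hennessey", "DeptNum": 27},
    {"EmployeeNum": 4519, "LastName": "Noordsy", "DeptNum": 31},
    {"EmployeeNum": 8005, "LastName": "Amidon", "DeptNum": 27},
]
joined = list(hash_join(departments, employees, "DeptNum", "DeptNum"))
for row in joined:
    print(row)
```

Building on the smaller input keeps the hash table small. Each probe row then costs roughly one lookup, no matter how many rows the build side has.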
MySQL 8.0 was a huge step in terms of spatial support — a step up from flat Cartesian coordinate systems to ellipsoidal geography. This presentation gives you a quick tour of the spatial support in MySQL, covering data types, functions, indexes, coordinate reference systems and other core topics.
Presented at FOSS4G Bucharest on August 30, 2019.
MySQL 8.0: What Is New in Optimizer and Executor? (Norvald Ryeng)
There are substantial improvements in the MySQL 8.0 optimizer. Most notably, it supports advanced SQL features, including common table expressions and window functions. The updates also make DBAs' lives easier with invisible indexes and additional hints that can be used together with the query rewrite plugin. On the performance side, cost model changes will make a huge impact. JSON support is even more powerful with the new JSON_TABLE function, aggregation functions, and more.
Tips on how to prepare MySQL 5.7 GIS databases for the upgrade to MySQL 8.0 and the introduction of geography support.
Presentation given at the Pre-FOSDEM MySQL Day in Brussels, February 3, 2017.
Analysis insight about a Flyball dog competition team's performance (roli9797)
Insights from my analysis of a Flyball dog competition team's performance over the last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Learn SQL from basic queries to advanced queries (manishkhaire30)
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
Techniques to optimize the pagerank algorithm usually fall into two categories. One is to reduce the work per iteration, and the other is to reduce the number of iterations. These goals are often at odds with one another. Skipping computation on vertices that have already converged has the potential to save iteration time. Skipping in-identical vertices (vertices with the same in-links) helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains that can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
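Of these techniques, skipping in-identical vertices is the easiest to show on its own. In pull-based PageRank, a vertex's new rank depends only on its in-links. Vertices with identical in-link lists therefore always receive identical ranks, so each group needs only one sum per iteration. The sketch below is a minimal Python illustration under stated assumptions: no dangling vertices, a tiny hypothetical graph, and illustrative function names. It is not the STICD implementation.

```python
from collections import defaultdict

def pagerank(in_links, out_degree, damping=0.85, tol=1e-10, max_iter=1000, group_identical=False):
    """Pull-based PageRank by power iteration.

    Assumes no dangling vertices (every out_degree[v] > 0). With group_identical=True,
    vertices that have exactly the same in-links share a single rank computation.
    """
    n = len(in_links)
    if group_identical:
        by_links = defaultdict(list)
        for v, sources in in_links.items():
            by_links[tuple(sorted(sources))].append(v)
        groups = list(by_links.items())
    else:
        groups = [(tuple(sources), [v]) for v, sources in in_links.items()]

    ranks = {v: 1.0 / n for v in in_links}
    for _ in range(max_iter):
        new_ranks = {}
        for sources, members in groups:
            # Identical in-links always produce identical ranks, so compute once per group.
            r = (1 - damping) / n + damping * sum(ranks[u] / out_degree[u] for u in sources)
            for v in members:
                new_ranks[v] = r
        change = sum(abs(new_ranks[v] - ranks[v]) for v in ranks)
        ranks = new_ranks
        if change < tol:
            break
    return ranks

# Small example graph: vertices 1 and 2 are in-identical (both linked only from 0).
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 0)]
in_links = {v: [] for v in range(4)}
out_degree = {v: 0 for v in range(4)}
for src, dst in edges:
    in_links[dst].append(src)
    out_degree[src] += 1

plain = pagerank(in_links, out_degree)
grouped = pagerank(in_links, out_degree, group_identical=True)
print(grouped)
```

The grouped run does the same arithmetic as the plain run, minus the duplicate sums. The saving grows with the number and size of in-identical groups in the graph.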
Adjusting OpenMP PageRank: SHORT REPORT / NOTES (Subhajit Sahu)
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared memory system with multiple CPUs, each with multiple cores, to accelerate pagerank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement pagerank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for pagerank in OpenMP mode (with multiple threads). On the other hand, the hybrid approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake (Walaa Eldin Moustafa)
Dynamic policy enforcement is becoming an increasingly important topic in today's world, where data privacy and compliance are a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences. (3) They are context-aware, encoding a different set of transformations for different use cases. (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
Unleashing the Power of Data: Choosing a Trusted Analytics Platform (Enterprise Wired)
In this guide, we'll explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
Enhanced Enterprise Intelligence with your personal AI Data Copilot (GetInData)
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is growing interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLM context and prompt augmentation when building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found