This document provides an overview of pivot tables and how to create them with SQL in MySQL. It begins with definitions and examples of pivot tables built from sample data, then discusses common solutions such as spreadsheets and dedicated tools. The main part of the document shows how to produce pivot tables in standard SQL using CASE expressions and GROUP BY clauses, and how to use stored procedures and cursors to generate column headers dynamically. It concludes by introducing an open-source crosstab package for MySQL that implements pivot tables with these techniques.
4. Introducing pivot tables (1)
● Pivot tables, or CROSSTABS
● Statistical reports
● Spreadsheet tools
● Usually on the client side
● On servers, dedicated tools
● Some vendors have language extensions
● Can be done in standard SQL
5. Introducing pivot tables (2)
Some online resources
● MySQL wizardry (2001)
● DBIx::SQLCrosstab (2004)
● See the addresses on my site (http://datacharmer.org)
6. Introducing pivot tables (3)
An example (raw data)
+------------------------------+
| person                       |
+----+--------+--------+-------+
| id | name   | gender | dept  |
+----+--------+--------+-------+
|  1 | John   | m      | pers  |
|  2 | Mario  | m      | pers  |
|  7 | Mary   | f      | pers  |
|  8 | Bill   | m      | pers  |
|  3 | Frank  | m      | sales |
|  5 | Susan  | f      | sales |
|  6 | Martin | m      | sales |
|  4 | Otto   | m      | dev   |
|  9 | June   | f      | dev   |
+----+--------+--------+-------+
7. Introducing pivot tables (4)
An example (crosstab)
+------------------------+
| dept by gender         |
+-------+---+---+--------+
| dept  | m | f | total  |
+-------+---+---+--------+
| dev   | 1 | 1 |      2 |
| pers  | 3 | 1 |      4 |
| sales | 2 | 1 |      3 |
+-------+---+---+--------+
16. The spreadsheet problems (4)
● Copying the raw data could be a long operation
● The spreadsheet can handle a limited number of records
● The results could be different in different clients
● It is not easy to scale
18. Dedicated tools
CON
● Client-side tools are resource hogs
● Server-side tools are either complicated to install and maintain or expensive (or both)
19. The SQL way (1)
The query for that crosstab

SELECT dept,                          -- row header
       COUNT(CASE WHEN gender = 'm'
             THEN id ELSE NULL END) AS m,
       COUNT(CASE WHEN gender = 'f'
             THEN id ELSE NULL END) AS f,
       COUNT(*) AS total
FROM person
GROUP BY dept
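Run against the person table from the raw-data example, this query returns exactly the dept by gender crosstab shown earlier.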
20. The SQL way (2)
The query for that crosstab

SELECT dept,
       COUNT(CASE WHEN gender = 'm'
             THEN id ELSE NULL END) AS m,   -- column header
       COUNT(CASE WHEN gender = 'f'
             THEN id ELSE NULL END) AS f,   -- column header
       COUNT(*) AS total
FROM person
GROUP BY dept
21. The SQL way (3)
Two passes

SELECT dept,
       COUNT(CASE WHEN gender = 'm'   -- 'm' and 'f': previously determined
             THEN id ELSE NULL END) AS m,
       COUNT(CASE WHEN gender = 'f'
             THEN id ELSE NULL END) AS f,
       COUNT(*) AS total              -- counts: calculated in this query
FROM person
GROUP BY dept
22. The SQL way (4)
Two passes

# 1
SELECT DISTINCT gender FROM person;
+--------+
| gender |
+--------+
| m      |
| f      |
+--------+

# build the second query
# 2
CROSSTAB QUERY
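The two passes can also be chained without leaving the server. A minimal sketch, assuming the person table above; the @q user variable and the GROUP_CONCAT trick are illustrative, not part of the original deck:

# Pass 1: derive the column list from the distinct gender values
# and assemble the crosstab query as a string.
SELECT CONCAT(
    'SELECT dept, ',
    GROUP_CONCAT(DISTINCT
        CONCAT('COUNT(CASE WHEN gender = ', QUOTE(gender),
               ' THEN id ELSE NULL END) AS `', gender, '`')),
    ', COUNT(*) AS total FROM person GROUP BY dept')
INTO @q
FROM person;

# Pass 2: run the generated query.
PREPARE crosstab FROM @q;
EXECUTE crosstab;
DEALLOCATE PREPARE crosstab;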
23. The SQL way (5)
PROBLEMS
● The two passes are not coordinated
● External languages needed (Perl, PHP, Java, etc.)
● For each language, you need appropriate routines
24. Pivot tables and stored routines (1)
PRO
● Bundling the two steps for crosstab creation
● Adding features easily
● Same interface from different host languages
25. Pivot tables and stored routines (2)
CON
● Less expressive than most languages
● More burden for the server
34. Cursor + temp table (1)
# dynamic query
SET @q = CONCAT('CREATE TABLE temp
    SELECT DISTINCT ', col_name, ' AS mycol
    FROM ', table_name);
PREPARE d FROM @q;
EXECUTE d;

# routine with cursor
DECLARE dp char(10);
DECLARE column_list TEXT DEFAULT '';
DECLARE get_cols CURSOR FOR
    SELECT mycol FROM temp;
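Put together, the fragments above fit into a stored procedure shaped roughly like this. A sketch only: the procedure name make_crosstab, the done flag, and the hard-coded person/gender/dept names are assumptions for illustration; the real routine takes the table and column names as parameters:

DELIMITER //
CREATE PROCEDURE make_crosstab()
BEGIN
    DECLARE dp char(10);
    DECLARE column_list TEXT DEFAULT '';
    DECLARE done INT DEFAULT 0;
    DECLARE get_cols CURSOR FOR SELECT mycol FROM temp;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

    # first read of the data: materialize the distinct values
    DROP TABLE IF EXISTS temp;
    CREATE TABLE temp SELECT DISTINCT gender AS mycol FROM person;

    # build one COUNT(CASE ...) column per distinct value
    OPEN get_cols;
    cols: LOOP
        FETCH get_cols INTO dp;
        IF done THEN LEAVE cols; END IF;
        SET column_list = CONCAT(column_list,
            ', COUNT(CASE WHEN gender = ', QUOTE(dp),
            ' THEN id ELSE NULL END) AS `', dp, '`');
    END LOOP cols;
    CLOSE get_cols;

    # second read of the data: run the assembled crosstab query
    SET @q = CONCAT('SELECT dept', column_list,
                    ', COUNT(*) AS total FROM person GROUP BY dept');
    PREPARE x FROM @q;
    EXECUTE x;
    DEALLOCATE PREPARE x;
END //
DELIMITER ;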
35. Cursor + temp table (2)
PRO
● No memory limits
● Dynamic
36. Cursor + temp table (3)
CON
● Requires a new table for each crosstab
● Data needs to be read twice
37. Cursor + dedicated routine (1)
# Create a routine from a template
# use routine with cursor
DECLARE dp char(10);
DECLARE column_list TEXT DEFAULT '';
DECLARE get_cols CURSOR FOR
    SELECT gender FROM person;
38. Cursor + dedicated routine (2)
PRO
● No memory limits
● Data is read only once
● No temporary tables
39. Cursor + dedicated routine (3)
CON
● Not dynamic
● Dedicated routines need to be created by an external language
40. Cursor + Higher Order Routines (1)
● Higher Order Routines are routines that create other routines
● It IS NOT STANDARD
● It is, actually, a hack
41. Cursor + Higher Order Routines (2)
● Very UNOFFICIALLY
● You can create a routine from a MySQL routine
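To make the hack concrete: in the MySQL 5.0 era, a routine with write access to the mysql.proc table could define a new routine by inserting its definition directly into that table, which is why the privileges mentioned on a later slide are needed. A heavily hedged sketch, not the package's actual code, and fully covered by the disclaimer a few slides down:

# Inserting a row into mysql.proc defines a new procedure without
# running CREATE PROCEDURE. The column list matches MySQL 5.0's
# mysql.proc layout; the routine name hello_xtab is made up here.
INSERT INTO mysql.proc
    (db, name, type, specific_name, language, sql_data_access,
     is_deterministic, security_type, param_list, returns, body,
     definer, created, modified, sql_mode, comment)
VALUES
    (DATABASE(), 'hello_xtab', 'PROCEDURE', 'hello_xtab', 'SQL',
     'CONTAINS_SQL', 'NO', 'DEFINER', '', '',
     'BEGIN SELECT ''made by a routine''; END',
     CURRENT_USER(), NOW(), NOW(), '', '');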
42. Cursor + Higher Order Routines (3)
PRO
● Dynamic
● No memory limits
● No temporary tables
● Data is read only once
● Efficient
43. Cursor + Higher Order Routines (4)
CON
● You need just ONE ROUTINE created with SUPER privileges (access to the mysql.proc table)
44. DISCLAIMER
The Very Unofficial method described in this presentation:
➢ Is not recommended by MySQL AB
➢ Is not guaranteed to work in future releases of MySQL
➢ May result in any number of damages to your data, from bird flu to accidental explosions
➢ You use it at your own risk
HACK!
45. Cursor + Higher Order Routines (5)
HOW
● Download the code (all SQL) (http://datacharmer.org)
● Unpack and run
  mysql t < crosstab_install.mysql
● Then use it
  mysql> CALL crosstab_help()
54. One step further: Object Oriented Crosstabs (1)
● Based on the Crosstab package
● Requires the MySQL Stored Routines Library (mysql-sr-lib), http://datacharmer.org
● It's an O-O-like approach
55. Object Oriented crosstab (2)
HOW
● Download the code (all SQL) (http://datacharmer.org)
● Unpack and run
  mysql < oo_crosstab_install.mysql
● Then use it
  mysql> CALL oo_crosstab_help()
64. Object Oriented Crosstab (11)
# Exporting crosstab query
CALL xtab_query('gender_by_dept')\G
*************************** 1. row ***************************
query for crosstab:
SELECT gender
     , sum(CASE WHEN department = 'pers' THEN salary
           ELSE NULL END) AS `pers`
     , sum(CASE WHEN department = 'sales' THEN salary
           ELSE NULL END) AS `sales`
     , sum(CASE WHEN department = 'dev' THEN salary
           ELSE NULL END) AS `dev`
     , sum(salary) AS total
FROM person INNER JOIN departments USING (dept_id)
GROUP BY gender
65. Object Oriented Crosstab (12)
# Listing crosstabs
CALL xtab_list();
+--------------------+------------+
| CROSSTAB           | executable |
+--------------------+------------+
| category_by_rating | ok         |
| rating_by_category | ok         |
| location_by_dept   | ok         |
| dept_by_location   | ok         |
| gender_by_location | ok         |
| location_by_gender | ok         |
| dept_by_gender     | ok         |
| gender_by_dept     | ok         |
+--------------------+------------+
66. Parting thoughts
● Pivot tables can be made in SQL
● Pivot tables in normal SQL require external languages
● With stored routines, there are various techniques
● Efficient crosstabs are feasible with well-designed stored routines