SQL coding at Sydney Measure Camp 2018

Quick Talk about the best practices on SQL, databases and clean code.

Published in: Technology

  1. SQL Adilson Mendonca
  2. select i_ITEM_id,i_item_desc,i_category,i_class,i_CURRENT_price, sum(cs_EXT_sales_price) as itemrevenue, sum(cs_ext_sales_price) *100/sum(sum(cs_ext_sales_price)) over (partition by I_class) as revenueratio From catalog_sales,item,date_dim where cs_item_sk = i_item_sk AND i_category in ('Jewelry', 'Sports', 'Books') and cs_sold_date_sk = d_date_sk and cast(d_date as timestamp) between cast('2001-01-12' as timestamp) and (cast('2001-01-12' as timestamp) + interval 30 days) group by i_item_id,i_item_desc,i_category,i_class,i_current_price order by i_category,i_class,i_item_id,i_item_desc,revenueratio limit 100;
  3. Code Visibility
  4. select i_ITEM_id, I_item_desc, I_category, I_class, i_CURRENT_price, sum(cs_EXT_sales_price) as itemrevenue, sum(cs_ext_sales_price) *100/sum(sum(cs_ext_sales_price)) over (partition by I_class) as revenueratio FROM catalog_sales,item,date_dim where cs_item_sk = i_item_sk AND i_category in ('Jewelry', 'Sports', 'Books') and cs_sold_date_sk = d_date_sk and cast(d_date as timestamp) between cast('2001-01-12' as timestamp) and (cast('2001-01-12' as timestamp) + interval 30 days) group by i_item_id,i_item_desc,i_category,i_class,i_current_price order by i_category,i_class,i_item_id,i_item_desc,revenueratio limit 100;
  5. select i_ITEM_id, I_item_desc, I_category, I_class, i_CURRENT_price, sum(cs_EXT_sales_price) as itemrevenue, sum(cs_ext_sales_price) *100/sum(sum(cs_ext_sales_price)) over (partition by I_class) as revenueratio FROM catalog_sales,item,date_dim where cs_item_sk = i_item_sk AND i_category in ('Jewelry', 'Sports', 'Books') and cs_sold_date_sk = d_date_sk and cast(d_date as timestamp) between cast('2001-01-12' as timestamp) and (cast('2001-01-12' as timestamp) + interval 30 days) group by i_item_id, I_item_desc, I_category, I_class, i_current_price order by i_category, I_class, I_item_id, I_item_desc, revenueratio limit 100;
  6. SELECT i_item_id
           ,i_item_desc
           ,i_category
           ,i_class
           ,i_current_price
           ,SUM(cs_ext_sales_price) AS item_revenue
           ,SUM(cs_ext_sales_price) * 100 / SUM(SUM(cs_ext_sales_price)) OVER (PARTITION BY i_class) AS revenue_ratio
     FROM catalog_sales
     JOIN item
       ON cs_item_sk = i_item_sk
     JOIN date_dim
       ON cs_sold_date_sk = d_date_sk
     WHERE i_category IN ('Jewelry', 'Sports', 'Books')
       AND CAST(d_date AS TIMESTAMP) BETWEEN CAST('2001-01-12' AS TIMESTAMP)
                                         AND CAST('2001-01-12' AS TIMESTAMP) + INTERVAL 30 DAYS
     GROUP BY 1, 2, 3, 4, 5
     ORDER BY 3, 4, 1
     LIMIT 100
  7. SELECT item.i_item_id,
            item.i_item_desc,
            item.i_category,
            item.i_class,
            item.i_current_price,
            SUM(catalog_sales.cs_ext_sales_price) AS item_revenue,
            SUM(catalog_sales.cs_ext_sales_price) * 100 / SUM(SUM(catalog_sales.cs_ext_sales_price)) OVER (PARTITION BY item.i_class) AS revenue_ratio
     FROM catalog_sales
     JOIN item
       ON catalog_sales.cs_item_sk = item.i_item_sk
     JOIN date_dim
       ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
     WHERE item.i_category IN ('Jewelry', 'Sports', 'Books')
       AND CAST(date_dim.d_date AS TIMESTAMP) BETWEEN CAST('2001-01-12' AS TIMESTAMP)
                                                  AND CAST('2001-01-12' AS TIMESTAMP) + INTERVAL 30 DAYS
     GROUP BY 1, 2, 3, 4, 5
     ORDER BY 3, 4, 1
     LIMIT 100
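The revenue-ratio pattern in the query above (an aggregate nested inside a window function, with explicit JOINs and qualified column names) can be tried on a tiny data set. This is a minimal sketch, not the original benchmark setup: it assumes Python's built-in sqlite3 module with SQLite 3.25 or later for window functions, the sample rows are made up, and the date filter is left out because `INTERVAL 30 DAYS` is engine-specific.

```python
import sqlite3

# In-memory database with a cut-down, hypothetical version of the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (i_item_sk INTEGER PRIMARY KEY, i_item_id TEXT, i_class TEXT);
CREATE TABLE catalog_sales (cs_item_sk INTEGER, cs_ext_sales_price REAL);
INSERT INTO item VALUES (1, 'A', 'rings'), (2, 'B', 'rings'), (3, 'C', 'golf');
INSERT INTO catalog_sales VALUES (1, 30.0), (1, 20.0), (2, 50.0), (3, 10.0);
""")

# Inner SUM: revenue per item (GROUP BY).
# Outer SUM ... OVER: total of those per-item revenues within each class.
revenue_ratio_sql = """
SELECT item.i_item_id,
       item.i_class,
       SUM(catalog_sales.cs_ext_sales_price) AS item_revenue,
       SUM(catalog_sales.cs_ext_sales_price) * 100.0
           / SUM(SUM(catalog_sales.cs_ext_sales_price))
               OVER (PARTITION BY item.i_class) AS revenue_ratio
FROM catalog_sales
JOIN item
  ON catalog_sales.cs_item_sk = item.i_item_sk
GROUP BY item.i_item_id, item.i_class
ORDER BY item.i_class, item.i_item_id
"""

rows = conn.execute(revenue_ratio_sql).fetchall()
for row in rows:
    print(row)
```

Each row carries the item's share of its class revenue, so the ratios within one class add up to 100.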
  8. Over commenting
     -- get id, name and open only once
     SELECT DISTINCT t1.id
          , t1.name
          , t3.open
     -- select names from table 1, order them by date and return first 3
     FROM (SELECT id
                , name
           FROM table1
           ORDER BY date DESC
           LIMIT 3) AS t1
     -- get open from table 2 IF id is there, open = 1 and type =2
     LEFT JOIN (
         ( SELECT open
                , name_id
           FROM table2
           WHERE open=1 AND type=2 ) AS t3 )
       ON t1.id = t3.name_id
     -- order by name from A-Z
     ORDER BY t1.name ASC
  9. Follow Patterns
       • Indentation
       • Don’t over comment - just write clear code
       • Remove commented-out lines of code
       • Use aliases on all columns when joining
       • Know your PKs and unique keys
       • Make future maintenance easy
       • Execution time
       • Select JUST the tables/columns that will be in USE
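As an illustration of these guidelines, the over-commented query from the previous slide can be rewritten so that the names carry the meaning the comments used to. This is only one possible cleanup, sketched under assumptions: the aliases `latest_names` and `opened` are invented here, the sample rows are made up, and Python's sqlite3 module stands in for whatever database the original ran on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, name TEXT, date TEXT);
CREATE TABLE table2 (name_id INTEGER, open INTEGER, type INTEGER);
INSERT INTO table1 VALUES
    (1, 'Cara', '2018-01-01'), (2, 'Abel', '2018-03-01'),
    (3, 'Bea',  '2018-02-01'), (4, 'Dan',  '2017-12-01');
INSERT INTO table2 VALUES (2, 1, 2), (3, 0, 2), (4, 1, 2);
""")

# Same logic as the over-commented version, with no comments needed:
# the CTE name and aliases say what each part is for.
clean_sql = """
WITH latest_names AS (
    SELECT id, name
    FROM table1
    ORDER BY date DESC
    LIMIT 3
)
SELECT DISTINCT
       latest_names.id,
       latest_names.name,
       opened.open
FROM latest_names
LEFT JOIN table2 AS opened
  ON opened.name_id = latest_names.id
 AND opened.open = 1
 AND opened.type = 2
ORDER BY latest_names.name ASC
"""

rows = conn.execute(clean_sql).fetchall()
print(rows)
```

Moving the `open = 1 AND type = 2` filter into the `ON` clause keeps the LEFT JOIN semantics of the original subquery: names without a matching row still appear, with a NULL `open`.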
  10. Code block
  11. WITH patient_data AS (
          SELECT patient_id,
                 patient_name,
                 hospital,
                 drug_dosage
          FROM hospital_registry
          WHERE COALESCE(last_visit, NOW()) > NOW() - INTERVAL '14 days'
            AND city = 'Los Angeles'
      ),
      average_dosage AS (
          SELECT hospital,
                 AVG(drug_dosage) AS avg_dosage
          FROM patient_data
          GROUP BY hospital
      )
      SELECT COUNT(hospital)
      FROM average_dosage
      WHERE avg_dosage > 1000
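To see the two chained CTEs above run end to end, here is a minimal sketch using Python's sqlite3 module. Several things are assumptions rather than part of the slide: SQLite has no `NOW()` or `INTERVAL`, so a fixed reference date is passed in as the `:today` parameter and `date(:today, '-14 days')` replaces the interval arithmetic; the sample patients are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hospital_registry (
    patient_id   INTEGER,
    patient_name TEXT,
    hospital     TEXT,
    city         TEXT,
    last_visit   TEXT,
    drug_dosage  REAL
);
INSERT INTO hospital_registry VALUES
    (1, 'Ann', 'General', 'Los Angeles', '2018-09-10', 1200),
    (2, 'Bob', 'General', 'Los Angeles', NULL,         1000),
    (3, 'Cy',  'Mercy',   'Los Angeles', '2018-09-12',  900),
    (4, 'Di',  'Mercy',   'Los Angeles', '2018-08-01', 5000),
    (5, 'Ed',  'Bay',     'San Diego',   '2018-09-14', 3000);
""")

# Step 1 (patient_data): recent Los Angeles patients only.
# Step 2 (average_dosage): one average per hospital.
# Final SELECT: how many hospitals exceed the threshold.
cte_sql = """
WITH patient_data AS (
    SELECT patient_id, hospital, drug_dosage
    FROM hospital_registry
    WHERE COALESCE(last_visit, :today) > date(:today, '-14 days')
      AND city = 'Los Angeles'
),
average_dosage AS (
    SELECT hospital, AVG(drug_dosage) AS avg_dosage
    FROM patient_data
    GROUP BY hospital
)
SELECT COUNT(hospital)
FROM average_dosage
WHERE avg_dosage > 1000
"""

hospital_count = conn.execute(cte_sql, {"today": "2018-09-15"}).fetchone()[0]
print(hospital_count)
```

Each CTE can be tested on its own by replacing the final SELECT with `SELECT * FROM patient_data`, which is much of the appeal of building a query as named blocks.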
  12. Master the use of:
       • Functions
       • Window Functions (OLAP functions)
       • CTE - Common Table Expression
       • Views
       • UDF - User Defined Function
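Views and UDFs from the list above can be sketched together in a few lines. This is an illustrative example only: the `sales` table, the `completed_sales` view, the `add_tax` function and its 10% rate are all hypothetical, and the UDF is registered through Python's `sqlite3.Connection.create_function`, which is how SQLite exposes user-defined functions (other databases use `CREATE FUNCTION` in SQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (id_sales INTEGER PRIMARY KEY, amount REAL, status TEXT);
INSERT INTO sales VALUES (1, 100.0, 'completed'),
                         (2,  50.0, 'cancelled'),
                         (3,  50.0, 'completed');

-- View: gives a reusable name to a common filter.
CREATE VIEW completed_sales AS
SELECT id_sales, amount
FROM sales
WHERE status = 'completed';
""")

def add_tax(amount):
    """Hypothetical business rule: add 10% tax, rounded to cents."""
    return round(amount * 1.10, 2)

# UDF: make the Python function callable from SQL as add_tax(x).
conn.create_function("add_tax", 1, add_tax)

rows = conn.execute("""
SELECT id_sales, add_tax(amount) AS amount_inc_tax
FROM completed_sales
ORDER BY id_sales
""").fetchall()
print(rows)
```

The view keeps the filter in one place and the UDF keeps the calculation in one place, so queries that use them stay short.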
  13. Lost
  14. Data Structure
       • Build an ERD if you don’t have one
       • Primary Keys
       • Unique Keys
       • Table Size
       • Number of columns (columnar databases)
       • KNOW YOUR DATA
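When the keys are not documented, a quick profiling query can show the table size and whether a column could serve as a unique key. The sketch below is one way to do that, under assumptions: the helper names `table_size` and `is_candidate_key` and the sample table are invented for illustration, and the table and column names are interpolated into the SQL, so they should only ever come from trusted code, not user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id_customer INTEGER, email TEXT);
INSERT INTO customer VALUES (1, 'a@example.com'),
                            (2, 'b@example.com'),
                            (3, 'a@example.com');
""")

def table_size(conn, table):
    """Number of rows in the table."""
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]

def is_candidate_key(conn, table, column):
    """True when every row has a distinct, non-NULL value in the column.

    COUNT(DISTINCT col) ignores NULLs, so a NULL or a duplicate
    both make it smaller than COUNT(*).
    """
    total, distinct = conn.execute(
        f'SELECT COUNT(*), COUNT(DISTINCT "{column}") FROM "{table}"'
    ).fetchone()
    return total == distinct

print(table_size(conn, "customer"))
print(is_candidate_key(conn, "customer", "id_customer"))
print(is_candidate_key(conn, "customer", "email"))
```

A check like this only describes the data as it is today; a declared PRIMARY KEY or UNIQUE constraint is what guarantees it stays that way.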
  15. Data Modelling
  16. Modelling techniques
       • Transactional 3NF
       • Star Schema - Data Marts
       • Integration - Data Vault
       • Data Lake - Big Data
       • Flat tables
  17. Data Lake or Data swamp
  18. AVOID the journey to a SWAMP
       • Organize your data and contents
       • Use naming conventions - rules
       • Be aware of object creation
       • Document them in the same way
  19. Verbose
  20. Customer: id, name, date
      Sales: id, date, Cust_id, amount
      Customer: id_customer, name, date_of_birth
      Sales: id_sales, sales_date, fk_customer, amount_inc_tax
      SELECT c.name AS customer_name,
             c.date AS customer_dob,
             s.date AS sales_date,
             COUNT(1) AS no_of_sales,
             SUM(amount) AS amount
      FROM customer c
      JOIN sales s
        ON c.id = s.cust_id
      GROUP BY 1, 2, 3
      SELECT c.name AS customer_name,
             c.date_of_birth,
             s.sales_date,
             COUNT(1) AS no_of_sales,
             SUM(amount_inc_tax) AS amount_inc_tax
      FROM customer c
      JOIN sales s
        ON c.id_customer = s.fk_customer
      GROUP BY 1, 2, 3
  21. • Minimise the use of non-standard abbreviations
       • Don’t use overly long names - you will need to type them one day
       • PK & FK should follow a pattern: ID & table name, FK and link table
       • Maybe: encode data types in names, like price_amt, tax_pct
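A key-naming pattern is easier to keep when it is checked automatically. The sketch below assumes one specific convention taken from the example tables (primary key named `id_<table>`, foreign keys prefixed `fk_`) and uses SQLite's `PRAGMA table_info` and `PRAGMA foreign_key_list` to inspect the schema. The helper `naming_violations` and the badly named `orders` table are hypothetical, and other databases expose the same metadata through their own catalog views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    id_customer   INTEGER PRIMARY KEY,
    name          TEXT,
    date_of_birth TEXT
);
CREATE TABLE sales (
    id_sales       INTEGER PRIMARY KEY,
    sales_date     TEXT,
    fk_customer    INTEGER REFERENCES customer (id_customer),
    amount_inc_tax REAL
);
-- Deliberately breaks the convention, to show what gets reported.
CREATE TABLE orders (
    id      INTEGER PRIMARY KEY,
    cust_id INTEGER REFERENCES customer (id_customer)
);
""")

def naming_violations(conn, table):
    """List columns of `table` that break the assumed id_/fk_ convention."""
    violations = []
    # table_info rows: (cid, name, type, notnull, dflt_value, pk)
    for _cid, name, _type, _notnull, _default, pk in conn.execute(
        f'PRAGMA table_info("{table}")'
    ):
        if pk and name != f"id_{table}":
            violations.append(f"{table}.{name}: primary key should be id_{table}")
    # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
    for row in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
        from_column = row[3]
        if not from_column.startswith("fk_"):
            violations.append(f"{table}.{from_column}: foreign key should start with fk_")
    return violations

for table in ("customer", "sales", "orders"):
    print(table, naming_violations(conn, table))
```

The same idea extends to the suffix suggestion above, for example flagging numeric money columns that do not end in `_amt`.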
  22. Know technologies
  23. Know your databases & tools
       • Differences, limitations, strengths and weaknesses
       • Columnar databases
       • Functionalities
       • Access & Security
  24. Help make a better world with beautiful code!!!
