SQL Query Review
A Refresher and How to Profile Data Using SQL
Goals of the Activity
• Learn to connect to our IST722 Server and use its databases.
• Data profiling – “Getting to know your data”
• Why is it important?
• How do you use SQL to do it?
• Why use SQL to do this?
• Review of the SQL important to the course
• Mastering SELECT and JOINS
• Understand the need for data warehousing
Connecting to the IST722 SQL Server in the Labs
• Server Name: ist-cs-dw1.ad.syr.edu
• Credentials: Windows Authentication
NOTE: Windows Authentication uses the identity of the currently logged-on user, so you must connect from a lab or remote lab computer!
Connecting: Remote Lab
• Remote Desktop Access to iSchool Labs.
• Easy to use. Works from anywhere!
• For when you need to use our software to complete your work for this course, but you cannot get to the computer labs.
• https://remotelab.ischool.syr.edu
Connecting: Your Own Device
IMPORTANT: These instructions are for advanced users. No support will be given to students using this option. Instructions are provided as-is.
Steps:
1. Install SQL Server Developer Edition.
• NOTE: It must be this edition, as SSAS and SSIS are required.
2. Make an Off-Domain Shortcut:
https://answers.syr.edu/display/ischool/Connecting+to+Microsoft+SQL+Server+-+OFF+domain
IST722 Databases on the Server
[Screenshot: the server's database list, annotated with the following callouts]
• Data Warehouse DB
• OLTP Source for Sample Data
• Sources we use in our Project
• Sample OLTP Retail DB
• Your workspace for DW data
• Your workspace for Stage data
• Netflix movie / DVD rental data
• Sample Retail data for Labs
What is Data Profiling?
• The analysis of data sources to be used in the data warehouse.
• Goals
• Understand: structure, content, relationships, and quality of your data and metadata (schema).
• Recognize the features and limitations of your data source.
• Checklist, per table:
• What does a single row in this data set mean?
• What makes each row unique? (Business Key)
• What are the relationships among the data?
• Do you understand the schema? (Column Definitions)
A.k.a “Getting Intimate With Your Data”
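The per-table checklist can be turned into a few profiling queries. The following is a minimal sketch, not the course's own solution: it uses Python's built-in sqlite3 module as a stand-in for the IST722 SQL Server, and the students table, its columns, and its rows are hypothetical. It checks whether a candidate business key is unique and how many values are missing in a column.

```python
import sqlite3

# Hypothetical table and sample rows, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER, email TEXT, major TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [
        (1, "ana@example.edu", "IM"),
        (2, "ben@example.edu", None),   # missing major
        (3, "cy@example.edu", "LIS"),
        (3, "cy@example.edu", "LIS"),   # duplicated row
    ],
)

# If COUNT(*) equals COUNT(DISTINCT key), the candidate business key is unique.
total_rows, distinct_ids, missing_majors = conn.execute(
    """
    SELECT COUNT(*),
           COUNT(DISTINCT student_id),
           SUM(CASE WHEN major IS NULL THEN 1 ELSE 0 END)
    FROM students
    """
).fetchone()

# List the key values that break uniqueness.
duplicate_keys = conn.execute(
    """
    SELECT student_id, COUNT(*) AS copies
    FROM students
    GROUP BY student_id
    HAVING COUNT(*) > 1
    """
).fetchall()

print(total_rows, distinct_ids, missing_majors, duplicate_keys)
```

Here the row count and the distinct key count differ, so student_id alone would not work as a business key for this sample until the duplicate is explained or removed.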
Data Warehousing is about:
empowering business users to make intelligent decisions with their data…
…which is difficult because typically our data is in a format less conducive to this goal.
Business Questions
Remote Lab Data Set Questions
• When was the most recent login?
• On which days was the Remote Lab Full?
• What’s the GPA of the last 10 students who logged in?
• What are the majors of non-iSchool students who logged in during the last 2 months?
• How many logins in the month of November 2014?
• How many freshmen used remote lab last semester?
• How many different / unique sophomores logged on in December 2014?
• How many students did not log in to remote lab?
• What was the busiest time of day? Day of week?
• Which days of the week are busier than the average?
How do we go about answering these questions?
SQL SELECT: Reads Data
SELECT col1, col2, ...   -- Columns to display
FROM table               -- Table to use
WHERE condition          -- Only return rows matching this condition
ORDER BY columns         -- Sort row output by data in these columns
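As a concrete instance of the four clauses, here is a small sketch that runs against an in-memory SQLite database through Python's sqlite3 module (a stand-in for SQL Server). The logins table, its columns, and its rows are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (user_id TEXT, computer TEXT, login_time TEXT)")
conn.executemany(
    "INSERT INTO logins VALUES (?, ?, ?)",
    [
        ("ana", "LAB-01", "2014-11-03 09:15:00"),
        ("ben", "LAB-02", "2014-11-03 13:40:00"),
        ("ana", "LAB-01", "2014-12-01 10:05:00"),
    ],
)

rows = conn.execute(
    """
    SELECT user_id, login_time      -- columns to display
    FROM logins                     -- table to use
    WHERE computer = 'LAB-01'       -- only rows matching this condition
    ORDER BY login_time DESC        -- sort the output, newest first
    """
).fetchall()

print(rows)
```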
SQL SELECT STATEMENT
• HOW WE “SAY” IT
1. SELECT (Projection)
2. FROM
3. WHERE
4. ORDER BY
• HOW IT IS PROCESSED
1. FROM
2. WHERE
3. SELECT (Projection)
4. ORDER BY
Examples:
• On which dates was the Remote Lab Full?
• When was the most recent login?
Before you begin, you’ve got to know your data:
• What does one row in the table mean?
• What makes each row unique?
• What do the columns mean?
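One way to answer "When was the most recent login?" is to sort newest-first and keep one row. This is a sketch under assumptions: the logins table and its columns are hypothetical, timestamps are stored as ISO-8601 text so they sort chronologically, and SQLite's LIMIT 1 plays the role that TOP 1 plays in SQL Server.

```python
import sqlite3

# Hypothetical table and rows; one row means one login event.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (user_id TEXT, login_time TEXT)")
conn.executemany(
    "INSERT INTO logins VALUES (?, ?)",
    [
        ("ana", "2014-11-03 09:15:00"),
        ("ben", "2014-12-09 16:30:00"),
        ("cy", "2014-11-20 14:00:00"),
    ],
)

# Sort descending by time and keep only the first row.
# In T-SQL this would read: SELECT TOP 1 ... ORDER BY login_time DESC
most_recent = conn.execute(
    """
    SELECT user_id, login_time
    FROM logins
    ORDER BY login_time DESC
    LIMIT 1
    """
).fetchone()

print(most_recent)
```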
JOINS
• JOINS let you combine data from more than one table into your query output
• Most of the time you join on PK-FK pairs
• Any columns of the same (or a compatible) data type can be joined
• Most common join is an inner join
SELECT *
FROM tablea
JOIN tableb ON acol = bcol
[Diagram: tablea joined to tableb]
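A minimal sketch of an inner join on a PK-FK pair follows. It assumes two hypothetical tables, students (primary key student_id) and logins (foreign key student_id), and runs on in-memory SQLite via Python's sqlite3 module rather than on the course server.

```python
import sqlite3

# Hypothetical schema: logins.student_id is a foreign key to students.student_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE logins ("
    "login_id INTEGER PRIMARY KEY, "
    "student_id INTEGER REFERENCES students(student_id), "
    "login_time TEXT)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [(1, "Ana"), (2, "Ben"), (3, "Cy")],
)
conn.executemany(
    "INSERT INTO logins VALUES (?, ?, ?)",
    [
        (10, 1, "2014-11-03 09:15:00"),
        (11, 2, "2014-11-04 14:00:00"),
        (12, 1, "2014-11-05 08:30:00"),
    ],
)

# Inner join: only rows with a match on both sides appear in the output.
joined = conn.execute(
    """
    SELECT s.name, l.login_time
    FROM logins AS l
    JOIN students AS s ON l.student_id = s.student_id
    ORDER BY l.login_time
    """
).fetchall()

print(joined)
```

Cy never logged in, so Cy has no matching row in logins and does not appear in the inner join output.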
Outer Joins
• For those situations where you need to include rows from one or both tables even when they have no match under the join criteria.
• In the diagram, let’s assume
• A == Customers
• B == Orders
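With A as Customers and B as Orders, a left outer join keeps every customer, with or without orders. The sketch below assumes hypothetical customers and orders tables and uses in-memory SQLite through Python's sqlite3 module; only LEFT JOIN is shown.

```python
import sqlite3

# Hypothetical tables: A = customers, B = orders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ana"), (2, "Ben"), (3, "Cy")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(100, 1, 25.0), (101, 1, 40.0), (102, 3, 15.0)],
)

# Every customer is returned; order columns are NULL where no order matches.
all_customers = conn.execute(
    """
    SELECT c.name, o.order_id
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id, o.order_id
    """
).fetchall()

# Filtering on the NULLs isolates customers with no orders at all.
no_orders = conn.execute(
    """
    SELECT c.name
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    WHERE o.order_id IS NULL
    """
).fetchall()

print(all_customers)
print(no_orders)
```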
Examples:
• What’s the GPA of the last 10 students who logged in?
• What are the majors of non-iSchool students who logged in during the last 2 months?
• Is there anyone who used remote lab but is not in the student table?
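Two of these questions can be sketched with joins. The tables, columns, and rows below are hypothetical, the "last 10" is shortened to the last 2 logins to keep the sample small, and SQLite's LIMIT stands in for T-SQL's TOP. The second query uses a left outer join to find logins with no matching student.

```python
import sqlite3

# Hypothetical tables and rows, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT, gpa REAL)")
conn.execute("CREATE TABLE logins (login_id INTEGER PRIMARY KEY, student_id INTEGER, login_time TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Ana", 3.6), (2, "Ben", 3.1), (3, "Cy", 3.9)],
)
conn.executemany(
    "INSERT INTO logins VALUES (?, ?, ?)",
    [
        (10, 1, "2014-11-03 09:15:00"),
        (11, 2, "2014-11-04 14:00:00"),
        (12, 3, "2014-11-05 08:30:00"),
        (13, 99, "2014-11-06 10:00:00"),  # student 99 is not in students
    ],
)

# GPA for the most recent logins that match a student (inner join).
last_gpas = conn.execute(
    """
    SELECT s.name, s.gpa
    FROM logins AS l
    JOIN students AS s ON s.student_id = l.student_id
    ORDER BY l.login_time DESC
    LIMIT 2
    """
).fetchall()

# Logins whose student_id has no row in students (outer join, keep the NULLs).
unknown_users = conn.execute(
    """
    SELECT l.student_id
    FROM logins AS l
    LEFT JOIN students AS s ON s.student_id = l.student_id
    WHERE s.student_id IS NULL
    """
).fetchall()

print(last_gpas)
print(unknown_users)
```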
Aggregates
• They summarize your data… You no longer get individual rows back, but a summary of rows from the table.
• Aggregate operators:
• COUNT, COUNT(DISTINCT …), SUM, MIN, MAX, AVG
• GROUP BY: the columns by which the aggregate operators summarize.
• HAVING: like WHERE, but it filters after the aggregation has been done.
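A short sketch of the aggregate operators with GROUP BY and HAVING, again on in-memory SQLite via Python's sqlite3 module. The logins table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical table: one row per login.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (student_id INTEGER, login_date TEXT)")
conn.executemany(
    "INSERT INTO logins VALUES (?, ?)",
    [
        (1, "2014-11-03"), (1, "2014-11-04"), (1, "2014-11-05"),
        (2, "2014-11-03"),
        (3, "2014-11-04"), (3, "2014-11-06"),
    ],
)

# Without GROUP BY, the aggregates summarize the whole table into one row.
summary = conn.execute(
    """
    SELECT COUNT(*), COUNT(DISTINCT student_id), MIN(login_date), MAX(login_date)
    FROM logins
    """
).fetchone()

# GROUP BY summarizes per student; HAVING filters the groups afterwards.
frequent = conn.execute(
    """
    SELECT student_id, COUNT(*) AS login_count
    FROM logins
    GROUP BY student_id
    HAVING COUNT(*) >= 2
    ORDER BY student_id
    """
).fetchall()

print(summary)
print(frequent)
```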
FULL SQL SELECT STATEMENT
• HOW WE “SAY” IT
1. SELECT (Projection)
2. TOP / DISTINCT
3. FROM
4. WHERE
5. GROUP BY
6. HAVING
7. ORDER BY
• HOW IT IS PROCESSED
1. FROM
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT (Projection)
6. DISTINCT
7. ORDER BY
8. TOP
Examples:
• How many logins in the month of November 2014?
• How many undergrads (freshman / sophomore / junior / senior) used remote lab last semester?
• How many different / unique sophomores logged on in December 2014?
• How many students did not log in to remote lab?
• What was the busiest time of day? Day of week?
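Several of these questions can be sketched as follows. Everything about the schema is an assumption: the students and logins tables, the class_year column, and the rows are invented, and timestamps are ISO-8601 text. The hour is extracted with SQLite's strftime, where SQL Server would use something like DATEPART.

```python
import sqlite3

# Hypothetical tables and rows, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT, class_year TEXT)")
conn.execute("CREATE TABLE logins (student_id INTEGER, login_time TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Ana", "Sophomore"), (2, "Ben", "Sophomore"), (3, "Cy", "Freshman"), (4, "Di", "Senior")],
)
conn.executemany(
    "INSERT INTO logins VALUES (?, ?)",
    [
        (1, "2014-11-03 09:15:00"), (2, "2014-11-20 14:00:00"), (3, "2014-11-30 23:59:00"),
        (1, "2014-12-01 09:05:00"), (1, "2014-12-02 09:45:00"),
        (2, "2014-12-05 16:30:00"), (3, "2014-12-09 09:10:00"),
    ],
)

# Logins in November 2014: a half-open date range avoids end-of-month edge cases.
november_logins = conn.execute(
    "SELECT COUNT(*) FROM logins "
    "WHERE login_time >= '2014-11-01' AND login_time < '2014-12-01'"
).fetchone()[0]

# Unique sophomores in December 2014: COUNT(DISTINCT ...) counts people, not logins.
december_sophomores = conn.execute(
    """
    SELECT COUNT(DISTINCT l.student_id)
    FROM logins AS l
    JOIN students AS s ON s.student_id = l.student_id
    WHERE s.class_year = 'Sophomore'
      AND l.login_time >= '2014-12-01' AND l.login_time < '2015-01-01'
    """
).fetchone()[0]

# Students who never logged in: outer join, then count the unmatched rows.
never_logged_in = conn.execute(
    """
    SELECT COUNT(*)
    FROM students AS s
    LEFT JOIN logins AS l ON l.student_id = s.student_id
    WHERE l.student_id IS NULL
    """
).fetchone()[0]

# Busiest hour of the day: group by hour, sort by count, keep the top row.
busiest_hour = conn.execute(
    """
    SELECT strftime('%H', login_time) AS hour_of_day, COUNT(*) AS login_count
    FROM logins
    GROUP BY hour_of_day
    ORDER BY login_count DESC
    LIMIT 1
    """
).fetchone()

print(november_logins, december_sophomores, never_logged_in, busiest_hour)
```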
Sub Selects
• The full power of the SELECT statement is that you can use it as a table, column, or condition in another SELECT statement.
• In FROM:
SELECT x.*
FROM (SELECT * FROM table1) x
• In Projection:
SELECT (SELECT TOP 1 col1 FROM table1 ) col1
FROM table2 y
• In WHERE:
SELECT x.*
FROM table1 x
WHERE x.col1 IN (SELECT col1 FROM table2 )
Examples
• Which days of the week are busier than the average (from a count of logins)?
• For the last semester’s logins for iSchool grad students only, list program, total logins per program, total logins for all grads and the percentage total for each program. Example:
Program  Lgns  Total  PctOfTot
LIS      100   500    20%
IM       250   500    50%
TNM      150   500    30%
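The percentage-of-total report can be sketched with a sub select in the projection, and the "busier than average" pattern with sub selects in FROM and WHERE. This is one possible approach, not necessarily the intended solution. The grad_logins table is hypothetical; its rows are generated to match the counts in the example table, and the code runs on in-memory SQLite via Python's sqlite3 module.

```python
import sqlite3

# Hypothetical table: one row per grad-student login, tagged with the program.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grad_logins (program TEXT)")
sample = [("LIS",)] * 100 + [("IM",)] * 250 + [("TNM",)] * 150
conn.executemany("INSERT INTO grad_logins VALUES (?)", sample)

# Sub select in the projection: the grand total is repeated on every row.
report = conn.execute(
    """
    SELECT program,
           COUNT(*) AS lgns,
           (SELECT COUNT(*) FROM grad_logins) AS total,
           100.0 * COUNT(*) / (SELECT COUNT(*) FROM grad_logins) AS pct_of_tot
    FROM grad_logins
    GROUP BY program
    ORDER BY program
    """
).fetchall()

# Sub selects in FROM and WHERE: keep programs busier than the average program.
above_average = conn.execute(
    """
    SELECT per_program.program
    FROM (SELECT program, COUNT(*) AS n
          FROM grad_logins
          GROUP BY program) AS per_program
    WHERE per_program.n > (SELECT AVG(n)
                           FROM (SELECT COUNT(*) AS n
                                 FROM grad_logins
                                 GROUP BY program) AS counts)
    """
).fetchall()

for program, lgns, total, pct in report:
    print(f"{program:<8}{lgns:>5}{total:>6}{pct:>6.0f}%")
print(above_average)
```

The same shape answers the days-of-the-week question if the inner query groups by day of week instead of program.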
Handling Slow Query Processing
• Sometimes your source is not responsive enough for data exploration.
• Fix:
• Copy source data into your Operational Data Store
SELECT * INTO newtable FROM …
or
INSERT INTO table SELECT * FROM …
• Set your business keys as primary keys of the table.
• If performance still lags, add indexes as required / suggested.
• This is a temporary solution, just for profiling.
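The copy-then-index fix can be sketched as below, with loud caveats: SQLite has no SELECT ... INTO, so CREATE TABLE ... AS SELECT stands in for it, and a unique index stands in for declaring the business key as the primary key. The table names, columns, and rows are hypothetical, and both "source" and "ODS" live in the same in-memory database purely for illustration.

```python
import sqlite3

# Hypothetical slow source table, simulated in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_logins (login_id INTEGER, user_id TEXT, login_time TEXT)")
conn.executemany(
    "INSERT INTO source_logins VALUES (?, ?, ?)",
    [
        (10, "ana", "2014-11-03 09:15:00"),
        (11, "ben", "2014-11-04 14:00:00"),
        (12, "ana", "2014-11-05 08:30:00"),
    ],
)

# Step 1: copy the source rows into a local working table.
# (SQLite equivalent of T-SQL's SELECT * INTO newtable FROM ...)
conn.execute("CREATE TABLE ods_logins AS SELECT * FROM source_logins")

# Step 2: enforce the business key. A unique index is used here as a stand-in
# for a primary key; creating it fails if the key is not actually unique.
conn.execute("CREATE UNIQUE INDEX ux_ods_logins_login_id ON ods_logins (login_id)")

# Step 3: index a column that profiling queries filter or sort on.
conn.execute("CREATE INDEX ix_ods_logins_login_time ON ods_logins (login_time)")

copied_rows = conn.execute("SELECT COUNT(*) FROM ods_logins").fetchone()[0]
index_names = sorted(
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'ods_logins'"
    )
)

print(copied_rows, index_names)
```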
Activity Summary
Data Warehousing is about empowering business users to make intelligent decisions with their data. So…
• How would a business user get these questions answered?
• This is hard work… and you’re technically savvy.
• It’s not practical to write an SQL statement for every business question we need answered. That does not scale!
• We need to find a better way to re-organize this data so that we can accomplish the end goal of empowering business users.
• That’s the rationale behind data warehousing and the essence of what you’ll learn in this course.
