2. We have a dataset containing sales transactions by customer,
but we need to extract a unique list of Customer IDs
3. Here is an example
The select distinct command returns the selected columns from the data source with all duplicate rows removed
select distinct [CustomerID]
from [AdventureWorks2019].[dbo].[vw_Sales_by_Customer]
order by [CustomerID]
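To try this without the AdventureWorks view, here is a minimal sketch that runs the same pattern against a handful of inline rows. The SampleSales name and its values are made up for illustration and are not taken from the real data.

```sql
-- Hypothetical sample rows standing in for the sales view
with SampleSales as (
    select * from (values
        (11000, 1),
        (11000, 2),
        (11001, 3),
        (11001, 4),
        (11002, 5)
    ) as v ([CustomerID], [OrderLine])
)
-- Five input rows, but each CustomerID is returned only once
select distinct [CustomerID]
from SampleSales
order by [CustomerID]
-- Returns three rows: 11000, 11001, 11002
```

Note that distinct applies to the whole selected row, so adding [OrderLine] to the select list here would bring back all five rows.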
4. But suppose we want to know more
This customer (CustomerID 11000) has 3 separate orders, and we want to create a summary by customer
select [CustomerID], SUM(LineTotal) as TotalSales
, COUNT([shipdate]) as [Count of ShipDates]
, COUNT (distinct [shipdate]) as [Count of Unique ShipDates]
from [AdventureWorks2019].[dbo].[vw_Sales_by_Customer]
where customerID = 11000
group by [CustomerID]
5. Breaking this down
COUNT([shipdate]) as [Count of ShipDates]
This simply returns the number of non-NULL values in the record set. In this case there are 8 order lines, so it counts every order line rather than each order, which doesn’t give us what we need
COUNT (distinct [shipdate]) as [Count of Unique ShipDates]
Adding ‘distinct’ eliminates the duplicates and returns the number of unique ship dates
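The difference between the two counts can be sketched on a few inline rows. The SampleSales name and its values are invented for illustration. One row has a NULL ship date to show that both forms of COUNT on a column skip NULLs, while COUNT(*) does not.

```sql
-- Hypothetical sample rows: four order lines for one customer
with SampleSales as (
    select * from (values
        (11000, cast('2011-06-28' as date), 100.00),
        (11000, cast('2011-06-28' as date),  50.00),
        (11000, cast('2013-06-27' as date),  75.00),
        (11000, cast(null as date),          25.00)  -- not yet shipped
    ) as v ([CustomerID], [shipdate], [LineTotal])
)
select [CustomerID]
, COUNT(*) as [Count of Rows]                                -- every row
, COUNT([shipdate]) as [Count of ShipDates]                  -- non-NULL values only
, COUNT(distinct [shipdate]) as [Count of Unique ShipDates]  -- non-NULL, duplicates removed
from SampleSales
group by [CustomerID]
-- Returns one row: 11000, 4, 3, 2
```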
6. Use Cases
Adding distinct to your queries is a simple way of excluding duplicates from the results
This can be very useful for tasks such as customer classification, for example capturing the frequency of purchase
It can also be very useful for checking for duplicates within a data set: compare the count and the count distinct of a column, and if they are the same then every non-NULL value in that column must be unique
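The duplicate check described above can be sketched as a single query. The SampleOrders table and its [OrderNumber] column are hypothetical and only serve to show the comparison.

```sql
-- Hypothetical sample rows: order number 1002 appears twice
with SampleOrders as (
    select * from (values
        (1001),
        (1002),
        (1002),
        (1003)
    ) as v ([OrderNumber])
)
select COUNT([OrderNumber]) as [Count of Values]
, COUNT(distinct [OrderNumber]) as [Count of Unique Values]
, case when COUNT([OrderNumber]) = COUNT(distinct [OrderNumber])
       then 'No duplicates'
       else 'Duplicates found'
  end as [Duplicate Check]
from SampleOrders
-- Returns one row: 4, 3, 'Duplicates found'
```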
-- Customers with only one order (one unique ship date)
select [CustomerID], TotalSales from
(
select [CustomerID], SUM(LineTotal) as TotalSales
, COUNT( [shipdate]) as [Count of ShipDates]
, COUNT (distinct [shipdate]) as [Count of Unique ShipDates]
from [AdventureWorks2019].[dbo].[vw_Sales_by_Customer]
group by [CustomerID]
) a
where [Count of Unique ShipDates] = 1
-- Customers with more than one order (more than one unique ship date)
select [CustomerID], TotalSales from
(
select [CustomerID], SUM(LineTotal) as TotalSales
, COUNT( [shipdate]) as [Count of ShipDates]
, COUNT (distinct [shipdate]) as [Count of Unique ShipDates]
from [AdventureWorks2019].[dbo].[vw_Sales_by_Customer]
group by [CustomerID]
) a
where [Count of Unique ShipDates] > 1
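As an alternative to wrapping the summary in a derived table, the same filter can be written with a HAVING clause, which filters on the aggregate directly. This is a different technique from the queries above, shown here as a sketch on made-up inline rows (SampleSales and its values are assumptions).

```sql
-- Hypothetical sample rows: 11000 has two ship dates, 11001 has one
with SampleSales as (
    select * from (values
        (11000, cast('2011-06-28' as date), 100.00),
        (11000, cast('2013-06-27' as date),  75.00),
        (11001, cast('2011-06-25' as date),  40.00),
        (11001, cast('2011-06-25' as date),  60.00)
    ) as v ([CustomerID], [shipdate], [LineTotal])
)
select [CustomerID], SUM([LineTotal]) as TotalSales
from SampleSales
group by [CustomerID]
having COUNT(distinct [shipdate]) > 1   -- keep customers with more than one unique ship date
-- Returns one row: 11000, 175.00
```

Changing the condition to = 1 gives the customers with only one unique ship date.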