How to use the SELECT DISTINCT statement in SQL
An Introduction, Syntax and Use Cases
SELECT DISTINCT statement by Select Distinct Limited
This will confuse the SEO :)
https://www.selectdistinct.co.uk/2023/03/23/select-distinct-statement-sql/
#selectdistinct #SQL #DataAnalytics
How to create running totals in SQL
As an analyst, being able to determine the balance at the point of a transaction is often highly useful
This short video shows you how to use a 'window' function to create a running total
#SQL #SQLTIPS #SELECTDISTINCT
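A running total of this kind can be sketched with a window function. The example below is only an illustration: the txn table, its columns and the sample amounts are invented, and it runs the SQL through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later) rather than SQL Server.

```python
import sqlite3


def running_totals(transactions):
    """Return (txn_id, amount, balance) rows, where balance is the cumulative sum."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: one row per transaction, ordered by txn_id.
    conn.execute("CREATE TABLE txn (txn_id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.executemany("INSERT INTO txn (txn_id, amount) VALUES (?, ?)", transactions)
    rows = conn.execute(
        """
        SELECT txn_id,
               amount,
               SUM(amount) OVER (
                   ORDER BY txn_id
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
               ) AS balance
        FROM txn
        ORDER BY txn_id
        """
    ).fetchall()
    conn.close()
    return rows


sample = [(1, 100), (2, -30), (3, 50)]  # invented sample transactions
result = running_totals(sample)
for row in result:
    print(row)
```

The ORDER BY inside OVER decides the order in which amounts are accumulated, so it should be a column that uniquely orders the transactions.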
CIS276DB Module 6 Assignment 1. Write a select sta.docx - clarebernice
CIS276DB
Module 6 Assignment
1. Write a select statement based on the InvoiceTotal column of the Invoices table:
Use the CAST function to return the first column as an integer value.
Name it IntTotal.
Use the CAST function to return the second column as datatype decimal
with one digit to the right. Name it DecimalTotal.
Use the CONVERT function to return the third column as a datatype that
outputs 2 digits to the right of the decimal point and all commas to the
left (e.g. 3,106.34). Name it FormatTotal.
2. Write a select statement that returns 4 columns based on the Vendors table:
(Column name- Name): this column should be formatted in the following
way; VendorContactFName followed by the last initial and a period
(example: “John S.”).
(Column name- StateInitial): the VendorState first initial in lowercase.
(Column name- Phone): VendorPhone without the area code
(Column name- TodaysDate): the current date formatted like- Apr 18,
2008
Filter the results to only return rows where the VendorPhone prefix is equal to
‘(800)’. Sort the results by VendorState and LastName.
3. Business Case: The current date is 12/1/2008; the accounting department
would like to know which invoices with a balance due are still outstanding and
the current age in days their invoice is beyond the invoice date.
Write a select statement that returns 4 columns: VendorName, InvoiceTotal,
InvoiceDate and InvoiceAge (use the appropriate function that will return the
number of days between the InvoiceDate and ‘12/1/2008’).
Filter the results to only return rows where there is a balance due and the
InvoiceAge is greater than 132. Sort the results by VendorName.
4. Write a select statement that returns 7 columns:
InvoiceDate
(Column name- WrittenDate): use the function that will convert
InvoiceDate to this format; Apr 18, 2008
(Column name- NewDate): use the function that will add 45 days to
InvoiceDate and convert it to this format; Apr 18, 2008
(Column name- DayOfWeek): Use the function that will return the name
of the day of NewDate (e.g. Saturday)
(Column name- MonthPart): Use the function that will return the name
of the month of NewDate (e.g. March)
(Column name- DatePart): Use the function that will return the day date
of NewDate (e.g. 18 {of Apr 18, 2008})
(Column name- YearPart): Use the function that will return the year from
NewDate (e.g. 2008)
Sort the results by InvoiceDate.
5. Business Case: The executive committee is implementing a purchase discount
program based on the invoice total for a vendor. As such, they need to gauge how
many invoices might qualify for a discount. Invoices that are below $100 will NOT
qualify for a discount. Invoices between $101 and $500 are a low consideration,
invoices between $501 and $1000 are a higher consideration and invoices above
$1000 are the highest consideration.
Write a sel ...
Below is my code- I have an error that I still have difficulty figurin.pdf - armanuelraj
Below is my code. I have an error that I still have difficulty figuring out. Please explain and
teach me the solution to fix it specifically (e.g. changing which line in the code). Thank you!
main.cpp
/*
Overloaded stream insertion operator <<
- used to display reports and write data to file.
Overloaded relational operator (<)
- used to sort the array in ascending order by name (insertion sort)
*/
#include "Sales.h"
#include <iostream>
#include <sstream>
#include <iomanip>
#include <fstream>
#include <string>
using namespace std;
const int MAX_SIZE = 30;
/* Write your code here:
declare the function you are going to call in this program
*/
void readData(string fileName, Sales *salesArr, int &size);
void insertSort(Sales *salesArr, int size);
double calcSalesAvg(Sales *salesArr, int size);
void displayOverAvg(Sales *salesArr, int size, double avg);
void writeReport(Sales *salesArr, int size, string fileName);
void showReport(string fileName);
int main() {
Sales salesArr[MAX_SIZE];
int size = 0;
string fileName;
cout << "Please enter the input file's name: ";
getline(cin, fileName);
readData(fileName, salesArr, size);
insertSort(salesArr, size);
double avg = calcSalesAvg(salesArr, size);
displayOverAvg(salesArr, size, avg);
writeReport(salesArr, size, fileName);
string option;
cout << "Show report?" << endl;
getline(cin, option);
if (option == "Y" || option == "y")
showReport(fileName);
return 0;
}
// function definitions
void readData(string fileName, Sales *salesArr, int &size) {
string temp;
int i = 0;
fstream ptr;
ptr.open(fileName, ios::in);
while (getline(ptr, temp)) {
size++;
stringstream chk(temp);
string t2;
int id, year, amountSold;
string fname, lname;
int j = 0;
while (getline(chk, t2, ' ')) {
if (j == 0) {
id = stoi(t2);
}
if (j == 1) {
year = stoi(t2);
}
if (j == 2) {
fname = t2;
}
if (j == 3) {
lname = t2;
}
if (j == 4) {
amountSold = stoi(t2);
}
j++;
}
string gg = fname + " " + lname;
gg[gg.size() - 1] = 0;
Sales ss(id, year, gg, amountSold);
salesArr[i] = ss;
i++;
}
ptr.close();
}
void insertSort(Sales *salesArr, int size) {
for (int i = 0; i < size; i++) {
for (int j = i + 1; j < size; j++) {
if (salesArr[j] < salesArr[i]) {
Sales temp(salesArr[i]);
salesArr[i] = salesArr[j];
salesArr[j] = temp;
}
}
}
}
double calcSalesAvg(Sales *salesArr, int size) {
// Guard against dividing by zero when no records were read.
if (size == 0) {
return 0.0;
}
double d = 0;
for (int i = 0; i < size; i++) {
d += (salesArr[i].getAmountSold());
}
return d / size;
}
void displayOverAvg(Sales *salesArr, int size, double avg) {
cout << "Average Sales: " << avg << endl;
string nm ;
cout << "Salespeople with above average sales:" << endl;
for (int i = 0; i < size; i++) {
if (salesArr[i].getAmountSold() > avg) {
cout << salesArr[i];
}
}
}
void writeReport(Sales *salesArr, int size, string fileName) {
fileName.insert(fileName.find("."), "Report");
fstream ptr;
ptr.open(fileName, ios::out);
for (int i = 0; i < size; i++) {
ptr << salesArr[i];
}
ptr.close();
}
/*
This function receives the name of a file and
displays its contents to the s.
Quick iteration and reusability of metric calculations for powerful data exploration.
At Looker, we want to make it easier for data analysts to service the needs of the data-hungry users in their organizations. We believe too much of their time is spent responding to ad hoc data requests and not enough time is spent building, experimenting, and embellishing a robust model of the business. Worse yet, business users are starving for data, but are forced to make important decisions without access to data that could guide them in the right direction. Looker addresses both of these problems with a YAML-based modeling language called LookML.
This paper walks through a number of data modeling examples, demonstrating how to use LookML to generate, alter, and update reports—without the need to rewrite any SQL. With LookML, you build your business logic, defining your important metrics once and then reusing them throughout a model—allowing quick, rapid iteration of data exploration, while also ensuring the accuracy of the SQL that’s generated. Small updates are quick and can be made immediately available to business users to manipulate, iterate, and transform in any way they see fit.
What inner joins are all about and how to use them.
Inner joins combine matching rows from two tables side by side, with each table keeping its own columns.
Unions stack rows from two queries into the same columns and remove duplicate rows unless it is a "UNION ALL".
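The difference can be seen on a small example. The sketch below uses invented customers and orders tables and runs the SQL through Python's built-in sqlite3 module, so it illustrates the general behaviour rather than any particular database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and rows, invented for illustration.
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cat');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
    """
)

# INNER JOIN: matching rows side by side, columns from both tables.
inner_join = conn.execute(
    """
    SELECT c.name, o.order_id
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
    """
).fetchall()

# UNION: rows stacked into one column, duplicate rows removed.
union_rows = conn.execute(
    """
    SELECT customer_id FROM customers
    UNION
    SELECT customer_id FROM orders
    ORDER BY customer_id
    """
).fetchall()

# UNION ALL: rows stacked into one column, duplicates kept.
union_all_rows = conn.execute(
    """
    SELECT customer_id FROM customers
    UNION ALL
    SELECT customer_id FROM orders
    ORDER BY customer_id
    """
).fetchall()

print(inner_join)
print(union_rows)
print(union_all_rows)
conn.close()
```

Customer 3 has no orders, so it is absent from the inner join but still appears in both unions.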
Simplifying SQL with CTE's and windowing functions - Clayton Groom
Too busy to learn the new capabilities of SQL Server? This session will cover several of the new features of the T-SQL language, specifically Common Table Expressions (CTE's) and Windowing Functions. This will be a code-heavy session with examples that you can readily leverage in your solutions.
The focus will be on techniques to shape and manipulate your data for easier consumption by your application, and to leverage your SQL Server to avoid writing code in your application.
A basic to intermediate understanding of T-SQL is required.
Common Mistakes and Missed Optimization Opportunities in SQL - EDB
Silly mistakes in SQL can have disastrous results. Slow queries can choke up your system, and incorrect results can make you reach bad decisions. In this talk I'm presenting common mistakes and missed optimization opportunities in SQL I encountered over the years.
Year on Year comparison by weekday in Power BI
A Step by Step guide to avoid potential errors when using SAMEPERIODLASTYEAR and a simple solution to ensure you compare matching weekdays
https://www.selectdistinct.co.uk/2024/04/16/year-on-year-power-bi/
#PowerBI #SAMEPERIODLASTYEAR #DataViz
Sync Your Slicers in Power BI
A Step by Step guide, to keeping separate slicers in sync across different data sets using slicer groups
https://www.selectdistinct.co.uk/2024/03/12/sync_slicers_in_power_bi/
#PowerBI #Slicers #DataViz
Make your Google Search Console Data more useful with Power BI
Here is a simple step by step guide to taking the daily GSC data, smoothing it into weekly summary data and presenting a nice clean report to show progress without all of the noise that the daily data shows
https://www.selectdistinct.co.uk/2024/03/01/using-google-search-console-data-in-power-bi/
#SEO #DataAnalytics #PowerBI #GSC
Data Lake v Data Warehouse
Do you know the difference?
Data lakes and data warehouses are both storage systems for big data, but they have several key differences.
A data lake is designed to store raw data of all types, including structured, semi-structured, and unstructured data. It’s a great option for companies that benefit from raw data for machine learning.
A data warehouse is designed to be a repository for already structured data to be queried and analysed for very specific purposes. It’s a better fit for companies whose business analysts need to decipher analytics in a structured system.
Understanding these key differences is important for any aspiring data professional
https://www.selectdistinct.co.uk/2024/01/02/difference-between-a-data-lake-and-a-data-warehouse/
#datawarehouse #datalake #dataanalytics
How to create a drop down list in Excel
Use this feature to help get your data input right at source, with built in data validation and in cell drop down
Limit the amount of spelling variations, inconsistencies and errors in Excel
https://www.selectdistinct.co.uk/2024/01/02/dropdown-lists-in-excel/
#Excel #dropdown #datavalidation
Top 5 SQL tips 2023
Presenting our most popular SQL tips for 2023
1. How to calculate running totals in SQL server
2. How to use the LEAD and LAG functions in SQL
3. Group by ROLLUP in SQL
4. Divide by Zero Errors
5. How to split a column in SQL Server
https://www.selectdistinct.co.uk/2023/12/19/top-sql-tips-for-2023/
#SQL #businessanalytics #data #analytics #sqltips
Top 5 Power Bi tips 2023
Presenting our most popular Power BI tips for 2023
1. Show values in Rows
2. Use SAMEPERIODLASTYEAR
3. How to sort dates properly
4. Toggle Measures with SWITCH
5. Advanced TOPN filter
https://www.selectdistinct.co.uk/2023/12/18/top-power-bi-tips-for-2023/
#PowerBI #dataviz #businessanalytics #data #analytics
Music by www.bensound.com
What are CTE's in SQL
WITH Statements?
What the benefits, limitations and Syntax are
https://www.selectdistinct.co.uk/2023/12/05/how-to-use-a-cte/
#SQL #CTE #SQLWITH #DATA
Do you know the difference between calculated columns and measures in Power BI?
In this article, you’ll learn what calculated columns and measures are, how they work, and when to use them.
You’ll also get some tips and best practices for choosing between them.
https://www.selectdistinct.co.uk/2023/11/21/calculated-columns-and-measures-in-power-bi/
#powerBI #measures #calculatedcolumns
Divide by zero errors and how to avoid them
Examples and code samples for SQL, Big Query, Excel, Power BI including DAX and Power Query
https://www.selectdistinct.co.uk/2023/11/01/divide-by-zero-errors/
#dividebyzero #SQL #PowerBI
music by www.bensound.com
How to choose between DAX, Power Query or SQL
to transform data for your Power BI reporting
https://www.selectdistinct.co.uk/2023/10/25/when-to-transform-data/
#powerbi #DAX #PowerQuery
KPIs in Power BI are a great way to focus attention on what matters
This step by step guide shows you how to set them up with tips on their best use
https://www.selectdistinct.co.uk/2023/10/18/power-bi-kpis/
#powerbi #KPIs #dataviz
Need to show the direction of travel on a map in Power BI
We had a client which needed us to do this very thing
This short guide shows how to do it using the Icon Map
https://www.selectdistinct.co.uk/2023/10/11/direction-of-travel-on-a-map-in-power-bi/
#PowerBI #IconMap #businessintelligence
How to combine data tables in DAX in Power BI using the UNION command
This guide shows you how to create a seamless data set from 2 or more tables to make further analysis and reporting much easier
https://www.selectdistinct.co.uk/2023/10/04/union-in-dax/
#PowerBI #DAX #UNION
Combine data sets with APPEND in Power Query
You can use this simple technique to consolidate data from different sources into a single data set to make analysis easier
This is useful if you can't combine the data at source or if you don't have the facility
https://www.selectdistinct.co.uk/2023/09/27/append-data-in-power-query/
#PowerQuery #Append #PowerBI
Connect Power BI to Google BigQuery
Use the public datasets to develop your skills and demonstrate the power of both platforms for FREE
In this example we use the actual wholesale sales data for the US state of Iowa that is one of the public datasets
https://www.selectdistinct.co.uk/2023/09/07/connect-power-bi-to-google-big-query/
This is a great starting point for anyone wanting to build their skills with data that can be refreshed
#PowerBI #BigQuery #PublicData
Easily add subtotals into your queries with the Group by ROLLUP clause in SQL server
We explain the syntax, the logic, and the benefits of using ROLLUP to create subtotals and grand totals in your queries. With examples you can follow
https://www.selectdistinct.co.uk/2023/08/23/group-by-rollup-in-sql-server
#ROLLUP #SQL #DATAANALYTICS
Advanced Top N in Power BI
Here we set up a slicer to define how many Top items we want to see, but importantly classify the rest as 'Others'
This allows us to see the whole picture and focus on the leading items
https://www.selectdistinct.co.uk/2023/07/27/advanced-top-n-filter-power-bi/
#PowerBI #TOPN #DataVisualisation
Power BI comes ready loaded with a wide range of format options
But did you know that you are not limited to the pre-defined options
Some organisations have specific standards for things such as date formats, these can be catered for using custom formats
https://www.selectdistinct.co.uk/2023/07/20/custom-formats-in-power-bi/
#powerbi #dataviz #customformats
You have heard of the 80:20 rule (Pareto)
Power BI has a TOPN function in DAX
This guide shows you how to start using it
https://www.selectdistinct.co.uk/2023/06/28/topn-in-power-bi/
#powerbi #topn #businessintelligence
In the world of data analysis, having the ability to efficiently rank and prioritize information is crucial. This is where the TOPN function in Power BI comes into play. By utilizing this powerful ranking function, analysts and data professionals can gain valuable insights from their datasets.
The TOPN function, short for "top n," allows users to identify and retrieve the top or bottom records based on specified criteria. This function is particularly useful when dealing with large datasets that require quick and accurate analysis.
With Power BI's extensive capabilities, the TOPN function can be utilized through its native DAX (Data Analysis Expressions) formula language. By incorporating this formula into your Power BI reports and dashboards, you can effectively sort and filter data to highlight key trends, outliers, or patterns.
The importance of the TOPN function lies in its ability to streamline decision-making processes by presenting relevant information in a concise manner. Whether you are analysing sales figures, customer satisfaction ratings, or any other dataset, being able to quickly identify the top performers or underperformers can greatly impact strategic decision-making.
In this section, we will delve deeper into understanding how the TOPN function works within Power BI and explore real-world use cases where it can be applied effectively. So let's dive in and unlock the full potential of this essential feature in Power BI!
Opendatabay - Open Data Marketplace.pptx - Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Adjusting primitives for graph : SHORT REPORT / NOTES - Subhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Techniques to optimize the pagerank algorithm usually fall in two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time, and the number of iterations. If a graph has no dangling nodes, pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, no. of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
2. We have a dataset containing sales transactions by customer
But we need to extract a unique list of Customer IDs
3. Here is an example
The select distinct command
returns a list from the data source
with all duplicates removed
select distinct [CustomerID]
from [AdventureWorks2019].[dbo].[vw_Sales_by_Customer]
order by [CustomerID]
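To try the same idea without access to AdventureWorks2019, here is a minimal sketch using Python's built-in sqlite3 module. The sales_by_customer table is a hypothetical stand-in for the vw_Sales_by_Customer view, and its rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the view, with invented rows.
conn.executescript(
    """
    CREATE TABLE sales_by_customer (CustomerID INTEGER, ShipDate TEXT, LineTotal INTEGER);
    INSERT INTO sales_by_customer VALUES
        (11001, '2013-06-27', 20),
        (11000, '2011-06-28', 10),
        (11000, '2013-06-27', 30),
        (11002, '2013-10-10', 40),
        (11001, '2013-06-27', 50);
    """
)

# Without DISTINCT: one entry per sales row.
all_ids = [row[0] for row in conn.execute(
    "SELECT CustomerID FROM sales_by_customer ORDER BY CustomerID")]

# With DISTINCT: one entry per customer.
unique_ids = [row[0] for row in conn.execute(
    "SELECT DISTINCT CustomerID FROM sales_by_customer ORDER BY CustomerID")]

print(all_ids)
print(unique_ids)
conn.close()
```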
4. But suppose we want to know more
This customer has 3 separate
orders, and we want to create a
summary by customer
select [CustomerID], SUM(LineTotal) as TotalSales
, COUNT([shipdate]) as [Count of ShipDates]
, COUNT (distinct [shipdate]) as [Count of Unique ShipDates]
from [AdventureWorks2019].[dbo].[vw_Sales_by_Customer]
where customerID = 11000
group by [CustomerID]
5. Breaking this down
COUNT([shipdate])
as [Count of ShipDates]
This simply returns a count of the non-NULL
shipdate values in the record set, duplicates
included. In this case there are 8 order lines,
so it doesn’t give us the number of separate orders
COUNT (distinct [shipdate])
as
[Count of Unique ShipDates]
Adding ‘distinct’ eliminates the
duplicates and returns the number of
unique ship dates
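The two counts can be compared side by side on a small reproduction. This is a sketch with invented rows: 8 order lines for customer 11000 spread over 3 ship dates, matching the shape described above but not the real AdventureWorks2019 values, and run through Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_by_customer (CustomerID INTEGER, ShipDate TEXT, LineTotal INTEGER)"
)
# Invented data: 8 order lines shipped on 3 different dates.
order_lines = [
    (11000, "2011-06-28", 10),
    (11000, "2013-06-27", 20),
    (11000, "2013-06-27", 30),
    (11000, "2013-06-27", 40),
    (11000, "2013-10-10", 50),
    (11000, "2013-10-10", 60),
    (11000, "2013-10-10", 70),
    (11000, "2013-10-10", 80),
]
conn.executemany("INSERT INTO sales_by_customer VALUES (?, ?, ?)", order_lines)

summary = conn.execute(
    """
    SELECT CustomerID,
           SUM(LineTotal)           AS TotalSales,
           COUNT(ShipDate)          AS CountOfShipDates,
           COUNT(DISTINCT ShipDate) AS CountOfUniqueShipDates
    FROM sales_by_customer
    WHERE CustomerID = 11000
    GROUP BY CustomerID
    """
).fetchone()

print(summary)
conn.close()
```

COUNT(ShipDate) counts every order line, while COUNT(DISTINCT ShipDate) counts each ship date once.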
6. Use Cases
Adding distinct to your queries is a simple way to exclude duplicates from the records
This is useful for tasks such as customer classification and capturing the frequency of purchase
It is also a useful check for duplicates within a data set: compare the count and the count distinct, and if they are
the same then every value counted must be unique
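The duplicate check described above could be wrapped in a small helper. This is a sketch only: count_vs_distinct, the invoices table and its rows are all hypothetical, and the table and column names are interpolated into the SQL, so they should only ever come from trusted code, never from user input.

```python
import sqlite3


def count_vs_distinct(conn, table, column):
    """Return (count, distinct count) of the non-NULL values in one column."""
    # Identifiers cannot be bound as parameters, hence the quoted interpolation.
    query = f'SELECT COUNT("{column}"), COUNT(DISTINCT "{column}") FROM "{table}"'
    total, unique = conn.execute(query).fetchone()
    return total, unique


conn = sqlite3.connect(":memory:")
# Hypothetical table with invented rows.
conn.executescript(
    """
    CREATE TABLE invoices (invoice_no TEXT, customer_id INTEGER);
    INSERT INTO invoices VALUES ('A1', 7), ('A2', 7), ('A3', 8);
    """
)

invoice_counts = count_vs_distinct(conn, "invoices", "invoice_no")
customer_counts = count_vs_distinct(conn, "invoices", "customer_id")

print(invoice_counts)   # equal counts: every invoice number is unique
print(customer_counts)  # counts differ: some customer appears more than once
conn.close()
```

Both counts ignore NULLs, so this check says nothing about rows where the column is empty.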
-- Customers with only one order
select [CustomerID], TotalSales from
(
select [CustomerID], SUM(LineTotal) as TotalSales
, COUNT( [shipdate]) as [Count of ShipDates]
, COUNT (distinct [shipdate]) as [Count of Unique ShipDates]
from [AdventureWorks2019].[dbo].[vw_Sales_by_Customer]
group by [CustomerID]
) a
where [Count of Unique ShipDates] = 1
-- Customers with more than one order
select [CustomerID], TotalSales from
(
select [CustomerID], SUM(LineTotal) as TotalSales
, COUNT( [shipdate]) as [Count of ShipDates]
, COUNT (distinct [shipdate]) as [Count of Unique ShipDates]
from [AdventureWorks2019].[dbo].[vw_Sales_by_Customer]
group by [CustomerID]
) a
where [Count of Unique ShipDates] > 1
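As an alternative to wrapping the summary in a subquery, the same filter can be written with HAVING. The sketch below is one possible variation, not the approach used in the slides. It assumes, as the queries above do, that each distinct ship date represents one order, and it uses a hypothetical sales_by_customer table with invented rows, run through Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the view, with invented rows.
conn.executescript(
    """
    CREATE TABLE sales_by_customer (CustomerID INTEGER, ShipDate TEXT, LineTotal INTEGER);
    INSERT INTO sales_by_customer VALUES
        (11000, '2013-01-01', 100),
        (11000, '2013-01-01', 50),
        (11000, '2013-02-01', 25),
        (11001, '2013-03-01', 40),
        (11001, '2013-03-01', 60),
        (11002, '2013-04-01', 10);
    """
)

# Customers with only one order (one distinct ship date).
single_order = conn.execute(
    """
    SELECT CustomerID, SUM(LineTotal) AS TotalSales
    FROM sales_by_customer
    GROUP BY CustomerID
    HAVING COUNT(DISTINCT ShipDate) = 1
    ORDER BY CustomerID
    """
).fetchall()

# Customers with more than one order.
repeat_order = conn.execute(
    """
    SELECT CustomerID, SUM(LineTotal) AS TotalSales
    FROM sales_by_customer
    GROUP BY CustomerID
    HAVING COUNT(DISTINCT ShipDate) > 1
    ORDER BY CustomerID
    """
).fetchall()

print(single_order)
print(repeat_order)
conn.close()
```

HAVING filters after the grouping, so the distinct count does not need to be selected as a column first.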