Randomizing Data with
Microsoft® SQL Server™
An excerpt on the advantages, use case
scenarios and benefits of data randomization
applied to the real world.
After reading this document you will be able to:
• Understand data randomization
• Understand the use of uniqueidentifier fields and GUIDs
• Insert data into temporary tables
• Sort and manipulate table data randomly
.:: Copyright© 2018 by Wally Pons & Datagrupo ::.
About the author
• A bit about me: Wally M. Pons, an IT professional and solutions provider with over 20 years of experience in software programming and databases. You may contact me as follows:
– Twitter: @datagrupo / @wallypons
– Web: https://www.datagrupo.com
– Email: wpons@datagrupo.com
The software you will need
• You will need the following software:
– Microsoft® SQL Server™ 2008, 2012, 2014, 2016 or 2017, Developer, Standard or a higher edition. The Express editions limit each database to 10GB (4GB in SQL Server 2008), so they won’t be able to handle the 10GB or larger sample databases. You can download the Developer editions free of charge from the following link:
  https://sqlserverupdates.com/
– A Microsoft® Windows installation or virtual machine that supports the Microsoft® SQL Server™ version of your choice.
– A torrent client, to download the test database(s).
– 7-Zip or WinRAR, to decompress the files.
What is randomization?
• Let’s start with my own working definition of randomization: “It’s the process of making something random; more specifically, a technique to produce results that are not controlled by a human decision-making process, thus removing the possibility of a manipulated outcome and also preventing misjudgment.”
• Probably one of the most primitive examples of randomization is coin tossing, which picks one of two outcomes (usually heads or tails); it is also by far the simplest form of randomization.
• Beyond coin tossing, there are other techniques and processes for randomizing information to suit our needs. In this excerpt, I will explain how to accomplish this using Microsoft® SQL Server™ and sample data from the Stack Overflow database, which can be downloaded freely for testing purposes and is provided under the terms of the CC BY-SA 3.0 license.
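• The coin toss can be expressed in T-SQL as a quick illustration (a sketch, not part of the original slides): NEWID() returns a new random GUID for each row, CHECKSUM() hashes it to an integer, and the lowest bit of that integer picks heads or tails, so the split should be close to 50/50.

```sql
-- Five independent coin tosses. NEWID() is evaluated once per row,
-- so every row gets its own random GUID and therefore its own toss.
SELECT v.Toss,
       CASE CHECKSUM(NEWID()) & 1   -- lowest bit of the hashed GUID: 0 or 1
           WHEN 0 THEN 'Heads'
           ELSE 'Tails'
       END AS Result
FROM (VALUES (1), (2), (3), (4), (5)) AS v(Toss);
```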
Getting some sample data
• If you have a sample Microsoft® SQL Server™ database that you can work on, you may skip to the “Selecting the data” section. Such a sample may include some or all of the following data:
– Customer names
– Vendor names
– Inventory items
– Automobile brands and models, or any other type of data collection
• If you don’t have a sample database, you can download the Stack Overflow database from the following link:
https://www.brentozar.com/archive/2015/10/how-to-download-the-stack-overflow-database-via-bittorrent/
Downloading the data
• Before you download from the previous link, make sure you have a torrent client and either 7-Zip or WinRAR (to decompress the file(s)) installed on your machine. Links to those apps are included on the download page for your convenience.
• There are three versions of the Stack Overflow database, and depending on your disk space you can download all of them. For the purpose of this excerpt I downloaded the 10GB and 50GB versions (which more than suffice), but you may also download the 312GB version if you have the disk space available. Here’s a preview of how the downloaded files look:
• You then decompress them into specific folders, as shown next.
Decompressing the data files
• The smaller 1.08GB (1,140,633KB) file contains the 10GB database, which is composed of a primary data file and a transaction log file. The larger 9.43GB (9,898,904KB) file contains the 50GB database, which is a little more complex because it contains the primary data file, the transaction log file and three secondary data files.
• In my case, I created a folder for every file and named each folder according to its usage (your folder structure doesn’t have to match or look the same); for reference, please see the image below:
Attaching the data files
• Although your files are now neatly in place, you need to tell Microsoft® SQL Server™ to use them. This is accomplished with a short script, which you can reuse and modify to match your file locations:
• As you can see, I’m attaching both databases using T-SQL, which also allows me to give the databases more suitable names.
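• The attach script itself appears only as an image in the original. A minimal sketch of the same step follows; the folder paths, file names and the database names StackOverflow10GB and StackOverflow50GB are hypothetical placeholders, so replace them with your own file locations and preferred names:

```sql
-- Attach the 10GB copy: one primary data file (.mdf) and one log file (.ldf).
-- All paths and names below are placeholders, not the author's actual layout.
CREATE DATABASE StackOverflow10GB
ON (FILENAME = N'D:\SQLData\StackOverflow10GB\StackOverflow.mdf'),
   (FILENAME = N'D:\SQLData\StackOverflow10GB\StackOverflow_log.ldf')
FOR ATTACH;
GO

-- Attach the 50GB copy: the primary file, three secondary data files (.ndf)
-- and the log file, each listed explicitly because they live in custom folders.
CREATE DATABASE StackOverflow50GB
ON (FILENAME = N'D:\SQLData\StackOverflow50GB\StackOverflow.mdf'),
   (FILENAME = N'D:\SQLData\StackOverflow50GB\StackOverflow_1.ndf'),
   (FILENAME = N'D:\SQLData\StackOverflow50GB\StackOverflow_2.ndf'),
   (FILENAME = N'D:\SQLData\StackOverflow50GB\StackOverflow_3.ndf'),
   (FILENAME = N'D:\SQLData\StackOverflow50GB\StackOverflow_log.ldf')
FOR ATTACH;
GO
```

SQL Server reads the expected locations of the other files from the primary file, so any file that is not where the primary file expects it has to be listed explicitly, as done here.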
Record counts
• Both databases have the same nine tables, but not the same number of records. Here’s what the 10GB and 50GB database tables look like:
• We will use the “Users” table, since its data is the most meaningful for the purpose of this excerpt.
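• A sketch for getting record counts like the ones in the image, assuming you run it while connected to one of the attached Stack Overflow databases. It reads the row counts SQL Server keeps in its catalog views (sys.tables, sys.schemas, sys.partitions), which is much faster than running COUNT(*) on every table; these counts are approximate, so use COUNT(*) when you need an exact figure.

```sql
-- Row count per user table in the current database, read from partition
-- metadata instead of scanning each table. Counts are approximate.
SELECT s.name + N'.' + t.name AS TableName,
       SUM(p.rows)            AS TotalRows
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
JOIN sys.partitions AS p ON p.object_id = t.object_id
WHERE p.index_id IN (0, 1)   -- heap (0) or clustered index (1); avoids double counting
GROUP BY s.name, t.name
ORDER BY TotalRows DESC;
```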
Selecting the data
• Now let’s look at a sequential sample of the contents of the “Users” table; please note that only specific fields are included in the query shown in the image:
• This sample gives you a better idea of the data we are going to randomize.
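• The query behind the image is not reproduced in the text. A sketch of an equivalent query, assuming the Id, DisplayName, Reputation and Location columns of the Stack Overflow Users table (the exact fields the author selected are not listed):

```sql
-- First 20 users in Id order: a sequential (non-random) sample to compare
-- with the randomized results later on.
SELECT TOP (20) Id, DisplayName, Reputation, Location
FROM dbo.Users
ORDER BY Id;
```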
Filtering the data
• Let’s explore the data in the “Users” table, this time focusing on the “Location” field to get an idea of which locations are used and how often:
• Interestingly, the most frequent location value is NULL, followed by empty values, India, London, United Kingdom, United States, Germany and so forth.
• Now we can choose a location for our
randomization process.
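• A sketch of a query that produces this kind of location ranking (the author’s exact query is only shown as an image):

```sql
-- Number of users per Location value, most frequent first.
-- GROUP BY puts all NULLs into a single group; empty strings ('') form
-- their own separate group.
SELECT TOP (20) Location, COUNT(*) AS UserCount
FROM dbo.Users
GROUP BY Location
ORDER BY UserCount DESC;
```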
Creating unique values
• To randomize our data, we need to assign a random, unique value to each record; sorting by that value then puts the records in a random order. To create these values we will use the “uniqueidentifier” data type, which is 16 bytes in storage length and stores a GUID (globally unique identifier). As text, a GUID is 36 characters long: 32 hexadecimal digits (0 to 9 and A to F) plus 4 hyphens. A valid GUID looks like the following:
69BC9D6C-B22B-476E-AD09-008661F165C3
• And in case you were wondering about duplicate GUIDs: for randomly generated GUIDs, the probability of finding a duplicate within 103 trillion GUIDs is about one in a billion, so you can rest assured that duplicates are extremely unlikely with this approach.
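• In T-SQL, the NEWID() function returns a new random uniqueidentifier on every call. The sketch below shows this, confirms the 36-character text form, and reproduces the one-in-a-billion figure with the common birthday-bound approximation p ≈ n² / (2 × 2¹²²), assuming random (version 4) GUIDs with 122 random bits:

```sql
-- Three fresh random GUIDs; NEWID() is evaluated once per row.
SELECT NEWID() AS RandomGUID
FROM (VALUES (1), (2), (3)) AS v(n);

-- The text form of a uniqueidentifier is always 36 characters long.
SELECT LEN(CONVERT(char(36), NEWID())) AS GuidTextLength;  -- returns 36

-- Birthday-bound estimate of at least one duplicate among n random GUIDs,
-- assuming 122 random bits per GUID: p ≈ n^2 / (2 * 2^122).
DECLARE @n float = 103e12;  -- 103 trillion GUIDs
SELECT @n * @n / (2 * POWER(CAST(2 AS float), 122)) AS ApproxDuplicateProbability;
-- about 1E-09, i.e. roughly one in a billion
```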
Creating a Temporary Table
• Now we’re going to create a temporary table into which we will insert a GUID and data for randomization purposes, then sort it and display it.
• The temporary table has 5 fields (RandomGUID, DisplayName, CurrentReputation, Location, Id); the script is as follows:
• Please note that this is a global temporary table (its name starts with ##), which means it is available to all sessions on your SQL Server instance, not just yours. If you wish to keep the table accessible only to your session, remove one # sign from the name to make it a local temporary table.
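• A sketch of the table definition, assuming the hypothetical name ##RandomUsers (the original name is only visible in the image). The five column names come from the text above; the data types are assumptions chosen to match the Stack Overflow Users table, and the DEFAULT NEWID() reflects the default value on RandomGUID that this excerpt describes in the “Displaying Random Results” section:

```sql
-- Global temporary table (## prefix): visible to every session on the instance,
-- dropped automatically once the creating session ends and no other session
-- is still using it. A single # would make it local to your session.
CREATE TABLE ##RandomUsers
(
    RandomGUID        uniqueidentifier NOT NULL DEFAULT NEWID(), -- filled in automatically
    DisplayName       nvarchar(40)     NULL,
    CurrentReputation int              NULL,
    Location          nvarchar(100)    NULL,
    Id                int              NOT NULL
);
```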
Selecting, Sorting and Inserting Data into the Temporary Table
• Once the table has been created, we must insert data into it; in this case we will insert a selected portion of the data based on location. First we write a script specifying which fields will be affected, and then we perform the insert as shown in the script below:
• For this example I have chosen the location ‘San Francisco, CA’, but you may choose any other location you wish. I now have a temporary table with 4,465 records in it, which can be sorted by the RandomGUID field for random results.
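• A sketch of the insert, assuming a global temporary table named ##RandomUsers (a hypothetical name) with the five fields listed earlier, RandomGUID having a DEFAULT NEWID() value, and the Reputation column of dbo.Users feeding CurrentReputation:

```sql
-- Copy the users from one location into the temp table. RandomGUID is not in
-- the column list, so its DEFAULT NEWID() gives every inserted row a new GUID.
INSERT INTO ##RandomUsers (DisplayName, CurrentReputation, Location, Id)
SELECT DisplayName, Reputation, Location, Id
FROM dbo.Users
WHERE Location = N'San Francisco, CA';

-- How many rows were copied; the count depends on which copy of the
-- database you use (the author reports 4,465).
SELECT COUNT(*) AS InsertedRows FROM ##RandomUsers;
```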
Displaying Random Results
• As you may (or may not) have noticed, the RandomGUID field is not listed in the previous INSERT and SELECT statement that populated our temporary table. This is because that field has a default value, which creates a GUID automatically for every record you insert.
• This is what we will use to randomize results from the table, by selecting the top 10 records ordered by that field.
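• A sketch of that query, again assuming the hypothetical ##RandomUsers table with the five fields listed earlier:

```sql
-- Ten random users: the GUIDs were assigned at random on insert,
-- so sorting by them yields a random order.
SELECT TOP (10) DisplayName, CurrentReputation, Location, Id
FROM ##RandomUsers
ORDER BY RandomGUID;
```

Because the GUIDs are stored in the table, running only this SELECT again returns the same ten records; new GUIDs, and therefore a new random order, appear only when the records are inserted again.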
More Random Results
• At the end of our last script we added a DROP TABLE command, which deletes the temporary table; you may omit it if you or someone else is going to use the table.
• The script image to the right performs the whole process of creating, inserting, displaying and dropping the table, which is useful for multiple runs with varying results.
• On the next slide I will show two
results from this script.
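• A sketch of such an all-in-one script, assuming the hypothetical table name ##RandomUsers, column types matching the Stack Overflow Users table, and that you run it in the attached Stack Overflow database. The OBJECT_ID check at the top removes a leftover copy of the table; it is used instead of DROP TABLE IF EXISTS, which requires SQL Server 2016 or later:

```sql
-- Whole process in one batch: create, fill, display, drop. Each run inserts
-- the rows again, so each run gets fresh GUIDs and a new random top 10.
IF OBJECT_ID(N'tempdb..##RandomUsers') IS NOT NULL
    DROP TABLE ##RandomUsers;   -- clean up a leftover copy from an earlier run

CREATE TABLE ##RandomUsers
(
    RandomGUID        uniqueidentifier NOT NULL DEFAULT NEWID(),
    DisplayName       nvarchar(40)     NULL,
    CurrentReputation int              NULL,
    Location          nvarchar(100)    NULL,
    Id                int              NOT NULL
);

INSERT INTO ##RandomUsers (DisplayName, CurrentReputation, Location, Id)
SELECT DisplayName, Reputation, Location, Id
FROM dbo.Users
WHERE Location = N'San Francisco, CA';

SELECT TOP (10) DisplayName, CurrentReputation, Location, Id
FROM ##RandomUsers
ORDER BY RandomGUID;

DROP TABLE ##RandomUsers;   -- omit this if others still need the table
```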
Random Results Examples
• For better results, you may use a larger number of records to increase the randomization possibilities.
Use of Random Results
• One good purpose of randomizing data this way is to select (randomly) one or all of the following:
1. Jury selection
2. Volunteers
3. Responsible assignees
4. Group leaders
5. Employees that will attend a SQL seminar in Vegas (you wish!)
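• For one-off picks like these, the same idea also works without a temporary table, by ordering on NEWID() directly in the query (a variant not shown in the original slides). A sketch, assuming a hypothetical jury of 12 people drawn from the same location:

```sql
-- Pick 12 random users from one location in a single statement.
-- NEWID() is evaluated per row, so each execution returns a new draw.
SELECT TOP (12) Id, DisplayName, Reputation
FROM dbo.Users
WHERE Location = N'San Francisco, CA'
ORDER BY NEWID();
```

The temporary-table approach keeps the generated GUIDs, so a draw can be reviewed again until the table is dropped; the single-query version is shorter but produces a new draw on every execution.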
• I hope you find this excerpt useful. Please share it and practice the gift of knowledge; it doesn’t matter if it’s one line of code or two thousand lines. Thanks!

More Related Content

What's hot

Mssql database repair when DBCC CHECKDB fails
Mssql database repair when DBCC CHECKDB failsMssql database repair when DBCC CHECKDB fails
Mssql database repair when DBCC CHECKDB failsmssqldatabase repair
 
Chapter 5 design of keyvalue databses from nosql for mere mortals
Chapter 5 design of keyvalue databses from nosql for mere mortalsChapter 5 design of keyvalue databses from nosql for mere mortals
Chapter 5 design of keyvalue databses from nosql for mere mortalsnehabsairam
 
Fundamental of computer
Fundamental of computerFundamental of computer
Fundamental of computerMousumi Biswas
 
php databse handling
php databse handlingphp databse handling
php databse handlingkunj desai
 
SQL Database Performance Tuning for Developers
SQL Database Performance Tuning for DevelopersSQL Database Performance Tuning for Developers
SQL Database Performance Tuning for DevelopersBRIJESH KUMAR
 
Chapter 4 terminolgy of keyvalue databses from nosql for mere mortals
Chapter 4 terminolgy of keyvalue databses from nosql for mere mortalsChapter 4 terminolgy of keyvalue databses from nosql for mere mortals
Chapter 4 terminolgy of keyvalue databses from nosql for mere mortalsnehabsairam
 
S3 l5 db2 - process model
S3 l5   db2 - process modelS3 l5   db2 - process model
S3 l5 db2 - process modelMohammad Khan
 
Chapter 8(designing of documnt databases)no sql for mere mortals
Chapter 8(designing of documnt databases)no sql for mere mortalsChapter 8(designing of documnt databases)no sql for mere mortals
Chapter 8(designing of documnt databases)no sql for mere mortalsnehabsairam
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLRamakant Soni
 
Chapter 7(documnet databse termininology) no sql for mere mortals
Chapter 7(documnet databse termininology) no sql for mere mortalsChapter 7(documnet databse termininology) no sql for mere mortals
Chapter 7(documnet databse termininology) no sql for mere mortalsnehabsairam
 
S3 l6 db2 - memory model
S3 l6   db2 - memory modelS3 l6   db2 - memory model
S3 l6 db2 - memory modelMohammad Khan
 
Raw Hard Drive Recovery
Raw Hard Drive RecoveryRaw Hard Drive Recovery
Raw Hard Drive RecoveryYodot
 
Entourage Repair
 Entourage Repair  Entourage Repair
Entourage Repair smith bush
 
Overview of MongoDB and Other Non-Relational Databases
Overview of MongoDB and Other Non-Relational DatabasesOverview of MongoDB and Other Non-Relational Databases
Overview of MongoDB and Other Non-Relational DatabasesAndrew Kandels
 
Project Presentation Final
Project Presentation FinalProject Presentation Final
Project Presentation FinalDhritiman Halder
 
S3 l4 db2 environment - databases
S3 l4  db2 environment - databasesS3 l4  db2 environment - databases
S3 l4 db2 environment - databasesMohammad Khan
 

What's hot (20)

Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
Mssql database repair when DBCC CHECKDB fails
Mssql database repair when DBCC CHECKDB failsMssql database repair when DBCC CHECKDB fails
Mssql database repair when DBCC CHECKDB fails
 
Databse management system
Databse management systemDatabse management system
Databse management system
 
Chapter 5 design of keyvalue databses from nosql for mere mortals
Chapter 5 design of keyvalue databses from nosql for mere mortalsChapter 5 design of keyvalue databses from nosql for mere mortals
Chapter 5 design of keyvalue databses from nosql for mere mortals
 
Fundamental of computer
Fundamental of computerFundamental of computer
Fundamental of computer
 
php databse handling
php databse handlingphp databse handling
php databse handling
 
SQL Database Performance Tuning for Developers
SQL Database Performance Tuning for DevelopersSQL Database Performance Tuning for Developers
SQL Database Performance Tuning for Developers
 
Chapter 4 terminolgy of keyvalue databses from nosql for mere mortals
Chapter 4 terminolgy of keyvalue databses from nosql for mere mortalsChapter 4 terminolgy of keyvalue databses from nosql for mere mortals
Chapter 4 terminolgy of keyvalue databses from nosql for mere mortals
 
S3 l5 db2 - process model
S3 l5   db2 - process modelS3 l5   db2 - process model
S3 l5 db2 - process model
 
SQL 2005 Memory Module
SQL 2005 Memory ModuleSQL 2005 Memory Module
SQL 2005 Memory Module
 
Chapter 8(designing of documnt databases)no sql for mere mortals
Chapter 8(designing of documnt databases)no sql for mere mortalsChapter 8(designing of documnt databases)no sql for mere mortals
Chapter 8(designing of documnt databases)no sql for mere mortals
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQL
 
Chapter 7(documnet databse termininology) no sql for mere mortals
Chapter 7(documnet databse termininology) no sql for mere mortalsChapter 7(documnet databse termininology) no sql for mere mortals
Chapter 7(documnet databse termininology) no sql for mere mortals
 
S3 l6 db2 - memory model
S3 l6   db2 - memory modelS3 l6   db2 - memory model
S3 l6 db2 - memory model
 
Raw Hard Drive Recovery
Raw Hard Drive RecoveryRaw Hard Drive Recovery
Raw Hard Drive Recovery
 
Teradata a z
Teradata a zTeradata a z
Teradata a z
 
Entourage Repair
 Entourage Repair  Entourage Repair
Entourage Repair
 
Overview of MongoDB and Other Non-Relational Databases
Overview of MongoDB and Other Non-Relational DatabasesOverview of MongoDB and Other Non-Relational Databases
Overview of MongoDB and Other Non-Relational Databases
 
Project Presentation Final
Project Presentation FinalProject Presentation Final
Project Presentation Final
 
S3 l4 db2 environment - databases
S3 l4  db2 environment - databasesS3 l4  db2 environment - databases
S3 l4 db2 environment - databases
 

Similar to Randomizing Data with Microsoft SQL Server

Dutch PHP Conference 2021 - MySQL Indexes and Histograms
Dutch PHP Conference 2021 - MySQL Indexes and HistogramsDutch PHP Conference 2021 - MySQL Indexes and Histograms
Dutch PHP Conference 2021 - MySQL Indexes and HistogramsDave Stokes
 
Confoo 2021 - MySQL Indexes & Histograms
Confoo 2021 - MySQL Indexes & HistogramsConfoo 2021 - MySQL Indexes & Histograms
Confoo 2021 - MySQL Indexes & HistogramsDave Stokes
 
Longhorn PHP - MySQL Indexes, Histograms, Locking Options, and Other Ways to ...
Longhorn PHP - MySQL Indexes, Histograms, Locking Options, and Other Ways to ...Longhorn PHP - MySQL Indexes, Histograms, Locking Options, and Other Ways to ...
Longhorn PHP - MySQL Indexes, Histograms, Locking Options, and Other Ways to ...Dave Stokes
 
MySQL Indexes and Histograms - RMOUG Training Days 2022
MySQL Indexes and Histograms - RMOUG Training Days 2022MySQL Indexes and Histograms - RMOUG Training Days 2022
MySQL Indexes and Histograms - RMOUG Training Days 2022Dave Stokes
 
Open Source 1010 and Quest InSync presentations March 30th, 2021 on MySQL Ind...
Open Source 1010 and Quest InSync presentations March 30th, 2021 on MySQL Ind...Open Source 1010 and Quest InSync presentations March 30th, 2021 on MySQL Ind...
Open Source 1010 and Quest InSync presentations March 30th, 2021 on MySQL Ind...Dave Stokes
 
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...Amazon Web Services
 
Sap abap
Sap abapSap abap
Sap abapnrj10
 
Lsmw ppt in SAP ABAP
Lsmw ppt in SAP ABAPLsmw ppt in SAP ABAP
Lsmw ppt in SAP ABAPAabid Khan
 
How to Think Like the SQL Server Engine
How to Think Like the SQL Server EngineHow to Think Like the SQL Server Engine
How to Think Like the SQL Server EngineBrent Ozar
 
An In-Depth Guide for Cleaning Server Log Data in KNIME
An In-Depth Guide for Cleaning Server Log Data in KNIMEAn In-Depth Guide for Cleaning Server Log Data in KNIME
An In-Depth Guide for Cleaning Server Log Data in KNIMERanq.io
 
PHP UK 2020 Tutorial: MySQL Indexes, Histograms And other ways To Speed Up Yo...
PHP UK 2020 Tutorial: MySQL Indexes, Histograms And other ways To Speed Up Yo...PHP UK 2020 Tutorial: MySQL Indexes, Histograms And other ways To Speed Up Yo...
PHP UK 2020 Tutorial: MySQL Indexes, Histograms And other ways To Speed Up Yo...Dave Stokes
 
SharePoint 2013 Performance Analysis - Robi Vončina
SharePoint 2013 Performance Analysis - Robi VončinaSharePoint 2013 Performance Analysis - Robi Vončina
SharePoint 2013 Performance Analysis - Robi VončinaSPC Adriatics
 
Big Data: Getting off to a fast start with Big SQL (World of Watson 2016 sess...
Big Data: Getting off to a fast start with Big SQL (World of Watson 2016 sess...Big Data: Getting off to a fast start with Big SQL (World of Watson 2016 sess...
Big Data: Getting off to a fast start with Big SQL (World of Watson 2016 sess...Cynthia Saracco
 
cPanel now supports MySQL 8.0 - My Top Seven Features
cPanel now supports MySQL 8.0 - My Top Seven FeaturescPanel now supports MySQL 8.0 - My Top Seven Features
cPanel now supports MySQL 8.0 - My Top Seven FeaturesDave Stokes
 
What Your Database Query is Really Doing
What Your Database Query is Really DoingWhat Your Database Query is Really Doing
What Your Database Query is Really DoingDave Stokes
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_SummaryHiram Fleitas León
 
Machine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy CrossMachine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy CrossAndrew Flatters
 
Dynamics CRM high volume systems - lessons from the field
Dynamics CRM high volume systems - lessons from the fieldDynamics CRM high volume systems - lessons from the field
Dynamics CRM high volume systems - lessons from the fieldStéphane Dorrekens
 
Basic Application Performance Optimization Techniques (Backend)
Basic Application Performance Optimization Techniques (Backend)Basic Application Performance Optimization Techniques (Backend)
Basic Application Performance Optimization Techniques (Backend)Klas Berlič Fras
 

Similar to Randomizing Data with Microsoft SQL Server (20)

Dutch PHP Conference 2021 - MySQL Indexes and Histograms
Dutch PHP Conference 2021 - MySQL Indexes and HistogramsDutch PHP Conference 2021 - MySQL Indexes and Histograms
Dutch PHP Conference 2021 - MySQL Indexes and Histograms
 
Confoo 2021 - MySQL Indexes & Histograms
Confoo 2021 - MySQL Indexes & HistogramsConfoo 2021 - MySQL Indexes & Histograms
Confoo 2021 - MySQL Indexes & Histograms
 
Longhorn PHP - MySQL Indexes, Histograms, Locking Options, and Other Ways to ...
Longhorn PHP - MySQL Indexes, Histograms, Locking Options, and Other Ways to ...Longhorn PHP - MySQL Indexes, Histograms, Locking Options, and Other Ways to ...
Longhorn PHP - MySQL Indexes, Histograms, Locking Options, and Other Ways to ...
 
MySQL Indexes and Histograms - RMOUG Training Days 2022
MySQL Indexes and Histograms - RMOUG Training Days 2022MySQL Indexes and Histograms - RMOUG Training Days 2022
MySQL Indexes and Histograms - RMOUG Training Days 2022
 
Open Source 1010 and Quest InSync presentations March 30th, 2021 on MySQL Ind...
Open Source 1010 and Quest InSync presentations March 30th, 2021 on MySQL Ind...Open Source 1010 and Quest InSync presentations March 30th, 2021 on MySQL Ind...
Open Source 1010 and Quest InSync presentations March 30th, 2021 on MySQL Ind...
 
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
 
Sap abap
Sap abapSap abap
Sap abap
 
Lsmw ppt in SAP ABAP
Lsmw ppt in SAP ABAPLsmw ppt in SAP ABAP
Lsmw ppt in SAP ABAP
 
Ibm redbook
Ibm redbookIbm redbook
Ibm redbook
 
How to Think Like the SQL Server Engine
How to Think Like the SQL Server EngineHow to Think Like the SQL Server Engine
How to Think Like the SQL Server Engine
 
An In-Depth Guide for Cleaning Server Log Data in KNIME
An In-Depth Guide for Cleaning Server Log Data in KNIMEAn In-Depth Guide for Cleaning Server Log Data in KNIME
An In-Depth Guide for Cleaning Server Log Data in KNIME
 
PHP UK 2020 Tutorial: MySQL Indexes, Histograms And other ways To Speed Up Yo...
PHP UK 2020 Tutorial: MySQL Indexes, Histograms And other ways To Speed Up Yo...PHP UK 2020 Tutorial: MySQL Indexes, Histograms And other ways To Speed Up Yo...
PHP UK 2020 Tutorial: MySQL Indexes, Histograms And other ways To Speed Up Yo...
 
SharePoint 2013 Performance Analysis - Robi Vončina
SharePoint 2013 Performance Analysis - Robi VončinaSharePoint 2013 Performance Analysis - Robi Vončina
SharePoint 2013 Performance Analysis - Robi Vončina
 
Big Data: Getting off to a fast start with Big SQL (World of Watson 2016 sess...
Big Data: Getting off to a fast start with Big SQL (World of Watson 2016 sess...Big Data: Getting off to a fast start with Big SQL (World of Watson 2016 sess...
Big Data: Getting off to a fast start with Big SQL (World of Watson 2016 sess...
 
cPanel now supports MySQL 8.0 - My Top Seven Features
cPanel now supports MySQL 8.0 - My Top Seven FeaturescPanel now supports MySQL 8.0 - My Top Seven Features
cPanel now supports MySQL 8.0 - My Top Seven Features
 
What Your Database Query is Really Doing
What Your Database Query is Really DoingWhat Your Database Query is Really Doing
What Your Database Query is Really Doing
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
 
Machine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy CrossMachine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy Cross
 
Dynamics CRM high volume systems - lessons from the field
Dynamics CRM high volume systems - lessons from the fieldDynamics CRM high volume systems - lessons from the field
Dynamics CRM high volume systems - lessons from the field
 
Basic Application Performance Optimization Techniques (Backend)
Basic Application Performance Optimization Techniques (Backend)Basic Application Performance Optimization Techniques (Backend)
Basic Application Performance Optimization Techniques (Backend)
 

Recently uploaded

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 

Recently uploaded (20)

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx

Randomizing Data with Microsoft SQL Server

  • 1. Randomizing Data with Microsoft® SQL Server™: an excerpt on the advantages, use-case scenarios and benefits of data randomization applied to the real world. After reading this document you will be able to: • Understand data randomization • Use Unique Identifier fields and GUIDs • Insert data into temporary tables • Sort and manipulate table data randomly. Copyright © 2018 by Wally Pons & Datagrupo.
  • 2. About the author  A bit about me:  Wally M. Pons, IT professional with over 20 years of experience in software programming, databases and solutions provider, you may contact me as follows:  Twitter: @datagrupo / @wallypons  Web: Https://www.datagrupo.com  Email: wpons@datagrupo.com
  • 3. The software you will need  You will need the following software:  Microsoft® SQL Server™ either 2008/2012/2014/2016 or 2017 Developer, Standard or greater edition. The Express editions won’t be able to handle a 10GB or bigger database. You can download the Developer editions free from the following link:  https://sqlserverupdates.com/  Microsoft® Windows installation or virtual machine that supports the Microsoft® SQL Server™ version of your choice.  Torrent client, this is to download our test database(s).  7-zip or WinRAR for decompressing files.
  • 4. What is randomization? • Let’s start with my working definition of randomization: “It’s the process of making something random; more specifically, a technique that produces results not controlled by a human decision-making process, thus removing the possibility of a manipulated outcome and also preventing misjudgment.” • Probably one of the most primitive examples of randomization is coin tossing, which involves one of two outcomes (usually heads or tails). It is also by far the simplest form of randomization. • Beyond that, there are other techniques and processes for randomizing information to suit our needs. Here I will explain how to accomplish this with Microsoft® SQL Server™ and some sample data from the Stack Overflow database. This database can be downloaded for free for testing purposes and is provided under the CC BY-SA 3.0 license.
  • 5. Getting some sample data • If you already have a sample Microsoft® SQL Server™ database that you can work on, you may skip to the “Selecting the data” section. Such a sample might include some or all of the following data: – Customer names – Vendor names – Inventory items – Automobile brands & models, or any other type of data collection • If you don’t have a sample database, you can download the Stack Overflow database from the following link: https://www.brentozar.com/archive/2015/10/how-to-download-the-stack-overflow-database-via-bittorrent/
  • 6. Downloading the data • Before you download from the previous link, make sure you have a torrent client installed on your machine. You also need either 7-Zip or WinRAR to decompress the files. Links to those apps are included on the download page for your convenience. • There are three versions of the Stack Overflow database, and depending on your disk space you can download all of them. For the purpose of this excerpt I downloaded the 10GB and 50GB versions, which are more than sufficient. You may also download the 312GB version if you have the disk space available. Here’s a preview of how the downloaded files look: • Then decompress them into their own folders, as shown next.
  • 7. Decompressing the data files • The smaller 1.08GB (1,140,633KB) file contains the 10GB database, which consists of a primary data file and a transaction log file. The other 9.43GB (9,898,904KB) file contains the 50GB database. This one is a little more complex, because it contains the primary data file, the transaction log file and three secondary data files. • In my case I created a folder for every file and named each folder according to its usage (your structure doesn’t have to match mine). For reference, please see the image below:
  • 8. Attaching the data files • Although your files are now neatly in place, you still need to tell Microsoft® SQL Server™ to use them. This is done with a short script, which you can reuse and modify to match your file locations: • As you can see, I’m attaching both databases using T-SQL, which also lets me give each database a more suitable name.
  • 9. Record counts • Both databases have the same nine tables, but not the same number of records. Here’s what the 10GB and 50GB database tables look like: • We will be using the “Users” table, since its data is the most meaningful for the purpose of this excerpt.
  • 10. Selecting the data • Now we’re going to look at a sequential sample of the contents of the “Users” table. Please note that only specific fields are included in the query image: • This sample gives you a better idea of the data we are going to analyze for randomization.
  • 11. Filtering the data • Let’s filter the data in the “Users” table. In this case we will query the “Location” field to get an idea of how many locations are used: • Interestingly, the most used location value is NULL. It is followed by an empty value, then India, London, United Kingdom, United States, Germany and so forth. • Now we can choose a location for our randomization process.
  • 12. Creating unique values • To randomize our data we need to assign a unique value to each record. To create these values we will use the “uniqueidentifier” data type, which takes 16 bytes of storage and stores a GUID (globally unique identifier). Written out, a GUID is 36 characters long: 32 hexadecimal digits (0 to 9 and A to F) separated by 4 hyphens. A valid GUID looks like the following: 69BC9D6C-B22B-476E-AD09-008661F165C3 • In case you were wondering about duplicate GUIDs: for randomly generated GUIDs, the probability of finding a duplicate among 103 trillion of them is about one in a billion. You can rest assured that duplicates are extremely unlikely with this approach.
  • 13. Creating a Temporary Table • Now we’re going to create a temporary table, insert a GUID and the data for randomization purposes into it, then sort it and display it. • The temporary table has 5 fields (RandomGUID, DisplayName, CurrentReputation, Location, Id). The script is as follows: • Please note that this is a global temporary table, which means it is available to all sessions on your current SQL Server instance, not just yours. If you want the table to be accessible only from your own session, remove one # sign from its name to make it a local temporary table.
  • 14. Selecting, Sorting and Inserting Data into the Temporary Table • Once the table has been created we must insert data into it. In this case we will insert a selected portion of the data based on location. The script below first specifies which fields will be affected and then performs the insert: • For this example I have chosen the location ‘San Francisco, CA’, but you may choose any other location you wish. Now I have a temporary table with 4,465 records in it, and they can be sorted by the RandomGUID field for random results.
  • 15. Displaying Random Results • As you may (or may not) have noticed, the RandomGUID field is not listed in the previous insert and select statement that populated our temporary table. This is because that field has a default value that creates a GUID automatically for every record you insert. • We will use this to randomize the results from the table by selecting the top 10 records ordered by that field.
  • 16. More Random Results • At the end of our last script we added a ‘DROP TABLE’ command to delete the temporary table, but you may omit it if you or someone else is going to use the table. • The script in the image to the right performs the whole process of creating, inserting, displaying and dropping the table, which is useful for multiple runs with varying results. • On the next slide I will show two results from this script.
  • 17. Random Results Examples • For better results, use a larger number of records, which increases the number of possible random outcomes.
  • 18. Use of Random Results • Randomizing data this way is useful for selecting any or all of the following at random: 1. Jury members 2. Volunteers 3. Responsible assignees 4. Group leaders 5. Employees who will attend a SQL seminar in Vegas (you wish!) • I hope you find this excerpt useful. Please share it and practice the gift of knowledge, whether it’s one line of code or two thousand lines. Thanks!
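Slide 11 shows the location breakdown only as an image. A grouped count of that kind can be sketched with Python's built-in `sqlite3` module, used here as a stand-in for SQL Server. The rows below are hypothetical and only mimic the NULL/empty pattern the slide describes. The `make_users` and `location_counts` helpers are illustrative names, not the author's script.

```python
import sqlite3

# Hypothetical sample rows (Id, DisplayName, Reputation, Location) standing in
# for the Stack Overflow Users table; not real Stack Overflow data.
SAMPLE_USERS = [
    (1, "alice", 101, None),
    (2, "bob", 55, None),
    (3, "carol", 230, None),
    (4, "dave", 12, ""),
    (5, "erin", 980, ""),
    (6, "frank", 7, "India"),
    (7, "grace", 410, "London"),
    (8, "heidi", 66, "San Francisco, CA"),
]


def make_users(rows):
    """Create an in-memory Users table filled with the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Users (Id INTEGER PRIMARY KEY, DisplayName TEXT, "
        "Reputation INTEGER, Location TEXT)"
    )
    conn.executemany("INSERT INTO Users VALUES (?, ?, ?, ?)", rows)
    return conn


def location_counts(conn):
    """Count users per Location, most used first.

    NULL and '' form separate groups; ties are broken by Location
    so the output order is repeatable.
    """
    return conn.execute(
        "SELECT Location, COUNT(*) AS UserCount FROM Users "
        "GROUP BY Location ORDER BY UserCount DESC, Location"
    ).fetchall()


conn = make_users(SAMPLE_USERS)
for location, count in location_counts(conn):
    print(repr(location), count)
```

NULL and the empty string are different values, so they end up as separate groups. That is why both can sit at the top of a list like the one on slide 11.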
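The GUID figures on slide 12 can be checked with a few lines of standard-library Python. This sketch assumes the GUIDs are random version 4 values, which is what the one-in-a-billion figure refers to. The sample GUID from the slide is one: its third group starts with 4. The birthday-bound formula is a standard approximation and is not taken from the original slides.

```python
import math
import re
import uuid

# 32 hexadecimal digits grouped 8-4-4-4-12 and joined by 4 hyphens: 36 characters.
GUID_PATTERN = re.compile(
    r"[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}"
)


def is_guid(text):
    """True if text is a GUID in its 36-character string form."""
    return GUID_PATTERN.fullmatch(text) is not None


def duplicate_probability(n, random_bits=122):
    """Birthday-bound estimate that n random GUIDs contain at least one duplicate.

    A version 4 GUID has 122 random bits (the other 6 encode version and variant),
    so p is approximately 1 - exp(-n^2 / (2 * 2^122)); expm1 keeps precision for tiny p.
    """
    space = 2 ** random_bits
    return -math.expm1(-(n * n) / (2 * space))


sample = "69BC9D6C-B22B-476E-AD09-008661F165C3"  # the GUID shown on slide 12
parsed = uuid.UUID(sample)
print(is_guid(sample), len(sample))  # True 36
print(len(parsed.bytes), parsed.version)  # 16 4
print(f"{duplicate_probability(103e12):.2e}")  # 9.98e-10, about one in a billion
```

The estimate grows with the square of the number of GUIDs. For a table the size of the 4,465-row sample on slide 14, the chance of a duplicate is effectively zero.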
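The temporary table from slide 13 can be approximated in SQLite through Python's `sqlite3` module. Two substitutions in this sketch are assumptions, not SQL Server behavior:

- SQLite has no `uniqueidentifier` type or `NEWID()`, so the default is a random 128-bit value written as 32 hex characters (`lower(hex(randomblob(16)))`).
- SQLite `TEMP` tables are always private to one connection, which corresponds to a local `#` table; there is no `##` global equivalent.

The table name `RandomUsers` is hypothetical, because the slide's script is shown only as an image.

```python
import os
import sqlite3
import tempfile

# Hypothetical table name; columns follow slide 13.
# The SQL Server counterpart of the first column would be:
#   RandomGUID uniqueidentifier DEFAULT NEWID()
CREATE_RANDOM_USERS = """
CREATE TEMP TABLE RandomUsers (
    RandomGUID TEXT NOT NULL DEFAULT (lower(hex(randomblob(16)))),
    DisplayName TEXT,
    CurrentReputation INTEGER,
    Location TEXT,
    Id INTEGER
)
"""


def can_see(conn, table):
    """Return True if this connection can query the table."""
    try:
        conn.execute(f"SELECT 1 FROM {table} LIMIT 1")
        return True
    except sqlite3.OperationalError:  # "no such table"
        return False


def demo():
    """Create the TEMP table in one session and check what a second session sees."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "demo.db")
        session_a = sqlite3.connect(path)
        session_b = sqlite3.connect(path)
        try:
            session_a.execute(CREATE_RANDOM_USERS)
            # RandomGUID is omitted, so its DEFAULT fills it in.
            session_a.execute(
                "INSERT INTO RandomUsers (DisplayName, CurrentReputation, Location, Id) "
                "VALUES ('alice', 101, 'San Francisco, CA', 1)"
            )
            guid = session_a.execute("SELECT RandomGUID FROM RandomUsers").fetchone()[0]
            return guid, can_see(session_a, "RandomUsers"), can_see(session_b, "RandomUsers")
        finally:
            session_a.close()
            session_b.close()


guid, visible_to_a, visible_to_b = demo()
print(guid, len(guid))  # a random 32-character hex string, 32
print(visible_to_a, visible_to_b)  # True False
```

The second connection cannot see the table, which mirrors a local `#` temporary table rather than the global `##` table used on the slide.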
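Slides 14 to 16 describe the full cycle. The process inserts one location's users, reads the top 10 ordered by RandomGUID, and then drops the table. That cycle can be sketched end to end with the same SQLite stand-in. The sample rows, the `RandomUsers` table name and the `fill_random_users`, `top_random` and `random_pick` helpers are all hypothetical. SQLite uses `LIMIT` where SQL Server uses `SELECT TOP 10`, and a random hex default where SQL Server uses `NEWID()`.

```python
import sqlite3

CREATE_RANDOM_USERS = """
CREATE TEMP TABLE RandomUsers (
    RandomGUID TEXT NOT NULL DEFAULT (lower(hex(randomblob(16)))),
    DisplayName TEXT,
    CurrentReputation INTEGER,
    Location TEXT,
    Id INTEGER
)
"""

# RandomGUID is not listed, so every inserted row gets a fresh random default.
INSERT_LOCATION = """
INSERT INTO RandomUsers (DisplayName, CurrentReputation, Location, Id)
SELECT DisplayName, Reputation, Location, Id
FROM Users
WHERE Location = ?
"""

# SQL Server would express this as SELECT TOP (10) ... ORDER BY RandomGUID.
TOP_RANDOM = """
SELECT DisplayName, CurrentReputation, Location, Id
FROM RandomUsers
ORDER BY RandomGUID
LIMIT ?
"""


def make_users():
    """Hypothetical Users data: 40 users in San Francisco, 10 in London."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Users (Id INTEGER PRIMARY KEY, DisplayName TEXT, "
        "Reputation INTEGER, Location TEXT)"
    )
    rows = [
        (i, f"user{i}", i * 10, "San Francisco, CA" if i <= 40 else "London")
        for i in range(1, 51)
    ]
    conn.executemany("INSERT INTO Users VALUES (?, ?, ?, ?)", rows)
    return conn


def fill_random_users(conn, location):
    """Create the temp table, copy one location into it, and return its row count."""
    conn.execute(CREATE_RANDOM_USERS)
    conn.execute(INSERT_LOCATION, (location,))
    return conn.execute("SELECT COUNT(*) FROM RandomUsers").fetchone()[0]


def top_random(conn, n=10):
    """Read n rows ordered by the stored random key."""
    return conn.execute(TOP_RANDOM, (n,)).fetchall()


def random_pick(conn, location, n=10):
    """One full run: create and fill the temp table, read n rows, drop the table."""
    try:
        fill_random_users(conn, location)
        return top_random(conn, n)
    finally:
        conn.execute("DROP TABLE IF EXISTS RandomUsers")


conn = make_users()
for run in (1, 2):
    ids = [row[3] for row in random_pick(conn, "San Francisco, CA")]
    print(f"Run {run}: {ids}")
```

The random keys are stored when the rows are inserted, so re-reading the same table returns the same order. The variation comes from rebuilding the table, which is why slide 16 creates and drops it on every run. On SQL Server, `ORDER BY NEWID()` directly in the query is a common alternative that yields a new order on each execution without a temporary table.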