Sql server 2008 r2 data mining whitepaper overview
Introduction to SQL Report tool
Why the SQL Report Program was written
The SQL Report tool was written and first used in 2002 to create reports from data extracted
from a membership system written in Dataflex. It used the Connx program to provide
connectivity to the database. The membership system recorded the yearly contributions of
members but didn’t provide any means of analysing the data to extract historical trends, nor
an easy method of identifying members in arrears.
The program was created because the management of a small organisation regarded custom
reports as expensive. The custom reports that did exist weren’t flexible; at best they
provided a dump to a printable report using a limited number of parameters.
The printable report didn’t allow any manipulation of the data, such as sorting by various
parameters, nor did it provide historical analysis using data from previous years stored in the
membership system. It was possible to look up records manually, but this was time
consuming and only undertaken on a limited basis.
Writing the program gave me the ability to create reports from raw data without
relying on an external programmer. The ability to build a report from multiple sources
came in handy later, when I started working with multiple databases.
Organisational issues addressed by the program
When I began working for an engineering services company I encountered a situation where
amalgamated reports were required using data from different sources as they had an
accounting system (Triumph written in Dataflex) and a program management system
(InControl).
InControl had a timesheet module. I had to extract the timesheet data, create a file
to import into the Triumph accounting system, and also create a report showing the weekly
timesheets in Excel for management review. Management were too busy to review
timesheets by looking up each individual’s timesheet in InControl and preferred a format
where they could view all timesheets in one go.
Working in a project management environment required creating ad hoc reports for the
lifetime of the project and also transferring data between various databases. As an example,
at a later time Oracle provided the timesheet system but Sage Timberline software provided
the accounting system used with the InControl project management system. Timberline
accumulated the project cost data and provided a monthly upload to InControl.
Both Timberline and Oracle had their own distinct employee attributes and the source data
from Oracle had to have the Timberline attributes added before importation into
Timberline. This was achieved by having a csv file containing the employee attribute values
for both databases and writing an SQL script that took the Oracle values and looked up the
corresponding Timberline values and then output the Timberline values into a file for
Timberline to import.
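The lookup described above can be sketched as follows. This is a minimal Python illustration rather than the original SQL script. The column names (`oracle_emp_id`, `timberline_emp_id`, `timberline_class`) and all sample values are hypothetical, since the layout of the real csv file is not given.

```python
import csv
import io

# Hypothetical mapping file: one row per employee holding the attribute
# values used by each system. Column names and values are made up.
MAPPING_CSV = """oracle_emp_id,timberline_emp_id,timberline_class
AU357,T-0357,ENG
AU334,T-0334,PM
"""

# Hypothetical Oracle timesheet extract.
ORACLE_CSV = """oracle_emp_id,project,hours
AU357,M6009,7.5
AU334,M6009,8.0
AU999,M6009,4.0
"""


def translate(oracle_text, mapping_text):
    """Return (rows ready for import, Oracle IDs that have no mapping)."""
    lookup = {row["oracle_emp_id"]: row
              for row in csv.DictReader(io.StringIO(mapping_text))}
    matched, unmatched = [], []
    for row in csv.DictReader(io.StringIO(oracle_text)):
        target = lookup.get(row["oracle_emp_id"])
        if target is None:
            # Keep unmapped employees visible instead of dropping them silently.
            unmatched.append(row["oracle_emp_id"])
            continue
        matched.append({
            "emp_id": target["timberline_emp_id"],
            "emp_class": target["timberline_class"],
            "project": row["project"],
            "hours": float(row["hours"]),
        })
    return matched, unmatched


rows, missing = translate(ORACLE_CSV, MAPPING_CSV)
```

Returning the unmatched IDs separately is a design choice of this sketch: it makes a missing entry in the mapping file obvious before the import file is produced.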
At a later point in time Oracle provided the timesheet, accounting and project
management systems. After this Oracle was the sole source of timesheet data, but there was
still a need to provide weekly reports for projects and external payroll providers.
The SQL Report tool used accumulated timesheet data from Oracle, or other databases,
stored in an Access database to use as a source for reports that provided estimates using
historical timesheet data from similar projects.
Later in my time with the project management company I also used text-based data
extracts from the Primavera planning tool to calculate cost curves.
Adding additional information to data extracts from external stored data
The extracts from Oracle and other databases were sparse, i.e. the usable portion supplied
information such as employee name, employee number, employee classification, project
number, project task code, hours worked and date. To enhance this information a csv file
called “Department.csv” was used to add required reporting information such as classification
description, employee supervisor, employee work status and office location.
The project task code didn’t come with a description in the extract and I had to store an
extract from Oracle in a file that had the task number and task description to supply the task
description for reports.
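A rough sketch of this enrichment step is given below. It uses Python's built-in sqlite3 module as a stand-in for the database used by the tool, and the table names, column names and sample values are assumptions made for illustration only.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Hypothetical tables: a sparse timesheet extract, the contents of
-- Department.csv, and the stored task number / description extract.
CREATE TABLE timesheet (emp_num TEXT, task TEXT, hours REAL);
CREATE TABLE dept (emp_num TEXT, supervisor TEXT, office TEXT);
CREATE TABLE task_desc (task TEXT, description TEXT);
INSERT INTO timesheet VALUES
    ('AU357', 'E1070', 5.0), ('AU334', 'P1010', 6.5), ('AU999', 'E1090', 2.0);
INSERT INTO dept VALUES
    ('AU357', 'Supervisor A', 'Office 1'), ('AU334', 'Supervisor B', 'Office 2');
INSERT INTO task_desc VALUES
    ('E1070', 'Example task one'), ('P1010', 'Example task two');
""")

# LEFT JOIN keeps every timesheet row even when no department record or
# task description exists, so gaps show up as NULL in the report.
ENRICH_SQL = """
SELECT t.emp_num, t.task, d.supervisor, d.office, k.description, t.hours
FROM timesheet AS t
LEFT JOIN dept AS d ON d.emp_num = t.emp_num
LEFT JOIN task_desc AS k ON k.task = t.task
ORDER BY t.emp_num
"""
report = con.execute(ENRICH_SQL).fetchall()
```

Rows with no matching department record or task description come back with empty values, which points to an entry that still needs adding to the lookup files.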
Design of the Program
For the reasons given above I used my programming skills to create a reporting tool that
read SQL statements in a file to control the extraction and manipulation of data from the
membership system.
I also extended the program to write the formatted data to an Excel spreadsheet using
instructions written in the file. Originally data was transferred to the spreadsheet using
Dynamic Data Exchange (DDE) and each instruction was prefaced with DDE as shown in the
example below where the Excel file and Excel sheet used to store the report are identified
and opened to write data. The DDE instruction was kept for conformity after the program
was upgraded to use the Excel object.
DDE Excel Book1.xls;
DDE Excel_Sheet Inv_Summary;
The file has a 64kb size limitation because the SQL Report tool uses Notepad to open the file
of instructions before the program starts reading and executing each instruction.
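To illustrate the idea of a file of instructions that is read and executed one statement at a time, here is a small Python sketch. It is not the tool's actual parser. The function name `parse_script` and the rule that statements end with a semicolon and whole-line comments start with `//` are assumptions drawn from the script examples in this document.

```python
def parse_script(text):
    """Split a script into ("DDE" | "SQL", statement) pairs."""
    # Drop whole-line // comments, then split the remainder on semicolons.
    kept = [line for line in text.splitlines()
            if not line.strip().startswith("//")]
    statements = []
    for chunk in " ".join(kept).split(";"):
        stmt = " ".join(chunk.split())  # rejoin wrapped lines, trim spaces
        if not stmt:
            continue
        # Instructions prefaced with DDE go to the spreadsheet side;
        # everything else is treated as SQL for the database.
        kind = "DDE" if stmt.upper().startswith("DDE ") else "SQL"
        statements.append((kind, stmt))
    return statements


SCRIPT = """// beginning of file
DDE Excel Book1.xls;
DDE Excel_Sheet Inv_Summary;
Select * INTO Dept
From lnkDepartments;
"""
parsed = parse_script(SCRIPT)
```

The sketch does not handle a semicolon inside a quoted string or a comment that wraps onto a second line without its own `//` prefix; a fuller parser would need to cover both.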
The example in the attached file “Script Files” shows how it was possible to set up a generic
report that used raw timesheet data and took parameters to control the output written to a
spreadsheet.
Running a script file also provides an advantage where the available data isn’t in a form
that can be used immediately in a query and requires manipulation before it is usable in
a SQL statement.
As an example, the following shows the stages in importing the budget for project M6009.
The budget came from the Primavera planning program. The schema values for the budget file
exported from Primavera are shown first, followed by a portion of the budget csv file.
[m6009_budget.csv]
ColNameHeader=False
Format=CSVDelimited
MaxScanRows=25
CharacterSet=ANSI
Col1=TASK Char Width 8
Col2=EMPLOYEE Char Width 55
Col3=WEEK1 Float
Col4=WEEK2 Float
Col5=WEEK3 Float
Col6=WEEK4 Float
Col7=WEEK5 Float
Col8=WEEK6 Float
Col9=WEEK7 Float
Portion of budget csv showing the first seven weeks
E1070 00357.Sylwestrzak,Linus 5.12 5.12 5.12 5.12 5.12 5.12
P1010 00334.Heinzle,Thomas 5.67 5.67 5.67 5.67 5.67 5.67 5.67
P1010 00605.Siemon,AndrewHugh 5.67 5.67 5.67 5.67 5.67 5.67 5.67
E1010 00605.Siemon,AndrewHugh 1.32 1.64 1.64 1.64 1.64 1.64
E1020 00334.Heinzle,Thomas 5.93 5.93 5.93 5.93 5.93 5.93
E1090 00605.Siemon,AndrewHugh 8.25 8.25 8.25 8.25 8.25 8.25
E1090 00334.Heinzle,Thomas 3.98 3.98 3.98 3.98 3.98 3.98
E1070 00334.Heinzle,Thomas 5.06 5.06 5.06 5.06 5.06 5.06
E1070 00605.Siemon,AndrewHugh 5.06 5.06 5.06 5.06 5.06 5.06
P1010 00795.McLeod, Anthony 1.22 1.22 1.22 1.22 1.22 1.22 1.22
P1010 81250.Bear, Michael 1.1 1.1 1.1 1.1 1.1 1.1
P1010 00765.Middleditch,Leslie John 23.54 23.54 23.54 23.54 23.54 23.54 23.54
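The EMPLOYEE column in the rows above combines an employee number and a name, for example `00357.Sylwestrzak,Linus`. The import script that follows derives an EMPID from it by taking the text before the full stop, dropping leading characters and adding an "AU" prefix. The Python sketch below mirrors those steps as I read them from the script. The function name `to_dept_emp_num` is hypothetical, and it assumes every EMPLOYEE value contains a full stop.

```python
def to_dept_emp_num(employee):
    """Mirror the script's EMPID housekeeping for one EMPLOYEE value."""
    sep = employee.find(".") + 1       # InStr() positions are 1-based
    emp_id = employee[:sep - 1]        # Left(EMPLOYEE, SEP - 1)
    # The script runs the same trimming step twice. Each pass tests where the
    # first "0" sits in EMPLOYEE (not in EMPID) and drops one leading
    # character from EMPID when that position is 1.
    for _ in range(2):
        if employee.find("0") + 1 == 1:
            emp_id = emp_id[1:]        # Right(EMPID, Len(EMPID) - 1)
    return "AU" + emp_id


with_zeros = to_dept_emp_num("00357.Sylwestrzak,Linus")
without_zeros = to_dept_emp_num("81250.Bear, Michael")
```

Because the test looks at EMPLOYEE rather than the partly trimmed EMPID, any value starting with a single zero would also lose two characters. That appears harmless for the sample rows, where numbers start with either "00" or a non-zero digit.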
This script file portion shows the SQL statements used to import the budget. The project
budget is flexible and under the control of the planner, so the number of tasks will increase, the
allocation of budget to tasks will change and the employees used will change. This means
that the budget is re-imported on a regular basis to include any changes. The budget set
up in B12 Oracle will also change and there is regular maintenance of resource allocations in
Oracle.
// beginning of file Load_M6009
Drop Table M6009_B;
Drop Table WIP;
Drop Table Budget;
Drop Table Dept;
Create Table Budget (
TASK CHAR (15),
EMP_NAME CHAR (55),
RESOURCE CHAR (5),
HOURS Float,
RATE Float,
WEEK Date);
Select * INTO M6009_B From lnkM6009_Budget;
Select * INTO Dept From lnkDepartments;
Select * INTO WIP From lnkwip_Rates;
Alter Table M6009_B ADD Column EMPID TEXT (10);
Alter Table M6009_B ADD Column SEP Integer;
Alter Table M6009_B ADD Column EMP_NAME TEXT (55);
Alter Table M6009_B ADD Column RESOURCE TEXT (5);
Alter Table M6009_B ADD Column RATE Float;
Alter Table M6009_B ADD Column WEEK Date;
// housekeeping tasks to resolve discrepancies between Oracle records and Primavera budget allocation
Update M6009_B SET M6009_B.SEP = instr (M6009_B.EMPLOYEE, ".");
Update M6009_B SET M6009_B.EMPID = Left (M6009_B.EMPLOYEE, M6009_B.SEP - 1);
Update M6009_B SET M6009_B.SEP = instr (M6009_B.EMPLOYEE,"0");
Update M6009_B SET M6009_B.EMPID = Right( M6009_B.EMPID , ( Len(
M6009_B.EMPID ) - M6009_B.SEP ) ) Where M6009_B.SEP = 1 ;
Update M6009_B SET M6009_B.SEP = instr (M6009_B.EMPLOYEE,"0");
Update M6009_B SET M6009_B.EMPID = Right( M6009_B.EMPID , ( Len(
M6009_B.EMPID ) - M6009_B.SEP ) ) Where M6009_B.SEP = 1 ;
Update M6009_B SET M6009_B.EMPID = "AU" & M6009_B.EMPID;
Update M6009_B, Dept SET M6009_B.EMP_NAME = Dept.NAME Where M6009_B.EMPID =
Dept.EMP_NUM;
Update M6009_B, Dept SET M6009_B.RESOURCE = Dept.RESOURCE Where
M6009_B.EMPID = Dept.EMP_NUM;
Update M6009_B SET M6009_B.RESOURCE = "STM" Where M6009_B.RESOURCE = "MS1";
Update M6009_B SET M6009_B.RESOURCE = "PE6" Where M6009_B.RESOURCE = "GD6";
// add current WIP rate values
Update M6009_B, WIP SET M6009_B.RATE = WIP.SELL_RATE Where M6009_B.RESOURCE =
WIP.RESOURCE;
// Insert Planning budget into Table Budget showing hours and week;
Update M6009_B SET M6009_B.WEEK = #06/11/2010 0:00:00 AM#;
Insert INTO Budget Select M6009_B.TASK , M6009_B.EMP_NAME , M6009_B.RESOURCE ,
M6009_B.WEEK1 AS HOURS , M6009_B.RATE , M6009_B.WEEK FROM M6009_B Where
M6009_B.WEEK1 <> 0.00 ;
Update M6009_B SET M6009_B.WEEK = #06/18/2010 0:00:00 AM#;
Insert INTO Budget Select M6009_B.TASK , M6009_B.EMP_NAME , M6009_B.RESOURCE ,
M6009_B.WEEK2 AS HOURS , M6009_B.RATE , M6009_B.WEEK FROM M6009_B Where
M6009_B.WEEK2 <> 0.00 ;
Update M6009_B SET M6009_B.WEEK = #06/25/2010 0:00:00 AM#;
Insert INTO Budget Select M6009_B.TASK , M6009_B.EMP_NAME , M6009_B.RESOURCE ,
M6009_B.WEEK3 AS HOURS , M6009_B.RATE , M6009_B.WEEK FROM M6009_B Where
M6009_B.WEEK3 <> 0.00 ;
The above portion of the script loads the weekly budget values from table M6009_B into table
Budget, which provides the basis for comparing planned expenditure against actual expenditure.
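The script repeats one Update and one Insert for each WEEKn column, which turns a wide row (one column per week) into one Budget row per week. The same reshaping can be sketched in Python as a loop. The function name `unpivot`, the dictionary layout and the sample rate are assumptions; the first week date and the one-week spacing come from the date literals in the script.

```python
from datetime import date, timedelta

FIRST_WEEK = date(2010, 6, 11)  # week 1 date used in the script


def unpivot(row, first_week=FIRST_WEEK):
    """Turn one wide row into one record per week with non-zero hours."""
    records = []
    for n, hours in enumerate(row["weeks"]):
        if hours == 0:
            continue  # the script only inserts rows where WEEKn <> 0.00
        records.append({
            "TASK": row["TASK"],
            "EMP_NAME": row["EMP_NAME"],
            "RESOURCE": row["RESOURCE"],
            "HOURS": hours,
            "RATE": row["RATE"],
            "WEEK": first_week + timedelta(weeks=n),
        })
    return records


# Hypothetical row: the task code is taken from the sample data, the
# name, resource, rate and zero-hour week are made up for illustration.
sample = {"TASK": "E1010", "EMP_NAME": "Example Employee", "RESOURCE": "PE6",
          "RATE": 100.0, "weeks": [1.32, 0.0, 1.64]}
records = unpivot(sample)
```

A loop of this kind avoids editing the script each time the planner extends the budget by another week, at the cost of moving the logic out of plain SQL.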
At the end of the script file the following lines of SQL were executed to calculate the budget cost.
Alter Table Budget ADD Column Cost Float;
Alter Table Budget ADD Column ACCUM_WEEK DATE;
Update Budget SET Budget.Cost = Budget.HOURS * Budget.RATE;
A similar arrangement was used to calculate the weekly cost actual value where the actual
hours per week were multiplied against the rate to get actual expenditure.
The planning tool Primavera produced a weekly percentage complete value and the earned
value was calculated by multiplying the budget cost against the percent complete.
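The two calculations described above (cost as hours multiplied by rate, and earned value as budget cost multiplied by percent complete) can be written out as a short Python sketch. The function names, the record layout and the figures are illustrative assumptions, and the percent complete is assumed to be supplied on a 0 to 100 scale.

```python
from collections import defaultdict


def weekly_cost(records):
    """Sum HOURS * RATE per week, as the Cost column does per row."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["WEEK"]] += rec["HOURS"] * rec["RATE"]
    return dict(totals)


def earned_value(budget_cost, percent_complete):
    """Budget cost multiplied by percent complete (0 to 100 scale assumed)."""
    return budget_cost * percent_complete / 100.0


# Made-up records in the shape of the Budget table.
records = [
    {"WEEK": "2010-06-11", "HOURS": 5.0, "RATE": 100.0},
    {"WEEK": "2010-06-11", "HOURS": 2.0, "RATE": 80.0},
    {"WEEK": "2010-06-18", "HOURS": 4.0, "RATE": 100.0},
]
costs = weekly_cost(records)
ev = earned_value(costs["2010-06-11"], 50.0)
```

The same `weekly_cost` function applies to actual hours from timesheets as well as to budget hours, since both are multiplied against a rate.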
The SQL code to produce these values isn’t shown as the files are large and the initial
intention is to show a process that can take raw data and insert it into a file that later is used
to write a report to Excel.
In simplified form, the process is:
Calculate the budget allocation, which will vary as the planner opens additional tasks and
moves a portion of the budget to them.
Calculate the earned value, which is how much work is complete as calculated by the planner,
expressed as a budget amount.
Calculate the actual cost using timesheet data and incurred non-labour costs.
The budget values, earned values and actual costs are then written to a sheet in a
spreadsheet, which can serve as a source for the chart function in Excel.
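As a sketch of how the three series might be laid out before they are written to a sheet, the Python below builds one row per week with running totals for budget and actual cost. The function name, the column headings and the figures are made up, and the earned values are assumed to arrive already calculated as amounts to date.

```python
from itertools import accumulate


def cost_curve_rows(weeks, budget, earned, actual):
    """Build rows of week, cumulative budget, earned value, cumulative actual."""
    planned_cum = accumulate(budget)  # running total of weekly budget cost
    actual_cum = accumulate(actual)   # running total of weekly actual cost
    return [list(r) for r in zip(weeks, planned_cum, earned, actual_cum)]


header = ["Week", "Budget", "Earned", "Actual"]
rows = [header] + cost_curve_rows(
    ["2010-06-11", "2010-06-18", "2010-06-25"],
    [1000.0, 1500.0, 1200.0],   # weekly budget cost (made up)
    [800.0, 2000.0, 3300.0],    # earned value to date (made up)
    [900.0, 1700.0, 1300.0],    # weekly actual cost (made up)
)
```

A block of rows in this shape, with one heading row and one row per week, is the kind of range a spreadsheet line chart can use directly.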
Below is a representative example of the type of information that is displayable in a chart
after the required information is transferred to a spreadsheet.
There is work involved in setting this up, so in general the data is only supplied in a
spreadsheet for the recipients to make their own determination, which basically is checking
that the project is on track. For a larger project the visual presentation is used as a quick
reference.