Introduction
This report describes the programs that I developed and used to
produce the data required for parts two and three of the project.
The data covers one region over a two-month period, extracted
for each of the years 2011, 2012 and 2013. It contains two types
of information for that period: recorded climatic conditions and
power consumption.
Climatic conditions
The program that I wrote in C for the weather data works through
the dataset year by year. Its job is to process a dataset of
climatic conditions recorded across various regions and reduce
it to just the data values that are relevant to the desired
region (Auckland) for January and February of each year. The
2011 and 2012 records are collected and written to one Excel
file, and the 2013 records are written to a separate file.
Power consumption
The same idea applies to power consumption. I used two processes
to turn a very large data file into a filtered file. Although
the large file contained only the required years, it included
months that had to be excluded. The first process was a C
program that printed only the first two months of each year:
printing stopped at the end of the second month and resumed at
the start of the following year. The second process combined
every two rows of the filtered file, because each row held a
5-minute power consumption reading but the requirement was one
10-minute reading per row.
With these processes complete and the filtered files generated,
the files are loaded into Weka to undertake a data modelling
task. The results of that task are then used in different
visualization techniques to show how well the predictive model
performs. The following sections show how the generated weather
and power consumption data are used in data mining and data
visualization.
STAT390-14B (Ham): Directed Study Project
Individual Project Focus: Work vs. Play
Project co-ordinator: Associate Professor David Bainbridge
Process the weather data for Auckland in January and February in
the given dataset (10 minute readings) and experiment with
various data mining techniques to see if a model can be
generated that predicts power consumption for Monday-Friday
(work), Saturday, and Sunday (play). Is it easier to predict the
power usage for one of the time periods? Trial having Saturday
and Sunday represented as a single entity (i.e. the weekend) and
as separate days.
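The work/play split above can be sketched in C as follows. This is a minimal sketch, not the code used in the project: the names DayType, day_type and is_weekend are hypothetical, and the weekday is obtained from the standard library call mktime, which fills in tm_wday (0 = Sunday, 6 = Saturday).

```c
#include <assert.h>
#include <time.h>

/* Hypothetical labels for the three day classes named in the project focus. */
typedef enum { DAY_WORK, DAY_SATURDAY, DAY_SUNDAY } DayType;

/* Classify a calendar date as a work day, a Saturday or a Sunday. */
DayType day_type(int year, int month, int day)
{
    struct tm t = {0};
    t.tm_year = year - 1900;  /* struct tm counts years from 1900 */
    t.tm_mon = month - 1;     /* struct tm counts months from 0 */
    t.tm_mday = day;
    t.tm_hour = 12;           /* midday keeps clear of daylight-saving edges */
    t.tm_isdst = -1;          /* let the library decide about daylight saving */

    time_t r = mktime(&t);    /* normalises t and fills in tm_wday */
    assert(r != (time_t)-1);
    (void)r;

    if (t.tm_wday == 0) return DAY_SUNDAY;
    if (t.tm_wday == 6) return DAY_SATURDAY;
    return DAY_WORK;
}

/* Variant for the trial where Saturday and Sunday form a single class. */
int is_weekend(DayType d)
{
    return d == DAY_SATURDAY || d == DAY_SUNDAY;
}
```

Calling day_type(2011, 1, 1) gives DAY_SATURDAY. The label can be written out as an extra column so that the same filtered file serves both the separate-days and the merged-weekend trials.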
The aim of this directed study project is to combine the
programming skills learnt in COMP5002 (BoPP)
with the Data Mining MOOCs that were studied earlier in the
semester at Waikato, and the
JavaScript skills for web use taught in COMP 223 (TGA) in the
A semester of this year.
The central theme to this project—shared across all the projects
being run in this course—is to
investigate the relationship between power usage in New
Zealand, and chronological data (the time
of day and the time of year) and meteorological data (the
weather!) to see if any patterns exist;
more specifically, to see whether the latter information helps
predict the former. Each project
investigates a separate aspect within this theme, applying Data
Mining techniques to publicly
available data produced by both Transpower and the National
Institute of Water and Atmospheric
Research (NIWA), from which a range of visualizations will be
generated.
The key steps to the project are:
1. Undertake data cleaning and processing of a rich dataset
containing information that
captures power consumption and climatic conditions recorded
across various regions of
New Zealand.
2. Feed the processed data into Weka to undertake a data
modelling task.
3. Produce a set of visualizations that provide insight into the
generated data.
Two types of visualization will be produced: the first is focused
on showing how well the predictive
modelling is performing; the second is a more open-ended task,
with the aim of showing “something
interesting” in the data related to the project’s focus. An
example of “something interesting” could
be a time-based geographical map showing power usage in the
different regions of New Zealand
enriched with what is happening in terms of temperature in the
different regions. For further ideas
see Prof Apperley’s Data Visualization slides, available through
the STAT390 web site:
www.cs.waikato.ac.nz/~davidb/stat390/
The dataset provided for this project (also available for
download through the same web site) is in
the form of a set of Comma Separated Value (CSV) files. The
files span a mixture of years and
locations within New Zealand. While each project is different,
there is one common dimension to
how the data is to be used:
data from 2011 and 2012 is used for training the Data Mining
models;
data from 2013 is used for establishing the accuracy of the
models developed.
We will now go through and detail what is involved in the three
key steps to the project. The
schedule (see below) allows 1 week for each of these steps,
although it should be noted there is
some flexibility around this, as long as the final deadlines—a
presentation and a report, due in the
final week—are met. If at any point during the project you wish
to go back to an earlier step and
revise/adjust what you have done, this is not only permissible it
is actively encouraged (!), as it
reflects an increased level of understanding. At the end of each
week, a 2–3 page “mini” report is
requested describing the work you have done that corresponds
to the relevant step in the schedule.
The intention of each mini report is to help you develop a
section of the final report. Feedback on
mini reports submitted according to the schedule will be given
to assist you in developing the final
report.
Step 1: Data processing and cleaning
One of the first things you will need to do in this project is to
process the provided dataset into a
more amenable form, reducing it down to just the data values
that are meaningful to your project.
Example C code is provided on the course web site for reading
in CSV files, breaking each line into
individual fields, and then writing out a selection of those
fields.
The code you need to write needs to go beyond this. The fields
you select will be motivated by what
type of data you have been directed to focus on for the Data
Mining step. You will also need to
develop ways of controlling which lines of the CSV files make
it through to the next stage of
processing: filtered, for example by time, or location—the exact
details again will be determined by
the task you have been assigned in your project.
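As an illustration of controlling which lines make it through, the sketch below keeps only the lines for one location in January and February of 2011 to 2013. The line layout (station,YYYY-MM-DD HH:MM,value) and the function name keep_line are assumptions made for this example; the real CSV files may order or format their fields differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if the line should be kept, 0 otherwise.
 * Assumed (hypothetical) layout: station,YYYY-MM-DD HH:MM,value
 * Lines that do not parse, such as a header row, are rejected. */
int keep_line(const char *line, const char *region)
{
    char station[64];
    int year, month, day;

    assert(line != NULL && region != NULL);

    /* Read the first field and the date part of the second field. */
    if (sscanf(line, "%63[^,],%d-%d-%d", station, &year, &month, &day) != 4)
        return 0;

    if (strcmp(station, region) != 0) return 0;   /* wrong location */
    if (year < 2011 || year > 2013) return 0;     /* outside the study years */
    return month == 1 || month == 2;              /* January or February only */
}
```

A main loop would read each line with fgets, call keep_line(line, "Auckland"), and write the line to the output file only when the result is 1.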
There are also undefined values to be aware of. These are
typically represented as a hyphen (-) in
the CSV files. Sometimes you might find an entire column will
consist of hyphens (for the particular
lines of the data you have filtered down to), other times most of
the values will be there, with only
the occasional hyphen.
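One way to guard against undefined values is to test each field before converting it. The sketch below is an assumption about how this could be done, not part of the example code on the course web site: parse_reading is a hypothetical helper, and it expects the field to have had surrounding whitespace removed already.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Convert one CSV field to a number.
 * Returns 1 and stores the result in *value when the field is numeric.
 * Returns 0 for an undefined value (a lone hyphen), an empty field,
 * or anything else that is not a complete number. */
int parse_reading(const char *field, double *value)
{
    char *end;
    double v;

    assert(field != NULL && value != NULL);

    if (field[0] == '\0' || strcmp(field, "-") == 0)
        return 0;

    v = strtod(field, &end);
    if (end == field || *end != '\0')   /* nothing parsed, or trailing junk */
        return 0;

    *value = v;
    return 1;
}
```

A negative reading such as -3.5 is still accepted, because only a hyphen on its own is treated as undefined. When a field is rejected, the row can be skipped, or the value can be written as ?, which is how Weka's ARFF format marks a missing value.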
In preparing the way for the Data Mining step, something you
might consider doing is to merge data
fields (either in rows or columns). For example, 12 power
readings taken every 5 minutes could be
combined to provide an hourly figure instead, which would fit
more nicely with weather data
reported every hour.
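The merging of rows described above can be sketched as a small C function. Two things here are assumptions: the name aggregate_mean is hypothetical, and the readings in each group are averaged, which suits power values; a sum would be the better choice if the values were energy totals.

```c
#include <assert.h>
#include <stddef.h>

/* Merge consecutive readings in groups of `group` values by averaging.
 * With 5-minute input, group = 12 gives hourly figures and group = 2
 * gives 10-minute figures.
 * Returns the number of values written to out; readings left over at
 * the end that do not fill a whole group are dropped. */
size_t aggregate_mean(const double *in, size_t n, size_t group, double *out)
{
    size_t written = 0;

    assert(in != NULL && out != NULL && group > 0);

    for (size_t start = 0; start + group <= n; start += group) {
        double sum = 0.0;
        for (size_t k = 0; k < group; k++)
            sum += in[start + k];
        out[written++] = sum / (double)group;
    }
    return written;
}
```

The output array needs room for n / group values. Dropping an incomplete final group is a design choice made for this sketch; padding or carrying it forward are equally possible.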
Your 2-3 page “mini” report for this step of the project should
detail the decisions you made in how
the data needed to be processed, and how that was
accomplished.
Step 2: Data Mining with Weka
The second step to the project is to load the processed data into
Weka and start experimenting with
the data to develop a model that can predict power usage. To
reiterate what was stated above, use
data from 2011 and 2012 to train your models, and then run it
on the 2013 values (test data) to
establish how accurate the predictions are. A technique such as
10-fold cross-validation is a quick and convenient way to gauge
how well a model is performing in general, and you may very well
use this in early stages of testing. However, the need of this
project is to produce a model that can go on to be used to make
predictions on other (previously unseen) data.
It is anticipated that the Explorer tool will be the most likely
sub-system you will work with in Weka,
and within that the Classifier section; however, there are no
hard-and-fast rules here. Use what you
have learned in the Data Mining MOOCs wisely. When data is
“flying around” at speed it is easy to
overlook important details that turn what would otherwise be a
highly successful model to
garbage! Similarly, accidentally including the feature you wish
to learn in the set of attributes used
to train a model is a mistake that is easy enough to make if you
are not careful. In such cases it leads
to results that are amazingly high. If the results you are getting
seem “too good to be true” … that
might very well be exactly what is going on! In short, “know
your data.”
The key ability for this stage of the project is to be able to train
a model on the 2011 and 2012 data,
from which a run can be made against a test set (2013) with the
predictions made saved in a
machine readable form, ready for processing by Step 3 (Data
Visualization).
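Once the 2013 predictions have been saved, a single summary figure can be computed from them before any charts are drawn. The sketch below uses mean absolute error as one possible measure; the brief does not prescribe a particular measure, and the function name is hypothetical. It assumes the actual and predicted values have already been read into two arrays of equal length.

```c
#include <assert.h>
#include <stddef.h>

/* Mean absolute error between actual and predicted power usage:
 * the average size of the prediction error, in the same units as
 * the readings. Smaller is better; 0 means every prediction was exact. */
double mean_absolute_error(const double *actual, const double *predicted,
                           size_t n)
{
    double total = 0.0;

    assert(actual != NULL && predicted != NULL && n > 0);

    for (size_t i = 0; i < n; i++) {
        double diff = actual[i] - predicted[i];
        total += (diff < 0.0) ? -diff : diff;   /* absolute value */
    }
    return total / (double)n;
}
```

Computing this separately for work days, Saturdays and Sundays gives a direct way to compare how easy each period is to predict.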
The “mini” report for this step of the project should provide an
overview of the different methods
you experimented with, along with the one that you established
performed the best, and the
reasons for why that was.
Step 3: Data Visualization
There are two parts to the data visualization step.
Charts which show how well the
chosen Data Mining model is performing in making predictions
about power usage;
visualization software if desired:
the key requirement for this part of the project is to visually
show something interesting
about the cleaned up and processed data that has been produced.
Data Model Accuracy visualized with Google Charts
Google Charts (https://developers.google.com/chart/) is a web-
based technology for presenting data
in a variety of forms. There are over 25 standard forms to
choose from. See Google’s web site for
extensive documentation, and the course web site for some
selected examples that are more closely
aligned with the needs of the Directed Study project.
Produce “Something Interesting”
For this final part of the project you might choose to visualize
something interesting that has been
produced as a result of your experimentation using Weka, but
equally it might be something that is
already present in the data produced in Step 1 of the project (no
Data Mining required).
The overall intention for this part of the project is to think back
to (and look back at, since the slides
are on the course web site!) the Data Visualization examples
given by Prof Apperley in the first week
of the project, and be inspired by this to produce a visualization
that shows “something interesting”
in the dataset you have been working with.
To achieve this, the scope for this part of the project can be
widened. For example, if the focus of
the project had been to compare extremes of latitude, using
Auckland and Invercargill as the two
extremes, then in the “something interesting” visualization it is
permissible to broaden this to other
centres: the visualization produced could be, say, a map of New
Zealand showing power-usage and
temperature data across all the main centres of population in the
country, over time. Going further,
if the visuals drawn on the map per centre make more sense if
normalized by city population, then
this information too can be added in to the dataset used to
produce the visualization.
Given such an open ended brief, if you are at all unsure what to
attempt for this final part of the
project then please consult with me for guidance as to what is a
reasonable expectation.
As a final remark, for this visualization do not feel constrained
to working with Google Charts
(although that is a valid option). There are several interactive
Data Visualization resources on-line,
such as IBM’s ManyEyes, (http://www.ibm.com/manyeyes) that
allow you to upload datasets to
their web-site from which you can then develop your
visualizations.
Schedule
-
5pm G1.15)
-report on Step 1 submitted
: Feedback on mini-reports can be collected
from department office
-
5pm G1.15)
-report on Step 2 submitted
-reports can be collected
from department office
Deadlines
produced (3-5pm G1.15)

A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 

IntroductionThis report discusses the programming process whic.docx

Introduction
This report discusses the programming process that I developed and used to produce the data required for parts two and three. The data I generated covers one region over a two-month period, taken from the years 2011, 2012 and 2013. It contains two types of information for that period: recorded climatic conditions and power consumption.

Climatic conditions
The C program that I developed for the weather data works year by year. It processes a dataset of climatic conditions recorded across various regions and reduces it to just the data values that are relevant to the desired region (Auckland), giving its details for January and February of each year. The idea is to collect the desired 2011 and 2012 information into one Excel file, and then to do the same for 2013 in a separate file.

Power consumption
The same idea applies to power consumption. Here I used two processes on a very large data file to generate a filtered file. Although the large file contained only the required years, it also held details of unwanted months that needed to be excluded. The first process was a C program that kept the desired two months by printing out the first two months of each year: printing stopped at the end of the second month and jumped to the following year to continue. The second process combined every two rows of the filtered file, because each row records power consumption over 5 minutes but the requirement was one 10-minute reading per row.

After completing these processes and generating the filtered files, the files are used with Weka to undertake a data modelling task. The results of this modelling task are then used in different visualization techniques to see how well the predictive model performs. The following sections show how the generated weather and power consumption data are used in data mining and data visualization.

STAT390-14B (Ham): Directed Study Project
Individual Project Focus: Work vs. Play
Project co-ordinator: Associate Professor David Bainbridge

Process the weather data for Auckland in January and February in the given dataset (10 minute readings) and experiment with various data mining techniques to see if a model can be generated that predicts power consumption for Monday-Friday (work), Saturday, and Sunday (play). Is it easier to predict the power usage for one of these time periods? Trial having Saturday and Sunday represented as a single entity (i.e. the weekend) and as separate days.

The aim of this directed study project is to combine the programming skills learnt in COMP5002 (BoPP) with the Data Mining MOOCs that were studied earlier in the semester at Waikato, and the JavaScript skills for web use taught in COMP 223 (TGA) in the A semester of this year.

The central theme of this project, shared across all the projects being run in this course, is to investigate the relationship between power usage in New Zealand and both chronological data (the time of day and the time of year) and meteorological data (the weather!), to see if any patterns exist; more specifically, to see whether the latter information helps predict the former. Each project investigates a separate aspect within this theme, applying Data Mining techniques to publicly available data produced by both Transpower and the National Institute of Water and Atmospheric Research (NIWA), from which a range of visualizations will be generated.

The key steps to the project are:
1. Undertake data cleaning and processing of a rich dataset containing information that captures power consumption and climatic conditions recorded across various regions of New Zealand.
2. Feed the processed data into Weka to undertake a data modelling task.
3. Produce a set of visualizations that provide insight into the generated data.

Two types of visualization will be produced: the first is focused on showing how well the predictive modelling is performing; the second is a more open-ended task, with the aim of showing “something interesting” in the data related to the project’s focus. An example of “something interesting” could be a time-based geographical map showing power usage in the different regions of New Zealand, enriched with what is happening in terms of temperature in those regions. For further ideas see Prof Apperley’s Data Visualization slides, available through the STAT390 web site: www.cs.waikato.ac.nz/~davidb/stat390/
The dataset provided for this project (also available for download through the same web site) is in the form of a set of Comma Separated Value (CSV) files. The files span a mixture of years and locations within New Zealand. While each project is different, there is one common dimension to how the data is to be used: data from 2011 and 2012 is for training the Data Mining models; data from 2013 is for establishing the accuracy of the models developed.

We will now go through and detail what is involved in the three key steps to the project. The schedule (see below) allows 1 week for each of these steps, although it should be noted there is some flexibility around this, as long as the final deadlines (a presentation and a report, due in the final week) are met. If at any point during the project you wish to go back to an earlier step and revise or adjust what you have done, this is not only permissible, it is actively encouraged (!), as it reflects an increased level of understanding.

At the end of each week, a 2–3 page “mini” report is requested describing the work you have done that corresponds to the relevant step in the schedule. The intention of each mini report is to help you develop a section of the final report. Feedback on mini reports submitted according to the schedule will be given to assist you in developing the final report.

Step 1: Data processing and cleaning
One of the first things you will need to do in this project is to process the provided dataset into a more amenable form, reducing it down to just the data values that are meaningful to your project. Example C code is provided on the course web site for reading in CSV files, breaking each line into individual fields, and then writing out a selection of those fields. The code you write needs to go beyond this. The fields you select will be motivated by the type of data you have been directed to focus on for the Data Mining step. You will also need to develop ways of controlling which lines of the CSV files make it through to the next stage of processing: filtered, for example, by time or location; the exact details again will be determined by the task you have been assigned in your project.

There are also undefined values to be aware of. These are typically represented as a hyphen (-) in the CSV files. Sometimes you might find that an entire column consists of hyphens (for the particular lines of the data you have filtered down to); other times most of the values will be there, with only the occasional hyphen.

In preparing the way for the Data Mining step, something you might consider doing is to merge data fields (either in rows or columns). For example, 12 power readings taken every 5 minutes could be combined to provide an hourly figure instead, which would fit more nicely with weather data reported every hour.

Your 2–3 page “mini” report for this step of the project should detail the decisions you made in how the data needed to be processed, and how that was accomplished.
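The three Step 1 operations described above (filtering lines by time, skipping undefined hyphen values, and merging consecutive readings) could be sketched in C roughly as follows. This is only an illustration and not the example code from the course web site: the function names `parse_reading`, `is_jan_or_feb` and `merge_readings` are hypothetical, the timestamp layout `YYYY-MM-DD ...` is an assumption about the CSV files, and summing the readings in each group is also an assumption (averaging may be the better choice, depending on the units of the power column).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Convert one CSV field to a number.
 * Returns 1 and stores the value on success; returns 0 for an undefined
 * value (a lone hyphen), an empty field, or text that is not a number. */
int parse_reading(const char *field, double *out)
{
    char *end;
    double value;

    assert(field != NULL && out != NULL);
    if (field[0] == '\0' || strcmp(field, "-") == 0)
        return 0;
    value = strtod(field, &end);
    if (end == field)
        return 0;                        /* nothing numeric was consumed */
    while (isspace((unsigned char)*end))
        end++;                           /* tolerate a trailing newline */
    if (*end != '\0')
        return 0;                        /* trailing junk such as "12abc" */
    *out = value;
    return 1;
}

/* Decide whether a line belongs to January or February.
 * ASSUMPTION: the timestamp field starts with "YYYY-MM-"; adjust the
 * sscanf format to match the real files. */
int is_jan_or_feb(const char *timestamp)
{
    int year, month;

    if (sscanf(timestamp, "%d-%d", &year, &month) != 2)
        return 0;
    return month == 1 || month == 2;
}

/* Merge every `group` consecutive readings into one by summing them.
 * A trailing partial group is dropped. Returns the number of merged
 * values written to `out`, which must hold at least n / group items. */
size_t merge_readings(const double *in, size_t n, size_t group, double *out)
{
    size_t i, j, produced = 0;

    if (group == 0)
        return 0;
    for (i = 0; i + group <= n; i += group) {
        double sum = 0.0;
        for (j = 0; j < group; j++)
            sum += in[i + j];
        out[produced++] = sum;
    }
    return produced;
}
```

With `group` set to 2, `merge_readings` turns 5-minute rows into 10-minute rows; with `group` set to 12 it gives the hourly figure mentioned above. Lines for which `parse_reading` returns 0 need a decision of their own (skip the line, or carry the previous value forward), and that decision is worth recording in the mini report.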
Step 2: Data Mining with Weka
The second step to the project is to load the processed data into Weka and start experimenting with the data to develop a model that can predict power usage. To reiterate what was stated above, use data from 2011 and 2012 to train your models, and then run them on the 2013 values (test data) to establish how accurate the predictions are. While a technique such as 10-fold cross-validation is a quick and convenient way to gauge how well a model is performing in general (and you may very well use this in early stages of testing), the need of this project is to produce a model that can go on to be used to make predictions on other (previously unseen) data.

It is anticipated that the Explorer tool will be the most likely sub-system you will work with in Weka, and within that the Classifier section; however, there are no hard-and-fast rules here. Use what you have learned in the Data Mining MOOCs wisely. When data is “flying around” at speed it is easy to overlook important details that turn what would otherwise be a highly successful model into garbage! Similarly, accidentally including the feature you wish to learn in the set of attributes used to train a model is a mistake that is easy enough to make if you are not careful. In such cases it leads to results that are amazingly high. If the results you are getting seem “too good to be true” … that might very well be exactly what is going on! In short, “know your data.”

The key ability for this stage of the project is to be able to train a model on the 2011 and 2012 data, from which a run can be made against a test set (2013), with the predictions made saved in a machine-readable form, ready for processing by Step 3 (Data Visualization).

The “mini” report for this step of the project should provide an overview of the different methods you experimented with, along with the one that you established performed the best, and the reasons why that was.

Step 3: Data Visualization
There are two parts to the data visualization step:
1. Charts which show how well the chosen Data Mining model is performing in making predictions about power usage.
2. An open-ended visualization, produced with Google Charts or other visualization software if desired: the key requirement for this part of the project is to visually show something interesting about the cleaned up and processed data that has been produced.

Data Model Accuracy visualized with Google Charts
Google Charts (https://developers.google.com/chart/) is a web-based technology for presenting data in a variety of forms. There are over 25 standard forms to choose from. See Google’s web site for extensive documentation, and the course web site for some selected examples that are more closely aligned with the needs of the Directed Study project.

Produce “Something Interesting”
For this final part of the project you might choose to visualize something interesting that has been produced as a result of your experimentation using Weka, but equally it might be something that is already present in the data produced in Step 1 of the project (no Data Mining required).
The overall intention for this part of the project is to think back to (and look back at, since the slides are on the course web site!) the Data Visualization examples given by Prof Apperley in the first week of the project, and be inspired by them to produce a visualization that shows “something interesting” in the dataset you have been working with. To achieve this, the scope for this part of the project can be widened. For example, if the focus of the project had been to compare extremes of latitude, using Auckland and Invercargill as the two extremes, then in the “something interesting” visualization it is permissible to broaden this to other centres: the visualization produced could be, say, a map of New Zealand showing power-usage and temperature data across all the main centres of population in the country, over time. Going further, if the visuals drawn on the map per centre make more sense when normalized by city population, then this information too can be added to the dataset used to produce the visualization.

Given such an open-ended brief, if you are at all unsure what to attempt for this final part of the project then please consult with me for guidance as to what is a reasonable expectation. As a final remark, for this visualization do not feel constrained to working with Google Charts (although that is a valid option). There are several interactive Data Visualization resources online, such as IBM’s ManyEyes (http://www.ibm.com/manyeyes), that allow you to upload datasets to their web site, from which you can then develop your visualizations.

Schedule
- 5pm G1.15)
Mini-report on Step 1 submitted
Feedback on mini-reports can be collected from department office
- 5pm G1.15)
Mini-report on Step 2 submitted
Feedback on mini-reports can be collected from department office
Deadlines