SlideShare a Scribd company logo
1 of 7
Download to read offline
Application of Data Warehousing & Data Mining to Exploitation
         for Supporting the Planning of Higher Education System in Sri Lanka


                                 M.G.N.A.S. Fernando and G.N. Wikramanayake
                                   University of Colombo School of Computing
                                           {nas, gnw}@ucsc.cmb.ac.lk



ABSTRACT                                                     varies reasons [6, 7]. Although many changes have been
The topic of data warehousing encompasses                    introduced to the education system the graduate
architectures, algorithms, and tools for brining together    unemployment problem have still not been resolved. We
selected data from multiple databases or other               feel the lack of integrated information for decision
information sources into a single repository called a data   making as one of the core reasons. Policy decisions made
warehouse, suitable for direct querying and analysis.        years ago are being carried forward due to political
We have designed a data warehouse for the Sri Lanka          pressures and lack of data to convince the need for such a
education system and applied basic data mining               change.
techniques (i.e. data cleaning, data integration, data
selection, data transformation, data mining, pattern         One of the contributing factors is the mismatch between
evaluation, knowledge representation) to support             the state and private sector requirements for employment.
decision making activities. For this we have built an        Many students seeking alternative non-higher education
integrated data warehouse consisting data from Dept. of      paths have been successful than some of the state
Examination, University Grants Commission, School            graduates who have been ending up as unemployed. The
Census data, national population data and University         present education policies and the university selection
student’s information.                                       criteria could be contributing towards producing
This paper highlights how the data warehouse was built       unemployed graduates.
for the Sri Lanka education system and how it was used       This paper investigates how such issues could be
to create data summary cubes for data analysis and
                                                             addressed using basic data mining techniques. Our
mining process. At present using this developed system,
basic level of summaries and analysis can be performed       investigation is based on designing a data warehouse and
to obtain for decision support information. Further          applying data mining techniques to obtain information
applying data mining techniques and advanced queries,        for decision making [10] and hence assist the policy
we can obtain the necessary knowledge for policy             makers. This process can be seen as in figure 1.
marking as well.


                                                                                               Pattern
1.0      INTRODUCTION                                                                          Evaluation

In Sri Lanka, there are limited opportunities to admission                             Data Mining
to the state universities. From 40% to 50% students who                      Task-relevant
pass the G.C.E. (A/L) examination only 13% to 17%                            Data

students can enter to the state university [8]. Among           Data                   Selection
them although about 90% graduates annually [9], only            Warehouse
27% are able to find employment [5]. Thus the state            Data
universities contribute towards increasing the                 Cleaning
unemployment rate with 70% from the Arts Stream [5].                        Data Integration

University admission policy was reviewed in 1984 &                                    Databases
1987 and some of the policies were implemented                              Figure 1: Data Mining Process
successfully while others are yet not implemented due to
2.0      DATA WAREHOUSE                                                                    Bottom tier         Middle tier   Top tier

                                                                                             Monitor
Data can now be stored in many different types of                other                          &              OLAP Server
databases. One database architecture that has recently           sources                   Administration

emerged is data warehouse, a repository of multiple                                                                          Analysis
heterogeneous data sources, organized under a unified            Operational   Extract                                       Query
                                                                 DBs           Transform      Data              Output       Reports
schema at a single site in order to facilitate management                      Load
                                                                                            Warehouse
                                                                               Refresh                                       Data mining
decision-making [2, 3]. Data warehouse technology
includes data cleaning, data integrating, and on-line
analytical processing (OLAP) that is, analysis techniques
with functionalities such as summarization, consolidation
and aggregation, as well as the ability to view
information from different angles.                             Data Sources         Data Storage            OLAP Engine Front-End Tools

A data warehouse is a “subject-oriented, integrated, time                      Figure 2: Three-tier Architecture
variant, non-volatile collection of data that serves as a
physical implementation of a decision support data             Analysing and query processing of huge data warehouse
model and stores the information on which an enterprise        is very difficult and time-consuming task. Therefore
needs to make strategic decisions.                             multi dimensional data cubes consisting of summary
                                                               tables are created for all possible decision marking
In data warehouses historical, summarized and
                                                               activities   [4].    The    middle     tier   implements
consolidated data is more important than detailed,
                                                               multidimensional data and operations and an OLAP
individual records. Since data warehouses contain
                                                               server is typically used for that. The top tier is a client
consolidated data, perhaps from several operational
                                                               who uses front-end tools to obtain information.
databases, over potentially long periods of time, they
tend to be orders of magnitude larger than operational         2.4         Data Mining
databases. The workloads are query intensive with
mostly ad hoc, complex queries that can access millions        Data mining is the process of applying intelligent
of records and perform a lot of scans, joins, and              methods to extract data patterns. This is done using the
aggregates. Query throughput and response times are            front-end tools. The spreadsheet is still the most
more important than transaction throughput.                    compiling front-end application for OLAP. The
                                                               challenges in supporting a query environment for OLAP
2.1      Multi-dimensional Data                                can be crudely summarized as that of supporting
                                                               spreadsheet operation effectively over large multi-
To facilitate complex analyses and visualization, the data
                                                               gigabytes databases. Indeed, the essbase product of
in a warehouse is typically modelled multi-
                                                               Arbar cooperation uses Microsoft Excel as the front-end
dimensionally. For example, in a sales data warehouse,
                                                               tools for its multidimensional engine [1].
time of sale, sales district, salesperson, and product might
be some of the dimensions of interest. Often, these            3.0         EDUCATION DATA WAREHOUSE
dimensions are hierarchical, e.g. time of sale may be
organized as a day-month-quarter-year hierarchy.               3.1         Data Sources
2.2      Data Analysis                                         The Education Ministry conducts annual school census
                                                               through their Zonal Education Offices. They have data
Typically data analysis is done through OLAP operations        related to number of students in different streams of
such as rollup (increasing level of aggregation) and drill-    study for each school. The Dept. of Examinations
down (decreasing level of aggregation or increasing            conducts all national examinations and hence they have
details) along one or more dimension hierarchies, slice-       about the year 5 scholarship examination, G.C.E. (O/L)
and-dice (selection and projection) and pivot (re-             and G.C.E. (A/L) examinations. The University Grants
orienting the multidimensional view of data).                  Commission (UGC) process the university admissions
                                                               and they have data related to the students admitted to
2.3      Data Warehouse Architecture                           each university for various streams of study. The
Data warehouses are built using three-tier architecture as     university admission process is based on the national
shown in figure 2. They are constructed via process of         population and this data is maintained by the Dept. of
data cleaning, data transformation, data integration, data     Census. Each university has the academic performances
loading and periodic data refreshing. Data warehouse is a      of their students. However an important source of
stored under a unified schema, and it usually resides at a     information of what the graduates are doing after
single site. This is the bottom tier of the architecture and   completing the degree is not maintained by anybody,
is managed by a data warehouse server.
except for time to time the government collects               Usually missing data are identified at this stage and
applications from unemployed graduates.                       necessary actions are taken. These data are either
                                                              ignored, filled manually or automatically using a global
The way the data is maintained it is not possible to link
                                                              constraint such as unknown or infinity, mean value or
this data to individuals such as how a person has
                                                              probable value. We came across missing values such as
performed from school time to university. However,
                                                              year, university etc. for some of the unemployed data.
characteristics identified at each stage could be linked to
                                                              Presently we have opted for the ignore option to deal
obtain valuable information such as industry acceptance
                                                              with such cases.
for an average rural student.
                                                              Identifying noisy data and smoothing them is also part of
3.2          Design the Data Warehouses                       the data cleaning process. This may not apply for most
The data of the above sources has to be cleaned and           educational data. Detecting and correcting data
transformed so that the different database structures         inconsistencies is also part of data cleaning.
could be integrated as shown in figure 3. Thereafter the      3.2.2    Data Integration
data from the relevant data sources have to be loaded to
the integrated data warehouse and later on periodically       Data integration is the next important step. All our data
updated through import data from the respective data          sources use the data mainly to produce annual
sources.                                                      summaries. Hence the data of different years are not
                                                              checked for consistency. Also the data of different years
                                                              need not have the same database structures. This makes
                                                              the integration process over several years more difficult.
         1                                                    To update the data warehouse, same format should be
                                                              consider, therefore after modifying the data structures
         2                                                    and after performing relevant processing we had to
                 Extract
                 Transform                                    construct our data warehouse.
         3       Load          Data
                                                              3.2.3    Data Selection
                 Refresh     Warehouse
         4                               Analysis Query       All the fields of the operational databases are not
                                          Reports Data        necessary for the data warehouse. Most important fields
                                            mining
                                                              for the decision marking activities are selected and
         5
                                                              updated the warehouse. For an example All Island Ranks,
      Data Sources                                            District Ranks are available in the Department of
                                                              Examinations data source and it will not update the data
             Figure 3: Education Data Warehouse               warehouse as these can be derived as well as the rank
                                                              change if some does not apply. Details such as student
Data warehousing systems use a variety of data                details are not required for decision making.
extraction and clearing tools, and refresh utilities for      3.2.4    Data Transformation
populating warehouses. Data extraction from different
location is usually implemented gateways and standard         There were so many coding differences in the data
interfaces (such as information builders SQL, ODBC,           sources. Codes used for district, school etc. by the
Oracle open connect Informix enterprise gateways). We         different sources were quite different (e.g., in the census
used extraction tool available in SQL Server.                 population, district code for Matale and Kandy were 4
                                                              and 5 respectively, while the Examination department
3.2.1        Data Cleaning                                    and UGC had used Kandy and Matale as 4 and 5
A data warehouse is used for decision marking. It is          respectively). Thus matching the structures only was not
important that the data in the warehouse is accurate. As      sufficient. Another common mismatch was the use of
large volumes of data from multiple sources are involved      different data types and field lengths (e.g. actual size of
there is a high probability of errors and anomalies in the    the district code is two digits but some sources had
data.                                                         declared it as a string and some sources it declared as
                                                              number with 8-digit length). We used SPSS to convert
All our data sources had used internal controls to validate   the different data format into one form as well as
their data. Also data of the Dept. of Examination and         recoding purposes.
UGC has to ensure 100% accuracy. Hence our data
sources could consist of a little amount of data error and    After extracting, clearing and transformation data must
anyway there is no way of determining this.                   be loaded into the warehouse. Additional pre-processing
                                                              such as checking integrity constrains, sorting
                                                              summarization, aggregation and other computation may
still be required to build the warehouse tables. Typically               Just as relational query language like SQL can be used to
batch load utilities are used for this purpose. In addition              specify relational queries. A data mining query language
to populating the warehouse, a load utility must allow the               can be used to specify relational queries. We examine an
system administrator to monitor status so that in case of a              SQL based data mining query language called DMQL,
failure operation such as cancel, suspend and resume a                   which contains language primitives for defining data
load, can be performed with no loss of integrity.                        warehouses. Data warehouses can be defined using two
                                                                         language primitives, one for cube definition and other
For our system data can be loaded yearly, after
                                                                         one for dimension definitions.
completion of their major operational works. For
example, A/L databases can be loaded after updating A/L
re-correction marks added, UGC selected students                           define cube cube_master [AL_Year, Exam_Index_No,
database can be updated after completing of filling of                          UGC_Serial_No, Uni_Reg_No, Faculty_Code,
vacancies.                                                                      Course_Code, Sch_Code, Dist_No] : stu_status=
                                                                                count(*)
3.2.5       Refresh
                                                                           define dimension Year as (AL_Year, Sp_Remarks)
Refreshing a warehouse consists of propagating updates
on source data to corresponding updates in the                             define dimension District as (Dist_No, Dist_Name,
warehouse. There are two sets of issues to consider:                            Population)
when to refresh and how to refresh, usually the                            define dimension Course as (Course_Code,
warehouse is refreshed periodically (daily or weekly)                           Course_Name, No_of_Vac)
only if some OLAP queries need current data, it is
necessary to propagate every update. The warehouse                         …….
administrator sets the refresh policy, depending on user                           Figure 5: part of DMQL statements
needs or traffic, of the data sources. In this education
decision support system all operational process are
completed annually.
                                                                         3.4      Performing Data Mining
                                                                         Data mining represents the next quantum step beyond the
                                                                         historical and aggregate-based world of information that
3.3         Star Schema                                                  the data warehouse makes available to users. Data
                                                                         mining allows organizations to collect vital information
The most popular model for a data warehouse is the star                  regarding business processes, customers, sales and
schema. Here a fact table and set of dimension tables are                marketing and arrange the information in such a fashion
identified. Multi-dimensional cubes are formed using                     as to allow business users to make predictive decisions
these tables. The star schema for our education system                   about what direction the business should focus its
is given in figure 4. It can be formally defined using the               resources. This advantage allows business decision
DMQL as in figure 5. The fact table is defined as a                      makers to “steer” the focus of an organization and
cube_master with the dimensions as the attributes (e.g.                  facilitate the continued success of the enterprise.
AL_Year) and facts (e.g. stu_status). Then each attribute
is defined as a dimension (e.g. Year).                                   Educational data too can be mined to find hidden
                                                                         knowledge for the decision marking process. One of the
   District            Year                                  Exam        major tasks is to review and examine the current policy
                                      Student Master
  Dist_No                                                Exam_Index_No   for the admission to the higher education institutes, i.e.
                     AL_Year            Fact Table
  Dist_Name                                              Dist_No(o)
  Population         Sp_Remarks                                          Present Quota System (Merit, District Under- Privilege
                                        AL_Year          Sub[4],
                                                         Gra[4]
                                                                         and Special intake quota system), and          to make
  Application                           Dist_No                          recommendations if any changes are considered
 UGC_Serial_No                      Exam_Index_No         School         necessary etc.
 Corr_Dist                          UGC_Serial_No         Sch_Code
 P_Address                                                Sch_Address
                                                                         In order to facilitate decision marking, the data in a data
                                       Sch_Code           Sch_Type       warehouse is organized around major subjects, such as
 Preference[10,40]
 M th t                              Faculty_Code         Year12S        year, district, course etc.. The data are stored to provide
                                                          Year12C        information from a historical perspective (such as the
  Faculty            Uni_stu          Course_Code
                                                          Year12A
  Uni_code                                                               past 5- 10 years) and typically summarized. The data
  Faculty       Reg_no                Uni_Reg_No          Year13S
  No of plac    Year1_st                                                 warehouse is usually modelled by a multidimensional
                                  Category,                              database structure, where each dimensions corresponds
                Year2_st
                                  status(m/d/u) Stu_st
                                                            Course
                Year3_st                                 Course_Code     to an attribute or a set of attributes in the schema and
                Year4_st                                 Course_Name
                Final_st
                                                                         each cell stores the value of some aggregate measure.
                                                         No_of_Vac


                       Figure 4: Star Schema
4.0       DATA CUBES                                         Province would have 285 students selected for Medicine
                                                             for the Year 2001 as in figure 8.
There are so many billons of data records in a data
warehouses. Analysis and query processing of such huge                                                         Sri Lanka
data warehouse is very difficult and time consuming task.     Country
Therefore maintaining multi-dimensional warehouse
servers will help to solve this problem. Multi decisional                                                                  ...
                                                              Province                           Western                           Sabaragamuwa
data cubes consists summary tables for all possible
decision marking activities. A possible data cube is                                               ...                                       ...
shown in figure 6.                                            District                  Colombo          Kalutara


                                                                                           ...                 ...                     ...          ...
                                                              Zone             Colombo
             …
   Course Vet                                                                     ...      ...           ...         ...         ...         ...   ...    ...
         Den                                                  School     De-la Salle College

     Med                                                                            Figure 7: Concept hierarchy
              208        228    211
  Colombo
                                                             Similarly for the location concept hierarchy the roll-up
                                                             operation could be used to aggregate the school data to
  Gampaha      51        67      68
                                                             zone or zone to district or even province to country.
District
               26        31      39
  Kalutara                                                                    …
                                                                 Course     Vet
      …                                                                   Den
                                                                       Med
             2001       2002 2003       …                                         285            326                 318
                                                                Western
                           Year
                                                              Province
                                                               Central             92            108                 96
                    Figure 6: A Data Cube
                                                               Southern
                                                                                  149            142              140
After defining the star schema we can create so many
cubs according to the requirements. For example as in
figure 6, a cube may represent the selected students as a
function of Year, District, and Courses (e.g. Year: 2001,
2002, 2003 etc., District: Colombo, Gampaha, Kalutara                           2001             2002 2003                             …
etc. and Courses: Medicine, Dental, Vet. Science etc. are                                           Year
the three dimensions with 208 students from Colombo
district selected for Medicine in year 2001).                                        Figure 8: Roll-up District

A concept hierarchy defines a sequence of mappings           4. 2 Drill-down
from a set of low-level concepts to more general higher-     The drill-down operation is the opposite of roll-up. It
level concepts. Using it data could be aggregated or         navigates from less detail data to more details. The data
disaggregated. Many concept hierarchies are implicit         cube of figure 6 can be drill-down using the location
within the database schema and a hierarchy could be          concept-hierarchy and hence districts can be
defined for the locations in the order of school < zone <    disaggregated into zones as shown in figure 9.
district < province < country (see figure 7). This allows
districts to be aggregated to provinces (roll-up) and well   4. 3 Slice and Dice
as districts to be disaggregated into zones (drill-down).
                                                             The slice operation performs a selection on one
4. 1 Roll-up                                                 dimension of the given cube, resulting is a sub-cube. For
                                                             example, we could select the course dimension and slice
The Roll-up operation corresponds to taking the current      for course medicine and view a sub-cube as shown in
data objects and doing further grouping by one of the        figure 10.
dimensions. The Roll-up operations performed on the
central cube by climbing up the concept of hierarchy of      The dice operations define a sub-cube by performing a
figure 7.Thus, it is possible to Roll-up admission data by   selection of one or more dimensions. For example, three
grouping districts into provinces. Hence Western             dimension dice for courses “Medicine” and Dental” for
districts “Colombo” and “Jaffna” for year 2001 and 2002     4. 4 Pivot (Rotate)
could be as shown in figure 11. Using this approach it is
possible to focus on specific areas such as district or     Pivot is a visualisation operation that rotates the data
course and hence make appropriate decisions.                axes in view in order to provide an alternative
                                                            presentation of the data. It allows visualisation of the
                                                            other side of the dice.
              …
   Course   Vet
          Den                                               5.0      CONCLUSIONS
       Med
                                                            For a developing country like Sri Lanka, education is
                  170          173         158
  Colombo                                                   more relevant factor for its entire economy. To contribute
Zone                                                        to the above, making correct decisions and implementing
 Homagama          1            0           0               the correct policies are very essential. To achieve the
                                                            above task it is important to integrate the heterogeneous
 Sri Jayawar       12           7           6               data sources and use them to build a data warehouse and
 dhanapura                                                  data cubes.
                                                            Based on our preliminary work we have shown how
       …                                                    useful information for decision making could be
                                                            extracted from a data warehouse and how it in turn could
                 2001          2002 2003              …
                                  Year                      help to formulate the correct policies.
                                                            In this process we have highlighted the need to maintain
                   Figure 9: Drill-down District
                                                            graduate information. Further our selected examples were
                                                            based on district, year and course as the dimensions. It is
                                                            important to note that the star schema of figure 4 has four
                                                            more dimensions as well as it could be further expanded
                         208         228        211         by integrating data such as unemployment. Thus through
       Colombo
                                                            creating the appropriate cubes, we can extract knowledge
       Gampaha            51         67          68         necessary to achieve our objectives.

     District                                               We expect to further research this area before forwarding
                          26         31          39         our recommendations. We need to investigate the
       Kalutara
                                                            existence of hidden data patterns and for that we need to
                                                            apply data mining techniques and advanced queries.
           …

                         2001 2002 2003 …
                               Year                         6.0      ACKNOWLEDGMENTS
                                                            We acknowledge the support given by the departments of
       Figure 10: A Slice for Course Medicine
                                                            Education Examination and Census, UGC, Faculty of
                                                            Medicine, Universities and UCSC.


                                                            7.0      REFERENCES
           Course Den
               Med                                          1) Chaudhuri S., Dayal U. "An Overview of Data
                                            40
                            208            228                 Warehousing and OLAP Technology", SIGMOD
           Colombo
                                                               Record 26:1, Mar 1997, pp 65-74.
        District
               Jaffna          44          40               2) Chawatte S.S., Garcia-Molina H., Hammer J.,
                                                               Ireland K., Papakonstantinou Y., Ullman J., and
                           2001       2002                     Wisdom J., "The TSIMMIS Project: Integration of
                                    Year                       Heterogeneous Information Sources", Proc. of IPSJ
                                                               Conf., Tokyo, Japan, Oct. 1994, pp. 7-18.

                        Figure 11: A 3D Dice                3) Han J. & Kamber M., "Data Mining Concepts and
                                                               Techniques", Morgan Kaufmann, 2001.
4) Harinarayan V., Rajaraman A., Ulman J.D.
   "Implementing Data Cubes Efficiently", Proc of
   SIGMOD Record, 25:2, 1996, pp. 205-216.
5) MOTET, Report of analysis of the graduate
   registration forms, Ministry of Tertiary Education &
   Training, Sept. 2002.
6) UGC, Report of the committee appointed to review
   university admissions policy, May 1984.
7) UGC, Report of the committee appointed to review
   university admissions policy, Dec. 1987.
8) UGC, Admission Booklet – Academic Year 2003-
   2004, 2003.
9) UGC undergraduate statistics. http://www.ugc.ac.lk
10) Wisdom J., "Research Problems in Data
    Warehousing", Proc. 4th Intl., CIKM Conf., 1995.

More Related Content

What's hot

Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; KamberChapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kambererror007
 
22830802 management-information-system
22830802 management-information-system22830802 management-information-system
22830802 management-information-systemarunakankshadixit
 
Data mining in telecommunications industry
Data mining in telecommunications industryData mining in telecommunications industry
Data mining in telecommunications industryIssa Memari
 
distributed Computing system model
distributed Computing system modeldistributed Computing system model
distributed Computing system modelHarshad Umredkar
 
Data Warehouse Architectures
Data Warehouse ArchitecturesData Warehouse Architectures
Data Warehouse ArchitecturesTheju Paul
 
Incident response methodology
Incident response methodologyIncident response methodology
Incident response methodologyPiyush Jain
 
Engenharia Social: Explorando a Vulnerabilidade Humana
Engenharia Social: Explorando a Vulnerabilidade HumanaEngenharia Social: Explorando a Vulnerabilidade Humana
Engenharia Social: Explorando a Vulnerabilidade HumanaRafael Magri Gedra
 
Data warehouse architecture
Data warehouse architecture Data warehouse architecture
Data warehouse architecture janani thirupathi
 
Chapter05 identifying and selecting systems development projects
Chapter05 identifying and selecting systems development projectsChapter05 identifying and selecting systems development projects
Chapter05 identifying and selecting systems development projectsDhani Ahmad
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesSaif Ullah
 
System analysis and design chapter 2
System analysis and design chapter 2System analysis and design chapter 2
System analysis and design chapter 2Einrez Pugao
 
Data warehouse and olap technology
Data warehouse and olap technologyData warehouse and olap technology
Data warehouse and olap technologyDataminingTools Inc
 
Data minig with Big data analysis
Data minig with Big data analysisData minig with Big data analysis
Data minig with Big data analysisPoonam Kshirsagar
 
Data Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data VisualisationData Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data VisualisationSunderland City Council
 

What's hot (20)

Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; KamberChapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
 
Information management
Information managementInformation management
Information management
 
22830802 management-information-system
22830802 management-information-system22830802 management-information-system
22830802 management-information-system
 
Sadchap04
Sadchap04Sadchap04
Sadchap04
 
Data mining in telecommunications industry
Data mining in telecommunications industryData mining in telecommunications industry
Data mining in telecommunications industry
 
distributed Computing system model
distributed Computing system modeldistributed Computing system model
distributed Computing system model
 
Data Warehouse Architectures
Data Warehouse ArchitecturesData Warehouse Architectures
Data Warehouse Architectures
 
Incident response methodology
Incident response methodologyIncident response methodology
Incident response methodology
 
Engenharia Social: Explorando a Vulnerabilidade Humana
Engenharia Social: Explorando a Vulnerabilidade HumanaEngenharia Social: Explorando a Vulnerabilidade Humana
Engenharia Social: Explorando a Vulnerabilidade Humana
 
Data warehouse architecture
Data warehouse architecture Data warehouse architecture
Data warehouse architecture
 
Middleware
MiddlewareMiddleware
Middleware
 
Chapter05 identifying and selecting systems development projects
Chapter05 identifying and selecting systems development projectsChapter05 identifying and selecting systems development projects
Chapter05 identifying and selecting systems development projects
 
Web content mining
Web content miningWeb content mining
Web content mining
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniques
 
System analysis and design chapter 2
System analysis and design chapter 2System analysis and design chapter 2
System analysis and design chapter 2
 
Data warehouse and olap technology
Data warehouse and olap technologyData warehouse and olap technology
Data warehouse and olap technology
 
Data minig with Big data analysis
Data minig with Big data analysisData minig with Big data analysis
Data minig with Big data analysis
 
Data Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data VisualisationData Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data Visualisation
 
Data Mining
Data MiningData Mining
Data Mining
 
Web mining
Web mining Web mining
Web mining
 

Viewers also liked

Practical Applications for Data Warehousing, Analytics, BI, and Meta-Integrat...
Practical Applications for Data Warehousing, Analytics, BI, and Meta-Integrat...Practical Applications for Data Warehousing, Analytics, BI, and Meta-Integrat...
Practical Applications for Data Warehousing, Analytics, BI, and Meta-Integrat...DATAVERSITY
 
Data Mining and Text Mining in Educational Research
Data Mining and Text Mining in Educational ResearchData Mining and Text Mining in Educational Research
Data Mining and Text Mining in Educational ResearchQiang Hao
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSINGKing Julian
 
Data Warehousing - in the real world
Data Warehousing - in the real worldData Warehousing - in the real world
Data Warehousing - in the real worldukc4
 
Social media and it's use in disease surveillance
Social media and it's use in disease surveillanceSocial media and it's use in disease surveillance
Social media and it's use in disease surveillanceDan Aronne
 
Mastering in Data Warehousing and Business Intelligence
Mastering in Data Warehousing and Business IntelligenceMastering in Data Warehousing and Business Intelligence
Mastering in Data Warehousing and Business IntelligenceEdureka!
 
Izobrazevanje za data-mining
Izobrazevanje za data-miningIzobrazevanje za data-mining
Izobrazevanje za data-miningbutest
 
Economic growth china
Economic growth chinaEconomic growth china
Economic growth chinaAda Alipaj
 
Educational Data Mining in relation to education statistics of Nepal
Educational Data Mining in relation to education statistics of NepalEducational Data Mining in relation to education statistics of Nepal
Educational Data Mining in relation to education statistics of NepalRaj Subit
 
Why Your Healthcare Business Intelligence Strategy Can't Win
Why Your Healthcare Business Intelligence Strategy Can't WinWhy Your Healthcare Business Intelligence Strategy Can't Win
Why Your Healthcare Business Intelligence Strategy Can't WinHealth Catalyst
 
Learning Analytics in Education: Using Student’s Big Data to Improve Teaching
Learning Analytics in Education:  Using Student’s Big Data to Improve TeachingLearning Analytics in Education:  Using Student’s Big Data to Improve Teaching
Learning Analytics in Education: Using Student’s Big Data to Improve TeachingRafael Scapin, Ph.D.
 
Distributed database management systems
Distributed database management systemsDistributed database management systems
Distributed database management systemsDhani Ahmad
 
Data Mining: Application and trends in data mining
Data Mining: Application and trends in data miningData Mining: Application and trends in data mining
Data Mining: Application and trends in data miningDataminingTools Inc
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecturepcherukumalla
 
Components of a Data-Warehouse
Components of a Data-WarehouseComponents of a Data-Warehouse
Components of a Data-WarehouseAbdul Aslam
 
Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Miningidnats
 

Viewers also liked (18)

Practical Applications for Data Warehousing, Analytics, BI, and Meta-Integrat...
Practical Applications for Data Warehousing, Analytics, BI, and Meta-Integrat...Practical Applications for Data Warehousing, Analytics, BI, and Meta-Integrat...
Practical Applications for Data Warehousing, Analytics, BI, and Meta-Integrat...
 
Data Mining and Text Mining in Educational Research
Data Mining and Text Mining in Educational ResearchData Mining and Text Mining in Educational Research
Data Mining and Text Mining in Educational Research
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
Data Warehousing - in the real world
Data Warehousing - in the real worldData Warehousing - in the real world
Data Warehousing - in the real world
 
Using Twitter Data to Predict Flu Outbreak
Using Twitter Data to Predict Flu OutbreakUsing Twitter Data to Predict Flu Outbreak
Using Twitter Data to Predict Flu Outbreak
 
Social media and it's use in disease surveillance
Social media and it's use in disease surveillanceSocial media and it's use in disease surveillance
Social media and it's use in disease surveillance
 
Mastering in Data Warehousing and Business Intelligence
Mastering in Data Warehousing and Business IntelligenceMastering in Data Warehousing and Business Intelligence
Mastering in Data Warehousing and Business Intelligence
 
Izobrazevanje za data-mining
Izobrazevanje za data-miningIzobrazevanje za data-mining
Izobrazevanje za data-mining
 
Economic growth china
Economic growth chinaEconomic growth china
Economic growth china
 
Educational Data Mining in relation to education statistics of Nepal
Educational Data Mining in relation to education statistics of NepalEducational Data Mining in relation to education statistics of Nepal
Educational Data Mining in relation to education statistics of Nepal
 
Why Your Healthcare Business Intelligence Strategy Can't Win
Why Your Healthcare Business Intelligence Strategy Can't WinWhy Your Healthcare Business Intelligence Strategy Can't Win
Why Your Healthcare Business Intelligence Strategy Can't Win
 
Learning Analytics in Education: Using Student’s Big Data to Improve Teaching
Learning Analytics in Education:  Using Student’s Big Data to Improve TeachingLearning Analytics in Education:  Using Student’s Big Data to Improve Teaching
Learning Analytics in Education: Using Student’s Big Data to Improve Teaching
 
Distributed database management systems
Distributed database management systemsDistributed database management systems
Distributed database management systems
 
Data Mining: Application and trends in data mining
Data Mining: Application and trends in data miningData Mining: Application and trends in data mining
Data Mining: Application and trends in data mining
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecture
 
Components of a Data-Warehouse
Components of a Data-WarehouseComponents of a Data-Warehouse
Components of a Data-Warehouse
 
RESULT MINING: ANALYSIS OF DATA MINING TECHNIQUES IN EDUCATION
RESULT MINING: ANALYSIS OF DATA MINING TECHNIQUES IN EDUCATIONRESULT MINING: ANALYSIS OF DATA MINING TECHNIQUES IN EDUCATION
RESULT MINING: ANALYSIS OF DATA MINING TECHNIQUES IN EDUCATION
 
Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Mining
 

Similar to Application of Data Warehousing & Data Mining to Exploitation for Supporting the Planning of Higher Education System in Sri Lanka

Design and implementation of the web (extract, transform, load) process in da...
Design and implementation of the web (extract, transform, load) process in da...Design and implementation of the web (extract, transform, load) process in da...
Design and implementation of the web (extract, transform, load) process in da...IAESIJAI
 
Advances And Research Directions In Data-Warehousing Technology
Advances And Research Directions In Data-Warehousing TechnologyAdvances And Research Directions In Data-Warehousing Technology
Advances And Research Directions In Data-Warehousing TechnologyKate Campbell
 
A Study Review of Common Big Data Architecture for Small-Medium Enterprise
A Study Review of Common Big Data Architecture for Small-Medium EnterpriseA Study Review of Common Big Data Architecture for Small-Medium Enterprise
A Study Review of Common Big Data Architecture for Small-Medium EnterpriseRidwan Fadjar
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousingsumit621
 
Data warehouse concepts
Data warehouse conceptsData warehouse concepts
Data warehouse conceptsobieefans
 
Dwdm unit 1-2016-Data ingarehousing
Dwdm unit 1-2016-Data ingarehousingDwdm unit 1-2016-Data ingarehousing
Dwdm unit 1-2016-Data ingarehousingDhilsath Fathima
 
MODERN DATA PIPELINE
MODERN DATA PIPELINEMODERN DATA PIPELINE
MODERN DATA PIPELINEIRJET Journal
 
The Big Data Importance – Tools and their Usage
The Big Data Importance – Tools and their UsageThe Big Data Importance – Tools and their Usage
The Big Data Importance – Tools and their UsageIRJET Journal
 
An Empirical Study of the Applications of Classification Techniques in Studen...
An Empirical Study of the Applications of Classification Techniques in Studen...An Empirical Study of the Applications of Classification Techniques in Studen...
An Empirical Study of the Applications of Classification Techniques in Studen...IJERA Editor
 
Olap, oltp and data mining
Olap, oltp and data miningOlap, oltp and data mining
Olap, oltp and data miningzafrii
 
Designing a Framework to Standardize Data Warehouse Development Process for E...
Designing a Framework to Standardize Data Warehouse Development Process for E...Designing a Framework to Standardize Data Warehouse Development Process for E...
Designing a Framework to Standardize Data Warehouse Development Process for E...ijdms
 

Similar to Application of Data Warehousing & Data Mining to Exploitation for Supporting the Planning of Higher Education System in Sri Lanka (20)

Issue in Data warehousing and OLAP in E-business
Issue in Data warehousing and OLAP in E-businessIssue in Data warehousing and OLAP in E-business
Issue in Data warehousing and OLAP in E-business
 
Design and implementation of the web (extract, transform, load) process in da...
Design and implementation of the web (extract, transform, load) process in da...Design and implementation of the web (extract, transform, load) process in da...
Design and implementation of the web (extract, transform, load) process in da...
 
Dwbasics
DwbasicsDwbasics
Dwbasics
 
Advances And Research Directions In Data-Warehousing Technology
Advances And Research Directions In Data-Warehousing TechnologyAdvances And Research Directions In Data-Warehousing Technology
Advances And Research Directions In Data-Warehousing Technology
 
A Study Review of Common Big Data Architecture for Small-Medium Enterprise
A Study Review of Common Big Data Architecture for Small-Medium EnterpriseA Study Review of Common Big Data Architecture for Small-Medium Enterprise
A Study Review of Common Big Data Architecture for Small-Medium Enterprise
 
Advanced Database System
Advanced Database SystemAdvanced Database System
Advanced Database System
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 
9. Data Warehousing & Mining.pptx
9. Data Warehousing & Mining.pptx9. Data Warehousing & Mining.pptx
9. Data Warehousing & Mining.pptx
 
Isas report
Isas reportIsas report
Isas report
 
Data warehouse concepts
Data warehouse conceptsData warehouse concepts
Data warehouse concepts
 
Data mining notes
Data mining notesData mining notes
Data mining notes
 
Dwdm unit 1-2016-Data ingarehousing
Dwdm unit 1-2016-Data ingarehousingDwdm unit 1-2016-Data ingarehousing
Dwdm unit 1-2016-Data ingarehousing
 
MODERN DATA PIPELINE
MODERN DATA PIPELINEMODERN DATA PIPELINE
MODERN DATA PIPELINE
 
[IJET-V1I5P5] Authors: T.Jalaja, M.Shailaja
[IJET-V1I5P5] Authors: T.Jalaja, M.Shailaja[IJET-V1I5P5] Authors: T.Jalaja, M.Shailaja
[IJET-V1I5P5] Authors: T.Jalaja, M.Shailaja
 
The Big Data Importance – Tools and their Usage
The Big Data Importance – Tools and their UsageThe Big Data Importance – Tools and their Usage
The Big Data Importance – Tools and their Usage
 
An Empirical Study of the Applications of Classification Techniques in Studen...
An Empirical Study of the Applications of Classification Techniques in Studen...An Empirical Study of the Applications of Classification Techniques in Studen...
An Empirical Study of the Applications of Classification Techniques in Studen...
 
DATA WAREHOUSING.2.pptx
DATA WAREHOUSING.2.pptxDATA WAREHOUSING.2.pptx
DATA WAREHOUSING.2.pptx
 
MARAT ANALYSIS
MARAT ANALYSISMARAT ANALYSIS
MARAT ANALYSIS
 
Olap, oltp and data mining
Olap, oltp and data miningOlap, oltp and data mining
Olap, oltp and data mining
 
Designing a Framework to Standardize Data Warehouse Development Process for E...
Designing a Framework to Standardize Data Warehouse Development Process for E...Designing a Framework to Standardize Data Warehouse Development Process for E...
Designing a Framework to Standardize Data Warehouse Development Process for E...
 

More from Gihan Wikramanayake

Using ICT to Promote Learning in a Medical Faculty
Using ICT to Promote Learning in a Medical FacultyUsing ICT to Promote Learning in a Medical Faculty
Using ICT to Promote Learning in a Medical FacultyGihan Wikramanayake
 
Evaluation of English and IT skills of new entrants to Sri Lankan universities
Evaluation of English and IT skills of new entrants to Sri Lankan universitiesEvaluation of English and IT skills of new entrants to Sri Lankan universities
Evaluation of English and IT skills of new entrants to Sri Lankan universitiesGihan Wikramanayake
 
Broadcasting Technology: Overview
Broadcasting  Technology: OverviewBroadcasting  Technology: Overview
Broadcasting Technology: OverviewGihan Wikramanayake
 
Importance of Information Technology for Sports
Importance of Information Technology for SportsImportance of Information Technology for Sports
Importance of Information Technology for SportsGihan Wikramanayake
 
Improving student learning through assessment for learning using social media...
Improving student learning through assessment for learning using social media...Improving student learning through assessment for learning using social media...
Improving student learning through assessment for learning using social media...Gihan Wikramanayake
 
Exploiting Tourism through Data Warehousing
Exploiting Tourism through Data WarehousingExploiting Tourism through Data Warehousing
Exploiting Tourism through Data WarehousingGihan Wikramanayake
 
Speaker Search and Indexing for Multimedia Databases
Speaker Search and Indexing for Multimedia DatabasesSpeaker Search and Indexing for Multimedia Databases
Speaker Search and Indexing for Multimedia DatabasesGihan Wikramanayake
 
Authropometry of Sri Lankan Sportsmen and Sportswomen, with Special Reference...
Authropometry of Sri Lankan Sportsmen and Sportswomen, with Special Reference...Authropometry of Sri Lankan Sportsmen and Sportswomen, with Special Reference...
Authropometry of Sri Lankan Sportsmen and Sportswomen, with Special Reference...Gihan Wikramanayake
 
Analysis of Multiple Choice Question Papers with Special Reference to those s...
Analysis of Multiple Choice Question Papers with Special Reference to those s...Analysis of Multiple Choice Question Papers with Special Reference to those s...
Analysis of Multiple Choice Question Papers with Special Reference to those s...Gihan Wikramanayake
 
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy DatabasesAssisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy DatabasesGihan Wikramanayake
 
ICT ප්‍රාරම්භක ඩිප්ලෝමා පාඨමාලාව දිනමිණ, පරිගණක දැනුම
ICT ප්‍රාරම්භක ඩිප්ලෝමා පාඨමාලාව   දිනමිණ, පරිගණක දැනුමICT ප්‍රාරම්භක ඩිප්ලෝමා පාඨමාලාව   දිනමිණ, පරිගණක දැනුම
ICT ප්‍රාරම්භක ඩිප්ලෝමා පාඨමාලාව දිනමිණ, පරිගණක දැනුමGihan Wikramanayake
 
වෘත්තීය අවස්ථා වැඩි පරිගණක ක්ෂේත‍්‍රය දිනමිණ, පරිගණක දැනුම
වෘත්තීය අවස්ථා වැඩි පරිගණක ක්ෂේත‍්‍රය   දිනමිණ, පරිගණක දැනුමවෘත්තීය අවස්ථා වැඩි පරිගණක ක්ෂේත‍්‍රය   දිනමිණ, පරිගණක දැනුම
වෘත්තීය අවස්ථා වැඩි පරිගණක ක්ෂේත‍්‍රය දිනමිණ, පරිගණක දැනුමGihan Wikramanayake
 
පරිගණක ක්ෂේත‍්‍රයේ වෘත්තීය අවස්ථා දිනමිණ, පරිගණක දැනුම
පරිගණක ක්ෂේත‍්‍රයේ වෘත්තීය අවස්ථා   දිනමිණ, පරිගණක දැනුමපරිගණක ක්ෂේත‍්‍රයේ වෘත්තීය අවස්ථා   දිනමිණ, පරිගණක දැනුම
පරිගණක ක්ෂේත‍්‍රයේ වෘත්තීය අවස්ථා දිනමිණ, පරිගණක දැනුමGihan Wikramanayake
 
Balanced Scorecard and its relationship to UMM
Balanced Scorecard and its relationship to UMMBalanced Scorecard and its relationship to UMM
Balanced Scorecard and its relationship to UMMGihan Wikramanayake
 
Web Usage Mining based on Heuristics: Drawbacks
Web Usage Mining based on Heuristics: DrawbacksWeb Usage Mining based on Heuristics: Drawbacks
Web Usage Mining based on Heuristics: DrawbacksGihan Wikramanayake
 
Evolving and Migrating Relational Legacy Databases
Evolving and Migrating Relational Legacy DatabasesEvolving and Migrating Relational Legacy Databases
Evolving and Migrating Relational Legacy DatabasesGihan Wikramanayake
 
Re-Engineering Databases using Meta-Programming Technology
Re-Engineering Databases using Meta-Programming TechnologyRe-Engineering Databases using Meta-Programming Technology
Re-Engineering Databases using Meta-Programming TechnologyGihan Wikramanayake
 

More from Gihan Wikramanayake (20)

Using ICT to Promote Learning in a Medical Faculty
Using ICT to Promote Learning in a Medical FacultyUsing ICT to Promote Learning in a Medical Faculty
Using ICT to Promote Learning in a Medical Faculty
 
Evaluation of English and IT skills of new entrants to Sri Lankan universities
Evaluation of English and IT skills of new entrants to Sri Lankan universitiesEvaluation of English and IT skills of new entrants to Sri Lankan universities
Evaluation of English and IT skills of new entrants to Sri Lankan universities
 
Learning beyond the classroom
Learning beyond the classroomLearning beyond the classroom
Learning beyond the classroom
 
Broadcasting Technology: Overview
Broadcasting  Technology: OverviewBroadcasting  Technology: Overview
Broadcasting Technology: Overview
 
Importance of Information Technology for Sports
Importance of Information Technology for SportsImportance of Information Technology for Sports
Importance of Information Technology for Sports
 
Improving student learning through assessment for learning using social media...
Improving student learning through assessment for learning using social media...Improving student learning through assessment for learning using social media...
Improving student learning through assessment for learning using social media...
 
Exploiting Tourism through Data Warehousing
Exploiting Tourism through Data WarehousingExploiting Tourism through Data Warehousing
Exploiting Tourism through Data Warehousing
 
Speaker Search and Indexing for Multimedia Databases
Speaker Search and Indexing for Multimedia DatabasesSpeaker Search and Indexing for Multimedia Databases
Speaker Search and Indexing for Multimedia Databases
 
Authropometry of Sri Lankan Sportsmen and Sportswomen, with Special Reference...
Authropometry of Sri Lankan Sportsmen and Sportswomen, with Special Reference...Authropometry of Sri Lankan Sportsmen and Sportswomen, with Special Reference...
Authropometry of Sri Lankan Sportsmen and Sportswomen, with Special Reference...
 
Analysis of Multiple Choice Question Papers with Special Reference to those s...
Analysis of Multiple Choice Question Papers with Special Reference to those s...Analysis of Multiple Choice Question Papers with Special Reference to those s...
Analysis of Multiple Choice Question Papers with Special Reference to those s...
 
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy DatabasesAssisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
 
ICT ප්‍රාරම්භක ඩිප්ලෝමා පාඨමාලාව දිනමිණ, පරිගණක දැනුම
ICT ප්‍රාරම්භක ඩිප්ලෝමා පාඨමාලාව   දිනමිණ, පරිගණක දැනුමICT ප්‍රාරම්භක ඩිප්ලෝමා පාඨමාලාව   දිනමිණ, පරිගණක දැනුම
ICT ප්‍රාරම්භක ඩිප්ලෝමා පාඨමාලාව දිනමිණ, පරිගණක දැනුම
 
වෘත්තීය අවස්ථා වැඩි පරිගණක ක්ෂේත‍්‍රය දිනමිණ, පරිගණක දැනුම
වෘත්තීය අවස්ථා වැඩි පරිගණක ක්ෂේත‍්‍රය   දිනමිණ, පරිගණක දැනුමවෘත්තීය අවස්ථා වැඩි පරිගණක ක්ෂේත‍්‍රය   දිනමිණ, පරිගණක දැනුම
වෘත්තීය අවස්ථා වැඩි පරිගණක ක්ෂේත‍්‍රය දිනමිණ, පරිගණක දැනුම
 
පරිගණක ක්ෂේත‍්‍රයේ වෘත්තීය අවස්ථා දිනමිණ, පරිගණක දැනුම
පරිගණක ක්ෂේත‍්‍රයේ වෘත්තීය අවස්ථා   දිනමිණ, පරිගණක දැනුමපරිගණක ක්ෂේත‍්‍රයේ වෘත්තීය අවස්ථා   දිනමිණ, පරිගණක දැනුම
පරිගණක ක්ෂේත‍්‍රයේ වෘත්තීය අවස්ථා දිනමිණ, පරිගණක දැනුම
 
Producing Employable Graduates
Producing Employable GraduatesProducing Employable Graduates
Producing Employable Graduates
 
Balanced Scorecard and its relationship to UMM
Balanced Scorecard and its relationship to UMMBalanced Scorecard and its relationship to UMM
Balanced Scorecard and its relationship to UMM
 
An SMS-Email Reader
An SMS-Email ReaderAn SMS-Email Reader
An SMS-Email Reader
 
Web Usage Mining based on Heuristics: Drawbacks
Web Usage Mining based on Heuristics: DrawbacksWeb Usage Mining based on Heuristics: Drawbacks
Web Usage Mining based on Heuristics: Drawbacks
 
Evolving and Migrating Relational Legacy Databases
Evolving and Migrating Relational Legacy DatabasesEvolving and Migrating Relational Legacy Databases
Evolving and Migrating Relational Legacy Databases
 
Re-Engineering Databases using Meta-Programming Technology
Re-Engineering Databases using Meta-Programming TechnologyRe-Engineering Databases using Meta-Programming Technology
Re-Engineering Databases using Meta-Programming Technology
 

Recently uploaded

Alamkara theory by Bhamaha Indian Poetics (1).pptx
Alamkara theory by Bhamaha Indian Poetics (1).pptxAlamkara theory by Bhamaha Indian Poetics (1).pptx
Alamkara theory by Bhamaha Indian Poetics (1).pptxDhatriParmar
 
ASTRINGENTS.pdf Pharmacognosy chapter 5 diploma in Pharmacy
ASTRINGENTS.pdf Pharmacognosy chapter 5 diploma in PharmacyASTRINGENTS.pdf Pharmacognosy chapter 5 diploma in Pharmacy
ASTRINGENTS.pdf Pharmacognosy chapter 5 diploma in PharmacySumit Tiwari
 
AUDIENCE THEORY - PARTICIPATORY - JENKINS.pptx
AUDIENCE THEORY - PARTICIPATORY - JENKINS.pptxAUDIENCE THEORY - PARTICIPATORY - JENKINS.pptx
AUDIENCE THEORY - PARTICIPATORY - JENKINS.pptxiammrhaywood
 
PHARMACOGNOSY CHAPTER NO 5 CARMINATIVES AND G.pdf
PHARMACOGNOSY CHAPTER NO 5 CARMINATIVES AND G.pdfPHARMACOGNOSY CHAPTER NO 5 CARMINATIVES AND G.pdf
PHARMACOGNOSY CHAPTER NO 5 CARMINATIVES AND G.pdfSumit Tiwari
 
Pharmacology chapter No 7 full notes.pdf
Pharmacology chapter No 7 full notes.pdfPharmacology chapter No 7 full notes.pdf
Pharmacology chapter No 7 full notes.pdfSumit Tiwari
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - HK2 (...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - HK2 (...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - HK2 (...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - HK2 (...Nguyen Thanh Tu Collection
 
Quantitative research methodology and survey design
Quantitative research methodology and survey designQuantitative research methodology and survey design
Quantitative research methodology and survey designBalelaBoru
 
LEAD6001 - Introduction to Advanced Stud
LEAD6001 - Introduction to Advanced StudLEAD6001 - Introduction to Advanced Stud
LEAD6001 - Introduction to Advanced StudDr. Bruce A. Johnson
 
DNA and RNA , Structure, Functions, Types, difference, Similarities, Protein ...
DNA and RNA , Structure, Functions, Types, difference, Similarities, Protein ...DNA and RNA , Structure, Functions, Types, difference, Similarities, Protein ...
DNA and RNA , Structure, Functions, Types, difference, Similarities, Protein ...AKSHAYMAGAR17
 
2024.03.16 How to write better quality materials for your learners ELTABB San...
2024.03.16 How to write better quality materials for your learners ELTABB San...2024.03.16 How to write better quality materials for your learners ELTABB San...
2024.03.16 How to write better quality materials for your learners ELTABB San...Sandy Millin
 
Riti theory by Vamana Indian poetics.pptx
Riti theory by Vamana Indian poetics.pptxRiti theory by Vamana Indian poetics.pptx
Riti theory by Vamana Indian poetics.pptxDhatriParmar
 
3.14.24 Gender Discrimination and Gender Inequity.pptx
3.14.24 Gender Discrimination and Gender Inequity.pptx3.14.24 Gender Discrimination and Gender Inequity.pptx
3.14.24 Gender Discrimination and Gender Inequity.pptxmary850239
 
Metabolism , Metabolic Fate& disorders of cholesterol.pptx
Metabolism , Metabolic Fate& disorders of cholesterol.pptxMetabolism , Metabolic Fate& disorders of cholesterol.pptx
Metabolism , Metabolic Fate& disorders of cholesterol.pptxDr. Santhosh Kumar. N
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...Nguyen Thanh Tu Collection
 
3.12.24 Freedom Summer in Mississippi.pptx
3.12.24 Freedom Summer in Mississippi.pptx3.12.24 Freedom Summer in Mississippi.pptx
3.12.24 Freedom Summer in Mississippi.pptxmary850239
 
3.12.24 The Social Construction of Gender.pptx
3.12.24 The Social Construction of Gender.pptx3.12.24 The Social Construction of Gender.pptx
3.12.24 The Social Construction of Gender.pptxmary850239
 
How to Customise Quotation's Appearance Using PDF Quote Builder in Odoo 17
How to Customise Quotation's Appearance Using PDF Quote Builder in Odoo 17How to Customise Quotation's Appearance Using PDF Quote Builder in Odoo 17
How to Customise Quotation's Appearance Using PDF Quote Builder in Odoo 17Celine George
 
BBA 205 BUSINESS ENVIRONMENT UNIT I.pptx
BBA 205 BUSINESS ENVIRONMENT UNIT I.pptxBBA 205 BUSINESS ENVIRONMENT UNIT I.pptx
BBA 205 BUSINESS ENVIRONMENT UNIT I.pptxProf. Kanchan Kumari
 
Material Remains as Source of Ancient Indian History & Culture.ppt
Material Remains as Source of Ancient Indian History & Culture.pptMaterial Remains as Source of Ancient Indian History & Culture.ppt
Material Remains as Source of Ancient Indian History & Culture.pptBanaras Hindu University
 
LEAD5623 The Economics of Community Coll
LEAD5623 The Economics of Community CollLEAD5623 The Economics of Community Coll
LEAD5623 The Economics of Community CollDr. Bruce A. Johnson
 

Recently uploaded (20)

Alamkara theory by Bhamaha Indian Poetics (1).pptx
Alamkara theory by Bhamaha Indian Poetics (1).pptxAlamkara theory by Bhamaha Indian Poetics (1).pptx
Alamkara theory by Bhamaha Indian Poetics (1).pptx
 
ASTRINGENTS.pdf Pharmacognosy chapter 5 diploma in Pharmacy
ASTRINGENTS.pdf Pharmacognosy chapter 5 diploma in PharmacyASTRINGENTS.pdf Pharmacognosy chapter 5 diploma in Pharmacy
ASTRINGENTS.pdf Pharmacognosy chapter 5 diploma in Pharmacy
 
AUDIENCE THEORY - PARTICIPATORY - JENKINS.pptx
AUDIENCE THEORY - PARTICIPATORY - JENKINS.pptxAUDIENCE THEORY - PARTICIPATORY - JENKINS.pptx
AUDIENCE THEORY - PARTICIPATORY - JENKINS.pptx
 
PHARMACOGNOSY CHAPTER NO 5 CARMINATIVES AND G.pdf
PHARMACOGNOSY CHAPTER NO 5 CARMINATIVES AND G.pdfPHARMACOGNOSY CHAPTER NO 5 CARMINATIVES AND G.pdf
PHARMACOGNOSY CHAPTER NO 5 CARMINATIVES AND G.pdf
 
Pharmacology chapter No 7 full notes.pdf
Pharmacology chapter No 7 full notes.pdfPharmacology chapter No 7 full notes.pdf
Pharmacology chapter No 7 full notes.pdf
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - HK2 (...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - HK2 (...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - HK2 (...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - HK2 (...
 
Quantitative research methodology and survey design
Quantitative research methodology and survey designQuantitative research methodology and survey design
Quantitative research methodology and survey design
 
LEAD6001 - Introduction to Advanced Stud
LEAD6001 - Introduction to Advanced StudLEAD6001 - Introduction to Advanced Stud
LEAD6001 - Introduction to Advanced Stud
 
DNA and RNA , Structure, Functions, Types, difference, Similarities, Protein ...
DNA and RNA , Structure, Functions, Types, difference, Similarities, Protein ...DNA and RNA , Structure, Functions, Types, difference, Similarities, Protein ...
DNA and RNA , Structure, Functions, Types, difference, Similarities, Protein ...
 
2024.03.16 How to write better quality materials for your learners ELTABB San...
2024.03.16 How to write better quality materials for your learners ELTABB San...2024.03.16 How to write better quality materials for your learners ELTABB San...
2024.03.16 How to write better quality materials for your learners ELTABB San...
 
Riti theory by Vamana Indian poetics.pptx
Riti theory by Vamana Indian poetics.pptxRiti theory by Vamana Indian poetics.pptx
Riti theory by Vamana Indian poetics.pptx
 
3.14.24 Gender Discrimination and Gender Inequity.pptx
3.14.24 Gender Discrimination and Gender Inequity.pptx3.14.24 Gender Discrimination and Gender Inequity.pptx
3.14.24 Gender Discrimination and Gender Inequity.pptx
 
Metabolism , Metabolic Fate& disorders of cholesterol.pptx
Metabolism , Metabolic Fate& disorders of cholesterol.pptxMetabolism , Metabolic Fate& disorders of cholesterol.pptx
Metabolism , Metabolic Fate& disorders of cholesterol.pptx
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
 
3.12.24 Freedom Summer in Mississippi.pptx
3.12.24 Freedom Summer in Mississippi.pptx3.12.24 Freedom Summer in Mississippi.pptx
3.12.24 Freedom Summer in Mississippi.pptx
 
3.12.24 The Social Construction of Gender.pptx
3.12.24 The Social Construction of Gender.pptx3.12.24 The Social Construction of Gender.pptx
3.12.24 The Social Construction of Gender.pptx
 
How to Customise Quotation's Appearance Using PDF Quote Builder in Odoo 17
How to Customise Quotation's Appearance Using PDF Quote Builder in Odoo 17How to Customise Quotation's Appearance Using PDF Quote Builder in Odoo 17
How to Customise Quotation's Appearance Using PDF Quote Builder in Odoo 17
 
BBA 205 BUSINESS ENVIRONMENT UNIT I.pptx
BBA 205 BUSINESS ENVIRONMENT UNIT I.pptxBBA 205 BUSINESS ENVIRONMENT UNIT I.pptx
BBA 205 BUSINESS ENVIRONMENT UNIT I.pptx
 
Material Remains as Source of Ancient Indian History & Culture.ppt
Material Remains as Source of Ancient Indian History & Culture.pptMaterial Remains as Source of Ancient Indian History & Culture.ppt
Material Remains as Source of Ancient Indian History & Culture.ppt
 
LEAD5623 The Economics of Community Coll
LEAD5623 The Economics of Community CollLEAD5623 The Economics of Community Coll
LEAD5623 The Economics of Community Coll
 

Application of Data Warehousing & Data Mining to Exploitation for Supporting the Planning of Higher Education System in Sri Lanka

  • 1. Application of Data Warehousing & Data Mining to Exploitation for Supporting the Planning of Higher Education System in Sri Lanka M.G.N.A.S. Fernando and G.N. Wikramanayake University of Colombo School of Computing {nas, gnw}@ucsc.cmb.ac.lk ABSTRACT varies reasons [6, 7]. Although many changes have been The topic of data warehousing encompasses introduced to the education system the graduate architectures, algorithms, and tools for brining together unemployment problem have still not been resolved. We selected data from multiple databases or other feel the lack of integrated information for decision information sources into a single repository called a data making as one of the core reasons. Policy decisions made warehouse, suitable for direct querying and analysis. years ago are being carried forward due to political We have designed a data warehouse for the Sri Lanka pressures and lack of data to convince the need for such a education system and applied basic data mining change. techniques (i.e. data cleaning, data integration, data selection, data transformation, data mining, pattern One of the contributing factors is the mismatch between evaluation, knowledge representation) to support the state and private sector requirements for employment. decision making activities. For this we have built an Many students seeking alternative non-higher education integrated data warehouse consisting data from Dept. of paths have been successful than some of the state Examination, University Grants Commission, School graduates who have been ending up as unemployed. The Census data, national population data and University present education policies and the university selection student’s information. criteria could be contributing towards producing This paper highlights how the data warehouse was built unemployed graduates. for the Sri Lanka education system and how it was used This paper investigates how such issues could be to create data summary cubes for data analysis and addressed using basic data mining techniques. Our mining process. At present using this developed system, basic level of summaries and analysis can be performed investigation is based on designing a data warehouse and to obtain for decision support information. Further applying data mining techniques to obtain information applying data mining techniques and advanced queries, for decision making [10] and hence assist the policy we can obtain the necessary knowledge for policy makers. This process can be seen as in figure 1. marking as well. Pattern 1.0 INTRODUCTION Evaluation In Sri Lanka, there are limited opportunities to admission Data Mining to the state universities. From 40% to 50% students who Task-relevant pass the G.C.E. (A/L) examination only 13% to 17% Data students can enter to the state university [8]. Among Data Selection them although about 90% graduates annually [9], only Warehouse 27% are able to find employment [5]. Thus the state Data universities contribute towards increasing the Cleaning unemployment rate with 70% from the Arts Stream [5]. Data Integration University admission policy was reviewed in 1984 & Databases 1987 and some of the policies were implemented Figure 1: Data Mining Process successfully while others are yet not implemented due to
  • 2. 2.0 DATA WAREHOUSE Bottom tier Middle tier Top tier Monitor Data can now be stored in many different types of other & OLAP Server databases. One database architecture that has recently sources Administration emerged is data warehouse, a repository of multiple Analysis heterogeneous data sources, organized under a unified Operational Extract Query DBs Transform Data Output Reports schema at a single site in order to facilitate management Load Warehouse Refresh Data mining decision-making [2, 3]. Data warehouse technology includes data cleaning, data integrating, and on-line analytical processing (OLAP) that is, analysis techniques with functionalities such as summarization, consolidation and aggregation, as well as the ability to view information from different angles. Data Sources Data Storage OLAP Engine Front-End Tools A data warehouse is a “subject-oriented, integrated, time Figure 2: Three-tier Architecture variant, non-volatile collection of data that serves as a physical implementation of a decision support data Analysing and query processing of huge data warehouse model and stores the information on which an enterprise is very difficult and time-consuming task. Therefore needs to make strategic decisions. multi dimensional data cubes consisting of summary tables are created for all possible decision marking In data warehouses historical, summarized and activities [4]. The middle tier implements consolidated data is more important than detailed, multidimensional data and operations and an OLAP individual records. Since data warehouses contain server is typically used for that. The top tier is a client consolidated data, perhaps from several operational who uses front-end tools to obtain information. databases, over potentially long periods of time, they tend to be orders of magnitude larger than operational 2.4 Data Mining databases. The workloads are query intensive with mostly ad hoc, complex queries that can access millions Data mining is the process of applying intelligent of records and perform a lot of scans, joins, and methods to extract data patterns. This is done using the aggregates. Query throughput and response times are front-end tools. The spreadsheet is still the most more important than transaction throughput. compiling front-end application for OLAP. The challenges in supporting a query environment for OLAP 2.1 Multi-dimensional Data can be crudely summarized as that of supporting spreadsheet operation effectively over large multi- To facilitate complex analyses and visualization, the data gigabytes databases. Indeed, the essbase product of in a warehouse is typically modelled multi- Arbar cooperation uses Microsoft Excel as the front-end dimensionally. For example, in a sales data warehouse, tools for its multidimensional engine [1]. time of sale, sales district, salesperson, and product might be some of the dimensions of interest. Often, these 3.0 EDUCATION DATA WAREHOUSE dimensions are hierarchical, e.g. time of sale may be organized as a day-month-quarter-year hierarchy. 3.1 Data Sources 2.2 Data Analysis The Education Ministry conducts annual school census through their Zonal Education Offices. They have data Typically data analysis is done through OLAP operations related to number of students in different streams of such as rollup (increasing level of aggregation) and drill- study for each school. The Dept. of Examinations down (decreasing level of aggregation or increasing conducts all national examinations and hence they have details) along one or more dimension hierarchies, slice- about the year 5 scholarship examination, G.C.E. (O/L) and-dice (selection and projection) and pivot (re- and G.C.E. (A/L) examinations. The University Grants orienting the multidimensional view of data). Commission (UGC) process the university admissions and they have data related to the students admitted to 2.3 Data Warehouse Architecture each university for various streams of study. The Data warehouses are built using three-tier architecture as university admission process is based on the national shown in figure 2. They are constructed via process of population and this data is maintained by the Dept. of data cleaning, data transformation, data integration, data Census. Each university has the academic performances loading and periodic data refreshing. Data warehouse is a of their students. However an important source of stored under a unified schema, and it usually resides at a information of what the graduates are doing after single site. This is the bottom tier of the architecture and completing the degree is not maintained by anybody, is managed by a data warehouse server.
  • 3. except for time to time the government collects Usually missing data are identified at this stage and applications from unemployed graduates. necessary actions are taken. These data are either ignored, filled manually or automatically using a global The way the data is maintained it is not possible to link constraint such as unknown or infinity, mean value or this data to individuals such as how a person has probable value. We came across missing values such as performed from school time to university. However, year, university etc. for some of the unemployed data. characteristics identified at each stage could be linked to Presently we have opted for the ignore option to deal obtain valuable information such as industry acceptance with such cases. for an average rural student. Identifying noisy data and smoothing them is also part of 3.2 Design the Data Warehouses the data cleaning process. This may not apply for most The data of the above sources has to be cleaned and educational data. Detecting and correcting data transformed so that the different database structures inconsistencies is also part of data cleaning. could be integrated as shown in figure 3. Thereafter the 3.2.2 Data Integration data from the relevant data sources have to be loaded to the integrated data warehouse and later on periodically Data integration is the next important step. All our data updated through import data from the respective data sources use the data mainly to produce annual sources. summaries. Hence the data of different years are not checked for consistency. Also the data of different years need not have the same database structures. This makes the integration process over several years more difficult. 1 To update the data warehouse, same format should be consider, therefore after modifying the data structures 2 and after performing relevant processing we had to Extract Transform construct our data warehouse. 3 Load Data 3.2.3 Data Selection Refresh Warehouse 4 Analysis Query All the fields of the operational databases are not Reports Data necessary for the data warehouse. Most important fields mining for the decision marking activities are selected and 5 updated the warehouse. For an example All Island Ranks, Data Sources District Ranks are available in the Department of Examinations data source and it will not update the data Figure 3: Education Data Warehouse warehouse as these can be derived as well as the rank change if some does not apply. Details such as student Data warehousing systems use a variety of data details are not required for decision making. extraction and clearing tools, and refresh utilities for 3.2.4 Data Transformation populating warehouses. Data extraction from different location is usually implemented gateways and standard There were so many coding differences in the data interfaces (such as information builders SQL, ODBC, sources. Codes used for district, school etc. by the Oracle open connect Informix enterprise gateways). We different sources were quite different (e.g., in the census used extraction tool available in SQL Server. population, district code for Matale and Kandy were 4 and 5 respectively, while the Examination department 3.2.1 Data Cleaning and UGC had used Kandy and Matale as 4 and 5 A data warehouse is used for decision marking. It is respectively). Thus matching the structures only was not important that the data in the warehouse is accurate. As sufficient. Another common mismatch was the use of large volumes of data from multiple sources are involved different data types and field lengths (e.g. actual size of there is a high probability of errors and anomalies in the the district code is two digits but some sources had data. declared it as a string and some sources it declared as number with 8-digit length). We used SPSS to convert All our data sources had used internal controls to validate the different data format into one form as well as their data. Also data of the Dept. of Examination and recoding purposes. UGC has to ensure 100% accuracy. Hence our data sources could consist of a little amount of data error and After extracting, clearing and transformation data must anyway there is no way of determining this. be loaded into the warehouse. Additional pre-processing such as checking integrity constrains, sorting summarization, aggregation and other computation may
  • 4. still be required to build the warehouse tables. Typically Just as relational query language like SQL can be used to batch load utilities are used for this purpose. In addition specify relational queries. A data mining query language to populating the warehouse, a load utility must allow the can be used to specify relational queries. We examine an system administrator to monitor status so that in case of a SQL based data mining query language called DMQL, failure operation such as cancel, suspend and resume a which contains language primitives for defining data load, can be performed with no loss of integrity. warehouses. Data warehouses can be defined using two language primitives, one for cube definition and other For our system data can be loaded yearly, after one for dimension definitions. completion of their major operational works. For example, A/L databases can be loaded after updating A/L re-correction marks added, UGC selected students define cube cube_master [AL_Year, Exam_Index_No, database can be updated after completing of filling of UGC_Serial_No, Uni_Reg_No, Faculty_Code, vacancies. Course_Code, Sch_Code, Dist_No] : stu_status= count(*) 3.2.5 Refresh define dimension Year as (AL_Year, Sp_Remarks) Refreshing a warehouse consists of propagating updates on source data to corresponding updates in the define dimension District as (Dist_No, Dist_Name, warehouse. There are two sets of issues to consider: Population) when to refresh and how to refresh, usually the define dimension Course as (Course_Code, warehouse is refreshed periodically (daily or weekly) Course_Name, No_of_Vac) only if some OLAP queries need current data, it is necessary to propagate every update. The warehouse ……. administrator sets the refresh policy, depending on user Figure 5: part of DMQL statements needs or traffic, of the data sources. In this education decision support system all operational process are completed annually. 3.4 Performing Data Mining Data mining represents the next quantum step beyond the historical and aggregate-based world of information that 3.3 Star Schema the data warehouse makes available to users. Data mining allows organizations to collect vital information The most popular model for a data warehouse is the star regarding business processes, customers, sales and schema. Here a fact table and set of dimension tables are marketing and arrange the information in such a fashion identified. Multi-dimensional cubes are formed using as to allow business users to make predictive decisions these tables. The star schema for our education system about what direction the business should focus its is given in figure 4. It can be formally defined using the resources. This advantage allows business decision DMQL as in figure 5. The fact table is defined as a makers to “steer” the focus of an organization and cube_master with the dimensions as the attributes (e.g. facilitate the continued success of the enterprise. AL_Year) and facts (e.g. stu_status). Then each attribute is defined as a dimension (e.g. Year). Educational data too can be mined to find hidden knowledge for the decision marking process. One of the District Year Exam major tasks is to review and examine the current policy Student Master Dist_No Exam_Index_No for the admission to the higher education institutes, i.e. AL_Year Fact Table Dist_Name Dist_No(o) Population Sp_Remarks Present Quota System (Merit, District Under- Privilege AL_Year Sub[4], Gra[4] and Special intake quota system), and to make Application Dist_No recommendations if any changes are considered UGC_Serial_No Exam_Index_No School necessary etc. Corr_Dist UGC_Serial_No Sch_Code P_Address Sch_Address In order to facilitate decision marking, the data in a data Sch_Code Sch_Type warehouse is organized around major subjects, such as Preference[10,40] M th t Faculty_Code Year12S year, district, course etc.. The data are stored to provide Year12C information from a historical perspective (such as the Faculty Uni_stu Course_Code Year12A Uni_code past 5- 10 years) and typically summarized. The data Faculty Reg_no Uni_Reg_No Year13S No of plac Year1_st warehouse is usually modelled by a multidimensional Category, database structure, where each dimensions corresponds Year2_st status(m/d/u) Stu_st Course Year3_st Course_Code to an attribute or a set of attributes in the schema and Year4_st Course_Name Final_st each cell stores the value of some aggregate measure. No_of_Vac Figure 4: Star Schema
  • 5. 4.0 DATA CUBES Province would have 285 students selected for Medicine for the Year 2001 as in figure 8. There are so many billons of data records in a data warehouses. Analysis and query processing of such huge Sri Lanka data warehouse is very difficult and time consuming task. Country Therefore maintaining multi-dimensional warehouse servers will help to solve this problem. Multi decisional ... Province Western Sabaragamuwa data cubes consists summary tables for all possible decision marking activities. A possible data cube is ... ... shown in figure 6. District Colombo Kalutara ... ... ... ... Zone Colombo … Course Vet ... ... ... ... ... ... ... ... Den School De-la Salle College Med Figure 7: Concept hierarchy 208 228 211 Colombo Similarly for the location concept hierarchy the roll-up operation could be used to aggregate the school data to Gampaha 51 67 68 zone or zone to district or even province to country. District 26 31 39 Kalutara … Course Vet … Den Med 2001 2002 2003 … 285 326 318 Western Year Province Central 92 108 96 Figure 6: A Data Cube Southern 149 142 140 After defining the star schema we can create so many cubs according to the requirements. For example as in figure 6, a cube may represent the selected students as a function of Year, District, and Courses (e.g. Year: 2001, 2002, 2003 etc., District: Colombo, Gampaha, Kalutara 2001 2002 2003 … etc. and Courses: Medicine, Dental, Vet. Science etc. are Year the three dimensions with 208 students from Colombo district selected for Medicine in year 2001). Figure 8: Roll-up District A concept hierarchy defines a sequence of mappings 4. 2 Drill-down from a set of low-level concepts to more general higher- The drill-down operation is the opposite of roll-up. It level concepts. Using it data could be aggregated or navigates from less detail data to more details. The data disaggregated. Many concept hierarchies are implicit cube of figure 6 can be drill-down using the location within the database schema and a hierarchy could be concept-hierarchy and hence districts can be defined for the locations in the order of school < zone < disaggregated into zones as shown in figure 9. district < province < country (see figure 7). This allows districts to be aggregated to provinces (roll-up) and well 4. 3 Slice and Dice as districts to be disaggregated into zones (drill-down). The slice operation performs a selection on one 4. 1 Roll-up dimension of the given cube, resulting is a sub-cube. For example, we could select the course dimension and slice The Roll-up operation corresponds to taking the current for course medicine and view a sub-cube as shown in data objects and doing further grouping by one of the figure 10. dimensions. The Roll-up operations performed on the central cube by climbing up the concept of hierarchy of The dice operations define a sub-cube by performing a figure 7.Thus, it is possible to Roll-up admission data by selection of one or more dimensions. For example, three grouping districts into provinces. Hence Western dimension dice for courses “Medicine” and Dental” for
  • 6. districts “Colombo” and “Jaffna” for year 2001 and 2002 4. 4 Pivot (Rotate) could be as shown in figure 11. Using this approach it is possible to focus on specific areas such as district or Pivot is a visualisation operation that rotates the data course and hence make appropriate decisions. axes in view in order to provide an alternative presentation of the data. It allows visualisation of the other side of the dice. … Course Vet Den 5.0 CONCLUSIONS Med For a developing country like Sri Lanka, education is 170 173 158 Colombo more relevant factor for its entire economy. To contribute Zone to the above, making correct decisions and implementing Homagama 1 0 0 the correct policies are very essential. To achieve the above task it is important to integrate the heterogeneous Sri Jayawar 12 7 6 data sources and use them to build a data warehouse and dhanapura data cubes. Based on our preliminary work we have shown how … useful information for decision making could be extracted from a data warehouse and how it in turn could 2001 2002 2003 … Year help to formulate the correct policies. In this process we have highlighted the need to maintain Figure 9: Drill-down District graduate information. Further our selected examples were based on district, year and course as the dimensions. It is important to note that the star schema of figure 4 has four more dimensions as well as it could be further expanded 208 228 211 by integrating data such as unemployment. Thus through Colombo creating the appropriate cubes, we can extract knowledge Gampaha 51 67 68 necessary to achieve our objectives. District We expect to further research this area before forwarding 26 31 39 our recommendations. We need to investigate the Kalutara existence of hidden data patterns and for that we need to apply data mining techniques and advanced queries. … 2001 2002 2003 … Year 6.0 ACKNOWLEDGMENTS We acknowledge the support given by the departments of Figure 10: A Slice for Course Medicine Education Examination and Census, UGC, Faculty of Medicine, Universities and UCSC. 7.0 REFERENCES Course Den Med 1) Chaudhuri S., Dayal U. "An Overview of Data 40 208 228 Warehousing and OLAP Technology", SIGMOD Colombo Record 26:1, Mar 1997, pp 65-74. District Jaffna 44 40 2) Chawatte S.S., Garcia-Molina H., Hammer J., Ireland K., Papakonstantinou Y., Ullman J., and 2001 2002 Wisdom J., "The TSIMMIS Project: Integration of Year Heterogeneous Information Sources", Proc. of IPSJ Conf., Tokyo, Japan, Oct. 1994, pp. 7-18. Figure 11: A 3D Dice 3) Han J. & Kamber M., "Data Mining Concepts and Techniques", Morgan Kaufmann, 2001.
  • 7. 4) Harinarayan V., Rajaraman A., Ulman J.D. "Implementing Data Cubes Efficiently", Proc of SIGMOD Record, 25:2, 1996, pp. 205-216. 5) MOTET, Report of analysis of the graduate registration forms, Ministry of Tertiary Education & Training, Sept. 2002. 6) UGC, Report of the committee appointed to review university admissions policy, May 1984. 7) UGC, Report of the committee appointed to review university admissions policy, Dec. 1987. 8) UGC, Admission Booklet – Academic Year 2003- 2004, 2003. 9) UGC undergraduate statistics. http://www.ugc.ac.lk 10) Wisdom J., "Research Problems in Data Warehousing", Proc. 4th Intl., CIKM Conf., 1995.