International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.

Nitin S. Kharat / International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, www.ijera.com, Vol. 3, Issue 3, May-Jun 2013, pp. 119-122

Deciding the Correct Usage of Database Queries, Data Mining and OLAP in an Applications

Nitin S. Kharat
B.E. (I.T.), M.E. (Computer Engg.)
Department of Computer Engineering, University of Pune, Pune, Maharashtra, India.

ABSTRACT
In recent years, a number of studies have been carried out on different techniques of information retrieval. These retrieval techniques include Database Queries, Data Mining and Online Analytical Processing (OLAP). The retrieved information is used for various purposes according to different requirements: for analysis, for predicting the behavior of users, or for a Decision Support System (DSS). The confusion lies in when to use Database Queries, when to use Data Mining and when to use OLAP. This paper elaborates the usage of Database Queries, Data Mining and OLAP according to the user's purpose and requirements at a particular instant. The paper also discusses the performance of these techniques with appropriate examples. The goal of the paper is to give the user a better indication of how to use Database Queries, Data Mining and OLAP in an application so as to obtain information easily and with efficient performance.

Key Terms: Database Queries, Data Mining, Decision Support System (DSS), Online Analytical Processing (OLAP)

1. INTRODUCTION
In today's world, for lack of time, many people avoid going through large volumes of data. As a result, a number of information retrieval techniques have been developed. These techniques allow the required information of interest to be retrieved from a large database in a short time and in an easy way, which is why many people choose them as their means of information retrieval. Such techniques include Database Queries, Data Mining and Online Analytical Processing (OLAP). As is well known, these techniques play a vital role in business areas including Sales, Marketing and Finance, for purposes such as sales forecasting, marketing research and financial analysis. The crucial skill, however, is using the appropriate technique at the appropriate time according to purpose, requirement and performance. This paper elaborates the usage of these techniques according to requirement, purpose and performance, with appropriate examples.

The remainder of this paper is organized as follows. Section 2 briefly reviews the related work, Section 3 describes the usage of each technique according to purpose, requirement and performance, and Section 4 presents the conclusion and future work.

2. RELATED WORK
Several previous studies [1, 2, 3, 4] focused on the possibilities of combining OLAP and Data Mining.
H. Zhu [5] proposed and developed association rule mining with interesting patterns; this approach is called Online Analytical Mining of Association Rules.
The paper [6] developed an approach that operates on two factors, the support and the confidence of the items demanded by users. This technique identifies item sets that are frequently accessed through the web pages of a web site by a particular user. The output contains the list of customers associated with that product.
Dzeroski et al. [7] combine OLAP and data mining, and their output showed that the CUBE File outperforms hierarchical clustering.
The paper [8] also demonstrated the efficiency and capability of integrating data mining into OLAP and OLAM systems. OLAM is a way of mining the required data from a large multidimensional database.
The paper [9] gives an efficient way to convert an XML schema to ROLAP, which reduces execution time and thereby improves the performance of OLAP compared with previous approaches.
The paper [10] aimed to extract previously unidentified information from a large database and supported speeding up the process through automatic OLAP schema generation, which ultimately reduces manual work and improves performance.
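The support and confidence measures mentioned above in connection with [6] can be made concrete with a small sketch. The Python below is an illustration only: the session data, the page names and the helper functions `support` and `confidence` are hypothetical and are not taken from [6] or from this paper. It uses the usual textbook definitions, where support is the fraction of transactions containing an item set and confidence is the support of the whole rule divided by the support of its antecedent.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical page-visit sets, one per user session (illustrative data only).
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"home", "laptops", "cart"}),
    frozenset({"home", "laptops"}),
    frozenset({"home", "phones", "cart"}),
    frozenset({"laptops", "cart"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Support of (antecedent and consequent together) divided by support of antecedent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never fires, so report zero confidence
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / base


if __name__ == "__main__":
    print("support(laptops, cart):", support({"laptops", "cart"}, TRANSACTIONS))
    print("confidence(laptops -> cart):", confidence({"laptops"}, {"cart"}, TRANSACTIONS))
```

A rule is normally kept only when both values exceed thresholds chosen by the analyst; those thresholds are application-specific and are not fixed by this paper.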
3. USAGE OF DIFFERENT TECHNIQUES
3.1 Database queries:
Basically, Database Queries are composed of:
• A database table, which is a representation of a mathematical relation, that is, a set of items that share certain attributes.
• Each table column represents an attribute of the relation.
• In relational databases, tables are usually named after the kind of entity they represent.
• Database queries allow the user to manipulate a database efficiently and effectively.

3.1.1 When to Use:
• When the user needs to retrieve data from a database table such that the retrieved data satisfies certain criteria defined by the user.
• When the application returns a small volume of data as the query response.
• When the user wants to retrieve the set of data items that strictly fulfill a defined constraint.
• When the user needs specific information or data.

3.1.2 Example:
Consider the following database:
Table 1: Customer Database
• If the user wants to retrieve the names of customers whose income is greater than 3 LK and whose city is Pune, the user should issue the following database query:
Select Name from customer where Income > 3 LK and City = "Pune"

3.1.3 Performance:
• Gives better performance when there are several criteria to be fulfilled.
• Gives better performance when the retrieved set of items is small in volume.
• Gives better performance for the retrieval of specific information as per the user's need.
• Gives better performance when the user issues ad hoc queries against virtually any instance of a database.

3.2 Data Mining
Basically, Data Mining is composed of:
• Different models for the purpose of analysis.
• Models can be viewed as high-level summaries of the underlying data.
• Models take different forms, such as:
  - Decision Tree
  - Rules for classification tasks
  - Association Rules, e.g. Market Basket Analysis
  - Clustering for Market Segmentation etc.

3.2.1 When to Use:
• When the user wants to generate high-level, actionable summaries of data residing in database tables.
• When the user wants to build Association Rules.
• When the database is too large for direct analysis.
• When the user needs to analyze the database in visual form.
• When the user wants to analyze the dataset using simpler forms, i.e. models such as:
  - Decision Tree
  - Graphs
  - Histogram
  - Pie Chart etc.

3.2.2 Example:
Consider the same database:
Table 2: Customer Database
• Here the Data Mining technique can be used to build Association Rules for the Decision Support System (DSS).
• The Association Rule for Income can be built as follows:
If Age > 21 and City = "Mumbai" then Income > 3 LK
OR
If Vehicle = "BMW" and Total > 3 LK then Income > 3 LK
• These Association Rules are very simple for any user to understand because they summarize a number of records from the Customer database; with Database Queries, the same scenario would be very complicated for the user to understand.

3.2.3 Performance:
• Gives better performance than Database Queries when a large database is analyzed.
• Easy to understand, as there are no complex queries. Association Rules give the analysis at a glance.
• No need to analyze each and every query.
• Visual representation makes the analysis more effective.

3.3 OLAP
Basically, OLAP (Online Analytical Processing) is composed of:
• Summarizing the data before queries are executed.
• Summarization can be represented as cubes and subcubes.
• A cube is precalculated and preaggregated data.
• It allows reporting data, visualizing data and interacting with views of data.
• It uses the star schema or the snowflake schema, which consists of a fact table and dimension tables.

3.3.1 When to Use:
• Against a normalized set of database tables, to get the set of items that fulfill the constraints on their attribute values.
• When the user needs data collected from different tables, i.e. Complex Joins.
• When execution time needs to be reduced, since it uses ROLAP.
• When the user needs summarized data from detailed data, i.e. Roll Up.
• When the user needs detailed data from summarized data, i.e. Drill Down.
• When the user wants a methodology for organizing databases along the dimensions of a business, making the database more comprehensible.
• When the user needs to compare and contrast measures along the business dimensions in real time.

3.3.2 Example:
Consider the same database. OLAP constructs the Star Schema (see Fig 1) for the above database to speed up the execution time as follows:
• The dimension tables give rise to the dimensions in the pre-aggregated data cubes.
• The fact table relates the dimensions to each other and specifies the measures which are to be aggregated.
• Here the measures are "dollar_total", "sales_tax" and "shipping_charge".
Fig 1: Star Schema of the Database
• Figure 2 shows a three-dimensional data cube pre-aggregated from the star schema. Data cubes can be seen as a compact representation of pre-computed query results.
• Essentially, each segment in a data cube represents a pre-computed query result to a particular query within a given star schema.
• The efficiency of cube querying allows the user to interactively move from one segment in the cube to another, enabling the inspection of query results in real time.
• Cube querying also allows the user to group and ungroup segments, as well as project segments onto given dimensions.
• This corresponds to such OLAP operations as roll-ups, drill-downs and slice-and-dice, respectively.
• The cube is represented as follows:
Fig 2: 3-Dimensional Cube

3.3.3 Performance:
• Gives better performance than Database Queries and Data Mining when the data needs to be collected from different tables, i.e. over Complex Joins.
• The cube gives better performance since the segments inside the cube are precalculated and preaggregated.

4. CONCLUSION
This paper covers the correct usage of information retrieval techniques such as Database Queries, Data Mining and OLAP according to the user's purpose, requirement and performance, along with examples. The paper also concludes that with Database Queries the user issues ad hoc queries to access specific information, whereas Data Mining generally uses models to represent large volumes of retrieved data, and OLAP gives the user real-time access to pre-aggregated measures along dimensions for more effective analysis.

Future work can integrate these different techniques according to requirement, purpose and performance, in whatever combination is right for the application. Such an integration could bring substantial benefits to the application as well as to the organization.
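As an illustration of the distinction summarized above, the following sketch contrasts an ad hoc database query (Section 3.1.2) with a simple roll-up of one measure along one dimension (Section 3.3). It is a minimal sketch under stated assumptions, not the paper's implementation: it uses Python's standard `sqlite3` module with an in-memory database, the customer rows are invented because Table 1 is not available here, "3 LK" is read as 300000, and the function names `names_above` and `income_by_city` are hypothetical.

```python
import sqlite3

# Hypothetical rows (Name, Age, City, Income); the paper's Table 1 is not reproduced here.
ROWS = [
    ("Asha", 34, "Pune", 450000),
    ("Ravi", 28, "Pune", 250000),
    ("Meera", 41, "Mumbai", 600000),
    ("Sunil", 23, "Mumbai", 320000),
]


def build_db() -> sqlite3.Connection:
    """Create an in-memory customer table filled with the illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (Name TEXT, Age INTEGER, City TEXT, Income INTEGER)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", ROWS)
    return conn


def names_above(conn: sqlite3.Connection, min_income: int, city: str) -> list:
    """Ad hoc query: a small, specific result set that strictly meets the criteria."""
    cur = conn.execute(
        "SELECT Name FROM customer WHERE Income > ? AND City = ? ORDER BY Name",
        (min_income, city),
    )
    return [row[0] for row in cur]


def income_by_city(conn: sqlite3.Connection) -> dict:
    """Roll-up: aggregate the Income measure along the City dimension once,
    so that later lookups read the stored totals instead of rescanning rows."""
    cur = conn.execute("SELECT City, SUM(Income) FROM customer GROUP BY City")
    return dict(cur.fetchall())


if __name__ == "__main__":
    conn = build_db()
    # Assumption: "3 LK" in Section 3.1.2 is taken to mean 300000.
    print(names_above(conn, 300000, "Pune"))
    cube_segment = income_by_city(conn)
    print(cube_segment)
```

The dictionary returned by `income_by_city` plays the role of a single pre-aggregated cube segment; a real OLAP engine would maintain many such aggregates across several dimensions.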
REFERENCES
[1] S. Goil and A. Choudhary, "High performance OLAP and data mining on parallel computers," Data Mining and Knowledge Discovery, vol. 1, no. 4, pp. 391-417, Dec. 1997.
[2] S. Asghar, D. Alahakoon and A. Hsu, "Enhancing OLAP functionality using self-organizing neural networks," Neural, Parallel and Scientific Computations, vol. 12, no. 1, pp. 1-20, March 2004.
[3] R. B. Messaoud, O. Boussaid and S. Rabaseda, "A new OLAP aggregation based on the AHC technique," in Proc. of the 7th ACM Int'l Workshop on Data Warehousing and OLAP (DOLAP), ACM New York, 2004, pp. 65-72.
[4] J. Han, "Towards online analytical mining in large databases," ACM SIGMOD Record, vol. 27, no. 1, pp. 97-107, March 1998.
[5] H. Zhu, "Online analytical mining of association rules," Master Thesis, Simon Fraser University, 1998, pp. 1-117.
[6] J. Fong, H. K. Wong and A. Fong, "Online analytical mining Webpages tick sequences," J. of Data Warehousing, vol. 5, no. 4, pp. 59-67, 2000.
[7] S. Dzeroski, D. Hristovski and B. Peterlin, "Using data mining and OLAP to discover patterns in a database of patients with Y chromosome deletions," in Proc. AMIA Symp., 2000, pp. 215-219.
[8] J. Seo, et al., "Interactive color mosaic and dendrogram displays for signal/noise optimization in microarray data analysis," in Proc. of the Intl. Conf. on Multimedia and Expo, Volume 3, 2003, pp. 461-464.
[9] Sarbani Dasgupta, Soumya Sen, Nabendu Chaki, "A Framework To Convert XML Schema to ROLAP," Proc. of 2nd Intl. Conf. on Emerging Applications of Information Technology, 2011.
[10] Muhammad Usman, Sohail Asghar, Simon Fong, "Data Mining and Automatic OLAP Schema Generation."
[11] Soumya Sen, Ranak Ghosh, Debanjali Paul, Nabendu Chaki, "Integrating Related XML Data into Multiple Data Warehouse Schemas," Proc. of the First International Conference on Information Technology Convergence and Services (ITCS 2012), Bangalore, India.