Business intelligence and data warehousing


Published in: Technology, Education

  1. 1. BUSINESS INTELLIGENCE AND DATA WAREHOUSING. Presented by: Vaishnavi Chigarapalle.
  2. 2. Agenda:
      • ID3 Algorithm.
      • WEKA.
      • Web Mining Applications for Business.
      • References.
  3. 3. ID3 Algorithm.
  4. 4. Overview:
      • What is ID3?
      • Decision Trees.
      • Simple example of Decision Trees.
      • ID3 Algorithm.
      • Problem.
      • Solution to the discussed problem.
      • Conclusion.
  5. 5. What is ID3?
      • ID3 stands for Iterative Dichotomiser 3.
      • It is a mathematical algorithm for building decision trees from a dataset.
      • Invented by J. Ross Quinlan in 1979.
      • Uses information theory, invented by Shannon in 1948.
      • The algorithm attempts to create the smallest possible decision tree, working top down with no backtracking.
      • ID3 is the precursor to the C4.5 algorithm.
      • It is typically used in the machine learning and natural language processing domains.
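
The slides do not spell out the information-theoretic quantities involved. In the usual textbook formulation, ID3 relies on entropy and information gain. In the notation assumed here (not taken from the slides), S is a set of examples, p_c is the fraction of S belonging to class c, and S_v is the subset of S in which attribute A takes the value v:

```latex
H(S) = -\sum_{c} p_c \log_2 p_c
\qquad
\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v)
```

Under this formulation, the attribute with the highest gain is the one that "best classifies" the examples.
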
  6. 6. Decision trees
      • The tree consists of decision nodes and leaf nodes.
      • A decision node has two or more branches, each representing a value of the attribute tested.
      • A leaf node produces a homogeneous result, which does not require additional classification testing.
      • Decision trees are produced by algorithms that identify various ways of splitting a data set into branch-like segments.
      • These segments form an inverted decision tree that originates with a root node at the top of the tree.
  7. 7. Simple Example of a Decision Tree.
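
As an illustration only, here is a hypothetical tree (not the one shown on the slide) written as a nested Python structure. Decision nodes are dictionaries, leaf nodes are plain labels, and the attribute names and values are invented for the sketch.

```python
# Hypothetical tree: decision nodes are dicts, leaf nodes are plain labels.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "windy",
            "branches": {"yes": "stay in", "no": "play"},
        },
        "overcast": "play",
        "rainy": "stay in",
    },
}


def classify(node, example):
    """Walk from the root to a leaf, following the branch for each tested attribute."""
    while isinstance(node, dict):
        value = example[node["attribute"]]
        node = node["branches"][value]
    return node
```

Calling `classify(tree, {"outlook": "sunny", "windy": "no"})` follows the root's "sunny" branch, then the "no" branch of the nested node, and ends at the leaf "play".
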
  8. 8. ID3 Algorithm
      • The first step creates a root node for the tree.
      • If all the examples are positive, return the single-node tree root, with label "+".
      • If all the examples are negative, return the single-node tree root, with label "-".
      • If the set of predicting attributes is empty, return the single-node tree root, labelled with the most common value of the target attribute.
      • Else:
          • A = the attribute that best classifies the examples.
          • Set the decision tree attribute for root to A.
          • For each possible value, vi, of A, add a new tree branch below root, corresponding to the test A = vi.
  9. 9. ID3 Algorithm
          • Let Examples(vi) be the subset of examples that have the value vi for A.
          • If Examples(vi) is empty, then below this new branch add a leaf node with label equal to the most common target value in the examples.
          • Else below this new branch add the subtree ID3(Examples(vi), Target_Attribute, Attributes - {A}).
      • End.
      • Return root.
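
The steps on the two slides above can be sketched in Python as follows. This is a minimal sketch, not a reference implementation. It assumes "best classifies" means highest information gain, that examples are dictionaries, and that the possible values of each attribute are passed in through a `domains` argument. The function names and the toy data are invented for the illustration.

```python
import math
from collections import Counter


def entropy(examples, target):
    """Shannon entropy (in bits) of the target attribute over the examples."""
    counts = Counter(ex[target] for ex in examples)
    total = len(examples)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(examples, attribute, target):
    """Entropy reduction obtained by splitting the examples on one attribute."""
    total = len(examples)
    remainder = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex for ex in examples if ex[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(examples, target) - remainder


def id3(examples, target, attributes, domains):
    """Return a leaf label, or an (attribute, {value: subtree}) decision node."""
    labels = [ex[target] for ex in examples]
    most_common = Counter(labels).most_common(1)[0][0]
    # All examples share one label, or nothing is left to test: make a leaf.
    if len(set(labels)) == 1 or not attributes:
        return most_common
    # A = the attribute that best classifies the examples (highest gain).
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in domains[best]:
        subset = [ex for ex in examples if ex[best] == value]
        if not subset:
            # Empty subset: leaf labelled with the most common target value.
            branches[value] = most_common
        else:
            branches[value] = id3(subset, target, remaining, domains)
    return (best, branches)


# Made-up toy data, for illustration only.
DATA = [
    {"outlook": "sunny", "windy": "no", "play": "+"},
    {"outlook": "sunny", "windy": "yes", "play": "-"},
    {"outlook": "rainy", "windy": "no", "play": "-"},
    {"outlook": "rainy", "windy": "yes", "play": "-"},
    {"outlook": "overcast", "windy": "no", "play": "+"},
    {"outlook": "overcast", "windy": "yes", "play": "+"},
]
DOMAINS = {"outlook": ["sunny", "overcast", "rainy"], "windy": ["no", "yes"]}
TREE = id3(DATA, "play", ["outlook", "windy"], DOMAINS)
```

On this toy data the root tests `outlook`, because splitting on it leaves only the "sunny" examples mixed, and those are then separated by `windy`. The recursion never revisits an earlier choice, which matches the "no backtracking" property.
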
  10. 10. Conclusion
      • ID3 attempts to make the shortest decision tree out of a set of learning data, but the shortest tree is not always the best classifier.
      • It requires the learning data to have completely consistent patterns with no uncertainty.
  11. 11. WEKA (University of Waikato)
  12. 12. Overview
      • What is WEKA?
      • WEKA GUI Chooser.
      • Data Mining with WEKA.
      • Problem.
      • Solution for the discussed problem.
      • Conclusion.
  13. 13. What is WEKA?
      • WEKA is an acronym for Waikato Environment for Knowledge Analysis.
      • It is a popular suite of machine learning software written in Java.
      • It is developed at the University of Waikato, New Zealand.
      • WEKA is portable, since it is fully implemented in the Java programming language and thus runs on almost any modern computing platform.
      • WEKA is free software available under the GNU General Public License.
      • WEKA's applications:
          • Explorer.
          • Knowledge Flow.
          • Experimenter.
          • Simple CLI.
  14. 14. WEKA GUI Chooser.
  15. 15. Data Mining With WEKA
      • Input: raw data.
      • Data mining by WEKA: pre-processing, classification, regression, clustering, association rules, visualization.
      • Output: result.
  16. 16. Explorer
      • Explorer is WEKA's main user interface.
      • The Explorer interface features several panels providing access to the main components of the workbench:
          • Preprocess.
          • Classify.
          • Associate.
          • Cluster.
          • Select Attributes.
          • Visualize.
      • Preprocess Panel: used to transform the data, for example by deleting instances and attributes according to specific criteria.
      • Classify Panel: lets users apply classification and regression algorithms to the resulting dataset and estimate the accuracy of the resulting predictive model.
  17. 17. Explorer panels (continued)
      • Associate Panel: provides access to association rule learners that attempt to identify all important interrelationships between attributes in the data.
      • Cluster Panel: gives access to the clustering techniques in WEKA.
      • Select Attributes Panel: provides algorithms for identifying the most predictive attributes in a dataset.
      • Visualize Panel: shows a scatter plot matrix, where individual scatter plots can be selected, enlarged, and analyzed further using various selection operators.
  18. 18. Experimenter
      • The Experimenter allows the systematic comparison of the predictive performance of WEKA's machine learning algorithms on a collection of datasets.
      • It also lets us set up large-scale experiments, start them running, leave them, and then analyze the performance statistics that have been collected.
      • It automates the experimental process.
      • The statistics can be stored in ARFF format.
      • It allows users to distribute the computing load across multiple machines using Java RMI.
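
For readers who have not seen ARFF, the sketch below writes a tiny file in that layout: an `@relation` line, one `@attribute` line per column, then `@data` followed by comma-separated rows. The helper `to_arff`, the column names, and the values are all hypothetical. Real Experimenter output has its own, much larger set of columns, so this only illustrates the file format.

```python
def to_arff(relation, attributes, rows):
    """Render rows as ARFF text.

    attributes is a list of (name, kind) pairs, where kind is either the
    string "numeric" or a list of allowed nominal values.
    """
    lines = ["@relation %s" % relation, ""]
    for name, kind in attributes:
        if isinstance(kind, str):
            lines.append("@attribute %s %s" % (name, kind))
        else:
            # Nominal attribute: allowed values are listed inside braces.
            lines.append("@attribute %s {%s}" % (name, ",".join(kind)))
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Placeholder statistics, not real WEKA output.
ARFF_TEXT = to_arff(
    "experiment_results",
    [
        ("dataset", ["datasetA", "datasetB"]),
        ("scheme", ["schemeA", "schemeB"]),
        ("percent_correct", "numeric"),
    ],
    [("datasetA", "schemeA", 90.0), ("datasetA", "schemeB", 50.0)],
)
```

Writing `ARFF_TEXT` to a file with an `.arff` extension gives a dataset in the layout that WEKA's tools read.
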
  19. 19. The Experimenter
  20. 20. Knowledge Flow
      • The Knowledge Flow provides an alternative to the Explorer as a graphical front end to WEKA's core algorithms.
      • The Knowledge Flow presents a data-flow inspired interface to WEKA.
      • The user can select WEKA components from a tool bar, place them on a layout canvas, and connect them together to form a knowledge flow for processing and analyzing data.
      • Unlike the Explorer, the Knowledge Flow can handle data either incrementally or in batches.
  21. 21. Knowledge-Flow
  22. 22. Simple CLI
      • Simple CLI provides a command line mode to access WEKA.
  23. 23. Conclusion
      • In sum, the overall goal of WEKA is to build a state-of-the-art facility for developing machine learning (ML) techniques and to allow people to apply them to real-world data mining problems.
      • Detailed documentation of the functions provided by WEKA can be found on the WEKA website.
  24. 24. WEB MINING
  25. 25. Overview
      • What is Web mining?
      • Challenges related to web mining.
      • Web mining applications.
      • Problems with Web search.
      • Improvised search: adding structure to the web.
      • Conclusion.
  26. 26. What is Web Mining?
      • Web mining is the use of data mining techniques to automatically discover and extract information from web documents and services.
      • It covers discovering useful information from the World Wide Web and its usage patterns.
      • Web mining can be divided into three different types:
          • Web usage mining.
          • Web content mining.
          • Web structure mining.
  27. 27. Challenges related to Web Mining
      • The web is a huge collection of documents that also carries information a plain document collection lacks:
          • Hyperlink information.
          • Access and usage information.
      • The web is very dynamic; new pages are constantly being generated.
      • Challenge: the main challenge is to develop new web mining algorithms and adapt traditional data mining algorithms to exploit hyperlinks and access patterns.
  28. 28. Web Mining Applications
      • E-Commerce (Infrastructure)
          • Generate user profiles.
          • Internet advertising.
          • Fraud.
          • Similar image retrieval.
      • Information retrieval (search) on the web
          • Automatic generation of topic hierarchies.
          • Web knowledge bases.
          • Extraction of schema for XML documents.
      • Network Management
          • Performance management.
          • Fault management.
  29. 29. User Profiling.
      • Important for improving customization:
          • Provides users with pages and advertisements of interest.
          • Example profiles: on-line trader, on-line shopper.
      • Generate user profiles based on their access patterns:
          • Cluster users based on frequently accessed URLs.
          • Use a classifier to generate a profile for each cluster.
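
The clustering step can be illustrated with a deliberately simple stand-in technique: a greedy, single-pass grouping of users by the Jaccard overlap of their URL sets. The slides do not name a clustering algorithm, so this choice, the 0.5 threshold, and the sample access data are all assumptions made for the sketch. The follow-on classifier step is not shown.

```python
def jaccard(a, b):
    """Overlap between two URL sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_users(visits, threshold=0.5):
    """Greedy single-pass clustering of users by the URLs they access often.

    visits maps user -> set of frequently accessed URLs. A user joins the
    first cluster whose seed user is similar enough, else starts a new one.
    """
    clusters = []  # list of (seed_urls, [users])
    for user, urls in visits.items():
        for seed_urls, members in clusters:
            if jaccard(urls, seed_urls) >= threshold:
                members.append(user)
                break
        else:
            clusters.append((urls, [user]))
    return [members for _, members in clusters]


# Hypothetical access data.
VISITS = {
    "u1": {"/quotes", "/trade", "/portfolio"},
    "u2": {"/cart", "/checkout", "/deals"},
    "u3": {"/quotes", "/trade", "/news"},
    "u4": {"/cart", "/deals", "/wishlist"},
}
CLUSTERS = cluster_users(VISITS)
```

Here `u1` and `u3` end up together (a trader-like group), as do `u2` and `u4` (a shopper-like group). A production system would more likely use an established clustering method, but the idea of grouping by shared URLs is the same.
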
  30. 30. Internet Advertising.
      • Scheme 1: Manually associate a set of ads with each user profile.
          • For each user, display an ad from the set based on profile.
      • Scheme 2: Automate the association between ads and users.
          • Use ad click information to cluster users.
          • For each cluster, find the ads that occur most frequently in the cluster; these become the ads for the set of users in the cluster.
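
The last step of Scheme 2 amounts to a frequency count per cluster. The sketch below assumes the clusters have already been formed and are passed in as lists of user ids. The function name, the `top_n` parameter, and the click log are hypothetical.

```python
from collections import Counter


def ads_for_clusters(clusters, clicks, top_n=2):
    """For each cluster of users, pick the ads clicked most often inside it."""
    result = []
    for members in clusters:
        counts = Counter(ad for user in members for ad in clicks.get(user, []))
        result.append([ad for ad, _ in counts.most_common(top_n)])
    return result


# Hypothetical click log: user -> ads clicked.
CLICKS = {
    "u1": ["broker", "broker", "laptop"],
    "u3": ["broker", "news_app"],
    "u2": ["shoes", "laptop"],
    "u4": ["shoes", "shoes"],
}
AD_SETS = ads_for_clusters([["u1", "u3"], ["u2", "u4"]], CLICKS, top_n=1)
```

With `top_n=1`, the first cluster is assigned the "broker" ad and the second the "shoes" ad, since those are the most-clicked ads within each group.
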
  31. 31. Fraud
      • With the growing popularity of e-commerce, systems to detect and prevent fraud on the web become important.
      • Maintain a signature for each user based on buying patterns on the web.
      • If the buying pattern changes significantly, then signal fraud.
      • HNC Software uses domain knowledge and neural networks for credit card fraud detection.
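
One very simple way to read "signature" and "changes significantly" is statistical: summarise past purchase amounts by their mean and spread, and flag a purchase that lies many standard deviations away. This is an assumed toy model for illustration, not the neural-network approach attributed to HNC Software. The function names and the threshold are hypothetical.

```python
import statistics


def build_signature(amounts):
    """Summarise a user's past purchase amounts as (mean, standard deviation)."""
    return statistics.mean(amounts), statistics.pstdev(amounts)


def looks_fraudulent(signature, amount, max_deviations=3.0):
    """Flag a purchase that lies far outside the user's usual range."""
    mean, spread = signature
    if spread == 0:
        # No variation in the history: any different amount is unusual.
        return amount != mean
    return abs(amount - mean) / spread > max_deviations


# Hypothetical purchase history.
SIGNATURE = build_signature([20.0, 25.0, 22.0, 30.0, 23.0])
```

A purchase of 27.0 stays within the usual range for this history, while a purchase of 500.0 is flagged. Real signatures would typically cover more than amounts, such as merchant categories or purchase times.
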
  32. 32. Image Retrieval System
      • Given: a set of images.
      • Find:
          • All images similar to a given image.
          • All pairs of similar images.
      • A few applications of image retrieval systems are:
          • Medical diagnosis.
          • Weather prediction.
          • Web search engines for images.
          • E-commerce.
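
Both "find" tasks can be sketched once each image is reduced to a feature vector. The slides do not say which features or similarity measure are used. The sketch assumes short numeric vectors (for instance, coarse colour proportions) compared by cosine similarity, with invented image names and a hypothetical 0.9 threshold.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_to(query, features, threshold=0.9):
    """All images whose features are close to those of the query image."""
    return sorted(
        name for name, vec in features.items()
        if name != query and cosine(features[query], vec) >= threshold
    )


def similar_pairs(features, threshold=0.9):
    """All pairs of images whose features are close to each other."""
    return [
        (a, b) for a, b in combinations(sorted(features), 2)
        if cosine(features[a], features[b]) >= threshold
    ]


# Hypothetical feature vectors.
FEATURES = {
    "sunset1": [0.8, 0.1, 0.1],
    "sunset2": [0.7, 0.2, 0.1],
    "forest": [0.1, 0.8, 0.1],
}
```

The pairwise search compares every image with every other, which is fine for a handful of images but grows quadratically. Large collections usually need an index of some kind.
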
  33. 33. Problems with Web Search
      • Today's search engines are plagued by many problems, a few of which are listed below:
          • The "abundance" problem.
          • "Limited coverage" of the web (the largest crawlers cover less than 18% of all web pages).
          • "Limited query" interface based on keyword-oriented search.
          • "Limited customization" to individual users.
          • The web is "highly dynamic".
  34. 34. Improvised searching: Adding structure to the web
  35. 35. Conclusion
      • Web mining systems need to be implemented to:
          • Understand visitors' profiles.
          • Identify a company's strengths and weaknesses.
          • Measure the effectiveness of online marketing efforts.
      • Web mining supports ongoing, continuous improvement for e-businesses.
  36. 36. References