MaxQDPro Team Anjan.K Harish.R II Sem M.Tech CSE 06/10/09 MCSE 202 : Topics in DB Systems
• Introduction to GNOME
• Need for Mining
• Mining Challenges
• GNOME Data Access
• Database usage grid
• Components
• Features
• GNOME Data miner
• Summary
• References
• GNOME is an acronym for GNU Network Object Model Environment.
• It is an international project that provides software development frameworks, initially developed for the desktop environment.
• It is a GNU project, compatible with Unix-like operating systems, and sits on top of the kernel.
• GNOME-DB aims to provide a free, unified data access architecture to GNOME projects.
• It is known for its good data management APIs.
• The explosive growth of data: from terabytes to petabytes
• Data collection and data availability
    • Automated data collection tools, database systems, Web, computerized society
• Major sources of abundant data
    • Business: Web, e-commerce, transactions, stocks, …
    • Science: remote sensing, bioinformatics, scientific simulation, …
    • Society and everyone: news, digital cameras, YouTube
• We are drowning in data, but starving for knowledge!
• Data mining: automated analysis of massive data sets, also called knowledge discovery in databases (KDD).
Copyright © Data Mining and Warehousing by Han et al.
• Exceeds the designers' expectations
    • Data warehouses typically grow asynchronously.
    • Scalability has to be established across the lifetime of the system.
• Data is everywhere
    • Data is inconsistent.
    • Records are different in each system.
    • Noisy data
• Performance issues
    • Running queries that summarize data over a long stipulated period can push the operating system to maximum load.
• GNOME has its own tool for data access, similar to the proprietary Microsoft OLE.
• The key issue in data access is the heterogeneity of data sources and the variety of access methods for each of them.
• Neither the access methods nor SQL are de facto standards.
• It is middleware for accessing various data sources.
• Libgda is the actual tool used for this purpose.
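The idea of middleware over heterogeneous sources can be sketched in a few lines. The example below is only an illustration in Python and does not use or imitate the Libgda API: the `DataSource` interface, the `SqliteSource` and `CsvSource` classes, and the `collect` helper are all hypothetical names, and the two sources (an in-memory SQLite table and a CSV string) are made-up stand-ins for real databases.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterator, Protocol


class DataSource(Protocol):
    """Hypothetical common interface (an assumption, not Libgda's API)."""

    def rows(self) -> Iterator[Dict[str, str]]: ...


class SqliteSource:
    """Reads every row of one table from an SQLite connection."""

    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self.conn = conn
        self.table = table

    def rows(self) -> Iterator[Dict[str, str]]:
        cur = self.conn.execute(f'SELECT * FROM "{self.table}"')
        names = [d[0] for d in cur.description]
        for rec in cur:
            # Normalise every value to text so all sources look alike.
            yield {n: str(v) for n, v in zip(names, rec)}


class CsvSource:
    """Reads rows from CSV text whose first line is the header."""

    def __init__(self, text: str) -> None:
        self.text = text

    def rows(self) -> Iterator[Dict[str, str]]:
        for rec in csv.DictReader(io.StringIO(self.text)):
            yield dict(rec)


def collect(sources):
    """Gather rows from any mix of sources through the one interface."""
    out = []
    for source in sources:
        out.extend(source.rows())
    return out


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, qty INTEGER)")
conn.execute("INSERT INTO sales VALUES ('pen', 3)")
all_rows = collect([SqliteSource(conn, "sales"), CsvSource("item,qty\nink,5\n")])
print(all_rows)
```

The caller of `collect` never sees which access method sits behind each source, which is the role the slide assigns to the middleware.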
Consists of three major components:
• Libgda (Library GNOME Data Access)
    • Data abstraction layer
    • Manages data stored in databases
    • Interfaces with GLib and LibXML
    • Can be used for non-GNOME applications
• Libgnomedb
    • DB widget library
    • Depends on GTK+
• Mergeant
    • Front end for DB administrators and application developers
• Easier access to several database engines
• Metadata extractor
• Easy-to-use APIs
• Comes with console and graphical UIs
• Open source, under the General Public License
• Direct editing of DB data
• Compatible with most programming languages
• Distributed transactions are supported.
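As a rough illustration of what a metadata extractor does, the sketch below lists the tables and column types of a database. It uses Python's standard `sqlite3` module rather than Libgda, so the queries are SQLite-specific; the `describe` function and the sample tables are hypothetical.

```python
import sqlite3


def describe(conn):
    """Return {table: [(column, declared_type), ...]} for an SQLite database."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(col[1], col[2]) for col in cols]
    return schema


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
schema = describe(conn)
print(schema)
```

A data access layer would hide such engine-specific catalogue queries behind one call, so that tools can browse any supported database the same way.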
• Open source data mining tools: a collection of experimental GUI-based tools written in Python and GTK by Togaware.
• Uses GDA to access heterogeneous data sources.
• Builds the warehouse after the essential processing and transformation steps, with the help of the flexible GNOME APIs.
• The GUI can be used for visual checks.
• Used on Unix variants such as Debian, Red Hat, and Ubuntu.
• The mining system is generic, so it can be used for most routine work.
• The new data mining tool by GNOME is Rattle.
• Greening is a decision tree builder with stochastic boosting and random forests.
Some of the applications associated with GDM:
• Decision trees
• Apply Apriori association rules to identify frequent itemsets.
• Bayes classification for building a classifier from training data and classifying data.
• Bar chart and binning chart
• GDM plot utility for Q-Q plots, histogram analysis, and correlation plots
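To make the frequent-itemset step concrete, here is a minimal sketch of the Apriori level-wise search in plain Python. It is not the GDM implementation; the `apriori` function, the `min_support` parameter (an absolute count), and the sample baskets are assumptions chosen for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every distinct single item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c
            for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
]
result = apriori(baskets, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With these baskets every single item and every pair meets the support threshold of 2, while the three-item set appears in only one basket and is dropped. Association rules would then be derived from the frequent itemsets in a separate step.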
• Introduction to GNOME
• Need for mining
• KDD
• Challenges
• GNOME Data Access
• Components
• Features
• GNOME Data Mining
[1] Article at http://www.gnome.org
[2] Han et al., “Data Mining and Warehousing”, 2nd Edition
[3] Article at http://www.gnomedb.org
[4] Article on wikipedia.org

Mining Gnome Data
