
Mining Gnome Data

Concentrates on the mining issues in the GNOME database and the access methodology.

Published in: Technology

  1. MaxQDPro Team: Anjan.K, Harish.R (II Sem M.Tech CSE). MCSE 202 : Topics in DB Systems, 05/22/09
  2.
     • Introduction to GNOME
     • Need for Mining
     • Mining Challenges
     • GNOME Data Access
       ◦ Database usage grid
       ◦ Components
       ◦ Features
     • GNOME Data miner
     • Summary
     • References
  3.
     • GNOME is an acronym for GNU Network Object Model Environment.
     • An international project that provides software development frameworks, initially developed for the desktop environment.
     • A GNU project compatible with Unix-like operating systems; it sits on top of the kernel.
     • GNOME-DB
       ◦ Aims to provide a free, unified data access architecture to GNOME projects.
       ◦ Known for its good data management APIs.
  4.
     • The explosive growth of data: from terabytes to petabytes
       ◦ Data collection and data availability
         ▪ Automated data collection tools, database systems, Web, computerized society
       ◦ Major sources of abundant data
         ▪ Business: Web, e-commerce, transactions, stocks, …
         ▪ Science: remote sensing, bioinformatics, scientific simulation, …
         ▪ Society and everyone: news, digital cameras, YouTube
     • We are drowning in data, but starving for knowledge!
     • Data mining: automated analysis of massive data sets, also called Knowledge Discovery in Databases (KDD).
  5. Copyright © Data Mining and Warehousing by Han et al.
  6.
     • Exceeds the designers' expectations
     • Data warehouses typically grow asynchronously.
     • Establishing the scalability of a system across its lifetime.
     • Data is everywhere
     • Data is inconsistent
       ◦ Records are different in each system
       ◦ Noisy data
     • Performance issues
       ◦ Running queries that summarize data over a long stipulated period can occupy the operating system with that one task (maximum load).
  7.
     • GNOME has its own tool for data access, similar to Microsoft's proprietary OLE DB.
     • The key issue in data access is dealing with heterogeneous data sources and the variety of access methods each of them requires.
     • Access methods and SQL dialects are not de facto standards across sources.
     • It is middleware for accessing various data sources.
     • Libgda is the actual tool used for this purpose.
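
The middleware idea on this slide, one access interface placed in front of heterogeneous sources, can be sketched in a few lines of Python. This is only an illustration of the pattern and not Libgda's API: the `DataSource` protocol, the two adapter classes, `collect`, `demo`, and the sample table and CSV text are all hypothetical names invented for the sketch. It uses only the standard library (`sqlite3`, `csv`).

```python
import csv
import io
import sqlite3
from typing import Dict, Iterator, List, Protocol


class DataSource(Protocol):
    """Hypothetical uniform interface (an assumption, not part of Libgda)."""

    def rows(self) -> Iterator[Dict[str, str]]: ...


class SqliteSource:
    """Adapter for a relational source that is queried with SQL."""

    def __init__(self, conn: sqlite3.Connection, query: str) -> None:
        self.conn = conn
        self.query = query

    def rows(self) -> Iterator[Dict[str, str]]:
        cursor = self.conn.execute(self.query)
        names = [col[0] for col in cursor.description]
        for record in cursor:
            yield {name: str(value) for name, value in zip(names, record)}


class CsvSource:
    """Adapter for a flat-file source that has no query language at all."""

    def __init__(self, text: str) -> None:
        self.text = text

    def rows(self) -> Iterator[Dict[str, str]]:
        for record in csv.DictReader(io.StringIO(self.text)):
            yield dict(record)


def collect(sources: List[DataSource]) -> List[Dict[str, str]]:
    """Read every source through the same interface and merge the rows."""
    merged: List[Dict[str, str]] = []
    for source in sources:
        merged.extend(source.rows())
    return merged


def demo() -> List[Dict[str, str]]:
    # Made-up sample data: one in-memory SQL table and one CSV snippet.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, city TEXT)")
    conn.execute("INSERT INTO users VALUES ('asha', 'Bangalore')")
    csv_text = "name,city\nravi,Mysore\n"
    return collect([
        SqliteSource(conn, "SELECT name, city FROM users"),
        CsvSource(csv_text),
    ])
```

The calling code in `collect` never needs to know which access method a source uses; that is the property the slide attributes to the middleware layer.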
  8. MaxQDPro: Kettle- ETL Tool
  9.
     • Consists of three major components
       ◦ Libgda (Library GNOME Data Access)
         ▪ Data abstraction layer
         ▪ Manages data stored in databases
         ▪ Interfaces with GLib and LibXML
         ▪ Can be used for non-GNOME applications
       ◦ Libgnomedb
         ▪ DB widget library
         ▪ Depends on GTK+
       ◦ Mergeant
         ▪ Front end for DB administration and application
  10.
     • Easier access to several database engines
     • Metadata extractor
     • Easy-to-use APIs
     • Comes with console and graphical UIs
     • Open source (General Public License)
     • Direct editing of DB data
     • Compatible with most programming languages
     • Distributed transactions are supported.
  11.
     • Open source data mining tools: a collection of experimental GUI-based tools written in Python and GTK by Togaware
     • Uses GDA to access heterogeneous data sources
     • Builds the warehouse after the essential processing and transformation steps, with the help of flexible GNOME APIs
  12.
     • The GUI can be used for visual checks.
     • Used on Unix-variant systems such as Debian, Red Hat, and Ubuntu.
     • The mining system is generic, so it can be used for most routine work.
     • Rattle, from Togaware, is a newer data mining tool with a GNOME-based interface.
     • Greening is a decision tree builder with stochastic boosting and random forests
  13.
     • Some of the applications associated with GDM
       ◦ Decision trees
       ◦ Apriori association rules for identifying frequent itemsets
       ◦ Bayes classification for building a model from training data and classifying with it
       ◦ Bar chart and binning chart
       ◦ GDM plot utility for Q-Q plots, histogram analysis, and correlation plots
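
To make the Apriori item concrete, here is a minimal sketch of frequent-itemset mining in plain Python. It is not the GDM tool's code; the function name `apriori`, the absolute `min_support` count, and the sample baskets are assumptions chosen for illustration. It follows the usual join-and-prune scheme: build candidate k-itemsets from frequent (k-1)-itemsets, then discard any candidate that has an infrequent subset.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set


def apriori(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset found in at least min_support transactions."""

    def count(candidates: Iterable[FrozenSet[str]]) -> Dict[FrozenSet[str], int]:
        # Count how many transactions contain each candidate, keep the frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    singletons = {frozenset([item]) for t in transactions for item in t}
    current = count(singletons)
    frequent = dict(current)
    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent


# Made-up sample baskets for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori(baskets, min_support=2)
```

With these baskets and a support threshold of 2, the pair {milk, butter} occurs only once, so the three-item candidate is pruned without ever being counted against the data.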
  14.
     • Introduction to GNOME
     • Need for mining
       ◦ KDD
       ◦ Challenges
     • GNOME Data Access
       ◦ Components
       ◦ Features
     • GNOME Data Mining
  15.
     [1] An article at http://www.gnome.org
     [2] Han et al., “Data Mining and Warehousing”, 2nd Edition
     [3] An article at http://www.gnomedb.org
     [4] An article at wikipedia.org
