ConQueSt
Transcript

  • 1. “On Interactive Pattern Mining from Relational Databases”
    • Francesco Bonchi, Fosca Giannotti, Claudio Lucchese, Salvatore Orlando, Raffaele Perego, Roberto Trasarti
    • KDD Laboratory, HPC Laboratory, ISTI – C.N.R., Italy
  • 2.
    • Demo @ ICDE’06 and Black Forest Workshop
    • New features:
      • Discretization tools
      • On-the-fly strengthening/relaxing of constraints
      • Soft constraints (see this afternoon's talk)
    • Plan of the talk:
      • ConQueSt in a nutshell
      • Constraint-based Frequent Pattern Discovery
      • Language, architecture, mining engine
      • Demo
      • Future developments
  • 3. ConQueSt in a nutshell
    • A Constraint-based Querying System aimed at supporting Frequent Patterns Discovery.
    • Follows the Inductive Database vision:
      • mining as a querying process
      • closure principle: patterns are first class citizens
      • mining engine amalgamated with commercial DBMS
    • Focus on constraint-based frequent patterns:
      • large variety of constraints handled
      • very efficient and robust mining engine
    • SPQL: “simple pattern query language”
      • superset of SQL
      • uses SQL to define the input data sources
      • plus some syntactic sugar to specify data pre-processing
      • plus some syntactic sugar to specify mining parameters
  • 4. ConQueSt in a nutshell
    • The Knowledge Discovery Process
  • 5. ConQueSt in a nutshell
    • Knowledge Discovery is an intrinsically exploratory process :
      • human-guided
      • interactive
      • iterative
      • … efficiency is an issue!
    • Constraints can be used to drive the discovery process toward potentially interesting patterns.
    • Constraints can also be used to reduce the cost of pattern mining computation.
  • 6. Frequent Pattern Discovery
    • Frequent Pattern Discovery, i.e. mining patterns that satisfy a user-defined minimum frequency constraint.
    • Basic step of “Association Rules” mining
    • Market Basket Analysis
      • Customer1: Milk, eggs, sugar, bread
      • Customer2: Milk, eggs, cereal, bread
      • Customer3: Eggs, sugar
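
To make the notion of minimum frequency concrete, here is a minimal Python sketch that counts support over the three baskets on this slide. The helper names `support` and `frequent_itemsets` and the threshold of 2 are illustrative assumptions, and the brute-force enumeration is only suitable for a toy example.

```python
from itertools import combinations

# The three baskets from the slide, one set of items per customer.
baskets = [
    {"milk", "eggs", "sugar", "bread"},   # Customer1
    {"milk", "eggs", "cereal", "bread"},  # Customer2
    {"eggs", "sugar"},                    # Customer3
]


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all itemsets with support >= min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = support(set(candidate), transactions)
            if count >= min_support:
                result[frozenset(candidate)] = count
    return result


# Assumed threshold: an itemset is frequent if at least 2 baskets contain it.
frequent = frequent_itemsets(baskets, min_support=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With this threshold, `{eggs}` is the most frequent itemset (all three baskets) and `{bread, eggs, milk}` is the largest frequent one.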
  • 7. Constraint-based Frequent Patterns
    • $I = \{x_1, \ldots, x_n\}$
    • Constraint: $C: 2^I \to \{\mathrm{True}, \mathrm{False}\}$
    • Frequency constraint:
      • $D$: a bag of transactions $t \subseteq I$
      • $\mathrm{sup}_D(X) = |\{t \in D \mid X \subseteq t\}|$
      • minimum support $\sigma$
      • $\mathrm{sup}_D(X) \geq \sigma$
    • Other constraints:
      • defined on the items forming an itemset
      • defined on some attributes of the items
  • 8. Constraint-based Frequent Patterns
    • Q: $\mathrm{sup}_D(X) \geq 2 \land \mathrm{sum}(X.\mathrm{price}) \geq 20$
    • Solution set:
      • {meat}
      • {fruit,meat}
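
The query above can be read as a conjunction of two predicates over itemsets. The sketch below evaluates such a conjunction in Python, assuming the query asks for support of at least 2 and a total price of at least 20. The slide's own data table is not part of this transcript, so the transactions and prices here are hypothetical values chosen only to reproduce the solution set shown.

```python
from itertools import combinations

# Hypothetical data: the original table is not available in the transcript.
transactions = [
    {"fruit", "meat"},
    {"fruit", "meat", "bread"},
    {"bread"},
]
price = {"fruit": 5, "meat": 25, "bread": 2}  # hypothetical item prices


def support(itemset, db):
    """Number of transactions in `db` that contain `itemset`."""
    return sum(1 for t in db if itemset <= t)


def query(db, price, min_support, min_total_price):
    """All itemsets that are frequent AND whose prices sum to the given minimum."""
    items = sorted(set().union(*db))
    solutions = []
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            x = set(candidate)
            is_frequent = support(x, db) >= min_support
            is_expensive = sum(price[i] for i in x) >= min_total_price
            if is_frequent and is_expensive:
                solutions.append(x)
    return solutions


solutions = query(transactions, price, min_support=2, min_total_price=20)
print(solutions)
```

On this hypothetical data, `{fruit}` and `{bread}` are frequent but too cheap, so only `{meat}` and `{fruit, meat}` survive both predicates.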
  • 9. Constraint-based Frequent Patterns
    • This is an ideal situation…
    • ... when you come to real data:
      • No transactions but relations
      • The functional dependency item → attribute hardly ever holds
      • (e.g. prices change along time)
  • 10. ConQueSt provides:
    • easy way to define the “mining view”
      • just indicate which features are items
      • which features are transactions
      • which features are item attributes
      • it handles both inter-attribute and intra-attribute frequent pattern mining
    • easy way to solve item-attribute conflicts
      • e.g. different prices for item “beer”
      • possible solutions: take-first, take-avg, take-min etc…
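
As an illustration of the conflict-resolution strategies named on this slide, the following Python sketch collapses several observed values of an item attribute into one. The strategy names mirror the slide. The `RESOLVERS` table, the `resolve_attribute` helper and the price observations are hypothetical and do not reflect how ConQueSt implements this step.

```python
from statistics import mean

# Hypothetical (item, price) observations: "beer" was sold at three prices.
observations = [("beer", 3.0), ("beer", 2.0), ("chips", 1.5), ("beer", 2.5)]

# One resolver per strategy named on the slide.
RESOLVERS = {
    "take-first": lambda values: values[0],
    "take-avg": mean,
    "take-min": min,
}


def resolve_attribute(observations, strategy):
    """Map each item to a single attribute value using the chosen strategy."""
    grouped = {}
    for item, value in observations:
        grouped.setdefault(item, []).append(value)  # keeps observation order
    resolver = RESOLVERS[strategy]
    return {item: resolver(values) for item, values in grouped.items()}


for strategy in RESOLVERS:
    print(strategy, resolve_attribute(observations, strategy))
```

After resolution each item has exactly one attribute value, which restores the item → attribute functional dependency that constraints such as sum or average rely on.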
  • 11. SPQL (Simple Pattern Query Language)
    • MINE PATTERNS WITH SUPP>= 5 IN
    • SELECT product.product_name, product.gross_weight, sales.time_id, sales.customer_id, sales.store_id
    • FROM [product], [sales_fact_1998]
    • WHERE sales_fact_1998.product_id=product.product_id
    • TRANSACTION sales.time_id, sales.customer_id, sales.store_id
    • ITEM product.product_name
    • ATTRIBUTE product.gross_weight
    • CONSTRAINED BY Average(product.gross_weight)<=15
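
To show what the TRANSACTION, ITEM and ATTRIBUTE clauses mean for the data, here is a small Python sketch that turns relational rows into a mining view. The rows are hypothetical stand-ins for the result of the SELECT, `build_mining_view` and `average_weight` are illustrative names, and the take-first rule for the attribute is an assumption, not necessarily what the system does.

```python
from collections import defaultdict

# Hypothetical rows shaped like the SELECT result:
# (product_name, gross_weight, time_id, customer_id, store_id)
rows = [
    ("milk", 10.0, 1, 100, 7),
    ("bread", 4.0, 1, 100, 7),
    ("milk", 10.0, 2, 101, 7),
    ("beef", 30.0, 2, 101, 7),
]


def build_mining_view(rows):
    """Group rows into transactions of items plus one attribute value per item."""
    transactions = defaultdict(set)
    weight = {}
    for product_name, gross_weight, time_id, customer_id, store_id in rows:
        tid = (time_id, customer_id, store_id)         # TRANSACTION columns
        transactions[tid].add(product_name)            # ITEM column
        weight.setdefault(product_name, gross_weight)  # ATTRIBUTE column (take-first)
    return dict(transactions), weight


def average_weight(itemset, weight):
    """Value compared against the bound in the CONSTRAINED BY clause."""
    return sum(weight[i] for i in itemset) / len(itemset)


transactions, weight = build_mining_view(rows)
for tid, items in transactions.items():
    print(tid, sorted(items), average_weight(items, weight) <= 15)
```

Each distinct combination of the TRANSACTION columns becomes one transaction, so the same relation can yield different mining views depending on which columns are chosen.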
  • 12.  
  • 13. ConQueSt’s mining engine
    • Level-wise apriori-like algorithm
    • DCI + ExAMiner + ExAMiner lam + …
    • Able to push a large variety of constraints
      • subset, supset, length, min, max, sum, range, avg, var, med, md, std, etc…
    • Efficient and robust
    • Modular
    • Data aware
    • Resource aware
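
For readers unfamiliar with level-wise mining, the sketch below shows a plain textbook Apriori loop in Python with one anti-monotone constraint pushed into candidate filtering. It is not DCI or ExAMiner, and it says nothing about how the actual engine is built. The function name `apriori`, the `anti_monotone` parameter and the length bound used in the example are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support, anti_monotone=lambda itemset: True):
    """Level-wise search: level k+1 candidates are built from level k survivors."""

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    def survivors(candidates):
        kept = {}
        for c in candidates:
            if anti_monotone(c):          # cheap constraint check first
                count = support(c)        # then the scan over the data
                if count >= min_support:
                    kept[c] = count
        return kept

    items = set().union(*transactions)
    level = survivors(frozenset([i]) for i in items)
    result = {}
    k = 1
    while level:
        result.update(level)
        # Join step plus Apriori pruning: every k-subset must have survived.
        candidates = {
            a | b
            for a in level
            for b in level
            if len(a | b) == k + 1
            and all(frozenset(s) in level for s in combinations(a | b, k))
        }
        level = survivors(candidates)
        k += 1
    return result


baskets = [
    {"milk", "eggs", "sugar", "bread"},
    {"milk", "eggs", "cereal", "bread"},
    {"eggs", "sugar"},
]
frequent = apriori(baskets, min_support=2)
short = apriori(baskets, min_support=2, anti_monotone=lambda s: len(s) <= 2)
print(len(frequent), len(short))
```

Because both minimum support and the length bound are anti-monotone, an itemset that fails either test can be dropped together with all of its supersets, which is what keeps each level small.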
  • 14. Demo
  • 15. ConQueSt: future developments
    • Strengthen the pattern browser
      • interactive querying
      • incremental mining
      • visualization tools
    • Strengthen post-processing of patterns
      • not only rules… build global models from the extracted patterns
    • More complex patterns
      • sequences, graphs etc…
  • 16. ConQueSt’s contacts
    • Webpage (work in progress):
    • http://www-kdd.isti.cnr.it/ConQueSt/
    • Contact:
    • [email_address]
    • [email_address]
  • 17. References
    • F. Bonchi, F. Giannotti, C. Lucchese, S. Orlando, R. Perego, R. Trasarti ConQueSt : a Constraint-based Querying System for Exploratory Pattern Discovery In Proceedings of The 22nd International Conference on Data Engineering (ICDE'06), ©IEEE. April 3-7, 2006, Atlanta, GA, USA.
    • S. Bistarelli, F. Bonchi, Interestingness is not a Dichotomy : Introducing Softness in Constrained Pattern Mining In Proceedings of the Ninth European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'05) Lecture Notes in Computer Science, Volume 3721, ©Springer. October 3-7, 2005, Porto, Portugal.
    • F. Bonchi, C. Lucchese Pushing Tougher Constraints in Frequent Pattern Mining In Proceedings of the Ninth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'05). Lecture Notes in Computer Science, Volume 3518, ©Springer. May 18-20, 2005, Hanoi, Vietnam.
    • F. Bonchi, C. Lucchese On Closed Constrained Frequent Pattern Mining In Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM'04), ©IEEE. November 01-04, 2004. Brighton, UK.
    • F. Bonchi, B. Goethals FP-Bonsai : the Art of Growing and Pruning Small FP-trees In Proceedings of the Eighth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'04). Lecture Notes in Computer Science, Volume 3056, ©Springer. May 26-28, 2004, Sydney, Australia.
  • 18. References
    • F. Bonchi, F. Giannotti, A. Mazzanti, D. Pedreschi ExAMiner : Optimized Level-wise Frequent Pattern Mining with Monotone Constraints In Proceedings of the Third IEEE International Conference on Data Mining (ICDM'03), ©IEEE. November 19-22, 2003 Melbourne, Florida, USA.
    • F. Bonchi, F. Giannotti, A. Mazzanti, D. Pedreschi ExAnte : Anticipated Data Reduction in Constrained Pattern Mining In Proceedings of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'03) Lecture Notes in Computer Science, Volume 2838, ©Springer. September 22-26, 2003, Cavtat-Dubrovnik, Croatia.
    • F. Bonchi, F. Giannotti, A. Mazzanti, D. Pedreschi Adaptive Constraint Pushing in Frequent Pattern Mining In Proceedings of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'03) Lecture Notes in Computer Science, Volume 2838, ©Springer. September 22-26, 2003, Cavtat-Dubrovnik, Croatia.
    • F. Bonchi, C. Lucchese Extending the State-of-the-Art of Constraint-based Pattern Discovery Data and Knowledge Engineering (DKE) ©Elsevier, Accepted for Publication, 2006.
    • F. Bonchi, C. Lucchese On Condensed Representations of Constrained Frequent Patterns Knowledge and Information Systems - An International Journal (KAIS) ©Springer, 9(2), February 2006.
  • 19. References
    • F. Bonchi, F. Giannotti, A. Mazzanti, D. Pedreschi ExAnte : A Preprocessing Method for Frequent Pattern Mining IEEE Intelligent Systems, ©IEEE, 20(3):2-8 May/June 2005.
    • F. Bonchi, F. Giannotti, A. Mazzanti, D. Pedreschi Efficient Breadth-first Mining of Frequent Pattern with Monotone Constraints Knowledge and Information Systems - An International Journal (KAIS) ©Springer, 8(2), August 2005.
    • F. Bonchi, F. Giannotti, D. Pedreschi A Relational Query Primitive For Constraint-based Pattern Mining In "Constraint-based Mining and Inductive Databases", Jean-Francois Boulicaut, Luc De Raedt and Heikki Mannila Ed., Lecture Notes in Computer Science, Volume 3848, ©Springer, 2005.
    • F. Bonchi, F. Giannotti Pushing Constraints To Detect Local Patterns In "Detecting Local Patterns", Katharina Morik, Jean-Francois Boulicaut and Arno Siebes Ed., Lecture Notes in Computer Science, Volume 3539, ©Springer, 2005.
    • F. Bonchi, F. Giannotti, D. Pedreschi Frequent Pattern Queries for Flexible Knowledge Discovery In Proceedings of the Twelfth Italian Symposium on Advanced Database Systems (SEBD'04), 2004.
    • F. Bonchi Frequent Pattern Queries : Language and Optimizations Ph.D. Thesis, TD10-03, Dipartimento di Informatica Università di Pisa, 2003.