1. Data Mining Threat Identification of
the Australian Terrestrial Biodiversity
Database (NLWRA 2002)
Kurt Pudniks
24 Aug 2012
2. Overview
• Video clip (7 min)
• Presentation (15 min), 4-Mat system
– Requirements: Why
– Design: What
– Implementation: How
– Maintenance -> future requirements: If
• Questions (5 min)
3. Why process data?
• Efficiency
– Smart use of limited resources
– Develop knowledge management (e.g. SOPs)
– Measurement of progress against goals
• Effectiveness
– Targeted goals that make a difference
– Workforce and policy focus on the right areas
– Awareness and flexibility to adapt tactics
4. What is data mining (arules)?
• Data mining is the use of automated algorithms
– Encouraged to understand the fundamentals (R, FOSS)
– e.g. recursive partitioning, random forests, classification, machine learning (ML)
• Association Rules (arules) is described as:
– Mining frequent itemsets and association rules is a popular and well
researched method for discovering interesting relations between
variables in large databases
5. What is data mining (arules)?
• Shopping basket:
• Support
– {milk,bread} has a support of 2/5 = 0.4
• Confidence
– conf({milk,bread} -> {butter}) = supp({milk,bread,butter}) / supp({milk,bread}) = 0.2 / 0.4 = 0.5
• Lift
– Deviation from independence of LHS and RHS, i.e. 0.2 / (0.4 * 0.6) = 0.5 / 0.6 ≈ 0.83
– Keep this assumption in mind for later...
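The three measures on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the basket below is hypothetical (the slide's original table is not reproduced here) and was chosen so that it gives the same support, confidence and lift figures quoted above. The function names are made up for this sketch and are not the arules API.

```python
from fractions import Fraction

# Hypothetical five-transaction basket, chosen to match the slide's figures.
# It is NOT the original table from the presentation.
TRANSACTIONS = [
    {"milk", "bread"},
    {"butter"},
    {"butter", "beer"},
    {"milk", "bread", "butter"},
    {"bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(lhs, rhs, transactions):
    """supp(LHS and RHS together) / supp(LHS)."""
    both = set(lhs) | set(rhs)
    return support(both, transactions) / support(lhs, transactions)


def lift(lhs, rhs, transactions):
    """supp(LHS and RHS together) / (supp(LHS) * supp(RHS))."""
    both = set(lhs) | set(rhs)
    return support(both, transactions) / (
        support(lhs, transactions) * support(rhs, transactions)
    )


LHS, RHS = {"milk", "bread"}, {"butter"}

print("support   :", support(LHS, TRANSACTIONS))          # 2/5 = 0.4
print("confidence:", confidence(LHS, RHS, TRANSACTIONS))  # (1/5) / (2/5)
print("lift      :", lift(LHS, RHS, TRANSACTIONS))        # (1/5) / (2/5 * 3/5)
```

Exact fractions are used instead of floats so the results can be compared directly with the hand calculation (0.5 / 0.6 is 5/6).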
6. How to use arules
http://adl.brs.gov.au/anrdl/metadata_files/pa_badesr9nn__02211a01.xml
12. If future work were possible...
• Buyer beware!
– Lies, damned lies, and statistics…
• Read the fine print:
– The lift of a rule is defined as lift(X->Y) = supp(X ∪ Y) / (supp(X) * supp(Y)) and can
be interpreted as the deviation of the support of the whole rule from the support
expected under independence given the supports of the LHS and the RHS. Greater
lift values indicate stronger associations.
• Good example of the value of metadata
– Is metadata simply “data about data”?
• Highlights the need for diversity of views
– Sharing of skills and experience in context
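As a short worked note on the "fine print" quoted on this slide, the lift definition can be rewritten in terms of confidence, which makes the independence baseline explicit. The notation follows the quoted definition; the last line reuses the figures from the shopping-basket slide.

```latex
\begin{align*}
\operatorname{lift}(X \rightarrow Y)
  &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)\,\operatorname{supp}(Y)}
   = \frac{\operatorname{conf}(X \rightarrow Y)}{\operatorname{supp}(Y)} \\[4pt]
\text{independence: }
  \operatorname{supp}(X \cup Y) &= \operatorname{supp}(X)\,\operatorname{supp}(Y)
  \;\Longrightarrow\; \operatorname{lift}(X \rightarrow Y) = 1 \\[4pt]
\text{shopping basket: }
  \operatorname{lift} &= \frac{0.2}{0.4 \times 0.6} = \frac{0.5}{0.6} \approx 0.83 < 1
\end{align*}
```

A lift above 1 means the LHS and RHS occur together more often than independence would predict, and a lift below 1 means less often. The basket rule therefore sits slightly below the independence baseline even though its confidence is 0.5.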