2. INTRODUCTION
Sensor association rules have proved very useful for determining frequent patterns in wireless sensor networks (WSNs). These mining algorithms generate patterns from the sensor responses and match them against an existing database of frequent patterns. If an anomaly or a delay is detected, the sensors that have stopped working are identified and the necessary actions are taken.
These mining algorithms help in two important respects:
1. Memory usage is optimized
2. Time of execution is reduced
We propose mining algorithms to overcome these essential challenges.
3. LITERATURE REVIEW

References [1] (P-N. Tan, 2006) and [2] (A. Boukerche and S.A. Samarah, 2008): Apriori algorithm
• Database was scanned multiple times
• Time of execution was high
• Memory usage was high

This project: N-RMP algorithm
• Database was scanned only once
• Time of execution was very low
• Memory usage was low
4. SYSTEM REQUIREMENTS
Hardware requirements:
Processor : Any processor above 500 MHz
RAM : 128 MB
Hard disk : 10 GB
Compact disk : 650 MB
Input device : Standard keyboard and mouse
Output device : VGA or high-resolution monitor
Software requirements:
Operating system : Windows family
Language : JDK 1.5
5. SYSTEM ANALYSIS
EXISTING SYSTEM
The already existing systems have the following disadvantages:
• They run on static rather than dynamic datasets.
• Algorithms that run on static datasets are bound to give incorrect responses for dynamic data.
• They rely on algorithms with a higher degree of time complexity.
6. PROPOSED SYSTEM
The proposed systems will have the following advantages:
• They are connected to the internet and can run on a database that is continuously updated.
• The sliding-window protocol can be applied.
• Newer algorithms with a reduced degree of time complexity are applied in these systems.
• They can run on dynamic datasets.
• They use less memory, so efficiency is increased.
• They reduce the time of execution manifold.
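The sliding-window idea mentioned above can be sketched in a few lines. This is a minimal Python sketch, not the project's JDK 1.5 code: the `sliding_window` helper, the fixed window size, and the toy stream of epochs are all illustrative assumptions.

```python
from collections import deque

def sliding_window(stream, size):
    """Yield the current window of the most recent `size` epochs
    after each arriving epoch (hypothetical helper)."""
    window = deque(maxlen=size)  # oldest epoch is dropped automatically
    for epoch in stream:
        window.append(epoch)
        yield list(window)

# Toy stream of epochs (sets of reporting sensor IDs).
epochs = [{"s1", "s2"}, {"s2", "s3"}, {"s1", "s3"}, {"s2"}]
for w in sliding_window(epochs, size=2):
    print(w)
```

Because the deque has a fixed `maxlen`, the mining algorithm only ever sees the most recent epochs, which is what lets it keep up with a continuously updating database.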
8. ALGORITHMS APPLIED TO THE DATABASE COLLECTED FROM THE SINK NODE:
Apriori algorithm
N-RMP algorithm
IMPLEMENTATION
9. INTRODUCTION TO THE APRIORI ALGORITHM
The Apriori algorithm is an influential algorithm for mining frequent item-sets for Boolean association rules. Some key points about the Apriori algorithm:
• It mines frequent item-sets from a traditional database for Boolean association rules.
• Every subset of a frequent item-set must also be a frequent item-set. For example, if {l1, l2} is a frequent item-set, then {l1} and {l2} must be frequent item-sets.
• It finds frequent item-sets iteratively.
• The frequent item-sets are used to generate association rules.
10. CONCEPTS
• A set of all items in a store.
• A set of all transactions (transactional database T).
• Each transaction is a set of items (an item-set).
• Each transaction has a transaction ID (TID).
The mining loop: initial frequent set → candidate generation → candidate pruning → support calculation.
11. CONCEPTS
• Uses a level-wise search, where frequent k item-sets are used to explore (k+1) item-sets.
• Frequent subsets are extended one item at a time; this is known as the candidate-generation process.
• Groups of candidates are tested against the data.
• It identifies the frequent individual items in the database and extends them to larger and larger item-sets as long as those item-sets appear sufficiently often in the database.
• The Apriori algorithm determines frequent item-sets in order to determine association rules.
12. APRIORI ALGORITHM – THE PSEUDO-CODE
Join step: Ck is generated by joining Lk-1 with itself.
Prune step: Any (k-1) item-set that is not frequent cannot be a subset of a frequent k item-set.
Pseudo-code:
Ck : candidate item-sets of size k
Lk : frequent item-sets of size k
L1 = {frequent 1-item-sets};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in the database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with at least min_support;
end
return ∪k Lk;
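The pseudo-code above can be turned into a runnable routine. This is a minimal Python sketch, not the project's JDK 1.5 implementation: it assumes transactions are given as sets and that `min_support` is an absolute count; the function name `apriori` is illustrative.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise Apriori: returns {frozenset: support} for all frequent item-sets."""
    items = {i for t in transactions for i in t}

    def support(cand):
        # Number of transactions that contain every item of the candidate.
        return sum(1 for t in transactions if cand <= t)

    # L1: frequent 1-item-sets.
    freq = {frozenset([i]): support(frozenset([i])) for i in items}
    freq = {c: s for c, s in freq.items() if s >= min_support}
    all_freq, k = dict(freq), 1
    while freq:
        # Join step: combine frequent k-item-sets into (k+1)-candidates.
        prev = list(freq)
        cands = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        cands = {c for c in cands
                 if all(frozenset(s) in freq for s in combinations(c, k))}
        # Support calculation and filtering for level k+1.
        freq = {c: support(c) for c in cands}
        freq = {c: s for c, s in freq.items() if s >= min_support}
        all_freq.update(freq)
        k += 1
    return all_freq  # union of all Lk

# Usage on a tiny toy database:
db = [{1, 2}, {1, 3}, {1, 2, 3}]
print(apriori(db, min_support=2))
```

Note that the prune step never counts a candidate whose subsets are not all frequent, which is exactly the downward-closure property stated earlier.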
13. HOW THE ALGORITHM WORKS
We build a candidate list of k item-sets and extract a frequent list of k item-sets using the support count. We then use the frequent list of k item-sets to determine the candidate and frequent lists of (k+1) item-sets, pruning candidates whose subsets are not frequent. We repeat until the candidate or frequent list of k item-sets is empty.
14. EXAMPLE OF THE APRIORI ALGORITHM
Consider the following database:
TID   Items
T1    1 2 3
T2    2 3 5
T3    1 2 3 5
T4    2 5
T5    1 3 5
17. STEP 3:
Frequent item-set 2:
Item-sets   Support
{1,3}       3
{1,5}       2
{2,3}       3
{2,5}       3
{3,5}       3

Candidate item-set 3:
Item-sets   Size-2 subsets       In FI2?
{1,2,3}     {1,2} {1,3} {2,3}    NO (discarded)
{1,2,5}     {1,2} {1,5} {2,5}    NO (discarded)
{1,3,5}     {1,3} {1,5} {3,5}    YES
{2,3,5}     {2,3} {2,5} {3,5}    YES
Reason for discarding: every subset of a frequent item-set must also be a frequent item-set.

Frequent item-set 3:
Item-sets   Support
{1,3,5}     2
{2,3,5}     2
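The supports in the step-3 tables can be re-counted directly against the example database from slide 14. A small Python sketch (the `support` helper is illustrative); note that the subset check discards {1,2,3} and {1,2,5} before any counting, so only the last two candidates reach the frequent item-set 3 table.

```python
# The five transactions from the example database (slide 14).
db = [{1, 2, 3}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]

def support(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

print(support({1, 2, 5}, db))  # 1
print(support({1, 3, 5}, db))  # 2
print(support({2, 3, 5}, db))  # 2
```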
18. INTRODUCTION TO THE N-RMP ALGORITHM
The N-RMP (Non-Redundant Mining Process) algorithm is a three-step mining process:
Step 1: Scanning of the dataset.
Step 2: Processing the non-redundant data and discarding the redundant data.
Step 3: Generation of the frequent item-sets.
N-RMP is able to capture the information with one scan over the stream of sensor data and store it in a memory-efficient, highly compact manner similar to an FP-tree.
19. CONCEPT
The SP-tree is a frequency-descending, compact tree structure. Each epoch in the sensor database DS is inserted into the SP-tree according to lexicographic order, and a header list, the S-list, is also built at this stage. Once all epochs of DS are inserted into the tree, it is reorganized into a frequency-descending tree based on the calculated frequencies in the S-list.
SP-tree construction therefore consists of two phases:
• Insertion phase: epochs from DS are inserted into a lexicographic tree and the header list (S-list) is built.
• Reorganization phase: the S-list is rearranged in frequency-descending order and the SP-tree is restructured accordingly.
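The compact-tree idea can be sketched briefly. For brevity this Python sketch computes the S-list first and inserts every epoch directly in frequency-descending order, collapsing the two phases described above into one pass; the `Node` class and `build_sp_tree` helper are illustrative assumptions, not the project's implementation.

```python
from collections import Counter

class Node:
    """One prefix-tree node: an item, its count, and child nodes."""
    def __init__(self, item):
        self.item, self.count, self.children = item, 0, {}

def build_sp_tree(epochs):
    """Sketch: prefix tree whose paths follow descending S-list frequency."""
    slist = Counter(item for e in epochs for item in e)  # the S-list
    # Frequency-descending order; ties broken lexicographically.
    order = sorted(slist, key=lambda i: (-slist[i], i))
    root = Node(None)
    for e in epochs:
        node = root
        for item in sorted(e, key=order.index):
            node = node.children.setdefault(item, Node(item))
            node.count += 1  # shared prefixes accumulate counts
    return root, slist

# Usage on a toy stream of epochs:
epochs = [{"s1", "s2", "s4"}, {"s1", "s4"}, {"s2", "s5"}]
root, slist = build_sp_tree(epochs)
print(dict(slist))
```

Because frequent items sit near the root, epochs sharing frequent prefixes share tree paths, which is what makes the structure memory-efficient.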
20. N-RMP ALGORITHM – PSEUDO-CODE
1) FG ← { }; // global list of frequent generators
2) fill C1 with 1-item-sets and count their supports;
3) copy frequent item-sets from C1 to F1;
4) mark item-sets in F1 as "closed";
5) mark item-sets in F1 as "key" if their support < |O|; // where |O| is the number of objects in the input dataset
6) if there is a full column in the input dataset, then FG ← {∅};
7) i ← 1;
8) loop
9) {
10)   Ci+1 ← NRMP-Gen(Fi);
11)   if Ci+1 is empty then break from loop;
12)   count the support of "key" item-sets in Ci+1;
13)   if Ci+1 has an item-set whose support equals its predecessor's support, then mark it as "not key";
14)   copy frequent item-sets to Fi+1;
15)   if an item-set in Fi+1 has a subset in Fi with the same support, then mark the subset as "not closed";
16)   copy "closed" item-sets from Fi to Zi;
17)   Find-Generators(Zi);
18)   i ← i + 1;
19) }
20) copy item-sets from Fi to Zi;
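Line 15's closedness rule (an item-set loses its "closed" status when a superset has the same support) can be sketched in isolation. This Python sketch assumes supports are given as a ready-made dict; `closed_itemsets` is a hypothetical helper for illustration, not NRMP-Gen itself.

```python
def closed_itemsets(supports):
    """An item-set is closed iff no proper superset has the same support (sketch).
    `supports` maps frozenset -> support count."""
    closed = []
    for s, sup in supports.items():
        # A proper superset with equal support absorbs s, so s is not closed.
        if not any(s < t and sup == tsup for t, tsup in supports.items()):
            closed.append(s)
    return closed

# Hypothetical supports over the example sensors:
supports = {
    frozenset({"s1"}): 5,
    frozenset({"s4"}): 4,
    frozenset({"s1", "s4"}): 4,  # same support as {s4} -> {s4} is not closed
}
print([sorted(s) for s in closed_itemsets(supports)])  # [['s1'], ['s1', 's4']]
```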
22. EXAMPLE OF N-RMP:
Consider the following dataset:
TID    Epoch
T100   s1 s2 s3 s4 s7 s8
T200   s1 s5 s6
T300   s2 s5 s6 s7 s8
T400   s1 s2 s4 s7
T500   s1 s2 s4 s5
T600   s1 s3 s4 s7
23. STEP 1.0: BUILDING THE LEXICOGRAPHIC TREE AND THE CORRESPONDING SP-TREE
Table: S-list from the given data
s1 = 5, s2 = 4, s3 = 2, s4 = 4, s5 = 3, s6 = 2, s7 = 4, s8 = 2

Tree: interpretation of the S-list (epochs inserted in lexicographic order; each node shows item:count)
{ }
• s1:5
  • s2:3
    • s3:1 → s4:1 → s7:1 → s8:1
    • s4:2
      • s7:1
      • s5:1
  • s5:1 → s6:1
  • s3:1 → s4:1 → s7:1
• s2:1 → s5:1 → s6:1 → s7:1 → s8:1
24. STEP 2.0: FREQUENCY-DESCENDING SP-TREE
Table: Ssort, the S-list sorted in frequency-descending order
s1 = 5, s2 = 4, s4 = 4, s7 = 4, s5 = 3, s3 = 2, s6 = 2, s8 = 2

[Figure: the SP-tree restructured so that each path follows the frequency-descending order of Ssort.]
26. CONCLUSION
• Apriori
  • Apriori has a higher runtime of execution.
  • It consumes more memory than N-RMP.
  • It uses large item-sets.
  • It assumes the transaction database is memory-resident.
  • It requires multiple database scans.
• N-RMP
  • The N-RMP algorithm is more efficient than Apriori.
  • It has a lower runtime.
  • It consumes less memory.
  • The transaction database is discarded to save more memory.
  • It requires just a single database scan.
28. FURTHER ENHANCEMENTS
• The algorithm is designed for dynamic datasets.
• The algorithm should adapt to the rate of data flow: if the rate of data flow is high, the algorithm selects smaller time windows to run over, whereas if the data flow is low, it selects bigger time windows, making it a more efficient algorithm.
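One way the rate-adaptive window could look is a simple inverse policy. Everything here is a hypothetical sketch: the `window_size` function, its parameter names, and the per-window `budget` heuristic are illustrative assumptions, not part of the project.

```python
def window_size(rate, min_size=50, max_size=1000, budget=5000):
    """Hypothetical policy: high data rate -> smaller window, low rate -> larger.
    `rate` is epochs per second; `budget` caps the expected number of
    epochs processed per window; the result is clamped to sane bounds."""
    size = budget // max(rate, 1)
    return max(min_size, min(max_size, size))

print(window_size(rate=500))  # fast stream -> clamps to the small window, 50
print(window_size(rate=10))   # slow stream -> larger window, 500
```

Clamping keeps the window from collapsing under bursts or growing without bound during quiet periods, which is the efficiency trade-off the enhancement describes.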
29. REFERENCES
[1] P-N. Tan, "Knowledge discovery from sensor data," Sensors, pp. 14-19, 2006.
[2] A. Boukerche and S.A. Samarah, "A novel algorithm for mining association rules in wireless ad hoc sensor networks," IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 17, pp. 865-877, 2008.
[3] S.K. Tanbeer, C.F. Ahmed, and B.S. Jeong, "An efficient single-pass algorithm for mining association rules from wireless sensor networks," IETE Technical Review, vol. 26, issue 4, 2009.