Mining data streams
1. 6. Mining Data Streams
Prepared By: Akash Gupta
Qamar Aazam Khan
Adil Shaikh
Sakib Shaikh
2. 1. Stream Management
In a DBMS, input is under the control of the programming staff.
Ex: SQL Insert commands or bulk loaders.
Stream Management is important when the input rate is controlled externally.
Ex: Google search queries
2. The stream Model
Input tuples enter at a rapid rate, at one or more input ports.
The system cannot store the entire stream accessibly.
So how do we make critical calculations about the stream using a limited amount of
memory (primary or secondary)?
Such calculations are posed as one of two forms of query.
3. Two Forms of Query:
i. Ad-hoc queries
Normal queries asked once about the stream.
Ex: What is the maximum value seen so far in the stream S?
ii. Standing queries
Queries that are, in principle, asked about the stream at all times.
Ex: Report each new maximum value ever seen in the stream S.
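The two example queries above can be sketched in a few lines of Python. This is only an illustration: the names `new_maxima` and `MaxTracker` are hypothetical and the stream is assumed to be any iterable of comparable values. The standing query reports each new maximum as it arrives, while the ad hoc query is answered from a one-value summary kept in working storage.

```python
from typing import Iterable, Iterator, Optional


def new_maxima(stream: Iterable[float]) -> Iterator[float]:
    """Standing query: yield every element that is a new maximum."""
    current_max: Optional[float] = None
    for x in stream:
        if current_max is None or x > current_max:
            current_max = x
            yield x


class MaxTracker:
    """Keeps the single value needed to answer the ad hoc 'maximum so far' query."""

    def __init__(self) -> None:
        self.current_max: Optional[float] = None

    def update(self, x: float) -> None:
        # Called once per arriving element; the element itself is not stored.
        if self.current_max is None or x > self.current_max:
            self.current_max = x

    def query(self) -> Optional[float]:
        # Ad hoc query: answered at any time from the stored summary.
        return self.current_max


if __name__ == "__main__":
    S = [3, 1, 4, 1, 5, 9, 2, 6]
    print(list(new_maxima(S)))  # [3, 4, 5, 9]

    tracker = MaxTracker()
    for value in S:
        tracker.update(value)
    print(tracker.query())  # 9
```

Both versions use constant memory, which is why the maximum is an easy query to support without storing the stream.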
4.
5. Archival storage
It only stores the stream data. Reviewing the history of the data takes a long time,
so queries answered from the archive are slow to respond to the user.
Limited Working Storage
To overcome the limitations of the archival storage, a limited working storage is used. It
responds quickly to user queries, but it is too small to hold all the data from the streams.
Output
The stream processor outputs the answers to the user's queries on the streams.
6. Applications of the Stream Model
1. Mining query streams
Google wants to know which queries are more frequent today than yesterday.
2. Mining Click Streams
Yahoo! wants to know which of its pages are getting an unusual number of hits in the past
hour.
3. IP Packets can be monitored at a switch
• Gather information for optimal routing
• Detect denial-of-service attacks
7. Sliding Windows
A useful model of stream processing is that
queries are about a window of length N: the N most recent elements
received.
The interesting case is when N is so large that the window cannot be stored in main memory.
Ex: Window size N = 6.
q w e r t y u i o p a s d f g h j k l z x c v b n m
(Past on the left, Future on the right)
[Figure: the same stream repeated at successive time steps, with the window of the 6 most recent elements sliding one position toward the future at each step.]
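A minimal sketch of the sliding-window idea, assuming the window is small enough to hold in memory (N = 6 here). The function name `sliding_windows` is hypothetical. When N is too large for main memory, the window itself cannot be kept like this and a compact summary is needed instead.

```python
from collections import deque
from typing import Iterable, Iterator


def sliding_windows(stream: Iterable[str], n: int) -> Iterator[str]:
    """After each arrival, yield the (up to) n most recent elements."""
    window = deque(maxlen=n)  # the oldest element is dropped automatically
    for x in stream:
        window.append(x)
        yield "".join(window)


if __name__ == "__main__":
    for w in sliding_windows("qwertyuiop", 6):
        print(w)
    # The last three lines printed are: ertyui, rtyuio, tyuiop
```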
8. DGIM ALGORITHM (DATAR-GIONIS-INDYK-MOTWANI)
The DGIM algorithm is used for counting the number of 1's in the last N bits of a stream.
Stores only O(log² N) bits per stream (N = window size).
Gives an approximate answer that is never off by more than 50%.
The error factor can be reduced to any fraction greater than zero, at the cost of storing more bits.
Each bit in the stream has a timestamp, starting from 0, 1, ...
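The slide does not spell out how DGIM works, so the following is a sketch of the standard bucket formulation rather than this deck's own method. Assumptions: the class name `DGIM` is hypothetical, full timestamps are stored for clarity (a space-optimal version would store them modulo N), and buckets are kept newest first. Each bucket records the timestamp of its most recent 1 and a power-of-two count of 1's. At most two buckets of any size are kept, and the estimate counts only half of the oldest bucket.

```python
class DGIM:
    """Approximate count of 1's among the last `window_size` bits."""

    def __init__(self, window_size: int) -> None:
        self.n = window_size
        self.t = 0  # timestamp assigned to the next arriving bit (0, 1, ...)
        self.buckets = []  # (end_timestamp, size) pairs, newest first

    def add(self, bit: int) -> None:
        ts = self.t
        self.t += 1
        # Drop the oldest bucket once its most recent 1 has left the window.
        if self.buckets and self.buckets[-1][0] <= ts - self.n:
            self.buckets.pop()
        if bit != 1:
            return
        self.buckets.insert(0, (ts, 1))
        # While three buckets share a size, merge the two oldest of them.
        i = 0
        while i + 2 < len(self.buckets) and self.buckets[i][1] == self.buckets[i + 2][1]:
            end_ts, size = self.buckets[i + 1]
            del self.buckets[i + 2]
            self.buckets[i + 1] = (end_ts, 2 * size)
            i += 1  # the merge may create three buckets of the next size

    def count(self) -> float:
        """Sum all bucket sizes, counting only half of the oldest bucket."""
        if not self.buckets:
            return 0.0
        total = sum(size for _, size in self.buckets)
        return total - self.buckets[-1][1] / 2


if __name__ == "__main__":
    d = DGIM(window_size=8)
    for _ in range(8):
        d.add(1)
    print(d.count())  # 6.0 (the true count is 8, so the error is 25%)
```

Only the oldest bucket can straddle the window boundary, which is where the uncertainty, and the 50% bound, comes from.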
9. Bloom Filter
A Bloom filter placed on the stream of URLs declares that
certain URLs have been seen before.
Others are declared new and are added to the list of URLs
that need to be crawled.
Unfortunately, the Bloom filter can have false positives: it can declare a URL
seen before when it has not been.
A Bloom filter is an array of bits together with a number of hash
functions.
Initially all the bits are zero.
When input x arrives, we set to 1 the bit h(x) for each hash function h.
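The description above can be sketched as follows. The class name `BloomFilter`, the parameter names, and the way the hash functions are derived (salting SHA-256 with an index) are assumptions made for illustration; the slide does not specify any of them. One byte is used per bit to keep the code simple.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """An array of bits plus `num_hashes` hash functions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray(num_bits)  # initially all zero

    def _positions(self, item: str) -> Iterator[int]:
        # Assumed scheme: one hash function per salt value i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # Set to 1 the bit chosen by each hash function.
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item: str) -> bool:
        # "Seen before" only if every hashed bit is 1; this may be a false positive.
        return all(self.bits[p] for p in self._positions(item))


if __name__ == "__main__":
    seen = BloomFilter(num_bits=1024, num_hashes=3)
    seen.add("http://example.com/a")
    print("http://example.com/a" in seen)  # True
```

A URL that was added is always reported as seen. A URL that was never added is usually reported as new, but it can occasionally be reported as seen if other URLs happened to set all of its bits.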