Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

PPT Presentation Transcript

  • 1. RFID Data Management. Kamlesh Laddhad (05329014), Karthik B. (05329021). Guide: Prof. Bernard Menezes
  • 2. Outline
    • Introduction to RFID Technology.
    • Issues with RFID Technology.
    • RFID Data Characteristics.
    • Data Warehousing.
      • Expressive Temporal Model: Dynamic Relationship ER Model
      • RFID - Cuboids.
      • Use of Bitmap Datatype.
    • Data Cleaning.
      • Extensible Sensor stream Processing (ESP)
      • Statistical sMoothing for Unreliable RFid data (SMURF)
    • Future Plans.
  • 3. Introduction
    • Radio Frequency Identification:
      • It is an Automatic Identification and Data Capture Technology.
      • Fast
      • No contact or line of sight.
      • Uses radio-frequency waves to transfer data
    • Components
      • Tag: small, low-cost device that can hold a limited amount of data.
        • Associated with objects, such as pallets, cases, and even individual items.
      • Reader: recognizes the presence of a tag and reads the information stored on it.
    • Unique electronic product code (EPC) associated with a tag.
    • By placing RFID tag readers at various locations, one can track the movement of objects through supply chain networks.
  • 4. Applications and Adoptions
    • Supply Chain Management: real-time inventory tracking.
      • US Department Of Defense: shipments to armed forces
    • Retail: Active shelves monitor product availability
      • Wal-Mart, Albertsons: major retail stores
    • Access control: toll collection, transportation.
      • Airline luggage management:
        • British Airways: 20 million bags a year
        • Implemented to reduce lost/misplaced luggage
    • Anti-counterfeiting and security:
      • Food and Drug Administration: to reduce counterfeiting in the pharmaceutical supply chain
  • 5. Perspectives for RFID Research
    • The physics of building tags and readers
      • Tags have few gates: apart from basic operation, they have very little computing power.
      • Radio-frequency signals have trouble operating in certain physical media.
    • The privacy and safety issues:
      • Complex encryption schemes are not possible on RFID tags.
      • Counterfeiting by means of either illegitimate readers or spoofed tags is possible.
      • Reader-tag communication is wireless: Third parties can eavesdrop on signals.
    • Software Architecture to collect, filter, organize, and answer online queries:
      • The number of tags is proportional to the number of items being serviced or tracked.
      • The number of readers is proportional to the number of strategic locations or areas to be traced.
        • Each Reader picks up tag signals on continuous basis.
        • Data generated by RFID systems is enormous:
        • E.g. Wal-Mart is expected to generate 7 terabytes of RFID data per day.
    • Our Focus: Third Stream.
  • 6. Data Warehousing Techniques
  • 7. Data Management Challenges
    • Data Explosion : Example
      • A retailer with 3,000 stores, selling 10,000 items a day per store.
      • Each item moves 10 times on average before being sold
        • Movement recorded as (EPC, location, second)
      • Data volume: 300 million tuples per day.
      • Example OLAP Query: “Average time for items to move from warehouse to checkout counter in March 2006?”.
        • Costly to answer if there are a billion tuples for March 2006.
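The data-volume figure on this slide follows from simple multiplication. The short sketch below redoes the arithmetic; the monthly total is an extrapolation that assumes every day in March matches the daily rate, which the slide does not state.

```python
# Figures taken from the retailer example on the slide.
stores = 3_000
items_sold_per_store_per_day = 10_000
moves_per_item = 10

# One (EPC, location, second) tuple is recorded per movement.
tuples_per_day = stores * items_sold_per_store_per_day * moves_per_item

# Assumption for illustration: the same rate on each of March's 31 days.
tuples_in_march = tuples_per_day * 31

print(f"{tuples_per_day:,} tuples per day")
print(f"{tuples_in_march:,} tuples in March")
```

Under that assumption the March query has to scan several billion tuples, which is why an aggregate over raw readings is costly.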
  • 8. Data Characteristics
    • Temporal and history oriented
      • Applications dynamically generate observations (readings).
      • Object locations and the containment relationships among objects change over time.
      • Need: Expressive data model.
    • Inaccurate data and implicit semantics
      • False positive: Non-existing tag incorrectly read.
      • False Negative: Reader missed a tag which was in its vicinity.
      • Noisy data & duplicate readings (redundancy): Same tag read more than once.
      • Need: Automated data filtering and transformation.
    • Streaming and large volume
      • Objects stay in place for long durations, and readers record them periodically, so large volumes of data keep accumulating.
      • We need to preserve this data for tracking and monitoring.
      • Need: Scalable storage scheme, compression techniques to reduce data.
    • Data Granularity
      • Data collection granularity needs to be decided
      • Differs across applications.
  • 9. Warehousing Helps!!
    • Lossless compression
      • Remove redundancy: (r1, l1, t1), (r1, l1, t2), ..., (r1, l1, t10) => (r1, l1, t1, t10)
      • Group objects that move and stay together.
    • Data cleaning: Multi-reading, missed-reading, error-reading, bulky movement.
    • Data mining: Find trends, outliers, frequent, sequential, flow patterns.
    • Multi-dimensional summary: product, location, time, …
      • Store manager: Check item movements from the backroom to different shelves in his store
      • Region manager: Collapse intra-store movements and look at distribution centers, warehouses, and stores
    • Query Processing
      • Support for OLAP: roll-up, drill-down, slice, and dice
      • Path query: New to RFID-Warehouses, about the structure of paths
        • Which products that go through quality control have shorter paths?
        • What locations are common to the paths of a set of defective auto-parts?
        • Identify containers at a port that have deviated from their historic paths
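The redundancy-removal step on this slide can be sketched as collapsing each run of consecutive readings of the same object at the same location into a single stay record. This is a minimal illustration, not the warehouse's actual implementation; the function name `compress_readings` and the integer timestamps are assumptions.

```python
from itertools import groupby


def compress_readings(readings):
    """Collapse runs of readings with the same (epc, location) into stays.

    readings: iterable of (epc, location, time) tuples, sorted by epc and
    then by time. Returns a list of (epc, location, time_in, time_out).
    """
    stays = []
    # groupby only merges *consecutive* equal keys, so an object that
    # returns to a location later produces a separate stay record.
    for (epc, location), run in groupby(readings, key=lambda r: (r[0], r[1])):
        times = [t for _, _, t in run]
        stays.append((epc, location, times[0], times[-1]))
    return stays


# Ten readings of r1 at l1 (t = 1..10), then two readings at l2.
readings = [("r1", "l1", t) for t in range(1, 11)]
readings += [("r1", "l2", 11), ("r1", "l2", 12)]
stays = compress_readings(readings)
print(stays)
```

Twelve raw tuples shrink to two stay records without losing the first and last time the object was seen at each location.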
  • 10. Dynamic Relationship ER Model
    • Proposed by Wang and Liu from Siemens.
    • RFID entities are static and are not altered.
    • RFID relationships: dynamic and change all the time.
    • Two types of dynamic relationships added:
      • Event-based dynamic relationship. A timestamp attribute added to represent the occurring timestamp of the event.
      • State-based dynamic relationship. tstart and tend attributes added to represent the lifespan of a state.
  • 11.
    • Static entity tables
      • OBJECT (object_epc, name, description)
      • LOCATION (location_id, name, owner)
      • SENSOR (sensor_epc, name, description)
      • TRANSACTION (transaction_id, transaction_type)
    • Dynamic relationship tables
      • OBSERVATION (sensor_epc, value, timestamp)
      • OBJECTLOCATION (epc, location_id, tstart, tend)
      • TRANSACTIONITEM (transaction_id, epc, timestamp)
      • CONTAINMENT (epc, parent_epc, tstart, tend)
      • SENSORLOCATION (sensor_epc, location_id, position, tstart, tend)
  • 12. Monitoring.
    • Missing RFID Object Detection:
      • Find when and where the object with EPC = 'MEPC' was lost.
        • select location_id, tstart, tend from objectlocation where epc='MEPC' and tstart = ( select max(o.tstart) from objectlocation o where o.epc='MEPC' )
      • Check if there are missing objects at current location C, knowing that all objects were complete at previous location L at time T.
        • select l.epc from objectlocation l where l.location_id = 'L' and l.tstart <= 'T' and l.tend >= 'T' and l.epc not in ( select c.epc from objectlocation c where c.location_id = 'C' )
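To see the two monitoring queries on this slide run end to end, here is a small sketch using Python's built-in `sqlite3` module. The sample rows, the integer timestamps, and the extra location 'D' are invented for illustration; only the OBJECTLOCATION columns and the query shapes come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objectlocation "
    "(epc TEXT, location_id TEXT, tstart INTEGER, tend INTEGER)"
)
# Hypothetical sample data; timestamps are plain integers.
conn.executemany(
    "INSERT INTO objectlocation VALUES (?, ?, ?, ?)",
    [
        ("MEPC", "L", 1, 5),
        ("MEPC", "D", 6, 9),   # last place MEPC was observed
        ("E2", "L", 1, 5),
        ("E2", "C", 8, 12),    # E2 arrived at C
        ("E3", "L", 2, 6),     # E3 was at L but never reached C
    ],
)

# Query 1: where and when was 'MEPC' last observed?
last_seen = conn.execute(
    """
    SELECT location_id, tstart, tend FROM objectlocation
    WHERE epc = ? AND tstart = (
        SELECT MAX(o.tstart) FROM objectlocation o WHERE o.epc = ?)
    """,
    ("MEPC", "MEPC"),
).fetchall()

# Query 2: objects present at L at time T = 4 that never appear at C.
missing = conn.execute(
    """
    SELECT l.epc FROM objectlocation l
    WHERE l.location_id = ? AND l.tstart <= ? AND l.tend >= ?
      AND l.epc NOT IN (
        SELECT c.epc FROM objectlocation c WHERE c.location_id = ?)
    ORDER BY l.epc
    """,
    ("L", 4, 4, "C"),
).fetchall()

print(last_seen)
print(missing)
```

The `ORDER BY` is only there to make the output deterministic; it is not part of the query on the slide.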
  • 13. Tracking
    • RFID Object Moving Time Inquiry:
      • Time it takes to supply ‘OEPC’ from location S to location E?
        • select (e.tstart-s.tstart) as supplying_time from objectlocation e, objectlocation s where e.epc = 'OEPC' and s.epc='OEPC' and s.location_id ='S' and e.location_id='E'
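The moving-time query is a self-join of OBJECTLOCATION on the same EPC. A minimal sketch with `sqlite3` follows, again with invented rows and integer timestamps; the intermediate location 'M' is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objectlocation "
    "(epc TEXT, location_id TEXT, tstart INTEGER, tend INTEGER)"
)
# Hypothetical path of one object: S -> M -> E.
conn.executemany(
    "INSERT INTO objectlocation VALUES (?, ?, ?, ?)",
    [("OEPC", "S", 10, 20), ("OEPC", "M", 25, 30), ("OEPC", "E", 42, 50)],
)

# Self-join: row s is the stay at the source, row e the stay at the end.
row = conn.execute(
    """
    SELECT (e.tstart - s.tstart) AS supplying_time
    FROM objectlocation e, objectlocation s
    WHERE e.epc = ? AND s.epc = ?
      AND s.location_id = ? AND e.location_id = ?
    """,
    ("OEPC", "OEPC", "S", "E"),
).fetchone()
supplying_time = row[0]
print(supplying_time)
```

If an object visits S or E more than once, this join returns one row per pair of visits, so a real application would need to decide which pair it means.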
  • 14. Compression Idea
    • Bulky object movements
      • Objects often move and stay together through the supply chain.
      • If 1000 packs of product P stay together at the distribution center, register a single record.
      • (GID, distribution center, time_in, time_out).
      • GID is a generalized identifier that represents the 1000 packs that stayed together at the distribution center
    • Analysis usually takes place at a much higher level of abstraction than the one present in raw RFID data
    [Figure: supply chain hierarchy. Factory → Dist. Center 1, Dist. Center 2, … (10 pallets, 1000 cases) → store 1, store 2, … (20 cases, 1000 packs) → shelf 1, shelf 2, … (10 packs, 12 sodas)]
  • 15. RFID Cuboids
    • Fact Table: (EPC, location, time_in, time_out).
    • In supply chain: Items travel through a series of locations.
    • Query: what is the average time that product P stays at store in Location A?
    • Traditional cubes miss the path structure of the data
    • Stay Table: (GIDs, location, time_in, time_out: measures):
      • Records information on items that stay together at a given location
      • If using record transitions: difficult to answer queries, lots of intersections needed
    • Map Table: (GID, <GID1,..,GIDn>)
      • Links together stages that belong to the same path. Provides additional compression and query-processing efficiency.
      • High level GID points to lower level GIDs
      • If saving complete EPC Lists: high costs of IO to retrieve long lists, costly query processing
    • Information Table: (EPC list, attribute 1,...,attribute n)
      • Records path-independent attributes of the items, e.g., color, manufacturer, price.
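How the stay and map tables cooperate can be sketched with two small in-memory structures. Everything below (the GID naming, the sample rows, the helpers `epcs_of` and `average_stay`) is hypothetical and only illustrates the idea of resolving a high-level GID through the map table instead of storing full EPC lists in every stay record.

```python
# Hypothetical miniature stay table: (gid, location, time_in, time_out).
stay = [
    ("g1", "dist_center", 0, 10),
    ("g1.1", "store_A", 12, 20),
    ("g1.2", "store_B", 12, 30),
]

# Hypothetical map table: a high-level GID points to lower-level GIDs;
# the lowest-level GIDs point to the EPCs themselves.
gid_map = {
    "g1": ["g1.1", "g1.2"],
    "g1.1": ["epc1", "epc2"],
    "g1.2": ["epc3"],
}


def epcs_of(gid):
    """Expand a GID into the EPCs it covers by following the map table."""
    children = gid_map.get(gid)
    if children is None:  # not a key in the map, so gid is itself an EPC
        return [gid]
    return [epc for child in children for epc in epcs_of(child)]


def average_stay(location):
    """Average stay duration at a location, weighted by item count."""
    total_time = 0
    total_items = 0
    for gid, loc, time_in, time_out in stay:
        if loc == location:
            n = len(epcs_of(gid))
            total_time += n * (time_out - time_in)
            total_items += n
    return total_time / total_items if total_items else None


print(epcs_of("g1"))
print(average_stay("store_A"))
```

One stay record per group answers the average-time query without touching per-item tuples; the map table is consulted only to count the items behind each GID.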
  • 16. EPC Overview
    • Electronic product code
      • Standard naming scheme, proposed by Auto-Id Center.
      • An EPC uniquely identifies an item.
      • Format: <Header, Manager_No., Object Class, Serial No.>
        • Header: Identifies the length, type, structure, version and generation of EPC.
        • Manager Number: Identifies an organizational entity.
        • Object Class: Identifies a “class”, or type of thing.
        • Serial Number: Specific instance of the Object Class being tagged.
      • We will refer to
        • <Header, Manager No, Object Class>: Prefix
        • <Serial No.>: Suffix
  • 17. Use of Bitmap Datatype
    • Observation: Items move together.
      • Groups of items in the same proximity - e.g. on a shelf, on a shipment
      • Groups of items with same property - e.g. Same product
    • Use a bitmap type for modeling a collection of EPCs that can occur in item tracking applications.
      • Instead of storing one tuple per item, store one tuple for all the items that share the same prefix.
      • New fields replace the single epc column:
        • <Len, Suffix_length, Prefix, Suffix_start, Suffix_end, bitmap>
  • 18. Example: Product Inventory
    • With EPC collections: (Store_id, Time, Prod_id, Item_collection)
      • (s1, t1, p1, {epc11, epc12, epc13, …})
      • (s1, t2, p2, {epc21, epc22, epc23, …})
      • …
    • With epc_bitmaps: (Store_id, Time, Prod_id, Item_bmap)
      • (s1, t1, p1, bmap1)
      • (s1, t2, p2, bmap2)
      • …
  • 19. Use of Bitmap Datatype
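    • 64-bit EPC layout: Header (2 bits), EPC_Manager (21 bits), Object_Class (17 bits), Serial_Number (24 bits)
    • Example collection, shown as prefix and suffix:
      • 0x4AA890001F 62C160
      • …
      • 0x4AA890001F A0B38E
    • Resulting bitmap representation:
      • Len = 64, Suff_len = 24, Prefix = 0x4AA890001F, Suff_start = 0x62C160, Suff_end = 0xA0B38E, bitmap = 101001…00010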
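The prefix/suffix split for the 64-bit layout on this slide (2 + 21 + 17 + 24 bits) can be sketched as follows. The helper names and the bit order inside the bitmap (least significant bit stands for `suffix_start`) are assumptions of this sketch, not details given in the slides; the two extra sample suffixes are invented.

```python
SUFFIX_BITS = 24  # width of the serial number in the 64-bit layout


def split_epc(epc):
    """Return (prefix, suffix) of a 64-bit EPC given as an int."""
    return epc >> SUFFIX_BITS, epc & ((1 << SUFFIX_BITS) - 1)


def epcs_to_bitmap(epcs):
    """Build (prefix, suffix_start, suffix_end, bitmap) for one prefix.

    Bit k of the bitmap, counted from the least significant bit, is set
    when the EPC with suffix suffix_start + k is in the collection.
    """
    parts = [split_epc(e) for e in epcs]
    prefixes = {p for p, _ in parts}
    if len(prefixes) != 1:
        raise ValueError("all EPCs must share exactly one prefix")
    suffixes = [s for _, s in parts]
    start, end = min(suffixes), max(suffixes)
    bitmap = 0
    for s in suffixes:
        bitmap |= 1 << (s - start)
    return prefixes.pop(), start, end, bitmap


# The first EPC is the one from the slide; the other two are made up.
epcs = [0x4AA890001F62C160, 0x4AA890001F62C162, 0x4AA890001F62C165]
prefix, start, end, bitmap = epcs_to_bitmap(epcs)
print(hex(prefix), hex(start), hex(end), bin(bitmap))
```

Three 64-bit EPCs are reduced to one shared prefix, two suffix bounds, and a 6-bit bitmap, which is the source of the storage saving.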
  • 20. Bitmap Operations
    • To use this datatype in SQL, we need operations on bitmaps.
    • Conversion and counting operations: epc2Bmap, bmap2Epc and bmap2Count
    • Pairwise Logical Operations: bmapAnd, bmapOr, bmapMinus, and bmapXor
    • Maintenance Operations: bmapInsert and bmapDelete
    • Membership Testing Operation: bmapExists
    • Comparison Operation: bmapEqual
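A rough feel for these operations can be had by modelling a bitmap as a Python integer. The functions below mirror the operation names on the slide but are illustrative stand-ins, not the database's functions; they assume both operands share the same prefix and suffix start, so that equal bit positions mean the same EPC.

```python
def bmap_and(a, b):
    return a & b


def bmap_or(a, b):
    return a | b


def bmap_minus(a, b):
    """Items in a that are not in b."""
    return a & ~b


def bmap_xor(a, b):
    return a ^ b


def bmap_count(a):
    """Number of items in the collection (set bits)."""
    return bin(a).count("1")


def bmap_exists(a, offset):
    return bool((a >> offset) & 1)


def bmap_insert(a, offset):
    return a | (1 << offset)


def bmap_delete(a, offset):
    return a & ~(1 << offset)


def bmap_to_offsets(a):
    """Suffix offsets of all items present, lowest first."""
    return [k for k in range(a.bit_length()) if (a >> k) & 1]


# Shelf contents at two points in time (hypothetical values).
shelf_t1 = 0b0111
shelf_t2 = 0b1101
added = bmap_minus(shelf_t2, shelf_t1)
removed = bmap_minus(shelf_t1, shelf_t2)
print(bmap_to_offsets(added), bmap_to_offsets(removed))
```

Each set operation is a single bitwise instruction per machine word, which is what makes collection-level queries cheap compared with joining item-level tuples.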
  • 21. Use of these operations in SQL
    • Items added to a given shelf between time t1 and t2.
      • SELECT bmap2Epc(bmapMinus(s2.item_bmap, s1.item_bmap)) FROM Shelf_Inventory s1, Shelf_Inventory s2 WHERE s1.shelf_id = <sid1> AND s1.shelf_id = s2.shelf_id AND s1.time = <t1> AND s2.time = <t2>;
    • Book store categorizes books in various categories.
      • The following query finds the shelves that currently hold books with both the 'Adventure' and 'Romance' properties.
      • SELECT s.shelf_id FROM Shelf_Inventory s WHERE bmap2Count(bmapAnd(s.item_bmap, (SELECT bmapAnd(p.Adventure, p.Romance) FROM Property_Inventory p))) > 0 AND s.time = <current_date>;
  • 22. Road Ahead
    • Extension to bitmap proposal:
      • Bitmap datatype is more appropriate for initial bulk-load & batch updates.
      • It performs badly for incremental updates.
      • A ‘hybrid Scheme’ for incremental Updates:
        • Maintain periodic inventory checkpoints using bitmaps.
        • For changes occurring between checkpoints, maintain a traditional item-level table.
        • Answer queries by merging the latest checkpoint bitmap with the corresponding duration’s item-level data.
    • The epc_suffix in the collection may not be contiguous
      • The bitmap will be sparse, with many zeros.
      • Compress this using some encoding scheme
        • Good for initial bulk loading and batch updates
        • May reduce efficiency of bitmap operations.
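The hybrid scheme proposed on this slide can be sketched as replaying an item-level change log on top of the latest checkpoint bitmap. The function name, the `(offset, op)` change format, and the sample values are assumptions made for illustration.

```python
def current_inventory(checkpoint_bitmap, changes):
    """Merge a checkpoint bitmap with item-level changes made after it.

    changes: list of (suffix_offset, op) in time order, where op is
    "add" or "remove". Returns the up-to-date bitmap.
    """
    bitmap = checkpoint_bitmap
    for offset, op in changes:
        if op == "add":
            bitmap |= 1 << offset
        elif op == "remove":
            bitmap &= ~(1 << offset)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return bitmap


# Checkpoint holds items 0, 1 and 3; three changes arrive afterwards.
checkpoint = 0b1011
changes = [(2, "add"), (0, "remove"), (5, "add")]
latest = current_inventory(checkpoint, changes)
print(bin(latest))
```

Incremental updates touch only the small change table, and the bitmap is rewritten only when the next checkpoint is taken.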
  • 23. Open Problems
    • Efficient methods for data mining problems
      • Trend analysis
      • Outlier detection
      • Path clustering
    • We will try exploring data mining applications to RFID data.
  • 24. RFID Data Cleaning
  • 25. Issues in Data Cleaning
    • Lack of Completeness
      • RFID readers capture only 60-70% of all tags that are in the vicinity
      • Smoothing of data is done to rectify the loss of intermediate messages
    • Temporal Nature of data or tag dynamics
      • RFID tags are in motion, which makes their data more difficult to handle.
      • Motion of a tag causes readings to be dropped.
    • RFID data streams arrive fast and in very large volume
      • Hence filtering is important before sending them to database
  • 26. Current Strategies
    • Temporal Granule:
      • Based on the fact that tag data do not differ much over a small time period
      • Data can be clubbed on a small time frame
    • Spatial Granule:
      • Similarly, data from physically close readers are also homogeneous
  • 27. Stages of ESP
    • Point: operates over a single value in a sensor stream, filtered by a predicate in the WHERE clause
    • Smooth: granularity defined by applications to correct for missed readings temporally (over one input only); uses aggregate function over the input.
    • Merge: granularity specified by the application to correct for missed readings spatially; grouped by the specified spatial granule.
  • 28. Stages of ESP (contd.)
    • Arbitrate: deals with conflicts between different spatial granules; grouped by spatial granule first and then uses HAVING construct to determine those conflicts
    • Virtualize: used for combining data streams from different sources, could also be different devices; join construct is used to combine the different data streams and then filtered using some predicate
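Two of the ESP stages, Smooth and Arbitrate, can be approximated in a few lines. This is a simplified sketch, not the declarative queries ESP itself uses; the reading format `(epoch, granule, tag)`, the function names, and the tie-breaking rule are assumptions.

```python
from collections import Counter


def smooth(readings, window_end, window_size):
    """Count reads per (granule, tag) inside one temporal window.

    readings: iterable of (epoch, granule, tag). The window covers the
    epochs in (window_end - window_size, window_end].
    """
    counts = Counter()
    for epoch, granule, tag in readings:
        if window_end - window_size < epoch <= window_end:
            counts[(granule, tag)] += 1
    return counts


def arbitrate(counts):
    """Assign each tag to the spatial granule that read it most often.

    Ties go to the alphabetically first granule (an arbitrary choice).
    """
    best = {}
    for (granule, tag), n in sorted(counts.items()):
        if tag not in best or n > best[tag][1]:
            best[tag] = (granule, n)
    return {tag: granule for tag, (granule, _) in best.items()}


# Tag A is read mostly by shelf0 but once by the neighbouring shelf1.
readings = [
    (1, "shelf0", "A"), (2, "shelf0", "A"), (3, "shelf1", "A"),
    (4, "shelf0", "A"), (2, "shelf1", "B"), (5, "shelf1", "B"),
]
counts = smooth(readings, window_end=5, window_size=5)
placement = arbitrate(counts)
print(placement)
```

Smoothing makes a tag visible across the whole window even though it was missed in some epochs, and arbitration resolves the conflict where two shelves both claim tag A.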
  • 29. Smooth stage
    • False Positives: (erroneous readings) reporting objects that are not actually present
    • False Negatives: (missed readings) not reporting objects that actually are present
    False positives and False Negatives [Jeff06]
  • 30. Tag List
    • The reader has an internal table called the Tag List .
    • An epoch is the smallest unit of interaction between the reader and the middleware.
    • Every epoch consists of a certain number of interrogation cycles.
    • An interrogation cycle is one run of the reader protocol to determine all tags in range.
    • At every epoch the reader sends the tag list to the middleware.
    • Example Tag List (Tag ID, Responses, Timestamp):
      • (12341234, 6, t1)
      • (12347890, 1, t2)
  • 31. SMURF – Per tag Cleaning
    • SMURF uses statistical methods to reduce the false negatives and false positives that occur in the RFID stream.
    • The goal is twofold: first, to determine a statistically sound window size, and second, to ensure that transitions of the tags are detected.
    • To determine the window size, we fit a probability distribution to the sample size.
    • To detect the transition of a tag out of the reader's vicinity, we define a 98% confidence interval within that probability distribution on the sample size $|S_i|$.
  • 32. SMURF – Per tag Cleaning (contd.)
    • Using the tag list, the per-epoch sampling probability $p_{i,t}$ is determined: $p_{i,t}$ = (number of times the tag was read in an epoch) / (interrogation cycles per epoch).
    • We average this over the sample $S_i$ to get the average read rate $p_i^{avg}$ for a tag $i$.
    • If the same probability $p_i^{avg}$ is assumed for each epoch throughout the window, then each observation is like a Bernoulli trial.
  • 33. SMURF – Per tag Cleaning (contd.)
    • So $|S_i|$ is a binomial random variable with mean $w_i \, p_i^{avg}$ and variance $w_i \, p_i^{avg} (1 - p_i^{avg})$.
    • Using this, the required window size can be expressed as a lower bound.
    • If the current window size is less than the calculated one, the window size is adjusted accordingly.
    • Similarly, using the central limit theorem for transition detection, we get $\bigl| |S_i| - \mu \bigr| > 2\sigma$, where $\mu$ and $\sigma^2$ are the mean and variance above.
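The per-tag quantities on the last two slides can be put together in a short sketch. The window bound used here, w ≥ ⌈ln(1/δ) / p_avg⌉, follows from requiring (1 − p_avg)^w ≤ δ and is, as far as it can be reconstructed, the form the SMURF paper uses; the transcript itself does not preserve the formula. The default δ = 0.05, the 10 interrogation cycles per epoch, and all function names are assumptions.

```python
import math


def sampling_probability(responses, cycles_per_epoch):
    """Per-epoch sampling probability p_{i,t} from one Tag List entry."""
    return responses / cycles_per_epoch


def required_window(p_avg, delta=0.05):
    """Smallest window (in epochs) that reads a tag at least once with
    probability of at least 1 - delta, assuming independent epochs with
    read rate p_avg (0 < p_avg <= 1)."""
    return math.ceil(math.log(1 / delta) / p_avg)


def transition_detected(num_readings, window, p_avg):
    """True when the observed |S_i| is more than two standard deviations
    away from the binomial mean window * p_avg."""
    mean = window * p_avg
    std = math.sqrt(window * p_avg * (1 - p_avg))
    return abs(num_readings - mean) > 2 * std


# Tag read 6 times in an epoch of 10 interrogation cycles (assumed).
p = sampling_probability(6, 10)
print(p)
print(required_window(0.5), required_window(0.1))
print(transition_detected(2, 20, 0.5), transition_detected(9, 20, 0.5))
```

A weakly read tag (p_avg = 0.1) needs a window five times longer than a strongly read one (p_avg = 0.5), which is the reason the window is adapted per tag rather than fixed.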
  • 34. Normal Sliding window….
    • Epoch based mid-point sliding window
    • Emits a reading with an epoch value corresponding to the middle of the window
  • 35. Ensuring Completeness
    • In the first window, p i avg demands a larger window
    • Thus window size is increased
  • 36. Transition Detection
    • In the first window the number of readings decreases significantly (and statistically)
    • Thus a transition is likely to have occurred; so window is halved
  • 37. SMURF – Multi-tag aggregate Cleaning
    • Similar to per-tag cleaning, the window for multi-tag cleaning is determined by an analogous bound. Here, $p^{avg}$ is the average per-epoch sampling probability over all observed tags.
    • To detect a transition in the population count, we estimate the population count over two windows, $[t - w_i, t]$ and $[t - w_i/2, t]$, with true populations $N_w$ and $N_{w'}$.
    • Thus, a transition is signalled when the difference between the two estimates exceeds the limit $2(\sigma_w + \sigma_{w'})$.
  • 38. SMURF – Multi-tag aggregate Cleaning
    • To estimate the population count, we use $\pi$-estimators. The estimated population count is given by $\hat{N}_w = \sum_{i \in S_w} 1/\pi_i$.
    • Similarly, by $\pi$-estimators and assuming independence across different tags, the variance of the estimate is estimated as $\hat{\sigma}_w^2 = \sum_{i \in S_w} (1 - \pi_i)/\pi_i^2$.
    • Here $\pi_i$ is the probability of reading tag $i$ at least once during the whole window, given by $\pi_i = 1 - (1 - p_i^{avg})^{w}$.
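The multi-tag estimate can be sketched with the standard π-estimator (Horvitz-Thompson) form: each observed tag contributes 1/π_i to the count and (1 − π_i)/π_i² to the variance estimate, with π_i = 1 − (1 − p_i_avg)^w. The function names and sample values are illustrative, and the transition test applies the 2(σ_w + σ_w') rule to hypothetical inputs.

```python
import math


def pi_estimate(p_avgs, window):
    """Estimate the tag population from the tags observed in a window.

    p_avgs: average per-epoch read rates of the observed tags.
    Returns (estimated count, estimated variance of that count).
    """
    n_hat = 0.0
    var_hat = 0.0
    for p in p_avgs:
        pi = 1 - (1 - p) ** window  # chance of at least one read
        n_hat += 1 / pi
        var_hat += (1 - pi) / pi ** 2
    return n_hat, var_hat


def population_transition(est_w, var_w, est_half, var_half):
    """True when the full-window and half-window estimates differ by
    more than 2 * (sigma_w + sigma_w')."""
    limit = 2 * (math.sqrt(var_w) + math.sqrt(var_half))
    return abs(est_w - est_half) > limit


# Two observed tags, each read with probability 0.5 in a 1-epoch window:
# each stands in for two real tags, so the estimate is 4.
n_hat, var_hat = pi_estimate([0.5, 0.5], window=1)
print(n_hat, var_hat)
print(population_transition(20.0, 1.0, 10.0, 1.0))
```

Tags that are hard to read count for more than one item each, which compensates for the similar tags that were probably missed altogether.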
  • 39. The Road ahead…
    • Applications in RFID do not accept any delays in the data delivery
    • Data is present either in the cache or in the database; querying the database increases processing time, while the cache does not support SQL-like queries.
    • Anomaly detection is also an important part of object tracking.
    • Issues like untraceability, forward security, and database desynchronization are still not completely resolved.
    • One more serious problem with RFID is counterfeiting
    • In the next stage we expect to look into some of these issues
  • 40.
    • ????
  • 41. Thank You.
  • 42. References
    • Xiaolei Li, Hector Gonzalez, Jiawei Han and Diego Klabjan. Warehousing and analyzing massive RFID data sets. ICDE, 2006.
    • Fusheng Wang and Peiya Liu. Temporal management of RFID data. VLDB, 2005.
    • Timothy Chorma, Ying Hu, Seema Sundara and Jagannathan Srinivasan. Supporting RFID-based item tracking applications in Oracle DBMS using a bitmap datatype. VLDB, 2005.
  • 43. References
    • Minos Garofalakis, Shawn R. Jeffery and Michael J. Franklin. Adaptive cleaning for RFID data streams. VLDB, 2006.
    • Michael J. Franklin, Wei Hong, Shawn R. Jeffery, Gustavo Alonso and Jennifer Widom. Declarative support for sensor data cleaning. Pervasive, 2006.
    • Sridhar Ramachandran, Sudarshan S. Chawathe, Venkat Krishnamurthy and Sanjay E. Sarma. Managing RFID data. VLDB, 2004.