Published in: Business, Technology


  1. 1. RFID Data Management Kamlesh Laddhad (05329014) Karthik B.(05329021) Guide: Prof. Bernard Menezes
  2. 2. Outline • Introduction to RFID Technology. • Issues with RFID Technology. • RFID Data Characteristics. • Data Warehousing. – Expressive Temporal Model: Dynamic Relationship ER Model – RFID-Cuboids. – Use of Bitmap Datatype. • Data Cleaning. – Extensible Sensor stream Processing (ESP) – Statistical sMoothing for Unreliable RFid data (SMURF) • Future Plans.
  3. 3. Introduction • Radio Frequency Identification: – It is an Automatic Identification and Data Capture Technology. – Fast – No contact or line of sight. – Uses radio-frequency waves to transfer data • Components – Tag: small, low-cost device that can hold a limited amount of data. • Associated with objects, such as pallets, cases, and even individual items. – Reader: Recognize presence of tag and read info stored on it. • Unique electronic product code (EPC) associated with a tag. • By placing RFID tag readers at various locations, one can track the movement of objects through supply chain networks.
  4. 4. Applications and Adoptions • Supply Chain Management: real-time inventory tracking. – US Department of Defense: shipments to armed forces • Retail: Active shelves monitor product availability – Wal-Mart, Albertsons: major retail stores • Access control: toll collection, transportation. – Airline luggage management: • British Airways: 20 million bags a year • Implemented to reduce lost/misplaced luggage • Anti-counterfeiting and security: – Food and Drug Administration: to reduce counterfeiting in the pharmaceutical supply chain
  5. 5. Prospects for RFID Research • The physics of building tags and readers – Tags have few gates: apart from basic operation, they have very little computing power. – Radio-frequency signals have difficulty operating in certain physical media. • Privacy and safety issues: – Complex encryption schemes are not possible on RFID tags. – Counterfeiting by means of either illegitimate readers or spoofed tags is possible. – Reader-tag communication is wireless: third parties can eavesdrop on signals. • Software architecture to collect, filter, organize, and answer online queries: – The number of tags is proportional to the number of items being serviced/tracked. – The number of readers is proportional to the number of strategic locations/areas to be traced. • Each reader picks up tag signals on a continuous basis. • Data generated by RFID systems is enormous: • E.g. Wal-Mart is expected to generate 7 terabytes of RFID data per day. • Our focus: the third stream.
  6. 6. Data Warehousing Techniques
  7. 7. Data Management Challenges • Data Explosion: Example – A retailer with 3,000 stores, selling 10,000 items a day per store. – Each item moves 10 times on average before being sold. • Movement recorded as (EPC, location, second) – Data volume: 300 million tuples per day. – Example OLAP query: “Average time for items to move from warehouse to checkout counter in March 2006?” • Costly to answer: at this rate March 2006 alone holds billions of tuples.
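The data-volume figure on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers given on the slide; the variable names are illustrative.

```python
# Back-of-the-envelope RFID data volume, using the figures from the slide.
stores = 3_000
items_per_store_per_day = 10_000
moves_per_item = 10  # each move yields one (EPC, location, second) tuple

tuples_per_day = stores * items_per_store_per_day * moves_per_item
tuples_in_march = tuples_per_day * 31  # March has 31 days

print(f"{tuples_per_day:,} tuples per day")
print(f"{tuples_in_march:,} tuples in March")
```

At this rate a single month already runs into billions of tuples, which is why the example OLAP query is expensive on raw data.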
  8. 8. Data Characteristics • Temporal and history oriented – Applications dynamically generate observations (readings). – Object locations and containment relationships among objects change over time. – Need: Expressive data model. • Inaccurate data and implicit semantics – False positive: a non-existent tag is incorrectly read. – False negative: the reader misses a tag that was in its vicinity. – Noisy data and duplicate readings (redundancy): the same tag is read more than once. – Need: Automated data filtering and transformation. • Streaming and large volume – Objects stay in place for long durations and readers record them periodically, so large volumes of data keep being generated. – We need to preserve this data for tracking and monitoring. – Need: Scalable storage scheme, compression techniques to reduce data. • Data Granularity – Data collection granularity needs to be decided. – Differs across applications.
  9. 9. Warehousing Helps!! • Lossless compression – Remove redundancy: (r1,l1,t1) (r1,l1,t2) ... (r1,l1,t10) => (r1,l1,t1,t10) – Group objects that move and stay together. • Data cleaning: Multi-reading, missed-reading, error-reading, bulky movement. • Data mining: Find trends, outliers, frequent, sequential, flow patterns. • Multi-dimensional summary: product, location, time, … – Store manager: Check item movements from the backroom to different shelves in his store – Region manager: Collapse intra-store movements and look at distribution centers, warehouses, and stores • Query Processing – Support for OLAP: roll-up, drill-down, slice, and dice – Path query: New to RFID-Warehouses, about the structure of paths • What products that go through quality control have shorter paths? • What locations are common to the paths of a set of defective auto-parts? • Identify containers at a port that have deviated from their historic paths
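The redundancy-removal rule above, (r1,l1,t1) ... (r1,l1,t10) => (r1,l1,t1,t10), can be sketched as follows. This is a minimal illustration, not the method of any cited paper: it assumes readings are (epc, location, time) triples and that a new stay begins whenever a tag's location changes; `collapse_readings` is a hypothetical name.

```python
from itertools import groupby

def collapse_readings(readings):
    """Collapse (epc, location, time) readings into (epc, location, t_in, t_out) stays.

    Assumption: a new stay starts whenever the location of a tag changes
    between two consecutive readings (in time order).
    """
    stays = []
    ordered = sorted(readings, key=lambda r: (r[0], r[2]))  # by tag, then time
    for epc, tag_readings in groupby(ordered, key=lambda r: r[0]):
        # Consecutive readings at the same location form one stay.
        for location, run in groupby(tag_readings, key=lambda r: r[1]):
            times = [t for _, _, t in run]
            stays.append((epc, location, times[0], times[-1]))
    return stays

# Ten periodic readings at l1, then two at l2.
readings = [("r1", "l1", t) for t in range(1, 11)]
readings += [("r1", "l2", 11), ("r1", "l2", 12)]
stays = collapse_readings(readings)
print(stays)
```

Twelve raw tuples shrink to two stay records; the saving grows with how long objects sit still in front of a reader.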
  10. 10. Dynamic Relationship ER Model • Proposed by Wang and Liu from Siemens. • RFID entities are static and are not altered. • RFID relationships: dynamic and change all the time. • Two types of dynamic relationships added: – Event-based dynamic relationship. A timestamp attribute added to represent the occurring timestamp of the event. – State-based dynamic relationship. tstart and tend attributes added to represent the lifespan of a state.
  11. 11. • Static entity tables – SENSOR (sensor_epc, name, description) – OBJECT (object_epc, name, description) – LOCATION (location_id, name, owner) – TRANSACTION (transaction_id, transaction_type) • Dynamic relationship tables – OBSERVATION (sensor_epc, value, timestamp) – OBJECTLOCATION (epc, location_id, tstart, tend) – TRANSACTIONITEM (transaction_id, epc, timestamp) – CONTAINMENT (epc, parent_epc, tstart, tend) – SENSORLOCATION (sensor_epc, location_id, position, tstart, tend)
  12. 12. Monitoring • Missing RFID Object Detection: – Find when and where the object holding EPC = 'MEPC' was lost. • select location_id, tstart, tend from objectlocation where epc='MEPC' and tstart = ( select max(o.tstart) from objectlocation o where o.epc='MEPC' ) – Check if there are missing objects at current location C, knowing that all objects were present at previous location L at time T. • select l.epc from objectlocation l where l.location_id = 'L' and l.tstart <= 'T' and l.tend >= 'T' and l.epc not in ( select c.epc from objectlocation c where c.location_id = 'C' )
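The two monitoring queries can be tried against a toy OBJECTLOCATION table. The sketch below uses Python's built-in sqlite3 module; the integer timestamps and the sample rows (E1, E2, MEPC) are invented for illustration, and only the OBJECTLOCATION columns come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objectlocation (epc TEXT, location_id TEXT, tstart INTEGER, tend INTEGER)"
)
# Illustrative data: three objects at L during [1, 5]; only two of them reach C.
conn.executemany(
    "INSERT INTO objectlocation VALUES (?, ?, ?, ?)",
    [
        ("E1", "L", 1, 5), ("E2", "L", 1, 5), ("MEPC", "L", 1, 5),
        ("E1", "C", 6, 9), ("E2", "C", 6, 9),
    ],
)

# Query 1: last known location and lifespan of the object 'MEPC'.
last_seen = conn.execute(
    """
    SELECT location_id, tstart, tend FROM objectlocation
    WHERE epc = 'MEPC'
      AND tstart = (SELECT MAX(o.tstart) FROM objectlocation o WHERE o.epc = 'MEPC')
    """
).fetchall()

# Query 2: objects that were at the previous location at time T but never show up at C.
missing_at_c = conn.execute(
    """
    SELECT l.epc FROM objectlocation l
    WHERE l.location_id = :prev AND l.tstart <= :t AND l.tend >= :t
      AND l.epc NOT IN (SELECT c.epc FROM objectlocation c WHERE c.location_id = :cur)
    """,
    {"prev": "L", "t": 3, "cur": "C"},
).fetchall()

print(last_seen, missing_at_c)
```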
  13. 13. Tracking • RFID Object Moving Time Inquiry: – How long does it take to supply 'OEPC' from location S to location E? • select (e.tstart-s.tstart) as supplying_time from objectlocation e, objectlocation s where e.epc = 'OEPC' and s.epc='OEPC' and s.location_id ='S' and e.location_id='E'
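A runnable version of the moving-time inquiry, again using sqlite3 with invented integer timestamps. It assumes the object visits S and E exactly once each; with repeated visits the self-join would return one row per (S, E) pair and would need an extra time condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objectlocation (epc TEXT, location_id TEXT, tstart INTEGER, tend INTEGER)"
)
# Illustrative path of one object: S -> X -> E.
conn.executemany(
    "INSERT INTO objectlocation VALUES (?, ?, ?, ?)",
    [("OEPC", "S", 100, 160), ("OEPC", "X", 160, 400), ("OEPC", "E", 400, 900)],
)

row = conn.execute(
    """
    SELECT e.tstart - s.tstart AS supplying_time
    FROM objectlocation e JOIN objectlocation s ON e.epc = s.epc
    WHERE e.epc = 'OEPC' AND s.location_id = 'S' AND e.location_id = 'E'
    """
).fetchone()
supplying_time = row[0]
print(supplying_time)
```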
  14. 14. Compression Idea • Bulky object movements – Objects often move and stay together through the supply chain. – If 1000 packs of product P stay together at the distribution center, register a single record. – (GID, distribution center, time_in, time_out). – GID is a generalized identifier that represents the 1000 packs that stayed together at the distribution center. • Analysis usually takes place at a much higher level of abstraction than the one present in raw RFID data. [Figure: Factory → Dist. Center 1, Dist. Center 2, … (10 pallets, 1000 cases) → store 1, store 2, … (20 cases, 1000 packs) → shelf 1, shelf 2, … (10 packs, 12 sodas)]
  15. 15. RFID Cuboids • Fact Table: (EPC, location, time_in, time_out). • In a supply chain, items travel through a series of locations. • Query: what is the average time that product P stays at a store in location A? • Traditional cubes miss the path structure of the data. • Stay Table: (GIDs, location, time_in, time_out: measures) – Records information on items that stay together at a given location. – If transitions were recorded instead, queries would be difficult to answer: lots of intersections needed. • Map Table: (GID, <GID1,..,GIDn>) – Links together stages that belong to the same path; provides additional compression and query processing efficiency. – A high-level GID points to lower-level GIDs. – If complete EPC lists were saved instead: high I/O cost to retrieve long lists, costly query processing. • Information Table: (EPC list, attribute 1,...,attribute n) – Records path-independent attributes of the items, e.g., color, manufacturer, price.
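To make the stay-table idea concrete, here is a deliberately simplified sketch. It groups item-level fact rows that share the same (location, time_in, time_out) under one generated GID and keeps the item count as a measure. The flat `g1, g2, …` identifiers and the function names are assumptions for illustration; the hierarchical GIDs and the map table described above are not modeled.

```python
from collections import defaultdict

def build_stay_table(fact_rows):
    """Group (epc, location, time_in, time_out) rows into stay records.

    Returns (stay, members): stay rows are (gid, location, time_in, time_out, count),
    members maps each gid to the EPCs it stands for.
    """
    groups = defaultdict(list)
    for epc, location, t_in, t_out in fact_rows:
        groups[(location, t_in, t_out)].append(epc)
    stay, members = [], {}
    for n, ((location, t_in, t_out), epcs) in enumerate(sorted(groups.items()), start=1):
        gid = f"g{n}"  # hypothetical flat identifier
        stay.append((gid, location, t_in, t_out, len(epcs)))
        members[gid] = sorted(epcs)
    return stay, members

def average_stay(stay, location):
    """Average time items spend at a location, weighted by group size."""
    rows = [r for r in stay if r[1] == location]
    total_items = sum(count for *_, count in rows)
    return sum((t_out - t_in) * count for _, _, t_in, t_out, count in rows) / total_items

fact = [
    ("e1", "dist_center", 1, 5), ("e2", "dist_center", 1, 5), ("e3", "dist_center", 1, 5),
    ("e1", "store_A", 6, 10), ("e2", "store_A", 6, 10), ("e3", "store_B", 6, 9),
]
stay, members = build_stay_table(fact)
print(stay)
print(average_stay(stay, "store_A"))
```

Six fact rows become three stay rows, and the average-stay query touches one row per group rather than one per item.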
  16. 16. EPC Overview • Electronic product code – Standard naming scheme, proposed by Auto-Id Center. – An EPC uniquely identifies an item. – Format: <Header, Manager_No., Object Class, Serial No.> • Header: Identifies the length, type, structure, version and generation of EPC. • Manager Number: Identifies an organizational entity. • Object Class: Identifies a “class”, or type of thing. • Serial Number: Specific instance of the Object Class being tagged. – We will refer to • <Header, Manager No, Object Class>: Prefix • <Serial No.>: Suffix
  17. 17. Use of Bitmap Datatype • Observation: Items move together. – Groups of items in the same proximity - e.g. on a shelf, on a shipment – Groups of items with same property - e.g. Same product • Use a bitmap type for modeling a collection of EPCs that can occur in item tracking applications. – Instead of storing a tuple per item store a tuple for all the items having same prefix. – New extra fields instead of epc: • <Len, Suffix_length, Prefix, suffix_start, Suffix_end, bitmap>
  18. 18. Example: Product Inventory
      • With EPC collections:
          Store_id  Prod_id  Time  Item_collection
          s1        p1       t1    epc11, epc12, epc13, …
          s1        p2       t2    epc21, epc22, epc23, …
          …         …        …     …
      • With epc_bitmaps:
          Store_id  Prod_id  Time  Item_bmap
          s1        p1       t1    bmap1
          s1        p2       t2    bmap2
          …         …        …     …
  19. 19. Use of Bitmap Datatype
      • EPC layout (64 bits):
          Header  EPC_Manager  Object_Class  Serial_Number
          2 bits  21 bits      17 bits       24 bits
      • EPCs in the collection: 0x4AA890001F62C160 … 0x4AA890001FA0B38E
      • Stored tuple:
          Len  Suff_len  Prefix        Suff_start  Suff_end  bitmap
          64   24        0x4AA890001F  0x62C160    0xA0B38E  101001…00010
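The encoding on this slide can be sketched in Python, using an arbitrary-precision integer as the bitmap. The field widths and the two sample EPCs come from the slide; the bit ordering (bit k stands for suffix Suff_start + k) and the function names are assumptions made for this illustration, not the layout of the actual datatype.

```python
SERIAL_BITS = 24        # suffix
OBJECT_CLASS_BITS = 17
MANAGER_BITS = 21       # the remaining 2 high bits are the header

def split_epc(epc):
    """Split a 64-bit EPC into (header, manager, object_class, serial)."""
    serial = epc & ((1 << SERIAL_BITS) - 1)
    object_class = (epc >> SERIAL_BITS) & ((1 << OBJECT_CLASS_BITS) - 1)
    manager = (epc >> (SERIAL_BITS + OBJECT_CLASS_BITS)) & ((1 << MANAGER_BITS) - 1)
    header = epc >> (SERIAL_BITS + OBJECT_CLASS_BITS + MANAGER_BITS)
    return header, manager, object_class, serial

def epc2bmap(epcs):
    """Encode EPCs sharing one prefix as a single (Len, Suff_len, Prefix, ...) tuple."""
    prefixes = {e >> SERIAL_BITS for e in epcs}
    if len(prefixes) != 1:
        raise ValueError("all EPCs in one bitmap must share the same prefix")
    suffixes = [e & ((1 << SERIAL_BITS) - 1) for e in epcs]
    start, end = min(suffixes), max(suffixes)
    bitmap = 0
    for s in suffixes:
        bitmap |= 1 << (s - start)  # assumed ordering: bit k <-> suffix start + k
    return {
        "len": 64, "suffix_len": SERIAL_BITS, "prefix": prefixes.pop(),
        "suffix_start": start, "suffix_end": end, "bitmap": bitmap,
    }

row = epc2bmap([0x4AA890001F62C160, 0x4AA890001FA0B38E])
print(hex(row["prefix"]), hex(row["suffix_start"]), hex(row["suffix_end"]))
```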
  20. 20. Bitmap Operations • To use this datatype in SQL, we need operations on bitmaps. • Conversion and counting operations: epc2Bmap, bmap2Epc and bmap2Count • Pairwise logical operations: bmapAnd, bmapOr, bmapMinus, and bmapXor • Maintenance operations: bmapInsert and bmapDelete • Membership testing operation: bmapExists • Comparison operation: bmapEqual
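The semantics of these operations can be illustrated with plain Python integers as bitmaps. This is a sketch of what the operations compute, not of the database implementation: it assumes all bitmaps share one prefix and that bit s stands for suffix s, and the snake_case names are stand-ins for the SQL functions listed above.

```python
def epc2bmap(suffixes):
    """Build a bitmap with one bit set per EPC suffix."""
    bmap = 0
    for s in suffixes:
        bmap |= 1 << s
    return bmap

def bmap2epc(bmap):
    """List the suffixes whose bits are set."""
    return [i for i in range(bmap.bit_length()) if (bmap >> i) & 1]

def bmap2count(bmap):
    return bin(bmap).count("1")

# Pairwise logical operations.
def bmap_and(a, b): return a & b
def bmap_or(a, b): return a | b
def bmap_minus(a, b): return a & ~b
def bmap_xor(a, b): return a ^ b

# Maintenance, membership and comparison.
def bmap_insert(bmap, s): return bmap | (1 << s)
def bmap_delete(bmap, s): return bmap & ~(1 << s)
def bmap_exists(bmap, s): return ((bmap >> s) & 1) == 1
def bmap_equal(a, b): return a == b

# Items added to a shelf between two snapshots: in the later one but not the earlier one.
shelf_t1 = epc2bmap([1, 2, 3])
shelf_t2 = epc2bmap([2, 3, 4, 5])
added = bmap2epc(bmap_minus(shelf_t2, shelf_t1))
print(added)
```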
  21. 21. Use of these operations in SQL • Items added to a given shelf between time t1 and t2. – SELECT bmap2Epc(bmapMinus(s2.item_bmap, s1.item_bmap)) FROM Shelf_Inventory s1, Shelf_Inventory s2 WHERE s1.shelf_id = <sid1> AND s1.shelf_id = s2.shelf_id AND s1.time = <t1> AND s2.time = <t2>; • A book store categorizes books in various categories. – The following query determines the shelves where books with both properties 'Adventure' and 'Romance' are currently present in the store. – SELECT s.shelf_id FROM Shelf_Inventory s WHERE bmap2Count(bmapAnd( s.item_bmap, (SELECT bmapAnd(p.Adventure, p.Romance) FROM Property_Inventory p) )) > 0 AND s.time=<current_date>;
  22. 22. Road Ahead • Extension to the bitmap proposal: – The bitmap datatype is more appropriate for initial bulk loads and batch updates. – It performs badly for incremental updates. – A 'hybrid scheme' for incremental updates: • Maintain periodic inventory checkpoints using bitmaps. • For changes occurring between checkpoints, maintain a traditional item-level table. • Answer queries by merging the latest checkpoint bitmap with the item-level data for the period since that checkpoint. • The EPC suffixes in a collection may not be contiguous. – The bitmap will then be sparse: lots of zeros. – Compress it using some encoding scheme. • Good for initial bulk loading and batch updates. • May reduce efficiency of bitmap operations.
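One way the hybrid scheme could look is sketched below: bitmaps are stored only at checkpoints, item-level changes are logged in between, and a query merges the two. The class name, the `("add" | "remove")` log format and the integer-as-bitmap representation are all assumptions for illustration; the slides only outline the idea.

```python
import bisect

class HybridInventory:
    """Checkpoint bitmaps plus an item-level change log (illustrative sketch)."""

    def __init__(self):
        self.checkpoints = []  # (time, bitmap), appended in time order
        self.deltas = []       # (time, "add" | "remove", suffix), appended in time order

    def checkpoint(self, time, bitmap):
        self.checkpoints.append((time, bitmap))

    def record(self, time, op, suffix):
        self.deltas.append((time, op, suffix))

    def inventory_at(self, time):
        """Latest checkpoint at or before `time`, merged with the later item-level changes."""
        times = [t for t, _ in self.checkpoints]
        idx = bisect.bisect_right(times, time) - 1
        base_time, bmap = self.checkpoints[idx] if idx >= 0 else (float("-inf"), 0)
        for t, op, suffix in self.deltas:
            if base_time < t <= time:
                if op == "add":
                    bmap |= 1 << suffix
                else:
                    bmap &= ~(1 << suffix)
        return bmap

inv = HybridInventory()
inv.checkpoint(0, 0b00110)   # items with suffix 1 and 2 on the shelf
inv.record(1, "add", 4)
inv.record(2, "remove", 1)
print(bin(inv.inventory_at(1)), bin(inv.inventory_at(2)))
```

Incremental updates only append to the log; the cost moves to query time and is bounded by the checkpoint interval.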
  23. 23. Open Problems • Efficient methods for data mining problems – Trend analysis – Outlier detection – Path clustering • We will try exploring data mining applications to RFID data.
  24. 24. RFID Data Cleaning
  25. 25. Issues in Data Cleaning • Lack of Completeness – RFID readers capture only 60-70% of all tags that are in the vicinity – Smoothing of data is done to rectify the loss of intermediate messages • Temporal Nature of data or tag dynamics – RFID tags are in motion and that is what makes them more difficult to handle – But motion of a tag causes dropping of messages • RFID data streams are very fast and are huge in number – Hence filtering is important before sending them to database
  26. 26. Current Strategies • Temporal Granule: – Based on the fact that tag data do not differ much over a small time period – Data can be clubbed on a small time frame • Spatial Granule: – Similarly, data from physically close readers are also homogeneous
  27. 27. Stages of ESP • Point: operates over a single value in a sensor stream, filtered by a predicate in the WHERE clause • Smooth: granularity defined by applications to correct for missed readings temporally (over one input only); uses aggregate function over the input. • Merge: granularity specified by the application to correct for missed readings spatially; grouped by the specified spatial granule.
  28. 28. Stages of ESP (contd.) • Arbitrate: deals with conflicts between different spatial granules; grouped by spatial granule first and then uses HAVING construct to determine those conflicts • Virtualize: used for combining data streams from different sources, could also be different devices; join construct is used to combine the different data streams and then filtered using some predicate
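Three of the five stages can be mimicked in a few lines of Python to show how they compose. ESP itself expresses the stages as declarative queries (WHERE, aggregates, GROUP BY / HAVING, joins); the functions below are an illustrative stand-in rather than ESP's interface, the record fields (`tag`, `granule`, `time`) are invented, and Merge and Virtualize are left out.

```python
from collections import Counter

def point(readings, known_tags):
    """Point: filter single readings with a predicate (here: drop unknown tag IDs)."""
    return [r for r in readings if r["tag"] in known_tags]

def smooth(readings, now, window):
    """Smooth: count readings per (granule, tag) inside the temporal granule (now - window, now]."""
    counts = Counter()
    for r in readings:
        if now - window < r["time"] <= now:
            counts[(r["granule"], r["tag"])] += 1
    return counts

def arbitrate(counts):
    """Arbitrate: attribute each tag to the spatial granule that read it most often."""
    best = {}
    for (granule, tag), n in counts.items():
        if tag not in best or n > best[tag][1]:
            best[tag] = (granule, n)
    return {tag: granule for tag, (granule, _) in best.items()}

raw = [
    {"tag": "A", "granule": "shelf1", "time": 1},
    {"tag": "A", "granule": "shelf1", "time": 2},
    {"tag": "ghost", "granule": "shelf1", "time": 2},  # erroneous reading
    {"tag": "A", "granule": "shelf2", "time": 3},      # picked up by the neighbouring reader
    {"tag": "A", "granule": "shelf1", "time": 4},      # shelf1 missed the tag at time 3
]
counts = smooth(point(raw, known_tags={"A"}), now=5, window=5)
placement = arbitrate(counts)
print(placement)
```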
  29. 29. Smooth stage • False Positives: (erroneous readings) reporting objects that are not actually present • False Negatives: (missed readings) not reporting objects that actually are present False positives and False Negatives [Jeff06]
  30. 30. Tag List • The reader has an internal table called the Tag List. • An epoch is the smallest unit of interaction between the reader and the middleware. • Every epoch consists of a certain number of interrogation cycles. • An interrogation cycle is one run of the reader protocol to determine all tags. • At every epoch the reader sends the tag list to the middleware.
          Tag ID    Responses  Timestamp
          12341234  6          t1
          12347890  1          t2
  31. 31. SMURF – Per tag Cleaning • SMURF uses statistical methods to reduce the false negatives and false positives in the RFID stream. • The goal is twofold: first, to determine the smoothing window size statistically; second, to make sure that transitions of the tags are detected. • To determine the window size, we model the sample size |Si| (the number of epochs in which tag i was read) with a probability distribution. • To detect the transition of a tag out of the reader's vicinity, we define a 98% confidence interval for the sample size |Si| within that distribution.
  32. 32. SMURF – Per tag Cleaning (contd.) • Using the tag list, the per-epoch sampling probability pi,t is determined: pi,t = (number of times the tag was read in an epoch) / (interrogation cycles per epoch) • We average this over the sample Si to get the average read rate (pi avg) for tag i. • If the same probability pi avg is assumed for each epoch throughout the window, then each epoch's observation is a Bernoulli trial.
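The read-rate computation can be sketched directly from the tag list. The example assumes 10 interrogation cycles per epoch (a made-up value) and represents the window as a list of per-epoch response counts, with 0 for epochs in which the tag was not seen; the function names are illustrative.

```python
def per_epoch_probability(responses, cycles_per_epoch):
    """p_{i,t}: fraction of interrogation cycles in one epoch that read the tag."""
    return responses / cycles_per_epoch

def average_read_rate(responses_per_epoch, cycles_per_epoch):
    """p_i^avg: mean of p_{i,t} over the epochs in which tag i was observed (the sample S_i)."""
    observed = [r for r in responses_per_epoch if r > 0]
    if not observed:
        return 0.0
    return sum(per_epoch_probability(r, cycles_per_epoch) for r in observed) / len(observed)

# Tag read 6, 8 and 7 times in three of five epochs; missed entirely in the other two.
window = [6, 0, 8, 0, 7]
p_avg = average_read_rate(window, cycles_per_epoch=10)
sample_size = sum(1 for r in window if r > 0)  # |S_i|
print(round(p_avg, 3), sample_size)
```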
  33. 33. SMURF – Per tag Cleaning (contd.) • |Si| is then a binomial random variable for the sample Si, with mean μ = wi · pi avg and variance σ² = wi · pi avg · (1 – pi avg). • Using this, a lower bound on the window size wi can be derived. • If the current window size is smaller than the calculated one, the window is enlarged accordingly. • Similarly, using the Central Limit Theorem, a transition is detected when ||Si| – μ| > 2σ.
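A short derivation of a window-size bound of this kind, assuming tag i is read in each epoch independently with probability p_i^avg. The symbol δ (the tolerated probability of missing the tag in every epoch of the window) is introduced here for the derivation and does not appear on the slides.

```latex
\begin{align*}
\Pr[\text{tag } i \text{ missed in all } w_i \text{ epochs}]
  &= \left(1 - p_i^{avg}\right)^{w_i} \;\le\; e^{-p_i^{avg} w_i} \;\le\; \delta \\
\text{which holds whenever}\quad
  w_i &\ge \left\lceil \frac{1}{p_i^{avg}} \ln\frac{1}{\delta} \right\rceil \\[6pt]
\text{Transition test:}\quad
  \bigl|\, |S_i| - w_i\, p_i^{avg} \bigr|
  &> 2\sqrt{w_i\, p_i^{avg}\left(1 - p_i^{avg}\right)}
\end{align*}
```

For example, with p_i^avg = 0.7 and δ = 0.05 the bound gives w_i ≥ ⌈ln(20)/0.7⌉ = ⌈4.28⌉ = 5 epochs.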
  34. 34. Normal Sliding window…. • Epoch based mid-point sliding window • Emits a reading with an epoch value corresponding to the middle of the window
  35. 35. Ensuring Completeness • In the first window, pi avg demands a larger window • Thus window size is increased
  36. 36. Transition Detection • In the first window the number of readings decreases significantly (in the statistical sense). • Thus a transition is likely to have occurred, so the window is halved. [Franklin06]
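The two adjustments, growing the window for completeness and halving it on a detected transition, can be combined into one per-epoch update step. The sketch below is a simplification under stated assumptions: the completeness target uses a bound of the form ⌈ln(1/δ)/p_avg⌉ with δ = 0.05, and the additive growth step of 2 epochs is a choice made here for illustration; neither value is given on the slides.

```python
import math

def required_window(p_avg, delta=0.05):
    """Smallest window (in epochs) for which the tag is missed with probability <= delta."""
    return math.ceil(math.log(1 / delta) / p_avg)

def transition_detected(observed_epochs, window, p_avg):
    """True if |S_i| deviates from its binomial mean by more than two standard deviations."""
    mean = window * p_avg
    sigma = math.sqrt(window * p_avg * (1 - p_avg))
    return abs(observed_epochs - mean) > 2 * sigma

def adapt_window(window, observed_epochs, p_avg, delta=0.05):
    """One adaptation step: halve on a detected transition, otherwise grow toward the target."""
    if transition_detected(observed_epochs, window, p_avg):
        return max(1, window // 2)
    target = required_window(p_avg, delta)
    if window < target:
        return min(window + 2, target)  # growth step of 2 is an assumption
    return window

print(adapt_window(10, 2, 0.7))  # far fewer readings than expected: shrink
print(adapt_window(3, 2, 0.7))   # window too small for completeness: grow
```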
  37. 37. SMURF – Multi-tag aggregate Cleaning • Similar to per-tag cleaning, the window for multi-tag cleaning is determined from pavg, the average per-epoch sampling probability over all observed tags. • To detect a transition in the population count, we estimate the population count over two windows, [t – wi, t] and [t – wi/2, t], with true populations Nw and Nw’. • A transition is taken to have happened when the difference between the two estimates exceeds the limit 2(σw + σw’).
  38. 38. SMURF – Multi-tag aggregate Cleaning • To estimate the population count, we use π-estimators: the estimated population count is the sum of 1/πi over the tags i observed in the window. • Similarly, by π-estimators and assuming independence across different tags, the variance of the estimate is estimated as the sum of (1 – πi)/πi² over the same tags. • Here πi is the probability of reading tag i at least once during the whole window, given by πi = 1 – (1 – pi avg)^w.
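The π-estimator and the transition test from the previous slide can be sketched as follows. The estimator weights each observed tag by 1/πi with πi = 1 – (1 – pi avg)^w, and the variance estimate sums (1 – πi)/πi² under the independence assumption. Function names and the sample read rates are illustrative.

```python
import math

def pi_estimate(read_rates, window):
    """Estimate the tag population from the tags observed in a window.

    read_rates: p_i^avg for every tag seen at least once in the window.
    Returns (estimated count, estimated variance of that count).
    """
    pis = [1 - (1 - p) ** window for p in read_rates]  # P(tag read at least once)
    n_hat = sum(1 / pi for pi in pis)
    var_hat = sum((1 - pi) / pi ** 2 for pi in pis)
    return n_hat, var_hat

def population_transition(n_w, var_w, n_half, var_half):
    """Transition if the two window estimates differ by more than 2(sigma_w + sigma_w')."""
    return abs(n_w - n_half) > 2 * (math.sqrt(var_w) + math.sqrt(var_half))

# Two tags observed, each read in about half of the epochs, window of a single epoch:
# each observed tag stands for two tags in the population.
n_hat, var_hat = pi_estimate([0.5, 0.5], window=1)
print(n_hat, var_hat)
```

With a longer window πi approaches 1, so the estimate approaches the plain count of observed tags and its variance shrinks.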
  39. 39. The Road Ahead… • RFID applications do not tolerate delays in data delivery. • Data is either in the cache or in the database: querying the database increases processing time, and cached data cannot be queried with SQL-like queries. • Anomaly detection is also an important part of object tracking. • Issues like untraceability, forward security, and database desynchronization are still not completely resolved. • One more serious problem with RFID is counterfeiting. • In the next stage we expect to look into some of these issues.
  40. 40. Questions?
  41. 41. Thank You.
  42. 42. References • Xiaolei Li, Hector Gonzalez, Jiawei Han and Diego Klabjan. Warehousing and analyzing massive RFID data sets. ICDE, 2006. • Fusheng Wang and Peiya Liu. Temporal management of RFID data. VLDB, 2005. • Timothy Chorma, Ying Hu, Seema Sundara and Jagannathan Srinivasan. Supporting RFID-based item tracking applications in Oracle DBMS using a bitmap datatype. VLDB, 2005.
  43. 43. References • Minos Garofalakis, Shawn R. Jeffery and Michael J. Franklin. Adaptive cleaning for RFID data streams. VLDB, 2006. • Michael J. Franklin, Wei Hong, Shawn R. Jeffery, Gustavo Alonso and Jennifer Widom. Declarative support for sensor data cleaning. Pervasive, 2006. • Sridhar Ramachandran, Sudarshan S. Chawathe, Venkat Krishnamurthy and Sanjay E. Sarma. Managing RFID data. VLDB, 2004.