slides
Upload Details

Uploaded as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

  • There are many open problems, ranging from scheduling algorithms for putting sensors in sleep mode to efficient MAC protocols, security protocols, operating systems, programming languages, databases, etc.
  • A Circuit board equipped with a low frequency processor, main memory, microphone, buzzer, on-board and off-board flash memory
  • Seismic and Structural Monitoring: Harvard conducted studies at various volcanoes in Ecuador, Princeton in Kenya, and Berkeley on Great Duck Island and the Golden Gate Bridge, etc.
  • Bird Nests
  • 2KM at 130dB, low frequency, 155Kbps
  • Advancements in networks and multimedia
  • Declarative: a language is declarative if it describes what something is like rather than how to create it. … versus imperative languages, where we state both what we want and how to obtain it.
  • NAND, NOR: logic gate that holds a single bit. Access time (seek time plus rotational latency): the amount of time between when the CPU requests a file and when the first byte of the file is sent to the CPU. Times between 10 and 20 milliseconds are common for magnetic disks. Average rotational latency by spindle speed: 4200 RPM: 7.14 ms, 5400 RPM: 5.55 ms, 7200 RPM: 4.17 ms, 10000 RPM: 3 ms, 15000 RPM: 2 ms.
  • All these in constant or logarithmic time
  • C: defines the page that has to be split; S: defines the page that has to be moved to flash media.
  • ScaleSearch: Given that we have 1500 pages (timestamps in range 0..1000) and we look for tq=500, instead of going to page e.g. 300, go

Presentation Transcript

  • Indexing and Searching in Wireless Sensor Networks
    • Demetris Zeinalipour [zeinalipour@ouc.ac.cy]
    • School of Pure and Applied Sciences, Open University of Cyprus
    • http://is.ouc.ac.cy/~zeinalipour/
    • HPCL, University of Cyprus, February 14th, 2008
  • Presentation Goals
    • To provide an overview of recent developments in Wireless Sensor Network Technology
    • To highlight some important indexing and searching challenges that arise in this context
    • This is joint work with my collaborators at the University of California, Riverside.
    • Our results were presented in the following papers:
      • "MicroHash: An Efficient Index Structure for Flash-Based Sensor Devices",
      • D. Zeinalipour-Yazti, S. Lin, V. Kalogeraki, D. Gunopulos and W. Najjar, The 4th USENIX Conference on File and Storage Technologies (FAST’05), San Francisco, USA, December 2005.
      • "Efficient Indexing Data Structures for Flash-Based Sensor Devices",
      • S. Lin, D. Zeinalipour-Yazti, V. Kalogeraki, D. Gunopulos, W. Najjar, ACM Transactions on Storage (TOS), ACM Press, Vol. 2, No. 4, pp. 468-503, November 2006.
    Acknowledgements
    • Capsule: an energy-optimized object storage system for memory-constrained sensor devices, ACM Sensys'06 (UMass, Amherst)
    • Ultra-low power data storage for sensor networks, IEEE IPSN'06 (UMass, Amherst)
    • TINX: a tiny index design for flash memory on wireless sensor devices, ACM Sensys'06 (Stanford)
    • Re-thinking data management for storage-centric sensor networks, IEEE CIDR'07 (UMass, Amherst)
    • FlashDB: dynamic self-tuning database for NAND flash, IEEE IPSN'07 (Microsoft Research, Redmond)
    • MStore: Enabling Storage-Centric Sensornet Research, IEEE IPSN'07 (Stanford)
    • EnviroMic: Towards Cooperative Storage and Retrieval in Audio Sensor Networks, IEEE ICDCS'07 (Univ. of Illinois at Urbana-Champaign)
    • DALi: A Communication-Centric Data Abstraction Layer for Energy-Constrained Devices in Mobile Sensor Networks, ACM Mobisys'07 (Princeton University)
    Main Citations
  • Presentation Outline
    • Overview of Wireless Sensor Networks
    • Overview of Data Acquisition Models
    • The MicroHash Index Structure.
    • MicroHash Experimental Evaluation
    • Conclusions and Future Work
  • Wireless Sensor Networks
    • Resource-constrained devices used for monitoring and studying the physical world at high fidelity.
  • Wireless Sensor Device
  • Wireless Sensor Network
  • Wireless Sensor Networks
    • Applications have already emerged in:
      • Environmental and habitat monitoring
      • Seismic and Structural monitoring
      • Understanding Animal Migrations & Interactions between species
      • Automation, Tracking, Hazard Monitoring Scenarios, Urban Monitoring etc
    [Figures: Great Duck Island, Maine (temperature, humidity, etc.); Golden Gate Bridge, SF (vibration and displacement of the bridge structure); ZebraNet, Kenya (GPS trajectories)]
  • Wireless Sensor Networks: The Great Duck Island Study (Maine, USA)
    • Large-scale deployment by Intel Research, Berkeley in 2002-2003 (Maine, USA).
    • Focuses on monitoring the microclimate in and around the nests of endangered species that are sensitive to disturbance.
    • More than 166 motes were deployed in remote locations (e.g., 1000 feet into the forest).
  • Wireless Sensor Networks WebServer
  • Wireless Sensor Networks: The James Reserve Project, CA, USA. Available at: http://dms.jamesreserve.edu/
  • The Anatomy of a Sensor Device
    • Processor, in various (sleep, idle, active) modes
    • Power source: AA or coin batteries, solar panels
    • SRAM used for the program code and for in-memory buffering.
    • LEDs used for debugging
    • Radio, used for transmitting the acquired data to some storage site (SINK) (9.6Kbps-250Kbps)
    • Sensors: Numeric readings in a limited range (e.g. temperature -40F..+250F with one decimal point precision) at a high frequency (2-2000Hz)
    Storage
  • Sensor Devices & Capabilities: Sensing Capabilities
    • Light
    • Temperature
    • Humidity
    • Pressure
    • Detect Tone
    • Wind Speed
    • Soil Moisture
    • Location (GPS)
    • etc…
    [Devices pictured: Xbow’s i-mote2, UC-Riverside RISE, Xbow’s Telos, UC-Berkeley mica2dot, Xbow’s Mica]
  • Characteristics
    • 1. The energy source is limited.
      • Energy source: AA batteries, solar panels
    • 2. Local processing is cheaper than transmitting over the radio.
      • Transmitting 1 byte over the radio consumes as much energy as ~1200 CPU instructions.
    • 3. Local storage is cheaper than transmitting over the radio.
      • Transmitting 512 B over a single-hop 9.6 Kbps (915 MHz) radio requires 82,000 μJ, while writing to local flash requires only 760 μJ.
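The figures above can be turned into a quick back-of-the-envelope comparison. The following is a minimal sketch that only reuses the numbers quoted on the slide; the function and constant names are illustrative and not part of any sensor toolkit.

```python
# Figures quoted on the slide (512 B payload, single-hop 9.6 Kbps radio).
RADIO_TX_UJ = 82_000      # energy to transmit 512 B over the radio, in microjoules
FLASH_WRITE_UJ = 760      # energy to write 512 B to local flash, in microjoules
INSTR_PER_TX_BYTE = 1200  # CPU instructions costing about as much as one transmitted byte


def radio_to_flash_ratio(radio_uj: float = RADIO_TX_UJ,
                         flash_uj: float = FLASH_WRITE_UJ) -> float:
    """How many times more energy the radio needs than a local flash write."""
    return radio_uj / flash_uj


def instructions_equivalent(n_bytes: int) -> int:
    """CPU instructions that cost roughly as much energy as sending n_bytes."""
    return n_bytes * INSTR_PER_TX_BYTE


if __name__ == "__main__":
    print(f"radio / flash energy ratio: {radio_to_flash_ratio():.0f}x")
    print(f"instructions per 512 B sent: {instructions_equivalent(512):,}")
```

With these numbers, sending a page over the radio costs roughly two orders of magnitude more energy than writing it locally, which is the motivation for keeping data on the device.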
  • The Programming Cycle
    • The Operating System
    • TinyOS (UC-Berkeley): Component-based architecture that allows programmers to wire together the minimum required components in order to minimize code size and energy consumption
    • (The operating system is really a number of libraries that can be statically linked to the sensor binary at compile time)
    • The Programming Language
    • nesC (Intel Research, Berkeley): an event-based C-variant optimized for programming sensor devices
    “Hello World”: blinking the red LED!
      event result_t Clock.fire() {
        state = !state;                 // toggle on every clock tick
        if (state) call Leds.redOn();
        else call Leds.redOff();
        return SUCCESS;                 // a result_t event must return a status
      }
  • The Programming Cycle
    • The Testing Environment
    • Debugging code directly on a sensor device is a tedious procedure
    • nesC allows programmers to compile their code to
        • A Binary File that is installed on the sensor
        • A Binary File that runs on a PC
    • TOSSIM (TinyOS Simulation) is the environment which allows programmers to simulate the mote binary directly on a PC.
    • This enables accurate simulations, fine-grained energy modeling (with PowerTOSSIM) and visualization (TinyViz)
  • The Programming Cycle
    • The Pre-deployment Environment
    • Once you have created and debugged your code, you can perform a deployment in a laboratory environment.
    • Harvard’s MoteLab uses 190 sensors, powered from wall power and interconnected through Ethernet.
    • The Ethernet is used only for debugging and reprogramming, while the radio is used for actual communication between motes.
    • Motes can be reprogrammed through a web interface.
    Available at: http://motelab.eecs.harvard.edu/
  • The Centralized Storage Model
    • A Database that collects readings from many Sensors.
    • Centralized: Storage, Indexing, Query Processing, Triggers, etc.
  • Centralized Storage I (available at: http://www.xbow.com/)
    • Crossbow’s MoteView software
    • No in-network Aggregation
    • No in-Network Filtering
  • Centralized Storage II (available at: http://telegraph.cs.berkeley.edu/tinydb/)
    • TinyDB: A Declarative Interface for Data Acquisition in Sensor Networks.
    • In-Network Aggregation
    • In-Network Filtering (i.e., WHERE clause)
    e.g., SELECT MAX(temp) FROM sensors
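To illustrate what in-network aggregation means for a query such as SELECT MAX(temp), here is a minimal sketch in which every node forwards a single partial maximum to its parent. The routing tree, node names and readings are made up for illustration; this is not TinyDB code or its API.

```python
from typing import Dict, List


def in_network_max(tree: Dict[str, List[str]],
                   readings: Dict[str, float],
                   node: str) -> float:
    """Return the partial MAX for the subtree rooted at `node`.

    Each node forwards one value (its partial maximum) to its parent,
    instead of relaying every raw reading to the sink.
    """
    partial = readings[node]
    for child in tree.get(node, []):
        partial = max(partial, in_network_max(tree, readings, child))
    return partial


# Hypothetical routing tree rooted at the sink, with made-up temperatures.
TREE = {"sink": ["a", "b"], "a": ["c", "d"], "b": []}
TEMPS = {"sink": 70.0, "a": 74.5, "b": 68.0, "c": 81.2, "d": 77.9}

if __name__ == "__main__":
    print(in_network_max(TREE, TEMPS, "sink"))
```

In this toy tree each non-sink node sends one message, whereas forwarding raw readings would make the leaves' values travel two hops each.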
  • Centralized Storage: Conclusions
    • Frameworks such as TinyDB:
      • Are suitable for continuous queries.
      • Push aggregation into the network but keep much of the processing at the sink.
    • New Challenges:
      • Many applications do not require the query to be evaluated continuously (e.g., average temperature in the last 6 months?)
      • In many applications there is no sink (e.g., remote deployments and mobile sensor nets)
      • Local storage on sensors keeps increasing (e.g., RISE and more recently imote2)
  • Our Model: In-Situ Data Storage
    • The data remains In-situ (at the generating site) in a sliding window fashion.
    • When required, users conduct on-demand queries to retrieve information of interest.
    A Network of Sensor Databases
    • Soil-Organism Monitoring
    • (Center for Conservation Biology, UCR)
      • A set of sensors monitors the CO2 levels in the soil over a large window of time.
      • Not a real-time application.
      • Many values may not be very interesting.
    In-Situ Data Storage: Motivation
      • D. Zeinalipour-Yazti, S. Neema, D. Gunopulos, V. Kalogeraki and W. Najjar,
      • "Data Acquisition in Sensor Networks with Large Memories", IEEE Intl. Workshop on Networking Meets Databases NetDB (ICDE'2005), Tokyo, Japan, 2005.
  • Flash Memory at a Glance
    • The most prevalent storage medium used for sensor devices is flash memory (NAND flash)
    • Fastest growing memory market (‘05: $8.7B, ‘06: $11B)
    • (NAND) Flash Advantages
    • Simple cell architecture (high capacity in a small surface)
    • Fast random access (50-80 μs) compared to 10-20 ms in disks
    • Economical reproduction
    • Shock resistant
    • Power efficient
    [Figures: surface-mount NAND flash; removable NAND devices]
  • Flash Memory at a Glance
    • Write-Constraint: Writing can only be performed at page granularity (256B~512B), and only after the respective page (and its respective 8KB~64KB block!) has been erased
    • Delete-Constraint: Erasure of a page can only be performed at block granularity (i.e., 8KB~64KB)
    • Wear-Constraint: Each page can only be written a limited number of times (typically 10,000-100,000)
    Asymmetric Read/Write Energy Cost (measurements using RISE, page size = 512 B): Read = 24 μJ, Write = 763 μJ, Block Erase = 425 μJ
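The three constraints can be made concrete with a toy model. The sketch below is an assumption-laden simplification, not a driver for any real flash chip: the class name and default sizes are hypothetical, and wear is tracked per block erase rather than per page write.

```python
class NandFlashSketch:
    """Toy model of the write, delete and wear constraints."""

    def __init__(self, n_blocks: int = 4, pages_per_block: int = 16,
                 max_erases: int = 10_000):
        self.pages_per_block = pages_per_block
        self.max_erases = max_erases
        # None marks an erased (writable) page.
        self.pages = [None] * (n_blocks * pages_per_block)
        self.erase_count = [0] * n_blocks

    def write_page(self, page_no: int, data: bytes) -> None:
        # Write-constraint: a page must be in the erased state before it is written.
        if self.pages[page_no] is not None:
            raise ValueError("page must be erased before it can be rewritten")
        self.pages[page_no] = data

    def erase_block(self, block_no: int) -> None:
        # Wear-constraint (simplified): a block survives a limited number of erases.
        if self.erase_count[block_no] >= self.max_erases:
            raise RuntimeError("block is worn out")
        # Delete-constraint: erasure always clears every page of the block.
        start = block_no * self.pages_per_block
        for p in range(start, start + self.pages_per_block):
            self.pages[p] = None
        self.erase_count[block_no] += 1
```

Updating a single page in place therefore requires erasing its whole block first, which is why an index for flash should avoid in-place updates and spread writes evenly.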
  • MicroHash Objectives
    • General Objectives
    • Provide efficient access to any record stored on flash by timestamp or value
    • Execute a wide spectrum of queries based on our index, similarly to generic DB indexes.
    • Design Objectives:
    • Avoid wearing out specific pages.
    • Minimize random access deletions of pages.
    • Minimize SRAM structures
      • SRAM is extremely limited (8KB-64KB).
      • Small memory-footprint => quick initialization.
  • Main Structures
    • 4 Page Types: a) Root Page, b) Directory Page, c) Index Page and d) Data Page
    • 4 Phases of Operation: a) Initialization, b) Growing, c) Repartition and d) Garbage Collection.
  • Growing the MicroHash Index
    • Collect data in an SRAM buffer page P_write
    • When P_write is full, flush it out to flash media
    • Next, create index records for each data record in P_write
    • If SRAM gets full, index pages are forced out to flash media by an LRU policy.
    [Figure: a record (ts, 74F) enters the buffer P_write; the Directory holds buckets labeled 40, 50, 60, 70, 80, 90 that point to Index Pages]
  • Growing the MicroHash Index [Figure: a populated flash media; idx marks the next empty page]
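The growing phase described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: flash is modeled as an append-only Python list, buckets have a fixed hypothetical width, index pages are never considered full, and all names are invented for the example.

```python
from collections import OrderedDict


class GrowingIndexSketch:
    def __init__(self, records_per_page=4, bucket_width=10,
                 max_index_pages_in_sram=2):
        self.records_per_page = records_per_page
        self.bucket_width = bucket_width
        self.max_index_pages_in_sram = max_index_pages_in_sram
        self.flash = []                     # append-only: ("data" | "index", payload)
        self.p_write = []                   # SRAM buffer page for incoming records
        self.index_buffers = OrderedDict()  # bucket -> index records, in LRU order

    def insert(self, ts, value):
        """Buffer one (timestamp, value) record; flush when the page is full."""
        self.p_write.append((ts, value))
        if len(self.p_write) == self.records_per_page:
            self._flush_data_page()

    def _flush_data_page(self):
        page_id = len(self.flash)
        self.flash.append(("data", list(self.p_write)))
        # Create one index record per data record in the flushed page.
        for offset, (_, value) in enumerate(self.p_write):
            bucket = int(value // self.bucket_width)
            self._add_index_record(bucket, (page_id, offset))
        self.p_write.clear()

    def _add_index_record(self, bucket, record):
        if (bucket not in self.index_buffers
                and len(self.index_buffers) == self.max_index_pages_in_sram):
            # SRAM is full: force the least recently used index page out to flash.
            old_bucket, old_records = self.index_buffers.popitem(last=False)
            self.flash.append(("index", (old_bucket, old_records)))
        self.index_buffers.setdefault(bucket, []).append(record)
        self.index_buffers.move_to_end(bucket)
```

Note how evicted index pages end up interleaved with data pages on the media, which is what later complicates searching by timestamp.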
  • Garbage Collection in MicroHash
    • When the media gets full some pages need to be deleted => delete the oldest pages.
    • Oldest Block? The next block following the idx pointer.
    • Note: this might create invalid index records, which are handled by our search algorithm.
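A minimal sketch of this circular deletion policy is shown below. It assumes pages are written strictly sequentially and that a whole block is erased when the write pointer reaches it; the class and field names are hypothetical, with `idx` and `cycle` borrowed from the root page description.

```python
class CircularFlashSketch:
    def __init__(self, n_blocks=3, pages_per_block=2):
        self.pages_per_block = pages_per_block
        self.pages = [None] * (n_blocks * pages_per_block)
        self.idx = 0     # next empty page
        self.cycle = 0   # number of times the write position has wrapped around

    def append_page(self, payload):
        """Write payload at idx, erasing the oldest block first if needed."""
        if self.pages[self.idx] is not None:
            # The media is full: the block at idx holds the oldest pages.
            self._erase_block(self.idx // self.pages_per_block)
        self.pages[self.idx] = payload
        written_at = self.idx
        self.idx = (self.idx + 1) % len(self.pages)
        if self.idx == 0:
            self.cycle += 1
        return written_at

    def _erase_block(self, block_no):
        start = block_no * self.pages_per_block
        for p in range(start, start + self.pages_per_block):
            self.pages[p] = None
```

An index record that still points at an erased or overwritten page is stale; a search routine would need to detect and skip such references.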
  • Directory Repartition in MicroHash
    • MicroHash starts out with a directory that is segmented into equi-width buckets
      • e.g., divide the temperature range [-40,250] into c buckets
    • This is not efficient, as certain buckets will not be utilized
      • Consider the first few or last few buckets below.
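The initial equi-width directory can be sketched as a simple mapping from a reading to a bucket number. The range [-40, 250] comes from the slide; the choice of c = 10 buckets and the function name are assumptions made for illustration.

```python
def equiwidth_bucket(value: float, lo: float = -40.0, hi: float = 250.0,
                     c: int = 10) -> int:
    """Map a reading in [lo, hi] to one of c equal-width buckets (0 .. c-1)."""
    if not lo <= value <= hi:
        raise ValueError("reading outside the indexed range")
    width = (hi - lo) / c
    # The upper bound `hi` itself is folded into the last bucket.
    return min(int((value - lo) / width), c - 1)


if __name__ == "__main__":
    # Typical outdoor temperatures touch only a couple of the ten buckets.
    print(sorted({equiwidth_bucket(v) for v in range(60, 91)}))
```

Readings between 60F and 90F fall into only two of the ten buckets here, which illustrates why a static equi-width directory wastes most of its entries.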
  • Directory Repartition in MicroHash
    • If bucket A links to more than τ index pages, evict the least used bucket B and segment bucket A into A and A’
    • We want to avoid bucket reassignments of old records, as this would be very expensive
    Example: τ=2. C: number of entries since the last split; S: timestamp of the last addition
  • Searching in MicroHash
    • Searching by value
    • “Find the timestamp(s) at which the temperature was 100F”
      • Simple operation in MicroHash
      • We simply find the right directory bucket, from there the respective index page, and then the data record (page-by-page)
    • Searching by timestamp
    • “Find the temperature of some sensor at a given timestamp tq”
      • Problem: Index pages are mixed together with data pages.
      • Solutions:
        • 1. Binary Search (O(log n), 18 pages for a 128MB media)
        • 2. LBSearch (fewer than 10 pages for a 128MB media)
        • 3. ScaleSearch (better than LBSearch, ~4.5 pages for a 128MB media)
  • LBSearch and ScaleSearch
    • Solutions to the Search-by-timestamp problem:
    • LBSearch: We recursively create a lower bound on the position of tq until the given timestamp is located.
    • ScaleSearch: Quite similar to LBSearch; however, in the first step we proceed more aggressively (by exploiting the data distribution)
    [Figure: query tq=500; successive pages labeled tq=300, tq=350, tq=420, tq=490, tq=500]
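The lower-bound idea can be sketched as follows. This is a simplified reading of LBSearch under explicit assumptions: one record per time unit, at most k records per data page, and every page (data or index) exposing the last timestamp written up to that point. The layout builder and all names are hypothetical; ScaleSearch's more aggressive first step is not implemented.

```python
from typing import List, Tuple

Page = Tuple[str, int]  # (kind, last_ts): kind is "data" or "index"


def build_media(n_records: int, k: int, index_every: int) -> List[Page]:
    """Hypothetical layout: k records per data page, and an index page
    (carrying the last known timestamp) after every `index_every` data pages."""
    pages: List[Page] = []
    data_pages = 0
    for first in range(0, n_records, k):
        last_ts = min(first + k, n_records) - 1
        pages.append(("data", last_ts))
        data_pages += 1
        if data_pages % index_every == 0:
            pages.append(("index", last_ts))
    return pages


def lb_search(pages: List[Page], tq: int, k: int) -> Tuple[int, int]:
    """Return (page_no, pages_read) for the data page holding timestamp tq."""
    if not 0 <= tq <= pages[-1][1]:
        raise ValueError("timestamp not stored on the media")
    pos, reads = 0, 0
    while True:
        _, last_ts = pages[pos]
        reads += 1
        if last_ts >= tq:
            return pos, reads
        # Each later page adds at most k timestamps, so tq cannot be stored
        # before page pos + ceil((tq - last_ts) / k): jump to that lower bound.
        pos += (tq - last_ts + k - 1) // k
```

Because the jump never overshoots, the first page whose timestamp reaches tq is the answer; interleaved index pages only make the bound less tight, which is the slack a scaled first guess tries to remove.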
  • Experimental Evaluation
    • Implemented MicroHash in nesC.
    • We tested it using TinyOS along with a trace-driven experimental methodology.
    • Datasets:
      • Washington State Climate
        • 268 MB dataset containing readings from 2000-2005.
      • Great Duck Island
        • 97,000 readings between October and November 2002.
    • Evaluation Parameters: i) Space Overhead, ii) Energy Overhead, iii) Search Performance
  • Space Overhead of Index
    • Index page overhead Φ = IndexPages/(DataPages+IndexPages)
    • Two Index page layouts
      • Offset, in which an index record has the form {datapageid, offset}
      • NoOffset, in which an index record has the form {datapageid}
    • 128 MB flash media (256,000 pages)
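The overhead metric Φ defined above is a plain ratio. The sketch below computes it; the page counts in the example are made up for illustration and are not measurements from the evaluation.

```python
def index_overhead(index_pages: int, data_pages: int) -> float:
    """Phi = IndexPages / (DataPages + IndexPages)."""
    total = index_pages + data_pages
    if total == 0:
        raise ValueError("the media holds no pages")
    return index_pages / total


if __name__ == "__main__":
    # Hypothetical split of a 256,000-page media.
    print(index_overhead(index_pages=64_000, data_pages=192_000))
```

A value of 0.25 would mean that one page in four on the media is spent on the index rather than on sensor readings.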
  • Space Overhead of Index [Figure: black denotes the index pages]. Increasing the buffer decreases the index overhead.
  • Search Performance
    • 128 MB flash media (256,000 pages), varied SRAM (buffer) size
    • 2 Index page layouts
      • Anchor: Index Pages store the last known timestamp
      • No Anchor: Timestamp is only stored in Data Pages
  • Indexing the Great Duck Island Trace
    • Used a 3KB index buffer and a 4MB flash card to store all 97,000 20-byte data readings.
      • The index pages never require more than 28% additional space
      • Indexing the records causes only a small increase in energy demand: the energy cost of storing the records on flash without an index is 3042 mJ
      • We were able to find any record by its timestamp with 4.75 page reads on average
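As a rough sanity check on these numbers, the data volume and write energy can be estimated from figures given elsewhere in the slides. This sketch assumes 512-byte pages and 763 μJ per page write (the RISE measurements) and ignores any per-page headers, so it is an estimate rather than a reproduction of the experiment.

```python
import math

N_READINGS = 97_000       # Great Duck Island readings
RECORD_BYTES = 20         # size of one data reading
PAGE_BYTES = 512          # assumption: page size from the RISE measurements
WRITE_UJ_PER_PAGE = 763   # assumption: page write energy from the RISE measurements


def data_pages_needed(n: int = N_READINGS, rec: int = RECORD_BYTES,
                      page: int = PAGE_BYTES) -> int:
    """Data pages required when records are packed without page headers."""
    per_page = page // rec
    return math.ceil(n / per_page)


def write_energy_mj(pages: int, uj_per_page: float = WRITE_UJ_PER_PAGE) -> float:
    """Energy in millijoules to write the given number of pages once."""
    return pages * uj_per_page / 1000.0


if __name__ == "__main__":
    pages = data_pages_needed()
    print(pages, "data pages,", write_energy_mj(pages), "mJ")
```

Under these assumptions the estimate is 3,880 data pages and about 2,960 mJ, the same order as the 3042 mJ quoted above; the gap may come from page headers, which this estimate leaves out.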
  • Conclusions & Future Work
    • We proposed the MicroHash index, an efficient external-memory hash index that addresses the distinct characteristics of flash memory
    • Our experimental evaluation shows that the structure we propose is both efficient and practical
    • Future work:
      • Develop a complete library of indexes and data structures (stacks, queues, b+trees, etc.)
      • Buffer optimizations and Online Compression
      • Support Range Queries
  • Indexing and Searching in Wireless Sensor Networks
    • Demetris Zeinalipour
    • Thank you!
    • Questions?
    • Related Publications
    • "MicroHash: An Efficient Index Structure for Flash-Based Sensor Devices", D. Zeinalipour, S. Lin, V. Kalogeraki, D. Gunopulos, W. Najjar, In USENIX FAST’05.
    • "Efficient Indexing Data Structures for Flash-Based Sensor Devices",
    • ACM Transactions on Storage (TOS), November 2006.
    Presentation and publications available at: http://is.ouc.ac.cy/~zeinalipour/
  • Page Types in MicroHash
    • Root Page
      • contains information related to the state of the flash media, e.g. it contains the position of the last write (idx), the current write cycle (cycle) and meta information about the various indexes stored on the flash media
    • Directory Page (the hash table)
      • contains a number of directory records (buckets) each of which contains the address of the last known index page mapped to this bucket.
    • Index Page
      • contains a fixed number of index records and the 8-byte timestamp of the last known data record. The latter field, denoted as the anchor, is exploited by timestamp searches.
    • Data Page
      • contains a fixed number of data records
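The four page types can be summarized as plain data structures. The field names `idx`, `cycle` and `anchor` follow the descriptions above; everything else (class names, Python types, the `index_meta` dictionary) is an assumption made for this sketch and does not reflect the on-flash byte layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class RootPage:
    idx: int = 0      # position of the last write
    cycle: int = 0    # current write cycle
    index_meta: Dict[str, str] = field(default_factory=dict)  # info on stored indexes


@dataclass
class DirectoryPage:
    # bucket number -> address of the last known index page mapped to that bucket
    buckets: Dict[int, Optional[int]] = field(default_factory=dict)


@dataclass
class IndexPage:
    records: List[Tuple[int, int]] = field(default_factory=list)  # (data_page_id, offset)
    anchor: int = 0   # timestamp of the last known data record (8 bytes on flash)


@dataclass
class DataPage:
    records: List[Tuple[int, float]] = field(default_factory=list)  # (timestamp, value)
```

A value lookup would walk from a `DirectoryPage` bucket to an `IndexPage` and then to the referenced `DataPage`, while a timestamp search can rely on the `anchor` field.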
  • Searching Bottlenecks
    • Index Pages written on flash might not be fully occupied
    • When we access these pages we transfer a lot of empty bytes (padding) between the flash media and SRAM.
    • Proposed Solutions:
      • Solution 1: Two-Phase Page Reads
      • Solution 2: ELF-like Chaining of Index Pages
  • Improving Search Performance
    • Solution 1: Utilize Two-Phase Page Reads.
      • First, read the 8B header from the flash media.
      • Then, in the next phase, read only the payload that the header describes.
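A two-phase read can be sketched as below. The 8-byte header size comes from the slide, but the header layout (a 2-byte payload length followed by 6 reserved bytes), the 512-byte page size and the function names are assumptions made for illustration.

```python
import struct
from typing import Tuple

PAGE_BYTES = 512
HEADER_BYTES = 8
HEADER_FORMAT = ">H6x"  # hypothetical: payload length (2 bytes) + 6 reserved bytes


def make_page(payload: bytes) -> bytes:
    """Build a full page: header, payload, then 0xFF padding."""
    if len(payload) > PAGE_BYTES - HEADER_BYTES:
        raise ValueError("payload does not fit in one page")
    header = struct.pack(HEADER_FORMAT, len(payload))
    return (header + payload).ljust(PAGE_BYTES, b"\xff")


def two_phase_read(page: bytes) -> Tuple[bytes, int]:
    """Return (payload, bytes_transferred) without touching the padding."""
    header = page[:HEADER_BYTES]                           # phase 1: header only
    (length,) = struct.unpack(HEADER_FORMAT, header)
    payload = page[HEADER_BYTES:HEADER_BYTES + length]     # phase 2: occupied bytes
    return payload, HEADER_BYTES + length
```

For a sparsely filled index page the transfer shrinks from the full page size to the header plus the occupied bytes.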
  • Improving Search Performance
    • Solution 2: Avoid non-full index pages using ELF*.
      • ELF: a linked list in which each page, other than the last page, is completely full.
      • ELF keeps copying the last non-full page into a newer page whenever new records are added.
    *Dai et al., Efficient Log Structured Flash File System, SenSys 2004
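The chaining idea can be sketched as follows. This is an illustrative toy, not the ELF file system: pages are Python tuples, the chain is a list instead of on-flash pointers, and the class name and counters are hypothetical.

```python
class ElfChainSketch:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.chain = []        # pages in chain order; each page is a tuple of records
        self.page_writes = 0   # number of flash page writes performed

    def append(self, new_records):
        """Add records so that every page except the last stays completely full."""
        records = list(new_records)
        if self.chain and len(self.chain[-1]) < self.capacity:
            # Flash pages cannot be updated in place: copy the non-full tail
            # into the page(s) about to be written and drop the old copy.
            records = list(self.chain.pop()) + records
        for i in range(0, len(records), self.capacity):
            self.chain.append(tuple(records[i:i + self.capacity]))
            self.page_writes += 1
```

Appending [1, 2], then [3], then [4, 5] costs four page writes but leaves only two pages to read, whereas writing each batch to its own partially filled page would cost three writes and three reads.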
  • Search Performance
    • We compared MicroHash vs. ELF Index Page Chaining by searching all values in the range [20,100]
    • Keeping index pages full increases search performance but decreases insertion performance.
    [Figure panels: decreased indexing performance using ELF (15% more writes); increased search performance using ELF (10% fewer reads)]