TinyDB Tutorial
TinyDB Tutorial Presentation Transcript

  • 1. Implementation and Research Issues in Query Processing for Wireless Sensor Networks. Wei Hong, Intel Research, Berkeley [email_address]; Sam Madden, MIT [email_address]. ICDE 2004
  • 2. Motivation
    • Sensor networks (aka sensor webs, emnets) are here
      • Several widely deployed HW/SW platforms
        • Low power radio, small processor, RAM/Flash
      • Variety of (novel) applications: scientific, industrial, commercial
      • Great platform for mobile + ubicomp experimentation
    • Real, hard research problems to be solved
      • Networking, systems, languages, databases
    • We will summarize:
      • The state of the art
      • Our experiences building TinyDB
      • Current and future research directions
    Berkeley Mote
  • 3. Sensor Network Apps
    • Habitat monitoring: storm petrels on Great Duck Island, microclimates on James Reserve (vs. traditional monitoring apparatus)
    • Earthquake monitoring in shake-test sites
    • Vehicle detection: sensors along a road collect data about passing vehicles
  • 4. Declarative Queries
    • Programming Apps is Hard
      • Limited power budget
      • Lossy, low bandwidth communication
      • Require long-lived, zero admin deployments
      • Distributed Algorithms
      • Limited tools, debugging interfaces
    • Queries abstract away much of the complexity
      • Burden on the database developers
      • Users get:
        • Safe, optimizable programs
        • Freedom to think about apps instead of details
  • 5. TinyDB: Prototype declarative query processor
    • Platform: Berkeley Motes + TinyOS
    • Continuous variant of SQL : TinySQL
    • Power and data-acquisition based in-network optimization framework
    • Extensible interface for aggregates, new types of sensors
  • 6. Agenda
    • Part 1 : Sensor Networks (50 Minutes)
      • TinyOS
      • NesC
    • Short Break
    • Part 2: TinyDB (1 Hour)
      • Data Model and Query Language
      • Software Architecture
    • Long Break + Hands On
    • Part 3: Sensor Network Database Research Directions (1 Hour, 10 Minutes)
  • 7. Part 1
    • Sensornet Background
    • Motes + Mote Hardware
      • TinyOS
      • Programming Model + NesC
    • TinyOS Architecture
      • Major Software Subsystems
      • Networking Services
  • 8. A Brief History of Sensornets
    • People have used sensors for a long time
    • Recent CS History:
      • (1998) Pottie + Kaiser: Radio based networks of sensors
      • (1998) Pister et al: Smart Dust
        • Initial focus on optical communication
        • By 1999, radio based networks, COTS Dust, “Motes”
      • (1999) Estrin + Govindan
        • Ad-hoc networks of sensors
      • (2000) Culler/Hill et al: TinyOS + Motes
      • (2002) Hill / Dust: SPEC, mm^3 scale computing
    • UCLA / USC / Berkeley Continue to Lead Research
      • Many other players now
      • TinyOS/Motes as most common platform
    • Emerging commercial space:
      • Crossbow, Ember, Dust, Sensicast, Moteiv, Intel
  • 9. Why Now?
    • Commoditization of radio hardware
      • Cellular and cordless phones, wireless communication
    • Low cost -> many/tiny -> new applications!
    • Real application for ad-hoc network research from the late 90’s
    • Coming together of EE + CS communities
  • 10. Motes
    • 4 MHz, 8-bit Atmel RISC uProcessor
    • 40 kbit/s radio
    • 4 KB RAM, 128 KB program flash, 512 KB data flash
    • AA battery pack
    • Based on TinyOS
    (Pictured: Mica Mote, Mica2Dot)
  • 11. History of Motes
    • Initial research goal wasn’t hardware
      • Has since become more of a priority with emerging hardware needs, e.g.:
        • Power consumption
        • (Ultrasonic) ranging + localization
          • MIT Cricket, NEST Project
        • Connectivity with diverse sensors
          • UCLA sensor board
      • Even so, now on the 5th generation of devices
        • Costs down to ~$50/node (Moteiv, Dust)
        • Greatly improved radio quality
        • Multitude of interfaces: USB, Ethernet, CF, etc.
        • Variety of form factors, packages
  • 12. Motes vs. Traditional Computing
    • Lossy, Adhoc Radio Communication
    • Sensing Hardware
    • Severe Power Constraints
  • 13. Radio Communication
    • Low Bandwidth Shared Radio Channel
      • ~40 kbit/s on motes
      • Much less in practice
        • Encoding, Contention for Media Access (MAC)
    • Very lossy: 30% base loss rate
      • Argues against TCP-like end-to-end retransmission
        • And for link-layer retries
    • Generally, not well behaved
    From Ganesan, et al. “Complex Behavior at Scale.” UCLA/CSD-TR 02-0013
  • 14. Types of Sensors
    • Sensors attach via daughtercard
    • Weather
      • Temperature
      • Light x 2 (high intensity PAR, low intensity, full spectrum)
      • Air Pressure
      • Humidity
    • Vibration
      • 2 or 3 axis accelerometers
    • Tracking
      • Microphone (for ranging and acoustic signatures)
      • Magnetometer
    • GPS
  • 15. Power Consumption and Lifetime
    • Power typically supplied by a small battery
      • 1000-2000 mAH
      • 1 mAH = 1 milliamp current for 1 hour
        • Typically at optimum voltage, current drain rates
      • Power = Watts (W) = Amps (A) * Volts (V)
      • Energy = Joules (J) = W * time
    • Lifetime, power consumption varies by application
      • Processor: 5mA active, 1 mA idle, 5 uA sleeping
      • Radio: 5 mA listen, 10 mA xmit/receive, ~20 ms/packet
      • Sensors: 1 uA to 100s of mA, 1 us to 1 s per sample (see the worked lifetime sketch below)
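    To make these numbers concrete, the following back-of-the-envelope sketch (plain C) estimates lifetime under a duty-cycled schedule. The current figures come from the bullets above; the 2% duty cycle and the 2000 mAh capacity are assumptions picked for illustration.

      #include <stdio.h>

      /* Back-of-the-envelope lifetime estimate from the figures above.
         The 2% duty cycle and the 2000 mAh capacity are assumed values. */
      int main(void) {
          double capacity_mAh = 2000.0;          /* AA pack, upper end of 1000-2000 mAh */
          double awake_mA     = 5.0 + 10.0;      /* processor active + radio xmit/receive */
          double sleep_mA     = 0.005;           /* 5 uA processor sleeping */
          double duty_cycle   = 0.02;            /* assumed: awake 2% of the time */

          double avg_mA     = duty_cycle * awake_mA + (1.0 - duty_cycle) * sleep_mA;
          double lifetime_h = capacity_mAh / avg_mA;

          printf("average draw: %.3f mA\n", avg_mA);            /* ~0.3 mA   */
          printf("lifetime: %.0f h (~%.0f days)\n",
                 lifetime_h, lifetime_h / 24.0);                /* ~270 days */
          return 0;
      }

    With these assumptions the average draw is about 0.3 mA, giving a lifetime of roughly nine months; idle (sleep) behavior and duty cycle dominate the outcome.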
  • 16.
    • Each mote collects 1 sample of (light,humidity) data every 10 seconds, forwards it
    • Each mote can “hear” 10 other motes
    • Process:
      • Wake up, collect samples (~ 1 second)
      • Listen to radio for messages to forward (~1 second)
      • Forward data
    Energy Usage in A Typical Data Collection Scenario
  • 17. Sensors: Slow, Power Hungry, Noisy
  • 18. Programming Sensornets: TinyOS
    • Component Based Programming Model
    • Suite of software components
      • Timers, clocks, clock synchronization
      • Single and multi-hop networking
      • Power management
      • Non-volatile storage management
  • 19. Programming Philosophy
    • Component Based
      • “Wiring” components together via interfaces and configurations
    • Split-Phased
      • Nothing blocks, ever.
      • Instead, completion events are signaled.
    • Highly Concurrent
      • Single thread of “tasks”, posted and scheduled FIFO
      • Events “fired” asynchronously in response to interrupts.
  • 20. NesC
    • C-like programming language with component model support
      • Compiles into GCC-compatible C
    • 3 types of files:
      • Interfaces
        • Set of function prototypes; no implementations or variables (see the interface sketch after this slide)
      • Modules
        • Provide (implement) zero or more interfaces
        • Require zero or more interfaces
        • May define module variables, scoped to functions in module
      • Configurations
        • Wire (connect) modules according to requires/provides relationship
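    For concreteness, a minimal interface file might look like the sketch below. The interface name and its commands/events are hypothetical, invented for illustration; the point is that an interface holds only prototypes, split between commands (called on the provider) and events (signaled back to the user).

      // Hypothetical nesC interface, for illustration only.
      interface SampleNotify {
        command result_t startSampling(uint16_t intervalMs);  // user -> provider
        event   result_t sampleReady(uint16_t value);         // provider -> user
      }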
  • 21. Component Example: Leds
    • module LedsC {
    • provides interface Leds;
    • }
    • implementation
    • {
    • uint8_t ledsOn;
    • enum {
    • RED_BIT = 1,
    • GREEN_BIT = 2,
    • YELLOW_BIT = 4
    • };
    …
    async command result_t Leds.redOn() {
      dbg(DBG_LED, "LEDS: Red on. ");
      atomic {
        TOSH_CLR_RED_LED_PIN();
        ledsOn |= RED_BIT;
      }
      return SUCCESS;
    }
    …
    }
  • 22. Configuration Example
    • configuration CntToLedsAndRfm {
    • }
    • implementation {
    • components Main, Counter, IntToLeds, IntToRfm, TimerC;
    • Main.StdControl -> Counter.StdControl;
    • Main.StdControl -> IntToLeds.StdControl;
    • Main.StdControl -> IntToRfm.StdControl;
    • Main.StdControl -> TimerC.StdControl;
    • Counter.Timer -> TimerC.Timer[unique("Timer")];
    • IntToLeds <- Counter.IntOutput;
    • Counter.IntOutput -> IntToRfm;
    • }
  • 23. Split Phase Example
    • module IntToRfmM { … }
    • implementation { …
    • command result_t IntOutput.output
    • (uint16_t value) {
    • IntMsg *message = (IntMsg *)data.data;
    • if (!pending) {
    • pending = TRUE;
    • message->val = value;
    • atomic {
    • message->src = TOS_LOCAL_ADDRESS;
    • }
    • if ( call Send.send(TOS_BCAST_ADDR,
    • sizeof(IntMsg), &data))
    • return SUCCESS;
    • pending = FALSE;
    • }
    • return FAIL;
    • }
    event result_t Send.sendDone(TOS_MsgPtr msg, result_t success) {
      if (pending && msg == &data) {
        pending = FALSE;
        signal IntOutput.outputComplete(success);
      }
      return SUCCESS;
    }
    }
  • 24. Major Components
    • Timers: Clock, TimerC, LogicalTime
    • Networking: Send, GenericComm, AMStandard, lib/Route
    • Power Management: HPLPowerManagement
    • Storage Management: EEPROM , MatchBox
  • 25. Timers
    • Clock : Basic abstraction over hardware timers; periodic events, single frequency.
    • LogicalTime : Fire an event some number of H:M:S:ms in the future.
    • TimerC : Multiplexes multiple periodic timers on top of LogicalTime (usage sketch below).
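    A sketched use of TimerC from an application module: start a periodic timer, and handle the fired event when it expires. The wiring and signatures follow the TinyOS 1.x conventions used elsewhere in this tutorial, but treat the details as approximate.

      // BlinkM.nc (sketch). Wiring, in a configuration:
      //   BlinkM.Timer -> TimerC.Timer[unique("Timer")];  BlinkM.Leds -> LedsC;
      module BlinkM {
        provides interface StdControl;
        uses interface Timer;
        uses interface Leds;
      }
      implementation {
        command result_t StdControl.init()  { return call Leds.init(); }
        command result_t StdControl.start() {
          // Ask for a fired() event every 1000 ms until stopped.
          return call Timer.start(TIMER_REPEAT, 1000);
        }
        command result_t StdControl.stop()  { return call Timer.stop(); }

        event result_t Timer.fired() {
          call Leds.redToggle();   // split-phase: this is the completion event
          return SUCCESS;
        }
      }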
  • 26. Radio Stack
    • Interfaces:
      • Send
        • Broadcast, or to a specific ID
        • split phase
      • Receive
        • asynchronous signal
    • Implementations:
      • AMStandard
        • Application specific messages
        • Id-based dispatch
      • GenericComm
        • AMStandard + Serial IO
      • Lib/Route
        • Multihop
    Sending (from IntToRfmM):
      IntMsg *message = (IntMsg *)data.data;
      …
      message->val = value;
      atomic { message->src = TOS_LOCAL_ADDRESS; }
      call Send.send(TOS_BCAST_ADDR, sizeof(IntMsg), &data);
    Receiving:
      event TOS_MsgPtr ReceiveIntMsg.receive(TOS_MsgPtr m) {
        IntMsg *message = (IntMsg *)m->data;
        call IntOutput.output(message->val);
        return m;
      }
    Wiring equates the IntMsg sender to the ReceiveIntMsg handler.
  • 27. Multihop Networking
    • Standard implementation “tree based routing”
    Problems: parent selection, asymmetric links, adaptation vs. stability.
    (Figure: nodes A-F broadcast and build a routing tree; each node maintains a neighbor link-quality table, e.g. node D: B .75, C .66, E .45, F .82; node C: A .5, B .44, D .53, F .35.)
  • 28. Geographic Routing
    • Any-to-any routing via geographic coordinates
      • See “GPSR”, MOBICOM 2000, Karp + Kung.
    • Requires coordinate system*
    • Requires endpoint coordinates
    • Hard to route around local minima (“holes”)
    *Could be virtual, as in Rao et al “Geographic Routing Without Coordinate Information.” MOBICOM 2003
  • 29. Power Management
    • HPLPowerManagement
      • TinyOS sleeps processor when possible
      • Observes the radio, sensor, and timer state
    • Application managed, for the most part
      • App. must turn off subsystems when not in use
      • Helper utility: ServiceScheduler
        • Periodically calls the “start” and “stop” methods of an app
      • More on power management in TinyDB later
      • Approach works because:
        • single application
        • no interactivity requirements
  • 30. Non-Volatile Storage
    • EEPROM
      • 512K off chip, 32K on chip
      • Writes at disk speeds, reads at RAM speeds
      • Interface : random access, read/write 256 byte pages
      • Maximum throughput ~10Kbytes / second
    • MatchBox Filing System
      • Provides a Unix-like file I/O interface
      • Single, flat directory
      • Only one file being read/written at a time
  • 31. TinyOS: Getting Started
    • The TinyOS home page:
      • http://webs.cs.berkeley.edu/tinyos
      • Start with the tutorials!
    • The CVS repository
      • http://sf.net/projects/tinyos
    • The NesC Project Page
      • http://sf.net/projects/nescc
    • Crossbow motes (hardware):
      • http://www.xbow.com
    • Intel Imote
      • www.intel.com/research/exploratory/motes.htm
  • 32. Part 2 The Design and Implementation of TinyDB
  • 33. Part 2 Outline
    • TinyDB Overview
    • Data Model and Query Language
    • TinyDB Java API and Scripting
    • Demo with TinyDB GUI
    • TinyDB Internals
    • Extending TinyDB
    • TinyDB Status and Roadmap
  • 34. TinyDB Revisited SELECT MAX (mag) FROM sensors WHERE mag > thresh SAMPLE PERIOD 64ms
    • High level abstraction:
      • Data centric programming
      • Interact with sensor network as a whole
      • Extensible framework
    • Under the hood:
      • Intelligent query processing: query optimization, power efficient execution
      • Fault Mitigation: automatically introduce redundancy, avoid problem areas
    (Diagram: the App sends queries and triggers to TinyDB and receives data back from the sensor network.)
  • 35. Feature Overview
    • Declarative SQL-like query interface
    • Metadata catalog management
    • Multiple concurrent queries
    • Network monitoring (via queries)
    • In-network, distributed query processing
    • Extensible framework for attributes, commands and aggregates
    • In-network, persistent storage
  • 36. Architecture (diagram): PC side: the TinyDB GUI and the TinyDB Client API, with a JDBC connection to a DBMS; mote side: the sensor network (motes 0-8) running the TinyDB query processor.
  • 37. Data Model
    • Entire sensor network as one single, infinitely-long logical table: sensors
    • Columns consist of all the attributes defined in the network
    • Typical attributes:
      • Sensor readings
      • Meta-data: node id, location, etc.
      • Internal states: routing tree parent, timestamp, queue length, etc.
    • Nodes return NULL for unknown attributes
    • On server, all attributes are defined in catalog.xml
    • Discussion: other alternative data models?
  • 38. Query Language (TinySQL)
    • SELECT <aggregates>, <attributes>
    • [FROM {sensors | <buffer>}]
    • [WHERE <predicates>]
    • [GROUP BY <exprs>]
    • [SAMPLE PERIOD <const> | ONCE]
    • [INTO <buffer>]
    • [TRIGGER ACTION <command>]
  • 39. Comparison with SQL
    • Single table in FROM clause
    • Only conjunctive comparison predicates in WHERE and HAVING
    • No subqueries
    • No column alias in SELECT clause
    • Arithmetic expressions limited to column op constant
    • Only fundamental difference: SAMPLE PERIOD clause
  • 40. TinySQL Examples
      • SELECT nodeid, nestNo, light
      • FROM sensors
      • WHERE light > 400
      • EPOCH DURATION 1s
    1: “Find the sensors in bright nests.”
    Result (table sensors):
      Epoch  Nodeid  nestNo  Light
      0      1       17      455
      0      2       25      389
      1      1       17      422
      1      2       25      405
  • 41. TinySQL Examples (cont.) “Count the number of occupied nests in each loud region of the island.”
      • SELECT region, CNT (occupied), AVG (sound)
      • FROM sensors
      • GROUP BY region
      • HAVING AVG (sound) > 200
      • EPOCH DURATION 10s
    Query 3 result (regions w/ AVG(sound) > 200):
      Epoch  region  CNT(…)  AVG(…)
      0      North   3       360
      0      South   3       520
      1      North   3       370
      1      South   3       520
    (Query 2 for comparison: SELECT AVG(sound) FROM sensors EPOCH DURATION 10s.)
  • 42. Event-based Queries
    • ON event SELECT …
    • Run query only when interesting events happens
    • Event examples
      • Button pushed
      • Message arrival
      • Bird enters nest
    • Analogous to triggers, but events are user-defined (see the example query below)
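    An illustrative event-based query in that style (the event name and the attribute list here are hypothetical; only the ON event SELECT pattern comes from the slide):

      ON EVENT bird-enters-nest:
        SELECT nodeid, light, temp
        FROM sensors
        SAMPLE PERIOD 1s

    The query body is ordinary TinySQL; the event simply gates when it starts running.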
  • 43. Query over Stored Data
    • Named buffers in Flash memory
    • Store query results in buffers
    • Query over named buffers
    • Analogous to materialized views
    • Example:
      • CREATE BUFFER name SIZE x (field1 type1, field2 type2, …)
      • SELECT a1, a2 FROM sensors SAMPLE PERIOD d INTO name
      • SELECT field1, field2, … FROM name SAMPLE PERIOD d
  • 44. Using the Java API
    • SensorQueryer
      • translateQuery() converts TinySQL string into TinyDBQuery object
      • Static query optimization
    • TinyDBNetwork
      • sendQuery() injects query into network
      • abortQuery() stops a running query
      • addResultListener() adds a ResultListener that is invoked for every QueryResult received
      • removeResultListener()
    • QueryResult
      • A complete result tuple, or
      • A partial aggregate result, call mergeQueryResult() to combine partial results
    • Key difference from JDBC: push vs. pull
  • 45. Writing Scripts with TinyDB
    • TinyDB’s text interface
      • java net.tinyos.tinydb.TinyDBMain -run "select …"
      • Query results are printed to the console
      • All motes get reset each time a new query is posed
    • Handy for writing scripts with shell, perl, etc.
  • 46. Using the GUI Tools
    • Demo time
  • 47. Inside TinyDB
    • Code footprint: ~10,000 lines of embedded C code, ~5,000 lines of (PC-side) Java, ~3,200 bytes RAM (w/ 768-byte heap), ~58 kB compiled code (3x larger than the 2nd largest TinyOS program)
    • (Diagram: queries such as SELECT AVG(temp) WHERE light > 400 enter the query processor, which sits on top of TinyOS and the multihop network; it consults the schema, acquires samples via get(‘temp’), applies the filter light > 400 and the aggregate avg(temp), and emits results such as T:1, AVG:225 and T:2, AVG:250.)
    • (Schema entry for ‘temp’: time to sample 50 uS, cost to sample 90 uJ, calibration table 3, units Deg. F, error ±5 Deg F, get function getTempFunc().)
  • 48. Tree-based Routing
    • Tree-based routing
      • Used in:
        • Query delivery
        • Data collection
        • In-network aggregation
      • Relationship to indexing?
    (Figure: the query Q, SELECT …, is flooded down the routing tree from the root to nodes A-F; results R:{…} flow back up the tree.)
  • 49. Power Management Approach
    • Coarse-grained app-controlled communication scheduling
    (Figure: motes 1-5 all wake for a 2-4 s waking period at the start of each epoch (10s-100s of seconds) and sleep the rest of the time.)
  • 50. Time Synchronization
    • All messages include a 5 byte time stamp indicating system time in ms
      • Synchronize (e.g. set system time to timestamp) with
        • Any message from parent
        • Any new query message (even if not from parent)
      • Punt on multiple queries
      • Timestamps written just after preamble is xmitted
    • All nodes agree that the waking period begins when (system time % epoch dur = 0)
      • And lasts for WAKING_PERIOD ms
    • Adjustment of the clock happens by changing the duration of the sleep cycle, not the wake cycle (see the sketch below).
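    A minimal sketch of that bookkeeping in C: given the synchronized system time, compute how long to sleep so that the next waking period starts on an epoch boundary. Any clock correction shows up only in this sleep interval; the waking period itself stays fixed.

      #include <stdint.h>

      /* Sleep until system_time % epoch_dur == 0, i.e. the start of the next waking period. */
      uint32_t ms_until_next_wakeup(uint32_t system_time_ms, uint32_t epoch_dur_ms) {
          uint32_t offset = system_time_ms % epoch_dur_ms;   /* position inside the epoch */
          return (offset == 0) ? 0 : (epoch_dur_ms - offset);
      }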
  • 51. Extending TinyDB
    • Why extend TinyDB?
      • New sensors -> attributes
      • New control/actuation -> commands
      • New data processing logic -> aggregates
      • New events
    • Analogous to concepts in object-relational databases
  • 52. Adding Attributes
    • Types of attributes
      • Sensor attributes: raw or cooked sensor readings
      • Introspective attributes: parent, voltage, ram usage, etc.
      • Constant attributes: constant values that can be statically or dynamically assigned to a mote, e.g., nodeid, location, etc.
  • 53. Adding Attributes (cont)
    • Interfaces provided by the Attr component (a registration sketch follows this slide)
      • StdControl: init, start, stop
      • AttrRegister
        • command registerAttr(name, type, len)
        • event getAttr(name, resultBuf, errorPtr)
        • event setAttr(name, val)
        • command getAttrDone(name, resultBuf, error)
      • AttrUse
        • command startAttr(attr)
        • event startAttrDone(attr)
        • command getAttrValue(name, resultBuf, errorPtr)
        • event getAttrDone(name, resultBuf, error)
        • command setAttrValue(name, val)
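    Putting the AttrRegister half together, a new sensor attribute component might look roughly like the sketch below. Only the command and event names come from the slide; the parameter types, the UINT16 and SCHEMA_RESULT_READY constants, the SchemaErrorNo type, and the readAccelX() stub are assumptions made for illustration.

      // Hypothetical attribute module exposing an "accel_x" attribute.
      module AccelAttrM {
        provides interface StdControl;
        uses interface AttrRegister;
      }
      implementation {
        uint16_t readAccelX() { return 0; }   // placeholder for a real sensor read

        command result_t StdControl.init() {
          // Make the attribute visible to the local query processor.
          return call AttrRegister.registerAttr("accel_x", UINT16, 2);
        }
        command result_t StdControl.start() { return SUCCESS; }
        command result_t StdControl.stop()  { return SUCCESS; }

        // TinyDB asks for the current value when a query references accel_x.
        event result_t AttrRegister.getAttr(char *name, char *resultBuf,
                                            SchemaErrorNo *errorPtr) {
          *(uint16_t *)resultBuf = readAccelX();
          *errorPtr = SCHEMA_RESULT_READY;
          return call AttrRegister.getAttrDone(name, resultBuf, *errorPtr);
        }

        event result_t AttrRegister.setAttr(char *name, char *val) {
          return SUCCESS;   // read-only attribute, nothing to set
        }
      }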
  • 54. Adding Attributes (cont)
    • Steps to adding attributes to TinyDB
      • Create attribute nesC components
      • Wire new attribute components to TinyDBAttr configuration
      • Reprogram TinyDB motes
      • Add new attribute entries to catalog.xml
    • Constant attributes can be added on the fly through TinyDB GUI
  • 55. Adding Aggregates
    • Step 1: wire new nesC components
  • 56. Adding Aggregates (cont)
    • Step 2: add entry to catalog.xml
      • <aggregate>
      • <name>AVG</name>
      • <id>5</id>
      • <temporal>false</temporal>
      • <readerClass>net.tinyos.tinydb.AverageClass</readerClass>
      • </aggregate>
    • Step 3 (optional): implement reader class in Java
      • a reader class interprets and finalizes aggregate state received from the mote network and returns the final result as a string for display.
  • 57. TinyDB Status
    • Latest release shipped with TinyOS 1.1 (9/03)
      • Install the task-tinydb package in TinyOS 1.1 distribution
      • First release in TinyOS 1.0 (9/02)
      • Widely used by research groups as well as industry pilot projects
    • Successful deployments in Intel Berkeley Lab and redwood trees at UC Botanical Garden
      • Largest deployment: ~80 weather station nodes
      • Network longevity: 4-5 months
  • 58. The Redwood Tree Deployment
    • Redwood Grove in UC Botanical Garden, Berkeley
    • Collect dense sensor readings to monitor climatic variations across
      • altitudes,
      • angles,
      • time,
      • forest locations, etc.
    • Versus sporadic monitoring points with 30lb loggers!
    • Current focus: study how dense sensor data affect predictions of conventional tree-growth models
  • 59. Data from Redwoods (node placement by height, tree ~36 m tall): 33 m: node 111; 32 m: node 110; 30 m: nodes 109, 108, 107; 20 m: nodes 106, 105, 104; 10 m: nodes 103, 102, 101
  • 60. TinyDB Roadmap (near term)
    • Support for high frequency sampling
      • Equipment vibration monitoring, structural monitoring, etc.
      • Store and forward
      • Bulk reliable data transfer
      • Scheduling of communications
    • Port to Intel Mote
    • Deployment in Intel Fab equipment monitoring application and the Golden Gate Bridge monitoring application
  • 61. For more information
    • http://berkeley.intel-research.net/tinydb or http://triplerock.cs.berkeley.edu/tinydb
  • 62. Part 3
    • Database Research Issues in Sensor Networks
  • 63. Sensor Network Research
    • Very active research area
      • Can’t summarize it all
    • Focus: database-relevant research topics
      • Some outside of Berkeley
      • Other topics that are itching to be scratched
      • But , some bias towards work that we find compelling
  • 64. Topics
    • In-network aggregation
    • Acquisitional Query Processing
    • Heterogeneity
    • Intermittent Connectivity
    • In-network Storage
    • Statistics-based summarization and sampling
    • In-network Joins
    • Adaptivity and Sensor Networks
    • Multiple Queries
  • 65. Topics
    • In-network aggregation
    • Acquisitional Query Processing
    • Heterogeneity
    • Intermittent Connectivity
    • In-network Storage
    • Statistics-based summarization and sampling
    • In-network Joins
    • Adaptivity and Sensor Networks
    • Multiple Queries
  • 66. Tiny Aggregation (TAG)
    • In-network processing of aggregates
      • Common data analysis operation
        • A.k.a. the gather operation, or reduction in parallel programming
      • Communication reducing
        • Operator dependent benefit
      • Across nodes during same epoch
    • Exploit query semantics to improve efficiency!
    Madden, Franklin, Hellerstein, Hong. Tiny AGgregation (TAG), OSDI 2002.
  • 67. Basic Aggregation
    • In each epoch:
      • Each node samples local sensors once
      • Generates partial state record ( PSR )
        • local readings
        • readings from children
      • Outputs PSR during assigned comm. interval
    • At end of epoch, PSR for whole network output at root
    • New result on each successive epoch
    • Extras:
      • Predicate-based partitioning via GROUP BY
  • 68.-72. Illustration: Aggregation (animation frames). For the query SELECT COUNT(*) FROM sensors over a 5-node network, each node transmits its partial count during its assigned communication interval; partial counts accumulate up the routing tree, and by interval 1 the root holds the network-wide count of 5. The next epoch then begins and the process repeats.
  • 73. Aggregation Framework
    • As in extensible databases, TinyDB supports any aggregation function conforming to:
    Agg_n = {f_init, f_merge, f_evaluate}
      f_init{a0} -> <a0>
      f_merge{<a1>, <a2>} -> <a12>
      f_evaluate{<a1>} -> aggregate value
    Example: Average
      AVG_init{v} -> <v, 1>
      AVG_merge{<S1, C1>, <S2, C2>} -> <S1 + S2, C1 + C2>
      AVG_evaluate{<S, C>} -> S / C
    Here <a> is a Partial State Record (PSR). Restriction: merge must be associative and commutative. (A C sketch of AVG in this form follows below.)
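    The AVG example, transcribed directly into C as a sketch (TinyDB's real aggregate interface differs in its details; the types here are simplified):

      /* Partial State Record (PSR) for AVG: a running sum and count. */
      typedef struct { long sum; int count; } AvgPSR;

      /* f_init: turn one local reading into a PSR. */
      AvgPSR avg_init(int v) { AvgPSR p = { v, 1 }; return p; }

      /* f_merge: combine two PSRs; associative and commutative, as required. */
      AvgPSR avg_merge(AvgPSR a, AvgPSR b) {
          AvgPSR p = { a.sum + b.sum, a.count + b.count };
          return p;
      }

      /* f_evaluate: produce the final aggregate value at the root. */
      double avg_evaluate(AvgPSR p) { return (double)p.sum / p.count; }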
  • 74. Taxonomy of Aggregates
    • TAG insight: classify aggregates according to various functional properties
      • Yields a general set of optimizations that can automatically be applied
    Property               | Examples                                    | Affects
    Partial State          | MEDIAN: unbounded, MAX: 1 record            | Effectiveness of TAG
    Exemplary vs. Summary  | MAX: exemplary, COUNT: summary              | Applicability of Sampling, Effect of Loss
    Duplicate Sensitivity  | MIN: dup. insensitive, AVG: dup. sensitive  | Routing Redundancy
    Monotonicity           | COUNT: monotonic, AVG: non-monotonic        | Hypothesis Testing, Snooping
    Drives an API!
  • 75. Use Multiple Parents
    • Use graph structure
      • Increase delivery probability with no communication overhead
    • For duplicate insensitive aggregates, or
    • Aggs expressible as sum of parts
      • Send (part of) aggregate to all parents
        • In just one message, via multicast
      • Assuming independence , decreases variance
    SELECT COUNT(*). Let P(link xmit successful) = p, so P(success from A -> R) = p^2.
    Single parent: E(cnt) = c * p^2, Var(cnt) = c^2 * p^2 * (1 - p^2), call it V.
    With n parents, sending c/n to each: E(cnt) = n * (c/n) * p^2, Var(cnt) = n * (c/n)^2 * p^2 * (1 - p^2) = V/n.
    (Figure: node A two hops from root R; with n = 2 it sends c/2 to each of parents B and C.)
  • 76. Multiple Parents Results
    • Better than previous analysis expected!
    • Losses aren’t independent!
    • Insight: spreads data over many links
    (Graphs: without splitting, all data crosses a single critical link; with splitting over multiple parents, the data is spread over many links.)
  • 77. Acquisitional Query Processing (ACQP)
    • TinyDB acquires AND processes data
      • Could generate an infinite number of samples
    • An acquisitional query processor controls
      • when,
      • where,
      • and with what frequency data is collected!
    • Versus traditional systems where data is provided a priori
    Madden, Franklin, Hellerstein, and Hong. The Design of An Acquisitional Query Processor. SIGMOD, 2003.
  • 78. ACQP: What’s Different?
    • How should the query be processed?
      • Sampling as a first class operation
    • How does the user control acquisition?
      • Rates or lifetimes
      • Event-based triggers
    • Which nodes have relevant data?
      • Index-like data structures
    • Which samples should be transmitted?
      • Prioritization, summary, and rate control
  • 79. Operator Ordering: Interleave Sampling + Selection
    • SELECT light, mag
    • FROM sensors
    • WHERE pred1(mag)
    • AND pred2(light)
    • EPOCH DURATION 1s
    • E(sampling mag) >> E(sampling light)
      • 1500 uJ vs. 90 uJ
    At 1 sample/sec, total power savings could be as much as 3.5 mW, comparable to the processor!
    (Plan diagrams: the traditional DBMS plan applies σ(pred1) and σ(pred2) above both sampling operators for mag and light; the ACQP plan interleaves them, taking the cheap light sample and applying σ(pred2) before paying for the costly mag sample. This is the correct ordering unless pred1 is very selective and pred2 is not. A worked cost sketch follows below.)
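    A small worked version of that comparison (plain C). The 90 uJ and 1500 uJ per-sample costs come from the slide; the 10% selectivity for pred2(light) is an assumed value.

      #include <stdio.h>

      int main(void) {
          double e_light = 90.0, e_mag = 1500.0;  /* uJ per sample (from the slide) */
          double sel_light = 0.1;                 /* assumed: pred2(light) passes 10% */

          /* Traditional plan: acquire both attributes, then filter. */
          double both_first  = e_mag + e_light;
          /* ACQP plan: sample light, and sample mag only if pred2(light) holds. */
          double light_first = e_light + sel_light * e_mag;

          printf("acquire both: %.0f uJ per epoch\n", both_first);   /* 1590 */
          printf("light first:  %.0f uJ per epoch\n", light_first);  /*  240 */
          return 0;
      }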
  • 80. Exemplary Aggregate Pushdown
    • SELECT WINMAX(light,8s,8s)
    • FROM sensors
    • WHERE mag > x
    • EPOCH DURATION 1s
    • Novel, general pushdown technique
    • Mag sampling is the most expensive operation!
    (Plan diagrams: the traditional DBMS plan samples mag and light, then applies σ(mag > x) and WINMAX; the ACQP plan samples light first, applies σ(light > current MAX), and only then samples mag and applies σ(mag > x).)
  • 81. Topics
    • In-network aggregation
    • Acquisitional Query Processing
    • Heterogeneity
    • Intermittent Connectivity
    • In-network Storage
    • Statistics-based summarization and sampling
    • In-network Joins
    • Adaptivity and Sensor Networks
    • Multiple Queries
  • 82. Heterogeneous Sensor Networks
    • Leverage small numbers of high-end nodes to benefit large numbers of inexpensive nodes
    • Still must be transparent and ad-hoc
    • Key to scalability of sensor networks
    • Interesting heterogeneities
      • Energy: battery vs. outlet power
      • Link bandwidth: Chipcon vs. 802.11x
      • Computing and storage: ATMega128 vs. Xscale
      • Pre-computed results
      • Sensing nodes vs. QP nodes
  • 83. Computing Heterogeneity with TinyDB
    • Separate query processing from sensing
      • Provide query processing on a small number of nodes
      • Attract packets to query processors based on “service value”
    • Compare the total energy consumption of the network
    • No aggregation
    • All aggregation
    • Opportunistic aggregation
    • HSN proactive aggregation
    Mark Yarvis and York Liu, Intel’s Heterogeneous Sensor Network Project, ftp://download.intel.com/research/people/HSN_IR_Day_Poster_03.pdf .
  • 84. 5x7 TinyDB/HSN Mica2 Testbed
  • 85. Data Packet Saving
    • How many aggregators are desired?
    • Does placement matter?
    11% of nodes serving as aggregators achieves 72% of the maximum data reduction; the optimal placement is about 2/3 of the distance from the sink.
  • 86. Occasionally Connected Sensornets (diagram): several sensornet patches running the TinyDB query processor connect through fixed and mobile gateways (GTWY) and the internet to a TinyDB server.
  • 87. Occasionally Connected Sensornets Challenges
    • Networking support
      • Tradeoff between reliability, power consumption and delay
      • Data custody transfer: duplicates?
      • Load shedding
      • Routing of mobile gateways
    • Query processing
      • Operation placement: in-network vs. on mobile gateways
      • Proactive pre-computation and data movement
    • Tight interaction between networking and QP
    Fall, Hong and Madden, Custody Transfer for Reliable Delivery in Delay Tolerant Networks, http://www.intel-research.net/Publications/Berkeley/081220030852_157.pdf .
  • 88. Distributed In-network Storage
    • Collectively, sensornets have large amounts of in-network storage
    • Good for in-network consumption or caching
    • Challenges
      • Distributed indexing for fast query dissemination
      • Resilience to node or link failures
      • Graceful adaptation to data skews
      • Minimizing index insertion/maintenance cost
  • 89. Example: DIM
    • Functionality
      • Efficient range query for multidimensional data.
    • Approaches
      • Divide sensor field into bins.
      • Locality-preserving mapping from m-dimensional space to geographic locations.
      • Use geographic routing such as GPSR .
    • Assumptions
      • Nodes know their locations and network boundary
      • No node mobility
    Xin Li, Young Jin Kim, Ramesh Govindan and Wei Hong, Distributed Index for Multi-dimensional Data (DIM) in Sensor Networks, SenSys 2003. (Example from the figure: events E1 = <0.7, 0.8>, E2 = <0.6, 0.7>; range query Q1 = <.5-.7, .5-1>.)
  • 90. Statistical Techniques
    • Approximations, summaries, and sampling based on statistics and statistical models
    • Applications:
      • Limited bandwidth and large number of nodes -> data reduction
      • Lossiness -> predictive modeling
      • Uncertainty -> tracking correlations and changes over time
      • Physical models -> improved query answering
  • 91. Correlated Attributes
    • Data in sensor networks is correlated; e.g.,
      • Temperature and voltage
      • Temperature and light
      • Temperature and humidity
      • Temperature and time of day
      • etc.
  • 92. IDSQ
    • Idea: task sensors in order of best improvement to estimate of some value:
      • Choose leader(s)
        • Suppress subordinates
        • Task subordinates, one at a time
          • Until some measure of goodness (error bound) is met
            • E.g. the “Mahalanobis Distance”, which accounts for correlations in the axes and tends to favor minimizing the principal axis (formula below)
    See “Scalable Information-Driven Sensor Querying and Routing for ad hoc Heterogeneous Sensor Networks.” Chu, Haussecker and Zhao. Xerox TR P2001-10113. May, 2001.
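    For reference, the Mahalanobis distance of a point x from a distribution with mean \mu and covariance \Sigma (the standard definition, not anything specific to IDSQ) is

      d_M(x) = \sqrt{ (x - \mu)^{\top} \Sigma^{-1} (x - \mu) }

    Unlike Euclidean distance, it weights each direction by the variance of the data, which is what lets it account for correlations between axes.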
  • 93. Graphical Representation: model the location estimate as a point with 2-dimensional Gaussian uncertainty. (Figure: candidate sensors S1 and S2 leave residuals of equal area; S1 is preferred because it reduces error along the principal axis.)
  • 94. MQSN: Model-based Probabilistic Querying over Sensor Networks. A query processor maintains a probabilistic model of the sensor field (nodes 1-9). Joint work with Amol Deshpande, Carlos Guestrin, and Joe Hellerstein.
  • 95.-96. A probabilistic query such as “select NodeID, Temp ± 0.1C where NodeID in [1..9] with conf(0.95)” is answered by consulting the model, which yields an observation plan, e.g. [Temp, 3], [Temp, 9].
  • 97. The planned observations ([Temp, 3] = …, [Temp, 9] = …) are acquired from the network and used to update the model, and the query results are returned.
  • 98. Challenges
    • What kind of models to use ?
    • Optimization problem:
      • Given a model and a query, find the best set of attributes to observe
      • Cost not easy to measure
        • Non-uniform network communication costs
        • Changing network topologies
      • Large plan space
        • Might be cheaper to observe attributes not in query
          • e.g. Voltage instead of Temperature
        • Conditional Plans:
          • Change the observation plan based on observed values
  • 99. MQSN: Current Prototype
    • Multi-variate Gaussian Models
      • Kalman Filters to capture correlations across time
    • Handles:
      • Range predicate queries
        • sensor value within [x,y], w/ confidence
      • Value queries
        • sensor value = x, w/in epsilon, w/ confidence
      • Simple aggregate queries
        • AVG(sensor value) = n, w/in epsilon, w/ confidence
    • Uses a greedy algorithm to choose the observation plan
  • 100. In-Net Regression
    • Linear regression : a simple way to predict future values and identify outliers (see the formula below)
    • Regression can be across local or remote values, multiple dimensions, or with high degree polynomials
      • E.g., node A readings vs. node B’s
      • Or, location (X,Y), versus temperature
        • E.g., over many nodes
    Guestrin, Thibaux, Bodik, Paskin, Madden. “Distributed Regression: an Efficient Framework for Modeling Sensor Network Data .” Under submission.
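    The fit underneath is ordinary least squares: for a vector of readings y and a matrix of explanatory variables X (e.g. another node's readings, or location coordinates), the coefficients and predictions are

      \hat{\beta} = (X^{\top} X)^{-1} X^{\top} y, \qquad \hat{y} = X \hat{\beta}

    This is stated only for context; the distributed, kernel-based formulation in the cited paper is more involved.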
  • 101. In-Net Regression (Continued)
    • Problem: may require data from all sensors to build model
    • Solution: partition sensors into overlapping “kernels” that influence each other
      • Run regression in each kernel
        • Requiring just local communication
      • Blend data between kernels
      • Requires some clever matrix manipulation
    • End result: regressed model at every node
      • Useful in failure detection, missing value estimation
  • 102. Exploiting Correlations in Query Processing
    • Simple idea:
      • Given predicate P(A) over expensive attribute A
      • Replace it with a predicate P’ over a cheap attribute A’ such that P’ predicts the outcome of P
      • Problem: unless A and A’ are perfectly correlated, P’ ≠ P for all time
        • So we could incorrectly accept or reject some readings
    • Alternative: use correlations to improve selectivity estimates in query optimization
      • Construct conditional plans that vary predicate order based on prior observations
  • 103. Exploiting Correlations (Cont.)
    • Insight: by observing a (cheap and correlated) variable not involved in the query, it may be possible to improve query performance
      • Improves estimates of selectivities
    • Use conditional plans
    • Example
    Example: branch on Time in [6pm, 6am] (T/F) and order the predicates Light > 100 Lux and Temp < 20° C accordingly. Each predicate costs 100 to evaluate. When the selectivities are .5 and .5, either ordering has expected cost 100 + .5 * 100 = 150; when they are .1 and .9 (e.g. the light predicate at night), putting the selective predicate first gives expected cost 100 + .1 * 100 = 110.
  • 104. In-Network Join Strategies
    • Types of joins:
      • non-sensor -> sensor
      • sensor -> sensor
    • Optimization questions:
      • Should the join be pushed down?
      • If so, where should it be placed?
      • What if a join table exceeds the memory available on one node?
  • 105. Choosing Where to Place Operators
    • Idea : choose a “join node” to run the operator
    • Over time, explore other candidate placements
      • Nodes advertise data rates to their neighbors
      • Neighbors compute expected cost of running the join based on these rates
      • Neighbors advertise costs
      • Current join node selects a new, lower cost node
    Bonfils + Bonnet, Adaptive and Decentralized Operator Placement for In-Network Query Processing, IPSN 2003.
  • 106. Topics
    • In-network aggregation
    • Acquisitional Query Processing
    • Heterogeneity
    • Intermittent Connectivity
    • In-network Storage
    • Statistics-based summarization and sampling
    • In-network Joins
    • Adaptivity and Sensor Networks
    • Multiple Queries
  • 107. Adaptivity In Sensor Networks
    • Queries are long running
    • Selectivities change
      • E.g. night vs day
    • Network load and available energy vary
    • All suggest that some adaptivity is needed
      • Of data rates or granularity of aggregation when optimizing for lifetimes
      • Of operator orderings or placements when selectivities change (c.f., conditional plans for correlations)
    • As far as we know, this is an open problem!
  • 108. Multiple Queries and Work Sharing
    • As sensornets evolve, users will run many queries simultaneously
      • E.g., traffic monitoring
    • Likely that queries will be similar
      • But have different end points, parameters, etc
    • Would like to share processing, routing as much as possible
    • But how? Again, an open problem.
  • 109. Concluding Remarks
    • Sensor networks are an exciting emerging technology, with a wide variety of applications
    • Many research challenges in all areas of computer science
      • Database community included
      • Some agreement that a declarative interface is right
    • TinyDB and other early work are an important first step
    • But there’s lots more to be done!