TinyDB Tutorial
Speaker notes (excerpts):
  • Light = 0.6 mJ/sample (~1 s per acquisition); pressure = 0.00875 mJ/sample (~0.025 s). With each sensor having 10 neighbors and 1 child, the processor spends 1 s waiting for samples and 1 s receiving/forwarding data = 10 mJ; the radio uses 0.07 mJ sampling, ~1 mJ receiving, ~0.2 mJ sending. Roughly 84% of energy goes to processing, 5% to sensing, 11% to the radio; idle time while sensing and communicating dominates power.
  • Parameterized interfaces are needed for LogicalTime and TimerC.
  • A communication interval is a slot within a longer epoch; interval assignment and the basic aggregation process are shown later.
  • When a new node arrives or a node leaves, notice the number of empty slots.
  • Properties give us effective, transparent optimization and extensibility.
  • Transcript

    • 1. Implementation and Research Issues in Query Processing for Wireless Sensor Networks Wei Hong Intel Research, Berkeley [email_address] Sam Madden MIT [email_address] ICDE 2004
    • 2. Motivation
      • Sensor networks (aka sensor webs, emnets) are here
        • Several widely deployed HW/SW platforms
          • Low power radio, small processor, RAM/Flash
        • Variety of (novel) applications: scientific, industrial, commercial
        • Great platform for mobile + ubicomp experimentation
      • Real, hard research problems to be solved
        • Networking, systems, languages, databases
      • We will summarize:
        • The state of the art
        • Our experiences building TinyDB
        • Current and future research directions
      Berkeley Mote
    • 3. Sensor Network Apps
      • Habitat monitoring: storm petrels on Great Duck Island, microclimates on the James Reserve. [Photo: traditional monitoring apparatus.]
      • Earthquake monitoring in shake-test sites.
      • Vehicle detection: sensors along a road collect data about passing vehicles.
    • 4. Declarative Queries
      • Programming Apps is Hard
        • Limited power budget
        • Lossy, low bandwidth communication
        • Require long-lived, zero admin deployments
        • Distributed Algorithms
        • Limited tools, debugging interfaces
      • Queries abstract away much of the complexity
        • Burden on the database developers
        • Users get:
          • Safe, optimizable programs
          • Freedom to think about apps instead of details
    • 5. TinyDB: Prototype declarative query processor
      • Platform: Berkeley Motes + TinyOS
      • Continuous variant of SQL : TinySQL
      • Power and data-acquisition based in-network optimization framework
      • Extensible interface for aggregates, new types of sensors
    • 6. Agenda
      • Part 1 : Sensor Networks (50 Minutes)
        • TinyOS
        • NesC
      • Short Break
      • Part 2: TinyDB (1 Hour)
        • Data Model and Query Language
        • Software Architecture
      • Long Break + Hands On
      • Part 3: Sensor Network Database Research Directions (1 Hour, 10 Minutes)
    • 7. Part 1
      • Sensornet Background
      • Motes + Mote Hardware
        • TinyOS
        • Programming Model + NesC
      • TinyOS Architecture
        • Major Software Subsystems
        • Networking Services
    • 8. A Brief History of Sensornets
      • People have used sensors for a long time
      • Recent CS History:
        • (1998) Pottie + Kaiser: Radio based networks of sensors
        • (1998) Pister et al: Smart Dust
          • Initial focus on optical communication
          • By 1999, radio based networks, COTS Dust, “Motes”
        • (1999) Estrin + Govindan
          • Ad-hoc networks of sensors
        • (2000) Culler/Hill et al: TinyOS + Motes
        • (2002) Hill / Dust: SPEC, mm^3 scale computing
      • UCLA / USC / Berkeley Continue to Lead Research
        • Many other players now
        • TinyOS/Motes as most common platform
      • Emerging commercial space:
        • Crossbow, Ember, Dust, Sensicast, Moteiv, Intel
    • 9. Why Now?
      • Commoditization of radio hardware
        • Cellular and cordless phones, wireless communication
      • Low cost -> many/tiny -> new applications!
      • Real application for ad-hoc network research from the late 90’s
      • Coming together of EE + CS communities
    • 10. Motes: Mica mote with a 4 MHz, 8-bit Atmel RISC µprocessor; 40 kbit radio; 4 KB RAM, 128 KB program flash, 512 KB data flash; AA battery pack; runs TinyOS. (Pictured: Mica mote and Mica2Dot.)
    • 11. History of Motes
      • Initial research goal wasn’t hardware
        • Has since become more of a priority with emerging hardware needs, e.g.:
          • Power consumption
          • (Ultrasonic) ranging + localization
            • MIT Cricket, NEST Project
          • Connectivity with diverse sensors
            • UCLA sensor board
        • Even so, now on the 5th generation of devices
          • Costs down to ~$50/node (Moteiv, Dust)
          • Greatly improved radio quality
          • Multitude of interfaces: USB, Ethernet, CF, etc.
          • Variety of form factors, packages
    • 12. Motes vs. Traditional Computing
      • Lossy, Adhoc Radio Communication
      • Sensing Hardware
      • Severe Power Constraints
    • 13. Radio Communication
      • Low Bandwidth Shared Radio Channel
        • ~40 kbits/sec on motes
        • Much less in practice
          • Encoding, Contention for Media Access (MAC)
      • Very lossy: 30% base loss rate
        • Argues against TCP-like end-to-end retransmission
          • And for link-layer retries
      • Generally, not well behaved
      From Ganesan, et al. “Complex Behavior at Scale.” UCLA/CSD-TR 02-0013
    • 14. Types of Sensors
      • Sensors attach via daughtercard
      • Weather
        • Temperature
        • Light x 2 (high intensity PAR, low intensity, full spectrum)
        • Air Pressure
        • Humidity
      • Vibration
        • 2 or 3 axis accelerometers
      • Tracking
        • Microphone (for ranging and acoustic signatures)
        • Magnetometer
      • GPS
    • 15. Power Consumption and Lifetime
      • Power typically supplied by a small battery
        • 1000-2000 mAH
        • 1 mAH = 1 milliamp current for 1 hour
          • Typically at optimum voltage, current drain rates
        • Power = Watts (W) = Amps (A) * Volts (V)
        • Energy = Joules (J) = W * time
      • Lifetime, power consumption varies by application
        • Processor: 5 mA active, 1 mA idle, 5 uA sleeping
        • Radio: 5 mA listening, 10 mA xmit/receive, ~20 ms/packet
        • Sensors: 1 uA to 100s of mA, 1 us to 1 s per sample
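      A back-of-the-envelope lifetime estimate in the style of this slide (the 10% duty cycle and the use of processor current alone are illustrative assumptions, not measured values):
        Average current ≈ 0.1 * 5 mA + 0.9 * 5 uA ≈ 0.5 mA
        Lifetime ≈ 2000 mAH / 0.5 mA = 4000 hours ≈ 5.5 months
      Radio listening and sensing only add to the average draw, which is why duty-cycling every subsystem matters.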
    • 16. Energy Usage in a Typical Data Collection Scenario
      • Each mote collects 1 sample of (light,humidity) data every 10 seconds, forwards it
      • Each mote can “hear” 10 other motes
      • Process:
        • Wake up, collect samples (~ 1 second)
        • Listen to radio for messages to forward (~1 second)
        • Forward data
    • 17. Sensors: Slow, Power Hungry, Noisy
    • 18. Programming Sensornets: TinyOS
      • Component Based Programming Model
      • Suite of software components
        • Timers, clocks, clock synchronization
        • Single and multi-hop networking
        • Power management
        • Non-volatile storage management
    • 19. Programming Philosophy
      • Component Based
        • “Wiring” components together via interfaces, configurations
      • Split-Phased
        • Nothing blocks, ever.
        • Instead, completion events are signaled.
      • Highly Concurrent
        • Single thread of “tasks”, posted and scheduled FIFO
        • Events “fired” asynchronously in response to interrupts.
    • 20. NesC
      • C-like programming language with component model support
        • Compiles into GCC-compatible C
      • 3 types of files:
        • Interfaces
          • Set of function prototypes; no implementations or variables
        • Modules
          • Provide (implement) zero or more interfaces
          • Require zero or more interfaces
          • May define module variables, scoped to functions in module
        • Configurations
          • Wire (connect) modules according to requires/provides relationship
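      The next slides show a module and a configuration; for completeness, an interface file is just a set of command/event prototypes. This is the standard TinyOS 1.x StdControl interface, reproduced from memory (check the distribution for the exact file):
        interface StdControl {
          command result_t init();   // one-time initialization
          command result_t start();  // start the component's services
          command result_t stop();   // stop the component's services
        }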
    • 21. Component Example: Leds
      module LedsC {
        provides interface Leds;
      }
      implementation {
        uint8_t ledsOn;
        enum {
          RED_BIT = 1,
          GREEN_BIT = 2,
          YELLOW_BIT = 4
        };
        ...
        async command result_t Leds.redOn() {
          dbg(DBG_LED, "LEDS: Red on. ");
          atomic {
            TOSH_CLR_RED_LED_PIN();
            ledsOn |= RED_BIT;
          }
          return SUCCESS;
        }
        ...
      }
    • 22. Configuration Example
      configuration CntToLedsAndRfm {
      }
      implementation {
        components Main, Counter, IntToLeds, IntToRfm, TimerC;
        Main.StdControl -> Counter.StdControl;
        Main.StdControl -> IntToLeds.StdControl;
        Main.StdControl -> IntToRfm.StdControl;
        Main.StdControl -> TimerC.StdControl;
        Counter.Timer -> TimerC.Timer[unique("Timer")];
        IntToLeds <- Counter.IntOutput;
        Counter.IntOutput -> IntToRfm;
      }
    • 23. Split Phase Example
      module IntToRfmM { ... }
      implementation { ...
        command result_t IntOutput.output(uint16_t value) {
          IntMsg *message = (IntMsg *)data.data;
          if (!pending) {
            pending = TRUE;
            message->val = value;
            atomic {
              message->src = TOS_LOCAL_ADDRESS;
            }
            if (call Send.send(TOS_BCAST_ADDR,
                               sizeof(IntMsg), &data))
              return SUCCESS;
            pending = FALSE;
          }
          return FAIL;
        }

        event result_t Send.sendDone(TOS_MsgPtr msg, result_t success) {
          if (pending && msg == &data) {
            pending = FALSE;
            signal IntOutput.outputComplete(success);
          }
          return SUCCESS;
        }
      }
    • 24. Major Components
      • Timers: Clock, TimerC, LogicalTime
      • Networking: Send, GenericComm, AMStandard, lib/Route
      • Power Management: HPLPowerManagement
      • Storage Management: EEPROM , MatchBox
    • 25. Timers
      • Clock : Basic abstraction over hardware timers; periodic events, single frequency.
      • LogicalTime : Fire an event some number of H:M:S:ms in the future.
      • TimerC : Multiplex multiple periodic timers on top of LogicalTime.
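      A minimal sketch of how an application module typically uses these timers, patterned on the standard Blink example; the module name and wiring are illustrative:
        module BlinkM {
          provides interface StdControl;
          uses interface Timer;
        }
        implementation {
          command result_t StdControl.init()  { return SUCCESS; }
          command result_t StdControl.start() {
            return call Timer.start(TIMER_REPEAT, 1000);  // fire every 1000 ms
          }
          command result_t StdControl.stop()  { return call Timer.stop(); }
          event result_t Timer.fired() {
            // periodic work goes here
            return SUCCESS;
          }
        }
      The configuration wires the module to one multiplexed timer, as in the earlier example: BlinkM.Timer -> TimerC.Timer[unique("Timer")].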
    • 26. Radio Stack
      • Interfaces:
        • Send
          • Broadcast, or to a specific ID
          • split phase
        • Receive
          • asynchronous signal
      • Implementations:
        • AMStandard
          • Application specific messages
          • Id-based dispatch
        • GenericComm
          • AMStandard + Serial IO
        • Lib/Route
          • Multihop
      Send side:
        IntMsg *message = (IntMsg *)data.data;
        ...
        message->val = value;
        atomic {
          message->src = TOS_LOCAL_ADDRESS;
        }
        call Send.send(TOS_BCAST_ADDR, sizeof(IntMsg), &data);
      Receive side:
        event TOS_MsgPtr ReceiveIntMsg.receive(TOS_MsgPtr m) {
          IntMsg *message = (IntMsg *)m->data;
          call IntOutput.output(message->val);
          return m;
        }
      (Wiring is used to equate IntMsg to ReceiveIntMsg.)
    • 27. Multihop Networking
      • Standard implementation “tree based routing”
      Problems: parent selection, asymmetric links, adaptation vs. stability. [Figure: nodes A-F broadcast beacons and forward partial results R:{…} up the routing tree; each node keeps a neighbor-quality table, e.g. node D sees B: 0.75, C: 0.66, E: 0.45, F: 0.82, and node C sees A: 0.5, B: 0.44, D: 0.53, F: 0.35.]
    • 28. Geographic Routing
      • Any-to-any routing via geographic coordinates
        • See “GPSR”, MOBICOM 2000, Karp + Kung.
      • Requires coordinate system*
      • Requires endpoint coordinates
      • Hard to route around local minima (“holes”)
      *Could be virtual, as in Rao et al “Geographic Routing Without Coordinate Information.” MOBICOM 2003
    • 29. Power Management
      • HPLPowerManagement
        • TinyOS sleeps processor when possible
        • Observes the radio, sensor, and timer state
      • Application managed, for the most part
        • App. must turn off subsystems when not in use
        • Helper utility: ServiceScheduler
          • Periodically calls the “start” and “stop” methods of an app
        • More on power management in TinyDB later
        • Approach works because:
          • single application
          • no interactivity requirements
    • 30. Non-Volatile Storage
      • EEPROM
        • 512K off chip, 32K on chip
        • Writes at disk speeds, reads at RAM speeds
        • Interface : random access, read/write 256 byte pages
        • Maximum throughput ~10Kbytes / second
      • MatchBox Filing System
        • Provides a Unix-like file I/O interface
        • Single, flat directory
        • Only one file being read/written at a time
    • 31. TinyOS: Getting Started
      • The TinyOS home page:
        • http://webs.cs.berkeley.edu/tinyos
        • Start with the tutorials!
      • The CVS repository
        • http://sf.net/projects/tinyos
      • The NesC Project Page
        • http://sf.net/projects/nescc
      • Crossbow motes (hardware):
        • http://www.xbow.com
      • Intel Imote
        • www.intel.com/research/exploratory/motes.htm
    • 32. Part 2 The Design and Implementation of TinyDB
    • 33. Part 2 Outline
      • TinyDB Overview
      • Data Model and Query Language
      • TinyDB Java API and Scripting
      • Demo with TinyDB GUI
      • TinyDB Internals
      • Extending TinyDB
      • TinyDB Status and Roadmap
    • 34. TinyDB Revisited
      SELECT MAX(mag)
      FROM sensors
      WHERE mag > thresh
      SAMPLE PERIOD 64ms
      • High level abstraction:
        • Data centric programming
        • Interact with sensor network as a whole
        • Extensible framework
      • Under the hood:
        • Intelligent query processing: query optimization, power efficient execution
        • Fault Mitigation: automatically introduce redundancy, avoid problem areas
      [Figure: an application issues queries and triggers to TinyDB, which returns data collected from the sensor network.]
    • 35. Feature Overview
      • Declarative SQL-like query interface
      • Metadata catalog management
      • Multiple concurrent queries
      • Network monitoring (via queries)
      • In-network, distributed query processing
      • Extensible framework for attributes, commands and aggregates
      • In-network, persistent storage
    • 36. Architecture
      [Figure: on the PC side, the TinyDB GUI and a DBMS sit on top of the TinyDB Client API and JDBC; on the mote side, motes 0-8 run the TinyDB query processor over the sensor network.]
    • 37. Data Model
      • Entire sensor network as one single, infinitely-long logical table: sensors
      • Columns consist of all the attributes defined in the network
      • Typical attributes:
        • Sensor readings
        • Meta-data: node id, location, etc.
        • Internal states: routing tree parent, timestamp, queue length, etc.
      • Nodes return NULL for unknown attributes
      • On server, all attributes are defined in catalog.xml
      • Discussion: other alternative data models?
    • 38. Query Language (TinySQL)
      • SELECT <aggregates>, <attributes>
      • [FROM {sensors | <buffer>}]
      • [WHERE <predicates>]
      • [GROUP BY <exprs>]
      • [SAMPLE PERIOD <const> | ONCE]
      • [INTO <buffer>]
      • [TRIGGER ACTION <command>]
    • 39. Comparison with SQL
      • Single table in FROM clause
      • Only conjunctive comparison predicates in WHERE and HAVING
      • No subqueries
      • No column alias in SELECT clause
      • Arithmetic expressions limited to column op constant
      • Only fundamental difference: SAMPLE PERIOD clause
    • 40. TinySQL Examples
        • SELECT nodeid, nestNo, light
        • FROM sensors
        • WHERE light > 400
        • EPOCH DURATION 1s
      Example 1 (“Find the sensors in bright nests.”), sensors table:
        Epoch | Nodeid | nestNo | Light
          0   |   1    |   17   |  455
          0   |   2    |   25   |  389
          1   |   1    |   17   |  422
          1   |   2    |   25   |  405
    • 41. TinySQL Examples (cont.)
      “Count the number of occupied nests in each loud region of the island.”
      Example 2:
        SELECT AVG(sound)
        FROM sensors
        EPOCH DURATION 10s
      Example 3 (regions with AVG(sound) > 200):
        SELECT region, CNT(occupied), AVG(sound)
        FROM sensors
        GROUP BY region
        HAVING AVG(sound) > 200
        EPOCH DURATION 10s
      Result:
        Epoch | region | AVG(…) | CNT(…)
          0   | North  |  360   |   3
          0   | South  |  520   |   3
          1   | North  |  370   |   3
          1   | South  |  520   |   3
    • 42. Event-based Queries
      • ON event SELECT …
      • Run query only when interesting events happens
      • Event examples
        • Button pushed
        • Message arrival
        • Bird enters nest
      • Analogous to triggers but events are user-defined
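      A sketch of what such a query looks like, adapted from the ACQP paper's bird-detection example; the bird-detect event and the dist() function are assumed to be defined by the application:
        ON EVENT bird-detect(loc):
          SELECT AVG(light), AVG(temp), event.loc
          FROM sensors AS s
          WHERE dist(s.loc, event.loc) < 10m
          SAMPLE PERIOD 2s FOR 30s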
    • 43. Query over Stored Data
      • Named buffers in Flash memory
      • Store query results in buffers
      • Query over named buffers
      • Analogous to materialized views
      • Example:
        • CREATE BUFFER name SIZE x (field1 type1, field2 type2, …)
        • SELECT a1, a2 FROM sensors SAMPLE PERIOD d INTO name
        • SELECT field1, field2, … FROM name SAMPLE PERIOD d
    • 44. Using the Java API
      • SensorQueryer
        • translateQuery() converts TinySQL string into TinyDBQuery object
        • Static query optimization
      • TinyDBNetwork
        • sendQuery() injects query into network
        • abortQuery() stops a running query
        • addResultListener() adds a ResultListener that is invoked for every QueryResult received
        • removeResultListener()
      • QueryResult
        • A complete result tuple, or
        • A partial aggregate result, call mergeQueryResult() to combine partial results
      • Key difference from JDBC: push vs. pull
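      A hypothetical sketch of how these classes fit together. The class and method names come from this slide, but the constructor arguments, listener callback name, and query-id handling are assumptions; treat it as pseudocode rather than the exact API:
        // Hypothetical glue code -- signatures are illustrative.
        TinyDBNetwork nw = ...;            // connection to the mote network
        SensorQueryer queryer = ...;       // built over the attribute catalog

        TinyDBQuery q = queryer.translateQuery("SELECT nodeid, light FROM sensors SAMPLE PERIOD 1024", qid);
        nw.addResultListener(new ResultListener() {
          public void addResult(QueryResult qr) {  // results are pushed (unlike JDBC's pull)
            System.out.println(qr.toString());
          }
        });
        nw.sendQuery(q);                   // inject the query into the network
        // ... later ...
        nw.abortQuery(q);                  // stop the running query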
    • 45. Writing Scripts with TinyDB
      • TinyDB’s text interface
        • java net.tinyos.tinydb.TinyDBMain -run “select …”
        • Query results printed out to the console
        • All motes get reset each time new query is posed
      • Handy for writing scripts with shell, perl, etc.
    • 46. Using the GUI Tools
      • Demo time
    • 47. Inside TinyDB
      • TinyDB is ~10,000 lines of embedded C code plus ~5,000 lines of PC-side Java, using ~3,200 bytes of RAM (with a 768-byte heap) and ~58 kB of compiled code (3x larger than the 2nd-largest TinyOS program).
      • [Figure: queries such as “SELECT AVG(temp) WHERE light > 400” enter the query processor, which sits on the schema, multihop network, and TinyOS; results (e.g., T:1, AVG:225 and T:2, AVG:250), tables, and samples flow back. A schema entry records, e.g., Name: temp; Time to sample: 50 uS; Cost to sample: 90 uJ; Calibration table: 3; Units: deg. F; Error: ±5 deg F; Get: getTempFunc().]
    • 48. Tree-based Routing
      • Tree-based routing
        • Used in:
          • Query delivery
          • Data collection
          • In-network aggregation
        • Relationship to indexing?
      [Figure: the query Q (SELECT …) is flooded down the routing tree from the root through nodes A-F; results R:{…} flow back up the tree.]
    • 49. Power Management Approach
      • Coarse-grained app-controlled communication scheduling
      [Figure: timeline for mote IDs 1-5; each epoch (10s to 100s of seconds) contains a 2-4 s waking period, and the motes sleep (“zzz”) for the remainder.]
    • 50. Time Synchronization
      • All messages include a 5 byte time stamp indicating system time in ms
        • Synchronize (e.g. set system time to timestamp) with
          • Any message from parent
          • Any new query message (even if not from parent)
        • Punt on multiple queries
        • Timestamps written just after preamble is transmitted
      • All nodes agree that the waking period begins when (system time % epoch dur = 0)
        • And lasts for WAKING_PERIOD ms
      • Adjustment of clock happens by changing duration of sleep cycle, not wake cycle.
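      A minimal sketch of the wake/sleep decision this scheme implies, in plain C with illustrative names (not TinyDB's actual code):
        uint32_t now = systemTimeMs();               // kept in sync via message timestamps
        uint32_t posInEpoch = now % EPOCH_DUR_MS;    // epoch starts when (system time % epoch dur == 0)
        if (posInEpoch < WAKING_PERIOD_MS) {
          // awake: sample, listen for children, forward results
        } else {
          // sleep until the next epoch boundary; clock corrections stretch or
          // shrink this sleep, never the waking period
          sleepFor(EPOCH_DUR_MS - posInEpoch);
        }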
    • 51. Extending TinyDB
      • Why extending TinyDB?
        • New sensors -> attributes
        • New control/actuation -> commands
        • New data processing logic -> aggregates
        • New events
      • Analogous to concepts in object-relational databases
    • 52. Adding Attributes
      • Types of attributes
        • Sensor attributes: raw or cooked sensor readings
        • Introspective attributes: parent, voltage, ram usage, etc.
        • Constant attributes: constant values that can be statically or dynamically assigned to a mote, e.g., nodeid, location, etc.
    • 53. Adding Attributes (cont)
      • Interfaces provided by Attr component
        • StdControl: init, start, stop
        • AttrRegister
          • command registerAttr(name, type, len)
          • event getAttr(name, resultBuf, errorPtr)
          • event setAttr(name, val)
          • command getAttrDone(name, resultBuf, error)
        • AttrUse
          • command startAttr(attr)
          • event startAttrDone(attr)
          • command getAttrValue(name, resultBuf, errorPtr)
          • event getAttrDone(name, resultBuf, error)
          • command setAttrValue(name, val)
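      A hedged sketch of a sensor-attribute module written against AttrRegister; the command and event names follow this slide, but the parameter types, error codes, and ADC wiring are assumptions:
        module LightAttrM {
          provides interface StdControl;
          uses interface AttrRegister;
          uses interface ADC as LightADC;
        }
        implementation {
          char *resultPtr;
          command result_t StdControl.init() {
            // name, type, length as on this slide; type constant assumed
            return call AttrRegister.registerAttr("light", UINT16, 2);
          }
          event result_t AttrRegister.getAttr(char *name, char *resultBuf, char *errorPtr) {
            resultPtr = resultBuf;
            return call LightADC.getData();          // split-phase: result arrives below
          }
          async event result_t LightADC.dataReady(uint16_t data) {
            *(uint16_t *)resultPtr = data;
            call AttrRegister.getAttrDone("light", resultPtr, SUCCESS);
            return SUCCESS;
          }
          event result_t AttrRegister.setAttr(char *name, char *val) { return SUCCESS; }
          command result_t StdControl.start() { return SUCCESS; }
          command result_t StdControl.stop()  { return SUCCESS; }
        }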
    • 54. Adding Attributes (cont)
      • Steps to adding attributes to TinyDB
        • Create attribute nesC components
        • Wire new attribute components to TinyDBAttr configuration
        • Reprogram TinyDB motes
        • Add new attribute entries to catalog.xml
      • Constant attributes can be added on the fly through TinyDB GUI
    • 55. Adding Aggregates
      • Step 1: wire new nesC components
    • 56. Adding Aggregates (cont)
      • Step 2: add entry to catalog.xml
        • <aggregate>
        • <name>AVG</name>
        • <id>5</id>
        • <temporal>false</temporal>
        • <readerClass>net.tinyos.tinydb.AverageClass</readerClass>
        • </aggregate>
      • Step 3 (optional): implement reader class in Java
        • a reader class interprets and finalizes aggregate state received from the mote network, returns final result as a string for display.
    • 57. TinyDB Status
      • Latest release shipped with TinyOS 1.1 (9/03)
        • Install the task-tinydb package in TinyOS 1.1 distribution
        • First release in TinyOS 1.0 (9/02)
        • Widely used by research groups as well as industry pilot projects
      • Successful deployments in Intel Berkeley Lab and redwood trees at UC Botanical Garden
        • Largest deployment: ~80 weather station nodes
        • Network longevity: 4-5 months
    • 58. The Redwood Tree Deployment
      • Redwood Grove in UC Botanical Garden, Berkeley
      • Collect dense sensor readings to monitor climatic variations across
        • altitudes,
        • angles,
        • time,
        • forest locations, etc.
      • Versus sporadic monitoring points with 30lb loggers!
      • Current focus: study how dense sensor data affect predictions of conventional tree-growth models
    • 59. Data from Redwoods
      [Figure: sensor placement by height on the tree; 36 m (top), 33 m: node 111, 32 m: 110, 30 m: 109, 108, 107, 20 m: 106, 105, 104, 10 m: 103, 102, 101.]
    • 60. TinyDB Roadmap (near term)
      • Support for high frequency sampling
        • Equipment vibration monitoring, structural monitoring, etc.
        • Store and forward
        • Bulk reliable data transfer
        • Scheduling of communications
      • Port to Intel Mote
      • Deployment in Intel Fab equipment monitoring application and the Golden Gate Bridge monitoring application
    • 61. For more information
      • http://berkeley.intel-research.net/tinydb or http://triplerock.cs.berkeley.edu/tinydb
    • 62. Part 3
      • Database Research Issues in Sensor Networks
    • 63. Sensor Network Research
      • Very active research area
        • Can’t summarize it all
      • Focus: database-relevant research topics
        • Some outside of Berkeley
        • Other topics that are itching to be scratched
        • But , some bias towards work that we find compelling
    • 64. Topics
      • In-network aggregation
      • Acquisitional Query Processing
      • Heterogeneity
      • Intermittent Connectivity
      • In-network Storage
      • Statistics-based summarization and sampling
      • In-network Joins
      • Adaptivity and Sensor Networks
      • Multiple Queries
    • 65. Topics
      • In-network aggregation
      • Acquisitional Query Processing
      • Heterogeneity
      • Intermittent Connectivity
      • In-network Storage
      • Statistics-based summarization and sampling
      • In-network Joins
      • Adaptivity and Sensor Networks
      • Multiple Queries
    • 66. Tiny Aggregation (TAG)
      • In-network processing of aggregates
        • Common data analysis operation
          • A.k.a. gather operation or reduction in parallel programming
        • Communication reducing
          • Operator dependent benefit
        • Across nodes during same epoch
      • Exploit query semantics to improve efficiency!
      Madden, Franklin, Hellerstein, Hong. Tiny AGgregation (TAG), OSDI 2002.
    • 67. Basic Aggregation
      • In each epoch:
        • Each node samples local sensors once
        • Generates partial state record ( PSR )
          • local readings
          • readings from children
        • Outputs PSR during assigned comm. interval
      • At end of epoch, PSR for whole network output at root
      • New result on each successive epoch
      • Extras:
        • Predicate-based partitioning via GROUP BY
    • 68-72. Illustration: Aggregation
      [Animation frames for SELECT COUNT(*) FROM sensors: a table of interval # versus sensor # fills in as each of the five nodes transmits its partial count during its assigned communication interval; partial counts accumulate up the tree until the root outputs the total count of 5 at the end of the epoch.]
    • 73. Aggregation Framework
      • As in extensible databases, TinyDB supports any aggregation function conforming to:
      Agg_n = {f_init, f_merge, f_evaluate}
        f_init{a0} → <a0>
        f_merge{<a1>, <a2>} → <a12>
        f_evaluate{<a1>} → aggregate value
      Example: Average
        AVG_init{v} → <v, 1>
        AVG_merge{<S1, C1>, <S2, C2>} → <S1 + S2, C1 + C2>
        AVG_evaluate{<S, C>} → S/C
      Each <…> tuple is a Partial State Record (PSR). Restriction: merge must be associative and commutative.
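      As a concrete but illustrative rendering of this triple in plain C; the struct layout and function names are assumptions, not TinyDB's actual aggregate API:
        typedef struct { int32_t sum; uint16_t count; } AvgPSR;  /* partial state record */

        void avg_init(int16_t v, AvgPSR *out) {                  /* f_init: one local reading */
          out->sum = v;
          out->count = 1;
        }
        void avg_merge(const AvgPSR *a, const AvgPSR *b, AvgPSR *out) {  /* f_merge */
          out->sum = a->sum + b->sum;        /* associative and commutative, as required */
          out->count = a->count + b->count;
        }
        int16_t avg_evaluate(const AvgPSR *a) {                  /* f_evaluate: final value */
          return (int16_t)(a->sum / a->count);
        }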
    • 74. Taxonomy of Aggregates
      • TAG insight: classify aggregates according to various functional properties
        • Yields a general set of optimizations that can automatically be applied
      This classification drives an API!
      Property               | Examples                                    | Affects
      Partial State          | MEDIAN: unbounded, MAX: 1 record            | Effectiveness of TAG
      Duplicate Sensitivity  | MIN: dup. insensitive, AVG: dup. sensitive  | Routing Redundancy
      Exemplary vs. Summary  | MAX: exemplary, COUNT: summary              | Applicability of Sampling, Effect of Loss
      Monotonicity           | COUNT: monotonic, AVG: non-monotonic        | Hypothesis Testing, Snooping
    • 75. Use Multiple Parents
      • Use graph structure
        • Increase delivery probability with no communication overhead
      • For duplicate insensitive aggregates, or
      • Aggs expressible as sum of parts
        • Send (part of) aggregate to all parents
          • In just one message, via multicast
        • Assuming independence , decreases variance
      SELECT COUNT(*):
        P(link xmit successful) = p
        P(success from A -> R) = p²
        Single parent: E(cnt) = c * p², Var(cnt) = c² * p² * (1 - p²) = V
        With n parents, sending c/n to each: E(cnt) = n * (c/n) * p² = c * p²,
          Var(cnt) = n * (c/n)² * p² * (1 - p²) = V/n
      [Figure: a node sends its count c to a single parent R, versus splitting c/n to each of n = 2 parents.]
    • 76. Multiple Parents Results
      • Better than previous analysis expected!
      • Losses aren’t independent!
      • Insight: spreads data over many links
      [Figure: without splitting, all traffic crosses one critical link; with splitting, the same data is spread over many links.]
    • 77. Acquisitional Query Processing (ACQP)
      • TinyDB acquires AND processes data
        • Could generate an infinite number of samples
      • An acquisitional query processor controls
        • when,
        • where,
        • and with what frequency data is collected!
      • Versus traditional systems where data is provided a priori
      Madden, Franklin, Hellerstein, and Hong. The Design of an Acquisitional Query Processor. SIGMOD, 2003.
    • 78. ACQP: What’s Different?
      • How should the query be processed?
        • Sampling as a first class operation
      • How does the user control acquisition?
        • Rates or lifetimes
        • Event-based triggers
      • Which nodes have relevant data?
        • Index-like data structures
      • Which samples should be transmitted?
        • Prioritization, summary, and rate control
    • 79. Operator Ordering: Interleave Sampling + Selection
      • SELECT light, mag
      • FROM sensors
      • WHERE pred1(mag)
      • AND pred2(light)
      • EPOCH DURATION 1s
      • E(sampling mag) >> E(sampling light)
        • 1500 uJ vs. 90 uJ
      At 1 sample/sec, total power savings could be as much as 3.5 mW, comparable to the processor itself. [Figure: the traditional DBMS plan samples mag and light and then applies σ(pred1) and σ(pred2); the ACQP plan samples the cheap light first, applies σ(pred2), and only then samples the costly mag and applies σ(pred1), which is the correct ordering unless pred1 is very selective and pred2 is not.]
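      A rough way to see why the ACQP ordering wins, using the per-sample energies above and writing sel(p) for the fraction of samples a predicate accepts (an illustrative cost model, not the optimizer's exact formula):
        E(mag first) = E(mag) + sel(pred1) * E(light) ≈ 1500 uJ + sel(pred1) * 90 uJ
        E(light first) = E(light) + sel(pred2) * E(mag) ≈ 90 uJ + sel(pred2) * 1500 uJ
      Sampling light first is cheaper except in the corner case the slide notes: pred1 very selective (sel(pred1) near 0) while pred2 passes almost everything (sel(pred2) near 1).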
    • 80. Exemplary Aggregate Pushdown
      • SELECT WINMAX(light,8s,8s)
      • FROM sensors
      • WHERE mag > x
      • EPOCH DURATION 1s
      • Novel, general pushdown technique
      • Mag sampling is the most expensive operation!
      [Figure: the traditional DBMS plan applies σ(mag > x) to every sample and then computes WINMAX over light; the ACQP plan samples light first, applies σ(light > MAX) so the expensive mag sample is taken only when the reading could update the window maximum, and then applies σ(mag > x).]
    • 81. Topics
      • In-network aggregation
      • Acquisitional Query Processing
      • Heterogeneity
      • Intermittent Connectivity
      • In-network Storage
      • Statistics-based summarization and sampling
      • In-network Joins
      • Adaptivity and Sensor Networks
      • Multiple Queries
    • 82. Heterogeneous Sensor Networks
      • Leverage small numbers of high-end nodes to benefit large numbers of inexpensive nodes
      • Still must be transparent and ad-hoc
      • Key to scalability of sensor networks
      • Interesting heterogeneities
        • Energy: battery vs. outlet power
        • Link bandwidth: Chipcon vs. 802.11x
        • Computing and storage: ATMega128 vs. Xscale
        • Pre-computed results
        • Sensing nodes vs. QP nodes
    • 83. Computing Heterogeneity with TinyDB
      • Separate query processing from sensing
        • Provide query processing on a small number of nodes
        • Attract packets to query processors based on “service value”
      • Compare the total energy consumption of the network
      • No aggregation
      • All aggregation
      • Opportunistic aggregation
      • HSN proactive aggregation
      Mark Yarvis and York Liu, Intel’s Heterogeneous Sensor Network Project, ftp://download.intel.com/research/people/HSN_IR_Day_Poster_03.pdf .
    • 84. 5x7 TinyDB/HSN Mica2 Testbed
    • 85. Data Packet Saving
      • How many aggregators are desired?
      • Does placement matter?
      Result: 11% aggregators achieve 72% of the maximum data reduction; optimal placement is about 2/3 of the distance from the sink.
    • 86. Occasionally Connected Sensornets
      [Figure: several sensornet patches, each running a TinyDB QP behind a gateway (GTWY), connect to a TinyDB server over the internet; mobile gateways ferry data between disconnected patches and the infrastructure.]
    • 87. Occasionally Connected Sensornets Challenges
      • Networking support
        • Tradeoff between reliability, power consumption and delay
        • Data custody transfer: duplicates?
        • Load shedding
        • Routing of mobile gateways
      • Query processing
        • Operation placement: in-network vs. on mobile gateways
        • Proactive pre-computation and data movement
      • Tight interaction between networking and QP
      Fall, Hong and Madden, Custody Transfer for Reliable Delivery in Delay Tolerant Networks, http://www.intel-research.net/Publications/Berkeley/081220030852_157.pdf .
    • 88. Distributed In-network Storage
      • Collectively, sensornets have large amounts of in-network storage
      • Good for in-network consumption or caching
      • Challenges
        • Distributed indexing for fast query dissemination
        • Resilience to node or link failures
        • Graceful adaptation to data skews
        • Minimizing index insertion/maintenance cost
    • 89. Example: DIM
      • Functionality
        • Efficient range query for multidimensional data.
      • Approaches
        • Divide sensor field into bins.
        • Locality-preserving mapping from m-dimensional space to geographic locations.
        • Use geographic routing such as GPSR .
      • Assumptions
        • Nodes know their locations and network boundary
        • No node mobility
      Xin Li, Young Jin Kim, Ramesh Govindan and Wei Hong, Distributed Index for Multi-dimensional Data (DIM) in Sensor Networks, SenSys 2003. [Figure: events E1 = <0.7, 0.8> and E2 = <0.6, 0.7> are stored in the bins owning those value ranges; range query Q1 = <.5-.7, .5-1> is routed to the matching bins.]
    • 90. Statistical Techniques
      • Approximations, summaries, and sampling based on statistics and statistical models
      • Applications:
        • Limited bandwidth and large number of nodes -> data reduction
        • Lossiness -> predictive modeling
        • Uncertainty -> tracking correlations and changes over time
        • Physical models -> improved query answering
    • 91. Correlated Attributes
      • Data in sensor networks is correlated; e.g.,
        • Temperature and voltage
        • Temperature and light
        • Temperature and humidity
        • Temperature and time of day
        • etc.
    • 92. IDSQ
      • Idea: task sensors in order of best improvement to estimate of some value:
        • Choose leader(s)
          • Suppress subordinates
          • Task subordinates, one at a time
            • Until some measure of goodness (error bound) is met
              • E.g. “Mahalanobis Distance” -- Accounts for correlations in axes, tends to favor minimizing principal axis
      See “Scalable Information-Driven Sensor Querying and Routing for ad hoc Heterogeneous Sensor Networks.” Chu, Haussecker and Zhao. Xerox TR P2001-10113. May, 2001.
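      For reference, the Mahalanobis distance of an observation x from a distribution with mean u and covariance S (the standard definition, not anything IDSQ-specific) is:
        d(x) = sqrt( (x - u)^T * S^-1 * (x - u) )
      Normalizing by the covariance is what accounts for correlations between axes.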
    • 93. Graphical Representation
      Model the location estimate as a point with 2-dimensional Gaussian uncertainty. [Figure: the uncertainty ellipse has a principal axis; sensor S1's residual lies along the principal axis and is preferred because it reduces error along that axis, while sensor S2's residual, of equal area, does not.]
    • 94. MQSN: Model-based Probabilistic Querying over Sensor Networks. Joint work with Amol Deshpande, Carlos Guestrin, and Joe Hellerstein. [Figure: a query processor holding a probabilistic model sits between users and sensor nodes 1-9.]
    • 95-96. A probabilistic query (select NodeID, Temp ± 0.1C where NodeID in [1..9] with conf(0.95)) causes the query processor to consult the model, which produces an observation plan: [Temp, 3], [Temp, 9].
    • 97. The observed data ([Temp, 3] = …, [Temp, 9] = …) is used to update the model, and the query results are returned.
    • 98. Challenges
      • What kind of models to use ?
      • Optimization problem:
        • Given a model and a query, find the best set of attributes to observe
        • Cost not easy to measure
          • Non-uniform network communication costs
          • Changing network topologies
        • Large plan space
          • Might be cheaper to observe attributes not in query
            • e.g. Voltage instead of Temperature
          • Conditional Plans:
            • Change the observation plan based on observed values
    • 99. MQSN: Current Prototype
      • Multi-variate Gaussian Models
        • Kalman Filters to capture correlations across time
      • Handles:
        • Range predicate queries
          • sensor value within [x,y], w/ confidence
        • Value queries
          • sensor value = x, w/in epsilon, w/ confidence
        • Simple aggregate queries
          • AVG(sensor value) ≈ n, w/in epsilon, w/ confidence
      • Uses a greedy algorithm to choose the observation plan
    • 100. In-Net Regression
      • Linear regression : simple way to predict future values, identify outliers
      • Regression can be across local or remote values, multiple dimensions, or with high degree polynomials
        • E.g., node A readings vs. node B’s
        • Or, location (X,Y), versus temperature
          • E.g., over many nodes
      Guestrin, Thibaux, Bodik, Paskin, Madden. “Distributed Regression: an Efficient Framework for Modeling Sensor Network Data .” Under submission.
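      As a reminder of the building block this slide refers to (plain least-squares regression, not the distributed kernel scheme itself): with a design matrix X (one row per reading, one column per basis function, e.g. location or a neighbor's reading) and observed values y,
        coefficients: b = (X^T * X)^-1 * X^T * y
        predictions: y_hat = X * b
      Readings far from their predicted values can then be flagged as outliers.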
    • 101. In-Net Regression (Continued)
      • Problem: may require data from all sensors to build model
      • Solution: partition sensors into overlapping “kernels” that influence each other
        • Run regression in each kernel
          • Requiring just local communication
        • Blend data between kernels
        • Requires some clever matrix manipulation
      • End result: regressed model at every node
        • Useful in failure detection, missing value estimation
    • 102. Exploiting Correlations in Query Processing
      • Simple idea:
        • Given predicate P(A) over expensive attribute A
        • Replace it with P’ over cheap attribute A’ such that P’ evaluates to P
        • Problem: unless A and A’ are perfectly correlated, P’ ≠ P for all time
          • So we could incorrectly accept or reject some readings
      • Alternative: use correlations to improve selectivity estimates in query optimization
        • Construct conditional plans that vary predicate order based on prior observations
    • 103. Exploiting Correlations (Cont.)
      • Insight: by observing a (cheap and correlated) variable not involved in the query, it may be possible to improve query performance
        • Improves estimates of selectivities
      • Use conditional plans
      • Example
      Example: predicates Light > 100 Lux and Temp < 20° C, each with cost 100. With no other information, both have selectivity 0.5, so either ordering has expected cost 100 + 0.5 × 100 = 150. Conditioned on whether Time is in [6pm, 6am], the selectivities become skewed (e.g., 0.1 for one predicate and 0.9 for the other), so evaluating the more selective predicate first in each branch gives expected cost 100 + 0.1 × 100 = 110.
    • 104. In-Network Join Strategies
      • Types of joins:
        • non-sensor -> sensor
        • sensor -> sensor
      • Optimization questions:
        • Should the join be pushed down?
        • If so, where should it be placed?
        • What if a join table exceeds the memory available on one node?
    • 105. Choosing Where to Place Operators
      • Idea : choose a “join node” to run the operator
      • Over time, explore other candidate placements
        • Nodes advertise data rates to their neighbors
        • Neighbors compute expected cost of running the join based on these rates
        • Neighbors advertise costs
        • Current join node selects a new, lower cost node
      Bonfils + Bonnet, Adaptive and Decentralized Operator Placement for In-Network Query Processing. IPSN 2003.
    • 106. Topics
      • In-network aggregation
      • Acquisitional Query Processing
      • Heterogeneity
      • Intermittent Connectivity
      • In-network Storage
      • Statistics-based summarization and sampling
      • In-network Joins
      • Adaptivity and Sensor Networks
      • Multiple Queries
    • 107. Adaptivity In Sensor Networks
      • Queries are long running
      • Selectivities change
        • E.g. night vs day
      • Network load and available energy vary
      • All suggest that some adaptivity is needed
        • Of data rates or granularity of aggregation when optimizing for lifetimes
        • Of operator orderings or placements when selectivities change (c.f., conditional plans for correlations)
      • As far as we know, this is an open problem!
    • 108. Multiple Queries and Work Sharing
      • As sensornets evolve, users will run many queries simultaneously
        • E.g., traffic monitoring
      • Likely that queries will be similar
        • But have different end points, parameters, etc
      • Would like to share processing, routing as much as possible
      • But how? Again, an open problem.
    • 109. Concluding Remarks
      • Sensor networks are an exciting emerging technology, with a wide variety of applications
      • Many research challenges in all areas of computer science
        • Database community included
        • Some agreement that a declarative interface is right
      • TinyDB and other early work are an important first step
      • But there’s lots more to be done!