TinyDB Tutorial


Speaker notes:
  • Light: 0.6 mJ/sample (~1 s per acquisition); pressure: 0.00875 mJ/sample (~0.025 s per acquisition). With 10 neighbors and 1 child, the processor spends 1 s waiting for samples and 1 s receiving/forwarding data (10 mJ); the radio spends 0.07 mJ sampling, ~1 mJ receiving, and ~0.2 mJ sending. That is roughly 84% of energy on processing, 5% on sensing, and 11% on the radio: idle time while sensing and communicating dominates power.
  • Parameterized interfaces are needed for LogicalTime and TimerC.
  • A comm interval is a slot within a longer epoch; how comm intervals are assigned, and the basic aggregation process, are covered later in the tutorial.
  • Note what happens when a new node arrives or a node leaves, and the number of empty slots.
  • Properties give us effective, transparent optimization and extensibility.

    1. 1. Implementation and Research Issues in Query Processing for Wireless Sensor Networks Wei Hong Intel Research, Berkeley [email_address] Sam Madden MIT [email_address] ICDE 2004
    2. 2. Motivation <ul><li>Sensor networks (aka sensor webs, emnets) are here </li></ul><ul><ul><li>Several widely deployed HW/SW platforms </li></ul></ul><ul><ul><ul><li>Low power radio, small processor, RAM/Flash </li></ul></ul></ul><ul><ul><li>Variety of (novel) applications: scientific, industrial, commercial </li></ul></ul><ul><ul><li>Great platform for mobile + ubicomp experimentation </li></ul></ul><ul><li>Real, hard research problems to be solved </li></ul><ul><ul><li>Networking, systems, languages, databases </li></ul></ul><ul><li>We will summarize: </li></ul><ul><ul><li>The state of the art </li></ul></ul><ul><ul><li>Our experiences building TinyDB </li></ul></ul><ul><ul><li>Current and future research directions </li></ul></ul>Berkeley Mote
    3. 3. Sensor Network Apps Habitat Monitoring : Storm petrels on Great Duck Island, microclimates on James Reserve. Traditional monitoring apparatus. Earthquake monitoring in shake-test sites. Vehicle detection : sensors along a road, collect data about passing vehicles.
    4. 4. Declarative Queries <ul><li>Programming Apps is Hard </li></ul><ul><ul><li>Limited power budget </li></ul></ul><ul><ul><li>Lossy, low bandwidth communication </li></ul></ul><ul><ul><li>Require long-lived, zero admin deployments </li></ul></ul><ul><ul><li>Distributed Algorithms </li></ul></ul><ul><ul><li>Limited tools, debugging interfaces </li></ul></ul><ul><li>Queries abstract away much of the complexity </li></ul><ul><ul><li>Burden on the database developers </li></ul></ul><ul><ul><li>Users get: </li></ul></ul><ul><ul><ul><li>Safe, optimizable programs </li></ul></ul></ul><ul><ul><ul><li>Freedom to think about apps instead of details </li></ul></ul></ul>
    5. 5. TinyDB: Prototype declarative query processor <ul><li>Platform: Berkeley Motes + TinyOS </li></ul><ul><li>Continuous variant of SQL : TinySQL </li></ul><ul><li>Power and data-acquisition based in-network optimization framework </li></ul><ul><li>Extensible interface for aggregates, new types of sensors </li></ul>
    6. 6. Agenda <ul><li>Part 1 : Sensor Networks (50 Minutes) </li></ul><ul><ul><li>TinyOS </li></ul></ul><ul><ul><li>NesC </li></ul></ul><ul><li>Short Break </li></ul><ul><li>Part 2: TinyDB (1 Hour) </li></ul><ul><ul><li>Data Model and Query Language </li></ul></ul><ul><ul><li>Software Architecture </li></ul></ul><ul><li>Long Break + Hands On </li></ul><ul><li>Part 3: Sensor Network Database Research Directions (1 Hour, 10 Minutes) </li></ul>
    7. 7. Part 1 <ul><li>Sensornet Background </li></ul><ul><li>Motes + Mote Hardware </li></ul><ul><ul><li>TinyOS </li></ul></ul><ul><ul><li>Programming Model + NesC </li></ul></ul><ul><li>TinyOS Architecture </li></ul><ul><ul><li>Major Software Subsystems </li></ul></ul><ul><ul><li>Networking Services </li></ul></ul>
    8. 8. A Brief History of Sensornets <ul><li>People have used sensors for a long time </li></ul><ul><li>Recent CS History: </li></ul><ul><ul><li>(1998) Pottie + Kaiser: Radio based networks of sensors </li></ul></ul><ul><ul><li>(1998) Pister et al: Smart Dust </li></ul></ul><ul><ul><ul><li>Initial focus on optical communication </li></ul></ul></ul><ul><ul><ul><li>By 1999, radio based networks, COTS Dust, “Motes” </li></ul></ul></ul><ul><ul><li>(1999) Estrin + Govindan </li></ul></ul><ul><ul><ul><li>Ad-hoc networks of sensors </li></ul></ul></ul><ul><ul><li>(2000) Culler/Hill et al: TinyOS + Motes </li></ul></ul><ul><ul><li>(2002) Hill / Dust: SPEC, mm^3 scale computing </li></ul></ul><ul><li>UCLA / USC / Berkeley Continue to Lead Research </li></ul><ul><ul><li>Many other players now </li></ul></ul><ul><ul><li>TinyOS/Motes as most common platform </li></ul></ul><ul><li>Emerging commercial space: </li></ul><ul><ul><li>Crossbow, Ember, Dust, Sensicast, Moteiv, Intel </li></ul></ul>
    9. 9. Why Now? <ul><li>Commoditization of radio hardware </li></ul><ul><ul><li>Cellular and cordless phones, wireless communication </li></ul></ul><ul><li>Low cost -> many/tiny -> new applications! </li></ul><ul><li>Real application for ad-hoc network research from the late 90’s </li></ul><ul><li>Coming together of EE + CS communities </li></ul>
    10. 10. Motes 4 MHz, 8-bit Atmel RISC uProc; 40 kbit/s radio; 4 KB RAM, 128 KB program flash, 512 KB data flash; AA battery pack; based on TinyOS. Mica Mote, Mica2Dot
    11. 11. History of Motes <ul><li>Initial research goal wasn’t hardware </li></ul><ul><ul><li>Has since become more of a priority with emerging hardware needs, e.g.: </li></ul></ul><ul><ul><ul><li>Power consumption </li></ul></ul></ul><ul><ul><ul><li>(Ultrasonic) ranging + localization </li></ul></ul></ul><ul><ul><ul><ul><li>MIT Cricket, NEST Project </li></ul></ul></ul></ul><ul><ul><ul><li>Connectivity with diverse sensors </li></ul></ul></ul><ul><ul><ul><ul><li>UCLA sensor board </li></ul></ul></ul></ul><ul><ul><li>Even so, now on the 5th generation of devices </li></ul></ul><ul><ul><ul><li>Costs down to ~$50/node (Moteiv, Dust) </li></ul></ul></ul><ul><ul><ul><li>Greatly improved radio quality </li></ul></ul></ul><ul><ul><ul><li>Multitude of interfaces: USB, Ethernet, CF, etc. </li></ul></ul></ul><ul><ul><ul><li>Variety of form factors, packages </li></ul></ul></ul>
    12. 12. Motes vs. Traditional Computing <ul><li>Lossy, Ad-hoc Radio Communication </li></ul><ul><li>Sensing Hardware </li></ul><ul><li>Severe Power Constraints </li></ul>
    13. 13. Radio Communication <ul><li>Low Bandwidth Shared Radio Channel </li></ul><ul><ul><li>~40 kbit/s on motes </li></ul></ul><ul><ul><li>Much less in practice </li></ul></ul><ul><ul><ul><li>Encoding, Contention for Media Access (MAC) </li></ul></ul></ul><ul><li>Very lossy: 30% base loss rate </li></ul><ul><ul><li>Argues against TCP-like end-to-end retransmission </li></ul></ul><ul><ul><ul><li>And for link-layer retries </li></ul></ul></ul><ul><li>Generally, not well behaved </li></ul>From Ganesan, et al. “Complex Behavior at Scale.” UCLA/CSD-TR 02-0013
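The preference for link-layer retries over end-to-end retransmission follows directly from the loss arithmetic. A minimal sketch (the 5-hop path depth is an illustrative assumption; the 30% loss rate is the slide's figure):

```python
# End-to-end vs. per-hop reliability under a 30% per-link loss rate.
# The base loss rate matches the slide; hop counts are illustrative.

p = 0.7  # per-link delivery probability (30% loss)

def end_to_end_success(hops):
    """Probability a packet survives every hop with no retries anywhere."""
    return p ** hops

def expected_tx_per_hop():
    """Expected transmissions per hop with link-layer retries
    (geometric distribution with success probability p)."""
    return 1 / p

print(end_to_end_success(5))   # ~0.168: only 1 in 6 packets survive 5 hops
print(expected_tx_per_hop())   # ~1.43 transmissions per hop with retries
```

Even at modest network depths end-to-end delivery collapses, while per-hop retries cost only ~1.4 transmissions per link on average.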
    14. 14. Types of Sensors <ul><li>Sensors attach via daughtercard </li></ul><ul><li>Weather </li></ul><ul><ul><li>Temperature </li></ul></ul><ul><ul><li>Light x 2 (high intensity PAR, low intensity, full spectrum) </li></ul></ul><ul><ul><li>Air Pressure </li></ul></ul><ul><ul><li>Humidity </li></ul></ul><ul><li>Vibration </li></ul><ul><ul><li>2 or 3 axis accelerometers </li></ul></ul><ul><li>Tracking </li></ul><ul><ul><li>Microphone (for ranging and acoustic signatures) </li></ul></ul><ul><ul><li>Magnetometer </li></ul></ul><ul><li>GPS </li></ul>
    15. 15. Power Consumption and Lifetime <ul><li>Power typically supplied by a small battery </li></ul><ul><ul><li>1000-2000 mAH </li></ul></ul><ul><ul><li>1 mAH = 1 milliamp current for 1 hour </li></ul></ul><ul><ul><ul><li>Typically at optimum voltage, current drain rates </li></ul></ul></ul><ul><ul><li>Power = Watts (W) = Amps (A) * Volts (V) </li></ul></ul><ul><ul><li>Energy = Joules (J) = W * time </li></ul></ul><ul><li>Lifetime, power consumption varies by application </li></ul><ul><ul><li>Processor: 5mA active, 1 mA idle, 5 uA sleeping </li></ul></ul><ul><ul><li>Radio: 5 mA listen, 10 mA xmit/receive, ~20mS / packet </li></ul></ul><ul><ul><li>Sensors: 1 uA -> 100’s mA, 1 uS -> 1 S / sample </li></ul></ul>
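The definitions above give a back-of-the-envelope lifetime estimate: battery capacity divided by average current draw. A sketch using the slide's processor figures (the 2000 mAh capacity is the top of the slide's range; the 1% duty cycle is an assumed example workload, not a measurement):

```python
# Battery lifetime from capacity and duty-cycled current draw.
# Current figures come from the slide; the duty cycle is an assumption.

CAPACITY_MAH = 2000.0   # AA battery pack, upper end of the slide's range
ACTIVE_MA = 5.0         # processor active
SLEEP_MA = 0.005        # processor sleeping (5 uA)

def avg_current_ma(duty_cycle):
    """Average draw when awake a fraction `duty_cycle` of the time."""
    return duty_cycle * ACTIVE_MA + (1 - duty_cycle) * SLEEP_MA

def lifetime_days(duty_cycle):
    """Lifetime in days: capacity (mAh) / average current (mA) / 24."""
    return CAPACITY_MAH / avg_current_ma(duty_cycle) / 24

print(lifetime_days(1.0))    # always on: ~16.7 days
print(lifetime_days(0.01))   # 1% duty cycle: ~1500 days
```

This is why duty cycling matters so much: sleeping 99% of the time turns a few weeks of lifetime into years.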
    16. 16. <ul><li>Each mote collects 1 sample of (light,humidity) data every 10 seconds, forwards it </li></ul><ul><li>Each mote can “hear” 10 other motes </li></ul><ul><li>Process: </li></ul><ul><ul><li>Wake up, collect samples (~ 1 second) </li></ul></ul><ul><ul><li>Listen to radio for messages to forward (~1 second) </li></ul></ul><ul><ul><li>Forward data </li></ul></ul>Energy Usage in A Typical Data Collection Scenario
    17. 17. Sensors: Slow, Power Hungry, Noisy
    18. 18. Programming Sensornets: TinyOS <ul><li>Component Based Programming Model </li></ul><ul><li>Suite of software components </li></ul><ul><ul><li>Timers, clocks, clock synchronization </li></ul></ul><ul><ul><li>Single and multi-hop networking </li></ul></ul><ul><ul><li>Power management </li></ul></ul><ul><ul><li>Non-volatile storage management </li></ul></ul>
    19. 19. Programming Philosophy <ul><li>Component Based </li></ul><ul><ul><li>“Wiring” components together via interfaces, configurations </li></ul></ul><ul><li>Split-Phased </li></ul><ul><ul><li>Nothing blocks, ever. </li></ul></ul><ul><ul><li>Instead, completion events are signaled. </li></ul></ul><ul><li>Highly Concurrent </li></ul><ul><ul><li>Single thread of “tasks”, posted and scheduled FIFO </li></ul></ul><ul><ul><li>Events “fired” asynchronously in response to interrupts. </li></ul></ul>
    20. 20. NesC <ul><li>C-like programming language with component model support </li></ul><ul><ul><li>Compiles into GCC-compatible C </li></ul></ul><ul><li>3 types of files: </li></ul><ul><ul><li>Interfaces </li></ul></ul><ul><ul><ul><li>Set of function prototypes; no implementations or variables </li></ul></ul></ul><ul><ul><li>Modules </li></ul></ul><ul><ul><ul><li>Provide (implement) zero or more interfaces </li></ul></ul></ul><ul><ul><ul><li>Require zero or more interfaces </li></ul></ul></ul><ul><ul><ul><li>May define module variables, scoped to functions in module </li></ul></ul></ul><ul><ul><li>Configurations </li></ul></ul><ul><ul><ul><li>Wire (connect) modules according to requires/provides relationship </li></ul></ul></ul>
    21. 21. Component Example: Leds <ul><li>module LedsC { </li></ul><ul><li>provides interface Leds; </li></ul><ul><li>} </li></ul><ul><li>implementation </li></ul><ul><li>{ </li></ul><ul><li>uint8_t ledsOn; </li></ul><ul><li>enum { </li></ul><ul><li>RED_BIT = 1, </li></ul><ul><li>GREEN_BIT = 2, </li></ul><ul><li>YELLOW_BIT = 4 </li></ul><ul><li>}; </li></ul>… async command result_t Leds.redOn() { dbg(DBG_LED, "LEDS: Red on."); atomic { TOSH_CLR_RED_LED_PIN(); ledsOn |= RED_BIT; } return SUCCESS; } … }
    22. 22. Configuration Example <ul><li>configuration CntToLedsAndRfm { </li></ul><ul><li>} </li></ul><ul><li>implementation { </li></ul><ul><li>components Main, Counter, IntToLeds, IntToRfm, TimerC; </li></ul><ul><li>Main.StdControl -> Counter.StdControl; </li></ul><ul><li>Main.StdControl -> IntToLeds.StdControl; </li></ul><ul><li>Main.StdControl -> IntToRfm.StdControl; </li></ul><ul><li>Main.StdControl -> TimerC.StdControl; </li></ul><ul><li>Counter.Timer -> TimerC.Timer[ unique ("Timer")]; </li></ul><ul><li>IntToLeds <- Counter.IntOutput; </li></ul><ul><li>Counter.IntOutput -> IntToRfm; </li></ul><ul><li>} </li></ul>
    23. 23. Split Phase Example <ul><li>module IntToRfmM { … } </li></ul><ul><li>implementation { … </li></ul><ul><li>command result_t IntOutput.output </li></ul><ul><li>(uint16_t value) { </li></ul><ul><li>IntMsg *message = (IntMsg *)data.data; </li></ul><ul><li>if (!pending) { </li></ul><ul><li>pending = TRUE; </li></ul><ul><li>message->val = value; </li></ul><ul><li>atomic { </li></ul><ul><li>message->src = TOS_LOCAL_ADDRESS; </li></ul><ul><li>} </li></ul><ul><li>if ( call Send.send(TOS_BCAST_ADDR, </li></ul><ul><li>sizeof(IntMsg), &data)) </li></ul><ul><li>return SUCCESS; </li></ul><ul><li>pending = FALSE; </li></ul><ul><li>} </li></ul><ul><li>return FAIL; </li></ul><ul><li>} </li></ul>event result_t Send.sendDone (TOS_MsgPtr msg, result_t success) { if (pending && msg == &data) { pending = FALSE; signal IntOutput.outputComplete (success); } return SUCCESS; } } }
    24. 24. Major Components <ul><li>Timers: Clock, TimerC, LogicalTime </li></ul><ul><li>Networking: Send, GenericComm, AMStandard, lib/Route </li></ul><ul><li>Power Management: HPLPowerManagement </li></ul><ul><li>Storage Management: EEPROM , MatchBox </li></ul>
    25. 25. Timers <ul><li>Clock : Basic abstraction over hardware timers; periodic events, single frequency. </li></ul><ul><li>LogicalTime : Fire an event some number of H:M:S:ms in the future. </li></ul><ul><li>TimerC : Multiplex multiple periodic timers on top of LogicalTime. </li></ul>
    26. 26. Radio Stack <ul><li>Interfaces: </li></ul><ul><ul><li>Send </li></ul></ul><ul><ul><ul><li>Broadcast, or to a specific ID </li></ul></ul></ul><ul><ul><ul><li>split phase </li></ul></ul></ul><ul><ul><li>Receive </li></ul></ul><ul><ul><ul><li>asynchronous signal </li></ul></ul></ul><ul><li>Implementations: </li></ul><ul><ul><li>AMStandard </li></ul></ul><ul><ul><ul><li>Application specific messages </li></ul></ul></ul><ul><ul><ul><li>Id-based dispatch </li></ul></ul></ul><ul><ul><li>GenericComm </li></ul></ul><ul><ul><ul><li>AMStandard + Serial IO </li></ul></ul></ul><ul><ul><li>Lib/Route </li></ul></ul><ul><ul><ul><li>Multihop </li></ul></ul></ul>IntMsg *message = (IntMsg *)data.data; … message->val = value; atomic { message->src = TOS_LOCAL_ADDRESS; } call Send.send(TOS_BCAST_ADDR, sizeof(IntMsg), &data)) event TOS_MsgPtr ReceiveIntMsg. receive(TOS_MsgPtr m) { IntMsg *message = (IntMsg *)m->data; call IntOutput.output(message->val); return m; } Wiring to equate IntMsg to ReceiveIntMsg
    27. 27. Multihop Networking <ul><li>Standard implementation: “tree-based routing” </li></ul>Problems: Parent Selection, Asymmetric Links, Adaptation vs. Stability. [Figure: nodes A–F exchange beacons (B) and return results R:{…} up a routing tree; each node tracks neighbor link quality, e.g. node D sees B .75, C .66, E .45, F .82; node C sees A .5, B .44, D .53, F .35.]
    28. 28. Geographic Routing <ul><li>Any-to-any routing via geographic coordinates </li></ul><ul><ul><li>See “GPSR”, MOBICOM 2000, Karp + Kung. </li></ul></ul><ul><li>Requires coordinate system* </li></ul><ul><li>Requires endpoint coordinates </li></ul><ul><li>Hard to route around local minima (“holes”) </li></ul>*Could be virtual, as in Rao et al “Geographic Routing Without Coordinate Information.” MOBICOM 2003
    29. 29. Power Management <ul><li>HPLPowerManagement </li></ul><ul><ul><li>TinyOS sleeps processor when possible </li></ul></ul><ul><ul><li>Observes the radio, sensor, and timer state </li></ul></ul><ul><li>Application managed, for the most part </li></ul><ul><ul><li>App. must turn off subsystems when not in use </li></ul></ul><ul><ul><li>Helper utility: ServiceScheduler </li></ul></ul><ul><ul><ul><li>Periodically calls the “start” and “stop” methods of an app </li></ul></ul></ul><ul><ul><li>More on power management in TinyDB later </li></ul></ul><ul><ul><li>Approach works because: </li></ul></ul><ul><ul><ul><li>single application </li></ul></ul></ul><ul><ul><ul><li>no interactivity requirements </li></ul></ul></ul>
    30. 30. Non-Volatile Storage <ul><li>EEPROM </li></ul><ul><ul><li>512K off chip, 32K on chip </li></ul></ul><ul><ul><li>Writes at disk speeds, reads at RAM speeds </li></ul></ul><ul><ul><li>Interface : random access, read/write 256 byte pages </li></ul></ul><ul><ul><li>Maximum throughput ~10Kbytes / second </li></ul></ul><ul><li>MatchBox Filing System </li></ul><ul><ul><li>Provides a Unix-like file I/O interface </li></ul></ul><ul><ul><li>Single, flat directory </li></ul></ul><ul><ul><li>Only one file being read/written at a time </li></ul></ul>
    31. 31. TinyOS: Getting Started <ul><li>The TinyOS home page: </li></ul><ul><ul><li>http://webs.cs.berkeley.edu/tinyos </li></ul></ul><ul><ul><li>Start with the tutorials! </li></ul></ul><ul><li>The CVS repository </li></ul><ul><ul><li>http://sf.net/projects/tinyos </li></ul></ul><ul><li>The NesC Project Page </li></ul><ul><ul><li>http://sf.net/projects/nescc </li></ul></ul><ul><li>Crossbow motes (hardware): </li></ul><ul><ul><li>http://www.xbow.com </li></ul></ul><ul><li>Intel Imote </li></ul><ul><ul><li>www.intel.com/research/exploratory/motes.htm </li></ul></ul>
    32. 32. Part 2 The Design and Implementation of TinyDB
    33. 33. Part 2 Outline <ul><li>TinyDB Overview </li></ul><ul><li>Data Model and Query Language </li></ul><ul><li>TinyDB Java API and Scripting </li></ul><ul><li>Demo with TinyDB GUI </li></ul><ul><li>TinyDB Internals </li></ul><ul><li>Extending TinyDB </li></ul><ul><li>TinyDB Status and Roadmap </li></ul>
    34. 34. TinyDB Revisited SELECT MAX (mag) FROM sensors WHERE mag > thresh SAMPLE PERIOD 64ms <ul><li>High level abstraction: </li></ul><ul><ul><li>Data centric programming </li></ul></ul><ul><ul><li>Interact with sensor network as a whole </li></ul></ul><ul><ul><li>Extensible framework </li></ul></ul><ul><li>Under the hood: </li></ul><ul><ul><li>Intelligent query processing: query optimization, power efficient execution </li></ul></ul><ul><ul><li>Fault Mitigation: automatically introduce redundancy, avoid problem areas </li></ul></ul>App TinyDB Query, Trigger Data Sensor Network
    35. 35. Feature Overview <ul><li>Declarative SQL-like query interface </li></ul><ul><li>Metadata catalog management </li></ul><ul><li>Multiple concurrent queries </li></ul><ul><li>Network monitoring (via queries) </li></ul><ul><li>In-network, distributed query processing </li></ul><ul><li>Extensible framework for attributes, commands and aggregates </li></ul><ul><li>In-network, persistent storage </li></ul>
    36. 36. Architecture [Figure: PC side — TinyDB GUI and scripts over the TinyDB Client API, plus a DBMS reached via JDBC; mote side — the TinyDB query processor running on each node (0–8) of the sensor network.]
    37. 37. Data Model <ul><li>Entire sensor network as one single, infinitely-long logical table: sensors </li></ul><ul><li>Columns consist of all the attributes defined in the network </li></ul><ul><li>Typical attributes: </li></ul><ul><ul><li>Sensor readings </li></ul></ul><ul><ul><li>Meta-data: node id, location, etc. </li></ul></ul><ul><ul><li>Internal states: routing tree parent, timestamp, queue length, etc. </li></ul></ul><ul><li>Nodes return NULL for unknown attributes </li></ul><ul><li>On server, all attributes are defined in catalog.xml </li></ul><ul><li>Discussion: other alternative data models? </li></ul>
    38. 38. Query Language (TinySQL) <ul><li>SELECT <aggregates>, <attributes> </li></ul><ul><li>[FROM {sensors | <buffer>}] </li></ul><ul><li>[WHERE <predicates>] </li></ul><ul><li>[GROUP BY <exprs>] </li></ul><ul><li>[SAMPLE PERIOD <const> | ONCE] </li></ul><ul><li>[INTO <buffer>] </li></ul><ul><li>[TRIGGER ACTION <command>] </li></ul>
    39. 39. Comparison with SQL <ul><li>Single table in FROM clause </li></ul><ul><li>Only conjunctive comparison predicates in WHERE and HAVING </li></ul><ul><li>No subqueries </li></ul><ul><li>No column alias in SELECT clause </li></ul><ul><li>Arithmetic expressions limited to column op constant </li></ul><ul><li>Only fundamental difference: SAMPLE PERIOD clause </li></ul>
    40. 40. TinySQL Examples <ul><ul><li>SELECT nodeid, nestNo, light </li></ul></ul><ul><ul><li>FROM sensors </li></ul></ul><ul><ul><li>WHERE light > 400 </li></ul></ul><ul><ul><li>EPOCH DURATION 1s </li></ul></ul>“Find the sensors in bright nests.” Sample data in sensors:
Epoch | nodeid | nestNo | light
0 | 1 | 17 | 455
0 | 2 | 25 | 389
1 | 1 | 17 | 422
1 | 2 | 25 | 405
    41. 41. TinySQL Examples (cont.) “Count the number of occupied nests in each loud region of the island.” <ul><ul><li>SELECT region, CNT (occupied), AVG (sound) </li></ul></ul><ul><ul><li>FROM sensors </li></ul></ul><ul><ul><li>GROUP BY region </li></ul></ul><ul><ul><li>HAVING AVG (sound) > 200 </li></ul></ul><ul><ul><li>EPOCH DURATION 10s </li></ul></ul>Result:
Epoch | region | CNT(…) | AVG(…)
0 | North | 3 | 360
0 | South | 3 | 520
1 | North | 3 | 370
1 | South | 3 | 520
[Figure: step 2 computes SELECT AVG(sound) FROM sensors EPOCH DURATION 10s; step 3 keeps regions with AVG(sound) > 200.]
    42. 42. Event-based Queries <ul><li>ON event SELECT … </li></ul><ul><li>Run query only when interesting events happen </li></ul><ul><li>Event examples </li></ul><ul><ul><li>Button pushed </li></ul></ul><ul><ul><li>Message arrival </li></ul></ul><ul><ul><li>Bird enters nest </li></ul></ul><ul><li>Analogous to triggers but events are user-defined </li></ul>
    43. 43. Query over Stored Data <ul><li>Named buffers in Flash memory </li></ul><ul><li>Store query results in buffers </li></ul><ul><li>Query over named buffers </li></ul><ul><li>Analogous to materialized views </li></ul><ul><li>Example: </li></ul><ul><ul><li>CREATE BUFFER name SIZE x (field1 type1, field2 type2, …) </li></ul></ul><ul><ul><li>SELECT a1, a2 FROM sensors SAMPLE PERIOD d INTO name </li></ul></ul><ul><ul><li>SELECT field1, field2, … FROM name SAMPLE PERIOD d </li></ul></ul>
    44. 44. Using the Java API <ul><li>SensorQueryer </li></ul><ul><ul><li>translateQuery() converts TinySQL string into TinyDBQuery object </li></ul></ul><ul><ul><li>Static query optimization </li></ul></ul><ul><li>TinyDBNetwork </li></ul><ul><ul><li>sendQuery() injects query into network </li></ul></ul><ul><ul><li>abortQuery() stops a running query </li></ul></ul><ul><ul><li>addResultListener() adds a ResultListener that is invoked for every QueryResult received </li></ul></ul><ul><ul><li>removeResultListener() </li></ul></ul><ul><li>QueryResult </li></ul><ul><ul><li>A complete result tuple, or </li></ul></ul><ul><ul><li>A partial aggregate result, call mergeQueryResult() to combine partial results </li></ul></ul><ul><li>Key difference from JDBC: push vs. pull </li></ul>
    45. 45. Writing Scripts with TinyDB <ul><li>TinyDB’s text interface </li></ul><ul><ul><li>java net.tinyos.tinydb.TinyDBMain -run “select …” </li></ul></ul><ul><ul><li>Query results printed out to the console </li></ul></ul><ul><ul><li>All motes get reset each time a new query is posed </li></ul></ul><ul><li>Handy for writing scripts with shell, perl, etc. </li></ul>
    46. 46. Using the GUI Tools <ul><li>Demo time </li></ul>
    47. 47. Inside TinyDB [Figure: TinyDB comprises a query processor, a schema (catalog), and filter/aggregation operators (e.g. filter light > 400, agg avg(temp)) running over the TinyOS multihop network. Queries (e.g. SELECT AVG(temp) WHERE light > 400) flow down; results (T:1, AVG: 225; T:2, AVG: 250) flow up. The schema records per-attribute metadata, e.g. name: temp; time to sample: 50 uS; cost to sample: 90 uJ; calibration table: 3; units: Deg. F; error: ±5 Deg F; get function: getTempFunc().] Footprint: ~10,000 lines of embedded C code, ~5,000 lines of PC-side Java, ~3,200 bytes of RAM (with a 768-byte heap), ~58 KB of compiled code (3x larger than the 2nd largest TinyOS program).
    48. 48. Tree-based Routing <ul><li>Tree-based routing </li></ul><ul><ul><li>Used in: </li></ul></ul><ul><ul><ul><li>Query delivery </li></ul></ul></ul><ul><ul><ul><li>Data collection </li></ul></ul></ul><ul><ul><ul><li>In-network aggregation </li></ul></ul></ul><ul><ul><li>Relationship to indexing? </li></ul></ul>[Figure: a query Q (SELECT …) floods down a routing tree of nodes A–F; results R:{…} flow back up the same tree.]
    49. 49. Power Management Approach <ul><li>Coarse-grained app-controlled communication scheduling </li></ul>[Figure: motes 1–5 share a 2–4 s waking period at the start of each epoch (epochs of 10s–100s of seconds) and sleep the rest of the time.]
    50. 50. Time Synchronization <ul><li>All messages include a 5 byte time stamp indicating system time in ms </li></ul><ul><ul><li>Synchronize (e.g. set system time to timestamp) with </li></ul></ul><ul><ul><ul><li>Any message from parent </li></ul></ul></ul><ul><ul><ul><li>Any new query message (even if not from parent) </li></ul></ul></ul><ul><ul><li>Punt on multiple queries </li></ul></ul><ul><ul><li>Timestamps written just after preamble is xmitted </li></ul></ul><ul><li>All nodes agree that the waking period begins when (system time % epoch dur = 0) </li></ul><ul><ul><li>And lasts for WAKING_PERIOD ms </li></ul></ul><ul><li>Adjustment of clock happens by changing duration of sleep cycle, not wake cycle. </li></ul>
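The waking-period rule above is plain modular arithmetic, and the final bullet amounts to adjusting the next sleep duration rather than the wake window. A sketch of that logic (the epoch and waking-period lengths are illustrative values, not TinyDB's defaults):

```python
# Waking-period scheduling: all nodes agree they are awake whenever
# (system time % epoch duration) falls within the waking window.
# Durations below are illustrative, not TinyDB's actual settings.

EPOCH_MS = 10_000
WAKING_PERIOD_MS = 2_000

def is_awake(system_time_ms):
    """True while the node is inside the shared waking window."""
    return system_time_ms % EPOCH_MS < WAKING_PERIOD_MS

def next_sleep_duration(system_time_ms):
    """Sleep until the start of the next epoch. Clock corrections change
    this value (the sleep cycle), never the length of the wake cycle."""
    return EPOCH_MS - (system_time_ms % EPOCH_MS)

print(is_awake(1_500))             # True: inside the waking window
print(is_awake(7_000))             # False: asleep mid-epoch
print(next_sleep_duration(2_000))  # 8000 ms until the next epoch starts
```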
    51. 51. Extending TinyDB <ul><li>Why extend TinyDB? </li></ul><ul><ul><li>New sensors  attributes </li></ul></ul><ul><ul><li>New control/actuation  commands </li></ul></ul><ul><ul><li>New data processing logic  aggregates </li></ul></ul><ul><ul><li>New events </li></ul></ul><ul><li>Analogous to concepts in object-relational databases </li></ul>
    52. 52. Adding Attributes <ul><li>Types of attributes </li></ul><ul><ul><li>Sensor attributes: raw or cooked sensor readings </li></ul></ul><ul><ul><li>Introspective attributes: parent, voltage, ram usage, etc. </li></ul></ul><ul><ul><li>Constant attributes: constant values that can be statically or dynamically assigned to a mote, e.g., nodeid, location, etc. </li></ul></ul>
    53. 53. Adding Attributes (cont) <ul><li>Interfaces provided by Attr component </li></ul><ul><ul><li>StdControl: init, start, stop </li></ul></ul><ul><ul><li>AttrRegister </li></ul></ul><ul><ul><ul><li>command registerAttr(name, type, len) </li></ul></ul></ul><ul><ul><ul><li>event getAttr(name, resultBuf, errorPtr) </li></ul></ul></ul><ul><ul><ul><li>event setAttr(name, val) </li></ul></ul></ul><ul><ul><ul><li>command getAttrDone(name, resultBuf, error) </li></ul></ul></ul><ul><ul><li>AttrUse </li></ul></ul><ul><ul><ul><li>command startAttr(attr) </li></ul></ul></ul><ul><ul><ul><li>event startAttrDone(attr) </li></ul></ul></ul><ul><ul><ul><li>command getAttrValue(name, resultBuf, errorPtr) </li></ul></ul></ul><ul><ul><ul><li>event getAttrDone(name, resultBuf, error) </li></ul></ul></ul><ul><ul><ul><li>command setAttrValue(name, val) </li></ul></ul></ul>
    54. 54. Adding Attributes (cont) <ul><li>Steps to adding attributes to TinyDB </li></ul><ul><ul><li>Create attribute nesC components </li></ul></ul><ul><ul><li>Wire new attribute components to TinyDBAttr configuration </li></ul></ul><ul><ul><li>Reprogram TinyDB motes </li></ul></ul><ul><ul><li>Add new attribute entries to catalog.xml </li></ul></ul><ul><li>Constant attributes can be added on the fly through TinyDB GUI </li></ul>
    55. 55. Adding Aggregates <ul><li>Step 1: wire new nesC components </li></ul>
    56. 56. Adding Aggregates (cont) <ul><li>Step 2: add entry to catalog.xml </li></ul><ul><ul><li><aggregate> </li></ul></ul><ul><ul><li><name>AVG</name> </li></ul></ul><ul><ul><li><id>5</id> </li></ul></ul><ul><ul><li><temporal>false</temporal> </li></ul></ul><ul><ul><li><readerClass>net.tinyos.tinydb.AverageClass</readerClass> </li></ul></ul><ul><ul><li></aggregate> </li></ul></ul><ul><li>Step 3 (optional): implement reader class in Java </li></ul><ul><ul><li>a reader class interprets and finalizes aggregate state received from the mote network, returns final result as a string for display. </li></ul></ul>
    57. 57. TinyDB Status <ul><li>Latest release shipped with TinyOS 1.1 (9/03) </li></ul><ul><ul><li>Install the task-tinydb package in TinyOS 1.1 distribution </li></ul></ul><ul><ul><li>First release in TinyOS 1.0 (9/02) </li></ul></ul><ul><ul><li>Widely used by research groups as well as industry pilot projects </li></ul></ul><ul><li>Successful deployments in Intel Berkeley Lab and redwood trees at UC Botanical Garden </li></ul><ul><ul><li>Largest deployment: ~80 weather station nodes </li></ul></ul><ul><ul><li>Network longevity: 4-5 months </li></ul></ul>
    58. 58. The Redwood Tree Deployment <ul><li>Redwood Grove in UC Botanical Garden, Berkeley </li></ul><ul><li>Collect dense sensor readings to monitor climatic variations across </li></ul><ul><ul><li>altitudes, </li></ul></ul><ul><ul><li>angles, </li></ul></ul><ul><ul><li>time, </li></ul></ul><ul><ul><li>forest locations, etc. </li></ul></ul><ul><li>Versus sporadic monitoring points with 30lb loggers! </li></ul><ul><li>Current focus: study how dense sensor data affect predictions of conventional tree-growth models </li></ul>
    59. 59. Data from Redwoods [Figure: node placement by height in the tree — 36 m (top); 33 m: node 111; 32 m: node 110; 30 m: nodes 109, 108, 107; 20 m: nodes 106, 105, 104; 10 m: nodes 103, 102, 101.]
    60. 60. TinyDB Roadmap (near term) <ul><li>Support for high frequency sampling </li></ul><ul><ul><li>Equipment vibration monitoring, structural monitoring, etc. </li></ul></ul><ul><ul><li>Store and forward </li></ul></ul><ul><ul><li>Bulk reliable data transfer </li></ul></ul><ul><ul><li>Scheduling of communications </li></ul></ul><ul><li>Port to Intel Mote </li></ul><ul><li>Deployment in Intel Fab equipment monitoring application and the Golden Gate Bridge monitoring application </li></ul>
    61. 61. For more information <ul><li>http://berkeley.intel-research.net/tinydb or http://triplerock.cs.berkeley.edu/tinydb </li></ul>
    62. 62. Part 3 <ul><li>Database Research Issues in Sensor Networks </li></ul>
    63. 63. Sensor Network Research <ul><li>Very active research area </li></ul><ul><ul><li>Can’t summarize it all </li></ul></ul><ul><li>Focus: database-relevant research topics </li></ul><ul><ul><li>Some outside of Berkeley </li></ul></ul><ul><ul><li>Other topics that are itching to be scratched </li></ul></ul><ul><ul><li>But , some bias towards work that we find compelling </li></ul></ul>
    64. 64. Topics <ul><li>In-network aggregation </li></ul><ul><li>Acquisitional Query Processing </li></ul><ul><li>Heterogeneity </li></ul><ul><li>Intermittent Connectivity </li></ul><ul><li>In-network Storage </li></ul><ul><li>Statistics-based summarization and sampling </li></ul><ul><li>In-network Joins </li></ul><ul><li>Adaptivity and Sensor Networks </li></ul><ul><li>Multiple Queries </li></ul>
    66. 66. Tiny Aggregation (TAG) <ul><li>In-network processing of aggregates </li></ul><ul><ul><li>Common data analysis operation </li></ul></ul><ul><ul><ul><li>Aka gather operation or reduction in parallel programming </li></ul></ul></ul><ul><ul><li>Communication reducing </li></ul></ul><ul><ul><ul><li>Operator dependent benefit </li></ul></ul></ul><ul><ul><li>Across nodes during same epoch </li></ul></ul><ul><li>Exploit query semantics to improve efficiency! </li></ul>Madden, Franklin, Hellerstein, Hong. Tiny AGgregation (TAG) , OSDI 2002 .
    67. 67. Basic Aggregation <ul><li>In each epoch: </li></ul><ul><ul><li>Each node samples local sensors once </li></ul></ul><ul><ul><li>Generates partial state record ( PSR ) </li></ul></ul><ul><ul><ul><li>local readings </li></ul></ul></ul><ul><ul><ul><li>readings from children </li></ul></ul></ul><ul><ul><li>Outputs PSR during assigned comm. interval </li></ul></ul><ul><li>At end of epoch, PSR for whole network output at root </li></ul><ul><li>New result on each successive epoch </li></ul><ul><li>Extras: </li></ul><ul><ul><li>Predicate-based partitioning via GROUP BY </li></ul></ul>1 2 3 4 5
    68. 68. Illustration: Aggregation [Figures (slides 68–72): a five-node network runs SELECT COUNT(*) FROM sensors. Across successive communication intervals (4, 3, 2, 1), each node transmits its partial count during its assigned interval and parents merge what they hear; in interval 1 the root outputs the network-wide count of 5, and the pattern repeats in the next epoch.]
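The interval-by-interval illustrations reduce to a post-order sum over the routing tree: each node's PSR is its own count plus its children's PSRs. A compact sketch (the five-node topology below is an assumed example, not necessarily the exact tree in the figures):

```python
# In-network COUNT: a node's partial state record (PSR) is 1 plus the
# sum of its children's PSRs, delivered bottom-up along the routing
# tree. The topology here is an assumed five-node example.

tree = {1: [2, 3], 2: [], 3: [4], 4: [5], 5: []}  # node -> children

def count_psr(node):
    """PSR for COUNT at `node`, computed by post-order traversal."""
    return 1 + sum(count_psr(child) for child in tree[node])

print(count_psr(1))  # 5: the root's PSR counts every node in the network
```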
    73. 73. Aggregation Framework <ul><li>As in extensible databases, TinyDB supports any aggregation function conforming to: </li></ul>Agg = {f_init, f_merge, f_evaluate} f_init(a0) -> <a0> f_merge(<a1>, <a2>) -> <a12> f_evaluate(<a1>) -> aggregate value Example: Average AVG_init(v) -> <v, 1> AVG_merge(<S1, C1>, <S2, C2>) -> <S1 + S2, C1 + C2> AVG_evaluate(<S, C>) -> S/C Each <...> is a Partial State Record (PSR). Restriction: f_merge must be associative and commutative.
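A minimal sketch of the {f_init, f_merge, f_evaluate} framework, using the AVG example from the slide (the input readings are made-up values):

```python
# An aggregate as {f_init, f_merge, f_evaluate}; AVG keeps a <sum, count> PSR.
AVG = {
    "init": lambda v: (v, 1),                          # <v, 1>
    "merge": lambda a, b: (a[0] + b[0], a[1] + b[1]),  # <S1+S2, C1+C2>
    "evaluate": lambda a: a[0] / a[1],                 # S / C
}

def aggregate(agg, readings):
    state = agg["init"](readings[0])
    for v in readings[1:]:
        # merge order doesn't matter: f_merge is associative and commutative
        state = agg["merge"](state, agg["init"](v))
    return agg["evaluate"](state)

print(aggregate(AVG, [10, 20, 30]))  # 20.0
```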
    74. 74. Taxonomy of Aggregates <ul><li>TAG insight: classify aggregates according to various functional properties </li></ul><ul><ul><li>Yields a general set of optimizations that can automatically be applied </li></ul></ul>Drives an API! Property | Examples | Affects: Partial State | MEDIAN: unbounded, MAX: 1 record | Effectiveness of TAG. Duplicate Sensitivity | MIN: dup. insensitive, AVG: dup. sensitive | Routing Redundancy. Exemplary vs. Summary | MAX: exemplary, COUNT: summary | Applicability of Sampling, Effect of Loss. Monotonicity | COUNT: monotonic, AVG: non-monotonic | Hypothesis Testing, Snooping.
    75. 75. Use Multiple Parents <ul><li>Use graph structure </li></ul><ul><ul><li>Increase delivery probability with no communication overhead </li></ul></ul><ul><li>For duplicate insensitive aggregates, or </li></ul><ul><li>Aggs expressible as sum of parts </li></ul><ul><ul><li>Send (part of) aggregate to all parents </li></ul></ul><ul><ul><ul><li>In just one message, via multicast </li></ul></ul></ul><ul><ul><li>Assuming independence , decreases variance </li></ul></ul>SELECT COUNT(*) With one parent: P(link xmit successful) = p, so P(success from A -> R) = p^2; E(cnt) = c * p^2; Var(cnt) = c^2 * p^2 * (1 - p^2) = V. With n parents, each carrying c/n: E(cnt) = n * (c/n) * p^2 (unchanged); Var(cnt) = n * (c/n)^2 * p^2 * (1 - p^2) = V/n. (Figure: node A splits its count c into c/n shares sent to n = 2 parents B and C en route to root R.)
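A quick Monte Carlo check of the variance claim, under the slide's independence assumption (the count c, per-link success probability p, and trial count are arbitrary choices):

```python
import random

# Simulate a count c split across n parents, each share surviving two hops
# with per-link success probability p; report the sample mean and variance
# of what the root receives.
def received_count(c, n, p, trials=200_000, seed=1):
    rng = random.Random(seed)
    total = totsq = 0.0
    for _ in range(trials):
        got = 0.0
        for _ in range(n):
            if rng.random() < p and rng.random() < p:   # both hops succeed
                got += c / n
        total += got
        totsq += got * got
    mean = total / trials
    return mean, totsq / trials - mean * mean           # mean, variance

mean1, var1 = received_count(c=10, n=1, p=0.8)   # analytically E=6.4, V=23.04
mean2, var2 = received_count(c=10, n=2, p=0.8)   # same E, variance halved
```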
    76. 76. Multiple Parents Results <ul><li>Better than previous analysis expected! </li></ul><ul><li>Losses aren’t independent! </li></ul><ul><li>Insight: spreads data over many links </li></ul>Critical Link! No Splitting With Splitting
    77. 77. Acquisitional Query Processing (ACQP) <ul><li>TinyDB acquires AND processes data </li></ul><ul><ul><li>Could generate an infinite number of samples </li></ul></ul><ul><li>An acquisitional query processor controls </li></ul><ul><ul><li>when, </li></ul></ul><ul><ul><li>where, </li></ul></ul><ul><ul><li>and with what frequency data is collected! </li></ul></ul><ul><li>Versus traditional systems where data is provided a priori </li></ul>Madden, Franklin, Hellerstein, and Hong. The Design of an Acquisitional Query Processor. SIGMOD, 2003.
    78. 78. ACQP: What’s Different? <ul><li>How should the query be processed? </li></ul><ul><ul><li>Sampling as a first class operation </li></ul></ul><ul><li>How does the user control acquisition? </li></ul><ul><ul><li>Rates or lifetimes </li></ul></ul><ul><ul><li>Event-based triggers </li></ul></ul><ul><li>Which nodes have relevant data? </li></ul><ul><ul><li>Index-like data structures </li></ul></ul><ul><li>Which samples should be transmitted? </li></ul><ul><ul><li>Prioritization, summary, and rate control </li></ul></ul>
    79. 79. Operator Ordering: Interleave Sampling + Selection <ul><li>SELECT light, mag </li></ul><ul><li>FROM sensors </li></ul><ul><li>WHERE pred1(mag) </li></ul><ul><li>AND pred2(light) </li></ul><ul><li>EPOCH DURATION 1s </li></ul><ul><li>E(sampling mag) >> E(sampling light) </li></ul><ul><ul><li>1500 uJ vs. 90 uJ </li></ul></ul>At 1 sample/sec, total power savings could be as much as 3.5 mW, comparable to the processor! (Figure: query plans. A traditional DBMS applies σ(pred1) and σ(pred2) after both mag and light have been sampled; ACQP interleaves acquisition, taking the cheap light sample and σ(pred2) before the costly mag sample and σ(pred1). This is the correct ordering unless pred1 is very selective and pred2 is not.)
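The energy argument can be made concrete with the slide's per-sample costs; the predicate selectivities passed in below are assumptions:

```python
# Expected per-tuple energy (µJ) of the two orderings, using the slide's
# acquisition costs of 1500 µJ (mag) and 90 µJ (light).
E_MAG, E_LIGHT = 1500, 90   # µJ per acquisition

def cost_light_first(sel_light):
    # sample light, apply pred2; only survivors pay for the costly mag sample
    return E_LIGHT + sel_light * E_MAG

def cost_mag_first(sel_mag):
    # sample mag first; every tuple pays the expensive acquisition up front
    return E_MAG + sel_mag * E_LIGHT

print(cost_light_first(0.5), cost_mag_first(0.5))  # 840.0 1545.0
```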
    80. 80. Exemplary Aggregate Pushdown <ul><li>SELECT WINMAX(light,8s,8s) </li></ul><ul><li>FROM sensors </li></ul><ul><li>WHERE mag > x </li></ul><ul><li>EPOCH DURATION 1s </li></ul><ul><li>Novel, general pushdown technique </li></ul><ul><li>Mag sampling is the most expensive operation! </li></ul>(Figure: query plans. A traditional DBMS samples mag and light, applies σ(mag > x), then WINMAX. ACQP pushes the exemplary aggregate down: it samples light first, discards tuples failing σ(light > MAX), and only then samples mag and applies σ(mag > x) before WINMAX.)
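A sketch of the pushdown logic. For simplicity the readings are assumed to be (light, mag) pairs, and a counter of mag accesses stands in for the expensive acquisitions that TinyDB would actually defer:

```python
# Pushdown for SELECT WINMAX(light) WHERE mag > x: check the cheap light
# value against the running window max first, and only "sample" mag for
# tuples that could change the answer.
def winmax_pushdown(readings, x):
    win_max = float("-inf")
    mag_samples = 0
    for light, mag in readings:
        if light <= win_max:
            continue                  # σ(light > MAX): skip costly mag sample
        mag_samples += 1              # mag acquired only past the light filter
        if mag > x:                   # original WHERE predicate
            win_max = light
    return win_max, mag_samples

print(winmax_pushdown([(5, 10), (3, 10), (7, 10), (6, 10)], 0))  # (7, 2)
```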
    81. 81. Topics <ul><li>In-network aggregation </li></ul><ul><li>Acquisitional Query Processing </li></ul><ul><li>Heterogeneity </li></ul><ul><li>Intermittent Connectivity </li></ul><ul><li>In-network Storage </li></ul><ul><li>Statistics-based summarization and sampling </li></ul><ul><li>In-network Joins </li></ul><ul><li>Adaptivity and Sensor Networks </li></ul><ul><li>Multiple Queries </li></ul>
    82. 82. Heterogeneous Sensor Networks <ul><li>Leverage small numbers of high-end nodes to benefit large numbers of inexpensive nodes </li></ul><ul><li>Still must be transparent and ad-hoc </li></ul><ul><li>Key to scalability of sensor networks </li></ul><ul><li>Interesting heterogeneities </li></ul><ul><ul><li>Energy: battery vs. outlet power </li></ul></ul><ul><ul><li>Link bandwidth: Chipcon vs. 802.11x </li></ul></ul><ul><ul><li>Computing and storage: ATMega128 vs. Xscale </li></ul></ul><ul><ul><li>Pre-computed results </li></ul></ul><ul><ul><li>Sensing nodes vs. QP nodes </li></ul></ul>
    83. 83. Computing Heterogeneity with TinyDB <ul><li>Separate query processing from sensing </li></ul><ul><ul><li>Provide query processing on a small number of nodes </li></ul></ul><ul><ul><li>Attract packets to query processors based on “service value” </li></ul></ul><ul><li>Compare the total energy consumption of the network under four schemes: </li></ul><ul><li>No aggregation </li></ul><ul><li>All aggregation </li></ul><ul><li>Opportunistic aggregation </li></ul><ul><li>HSN proactive aggregation </li></ul>Mark Yarvis and York Liu, Intel’s Heterogeneous Sensor Network Project, ftp://download.intel.com/research/people/HSN_IR_Day_Poster_03.pdf .
    84. 84. 5x7 TinyDB/HSN Mica2 Testbed
    85. 85. Data Packet Saving <ul><li>How many aggregators are desired? </li></ul><ul><li>Does placement matter? </li></ul>Making 11% of nodes aggregators achieves 72% of the maximum data reduction; the optimal placement is about 2/3 of the distance from the sink.
    86. 86. Occasionally Connected Sensornets (Figure: several sensornets, each with TinyDB query-processing nodes and a local gateway, connect to a TinyDB server over the internet; mobile gateways ferry data between disconnected patches.)
    87. 87. Occasionally Connected Sensornets Challenges <ul><li>Networking support </li></ul><ul><ul><li>Tradeoff between reliability, power consumption and delay </li></ul></ul><ul><ul><li>Data custody transfer: duplicates? </li></ul></ul><ul><ul><li>Load shedding </li></ul></ul><ul><ul><li>Routing of mobile gateways </li></ul></ul><ul><li>Query processing </li></ul><ul><ul><li>Operation placement: in-network vs. on mobile gateways </li></ul></ul><ul><ul><li>Proactive pre-computation and data movement </li></ul></ul><ul><li>Tight interaction between networking and QP </li></ul>Fall, Hong and Madden, Custody Transfer for Reliable Delivery in Delay Tolerant Networks, http://www.intel-research.net/Publications/Berkeley/081220030852_157.pdf .
    88. 88. Distributed In-network Storage <ul><li>Collectively, sensornets have large amounts of in-network storage </li></ul><ul><li>Good for in-network consumption or caching </li></ul><ul><li>Challenges </li></ul><ul><ul><li>Distributed indexing for fast query dissemination </li></ul></ul><ul><ul><li>Resilience to node or link failures </li></ul></ul><ul><ul><li>Graceful adaptation to data skews </li></ul></ul><ul><ul><li>Minimizing index insertion/maintenance cost </li></ul></ul>
    89. 89. Example: DIM <ul><li>Functionality </li></ul><ul><ul><li>Efficient range queries for multidimensional data. </li></ul></ul><ul><li>Approaches </li></ul><ul><ul><li>Divide sensor field into bins. </li></ul></ul><ul><ul><li>Locality-preserving mapping from m-d space to geographic locations. </li></ul></ul><ul><ul><li>Use geographic routing such as GPSR. </li></ul></ul><ul><li>Assumptions </li></ul><ul><ul><li>Nodes know their locations and network boundary </li></ul></ul><ul><ul><li>No node mobility </li></ul></ul>Xin Li, Young Jin Kim, Ramesh Govindan and Wei Hong, Distributed Index for Multi-dimensional Data (DIM) in Sensor Networks, SenSys 2003. (Figure: events E1 = <0.7, 0.8> and E2 = <0.6, 0.7> are stored in the bins owning those value ranges; range query Q1 = <.5-.7, .5-1> is routed only to the overlapping bins.)
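One way to obtain a DIM-like locality-preserving mapping is bit interleaving. This is a simplification: the real DIM splits zones adaptively as nodes join, and the 3-bit resolution and [0, 1) normalization here are assumptions:

```python
# Bit-interleaved zone code: nearby points in attribute space share long
# code prefixes, so they map to nearby zones (and hence nearby locations).
def zone_code(x, y, bits=3):
    xi, yi = int(x * (1 << bits)), int(y * (1 << bits))   # quantize [0,1)
    code = 0
    for i in range(bits - 1, -1, -1):
        code = (code << 1) | ((xi >> i) & 1)   # x bit
        code = (code << 1) | ((yi >> i) & 1)   # y bit
    return code

print(bin(zone_code(0.6, 0.7)))  # 0b110001
```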
    90. 90. Statistical Techniques <ul><li>Approximations, summaries, and sampling based on statistics and statistical models </li></ul><ul><li>Applications: </li></ul><ul><ul><li>Limited bandwidth and large number of nodes -> data reduction </li></ul></ul><ul><ul><li>Lossiness -> predictive modeling </li></ul></ul><ul><ul><li>Uncertainty -> tracking correlations and changes over time </li></ul></ul><ul><ul><li>Physical models -> improved query answering </li></ul></ul>
    91. 91. Correlated Attributes <ul><li>Data in sensor networks is correlated; e.g., </li></ul><ul><ul><li>Temperature and voltage </li></ul></ul><ul><ul><li>Temperature and light </li></ul></ul><ul><ul><li>Temperature and humidity </li></ul></ul><ul><ul><li>Temperature and time of day </li></ul></ul><ul><ul><li>etc. </li></ul></ul>
    92. 92. IDSQ <ul><li>Idea: task sensors in order of best improvement to estimate of some value: </li></ul><ul><ul><li>Choose leader(s) </li></ul></ul><ul><ul><ul><li>Suppress subordinates </li></ul></ul></ul><ul><ul><ul><li>Task subordinates, one at a time </li></ul></ul></ul><ul><ul><ul><ul><li>Until some measure of goodness (error bound) is met </li></ul></ul></ul></ul><ul><ul><ul><ul><ul><li>E.g. “Mahalanobis Distance” -- Accounts for correlations in axes, tends to favor minimizing principal axis </li></ul></ul></ul></ul></ul>See “Scalable Information-Driven Sensor Querying and Routing for ad hoc Heterogeneous Sensor Networks.” Chu, Haussecker and Zhao. Xerox TR P2001-10113. May, 2001.
    93. 93. Graphical Representation Model the location estimate as a point with 2-dimensional Gaussian uncertainty. (Figure: candidate sensors S1 and S2 yield residual uncertainty regions of equal area; S1 is preferred because it reduces error along the principal axis.)
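The Mahalanobis preference can be illustrated directly (the covariance values below are made up): with uncertainty elongated along x, an offset along the principal axis scores a smaller distance than an equal Euclidean offset across it.

```python
# Squared Mahalanobis distance under a 2-d Gaussian uncertainty model.
# Covariance: variance 4 along x (the principal axis), 1 along y.
def mahalanobis2(dx, dy, cov):
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy                 # invert the 2x2 by hand
    ix, ixy, iy = syy / det, -sxy / det, sxx / det
    return dx * (ix * dx + ixy * dy) + dy * (ixy * dx + iy * dy)

cov = [[4.0, 0.0], [0.0, 1.0]]
print(mahalanobis2(2.0, 0.0, cov))  # 1.0  (along the principal axis)
print(mahalanobis2(0.0, 2.0, cov))  # 4.0  (same offset across it)
```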
    94. 94. MQSN: Model-based Probabilistic Querying over Sensor Networks (Figure: a query processor backed by a probabilistic model fronts a nine-node sensor network.) Joint work with Amol Deshpande, Carlos Guestrin, and Joe Hellerstein
    95. 95. MQSN: Model-based Probabilistic Querying over Sensor Networks (Figure: a probabilistic query, select NodeID, Temp ± 0.1C where NodeID in [1..9] with conf(0.95), arrives and the query processor consults the model.)
    96. 96. MQSN: Model-based Probabilistic Querying over Sensor Networks (Figure: the model yields an observation plan, acquire [Temp, 3] and [Temp, 9], which is pushed into the network.)
    97. 97. MQSN: Model-based Probabilistic Querying over Sensor Networks (Figure: observed data [Temp, 3] = …, [Temp, 9] = … returns from the network; the model is updated and probabilistic query results are emitted.)
    98. 98. Challenges <ul><li>What kinds of models to use? </li></ul><ul><li>Optimization problem: </li></ul><ul><ul><li>Given a model and a query, find the best set of attributes to observe </li></ul></ul><ul><ul><li>Cost not easy to measure </li></ul></ul><ul><ul><ul><li>Non-uniform network communication costs </li></ul></ul></ul><ul><ul><ul><li>Changing network topologies </li></ul></ul></ul><ul><ul><li>Large plan space </li></ul></ul><ul><ul><ul><li>Might be cheaper to observe attributes not in query </li></ul></ul></ul><ul><ul><ul><ul><li>e.g. Voltage instead of Temperature </li></ul></ul></ul></ul><ul><ul><ul><li>Conditional Plans: </li></ul></ul></ul><ul><ul><ul><ul><li>Change the observation plan based on observed values </li></ul></ul></ul></ul>
    99. 99. MQSN: Current Prototype <ul><li>Multi-variate Gaussian Models </li></ul><ul><ul><li>Kalman Filters to capture correlations across time </li></ul></ul><ul><li>Handles: </li></ul><ul><ul><li>Range predicate queries </li></ul></ul><ul><ul><ul><li>sensor value within [x,y], w/ confidence </li></ul></ul></ul><ul><ul><li>Value queries </li></ul></ul><ul><ul><ul><li>sensor value = x, w/in epsilon, w/ confidence </li></ul></ul></ul><ul><ul><li>Simple aggregate queries </li></ul></ul><ul><ul><ul><li>AVG(sensor value) = n, w/in epsilon, w/confidence </li></ul></ul></ul><ul><li>Uses a greedy algorithm to choose the observation plan </li></ul>
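A toy version of the observation-plan choice (all variances, costs, and the confidence target are made up): for jointly Gaussian attributes, observing a correlated attribute with correlation rho shrinks the target's variance by a factor of (1 - rho^2), so a cheap correlated reading may already meet the bound.

```python
# Choose the cheapest observation that meets a variance (confidence) target,
# given a correlated, cheaper attribute (e.g. Voltage for Temperature).
def posterior_var(prior_var, rho):
    # conditional variance of a bivariate Gaussian after one observation
    return prior_var * (1.0 - rho * rho)

def cheapest_observation(prior_var, target_var, rho, cost_temp, cost_volt):
    candidates = [(["temp"], cost_temp)]       # direct reading always suffices
    if prior_var <= target_var:
        candidates.append(([], 0.0))           # model alone is confident enough
    if posterior_var(prior_var, rho) <= target_var:
        candidates.append((["voltage"], cost_volt))
    return min(candidates, key=lambda c: c[1])

# Strong correlation: the cheap voltage reading meets the bound.
print(cheapest_observation(4.0, 0.5, 0.95, cost_temp=1500, cost_volt=30))
```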
    100. 100. In-Net Regression <ul><li>Linear regression: a simple way to predict future values and identify outliers </li></ul><ul><li>Regression can be across local or remote values, multiple dimensions, or with high-degree polynomials </li></ul><ul><ul><li>E.g., node A readings vs. node B’s </li></ul></ul><ul><ul><li>Or, location (X,Y), versus temperature </li></ul></ul><ul><ul><ul><li>E.g., over many nodes </li></ul></ul></ul>Guestrin, Thibaux, Bodik, Paskin, Madden. “Distributed Regression: an Efficient Framework for Modeling Sensor Network Data.” Under submission.
    101. 101. In-Net Regression (Continued) <ul><li>Problem: may require data from all sensors to build model </li></ul><ul><li>Solution: partition sensors into overlapping “kernels” that influence each other </li></ul><ul><ul><li>Run regression in each kernel </li></ul></ul><ul><ul><ul><li>Requiring just local communication </li></ul></ul></ul><ul><ul><li>Blend data between kernels </li></ul></ul><ul><ul><li>Requires some clever matrix manipulation </li></ul></ul><ul><li>End result: regressed model at every node </li></ul><ul><ul><li>Useful in failure detection, missing value estimation </li></ul></ul>
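The kernel-local building block is ordinary least squares; a one-dimensional sketch (the positions and readings below are made up):

```python
# Fit y = a + b*x by least squares: the local regression each "kernel"
# computes before blending with its neighbors.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    beta = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)
    return my - beta * mx, beta            # intercept, slope

xs = [0, 1, 2, 3]                          # node positions along a line
ys = [10.0, 12.0, 14.0, 16.0]              # temperature readings
a, b = fit_line(xs, ys)
print(a, b)  # 10.0 2.0
# A reading far from a + b*x (say 30.0 at x = 2) would be flagged an outlier,
# and the model can stand in for a missing value.
```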
    102. 102. Exploiting Correlations in Query Processing <ul><li>Simple idea: </li></ul><ul><ul><li>Given predicate P(A) over expensive attribute A </li></ul></ul><ul><ul><li>Replace it with P’ over cheap attribute A’ such that P’ evaluates to P </li></ul></ul><ul><ul><li>Problem: unless A and A’ are perfectly correlated, P’ ≠ P for all time </li></ul></ul><ul><ul><ul><li>So we could incorrectly accept or reject some readings </li></ul></ul></ul><ul><li>Alternative: use correlations to improve selectivity estimates in query optimization </li></ul><ul><ul><li>Construct conditional plans that vary predicate order based on prior observations </li></ul></ul>
    103. 103. Exploiting Correlations (Cont.) <ul><li>Insight: by observing a (cheap and correlated) variable not involved in the query, it may be possible to improve query performance </li></ul><ul><ul><li>Improves estimates of selectivities </li></ul></ul><ul><li>Use conditional plans </li></ul><ul><li>Example </li></ul>(Figure: both predicates, Light > 100 Lux and Temp < 20° C, cost 100. Without observing time, each has selectivity .5 and any static ordering has expected cost 100 + .5 × 100 = 150. After observing Time in [6pm, 6am], the plan branches: in each branch one predicate has selectivity .1 and the other .9, so ordering the selective predicate first gives expected cost 100 + .1 × 100 = 110.)
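The expected-cost arithmetic from the example, using the slide's costs and selectivities (night and day are assumed equally likely):

```python
# Per-tuple expected cost: pay for the first predicate on every tuple, and
# for the second only on the fraction of tuples that pass the first.
def expected_cost(first_cost, first_sel, second_cost):
    return first_cost + first_sel * second_cost

# Static plan (always light-first): selective (.1) at night, not (.9) by day.
static = 0.5 * expected_cost(100, 0.1, 100) + 0.5 * expected_cost(100, 0.9, 100)
# Conditional plan: light-first at night, temp-first by day (.1 either way).
conditional = 0.5 * expected_cost(100, 0.1, 100) + 0.5 * expected_cost(100, 0.1, 100)
print(round(static), round(conditional))  # 150 110
```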
    104. 104. In-Network Join Strategies <ul><li>Types of joins: </li></ul><ul><ul><li>non-sensor -> sensor </li></ul></ul><ul><ul><li>sensor -> sensor </li></ul></ul><ul><li>Optimization questions: </li></ul><ul><ul><li>Should the join be pushed down? </li></ul></ul><ul><ul><li>If so, where should it be placed? </li></ul></ul><ul><ul><li>What if a join table exceeds the memory available on one node? </li></ul></ul>
    105. 105. Choosing Where to Place Operators <ul><li>Idea : choose a “join node” to run the operator </li></ul><ul><li>Over time, explore other candidate placements </li></ul><ul><ul><li>Nodes advertise data rates to their neighbors </li></ul></ul><ul><ul><li>Neighbors compute expected cost of running the join based on these rates </li></ul></ul><ul><ul><li>Neighbors advertise costs </li></ul></ul><ul><ul><li>Current join node selects a new, lower cost node </li></ul></ul>Bonfils + Bonnet, Adaptive and Decentralized Operator Placement for In-Network Query Processing, IPSN 2003.
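A sketch of the candidate-cost comparison (the rates and hop counts are made-up numbers): each candidate scores the traffic it would pull in, and the current join node migrates to the cheapest.

```python
# A node's cost to host the join: each input stream's advertised data rate
# times its hop distance to that node.
def placement_cost(rates, hops):
    return sum(rates[s] * hops[s] for s in rates)

rates = {"A": 4, "B": 10}                 # tuples/sec from the two inputs
candidates = {
    "node1": {"A": 1, "B": 3},            # hop counts, source -> candidate
    "node2": {"A": 2, "B": 1},
}
best = min(candidates, key=lambda n: placement_cost(rates, candidates[n]))
print(best)  # node2 (cost 18 vs node1's 34)
```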
    106. 106. Topics <ul><li>In-network aggregation </li></ul><ul><li>Acquisitional Query Processing </li></ul><ul><li>Heterogeneity </li></ul><ul><li>Intermittent Connectivity </li></ul><ul><li>In-network Storage </li></ul><ul><li>Statistics-based summarization and sampling </li></ul><ul><li>In-network Joins </li></ul><ul><li>Adaptivity and Sensor Networks </li></ul><ul><li>Multiple Queries </li></ul>
    107. 107. Adaptivity In Sensor Networks <ul><li>Queries are long running </li></ul><ul><li>Selectivities change </li></ul><ul><ul><li>E.g. night vs day </li></ul></ul><ul><li>Network load and available energy vary </li></ul><ul><li>All suggest that some adaptivity is needed </li></ul><ul><ul><li>Of data rates or granularity of aggregation when optimizing for lifetimes </li></ul></ul><ul><ul><li>Of operator orderings or placements when selectivities change (c.f., conditional plans for correlations) </li></ul></ul><ul><li>As far as we know, this is an open problem! </li></ul>
    108. 108. Multiple Queries and Work Sharing <ul><li>As sensornets evolve, users will run many queries simultaneously </li></ul><ul><ul><li>E.g., traffic monitoring </li></ul></ul><ul><li>Likely that queries will be similar </li></ul><ul><ul><li>But have different end points, parameters, etc </li></ul></ul><ul><li>Would like to share processing, routing as much as possible </li></ul><ul><li>But how? Again, an open problem. </li></ul>
    109. 109. Concluding Remarks <ul><li>Sensor networks are an exciting emerging technology, with a wide variety of applications </li></ul><ul><li>Many research challenges in all areas of computer science </li></ul><ul><ul><li>Database community included </li></ul></ul><ul><ul><li>Some agreement that a declarative interface is right </li></ul></ul><ul><li>TinyDB and other early work are an important first step </li></ul><ul><li>But there’s lots more to be done! </li></ul>