• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Telegraph Cq English
 


Workshop on TelegraphCQ:
Concept of Data Stream Management System.
TelegraphCQ: the DSMS developed at Berkeley, internal architecture.
Differences from traditional databases.
Adaptive Query Processing using the new concept of Eddies as a routing operator.
Problems with joining streams (with no statistical data) and relations, and the two solutions: STAIRs and SteMs.
STAIR: a join operator that allows its internal state to be changed through primitive functions visible to the eddy.
SteMs: half-join operators that keep homogeneous tuples; their internal state is independent of routing decisions.
Eddies' routing policy implemented with Lottery Scheduling (Waldspurger & Weihl [1994]).

Usage Rights

© All Rights Reserved

  • Open-source DBMS, PostgreSQL, used as the starting point to implement TelegraphCQ. Developed at UC Berkeley. Written in C/C++. OpenBSD license. Based on the PostgreSQL code. Current version: 2.1 on PostgreSQL 7.3.2. Project closed in 2006. Important points of interest and features. Software: http://telegraph.cs.berkeley.edu Papers: http://db.cs.berkeley.edu/telegraph Commercial spinoff: Truviso
  • Non-blocking versions of classical operators (selection, projection). Eddies: decide the routing tuple by tuple. Flux: routes tuples among the machines of a cluster to support parallelism, load balancing and fault tolerance.
  • Sources must identify themselves before sending data. Wrapper: user-defined functions that define how the data must be processed, running inside the Wrapper ClearingHouse process. Push sources: initiate a connection with TelegraphCQ. Pull sources: the wrapper initiates the connection; for example, it connects to a mail server and checks the mail every minute. Data from different wrappers can merge into the same stream. Heartbeat: a punctuation tuple with no data, only a timestamp. When a punctuation tuple arrives, no earlier data will arrive.
  • The data that arrives may be only part of a tuple, so it must be buffered; a call does not necessarily yield a tuple. WrapperState lets user functions communicate with the WCH. If fewer fields arrive, the missing ones default to NULL; if too many arrive, the extra ones are truncated.
  • A kind of temporary data repository. Half-join operator that stores homogeneous tuples. Its state is independent of previous routing decisions (since it does not store intermediate tuples). Supports the operations insertion (build), lookup (probe) and deletion (eviction), useful for windows. Similar to MJoin but more adaptive. Similar to the simple routing policy for selection-only queries. State sharing with other continuous queries. Does not store intermediate results, which increases the computation cost.
  • The opportunity to improve: optimizers pick a single plan for a query. However, different subsets of data may have very different statistical properties, so it may be more efficient to use different plans for different subsets of data.

Telegraph Cq English: Presentation Transcript

  • Berkeley's TelegraphCQ
    Details about TelegraphCQ
    Adaptivity: Eddies, SteMs and STAIR
    Routing Policy: Lottery Scheduling
    1
    Friday, July 15, 2011
    Alberto Minetti
    Advanced Data Management @ DISI
    Università degli Studi di Genova
  • Data Stream Management System
    Continuous, unbounded, rapid, time-varying streams of data elements
    Occur in a variety of modern applications
    Network monitoring and traffic engineering
    Sensor networks, RFID tags
    Telecom call records
    Financial applications
    Web logs and click-streams
    Manufacturing processes
    DSMS = Data Stream Management System
    2
  • The Beginning: Telegraph
    Several continuous queries
    Several data streams
    Initially written in Java
    Then C-based, using PostgreSQL
    No distributed scheduling
    The level of adaptivity does not change under overload
    Ignores system resources
    Data management fully in memory
    3
  • Redesign: TelegraphCQ
    Developed at UC Berkeley
    Written in C/C++
    OpenBSD License
    Based on PostgreSQL sources
    Current Version: 0.6 (PostgreSQL 7.3.2)
    Project closed in 2006
    Several points of interest and features
    Software http://telegraph.cs.berkeley.edu
    Papers http://db.cs.berkeley.edu/telegraph
    Commercial spinoff Truviso
    4
  • TelegraphCQ Architecture
    PostgreSQL backends
    Many TelegraphCQ front-ends
    Only one TelegraphCQ back-end
    Front-End
    A process is forked for every client connection
    Does not hold streams
    Parses continuous queries into shared memory
    Back-End has an Eddy
    Joins query plans together
    Can be shared among queries
    Puts results into shared memory
    5
  • TelegraphCQ Architecture
    [Figure: TelegraphCQ architecture. TelegraphCQ Front End (Listener, Parser, Planner, Mini-Executor, Proxy, Catalog); TelegraphCQ Back Ends (CQEddy, Split, Scans, Modules); TelegraphCQ Wrapper ClearingHouse (Wrappers); Shared Memory (Query Plan Queue, Eddy Control Queue, Query Result Queues, Shared Memory Buffer Pool); Disk]
    6
    Taken from Michael Franklin, UC Berkeley
  • Module Types
    Input and Caching (Relations and Streams)
    Interfaces to external data sources
    Wrappers for HTML, XML, file system, P2P proxies
    Remote databases with caching support
    Query Execution
    Non-blocking versions of classical operators (selection, projection)
    SteMs, STAIRs
    Adaptive Routing
    Reoptimize the plan during execution
    Eddies: choose the routing tuple by tuple
    Juggle: on-the-fly reordering (by value or timestamp)
    Flux: routing among the machines of a cluster
    7
    Fjords Framework
  • Fjords
    Framework in Java for Operators on Remote Data Streams
    Interconnects modules
    Supports queues between modules
    Non-blocking
    Supports relations and streams
    8
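The non-blocking queues that Fjords places between modules can be illustrated with a minimal Python sketch. It is not the Java Fjords API: `FjordQueue` and `run_selection` are hypothetical names, and a `get()` that returns `None` stands in for "no tuple available yet", so `None` itself cannot be used as a tuple.

```python
from collections import deque
from typing import Any, Callable, Optional


class FjordQueue:
    """Queue between two modules whose get() never blocks (hypothetical sketch)."""

    def __init__(self) -> None:
        self._items: deque = deque()

    def put(self, item: Any) -> None:
        self._items.append(item)

    def get(self) -> Optional[Any]:
        # Non-blocking: return None instead of waiting when no tuple is queued,
        # so the calling module can yield control and do other work.
        return self._items.popleft() if self._items else None


def run_selection(inq: FjordQueue, outq: FjordQueue,
                  predicate: Callable[[Any], bool]) -> int:
    """One scheduling step of a selection module: drain what is available, never wait."""
    processed = 0
    while (tup := inq.get()) is not None:
        if predicate(tup):
            outq.put(tup)
        processed += 1
    return processed
```

Calling `run_selection` on an empty input queue simply returns 0, which is what lets a scheduler interleave many modules without any of them stalling on a slow source.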
  • Streams in TelegraphCQ
    Unarchived Stream
    Never written to disk
    Kept in shared memory between the executor and the wrapper
    Archived Stream
    Append-only method to send tuples to the system
    No UPDATE, INSERT or DELETE; aggregate queries use windows
    tcqtime of type TIMESTAMP for window queries
    With constraint TIMESTAMPCOLUMN
    9
  • DDL: Create Stream
    10
    CREATE STREAM measurements (
    tcqtime TIMESTAMP TIMESTAMPCOLUMN,
    stationid INTEGER,
    speed REAL)
    TYPE ARCHIVED;
    CREATE STREAM tinydb (
    tcqtime TIMESTAMP TIMESTAMPCOLUMN,
    light REAL,
    temperature REAL)
    TYPE UNARCHIVED;
    DROP STREAM measurements;
  • Data Acquisition
    Sources must identify themselves before sending data
    Wrapper: user-defined functions
    Define how the data is processed
    Run inside the Wrapper ClearingHouse (WCH) process
    Push sources
    Initiate the connection to TelegraphCQ
    Pull sources
    The wrapper initiates the connection
    Data from different wrappers can merge into the same stream
    Heartbeat: a punctuation tuple with no data, only a timestamp
    After a punctuation tuple arrives, no earlier data will arrive
    11
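A minimal Python sketch of how a heartbeat (punctuation) lets several wrappers feed one stream: tuples are buffered and released in timestamp order only once a punctuation guarantees that nothing older can still arrive. `PunctuatedMerger` is a hypothetical name; TelegraphCQ's actual merging code is not shown in these slides.

```python
import heapq
from typing import Any, List, Tuple


class PunctuatedMerger:
    """Merges tuples from several wrappers into one stream (hypothetical sketch).

    Tuples are held until a heartbeat with timestamp T arrives; since no data
    older than T can follow, everything with ts <= T is released in order.
    """

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, Any]] = []
        self._arrival = 0                    # tie-breaker for equal timestamps
        self._watermark = float("-inf")      # timestamp of the latest punctuation

    def on_tuple(self, ts: float, data: Any) -> None:
        if ts < self._watermark:
            raise ValueError("tuple is older than the last punctuation")
        heapq.heappush(self._heap, (ts, self._arrival, data))
        self._arrival += 1

    def on_heartbeat(self, ts: float) -> List[Tuple[float, Any]]:
        # A heartbeat carries only a timestamp, no data.
        self._watermark = max(self._watermark, ts)
        released = []
        while self._heap and self._heap[0][0] <= self._watermark:
            t, _, data = heapq.heappop(self._heap)
            released.append((t, data))
        return released
```

For example, after tuples stamped 3, 1 and 7, a heartbeat at 5 releases the tuples stamped 1 and 3 in order, while the tuple stamped 7 waits for a later heartbeat.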
  • Wrappers in the WCH
    Non-blocking I/O over network sockets (TCP/IP)
    Wrapper functions are called when data is available on the socket
    or in the buffer
    Functions return one tuple at a time (classic iterator)
    Each tuple is returned as an array of PostgreSQL Datum
    Init(WrapperState*) allocates resources and state
    Next(WrapperState*) returns tuples through the WrapperState
    Done(WrapperState*) frees resources and destroys state
    All within PostgreSQL's memory infrastructure
    12
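The Init/Next/Done iterator protocol, together with the buffering of partial tuples described in the slide notes, can be sketched in Python. This only mimics the C interface: `WrapperState` here is a hypothetical dataclass, `csv_feed` simulates data arriving on the socket, and missing fields become `None` (NULL) while extra fields are truncated.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WrapperState:
    """Hypothetical stand-in for TelegraphCQ's WrapperState."""
    ncols: int
    buffer: str = ""


def csv_init(state: WrapperState) -> bool:
    # Allocate resources and state (here: just an empty input buffer).
    state.buffer = ""
    return True


def csv_feed(state: WrapperState, chunk: str) -> None:
    # Simulates bytes arriving on the socket; a chunk may end mid-tuple.
    state.buffer += chunk


def csv_next(state: WrapperState) -> Optional[List[Optional[str]]]:
    # Classic iterator: at most one tuple per call, None if no complete line yet.
    if "\n" not in state.buffer:
        return None
    line, state.buffer = state.buffer.split("\n", 1)
    fields: List[Optional[str]] = list(line.split(",")[: state.ncols])  # extra fields truncated
    fields += [None] * (state.ncols - len(fields))                      # missing fields -> NULL
    return fields


def csv_done(state: WrapperState) -> bool:
    # Free resources and destroy state.
    state.buffer = ""
    return True
```

Feeding `"1,2"` and then `",3\n4\n"` yields no tuple on the first call and then `["1", "2", "3"]` and `["4", None, None]`.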
  • DDL: Create Wrapper
    13
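    CREATE FUNCTION measurements_init(INTEGER)
    RETURNS BOOLEAN
    AS 'libmeasurements.so', 'measurements_init'
    LANGUAGE 'C';
    -- next and done are assumed to be declared like init
    CREATE FUNCTION measurements_next(INTEGER)
    RETURNS BOOLEAN
    AS 'libmeasurements.so', 'measurements_next'
    LANGUAGE 'C';
    CREATE FUNCTION measurements_done(INTEGER)
    RETURNS BOOLEAN
    AS 'libmeasurements.so', 'measurements_done'
    LANGUAGE 'C';
    CREATE WRAPPER mywrapper (
    init=measurements_init,
    next=measurements_next,
    done=measurements_done);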
  • HtmlGet and WebQueryServer
    HtmlGet allows the user to execute a series of HTML GET or POST requests to arrive at a page, and then to extract data out of the page using a TElegraph Screen Scraper definition file. Once extracted, this data can be output to a CSV file.
    Welcome to the TESS homepage. TESS is the TElegraph Screen Scraper. It is part of The Telegraph Project at UC Berkeley. TESS is a program that takes data from web forms (like search engines or database queries) and turns it into a representation that is usable by a database query processor.
    http://telegraph.cs.berkeley.edu/tess/
    14
  • Self-monitoring capability
    Three special streams expose information about the system state
    Support for introspective queries
    Dynamic catalog
    Queried like any other stream
    tcq_queries(tcqtime, qrynum, qid, kind, qrystr)
    tcq_operators(tcqtime, opnum, opid, numqrs, kind, qid, opstr, opkind, opdesc)
    tcq_queues(tcqtime, opid, qkind, kind)
    15
  • Example
    16
    Welcome to psql 0.2, the PostgreSQL interactive terminal.
    # CREATE SCHEMA traffic;
    # CREATE STREAM traffic.measurements
    (stationid INT,
    speed REAL,
    tcqtime TIMESTAMP TIMESTAMPCOLUMN
    ) TYPE ARCHIVED;
    # ALTER STREAM traffic.measurements ADD WRAPPER csvwrapper;
    $ cat tcpdump.log | source.pl localhost 5533 csvwrapper,network.tcpdum
    source.pl: default TelegraphCQ Perl script that simulates sources sending CSV data
    5533: default port
  • Load Shedding
    CREATE STREAM … TYPE UNARCHIVED ON OVERLOAD <policy>
    BLOCK: stop the stream (default)
    DROP: drop tuples
    KEEP
    COUNTS: keep the count of dropped tuples
    REGHIST: build a fixed-grid histogram of the shed tuples
    MYHIST: build an MHIST (multidimensional histogram)
    WAVELET <wavelet params>: build a wavelet histogram
    SAMPLE: keep a reservoir sample
    17
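The SAMPLE option keeps a reservoir sample of the shed tuples. Below is a minimal Python sketch of classic reservoir sampling (Algorithm R); the `Reservoir` class is hypothetical and says nothing about how TelegraphCQ stores the sample internally.

```python
import random
from typing import Any, List, Optional


class Reservoir:
    """Fixed-size uniform sample of all tuples offered so far (Algorithm R)."""

    def __init__(self, k: int, seed: Optional[int] = None) -> None:
        self.k = k
        self.seen = 0                     # number of tuples offered (e.g. shed) so far
        self.items: List[Any] = []
        self._rng = random.Random(seed)

    def offer(self, item: Any) -> None:
        self.seen += 1
        if len(self.items) < self.k:
            self.items.append(item)       # fill the reservoir first
        else:
            j = self._rng.randrange(self.seen)   # uniform in [0, seen)
            if j < self.k:
                self.items[j] = item      # keep the new tuple with probability k / seen
```

Memory stays bounded by `k` no matter how many tuples are shed, which is the property that makes it usable under overload.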
  • Load Shedding: Summary Streams
    For a stream named schema.stream
    Two streams are created automatically
    schema.__stream_dropped
    schema.__stream_kept
    For WAVELET, MYHIST, REGHIST, COUNTS
    The schema contains:
    Summary data
    Summary interval
    For SAMPLE
    Same schema as the stream, plus the column __samplemult
    Holds the number of actual tuples each sampled tuple represents
    18
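One way to use the __samplemult column is to weight each kept tuple when estimating aggregates over all the original tuples. The sketch below assumes, hypothetically, that every sampled tuple represents `seen / len(sample)` tuples; the function names are illustrative.

```python
from typing import Any, Callable, List, Tuple


def with_samplemult(sample: List[Any], seen: int) -> List[Tuple[Any, float]]:
    # Hypothetical assumption: every kept tuple stands for seen / len(sample) tuples.
    mult = seen / len(sample) if sample else 0.0
    return [(row, mult) for row in sample]


def estimate_count(rows: List[Tuple[Any, float]]) -> float:
    # COUNT(*) over the original tuples ~ sum of the multipliers.
    return sum(mult for _, mult in rows)


def estimate_sum(rows: List[Tuple[Any, float]], value: Callable[[Any], float]) -> float:
    # SUM(x) over the original tuples ~ sum of x weighted by its multiplier.
    return sum(value(row) * mult for row, mult in rows)
```

With a sample of 3 values out of 30 shed tuples, each kept tuple counts as 10, so the estimated count is 30.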
  • Querying TelegraphCQ: StreaQuel
    19
    Continuous queries over:
    Standard relations inherited from PostgreSQL
    Windowed data streams (sliding, hopping, jumping)
    RANGE BY specifies the window size
    SLIDE BY specifies the update rate
    START AT (optional) specifies when the query begins
    SELECT stream.color, COUNT(*)
    FROM stream [RANGE BY ‘9’ SLIDE BY ‘1’]
    GROUP BY stream.color
    [Figure: a RANGE BY '9' SLIDE BY '1' window moving over a stream of tuples with color 1 or 2; label "START OUTPUT!"]
    Adapted from Jarle Søberg
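The effect of RANGE BY and SLIDE BY on a grouped count can be sketched in Python with integer timestamps. Assumptions: each window covers the timestamps in (end - range, end], window ends advance by the slide, and the sample data is made up for illustration (it is not the data in the figure).

```python
from collections import Counter, deque
from typing import Dict, Hashable, Iterable, List, Tuple


def sliding_counts(stream: Iterable[Tuple[int, Hashable]], range_: int, slide: int,
                   first_end: int, last_end: int) -> List[Tuple[int, Dict[Hashable, int]]]:
    """COUNT(*) GROUP BY color over windows (end - range_, end], ends advancing by slide.

    `stream` yields (timestamp, color) pairs in timestamp order.
    """
    it = iter(stream)
    pending = next(it, None)
    window: deque = deque()
    counts: Counter = Counter()
    results = []
    end = first_end
    while end <= last_end:
        # Admit tuples whose timestamp is within the new window end.
        while pending is not None and pending[0] <= end:
            window.append(pending)
            counts[pending[1]] += 1
            pending = next(it, None)
        # Evict tuples that fell out of the window (incremental, no recount).
        while window and window[0][0] <= end - range_:
            _, color = window.popleft()
            counts[color] -= 1
            if counts[color] == 0:
                del counts[color]
        results.append((end, dict(counts)))
        end += slide
    return results
```

Each step only admits the newly arrived tuples and evicts the expired ones, rather than recounting the whole window.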
  • Querying TelegraphCQ: StreaQuel (2)
    wtime(*) returns the last timestamp in the window
    Recursive queries using WITH (SQL:1999)
    StreaQuel does not allow subqueries
    20
    SELECT S.a, wtime(*)
    FROM
    S [RANGE BY '10 seconds' SLIDE BY '10 seconds'],
    R [RANGE BY '10 seconds' SLIDE BY '10 seconds']
    WHERE S.a = R.a;
    [Figure: a data stream divided into windows of 10 seconds]
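When RANGE BY and SLIDE BY are equal, as in the query above, windows do not overlap. Below is a Python sketch of the windowed equi-join S.a = R.a. It assumes integer timestamps in seconds, windows (end - width, end] whose ends are multiples of the width, and that wtime(*) reports the window end. The names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def window_end(ts: int, width: int) -> int:
    # End of the window (end - width, end] containing ts; ends are multiples of width.
    return -(-ts // width) * width


def tumbling_join(s_rows: Iterable[Tuple[int, Hashable]],
                  r_rows: Iterable[Tuple[int, Hashable]],
                  width: int) -> List[Tuple[Hashable, int]]:
    """Return (S.a, wtime) for every S/R pair with equal `a` in the same window."""
    r_index: Dict[Tuple[int, Hashable], int] = defaultdict(int)
    for ts, a in r_rows:
        r_index[(window_end(ts, width), a)] += 1      # count R tuples per (window, a)
    out = []
    for ts, a in s_rows:
        end = window_end(ts, width)
        out.extend([(a, end)] * r_index.get((end, a), 0))   # one result per matching R tuple
    return sorted(out)
```

An S tuple at second 3 and an R tuple at second 12 with the same `a` do not join, because they fall into different 10-second windows.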
  • Net Monitor Windowed Query
    21
    SELECT (CASE when outgoing = true
    then src_ip else dst_ip end) as inside_ip ,
    (CASE when outgoing = true
    then dst_ip else src_ip end) as outside_ip,
    sum(bytes_sent) + sum(bytes_recv) as bytes
    FROM flow [RANGE BY $w SLIDE BY $w]
    GROUP BY inside_ip, outside_ip
    All active connections
    SELECT src_ip, wtime(*),
    COUNT(DISTINCT(dst_ip||dst_port)) AS fanout
    FROM flow [RANGE BY $w SLIDE BY $w]
    WHERE outgoing = false
    GROUP BY src_ip ORDER BY fanout DESC
    LIMIT 100 PER WINDOW;
    The 100 sources with the most connections
    SELECT sum(bytes_sent), src_ip, wtime(*) AS now
    FROM flow [RANGE BY $w SLIDE BY $w]
    WHERE outgoing = false
    GROUP BY src_ip ORDER BY sum(bytes_sent) DESC
    LIMIT 100 PER WINDOW;
    The 100 sources sending the most traffic
    Taken from: İlkay Ozan Kay
  • Adaptive Query Processing: Evolution
    From evolutionary to revolutionary:
    Static plans: traditional DBMS
    Late binding: dynamic QEP, parametric, competitive
    Inter-operator: query scrambling, mid-query reoptimization, progressive optimization
    Intra-operator: XJoin, DPHJ, convergent QP
    Per tuple: Eddies
    Taken from: Amol Deshpande, Vijayshankar Raman
    22
  • Adaptive Query Processing: Current Background
    Several plans
    Parametric queries
    Continuous queries
    Focus on incremental output
    Complex queries (10-20 relations in a join)
    Data streams and asynchronous data
    Statistics not available
    Interactive queries and user preferences
    State sharing
    Several operators
    XML data and text
    Wide area federations
    23
  • Adaptive Query Processing: System R
    Repeat
    Observe the environment: daily/weekly (runstats)
    Choose a behaviour (optimizer)
    If the current plan is not the best plan (analyzer)
    Actuate the new plan (executor)
    Cost-based optimization
    Runstats-optimize-execute: very coarse granularity
    Weekly adaptivity!
    Goal: per-tuple adaptivity
    Merge the 4 steps
    [Figure: the Measure, Analyze, Plan, Actuate loop]
    Taken from: Avnur, Hellerstein
    24
  • TelegraphCQ Executor: Eddy
    Idea taken from fluid mechanics
    Continuously adaptive query processing mechanism
    An eddy is a routing operator
    Decides which module each tuple must visit next
    Once a tuple has visited all modules, it can be output
    Sees tuples before and after each module (operator)
    25
    [Figure: the eddy merges the Measure, Analyze, Plan and Actuate steps]
    Taken from: Amol Deshpande, Vijayshankar Raman
  • Eddies: Correctness
    Every tuple has two bit vectors
    Every position corresponds to an operator
    Ready: whether the tuple is ready for that operator
    Lets the eddy determine which tuples can be sent to an operator
    Done: whether the tuple has already been processed by that operator
    Prevents the eddy from sending a tuple to the same operator twice
    When all Done bits are set, the tuple is output
    A joined tuple's Ready and Done vectors are the bitwise OR of its components' vectors
    For simple selections, Ready = complement(Done)
    26
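The Ready/Done mechanism for selections can be sketched in Python with integer bit masks. Bit i refers to `ops[i]`, so the slide's left-to-right notation "1 0 0" (σa done) corresponds to `0b001` here. `route_with_bitvectors` and its `choose` hook, where a routing policy would plug in, are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, int]


def route_with_bitvectors(row: Row, ops: List[Callable[[Row], bool]],
                          choose: Optional[Callable[[int], int]] = None
                          ) -> Tuple[bool, List[Tuple[int, int]]]:
    """Send one tuple through every selection; bit i of Done/Ready refers to ops[i].

    Returns (passed, trace), where trace lists the (Done, Ready) pairs after each step.
    """
    all_ops = (1 << len(ops)) - 1
    done = 0
    trace = [(done, all_ops & ~done)]          # selections: Ready = complement(Done)
    while done != all_ops:
        ready = all_ops & ~done
        # Routing policy hook; default: the lowest-numbered ready operator.
        i = choose(ready) if choose else (ready & -ready).bit_length() - 1
        if not ops[i](row):
            return False, trace                # tuple filtered out, never output
        done |= 1 << i                         # never sent to ops[i] again
        trace.append((done, all_ops & ~done))
    return True, trace                         # all Done bits set: output
```

For the tuple a = 15, b = 0, c = 15 and the three predicates of the query above, the Done vector goes 000, 001, 011, 111 while Ready goes 111, 110, 100, 000.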
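  • Eddies: Simple Selection Routing
    27
    SELECT * FROM S WHERE S.a > 10 AND S.b < 15
    AND S.c = 15
    [Figure: an eddy routing the tuples of S among σa (S.a > 10), σb (S.b < 15) and σc (S.c = 15)]
    Tuple S1: a = 15, b = 0, c = 15
    Bit vectors in the order σa σb σc:
    Initially: Done = 0 0 0, Ready = 1 1 1
    After σa: Done = 1 0 0, Ready = 0 1 1
    After σb: Done = 1 1 0, Ready = 0 0 1
    After σc: Done = 1 1 1, Ready = 0 0 0 (all Done bits set: output)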
    Adapted from Manuel Hertlein
  • Relation Binary Join: R >< S >< T
    Join order
    (R >< S) >< T
    Works well with direct access to the data (index or sequential scan)
    28
    [Figure: binary join tree in which R >< S feeds the join S >< T with T]
    Taken from Jarle Søberg
  • Stream Binary Join: R >< S >< T
    29
    Join order
    (R >< S) >< T
    But if the data are sent by the sources...
    [Figure: the join tree R >< S, S >< T fed by streams]
    Blocking or dropping some tuples is inevitable!
    Taken from Jarle Søberg
  • Stream Binary Join: R >< S >< T
    On-the-fly optimization is necessary
    30
    Streams change often
    Reoptimization takes a lot of time
    Not dynamic enough!
    [Figure: the join tree R >< S, S >< T]
    Taken from Jarle Søberg
  • Stream Binary Join: Eddies
    Using an eddy
    Initial behaviour of Telegraph
    31
    [Figure: an eddy routing tuples between R >< S and S >< T]
    Tuple-based adaptivity
    Takes dynamic changes in the stream into account
    Taken from Jarle Søberg
  • Eddies: Join Scheduling Problem
    Scheduling based on join selectivity does not work
    Example
    32
    |E >< C| >> |S >< E|
    E and C arrive early; S is delayed
    [Figure: arrival timeline of S, E and C]
    Taken from Amol Deshpande
  • 33
    |E >< C| >> |S >< E|
    E and C arrive early; S is delayed
    [Figure: an eddy with hash tables on S.Name and E.Name (join SE) and on E.Course and C.Course (join EC), over an arrival timeline of S0, S – S0, E and C; labels SE, S0E, (S – S0)E]
    The eddy decides to route E to EC
    Once S0 has been sent and received, the statistics suggest that S join E is the better option
    The eddy learns the correct sizes
    Too late!!
    Taken from Amol Deshpande
  • 34
    |E >< C| >> |S >< E|
    E and C arrive early; S is delayed
    State got embedded as a result of earlier routing decisions
    [Figure: hash tables on S.Name, E.Name, E.Course and C.Course, and the execution plan actually used]
    The query is executed using the worse plan!
    Too late!!
    Taken from Amol Deshpande
  • STAIR: Storage, Transformation and Access for Intermediate Results
    35
    Exposes the internal state of joins to the eddy
    Provides primitive functions for state management
    Demotion
    Promotion
    Operations for
    Insertion (build)
    Lookup (probe)
    [Figure: an eddy with S.Name, E.Name, E.Course and C.Course STAIRs (hash tables); tuple s1 is built into the S.Name STAIR and probes into the E.Name STAIR]
    Taken from Amol Deshpande
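The STAIR primitives can be sketched in Python, using the E.Name and C.Course STAIRs of the example. Simplifying assumptions: a tuple is a dict from relation name to base row, and promotion only joins with tuples already in the other STAIR. The paper's handling of tuples that arrive later is not modeled. The `Stair` class is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

Row = Dict[str, str]
Tup = Dict[str, Row]        # relation name -> base row, e.g. {"E": e2, "C": c1} is e2c1


class Stair:
    """Hash-table STAIR on rel.attr, e.g. the E.Name STAIR (illustrative sketch)."""

    def __init__(self, rel: str, attr: str) -> None:
        self.rel, self.attr = rel, attr
        self.table: Dict[str, List[Tup]] = defaultdict(list)

    def _key(self, t: Tup) -> str:
        return t[self.rel][self.attr]

    def build(self, t: Tup) -> None:            # insertion
        self.table[self._key(t)].append(t)

    def probe(self, value: str) -> List[Tup]:   # lookup
        return list(self.table.get(value, []))

    def demote(self, t: Tup, keep: Set[str]) -> Tup:
        """Replace t by its projection onto the relations in `keep` (undoing work)."""
        bucket = self.table[self._key(t)]
        bucket.remove(t)
        projected = {rel: row for rel, row in t.items() if rel in keep}
        if projected not in bucket:             # avoid spurious duplicates
            bucket.append(projected)
        return projected

    def promote(self, t: Tup, other: "Stair", rel: str, attr: str) -> List[Tup]:
        """Replace t by its join with the tuples in `other`, probing on t[rel][attr]."""
        bucket = self.table[self._key(t)]
        bucket.remove(t)
        joined = [{**t, **match} for match in other.probe(t[rel][attr])]
        bucket.extend(joined)
        return joined
```

Demoting e2c1 to e2 and promoting e1 to e1c1 through the C.Course STAIR reproduce, in miniature, the two transformations shown on the next slides.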
  • STAIR: Demotion
    36
    Demoting e2c1 to e2:
    Simple projection
    Can be thought of as undoing work
    [Figure: contents of the S.Name, E.Name, E.Course and C.Course STAIRs before and after demoting e2c1 to e2]
    Adapted from Amol Deshpande
  • STAIR: Promotion
    37
    Promoting e1 using EC
    Two arguments:
    • A tuple
    • A join to be used to promote this tuple
    Can be thought of as precomputation of work
    [Figure: contents of the S.Name, E.Name, E.Course and C.Course STAIRs before and after promoting e1 to e1c1]
    Adapted from Amol Deshpande
  • Demotion OR Promotion
    38
    Taken from Lifting the Burden of History from Adaptive Query Processing
    Amol Deshpande and Joseph M. Hellerstein
  • Demotion AND Promotion
    Taken from Lifting the Burden of History from Adaptive Query Processing
    Amol Deshpande and Joseph M. Hellerstein
    39
  • 40
    |E >< C| >> |S >< E|
    E and C arrive early; S is delayed
    The eddy decides to route E to EC
    The eddy learns the correct selectivities
    The eddy decides to migrate E by promoting E using EC
    [Figure: S.Name, E.Name, E.Course and C.Course STAIRs over an arrival timeline of S0, E and C; partial result S0E]
    Adapted from Amol Deshpande
  • 41
    |E >< C| >> |S >< E|
    E and C arrive early; S is delayed
    [Figure: S.Name, E.Name, E.Course and C.Course STAIRs over an arrival timeline of S0, S – S0, E and C; partial results S0E and (S – S0)EC]
    Adapted from Amol Deshpande
  • 42
    [Figure: STAIR contents and the two plans used, whose results are combined with a UNION]
    Most of the data is processed using the correct plan
    Adapted from Amol Deshpande
  • STAIR: Correctness
    Theorem [3.1]: An eddy with STAIRs always produces the correct query result in spite of arbitrary applications of the promotion and demotion operations.
    STAIRs will produce every result tuple
    There will be no spurious duplicates
    43
    Taken from Lifting the Burden of History from Adaptive Query Processing
    Amol Deshpande and Joseph M. Hellerstein
  • State Module (SteM)
    A kind of temporary data repository
    Half-join operator that stores homogeneous tuples
    The state inside the operator is independent of routing decisions
    Supports the operations
    Insertion (build)
    Lookup (probe)
    Deletion (eviction) [useful for windows]
    Similar to an MJoin but more adaptive
    The state can be shared with other continuous queries
    But, since intermediate results are not stored,
    the computation cost increases significantly
    44
  • Eddies Join with SteMs
    45
    [Figure: an eddy connected to the SteMs of R, S and T]
    More adaptivity
    The eddy sees the half-joins
    Different access methods
    Index access
    Scan access
    Can simulate several join algorithms
    Under overload?
    Hash join (fast!!)
    Index join (memory limit)
    Join family?
    Hash join (equi-join)
    B-tree join (<, <=, >)
    A parametric query can be thought of as a join
    Adapted from Jarle Søberg
  • SteMs: Correctness
    46
    [Figure: SteMs on R and S]
    Correctness problem!
    Possible duplicates!!
    Global unique sequence number
    Only the younger tuple can probe
    Taken from Jarle Søberg
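Why sequence numbers avoid duplicates can be shown with a small Python simulation. It assumes the worst interleaving, in which every tuple is built into its SteM before any tuple probes. It uses the position in the arrival order as the global sequence number and lets a probe match only older tuples. `eddy_join` is a hypothetical name.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List, Tuple


def eddy_join(arrivals: List[Tuple[str, Any]], key: Callable[[Any], Hashable],
              younger_probes_only: bool = True) -> List[Tuple[Any, Any]]:
    """R >< S through two SteMs; arrivals are ("R" | "S", row) in arrival order."""
    stems: Dict[str, Dict[Hashable, List[Tuple[int, Any]]]] = {
        "R": defaultdict(list), "S": defaultdict(list)}
    # Global, unique sequence number = position in the arrival order.
    tagged = [(rel, seq, row) for seq, (rel, row) in enumerate(arrivals)]
    # Worst-case interleaving: the eddy routes every build before any probe.
    for rel, seq, row in tagged:
        stems[rel][key(row)].append((seq, row))
    results = []
    for rel, seq, row in tagged:
        other = "S" if rel == "R" else "R"
        for other_seq, other_row in stems[other].get(key(row), []):
            if younger_probes_only and other_seq > seq:
                continue      # the younger tuple of the pair will produce this match
            results.append((row, other_row) if rel == "R" else (other_row, row))
    return results
```

With two R tuples and one S tuple sharing the same key, the rule yields the two correct pairs. Without it, each pair is produced twice, once from each side.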
  • SteMs: Sliding Window
    47
    SELECT *
    FROM Str [RANGE BY '5 seconds'
    SLIDE BY '2 seconds'],
    Relation
    WHERE Relation.a = Str.a
    Stream tuples joined with the relation R:
    A|…….|18:49:36
    B|…….|18:49:36
    C|…….|18:49:37
    A|…….|18:49:38
    B|…….|18:49:39
    C|…….|18:49:39
    A|…….|18:49:40
    B|…….|18:49:40
    C|…….|18:49:40
    A|…….|18:49:41
    B|…….|18:49:41
    The SteM keeps the state for the sliding window (eviction)
    At time 40, what will happen at time 42? Eviction! 18:49:37
    Instead of rebuilding the whole hash table, it removes only the old tuples and adds the new ones
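The SteM operations (build, probe, eviction) and the incremental window maintenance on this slide can be sketched in Python. Assumptions: tuples are built in timestamp order, timestamps are the seconds of 18:49:xx, and at time 42 the RANGE BY 5 window keeps timestamps in (37, 42]. The `SteM` class is hypothetical.

```python
from collections import defaultdict, deque
from typing import Any, Callable, Deque, Dict, Hashable, List, Tuple


class SteM:
    """Half-join state over ONE relation or stream: build, probe, evict (sketch)."""

    def __init__(self, key: Callable[[Any], Hashable]) -> None:
        self.key = key
        self.index: Dict[Hashable, List[Tuple[int, Any]]] = defaultdict(list)
        self.arrivals: Deque[Tuple[int, Any]] = deque()   # assumes builds arrive in ts order

    def build(self, ts: int, row: Any) -> None:
        self.index[self.key(row)].append((ts, row))
        self.arrivals.append((ts, row))

    def probe(self, value: Hashable) -> List[Any]:
        return [row for _, row in self.index.get(value, [])]

    def evict(self, older_than: int) -> int:
        """Drop only tuples with ts < older_than; the rest of the hash table is untouched."""
        evicted = 0
        while self.arrivals and self.arrivals[0][0] < older_than:
            entry = self.arrivals.popleft()
            k = self.key(entry[1])
            self.index[k].remove(entry)
            if not self.index[k]:
                del self.index[k]
            evicted += 1
        return evicted
```

Loading the eleven tuples above and calling `evict(older_than=38)` removes only the three tuples stamped 36 and 37. The entries for 38 to 41 stay in place.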
  • Binary Join, STAIR, SteM Comparison
    48
    select *
    from customer c, orders o, lineitem l
    where c.custkey = o.custkey and
    o.orderkey = l.orderkey and
    c.nationkey = 1 and
    c.acctbal > 9000 and
    l.shipdate > date '1996-01-01'
    Recomputation necessary
    NOT adaptive
    lineitem arrives in ascending shipdate order
    Initial routing: (O >< L) >< C
    Taken from Lifting the Burden of History from Adaptive Query Processing
    Amol Deshpande and Joseph M. Hellerstein
  • Eddies: Routing Policy
    How to choose the best plan? Through routing
    Every tuple is routed individually
    The routing policy determines the system's efficiency
    The eddy has a tuple buffer with priorities
    Tuples initially have low priority
    Tuples leaving an operator get higher priority
    A tuple is sent to the output as early as possible
    Avoids system memory congestion
    Keeps memory consumption low
    49
  • Eddies' Routing Policy: (old) Back-Pressure
    Naive approach:
    Faster operators first
    50
    [Figure: two experiments: (1) sel(s1) = sel(s2), cost(s2) = 5, cost(s1) varies; (2) cost(s1) = cost(s2), sel(s2) = 50%, sel(s1) varies]
    Taken from: Avnur, Hellerstein
  • Eddies' Routing Policy: Lottery Scheduling
    Waldspurger & Weihl, 1994
    Algorithm for scheduling shared resources
    « rapidly focus available resources »
    Every operator begins with N tickets
    An operator receives a ticket when it takes a tuple
    Favors operators that consume tuples quickly
    An operator loses a ticket when it returns a tuple
    Favors operators with low selectivity
    Low selectivity: operators that return few tuples after processing many
    When two operators compete for a tuple,
    the tuple is assigned to the lottery winner
    Never leave an operator without tickets; add random exploration
    51
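Below is a Python sketch of ticket-based routing as described above. An operator gains a ticket when it takes a tuple, loses one when it returns a tuple, and never drops below one ticket; a small fraction of decisions is random exploration. `LotteryRouter`, `simulate`, the initial 10 tickets and the 5% exploration rate are assumptions, not values from the original system.

```python
import random
from typing import Dict, List, Optional, Sequence


class LotteryRouter:
    """Ticket-based eddy routing in the spirit of lottery scheduling (sketch)."""

    def __init__(self, ops: Sequence[str], initial: int = 10,
                 explore: float = 0.05, seed: Optional[int] = None) -> None:
        self.tickets: Dict[str, int] = {op: initial for op in ops}   # N tickets each
        self.explore = explore
        self.rng = random.Random(seed)

    def choose(self, eligible: List[str]) -> str:
        if self.rng.random() < self.explore:
            return self.rng.choice(eligible)                 # random exploration
        weights = [self.tickets[op] for op in eligible]
        return self.rng.choices(eligible, weights=weights, k=1)[0]   # the lottery

    def on_consume(self, op: str) -> None:
        self.tickets[op] += 1                                # took a tuple

    def on_return(self, op: str) -> None:
        self.tickets[op] = max(1, self.tickets[op] - 1)      # returned a tuple; never 0


def simulate(router: LotteryRouter, selectivity: Dict[str, float],
             n_tuples: int, seed: int = 1) -> Dict[str, int]:
    """Route tuples through equal-cost selections with the given pass rates."""
    rng = random.Random(seed)
    for _ in range(n_tuples):
        pending = sorted(selectivity)                        # operators not yet visited
        while pending:
            op = router.choose(pending)
            router.on_consume(op)
            pending.remove(op)
            if rng.random() >= selectivity[op]:
                break                                        # tuple filtered out
            router.on_return(op)                             # tuple goes back to the eddy
    return router.tickets
```

With equal costs, an operator that passes 10% of its input keeps most of the tickets it earns. An operator that passes 90% gives nearly all of them back. The eddy therefore learns to send tuples to the more selective filter first.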
  • Eddies' Routing Policy: Lottery Scheduling
    Lottery Scheduling is better than Back-Pressure
    52
    [Figure: cost(s1) = cost(s2), sel(s2) = 50%, sel(s1) varies]
    Taken from: Avnur, Hellerstein
  • Experiment
    53
    Stream: x with mean 40 and standard deviation 10
  • 54
    Experiment: Stream variation
    Stream: x with mean 10 and standard deviation 0
  • 55
    Experiment: Stream variation (2)
    Stream: x with mean 10 and standard deviation 0
  • Other Works
    Distributed Eddies
    Freddies: DHT-Based Adaptive Query Processing via Federated Eddies
    Content-based Routing
    Partial Results for Online Query Processing
    Flux: An Adaptive Partitioning Operator for Continuous Query Systems
    Java Support for Data-Intensive Systems: Experiences Building the Telegraph Dataflow System
    Ripple Join for Online Aggregation
    Highly Available, Fault-Tolerant, Parallel Dataflows
    56
  • Bibliography
    TelegraphCQ: An Architectural Status Report
    Continuous Dataflow Processing for an Uncertain World
    Enabling Real-Time Querying of Live and Historical Stream Data
    Declarative Network Monitoring with an Underprovisioned Query Processor
    Lifting the Burden of History from Adaptive Query Processing [STAIRs]
    Eddies: Continuously Adaptive Query Processing
    Using State Modules for Adaptive Query Processing
    And others: http://telegraph.cs.berkeley.edu/papers.html
    Telegraph team @ UC Berkeley: Mike Franklin, Joe Hellerstein, Bryan Fulton, Sirish Chandrasekaran, Amol Deshpande, Ryan Huebsch, Edwin Mach, Garrett Jacobson, Sailesh Krishnamurthy, Boon Thau Loo, Nick Lanham, Sam Madden, Fred Reiss, Mehul Shah, Eric Shan, Kyle Stanek, Owen Cooper, David Culler, Lisa Hellerstein, Wei Hong, Scott Shenker, Torsten Suel, Ion Stoica, Doug Tygar, Hal Varian, Ron Avnur, David Yu Chen, Mohan Lakhamraju, Vijayshankar Raman
    Lottery Scheduling: Flexible Proportional-Share Resource Management
    Carl A. Waldspurger & William E. Weihl @ MIT
    57