Berkeley’s TelegraphCQ
  Details about TelegraphCQ
  Adaptivity: Eddies, SteMs and STAIRs
  Routing Policy: Lottery Scheduling
  Friday, July 15, 2011
  Alberto Minetti
  Advanced Data Management @ DISI
  Università degli Studi di Genova
Data Stream Management System
  Continuous, unbounded, rapid, time-varying streams of data elements
  Occur in a variety of modern applications
    Network monitoring and traffic engineering
    Sensor networks, RFID tags
    Telecom call records
    Financial applications
    Web logs and click-streams
    Manufacturing processes
  DSMS = Data Stream Management System
The Beginning: Telegraph
  Several continuous queries
  Several data streams
  Written in Java at first
  Then C-based, using PostgreSQL
  No distributed scheduling
  The level of adaptivity does not change under overload
  Ignores system resources
  Data management fully in memory
Redesign: TelegraphCQ
  Developed at UC Berkeley
  Written in C/C++
  BSD license
  Based on the PostgreSQL sources
  Current version: 0.6 (PostgreSQL 7.3.2)
  Project closed in 2006
  Several points of interest and features
  Software: http://telegraph.cs.berkeley.edu
  Papers: http://db.cs.berkeley.edu/telegraph
  Commercial spinoff: Truviso
TelegraphCQ Architecture
  PostgreSQL backends
    Many TelegraphCQ front ends
    Only one TelegraphCQ back end
  Front end
    Forked for every client connection
    Does not hold streams
    Parses the continuous query into shared memory
  The back end has an eddy
    Joins query plans together
    Can be shared among queries
    Puts results in shared memory
TelegraphCQ Architecture
  (Figure: architecture diagram. Labelled components: TelegraphCQ Front End, Listener, Parser, Planner, Mini-Executor, Proxy, Catalog; Shared Memory, Query Plan Queue, Eddy Control Queue, Query Result Queues, Shared Memory Buffer Pool; TelegraphCQ Back End, CQEddy, Split, Scans, Modules; TelegraphCQ Wrapper ClearingHouse, Wrappers; Disk.)
  Taken from Michael Franklin, UC Berkeley
Module Types
  Input and caching (relations and streams)
    Interface to external data sources
    Wrappers for HTML, XML, file system, P2P proxy
    Remote databases with caching support
  Query execution
    Non-blocking versions of classical operators (selection, projection)
    SteMs, STAIRs
  Adaptive routing
    Reoptimize the plan during execution
    Eddies: choose the route tuple by tuple
    Juggle: reordering on the fly (by value or timestamp)
    Flux: routing among the computers of a cluster
  Fjords framework
Fjords
  Framework in Java for Operators on Remote Data Streams
  Interconnects modules
  Supports queues among modules
  Non-blocking
  Supports relations and streams
Streams in TelegraphCQ
  Unarchived stream
    Never written to disk
    Held in shared memory between executor and wrapper
  Archived stream
    Append-only method for sending tuples to the system
    No update, insert, or delete; aggregates are queried over windows
  tcqtime: column of type TIMESTAMP used for window queries
    Declared with the TIMESTAMPCOLUMN constraint
DDL: Create Stream
    CREATE STREAM measurements (
        tcqtime   TIMESTAMP TIMESTAMPCOLUMN,
        stationid INTEGER,
        speed     REAL)
    TYPE ARCHIVED;

    CREATE STREAM tinydb (
        tcqtime     TIMESTAMP TIMESTAMPCOLUMN,
        light       REAL,
        temperature REAL)
    TYPE UNARCHIVED;

    DROP STREAM measurements;
Data Acquisition
  Sources must identify themselves before sending data
  Wrapper: user-defined functions
    Define how the data is processed
    Run inside the Wrapper ClearingHouse process
  Push sources
    Open a connection to TelegraphCQ
  Pull sources
    The wrapper opens the connection
  Data from different wrappers can be merged into the same stream
  Heartbeat: a punctuated tuple without data, only a timestamp
    Once a punctuated tuple is seen, no earlier data will arrive
Wrappers in the WCH (Wrapper ClearingHouse)
  Non-blocking for the process, over a network socket (TCP/IP)
  The wrapper function is called when there is data on the socket
    Or when there is data in the buffer
  The function returns one tuple at a time (classic iterator)
    Returns an array of PostgreSQL Datum values
  Init(WrapperState*) allocates resources and state
  Next(WrapperState*) produces tuples, which are placed in the WrapperState
  Done(WrapperState*) frees resources and destroys the state
  Everything runs inside the PostgreSQL memory infrastructure
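The Init/Next/Done lifecycle described above can be illustrated with a small Python analogy. This is only a sketch of the iterator pattern, not the TelegraphCQ C API: the class `WrapperState`, its fields, and the functions `wrapper_init`, `wrapper_next` and `wrapper_done` are hypothetical names chosen for illustration, and the input is assumed to be newline-terminated CSV text.

```python
class WrapperState:
    """Hypothetical per-source state, loosely mirroring the WrapperState idea."""

    def __init__(self):
        self.buffer = ""    # text received from the source but not yet parsed
        self.fields = None  # the last tuple produced by wrapper_next
        self.open = False


def wrapper_init(state):
    """Allocate resources and reset the state (the Init step)."""
    state.buffer = ""
    state.fields = None
    state.open = True
    return True


def wrapper_next(state):
    """Produce at most one tuple per call (the Next step).

    Returns True and stores the tuple in state.fields when a complete
    line is available; returns False without blocking otherwise.
    """
    line, newline, rest = state.buffer.partition("\n")
    if not newline:
        return False  # incomplete line: wait for more data
    state.buffer = rest
    state.fields = line.split(",")
    return True


def wrapper_done(state):
    """Release resources and clear the state (the Done step)."""
    state.buffer = ""
    state.fields = None
    state.open = False
    return True
```

The caller appends incoming data to `state.buffer` and calls `wrapper_next` repeatedly until it returns False, which mirrors the non-blocking, one-tuple-at-a-time behaviour described on the slide.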
DDL: Create Wrapper
    CREATE FUNCTION measurements_init(INTEGER)
      RETURNS BOOLEAN
      AS 'libmeasurements.so,measurements_init'
      LANGUAGE 'C';

    CREATE WRAPPER mywrapper (
      init=measurements_init,
      next=measurements_next,
      done=measurements_done);
HtmlGet and WebQueryServer
  HtmlGet allows the user to execute a series of HTML GET or POST requests to arrive at a page, and then to extract data out of the page using a TElegraph Screen Scraper (TESS) definition file. Once extracted, this data can be output to a CSV file.
  TESS is the TElegraph Screen Scraper, part of the Telegraph Project at UC Berkeley. It is a program that takes data from web forms (like search engines or database queries) and turns it into a representation that is usable by a database query processor.
  http://telegraph.cs.berkeley.edu/tess/
Self-Monitoring Capability
  Three special streams carry information about the system state
  Support for introspective queries
  Dynamic catalog
  Queried like any other stream
    tcq_queries(tcqtime, qrynum, qid, kind, qrystr)
    tcq_operators(tcqtime, opnum, opid, numqrs, kind, qid, opstr, opkind, opdesc)
    tcq_queues(tcqtime, opid, qkind, kind)
Example
    Welcome to psql 0.2, the PostgreSQL interactive terminal.
    # CREATE SCHEMA traffic;
    # CREATE STREAM traffic.measurements(stationid INT, speed REAL, tcqtime TIMESTAMP TIMESTAMPCOLUMN) TYPE ARCHIVED;
    # ALTER STREAM traffic.measurements ADD WRAPPER csvwrapper;
    $ cat tcpdump.log | source.pl localhost 5533 csvwrapper,network.tcpdum
  source.pl: default TelegraphCQ script, written in Perl, that simulates sources sending CSV data
  5533: default port
Load Shedding
    CREATE STREAM … TYPE UNARCHIVED ON OVERLOAD ????
  where ???? is one of
    BLOCK: block the stream (default)
    DROP: drop tuples
    KEEPCOUNTS: keep the count of dropped tuples
    REGHIST: build a fixed-grid histogram of the shed tuples
    MYHIST: build an MHIST (multidimensional histogram)
    WAVELET wavelet params: build a wavelet histogram
    SAMPLE: keep a reservoir sample
Load Shedding: Summary Streams
  For a stream named schema.stream, two streams are created automatically
    schema.__stream_dropped
    schema.__stream_kept
  For WAVELET, MYHIST, REGHIST, KEEPCOUNTS the schema contains
    Summary data
    Summary interval
  For SAMPLE
    Same schema as the stream, plus the column __samplemult
    __samplemult keeps the number of actual tuples that each sampled tuple represents
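As an illustration of the SAMPLE option, the following is a minimal sketch of classic reservoir sampling (Algorithm R) in Python. It is not TelegraphCQ code. The class name `ReservoirSample` and the way the multiplier is computed (tuples seen divided by tuples kept) are assumptions made for this example, intended only to convey the idea behind a column such as __samplemult.

```python
import random


class ReservoirSample:
    """Keep a uniform random sample of at most k tuples from a stream."""

    def __init__(self, k, seed=None):
        self.k = k
        self.seen = 0      # number of tuples offered so far
        self.sample = []   # the current reservoir
        self.rng = random.Random(seed)

    def offer(self, tup):
        """Offer one tuple; it replaces a random slot with probability k/seen."""
        self.seen += 1
        if len(self.sample) < self.k:
            self.sample.append(tup)
            return
        j = self.rng.randrange(self.seen)  # uniform in [0, seen)
        if j < self.k:
            self.sample[j] = tup

    def multiplier(self):
        """Assumed definition: how many real tuples each kept tuple stands for."""
        if not self.sample:
            return 0.0
        return self.seen / len(self.sample)
```

An aggregate such as COUNT over the sample can then be scaled by the multiplier to estimate the value over the full stream.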
Querying TelegraphCQ: StreaQuel
  Continuous queries over
    Standard relations, inherited from PostgreSQL
    Windowed data streams (sliding, hopping, jumping)
  RANGE BY specifies the window size
  SLIDE BY specifies the update rate
  START AT specifies when the query begins (optional)
    SELECT stream.color, COUNT(*)
    FROM stream [RANGE BY '9' SLIDE BY '1']
    GROUP BY stream.color
  (Figure: the window sliding over a stream of tuples, with the point where output starts marked.)
  Adapted from Jarle Søberg
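The effect of RANGE BY and SLIDE BY on a grouped count can be sketched in a few lines of Python. This is an illustration under stated assumptions, not TelegraphCQ behaviour: timestamps are integer ticks, a window ending at time `end` covers the interval (end - range_by, end], and the first result is produced once one full window has elapsed. The function name `windowed_counts` is hypothetical.

```python
from collections import Counter


def windowed_counts(events, range_by, slide_by):
    """Count colors per window.

    events: list of (timestamp, color) pairs sorted by timestamp.
    Returns a list of (window_end, {color: count}) pairs.
    """
    if not events:
        return []
    first_ts = events[0][0]
    last_ts = events[-1][0]
    results = []
    end = first_ts + range_by - 1  # end of the first completely filled window
    while end <= last_ts:
        counts = Counter(
            color for ts, color in events if end - range_by < ts <= end
        )
        results.append((end, dict(counts)))
        end += slide_by            # SLIDE BY: how far the window advances
    return results
```

With `range_by` equal to `slide_by` the windows do not overlap (a jumping window); with `slide_by` smaller than `range_by` consecutive windows share tuples (a sliding window).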
Querying TelegraphCQ: StreaQuel (2)
  wtime(*) returns the last timestamp in the window
  Recursive queries using WITH [SQL 1999]
  StreaQuel does not allow subqueries
    SELECT S.a, wtime(*)
    FROM S [RANGE BY '10 seconds' SLIDE BY '10 seconds'],
         R [RANGE BY '10 seconds' SLIDE BY '10 seconds']
    WHERE S.a = R.a;
  (Figure: a data stream divided into consecutive 10-second windows.)
Net Monitor Windowed Queries
  All active connections
    SELECT (CASE when outgoing = true then src_ip else dst_ip end) as inside_ip,
           (CASE when outgoing = true then dst_ip else src_ip end) as outside_ip,
           sum(bytes_sent) + sum(bytes_recv) as bytes
    FROM flow [RANGE BY $w SLIDE BY $w]
    GROUP BY inside_ip, outside_ip
  The 100 sources with the largest number of connections
    SELECT src_ip, wtime(*),
           COUNT(DISTINCT(dst_ip||dst_port)) AS fanout
    FROM flow [RANGE BY $w SLIDE BY $w]
    WHERE outgoing = false
    GROUP BY src_ip ORDER BY fanout DESC
    LIMIT 100 PER WINDOW;
  The 100 most significant sources of traffic
    SELECT sum(bytes_sent), src_ip, wtime(*) AS now
    FROM flow [RANGE BY $w SLIDE BY $w]
    WHERE outgoing = false
    GROUP BY src_ip ORDER BY sum(bytes_sent) DESC
    LIMIT 100 PER WINDOW;
  Taken from: İlkay Ozan Kay
Adaptive Query Processing: Evolution
  Spectrum from evolutionary to revolutionary approaches
    Static plans: traditional DBMS
    Late binding: Dynamic QEP, Parametric, Competitive
    Inter-operator: Query Scrambling, Mid-query Reopt., Progressive Opt.
    Intra-operator: XJoin, DPHJ, Convergent QP
    Per tuple: Eddies
  Taken from: Amol Deshpande, Vijayshankar Raman
Adaptive Query Processing: Current Background
  Several plans
  Parametric queries
  Continuous queries
  Focus on incremental output
  Complex queries (joins over 10-20 relations)
  Data streams and asynchronous data
  Statistics not available
  Interactive queries and user preferences
  Sharing of state
  Several operators
  XML data and text
  Wide-area federations
Adaptive Query Processing: System R
  Repeat
    Observe the environment: daily/weekly (runstats)
    Choose the behaviour (optimizer)
    Check whether the current plan is still the best plan (analyzer)
    Actuate the new plan (executor)
  Cost-based optimization
  Runstats-optimize-execute: very coarse-grained
  Weekly adaptivity!
  Goal: per-tuple adaptivity
    Merge the four tasks: measure, analyze, plan, actuate
  Taken from: Avnur, Hellerstein
TelegraphCQ Executor: Eddy
  Idea taken from fluid mechanics
  Continuously adaptive query processing mechanism
  An eddy is a routing operator
    Decides which modules each tuple visits, and in which order
    Once a tuple has visited all modules, it can be output
    Sees tuples before and after each module (operator)
  (Figure: the eddy at the centre of the measure, analyze, plan, actuate loop.)
  Taken from: Amol Deshpande, Vijayshankar Raman
Eddies: Correctness
  Every tuple carries two bit vectors
    Each position corresponds to an operator
  Ready: whether the tuple is ready for that operator
    Lets the eddy decide which tuples to send to an operator
  Done: whether the tuple has been processed by that operator
    Keeps the eddy from sending a tuple to the same operator twice
  When all Done bits are set, the tuple is output
  A joined tuple takes the bitwise OR of the Ready and Done vectors of its components
  For simple selections, Ready = complement(Done)
Eddies: Simple Selection Routing
    SELECT *
    FROM S
    WHERE S.a > 10
      AND S.b < 15
      AND S.c = 15
  (Figure: an eddy routing the tuple (a = 15, b = 0, c = 15) from S through the selections σa (S.a > 10), σb (S.b < 15) and σc (S.c = 15), showing the Ready and Done bit vectors after each step.)
  Adapted from Manuel Hertlein
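The Ready/Done bookkeeping for simple selections can be sketched as follows. This is a minimal illustration, not the TelegraphCQ executor: the function name `run_eddy` is hypothetical, the routing choice is uniformly random rather than driven by a routing policy, and bit vectors are held in plain Python integers. It uses the rule stated above that, for selections, Ready is the complement of Done.

```python
import random


def run_eddy(tuples, predicates, rng=None):
    """Route each tuple through all selection predicates in an arbitrary order.

    A tuple is output only when every Done bit is set; it is dropped as
    soon as one predicate rejects it.
    """
    rng = rng or random.Random(0)
    n = len(predicates)
    all_done = (1 << n) - 1          # bit vector with one bit per operator
    output = []
    for tup in tuples:
        done = 0
        alive = True
        while alive and done != all_done:
            ready = ~done & all_done  # selections: Ready = complement(Done)
            candidates = [i for i in range(n) if (ready >> i) & 1]
            i = rng.choice(candidates)   # routing decision for this step
            if predicates[i](tup):
                done |= 1 << i           # mark operator i as done
            else:
                alive = False            # tuple fails the selection
        if alive:
            output.append(tup)
    return output
```

Whatever order the eddy picks, the result is the same set of tuples, because each operator is visited at most once and the tuple leaves only when all Done bits are set.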
Relation Binary Join: R ⋈ S ⋈ T
  Join order: (R ⋈ S) ⋈ T
  Works well when the data can be accessed directly (index or sequential scan)
  (Figure: join tree over R, S and T with the nodes R ⋈ S and S ⋈ T.)
  Taken from Jarle Søberg
Stream Binary Join: R ⋈ S ⋈ T
  Join order: (R ⋈ S) ⋈ T
  But if the data is pushed by the sources...
  Blocking or dropping some tuples is inevitable!
  (Figure: join tree over R, S and T with the nodes R ⋈ S and S ⋈ T.)
  Taken from Jarle Søberg
Stream Binary Join: R ⋈ S ⋈ T
  Streams change often
  Reoptimizing takes a lot of time
  Not dynamic enough!
  On-the-fly optimization is necessary
  (Figure: join tree over R, S and T with the nodes R ⋈ S and S ⋈ T.)
  Taken from Jarle Søberg
Stream Binary Join: Eddies
  Using an eddy
  Initial behaviour of Telegraph
  Tuple-based adaptivity
  Takes dynamic changes in the stream into account
  (Figure: an eddy routing tuples between R ⋈ S and S ⋈ T.)
  Taken from Jarle Søberg
Eddies: Join Scheduling Problem
  Scheduling based on join selectivity does not work
  Example
    |S ⋈ E| >> |E ⋈ C|
    E and C arrive early; S is delayed
  (Figure: arrival of S, E and C over time.)
  Taken from Amol Deshpande
|S ⋈ E| >> |E ⋈ C|; E and C arrive early; S is delayed
  The eddy decides to route E to E ⋈ C
  The S0 tuples sent and received suggest that S ⋈ E is the better option
  The eddy learns the correct sizes: too late!
  (Figure: an eddy with hash tables on S.Name, E.Name, E.Course and C.Course; inputs S, E, C; timeline showing S0 arriving first and S - S0 arriving later.)
  Taken from Amol Deshpande
|S ⋈ E| >> |E ⋈ C|; E and C arrive early; S is delayed
  State got embedded as a result of earlier routing decisions
  The query is executed using the worse plan!
  Too late!
  (Figure: an eddy with hash tables on S.Name, E.Name, E.Course and C.Course, next to the execution plan actually used.)
  Taken from Amol Deshpande
STAIR: Storage, Transformation and Access for Intermediate Results
  Exposes the internal state of the joins to the eddy
  Provides primitive functions for state management
    Demotion
    Promotion
  Operations for
    Insertion (build)
    Lookup (probe)
  (Figure: an eddy with the S.Name, E.Name, E.Course and C.Course STAIRs, each backed by a hash table; tuple s1 is built into the S.Name STAIR and probed into the E.Name STAIR.)
  Taken from Amol Deshpande
STAIR: Demotion
  Demoting e2c1 to e2: a simple projection
  Can be thought of as undoing work
  (Figure: contents of the S.Name, E.Name, E.Course and C.Course STAIRs before and after the demotion.)
  Adapted from Amol Deshpande
STAIR: Promotion
  Promoting e1 using E ⋈ C
  Two arguments
    A tuple
    A join to be used to promote this tuple
  Can be thought of as precomputation of work
  (Figure: contents of the S.Name, E.Name, E.Course and C.Course STAIRs before and after the promotion; e1 becomes e1c1.)
  Adapted from Amol Deshpande
Demotion OR Promotion
  Taken from "Lifting the Burden of History from Adaptive Query Processing", Amol Deshpande and Joseph M. Hellerstein
Demotion AND Promotion
  Taken from "Lifting the Burden of History from Adaptive Query Processing", Amol Deshpande and Joseph M. Hellerstein
|S ⋈ E| >> |E ⋈ C|; E and C arrive early; S is delayed
  The eddy decides to route E to E ⋈ C
  The eddy learns the correct selectivities
  The eddy decides to migrate E, by promoting E using E ⋈ C
  (Figure: an eddy with the S.Name, E.Name, E.Course and C.Course STAIRs; timeline of the arrivals of S0, E and C.)
  Adapted from Amol Deshpande
|S ⋈ E| >> |E ⋈ C|; E and C arrive early; S is delayed
  (Figure: the same eddy and STAIRs after the migration; the remaining tuples S - S0 arrive and are joined as (S - S0) ⋈ (E ⋈ C).)
  Adapted from Amol Deshpande
Most of the data is processed using the correct plan
  (Figure: the final result is the UNION of the tuples produced for S0 and for S - S0, each under its own plan over the S.Name, E.Name, E.Course and C.Course STAIRs.)
  Adapted from Amol Deshpande
STAIR: Correctness
  Theorem [3.1]: An eddy with STAIRs always produces the correct query result in spite of arbitrary applications of the promotion and demotion operations.
  STAIRs will produce every result tuple
  There will be no spurious duplicates
  Taken from "Lifting the Burden of History from Adaptive Query Processing", Amol Deshpande and Joseph M. Hellerstein
State Module (SteM)
  A kind of temporary data repository
  A half-join operator that keeps homogeneous tuples
  The state inside the operators is independent of routing decisions
  Supports the operations
    Insertion (build)
    Lookup (probe)
    Deletion (eviction) [useful for windows]
  Similar to an MJoin, but more adaptive
  The state can be shared with other continuous queries
  But intermediate results are not stored
    This increases the computation cost significantly
Eddies: Joins with SteMs
  More adaptivity
    The eddy knows about the half-joins
  Different access methods
    Index access
    Scan access
  Can simulate several joins
    Under overload?
      Hash join (fast!)
      Index join (memory limit)
    Join family?
      Hash join (equi-join)
      B-tree join (<, <=, >)
  A parametric query can be thought of as a join
  (Figure: an eddy connected to the sources R, S, T and to the SteMs on R, S, T.)
  Adapted from Jarle Søberg
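SteMs: Correctness
  Correctness problem!
    Possible duplicates!
  Solution: a globally unique sequence number
    Only the younger tuple can probe, so it matches only older tuples
  (Figure: tuples of R and S building into and probing each other's SteMs.)
  Taken from Jarle Søberg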
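The duplicate problem and the sequence-number rule can be made concrete with a small Python sketch. It is an illustration under assumptions, not the TelegraphCQ implementation: the names `SteM` and `join_with_delayed_probes` are hypothetical, and the eddy is simulated in the simplest way that exposes the problem, by letting every tuple build into its own SteM before any tuple probes.

```python
from collections import defaultdict


class SteM:
    """A half-join: stores tuples of one source, hashed on the join key."""

    def __init__(self):
        self.table = defaultdict(list)

    def build(self, key, tup, seq):
        self.table[key].append((seq, tup))

    def probe(self, key, seq):
        """Return only stored tuples that are older than the probing tuple."""
        return [tup for s, tup in self.table.get(key, []) if s < seq]


def join_with_delayed_probes(arrivals, check_seq=True):
    """Join R and S through two SteMs, with all builds before all probes.

    arrivals: list of (source, key, payload), source being "R" or "S".
    With check_seq=False the sequence-number rule is disabled.
    """
    stems = {"R": SteM(), "S": SteM()}
    for seq, (source, key, payload) in enumerate(arrivals):
        stems[source].build(key, payload, seq)   # seq is globally unique
    out = []
    for seq, (source, key, payload) in enumerate(arrivals):
        other = "S" if source == "R" else "R"
        limit = seq if check_seq else float("inf")
        for match in stems[other].probe(key, limit):
            out.append((payload, match) if source == "R" else (match, payload))
    return out
```

Without the check, a matching pair is found twice, once from each side. With the check, only the tuple with the higher sequence number finds its partner, so each result appears exactly once.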
SteMs: Sliding Window
    SELECT *
    FROM Str [RANGE BY 5 Second SLIDE BY 2 Second], Relation
    WHERE Relation.a = Str.a
  The SteM keeps the state for the sliding window (eviction)
  At time 40, what will happen at time 42?
  Instead of rebuilding the whole hash table, it removes only the expired tuples and adds the new ones
  (Figure: stream tuples A, B and C with timestamps from 18:49:36 to 18:49:41, joined with relation R; the eviction boundary is marked at 18:49:37.)
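Incremental eviction can be sketched with a queue ordered by arrival time next to the hash table. This is a hedged illustration, not TelegraphCQ code: the class `WindowedSteM` is a hypothetical name, timestamps are plain integers (the seconds part of the timestamps in the figure), tuples are assumed to arrive in timestamp order, and a window ending at `now` is assumed to cover (now - range, now].

```python
from collections import defaultdict, deque


class WindowedSteM:
    """Hash-table state for one stream with window-based eviction."""

    def __init__(self, range_seconds):
        self.range = range_seconds
        self.order = deque()             # (timestamp, key, tuple), oldest first
        self.table = defaultdict(list)   # key -> list of (timestamp, tuple)

    def build(self, ts, key, tup):
        self.order.append((ts, key, tup))
        self.table[key].append((ts, tup))

    def evict(self, now):
        """Remove only the tuples that fell out of the window; return the count."""
        evicted = 0
        while self.order and self.order[0][0] <= now - self.range:
            ts, key, tup = self.order.popleft()
            self.table[key].remove((ts, tup))
            if not self.table[key]:
                del self.table[key]
            evicted += 1
        return evicted

    def probe(self, key):
        return [tup for _, tup in self.table.get(key, [])]
```

With a 5-second range, calling `evict(42)` removes every tuple stamped 37 or earlier and leaves the rest of the hash table untouched, which matches the eviction boundary shown in the figure.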
Binary Join, STAIR, SteM: Comparison
    select *
    from customer c, orders o, lineitem l
    where c.custkey = o.custkey and
          o.orderkey = l.orderkey and
          c.nationkey = 1 and
          c.acctbal > 9000 and
          l.shipdate > date '1996-01-01'
  lineitem arrives in ascending shipdate order
  Initial routing: (O ⋈ L) ⋈ C
  (Figure labels: "Necessary recomputation", "Not adaptive".)
  Taken from "Lifting the Burden of History from Adaptive Query Processing", Amol Deshpande and Joseph M. Hellerstein
Eddies: Routing Policy
  How to choose the best plan? Through routing
  Every tuple is routed individually
  The routing policy determines the efficiency of the system
  The eddy has a tuple buffer with priorities
    New tuples start with low priority
    Tuples coming out of an operator get a higher priority
    A tuple is sent to the output as early as possible
    Avoids memory congestion in the system
    Keeps memory consumption low
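The priority rule for the tuple buffer can be sketched with a heap. This is a minimal illustration, not the eddy's actual data structure: the class `EddyBuffer` and its method names are hypothetical, and only two priority levels are assumed (new tuples low, tuples returned by an operator high).

```python
import heapq
from itertools import count


class EddyBuffer:
    """Tuple buffer that serves tuples returned by operators first."""

    HIGH = 0   # smaller number = served earlier
    LOW = 1

    def __init__(self):
        self._heap = []
        self._arrival = count()   # tie-breaker: first in, first out

    def add_new(self, tup):
        """A tuple entering the eddy from a source gets low priority."""
        heapq.heappush(self._heap, (self.LOW, next(self._arrival), tup))

    def add_returned(self, tup):
        """A tuple coming back from an operator gets high priority."""
        heapq.heappush(self._heap, (self.HIGH, next(self._arrival), tup))

    def pop(self):
        """Return the next tuple to route."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

Serving returned tuples first pushes work already in progress towards the output before new tuples are admitted, which is what keeps the buffer from filling up.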
Eddies’ Routing Policy: (old) Back-Pressure
  Naive approach: fast operators first
  (Figure, two plots: one with sel(s1) = sel(s2), cost(s2) = 5 and cost(s1) varying; one with cost(s1) = cost(s2), sel(s2) = 50% and sel(s1) varying.)
  Taken from: Avnur, Hellerstein
Eddies’ Routing Policy: Lottery Scheduling
  Waldspurger & Weihl, 1994
  Algorithm for scheduling shared resources
    "rapidly focus available resources"
  Every operator begins with N tickets
  An operator receives another ticket when it takes a tuple
    Favours operators that consume tuples quickly
  An operator loses a ticket when it returns a tuple
    Favours operators with low selectivity
    Low selectivity: the operator returns few tuples after processing many
  When two operators compete for a tuple
    The tuple is assigned to the operator that wins the lottery
  Never leave an operator without tickets, plus random exploration
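The ticket accounting can be sketched as below. This is an illustration of the idea on the slide, not the eddy implementation: the names `LotteryRouter` and `simulate` are hypothetical, a floor of one ticket stands in for "never leave an operator without tickets", and the simulation assumes independent selection operators that pass a tuple with a fixed probability.

```python
import random


class LotteryRouter:
    """Pick the next operator for a tuple by a ticket-weighted lottery."""

    def __init__(self, operators, initial_tickets=1, rng=None):
        self.tickets = {op: initial_tickets for op in operators}
        self.rng = rng or random.Random()

    def choose(self, candidates):
        """Hold a lottery among the operators the tuple may still visit."""
        weights = [self.tickets[op] for op in candidates]
        return self.rng.choices(candidates, weights=weights, k=1)[0]

    def consumed(self, op):
        """The operator took a tuple: credit one ticket."""
        self.tickets[op] += 1

    def returned(self, op):
        """The operator returned a tuple: debit one ticket, never below one."""
        self.tickets[op] = max(1, self.tickets[op] - 1)


def simulate(selectivity, n_tuples, seed=0):
    """Route n_tuples through selections with the given pass probabilities."""
    rng = random.Random(seed)
    router = LotteryRouter(list(selectivity), rng=rng)
    first_choice = {op: 0 for op in selectivity}
    for _ in range(n_tuples):
        remaining = list(selectivity)
        is_first = True
        while remaining:
            op = router.choose(remaining)
            if is_first:
                first_choice[op] += 1
                is_first = False
            router.consumed(op)
            remaining.remove(op)
            if rng.random() < selectivity[op]:
                router.returned(op)   # tuple passed the selection
            else:
                break                 # tuple dropped by this operator
    return router.tickets, first_choice
```

An operator that drops most of its input keeps most of the tickets it earns, so it wins more lotteries and sees tuples earlier. Because every operator always holds at least one ticket, the others are still tried occasionally, which lets the router notice when selectivities change.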
Eddies’ Routing Policy: Lottery Scheduling
  Lottery scheduling performs better than back-pressure
  (Figure: plot with cost(s1) = cost(s2), sel(s2) = 50% and sel(s1) varying.)
  Taken from: Avnur, Hellerstein
Experiment
  Stream: x with mean 40 and standard deviation 10
Experiment: Stream Variation
  Stream: x with mean 10 and standard deviation 0
Experiment: Stream Variation (2)
  Stream: x with mean 10 and standard deviation 0
Other Works
  Distributed Eddies
  Freddies: DHT-Based Adaptive Query Processing via Federated Eddies
  Content-based Routing
  Partial Results for Online Query Processing
  Flux: An Adaptive Partitioning Operator for Continuous Query Systems
  Java Support for Data-Intensive Systems: Experiences Building the Telegraph Dataflow System
  Ripple Join for Online Aggregation
  Highly Available, Fault-Tolerant, Parallel Dataflows

Telegraph Cq English

  • 1.
    Berkley’sTelegraphCQDetails about TelegraphCQAdaptivity:Eddies, SteMs and STAIRRouting Policy: Lottery Scheduling1Friday, July 15, 2011Alberto MinettiAdvanced Data Management @ DISIUniversità degli Studi di Genova
  • 2.
    Data Stream ManagementSystemContinuous, unbounded, rapid, time-varying streams of data elementsOccur in a variety of modern applicationsNetwork monitoring and traffic engineeringSensor networks, RFID tagsTelecom call recordsFinancial applicationsWeb logs and click-streamsManufacturing processesDSMS = Data Stream Management System2
  • 3.
    The Begin:TelegraphSeveral continuousqueries Several data streamsAt the beginning in JavaThen C-based using PostgreSQLNo Distribuite SchedulingLevel of adaptivity doesn’t change against overloadingIgnore system resourcesData managemente fully in memory3
  • 4.
    ReDesign:TelegraphCQDeveloped byBerkeley UniversityWrittenin C/C++OpenBSD LicenseBased on PostgreSQL sourcesCurrent Version: 0.6 (PostgreSQL 7.3.2)Closed Project in 2006Several points of interest and featuresSoftware http://telegraph.cs.berkeley.eduPapers http://db.cs.berkeley.edu/telegraphCommercial spinoff Truviso4
  • 5.
    TelegraphCQ ArchitecturePostgreSQL backendsManyTelegraphCQ front-endOnly one TelegraphCQ back-endFront-EndFork for every client connectionDoesn’t hold streamsParsing continuous query in the shared memoryBack-End has an EddyJoins query plans togetherCan be shared among queriesPut results in the shared memory5
  • 6.
    TelegraphCQ ArchitectureShared MemoryQueryPlan QueueTelegraphCQBack EndTelegraphCQBack EndTelegraphCQ Front EndEddy Control QueuePlanner Parser ListenerModulesModulesQuery Result QueuesSplitMini-ExecutorProxyCQEddyCQEddy}SplitSplitCatalogScansScansShared Memory Buffer PoolWrappersTelegraphCQ Wrapper ClearingHouseDisk6Taken from Michael Franklin, UC Berkeley
  • 7.
    Tipologie di ModuliInputand Caching (Relations and Streams)Interface to external datasourceWrapper for HTML, XML, FileSystem, proxy P2PRemote Databases with caching supportQuery ExecutionNon-blocking of classical operators (sel, proj)SteMs, STAIRsAdaptive RoutingReoptimize plan during executionEddies: choose routing tupla per tuplaJuggle: ordering on the fly (per value or timestamp),Flux: routing among computer of a cluster7Fjords Framework
  • 8.
    FjordsFramework inJava forOperators on Remote Data StreamsInterconnect modulesSupport queues among modulesNon-blockingSupport for Relazions and Streams8
  • 9.
    Stream in TelegraphCQUnarchivedStreamNever written on diskIn shared memory between executor and wrapperArchived StreamAppend-only method to send tuples to systemNo update, insert, delete; query aggregate with windowtcqtimeof type TIMESTAMP for window queriesWith constraint TIMESTAMPCOLUMN9
  • 10.
    DDL: Create Stream10CREATE STREAM measurements ( tcqtime TIMESTAMP TIMESTAMPCOLUMN, stationid INTEGER, speed REAL) TYPE ARCHIVED; CREATE STREAM tinydb ( tcqtime TIMESTAMP TIMESTAMPCOLUMN, light REAL, temperature REAL) TYPE UNARCHIVED;DROP STREAM measurements;
  • 11.
    Acquisizione di DatiSourcesmust identify before sending datasWrapper: user-defined functionsHow process datasInside Wrapper Clearinghouse processPush sourcesBegin a connection to TelegraphCQPull sourcesThewrapper begin the connectionDifferent Wrapper Data can merge in the same streamHeartbeat: Punctuated tuple without datas, only timestampWhen see a Punctuatedtuple no prior datas will come11
  • 12.
    Wrappers nel WCHNon-process-blockingover network socket (TCP/IP)Wrapper funct called when there are datas on socketOr when there are datas on bufferIf funct return a tuple for time (classic iterator)Ritorns array ofPostgreSQL DatumInit(WrapperState*) allocate resources and stateNext(WrapperState*) tuples are in the WrapperStateDone(WrapperState*) free resources and destroy stateAll in the infrastructured memory of PostgreSQL12
  • 13.
    DDL: Create Wrapper13CREATE FUNCTION measurements_init(INTEGER) RETURNS BOOLEAN AS ‘libmeasurements.so,measurements_init’ LANGUAGE ‘C’; CREATE WRAPPER mywrapper (init=measurements_init,next=measurements_next,done=measurements_done);
  • 14.
    HtmlGet and WebQueryServerHtmlGetallows the user to execute a series of Html GET or POST requests to arrive at a page, and then to extract data out of the page using a TElegraph Screen Scrapperdefinition file. Once extracted, this data can be output to a CSV file.Welcome to the TESS homepage. TESS is the TElegraph Screen Scrapper. It is part of The Telegraph Project at UC Berkeley. TESS is a program that takes data from web forms (like search engines or database queries) and turns it into a representation that is usable by a database query processor.http://telegraph.cs.berkeley.edu/tess/14
  • 15.
    self-monitoring capabilityThree specialstreams: info about system stateSupport tointrospettive queriesDynamic catalogQueried as an any streamtcq_queries(tcqtime, qrynum, qid, kind, qrystr)tcq_operators(tcqtime, opnum, opid, numqrs, kind, qid, opstr, opkind, opdesc)tcq_queues(tcqtime, opid, qkind, kind)15
  • 16.
    Example16Welcome to psql0.2, the PostgreSQL interactive terminal.# CREATE SCHEMA traffic;# CREATE STREAM traffic.measurements(stationid INT, speed REAL,tcqtime TIMESTAMP TIMESTAMPCOLUMN) TYPE ARCHIVED;# ALTER STREAM traffic.measurements ADD WRAPPER csvwrapper;$ cat tcpdump.log | source.pl localhost 5533 csvwrapper,network.tcpdumDefault TelegraphCQ script written in Perl to simulate sources that send CSV datasDefault Port
  • 17.
    Load Shedding CREATESTREAM … TYPE UNARCHIVED ON OVERLOAD ????BLOCK: stop it(default)DROP: drop tuplesKEEPCOUNTS: keep the count of dropped tuplesREGHIST: build a fixed-grid histog. of shedded tuplesMYHIST: build a MHIST (multidimensional histog.)WAVELET wavelet paramsBuild a wavelet histogramSAMPLE: keep a Reservoir Sample17
  • 18.
    LoadShedding:SummaryStreamFor a streamnamed schema.streamAutomatically created two streamsschema.__stream_droppedschema.__stream_keptFor WAVELET, MYHIST, REHIST, COUNTSSchema contains:Summary datasSummary intervalFor SAMPLESame schema with column __samplemultKeep the number of effective tuples rappresented18
  • 19.
    Quering TelegraphCQ:StreaQuel19Continuous Queryfor:Standard Relations inherit from PostgreSQLData stream windowed (sliding, hopping, jumping)RANGE BY specify the window dimensionSLIDE BY specify the update rateSTART AT specify whenthe query will beginoptionalSELECT stream.color, COUNT(*)FROM stream [RANGE BY ‘9’ SLIDE BY ‘1’]GROUP BY stream.colorwindowSTART OUPUT!1111122121221212121Adapted from Jarle Søberg
  • 20.
    Quering TelegraphCQ:StreaQuel (2)wtime(*)returns the last timestamp in the windowRecoursive query using WITH [SQL 1999]StreaQuel doesn’t allow subqueries20SELECT S.a, wtime(*)FROM S [RANGE BY ’10 seconds’ SLIDE BY ’10 second’], R [RANGE BY ’10 seconds’ SLIDE BY ’10 second’]WHERE S.a = R.a;Data StreamWindow…10 seconds10 seconds
  • 21.
    Net Monitor WindowedQuery21SELECT (CASE when outgoing = true then src_ip else dst_ip end) as inside_ip , (CASE when outgoing = true then dst_ip else src_ip end) as outside_ip,sum(bytes_sent) + sum(bytes_recv) as bytesFROM flow [RANGE BY $w SLIDE BY $w]GROUP BY inside_ip, outside_ipAll active connectionsSELECT src_ip, wtime(*),COUNT(DISTINCT(dst_ip||dst_port)) AS fanout,FROM flow [RANGE BY $w SLIDE BY $w]WHERE outgoing = falseGROUP BY src_ip ORDER BY fanout DESCLIMIT 100 PER WINDOW;100 sources with the max number of connectionsSELECT sum(bytes_sent), src_ip, wtime(*) AS nowFROM flow [RANGE BY $w SLIDE BY $w]WHERE outgoing = falseGROUP BY src_ip ORDER BY sum(bytes_sent) DESCLIMIT 100 PER WINDOW;100 most significant sources of trafficTaken from: İlkay Ozan Kay
  • 22.
    EvolutionaryRevolutionaryAdaptive Query Processing:EvolutionStatic Late Inter Intra PerPlans Binding Operator Operator TupleTraditional Dynamic QEP Query Scrambling Xjoin, DPHJ, EddiesDBMS Parametric Mid-query Reopt. Convergent QP Competitive Progressive Opt.Taken from: AmolDeshpande, Vijayshankar Raman22
  • 23.
    Adaptive Query Processing:CurrentBackgroundSeveral plansParametric QueriesContinuous QueriesFocus on incremental outputComplex Queries (10-20 relazions in join)Data Stream and asyncronous datasStatistics not availableInteractive Queries and user preferencesSharing of the stateSeveral operatorsXML data and textWide area federations23
  • 24.
    Adaptive Query Processing:SystemRRepeatObservate environment: daily/weekly (runstats)Choose behaviour (optimizer)If current plan is not the best plan (analyzer)Actuate the new plan (executor)Cost-based optimizationRunstats-optimize-execute -> high coarsenessWeekly Adaptivity!Goal: adaptivity per tupleMerge the 4 worksMeasureActuateAnalyzePlanTaken from: Avnur, Hellerstein24
  • 25.
    TelegraphCQ Executor:EddyIdea takenfrom fluid mechaniccontinuously adaptive query processing mechanismEddy is a routing operatorDelineate which modules must visit beforeAfter a tuple visit all modules, it can outputSee tuples before and after each module (operator)25MeasureEddyAnalyzeActuatePlanTaken from: Amol Deshpande, Vijayshankar Raman
  • 26.
    Eddies:CorrectnessEvery tuple hastwo BitVectorEvery position correspond to an operatorReady: identify if tuple is ready for that operatorEddy can delineate which tuples send to an operatorDone: identify if tuple was processed by that operatorEddy avoids sending two times to same operatorWhen all Done bits are setted -> outputjoined tuple have Ready and Donein bitwise-ORSimple selections have Ready=complement(Done)26
  • 27.
    Eddies:Simple Selection Routing27SSELECT *FROM SWHERE S.a> 10 AND S.b < 15 AND S.c = 15σaσbS.b < 15 S.a > 10EddyσcS.c = 15 σaσbσcσaσbσc15 ; 0 ; 15S10 0 01 1 11 0 001 11 1 00 0 11 1 10 0 0a15b0ReadyDonec15Adapted from Manuel Hertlein
  • 28.
    S >< TRelationBinary Join:R >< S >< TJoin order(R >< S) >< TAlright with direct access to datas (index or seq.)28R >< STRSTaken from Jarle Søberg
  • 29.
    Stream Binary JoinR>< S >< T29Join order(R >< S) >< TBut if data are sent by sources...S >< TR >< SBlock or drop some tuples is inevitable!Taken from Jarle Søberg
  • 30.
    On the flyoptimization necessaryS >< T30Stream Binary JoinR >< S >< TOften stream changesReoptimize require lot of timeNon dynamic enough!R >< STaken from Jarle Søberg
  • 31.
    Stream Binary Join:EddiesUsingan EddyInitial behaviour of Telegraph31S >< TeddyR >< Stuple-based adaptivityConsider dynamic changes in the stramTaken from Jarle Søberg
  • 32.
    Eddies:Sheduling Join problemShedulingon selectivity of joins doesn’t workExample32|S E||EC|>>E and Carrive early; Sis delayedSECtimeTaken from Amol Deshpande
  • 33.
    33|S E||E C|SEHashTableE.NameHashTableS.NameEddySEOutputCHashTableC.CourseHashTableE.CourseEddy decides to route E to ECEC>>E and Carrive early; Sis delayedS0sent and received suggestS Join E is better optionSSES0S –S0ECtimeCSES0E(S –S0)EEddy learns the correct sizesToo Late !!Taken from Amol Deshpande
  • 34.
    34State got embeddedas aresult of earlier routing decisions|S E||EC|SEHashTableE.NameHashTableS.NameECEddySEOutputCHashTableC.CourseHashTableE.CourseSEEC>>E and Carrive early; Sis delayedSECCSESEExecution Plan UsedQuery is executed using the worse plan!Too Late !!Taken from Amol Deshpande
  • 35.
    STAIRStorage, Transformation andAccess for Intermediate Results35S.Name STAIRBuild intoS.Name STAIRHashTableE.Name STAIRHashTableEddySOutputECHashTableHashTableE.Course STAIRC.Course STAIRProbe into E.Name STAIRShow internal state of join to eddyProvide primitive function to state managementDemotionPromotionOperation forInsertion (build)Lookup (probe)s1s1s1s1Taken from Amol Deshpande
  • 36.
    STAIR: Demotion36e1e1e2c1e2s1e1e2c1e2s1e1S.Name STAIRHashTableE.NameSTAIRs1Demoting di e2c1ae2:Simple projectionHashTablee1e2c1e2EddySs1e1EOutputCHashTablee2s1e1e2c1HashTablec1Can be tought of as undoing workE.Course STAIRC.Course STAIRAdapted from Amol Deshpande
  • 37.
    STAIR: Promotion37Promotinge1 usingECe1e1e1c1e1e1c1S.Name STAIRHashTableE.Name STAIRs1Two arguments:A tuple
  • 38.
    A join tobe used to promote this tupleHashTablee1e1c1e2c1EddySEOutputCHashTablee2s1e1HashTablec1e1Can be tought of as precomputation of workE.Course STAIRC.Course STAIRAdapted from Amol Deshpande
  • 39.
    Demotion OR Promotion38Takenfrom Lifting the Burden of History from Adaptive Query ProcessingAmol Deshpande and Joseph M. Hellerstein
  • 40.
    Demotion AND PromotionTakenfrom Lifting the Burden of History from Adaptive Query ProcessingAmol Deshpande and Joseph M. Hellerstein39
  • 41.
    40S.Name STAIRHashTable|S E||EC|S0EEEHashTableEEECCCS0EEddy decides to route E to ECE.Course STAIR>>E and Carrive early; Sis delayedS0E.Name STAIRHashTableSEEEddySCEOutputCtimeEHashTableCEddy decides to migrateEEddy learns the correct selectivitiesBy promoting E using ECC.Course STAIRAdapted from Amol Deshpande
  • 42.
    41|S E||EC|HashTableECS0EE.Course STAIR>>E and Carrive early; Sis delayedS.Name STAIRHashTableSS0E.Name STAIRHashTableSS –S0S –S0EEddySC(S –S0) ECEOutputCtimeEHashTableCC.Course STAIRAdapted from Amol Deshpande
  • 43.
    42ECSECS – S0SEECS0EECHashTableECSEE.CourseSTAIRS.Name STAIRHashTableSE.Name STAIRHashTableUNIONEddySEOutputCEHashTableCMost of the data isprocessed using thecorrect planC.Course STAIRAdapted from AmolDeshpande
  • 44.
    STAIR:CorrectnessTheorem [3.1]: Aneddy with STAIRs always produces the correct query result in spite of arbitrary applications of the promotion and demotion operations.STAIRs will produce wvery resul tupleThere wull be no spurious duplicates43Taken from Lifting the Burden of History from Adaptive Query ProcessingAmol Deshpande and Joseph M. Hellerstein
  • 45.
    State ModuleA kindof temporary data repositoryHalf-Join operator that keep homogeneous tupleState inside the operators is Decisions-IndipendentSupport the operationsInsertion (build)Look-up (probe)Deletion (eviction) [useful for windows]Similar to a Mjoin but more adaptiveSharing of the state among other continuous queriesBut not storing intermediate resultsIncrease the computation cost significant44
  • 46.
    Eddies Join withSteMs45TRSeddyRTSMore adaptivityEddy knows half-joinDifferent access methodIndex accessScan accessSimulate several joinOn overloading?Hash join (fast!!)Index join (mem limit)Join familty?Hash join (equi-join)B-tree join (<, <=, >)Parametrica query can be tought as a joinAdapted from Jarle Søberg
  • 47.
    SteMs:Correctness 46RSCorrectness problem!Possibileduplicates!!Global unique sequence numberOnly younger can probeTaken from Jarle Søberg
  • 48.
    SteMs sliding Window47SELECT* FROM Str [RANGE BY 5 Second SLIDE BY 2 Second], Relation,WHERE Relation.a = Str.aA|…….|18:49:36RB|…….|18:49:36C|…….|18:49:37A|…….|18:49:38Keep the state for the sliding window (eviction)At time 40, what will happen at time 42?Instead rebuild all hash table it remove only old tuples and add new tuplesB|…….|18:49:39C|…….|18:49:39A|…….|18:49:40B|…….|18:49:40C|…….|18:49:40Eviction!18:49:37A|…….|18:49:41B|…….|18:49:41
  • 49.
    Binary Join, STAIR, SteM: Comparison
    select * from customer c, orders o, lineitem l where c.custkey = o.custkey and o.orderkey = l.orderkey and c.nationkey = 1 and c.acctbal > 9000 and l.shipdate > date '1996-01-01'
    [Chart labels: "Necessary recomputation", "Not adaptive"]
    lineitem tuples arrive in ascending shipdate order
    Initial routing: (O ⋈ L) ⋈ C
    Taken from "Lifting the Burden of History from Adaptive Query Processing", Amol Deshpande and Joseph M. Hellerstein
  • 50.
    Eddies: Routing Policy
    How to choose the best plan? Using routing
    Every tuple is routed individually
    The routing policy determines the system's efficiency
    The eddy has a tuple buffer with priorities
      Initially tuples have low priority
      On exiting an operator they get a higher priority
    A tuple is sent to the output as early as possible
      Avoids system memory congestion
      Allows low memory consumption
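
The two-level priority buffer described above can be sketched with a heap. This is an assumption-laden illustration rather than the eddy's actual data structure: `EddyBuffer`, `add_new` and `add_returned` are hypothetical names, and ties within a priority level are broken by arrival order.

```python
import heapq
from itertools import count

HIGH, LOW = 0, 1   # heapq pops the smallest value first, so HIGH must be lower


class EddyBuffer:
    """Hypothetical tuple buffer: tuples returned by operators go first."""

    def __init__(self):
        self._heap = []
        self._order = count()   # tie-breaker: arrival order within a priority

    def add_new(self, tup):
        """A tuple entering from a source starts with low priority."""
        heapq.heappush(self._heap, (LOW, next(self._order), tup))

    def add_returned(self, tup):
        """A tuple coming back from an operator gets high priority."""
        heapq.heappush(self._heap, (HIGH, next(self._order), tup))

    def next_tuple(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


buf = EddyBuffer()
buf.add_new("t1")
buf.add_new("t2")
buf.add_returned("t0 (already filtered once)")
order = [buf.next_tuple() for _ in range(3)]
```

Serving partially processed tuples first pushes them toward the output sooner, which is what keeps the buffer, and therefore memory use, small.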
  • 51.
    Eddies' Routing Policy: (old) Back-Pressure
    Naive approach: quick operators first
    [Plot 1: sel(s1) = sel(s2), cost(s2) = 5, cost(s1) varies]
    [Plot 2: cost(s1) = cost(s2), sel(s2) = 50%, sel(s1) varies]
    Taken from: Avnur, Hellerstein
  • 52.
    Eddies' Routing Policy: Lottery Scheduling
    Waldspurger & Weihl, 1994
    Algorithm for scheduling shared resources: «rapidly focus available resources»
    Every operator begins with N tickets
    An operator receives another ticket when it takes a tuple
      Promotes operators that consume tuples fast
    An operator loses a ticket when it returns a tuple
      Promotes operators with low selectivity
      Low selectivity: operators that return few tuples after processing many
    When two operators compete for a tuple
      The tuple is assigned to the operator that wins the lottery
    Never leave an operator without tickets + random exploration
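
The ticket rules above can be sketched as a small simulation. Everything here is an assumption made for illustration: the class `LotteryRouter`, the initial ticket count of 10, the floor of one ticket (standing in for "never leave an operator without tickets") and the two selection operators with selectivities 0.1 and 0.9.

```python
import random


class LotteryRouter:
    """Hypothetical ticket bookkeeping for an eddy's routing lottery."""

    def __init__(self, operators, initial_tickets=10, rng=None):
        self.tickets = {op: initial_tickets for op in operators}
        self.rng = rng or random.Random()

    def choose(self, eligible):
        """Hold a lottery among eligible operators, weighted by their tickets."""
        weights = [self.tickets[op] for op in eligible]
        return self.rng.choices(eligible, weights=weights, k=1)[0]

    def on_consume(self, op):
        self.tickets[op] += 1                      # took a tuple: gain a ticket

    def on_return(self, op):
        # Returned a tuple: lose a ticket, but never drop below one ticket.
        self.tickets[op] = max(1, self.tickets[op] - 1)


def simulate(n_tuples=2000, seed=0):
    rng = random.Random(seed)
    selectivity = {"s1": 0.1, "s2": 0.9}           # s1 filters out most tuples
    router = LotteryRouter(list(selectivity), rng=rng)
    first_choice = {op: 0 for op in selectivity}
    for _ in range(n_tuples):
        remaining = list(selectivity)
        first = True
        while remaining:
            op = router.choose(remaining)
            if first:
                first_choice[op] += 1
                first = False
            router.on_consume(op)
            remaining.remove(op)
            if rng.random() >= selectivity[op]:
                break                              # tuple dropped by the filter
            router.on_return(op)                   # tuple passed, back to eddy
    return router.tickets, first_choice


tickets, first_choice = simulate()
print(tickets, first_choice)
```

In this sketch the selective operator `s1` keeps most of the tickets it earns, so it wins most lotteries and sees tuples first, while the ticket floor leaves `s2` a small chance of being tried, which is the random exploration the slide mentions.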
  • 53.
    Eddies' Routing Policy: Lottery Scheduling
    Lottery Scheduling is better than Back-Pressure
    [Plot: cost(s1) = cost(s2), sel(s2) = 50%, sel(s1) varies]
    Taken from: Avnur, Hellerstein
  • 54.
    Experiment
    Stream: x with mean 40 and standard deviation 10
  • 55.
    Experiment: Stream variation
    Stream: x with mean 10 and standard deviation 0
  • 56.
    Experiment: Stream variation (2)
    Stream: x with mean 10 and standard deviation 0
  • 57.
    Other Works
    Distributed Eddies
    Freddies: DHT-Based Adaptive Query Processing via Federated Eddies
    Content-based Routing
    Partial Results for Online Query Processing
    Flux: An Adaptive Partitioning Operator for Continuous Query Systems
    Java Support for Data-Intensive Systems: Experiences Building the Telegraph Dataflow System
    Ripple Joins for Online Aggregation
    Highly Available, Fault-Tolerant, Parallel Dataflows

Editor's Notes

  • #5 An open-source DBMS, PostgreSQL, is the starting point for implementing TelegraphCQ.
    Developed at Berkeley University
    Written in C/C++
    OpenBSD license
    Based on the PostgreSQL code
    Current version: 2.1 on PostgreSQL 7.3.2
    Project closed in 2006
    Important points of interest and features
    Software: http://telegraph.cs.berkeley.edu
    Papers: http://db.cs.berkeley.edu/telegraph
    Commercial spinoff: Truviso
  • #8 Non-blocking versions of classical operators (sel, proj).
    Eddies: decide the routing tuple by tuple.
    Flux: routes tuples among the machines of a cluster to support parallelism, load balancing and fault tolerance.
  • #12 Sources must identify themselves before sending data.
    Wrappers: user-defined functions that specify how the data must be processed; they run inside the Wrapper Clearinghouse process.
    Push sources: they initiate a connection to TelegraphCQ.
    Pull sources: the wrapper initiates the connection. For example, it connects to a mail server and checks the mail every minute.
    Data from different wrappers can flow into the same stream.
    Heartbeat: a punctuated tuple with no data, only a timestamp. Once a punctuated tuple arrives, no data with an earlier timestamp will arrive.
  • #13 The data that has arrived may contain only parts of tuples, so it must be buffered; a single call does not necessarily yield a tuple.
    WrapperState lets the user functions communicate with the WCH.
    If fewer fields arrive, the missing ones default to NULL; if too many arrive, the extra ones are truncated.
  • #45 A sort of temporary data repository.
    Half-join operator that stores homogeneous tuples.
    State independent of previous routing decisions (since it does not store the tuples).
    Supports the operations: insertion (build), look-up (probe), deletion (eviction), useful for windows.
    Similar to MJoin but more adaptive.
    Similar to the easy routing policy for queries that contain only selections.
    State shared among other continuous queries.
    Does not store intermediate results.
    Increased computation cost.
  • #57 The opportunity to improve:
    Optimizers pick a single plan for a query.
    However, different subsets of data may have very different statistical properties.
    It may be more efficient to use different plans for different subsets of data.