Distributed Database
SUBMITTED TO: Sir Hammad
SUBMITTED BY: Syed Umair Raza 8th Semester
THE UNIVERSITY OF LAHORE
Outline
• Data Stream Management System
• Streaming Operators & their implementation
• Query Processing
• Load Shedding & Query Approximation
Data Stream Management
• Stream data - Produced incrementally over time, rather than being available in full before its
processing begins.
• Examples:
Definition:
Data for almost any large-scale data-management task is continuously collected over a wide area, and at a
much greater rate than ever before.
Characteristics of Data Stream Management
Typically, data streams exhibit the following characteristics:
• infinite length
• continuous data arrival
• high data rates
• requirements for low-latency, real-time query processing
• data that are usually time-stamped and generally arrive in either temporal order or close to it
Data Stream Management Languages
• SQL-based language called StreaQuel
• XML-QL based language NiagaraCQ
• SQL-based CQL (Continuous Query Language)
Query Operators & Implementation
Query Operators
Two aspects of stream operators are of interest:
• The blocking/nonblocking properties of operators, independent of the language in which they are expressed, and
• The abstract properties of stream functions expressible by blocking/nonblocking operators.
Blocking Query Operator: A blocking query operator is a query operator that is unable to
produce the first tuple of the output until it has seen the entire input.
Nonblocking Query Operator: A nonblocking query operator is one that produces all the
tuples of the output before it has detected the end of the input.
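The two definitions can be made concrete with a small sketch. It assumes streams are modeled as Python iterables and uses a hypothetical `traced_source` helper (not part of any DSMS) to record when tuples are read, when the input ends, and when output is emitted.

```python
from typing import Iterable, Iterator, List


def select_even(stream: Iterable[int]) -> Iterator[int]:
    """Nonblocking: each output tuple is emitted as soon as its input arrives."""
    for x in stream:
        if x % 2 == 0:
            yield x


def sort_all(stream: Iterable[int]) -> Iterator[int]:
    """Blocking: nothing can be emitted until the whole input has been seen."""
    yield from sorted(stream)


def traced_source(values: List[int], log: List[str]) -> Iterator[int]:
    """Hypothetical source that logs every read and the end of the input."""
    for v in values:
        log.append(f"read {v}")
        yield v
    log.append("end of input")


def run(operator, values: List[int]) -> List[str]:
    """Drive an operator and return the interleaved read/emit trace."""
    log: List[str] = []
    for out in operator(traced_source(values, log)):
        log.append(f"emit {out}")
    return log


nonblocking_trace = run(select_even, [4, 1, 2])
blocking_trace = run(sort_all, [4, 1, 2])
```

In `nonblocking_trace` every "emit" entry precedes "end of input", while in `blocking_trace` the first "emit" entry appears only after it. On an unbounded stream the sort would therefore never produce output.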
Streaming Operator Functionality
• Consider operators that take sequences (streams) as input and return sequences (streams) as output. For instance, consider an operator G that takes a sequence S as input and produces a sequence G(S) as output:
S −→ G −→ G(S)
• G operates as an incremental transducer, which, for each new input tuple in S, adds zero, one, or several tuples to the output.
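As a rough illustration of an incremental transducer, the sketch below models G as a generator driven by a per-tuple step function. The names `transduce` and `split_words` are hypothetical and chosen only for this example.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def transduce(step: Callable[[A], List[B]], stream: Iterable[A]) -> Iterator[B]:
    """Apply G incrementally: the output grows by step(t) for every new tuple t."""
    for t in stream:
        yield from step(t)


def split_words(line: str) -> List[str]:
    """Hypothetical step function: zero, one, or several output tuples per input."""
    return line.split()


S = ["", "alpha", "beta gamma delta"]
G_of_S = list(transduce(split_words, S))

# Output for a prefix of S is a prefix of G(S): earlier output is never retracted.
G_of_prefix = list(transduce(split_words, S[:2]))
```

Here the empty line contributes zero tuples, "alpha" contributes one, and the last line contributes three.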
• Join operators are problematic on streams
o May need to join stream tuples that are arbitrarily far apart
o Operations on implicit / explicit windows
• Selections and (duplicate-preserving) projections are straightforward
o Local, per-element operators
• Duplicate-eliminating projection is like grouping
o Projection needs to include the ordering attribute
o No restriction for position-ordered streams
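One common way to bound the state of a stream join is an explicit time-based sliding window. The following is a minimal sketch under stated assumptions: two streams named "L" and "R" arrive as one merged feed ordered by timestamp, and the `Event` layout and `window_join` function are illustrative, not taken from a particular system.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

# Illustrative element layout: (timestamp, stream_name, join_key).
Event = Tuple[int, str, str]


def window_join(events: Iterable[Event], width: int) -> Iterator[Tuple[Event, Event]]:
    """Equi-join streams "L" and "R", pairing tuples at most `width` time units apart.

    Assumes `events` is a single merged feed ordered by timestamp.
    """
    windows: Dict[str, Deque[Event]] = {"L": deque(), "R": deque()}
    for ev in events:
        ts, side, key = ev
        other = "R" if side == "L" else "L"
        # Expire tuples that have fallen out of either window.
        for w in windows.values():
            while w and w[0][0] < ts - width:
                w.popleft()
        # Probe the opposite window, then remember the new tuple.
        for old in windows[other]:
            if old[2] == key:
                yield (ev, old) if side == "L" else (old, ev)
        windows[side].append(ev)


feed = [(1, "L", "a"), (2, "R", "a"), (3, "R", "b"), (10, "L", "b"), (11, "R", "a")]
narrow = list(window_join(feed, width=5))
wide = list(window_join(feed, width=100))
```

With `width=5` the tuples `(3, "R", "b")` and `(10, "L", "b")` are too far apart to join, so only one pair is produced. With `width=100` all three matching pairs appear, at the cost of keeping more state.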
Query Processing
• Declarative queries -> Logical query plan -> Physical plan
• Directed acyclic graphs (nodes -> operators, edges -> data flow)
• Queries sharing memory/streams are combined into a single plan
• Scheduling
o FIFO, Round Robin – simple, not efficient
o Operators with higher throughput first – low latency
o Operators with minimum processing cost & selectivity first – smaller queues
• Heartbeats & Punctuations
o Typically issued by sources
o Reduce the amount of state needed by operators
o Prevent operators from doing unnecessary work
o Query plans can also issue heartbeats to avoid pipeline stalls and delayed results
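The effect of punctuations on operator state can be sketched with a grouped count. The `Punctuation` class and `count_per_bucket` function below are assumptions made for illustration: a punctuation carrying `upto` is taken to promise that no later tuple has a timestamp less than or equal to `upto`.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple, Union


@dataclass(frozen=True)
class Punctuation:
    """Hypothetical marker: no later tuple will carry a timestamp <= `upto`."""
    upto: int


def count_per_bucket(
    stream: Iterable[Union[int, Punctuation]], bucket: int
) -> Iterator[Tuple[int, int]]:
    """Count timestamps per bucket, emitting a bucket once punctuation closes it."""
    counts: Dict[int, int] = defaultdict(int)
    for item in stream:
        if isinstance(item, Punctuation):
            # Bucket b covers timestamps b*bucket .. (b+1)*bucket - 1; it is
            # closed once its last timestamp is <= item.upto.
            closed = sorted(b for b in counts if (b + 1) * bucket - 1 <= item.upto)
            for b in closed:
                yield (b, counts.pop(b))  # emit the result and free the state
        else:
            counts[item // bucket] += 1


stream = [1, 3, 12, Punctuation(9), 15, 21, Punctuation(19)]
results = list(count_per_bucket(stream, bucket=10))
```

Without the punctuations this aggregate would be blocking. With them, each closed bucket is emitted and dropped from memory, while the still-open bucket for timestamp 21 stays in state and is not reported.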
Load Shedding & Approximation
• Applications require real-time, or near real-time, response and are characterized by high-speed arrival rates.
o Random sampling
o Semantic load shedding to drop less important tuples
o Objective is to minimize the drop in accuracy
o Challenging for complex query plans with multiple streams and operators
Two types of approximation that have been suggested are:
1. Max-Subset results
The objective is to maximize the size of the resulting join.
2. Sampled results
The objective is to provide a fair random sample of the join result
• The productivity of a tuple determines its contribution to the multi-way join.
• For simplicity, we denote by T_{W_i={t}} the set of join tuples of n windows when the i-th window contains only t.
Two Priority Measures:
• Maximum Subset: To provide a maximum subset of the true result, we should shed the tuple with the least productivity in order to minimize the loss caused by load shedding.
• Random Sampling: To provide a random sample of the true result, one may control the fraction of the join tuples produced by each tuple.
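The two priority measures can be sketched for the simplest case of a two-way equi-join, where the productivity of a tuple is the number of matching tuples in the other window. All function names are hypothetical. The uniform random drop in `shed_random` is only a simple stand-in for the sampling policy and is not claimed to yield a statistically fair sample of the join result.

```python
import random
from collections import Counter
from typing import List, Sequence


def productivity(window: Sequence[str], other: Sequence[str]) -> List[int]:
    """Number of join results each tuple in `window` contributes against `other`."""
    freq = Counter(other)
    return [freq[k] for k in window]


def shed_max_subset(window: Sequence[str], other: Sequence[str], drop: int) -> List[str]:
    """Max-subset policy: drop the `drop` tuples with the lowest productivity."""
    prod = productivity(window, other)
    # sorted() is stable, so ties are shed in arrival order.
    order = sorted(range(len(window)), key=lambda i: prod[i])
    shed = set(order[:drop])
    return [k for i, k in enumerate(window) if i not in shed]


def shed_random(window: Sequence[str], drop: int, rng: random.Random) -> List[str]:
    """Simplified sampling-style policy: drop `drop` tuples uniformly at random."""
    shed = set(rng.sample(range(len(window)), drop))
    return [k for i, k in enumerate(window) if i not in shed]


def join_size(a: Sequence[str], b: Sequence[str]) -> int:
    """Exact size of the equi-join of two windows."""
    freq = Counter(b)
    return sum(freq[k] for k in a)


W1 = ["a", "b", "c", "a"]
W2 = ["a", "a", "c"]
kept_one = shed_max_subset(W1, W2, drop=1)
kept_two = shed_max_subset(W1, W2, drop=2)
kept_rand = shed_random(W1, drop=1, rng=random.Random(0))
```

Shedding the unproductive tuple "b" costs nothing, since the join size stays at 5. Shedding a second tuple removes "c", the next least productive, and loses a single result.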
Estimating Productivity:
• Sketching techniques can be used to approximate the answers to complex queries. The class of queries considered is of the form:
SELECT AGG FROM R1, ..., Rr WHERE θ
where AGG is an arbitrary aggregate operator such as COUNT or SUM, and θ represents the conjunction of equi-join conditions.
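The slides do not say which sketch is meant. As one possible illustration, the code below uses a sketch in the AGMS ("tug-of-war") style to estimate COUNT for a two-relation equi-join. Each relation is summarized by counters of random +1/-1 signs, and the product of matching counters estimates the join size. The SHA-256 based `sign` function is an assumption made for a self-contained example and stands in for the 4-wise independent hash family used in the analysis of such sketches.

```python
import hashlib
from collections import Counter
from typing import Hashable, Iterable, List, Sequence


def sign(seed: int, value: Hashable) -> int:
    """Pseudo-random +1/-1 for (seed, value); illustrative stand-in hash."""
    digest = hashlib.sha256(f"{seed}:{value!r}".encode()).digest()
    return 1 if digest[0] & 1 else -1


def sketch(stream: Iterable[Hashable], seeds: Sequence[int]) -> List[int]:
    """One counter per seed, updated incrementally as each tuple arrives."""
    counters = [0] * len(seeds)
    for value in stream:
        for j, seed in enumerate(seeds):
            counters[j] += sign(seed, value)
    return counters


def estimate_join_count(sk1: Sequence[int], sk2: Sequence[int]) -> float:
    """Estimate COUNT of the equi-join by averaging products of matching counters."""
    products = [x * y for x, y in zip(sk1, sk2)]
    return sum(products) / len(products)


def exact_join_count(r1: Iterable[Hashable], r2: Iterable[Hashable]) -> int:
    """Exact COUNT of the equi-join, for comparison on small inputs."""
    f2 = Counter(r2)
    return sum(f2[v] for v in r1)


seeds = list(range(8))
R1 = ["a"] * 3
R2 = ["a"] * 4
estimate = estimate_join_count(sketch(R1, seeds), sketch(R2, seeds))
```

The space used is one counter per seed, regardless of stream length. With a single distinct join value the estimate is exact, because the sign cancels in each product. With many distinct values the cross terms only cancel in expectation, so more counters are needed to reduce the error.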
