Introduction to Data Streams
Concepts
By: Dr. Sarita Tripathy
Assistant Professor
School of Computer Engineering
KIIT Deemed to be University
What is a data stream?
• Golab & Özsu (2003): “A data stream is a real-time, continuous, ordered
(implicitly by arrival time or explicitly by timestamp) sequence of items.
It is impossible to control the order in which items arrive, nor is it
feasible to locally store a stream in its entirety.”
• Massive volumes of data, items arrive at a high rate.
Data Streams
• A data stream is a (potentially unbounded) sequence of tuples. Each
tuple consists of a set of attributes, similar to a row in a database table.
• Transactional data streams: log interactions between entities
  • Credit card: purchases by consumers from merchants
  • Telecommunications: phone calls by callers to dialed parties
  • Web: accesses by clients of resources at servers
• Measurement data streams: monitor evolution of entity states
  • Sensor networks: physical phenomena, road traffic
  • IP network: traffic at router interfaces
  • Earth climate: temperature, moisture at weather stations
Examples of Stream Sources
Before proceeding, let us consider some of the ways in which stream data arises naturally.
Sensor Data: Imagine a temperature sensor bobbing about in the ocean, sending back to a base
station a reading of the surface temperature each hour. The data produced by this sensor is a stream
of real numbers. A single sensor is a low-rate stream, but suppose we deploy a million such sensors,
each reporting ten times per second. Now we have 3.5 terabytes arriving every day, and we definitely
need to think about what can be kept in working storage and what can only be archived.
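The 3.5-terabyte figure can be checked with a little arithmetic. The sketch below assumes one million sensors, each sending ten 4-byte readings per second; these numbers are assumptions chosen to reproduce the figure and are not stated on the slide.

```python
# Back-of-the-envelope check of the daily data volume.
# Assumed (not on the slide): 1,000,000 sensors, 10 readings per second each,
# 4 bytes per reading.
SENSORS = 1_000_000
READINGS_PER_SECOND = 10
BYTES_PER_READING = 4
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_bytes(sensors: int, rate_hz: int, reading_bytes: int) -> int:
    """Total bytes produced per day by all sensors combined."""
    return sensors * rate_hz * reading_bytes * SECONDS_PER_DAY


total = daily_volume_bytes(SENSORS, READINGS_PER_SECOND, BYTES_PER_READING)
print(f"{total / 1e12:.2f} TB per day")  # prints "3.46 TB per day"
```

Under these assumptions the total is 3.456 × 10^12 bytes per day, which rounds to the 3.5 terabytes quoted above.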
Image Data: Satellites often send down to Earth streams consisting of many terabytes of images per
day. Surveillance cameras produce images with lower resolution than satellites, but there can be many
of them, each producing a stream of images at intervals like one second.
Internet and Web Traffic: A switching node in the middle of the Internet receives streams of IP
packets from many inputs and routes them to its outputs. Web sites receive streams of various types.
For example, Google receives several hundred million search queries per day. Yahoo! accepts billions
of “clicks” per day on its various sites.
Characteristics of Data Streams
• Characteristics
  • Huge volumes of continuous data, possibly infinite
  • Fast changing, requiring a fast, real-time response
  • The data stream model captures today’s data processing needs well
  • Random access is expensive, so algorithms must be single-scan (each
    item can be looked at only once)
  • Store only a summary of the data seen thus far
  • Most stream data are low-level or multi-dimensional in nature and
    need multi-level, multi-dimensional processing
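The single-scan and summary-only points can be illustrated with a running mean and variance. This is a minimal sketch using Welford's online algorithm, which is one common choice and is not named on the slide; the class name `RunningStats` and the sample readings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Constant-size summary of a numeric stream (Welford's online algorithm)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Each item is seen exactly once and is not stored afterwards.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance; 0.0 until at least one item has arrived.
        return self.m2 / self.count if self.count else 0.0


stats = RunningStats()
for reading in [20.0, 22.0, 24.0]:  # stand-in for an unbounded stream
    stats.update(reading)
print(stats.count, stats.mean, stats.variance)
```

The summary occupies three numbers regardless of how many items have passed through, which is the property the bullets describe.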
Applications of data stream processing
• Data stream processing
• Process queries (compute statistics, activate alarms)
• Apply data mining algorithms
• Requirements
• Real-time processing
• One-pass processing
• Bounded storage (no complete storage of streams)
• Possibly consider several streams
• Let’s go deeper into some examples
• Network management
• Stock monitoring
Network management
Network management (cont.)
Stock monitoring
A data-stream-management system (DSMS)
• Streams may be archived in a large archival
store, but we assume it is not possible to answer
queries from the archival store.
• It could be examined only under special
circumstances using time-consuming retrieval
processes.
• There is also a working store, into which
summaries or parts of streams may be placed,
and which can be used for answering queries.
• The working store might be disk, or it might be
main memory, depending on how fast we need
to process queries.
• But either way, it is of sufficiently limited
capacity that it cannot store all the data from all
the streams.
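The split between the two stores can be sketched as follows. This is a toy model under stated assumptions: the class name `StreamStores`, the in-memory list standing in for the archive, and the "keep the most recent items" policy for the working store are all hypothetical choices, not part of the slide.

```python
from collections import deque


class StreamStores:
    """Toy model: an append-only archive plus a bounded working store."""

    def __init__(self, working_capacity: int) -> None:
        self.archive = []  # stand-in for slow archival storage
        # A deque with maxlen evicts the oldest item automatically when full.
        self.working = deque(maxlen=working_capacity)

    def ingest(self, item: float) -> None:
        self.archive.append(item)  # everything may be archived
        self.working.append(item)  # only recent items stay queryable

    def query_max(self) -> float:
        # Queries are answered from the working store only, never the archive.
        return max(self.working)


stores = StreamStores(working_capacity=3)
for value in [5.0, 9.0, 1.0, 4.0, 2.0]:
    stores.ingest(value)
print(stores.query_max())  # 4.0: the 9.0 has already left the working store
```

The answer reflects only what the working store still holds, which is why a DSMS has to choose carefully which parts or summaries of a stream to retain.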
Generic DSMS Architecture
[Figure: generic DSMS architecture, after Golab & Özsu 2003. Labels: streaming inputs, input monitor, working storage, summary storage, static storage, updates to static data, query repository, user queries, query processor, output buffer, streaming outputs.]
Architecture: Stream Query Processing
SDMS (Stream Data Management System)
Data Stream Management Systems
DBMS versus DSMS (Data Stream Management System)
DBMS
• Persistent relations
• One-time queries
• Random access
• “Unbounded” disk store
• Only current state matters
• No real-time services
• Relatively low update rate
• Data at any granularity
• Assume precise data
• Access plan determined by query processor, physical DB design
DSMS
• Transient streams
• Continuous queries
• Sequential access
• Bounded main memory
• Historical data is important
• Real-time requirements
• Possibly multi-GB arrival rate
• Data at fine granularity
• Data stale/imprecise
• Unpredictable/variable data arrival and characteristics
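The difference between a one-time query and a continuous query can be sketched in a few lines. The function names and the threshold query are hypothetical and chosen only for illustration; a real DSMS would express this in its own query language.

```python
from typing import Iterable, Iterator, Sequence


def one_time_count(table: Sequence[int], threshold: int) -> int:
    """DBMS style: the data is stored, and the query runs once over it."""
    return sum(1 for value in table if value > threshold)


def continuous_count(stream: Iterable[int], threshold: int) -> Iterator[int]:
    """DSMS style: the query stays registered and emits an updated answer
    after every arrival."""
    count = 0
    for value in stream:  # sequential access, one item at a time
        if value > threshold:
            count += 1
        yield count  # the answer is refreshed as data arrives


readings = [3, 8, 5, 9]
answers = list(continuous_count(iter(readings), threshold=4))
print(one_time_count(readings, threshold=4))  # 3
print(answers)                                # [0, 1, 2, 3]
```

The continuous version never needs the whole input at once; it keeps a single counter and produces a sequence of answers rather than one result.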
Existing DSMS
Challenges of Stream Data Processing
• Multiple, continuous, rapid, time-varying, ordered streams
• Main memory computations
• Queries are often continuous
• Evaluated continuously as stream data arrives
• Answer updated over time
• Queries are often complex
• Beyond element-at-a-time processing
• Beyond stream-at-a-time processing
• Beyond relational queries (scientific, data mining, OLAP)
• Multi-level/multi-dimensional processing and data mining
• Most stream data are low-level or multi-dimensional in nature
How to deal with Big Data Streams?
Approximate answers to queries
▪ When?
• Queries needing unbounded memory
• Too many queries, streams that are too rapid, or response-time
requirements that are too strict
• CPU limit
• Memory limit
• Solution: approximate answers to queries
• Sliding windows
• Sampling and load shedding
• Definition of synopsis
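As one illustration of sampling under a memory limit, the sketch below uses reservoir sampling, which keeps a uniform random sample of fixed size k from a stream of unknown length. The slide does not name a specific sampling method, so the choice of this algorithm and the function name `reservoir_sample` are assumptions.

```python
import random
from typing import Iterable, List, Optional


def reservoir_sample(stream: Iterable[int], k: int,
                     seed: Optional[int] = None) -> List[int]:
    """Return up to k items chosen uniformly at random in a single pass."""
    rng = random.Random(seed)
    reservoir: List[int] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir first
        else:
            j = rng.randint(0, i)   # uniform over 0..i, both ends inclusive
            if j < k:
                reservoir[j] = item  # keep the new item with probability k/(i+1)
    return reservoir


sample = reservoir_sample(range(10_000), k=5, seed=42)
print(sample)
```

Memory use is bounded by k no matter how long the stream runs, and queries can then be answered approximately from the sample.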
Streaming Computing Approaches
• Two approaches for handling such streams
  • Use a time window, and query the window as a static table
  • When you can’t store the collected data, or need to keep track of historical data:
    • Sampling
    • Filtering
    • Counting
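The first approach, a time window queried as if it were a static table, might look like the sketch below. The class name `TimeWindow`, the (timestamp, value) item layout, and the assumption that timestamps arrive in non-decreasing order are all hypothetical.

```python
from collections import deque
from typing import Deque, Tuple


class TimeWindow:
    """Keeps only the items whose timestamps fall in the last `width` seconds."""

    def __init__(self, width: float) -> None:
        self.width = width
        self.items: Deque[Tuple[float, float]] = deque()  # (timestamp, value)

    def add(self, timestamp: float, value: float) -> None:
        # Assumes timestamps arrive in non-decreasing order.
        self.items.append((timestamp, value))
        # Expire items that have fallen out of the window.
        while self.items and self.items[0][0] <= timestamp - self.width:
            self.items.popleft()

    def average(self) -> float:
        # The window is small enough to be scanned like a static table.
        return sum(v for _, v in self.items) / len(self.items)


window = TimeWindow(width=10.0)
for ts, value in [(0.0, 1.0), (4.0, 3.0), (9.0, 5.0), (12.0, 7.0)]:
    window.add(ts, value)
print(len(window.items), window.average())  # 3 5.0
```

Any query over `window.items` sees a finite snapshot, so ordinary aggregate logic applies; the cost is that anything older than the window is no longer visible.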
