WSO2Con Asia 2014 - Simultaneous Analysis of Massive Data Streams in real-time and Batch

  1. Simultaneous Analysis of Massive Data Streams in Real-Time and Batch. Anjana Fernando, Technical Lead, WSO2
  2. Agenda • How massive data streams are created • How to receive them • How to store them • How to analyze them, batch vs real-time • WSO2 Big Data solution • Demo
  3. Massive Data Streams -> Data Streams with Big Data
  4. What is Big Data? ❏ The 3 Vs ❏ Velocity ❏ Volume ❏ Variety (Image Source: http://akrayasolutions.com/big-data/)
  5. Where does it originate from? • Machine logs • Social media • Archives • Traffic information • Weather data • Sensor data (IoT)
  6. What do I do with it? Create intelligence.. • Should I take an umbrella to work today? • What is the best route to go back home? • What are the current market trends? • Are my servers running healthily?
  7. Protocols used to publish data.. • HTTP • MQTT • Zigbee • Thrift • Avro • ProtoBuf
  8. How to store the data? • Relational databases • Block data stores -> HDFS • Column oriented -> HBase -> Cassandra • Document based -> MongoDB -> CouchDB • In-Memory -> VoltDB [Figure labels: A, C, P]
  9. How to analyse data? • Two options: -> Batch processing: Schedule data processing jobs and receive the processed data later -> Real-time processing: The queries are executed and the results are retrieved instantly
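The difference between the two options can be sketched in a few lines of Python. This is only an illustration under assumed names: `batch_total`, `RunningTotal` and the sample events are hypothetical and not part of any WSO2 API. The batch function scans everything that was stored, while the real-time object updates its answer as each event arrives.

```python
from typing import Dict, Iterable, List

# Hypothetical sample events; field names follow the phone sales example.
EVENTS: List[Dict] = [
    {"brand": "A", "quantity": 2},
    {"brand": "B", "quantity": 1},
    {"brand": "A", "quantity": 3},
]


def batch_total(stored_events: Iterable[Dict]) -> Dict[str, int]:
    """Batch style: run later, over all events that were stored."""
    totals: Dict[str, int] = {}
    for event in stored_events:
        totals[event["brand"]] = totals.get(event["brand"], 0) + event["quantity"]
    return totals


class RunningTotal:
    """Real-time style: update the result as each event arrives."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = {}

    def on_event(self, event: Dict) -> Dict[str, int]:
        brand = event["brand"]
        self.totals[brand] = self.totals.get(brand, 0) + event["quantity"]
        return dict(self.totals)  # a snapshot is available immediately
```

Both paths end with the same totals; what differs is when the answer becomes available.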
  10. Analysing data.. • Batch processing -> Apache Hadoop: Map/Reduce processing system and a distributed file system
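The Map/Reduce model itself can be shown in plain Python. The sketch below is not Hadoop code: the function names and the record layout are assumptions made for illustration, and everything runs in a single process, whereas Hadoop distributes the same three phases across a cluster and reads its input from the distributed file system.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: Dict) -> Iterator[Tuple[str, int]]:
    """Map: emit one (key, value) pair per input record."""
    yield record["brand"], record["quantity"]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values that share a key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def run_job(records: Iterable[Dict]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```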
  11. Analysing data.. • Batch processing - Data Warehouse -> Apache Hive - Hadoop based framework for working on large scale data stores with SQL-like queries:
      INSERT OVERWRITE TABLE UserTable
      SELECT userName, COUNT(DISTINCT orderID), SUM(quantity)
      FROM PhoneSalesTable
      WHERE version = "1.0.0"
      GROUP BY userName;
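To see what the query on this slide computes, the same aggregation can be tried locally with Python's built-in `sqlite3` module. This is a stand-in for Hive, not Hive itself. The table schemas, the `orders` and `totalQuantity` column names and the `summarise_sales` helper are assumptions, `INSERT OVERWRITE` is imitated with a `DELETE` followed by an `INSERT`, and the version literal uses single quotes as standard SQL expects.

```python
import sqlite3
from typing import List, Tuple


def summarise_sales(rows: List[Tuple[str, str, int, str]]) -> List[Tuple[str, int, int]]:
    """Load sample sales rows and rebuild UserTable from them."""
    conn = sqlite3.connect(":memory:")
    # Assumed schemas; the slide does not show the table definitions.
    conn.execute(
        "CREATE TABLE PhoneSalesTable "
        "(userName TEXT, orderID TEXT, quantity INTEGER, version TEXT)"
    )
    conn.execute(
        "CREATE TABLE UserTable (userName TEXT, orders INTEGER, totalQuantity INTEGER)"
    )
    conn.executemany("INSERT INTO PhoneSalesTable VALUES (?, ?, ?, ?)", rows)

    # Imitate INSERT OVERWRITE: clear the target table, then fill it again.
    conn.execute("DELETE FROM UserTable")
    conn.execute(
        """
        INSERT INTO UserTable
        SELECT userName, COUNT(DISTINCT orderID), SUM(quantity)
        FROM PhoneSalesTable
        WHERE version = '1.0.0'
        GROUP BY userName
        """
    )
    result = conn.execute("SELECT * FROM UserTable ORDER BY userName").fetchall()
    conn.close()
    return result
```

Each output row holds a user name, the number of distinct orders and the total quantity for stream version 1.0.0.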
  12. Analysing data.. • Batch processing - In-Memory Computing -> Apache Spark - Functional programming model, in-memory computing, claimed to be 10x - 100x faster than Hadoop
  13. Analysing data.. • Real-time processing - Stream Processing -> Apache Storm - Distributed and fault-tolerant; topologies are built from Spouts (stream sources) and Bolts (processing steps)
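The spout and bolt roles can be imitated with Python generators. This is a single-process sketch, not the Storm API: `sales_spout`, `filter_bolt` and `count_bolt` are hypothetical names, and none of Storm's distribution or fault tolerance is modelled.

```python
from typing import Dict, Iterable, Iterator


def sales_spout(events: Iterable[Dict]) -> Iterator[Dict]:
    """Spout: the source that emits events into the topology."""
    for event in events:
        yield event


def filter_bolt(stream: Iterable[Dict], min_quantity: int) -> Iterator[Dict]:
    """Bolt: pass on only the events that meet a condition."""
    for event in stream:
        if event["quantity"] >= min_quantity:
            yield event


def count_bolt(stream: Iterable[Dict]) -> Iterator[Dict[str, int]]:
    """Bolt: keep a running count per brand and emit it after every event."""
    counts: Dict[str, int] = {}
    for event in stream:
        counts[event["brand"]] = counts.get(event["brand"], 0) + 1
        yield dict(counts)
```

Chaining the generators, for example `count_bolt(filter_bolt(sales_spout(events), 2))`, plays the part of wiring a topology: each event flows through every stage as soon as it is emitted.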
  14. Analysing data.. • Real-time processing - Complex Event Processing -> WSO2 Siddhi:
  15. Big Data Architecture with WSO2.. • Data Streams
      {
        'name':'phone.retail.shop',
        'version':'1.0.0',
        'nickName': 'Phone_Retail_Shop',
        'description': 'Phone Sales',
        'metaData':[
          {'name':'clientType','type':'STRING'}
        ],
        'payloadData':[
          {'name':'brand','type':'STRING'},
          {'name':'quantity','type':'INT'},
          {'name':'total','type':'INT'},
          {'name':'user','type':'STRING'}
        ]
      }
      The common stream format used in both CEP and BAM. The stream definition contains the stream name, version and other attributes that make up the stream.
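One way to read the stream definition is as a schema that every event must satisfy. The sketch below copies the definition from the slide into a Python dict and checks an event against it. The `validate_event` helper and the mapping of `STRING` and `INT` to Python types are assumptions for illustration, not how BAM or CEP validate events.

```python
from typing import Any, Dict, List

# The stream definition from the slide, written as a Python dict.
STREAM_DEFINITION: Dict[str, Any] = {
    "name": "phone.retail.shop",
    "version": "1.0.0",
    "nickName": "Phone_Retail_Shop",
    "description": "Phone Sales",
    "metaData": [{"name": "clientType", "type": "STRING"}],
    "payloadData": [
        {"name": "brand", "type": "STRING"},
        {"name": "quantity", "type": "INT"},
        {"name": "total", "type": "INT"},
        {"name": "user", "type": "STRING"},
    ],
}

# Assumed mapping from the declared type names to Python types.
TYPE_MAP = {"STRING": str, "INT": int}


def validate_section(attributes: List[Dict[str, str]], values: Dict[str, Any]) -> List[str]:
    """Check that each declared attribute is present with the declared type."""
    errors: List[str] = []
    for attr in attributes:
        name, expected = attr["name"], TYPE_MAP[attr["type"]]
        if name not in values:
            errors.append(f"missing attribute: {name}")
        elif type(values[name]) is not expected:
            errors.append(f"{name}: expected {attr['type']}")
    return errors


def validate_event(definition: Dict[str, Any], meta: Dict[str, Any],
                   payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the event fits the stream."""
    return (validate_section(definition["metaData"], meta)
            + validate_section(definition["payloadData"], payload))
```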
  16. Big Data Architecture with WSO2.. • WSO2 BAM -> Data Receiver - High performance binary format data publishing with Apache Thrift, shared with WSO2 CEP -> Data Storage - Cassandra for highly scalable data store -> Data Analyzer - Hive based batch processing
  17. Big Data Architecture with WSO2.. • WSO2 BAM.. -> Activity Monitoring: Implemented using a custom indexing mechanism to instantly search for events of a specific activity in the system
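The slide does not describe the indexing mechanism, so the following is only a guess at the general idea: keep a lookup from an activity identifier to its events, so a search does not have to scan every stored event. The `ActivityIndex` class and the `activity_id` field are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


class ActivityIndex:
    """Toy index from an activity identifier to the events that carry it."""

    def __init__(self) -> None:
        self._by_activity: Dict[str, List[Dict[str, Any]]] = defaultdict(list)

    def add(self, event: Dict[str, Any]) -> None:
        """Index the event under its (assumed) 'activity_id' field."""
        self._by_activity[event["activity_id"]].append(event)

    def search(self, activity_id: str) -> List[Dict[str, Any]]:
        """Return all events of one activity without scanning the others."""
        return list(self._by_activity.get(activity_id, []))
```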
  18. Big Data Architecture with WSO2.. • WSO2 BAM.. -> Incremental Data Processing - Customized Hive to support incremental data processing:
      @Incremental(name="salesAnalysis", tables="PhoneSalesTable")
      SELECT brandname, COUNT(DISTINCT orderid), SUM(quantity)
      FROM phonesalestable
      WHERE version = "1.0.0"
      GROUP BY brandname;
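The idea behind incremental processing can be sketched as follows: remember how far the previous run got, and let each new run read only the rows added since then. This is an illustration of the concept under assumed names (`IncrementalSalesAnalysis`, a list standing in for the table), not a description of how the customized Hive implements `@Incremental`.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class IncrementalSalesAnalysis:
    """Keeps aggregates between runs so each run only reads new rows."""

    def __init__(self) -> None:
        self.cursor = 0  # number of rows already processed
        self.order_ids: Dict[str, Set[str]] = defaultdict(set)
        self.quantities: Dict[str, int] = defaultdict(int)

    def run(self, table: List[Dict]) -> int:
        """Process rows added since the last run; return how many were read."""
        new_rows = table[self.cursor:]
        for row in new_rows:
            if row["version"] != "1.0.0":
                continue
            self.order_ids[row["brandname"]].add(row["orderid"])
            self.quantities[row["brandname"]] += row["quantity"]
        self.cursor = len(table)
        return len(new_rows)

    def result(self) -> Dict[str, Tuple[int, int]]:
        """Per brand: (distinct order count, total quantity)."""
        return {brand: (len(self.order_ids[brand]), self.quantities[brand])
                for brand in sorted(self.quantities)}
```

A second call to `run` on an unchanged table reads no rows, which is the saving the incremental mode is after.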
  19. Big Data Architecture with WSO2.. • WSO2 CEP -> Same data receiver as BAM: this is the point where the same event is sent to both servers, to BAM for batch processing and to CEP for real-time processing of the same data streams -> Real-time in-memory processing, based on the WSO2 Siddhi engine, with data adapters for receiving and sending events in different data formats and over different transports, e.g. XML, JSON, Text, HTTP, JMS, SMTP
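The fan-out described on this slide, one receiver delivering each event to both a batch path and a real-time path, can be sketched like this. The `EventReceiver` class, the list used as a batch store and the large-order rule are all assumptions made for illustration; they are not the BAM or CEP APIs.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]


class EventReceiver:
    """Delivers every published event to all registered handlers."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[Event], None]] = []

    def subscribe(self, handler: Callable[[Event], None]) -> None:
        self._handlers.append(handler)

    def publish(self, event: Event) -> None:
        for handler in self._handlers:
            handler(event)


# Batch path: just store the event so a scheduled job can analyse it later.
batch_store: List[Event] = []

# Real-time path: react immediately when a (hypothetical) rule matches.
alerts: List[str] = []


def large_order_rule(event: Event) -> None:
    if int(event["quantity"]) >= 10:
        alerts.append(f"large order for brand {event['brand']}")


receiver = EventReceiver()
receiver.subscribe(batch_store.append)
receiver.subscribe(large_order_rule)
```

Publishing an event once, for example `receiver.publish({"brand": "A", "quantity": 12})`, stores it for later batch analysis and evaluates the real-time rule in the same step.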
  20. Demo
  21. Questions?
  22. Thank you!
