
Comparison of Ad Hoc Analysis Engines for Streaming Data


Presentation slides for BigData-JAWS Study Meeting #6 "Kinesis Festival": https://jawsug-bigdata.connpass.com/event/52590/



  1. @laclefyoshi / ysaeki@r.recruit.co.jp
  3. 
     • 2011/04
     • 2015/09
     • Druid (KDP, 2015)
     • RDB NoSQL (2016; HBase)
     • ESP8266 Wi-Fi IoT (KDP, 2016)
     • (WebDB Forum 2014)
     • Spark Streaming (Spark Meetup December; 2015)
     • Kafka, AWS Kinesis (Apache Kafka Meetup Japan #1; 2016)
     • (FutureOfData; 2016)
     • Queryable State for Kafka Streams (Apache Kafka Meetup Japan #2; 2016)
     • Apache Spark (Geek Night #11; 2016)
  6. http://www.datascientist.or.jp/
  8. SQL
  9. http://www.datascientist.or.jp/news/2015-11-20.html
  10. Apache Spark SQL https://databricks.com/blog/2016/09/27/spark-survey-2016-released.html
  11. 
      • SQL
      • SELECT, GROUP BY, JOIN
      • AWS Kinesis, Apache Kafka
  12. Kinesis Analytics, PipelineDB, MemSQL, VoltDB
  13. Kinesis Analytics: "the easiest way to process streaming data in real time with standard SQL"
  14. PipelineDB: "relational database that runs SQL queries continuously on streams, incrementally storing results in tables"
  15. MemSQL: "a high performance data warehouse designed for the cloud and on-premises"
  16. VoltDB: "modern applications processing millions of data points in milliseconds with 100% accuracy"
  17. OSS ☓ ☓
  18. NewSQL 1 CockroachDB
  19. JSON Web ☓ ☓ ☓ ☓ ☓ ☓
  20. ☓ ☓ (※) VARCHAR(N) JSON
  21. JSON

      PipelineDB (PostgreSQL JSON operators):

          SELECT info#>'{features, 0, geometry, coordinates}' AS coord FROM geo_view;
          SELECT info->'features'->0->'geometry'->'coordinates' AS coord FROM geo_view;

      MemSQL:

          SELECT JSON_EXTRACT_JSON(info::features, 0, 'geometry', 'coordinates') FROM geo;
          SELECT JSON_EXTRACT_JSON(info, 'features', 0, 'geometry', 'coordinates') FROM geo;

      VoltDB:

          SELECT FIELD(info, 'features[0].geometry.coordinates') FROM geo;
          SELECT FIELD(ARRAY_ELEMENT(FIELD(info, 'features'), 0), 'geometry.coordinates') FROM geo;
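
All of the JSON queries on slide 21 walk the same path: key `features`, array index `0`, key `geometry`, key `coordinates`. The following Python sketch shows what that path selects from a parsed document. The sample document and the `extract` helper are hypothetical and only illustrate the lookup; they do not reproduce any engine's implementation or its handling of missing fields.

```python
import json

# Hypothetical GeoJSON-like record; only the key names come from the slide's queries.
doc = json.loads("""
{"features": [{"geometry": {"type": "Point", "coordinates": [139.76, 35.68]}}]}
""")


def extract(value, *path):
    """Walk a parsed JSON value by object keys (str) and array indexes (int).

    Returns None as soon as a step cannot be resolved (an assumption made
    for this sketch, not a statement about any engine's behavior).
    """
    for step in path:
        if isinstance(step, int):
            if not isinstance(value, list) or not (0 <= step < len(value)):
                return None
            value = value[step]
        else:
            if not isinstance(value, dict) or step not in value:
                return None
            value = value[step]
    return value


coord = extract(doc, "features", 0, "geometry", "coordinates")
print(coord)
```

The variadic path mirrors the argument style of `JSON_EXTRACT_JSON(info, 'features', 0, 'geometry', 'coordinates')`, while the dotted string used with `FIELD` encodes the same steps in a single argument.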
  22. Kinesis Analytics
  23. Kinesis Analytics: Kinesis Stream
  24. Kinesis Analytics: Bitcoin (ASK, BID, LAST, Volume) https://api.bitcoinaverage.com/ticker/global/all
  25. Kinesis Analytics: JSON
  26. Kinesis Analytics: timestamp 1 1000 • 1 50KB • JSON Array
  27. Kinesis Analytics: Web UI
  28. Kinesis Analytics: SQL

          CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
              min_ask  INTEGER,
              max_bid  INTEGER,
              avg_last INTEGER
          );

          CREATE OR REPLACE PUMP "TEST_STREAM_PUMP" AS
              INSERT INTO "DESTINATION_SQL_STREAM"
              SELECT STREAM
                  MIN("ask")  AS min_ask,
                  MAX("bid")  AS max_bid,
                  AVG("last") AS avg_last
              FROM "SOURCE_SQL_STREAM_001"
              GROUP BY
                  PARTITION_KEY,
                  FLOOR(("SOURCE_SQL_STREAM_001".ROWTIME - TIMESTAMP '1970-01-01 00:00:00') SECOND / 120 TO SECOND);

      Key statements: CREATE STREAM, CREATE PUMP
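
The `GROUP BY` in the slide 28 query buckets rows by partition key and by the number of whole 120-second intervals since the Unix epoch, which is a tumbling window. The Python sketch below applies the same bucketing to an in-memory list. The record layout and the sample values are assumptions made for illustration, and the average is kept as a float here, whereas the slide's destination stream declares `avg_last` as `INTEGER`.

```python
from collections import defaultdict

WINDOW_SECONDS = 120  # same window length as the slide's "/ 120"


def tumbling_aggregate(records, window_seconds=WINDOW_SECONDS):
    """Aggregate (epoch_seconds, partition_key, ask, bid, last) tuples.

    Rows are grouped by (partition_key, window_id), where window_id is the
    number of whole windows elapsed since the epoch, so windows never overlap.
    """
    buckets = defaultdict(list)
    for ts, key, ask, bid, last in records:
        window_id = int(ts // window_seconds)
        buckets[(key, window_id)].append((ask, bid, last))

    result = {}
    for group, rows in buckets.items():
        asks, bids, lasts = zip(*rows)
        result[group] = {
            "min_ask": min(asks),
            "max_bid": max(bids),
            "avg_last": sum(lasts) / len(lasts),
        }
    return result


# Hypothetical ticker rows: two fall in window 0, one in window 1.
sample = [
    (0, "USD", 1010, 1000, 1005),
    (60, "USD", 1008, 1002, 1007),
    (120, "USD", 1020, 1015, 1018),
]
summary = tumbling_aggregate(sample)
for group in sorted(summary):
    print(group, summary[group])
```

Because each row maps to exactly one `window_id`, every row contributes to exactly one output group, which is the property that distinguishes a tumbling window from a sliding one.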
  29. Kinesis Analytics: Web UI
  30. PipelineDB
  31. PipelineDB: Kinesis

          CREATE STREAM bitcoins (info JSON);

          SELECT pipeline_kinesis.add_endpoint('input_stream', 'ap-northeast-1', '/path_to_credential_file');
          SELECT pipeline_kinesis.consume_begin('input_stream', 'kinesis-stream-name', 'bitcoins', format := 'json');

          CREATE CONTINUOUS VIEW bitcoins_view AS SELECT info FROM bitcoins;

          SELECT * FROM bitcoins_view LIMIT 10;

      Key statements: CREATE STREAM / SELECT pipeline_*.consume_begin, CREATE CONTINUOUS VIEW
  32. PipelineDB: Kafka

          CREATE STREAM bitcoins (info JSON);

          SELECT pipeline_kafka.add_broker('172.17.0.3:9092');
          SELECT pipeline_kafka.consume_begin('test-bitcoin-j', 'bitcoins', format := 'json');

          CREATE CONTINUOUS VIEW bitcoins_view AS SELECT info FROM bitcoins;

          SELECT * FROM bitcoins_view LIMIT 10;

      Key statements: CREATE STREAM / SELECT pipeline_*.consume_begin, CREATE CONTINUOUS VIEW
  33. MemSQL
  34. MemSQL: Kafka

          CREATE TABLE bitcoins (info JSON);

          CREATE PIPELINE `test_kafka_bitcoin` AS
              LOAD DATA KAFKA '172.17.0.3:9092/test-bitcoin-j'
              INTO TABLE `bitcoins`;

          TEST PIPELINE test_kafka_bitcoin LIMIT 1;
          START PIPELINE test_kafka_bitcoin;

          CREATE VIEW bitcoins_view AS SELECT info FROM bitcoins;

          SELECT * FROM bitcoins LIMIT 10;
          SELECT * FROM bitcoins_view LIMIT 10;

      Key statements: CREATE TABLE / CREATE PIPELINE, CREATE VIEW
  35. VoltDB
  36. VoltDB: Kinesis

          CREATE TABLE bitcoins (info VARCHAR(5000));
          CREATE VIEW bitcoins_view AS SELECT info FROM bitcoins;

          SELECT * FROM bitcoins LIMIT 10;
          SELECT * FROM bitcoins_view LIMIT 10;

          <deployment>
            <import>
              <configuration type="kinesis" format="csv" enabled="true">
                <property name="stream.name">kinesis-stream-name</property>
                <property name="region">ap-northeast-1</property>
                <property name="access.key">...</property>
                <property name="secret.key">...</property>
                <property name="procedure">bitcoins.insert</property>
              </configuration>
            </import>
          </deployment>

      Key statements: CREATE TABLE / CREATE VIEW
  37. VoltDB: Kafka

          CREATE TABLE bitcoins (info VARCHAR(5000));
          CREATE VIEW bitcoins_view AS SELECT info FROM bitcoins;

          SELECT * FROM bitcoins LIMIT 10;
          SELECT * FROM bitcoins_view LIMIT 10;

          <deployment>
            <import>
              <configuration type="kafka" format="csv" enabled="true">
                <property name="topics">test-bitcoin-j</property>
                <property name="brokers">172.17.0.3:9092</property>
                <property name="procedure">bitcoins.insert</property>
              </configuration>
            </import>
          </deployment>

      Key statements: CREATE TABLE / CREATE VIEW
  38. SQL JOIN
  39. x JOIN ☓ ☓ ☓ (※) CREATE VIEW
  40. x JOIN ☓ ☓ ☓ ☓ ☓ (※) CREATE VIEW
  41. JOIN
      • CREATE REFERENCE TABLE
      • Nested JOIN: ... FROM A LEFT JOIN (B LEFT JOIN C ON B.col = C.col) ON A.col = B.col ...
  42. SQL WINDOW
  43. Tumbling Window, Sliding Window
  44. SQL Tumbling Window

          SELECT AVG(last) AS avg_last
          FROM bitcoins
          GROUP BY FLOOR(EXTRACT(MINUTE FROM row_timestamp) / 2);

      Key clause: GROUP BY
  45. SQL Sliding Window

      Kinesis Analytics (WINDOW clause on a stream):

          SELECT STREAM count(*) OVER lastHour
          FROM APP_STREAM
          WINDOW lastHour AS (PARTITION BY ... RANGE INTERVAL '1' HOUR PRECEDING);

      PipelineDB (sliding-window continuous view):

          CREATE VIEW app_stream_view WITH (sw = '1 hour', step_factor = 50) AS
              SELECT count(*) FROM app_stream;

      Window function over a table:

          SELECT count(*) OVER (PARTITION BY ... ORDER BY ...)
          FROM app_stream_table;

      timediff()
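
Unlike the tumbling window, a sliding window is re-evaluated for every row: the count on slide 45 covers the current row plus everything in the preceding hour. A minimal Python sketch of that idea follows. It assumes events arrive sorted by epoch seconds and treats the lower boundary as inclusive; both are assumptions of this sketch, and the engines on the slide may differ on boundary handling.

```python
from collections import deque


def sliding_counts(timestamps, window_seconds=3600):
    """Return, for each event, how many events fall in [t - window, t].

    Assumes `timestamps` is sorted ascending (epoch seconds).
    """
    window = deque()
    counts = []
    for t in timestamps:
        window.append(t)
        # Evict events that are older than the window's lower bound.
        while window[0] < t - window_seconds:
            window.popleft()
        counts.append(len(window))
    return counts


# Hypothetical event times in seconds.
events = [0, 1800, 3600, 3601, 7300]
print(sliding_counts(events))
```

Each event can appear in many consecutive windows, so the same row contributes to several output values. PipelineDB's `step_factor` option on slide 45 relates to how finely such a window advances.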
  46. 
      • SQL
      • SELECT, GROUP BY, JOIN
      • AWS Kinesis, Apache Kafka
  47. Kinesis Analytics
      • Kinesis Stream, KPL
      • Aggregated Record
      • Web UI
      • API, aws
      • AddApplicationReferenceDataSource
      • S3 / UpdateApplication
      • S3 1GB

  49. Appendix: ... http://jp.techcrunch.com/2016/09/23/20160922apple-acquires-another-machine-learning-company-tuplejump/
