Retrospection / prospection
and schema
TAGOMORI Satoshi (@tagomoris)
LINE Corp.
2014/01/31 (Fri) at University of Tsukuba
Intensive lecture material, University of Tsukuba
2014/01/31


  1. Retrospection / prospection and schema
     TAGOMORI Satoshi (@tagomoris), LINE Corp.
     2014/01/31 (Fri) at University of Tsukuba
     the 1st half
  2. TAGOMORI Satoshi (@tagomoris)
     LINE Corp., Development Support Team
  3. (no text on this slide)
  4. (no text on this slide)
  5. Logs
     Service metrics (Users, PageViews, ...)
     UX/UI metrics (Access path, Taps/views, ...)
     Monitoring metrics (Traffic Gbps, TBytes/day, ...)
     System monitoring (Error rates, Response time, ...)
  6. Software for Logging
     Collection: Fluentd, Scribed, Flume, LogStash, ...
     Storage: RDBMS, Hadoop HDFS, NoSQLs, Elasticsearch, ...
     Processing: SQL, Hadoop MapReduce (Hive), Presto, Impala, ...
     Stream processing: Storm, Kafka, Norikra, ...
     Visualization: Kibana, Tableau, Fnordmetric, GrowthForecast, Focuslight, ...
     Appliance: DWH + BI tools
     Services: Google BigQuery, Treasure Data, ...
  7. How to inspect logs
     Retrospection (reactive search):
       store data first, then search it
     Prospection (proactive search):
       define what should be processed first, then store the data
  8. What logs are inspected
     Schema-full data:
       strict schema: predefined fields with types (anything else is rejected)
       schema on read: try to read the known fields (anything else is ignored)
     Schema-less data:
       any fields (or ignore), any types (implicit/explicit conversion)
       fits services still in development (that is, all internet services!)
  9. How/what
     Retrospect, schema-full: RDBMS, Hive, BigQuery, Cassandra, HBase, ...
     Retrospect, schema-less: MongoDB, Hive (SerDe), TD, plain text files, ...
     Prospect, schema-full: Esper, many stream CEP engines, ...
     Prospect, schema-less: Norikra, ...
  10. Data size: schema & index
      Logs: size is always important (xTB - xPB)
      Schema:
        size optimization
        access optimization on memory/disk
      Index:
        access optimization on memory/disk
        more memory/disk required
        hard to distribute
  11. Query response improvements of retrospection
      Schema-full + indexed (RDBMS):
        query plan optimization
      Schema on read:
        I/O and task size optimization & scale out
      Schema-less + indexed (Mongo):
        mmap-ed index & data (!)
  12. Query response improvements of prospection
      Time window + incremental calculation
      Stream processing engines
  13. Stream processing and data size
      No disks: fewer points of failure
      Less memory, only enough for:
        processing and I/O buffers
        aggregation results
      Easy to distribute:
        stream duplication
        stream splitting by aggregation key
  14. Stream processing and schema
      Stream processing: query -> data
      Prospective schema by queries:
        queries know the required fields and their types
        unused fields can be ignored
        implicit type conversion is available
      Schema-less data + schema-full queries
  15. My goal:
      Schema-less data stream + schema-full queries
      It's Norikra!
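
The idea in slides 12 to 15 (a query that declares the fields and types it needs, applied to a schema-less stream, with results computed incrementally per time window) can be sketched in plain Python. This is only an illustration under stated assumptions: it is not Norikra's API or query language, and every name in it (`QUERY`, `project`, `run`, `flush`, the `time`, `path`, and `bytes` fields) is hypothetical. Events are assumed to arrive ordered by a numeric `time` field in seconds.

```python
from collections import defaultdict

# Hypothetical query: the query, not the data, declares the schema.
QUERY = {
    "fields": {"path": str, "bytes": int},  # required fields and their types
    "group_by": "path",
    "window_sec": 60,                        # tumbling window length
}


def project(event, fields):
    """Keep only the fields the query needs, converted to the declared types.

    Returns None when a required field is missing or cannot be converted,
    so the event is ignored by this query.
    """
    row = {}
    for name, typ in fields.items():
        if name not in event:
            return None
        try:
            row[name] = typ(event[name])  # implicit conversion, e.g. "512" -> 512
        except (TypeError, ValueError):
            return None
    return row


def flush(window, state):
    """Turn the accumulators of one window into result tuples."""
    return [(window, key, count, total) for key, (count, total) in sorted(state.items())]


def run(events, query):
    """Return (window_start, key, count, bytes_sum) tuples per tumbling window.

    Only per-key accumulators are held in memory; raw events are not stored.
    """
    size = query["window_sec"]
    key_field = query["group_by"]
    current = None
    state = defaultdict(lambda: [0, 0])  # key -> [count, bytes sum]
    results = []
    for event in events:
        row = project(event, query["fields"])
        if row is None or "time" not in event:
            continue
        window = int(event["time"]) // size * size
        if current is not None and window != current:
            results.extend(flush(current, state))  # window closed: emit and reset
            state.clear()
        current = window
        acc = state[row[key_field]]
        acc[0] += 1             # incremental count
        acc[1] += row["bytes"]  # incremental sum
    if current is not None:
        results.extend(flush(current, state))
    return results


# Schema-less input: fields and types vary from event to event.
EVENTS = [
    {"time": 0, "path": "/", "status": "200", "bytes": "512", "ua": "x"},
    {"time": 10, "path": "/", "bytes": 100},
    {"time": 20, "path": "/login", "bytes": 0, "user": {"id": 1}},
    {"time": 30, "message": "no path or bytes"},  # ignored by this query
    {"time": 65, "path": "/", "bytes": "n/a"},    # not convertible, ignored
    {"time": 70, "path": "/", "bytes": 40},
]

for result in run(EVENTS, QUERY):
    print(result)
```

The design choice worth noting is that nothing upstream validates or rejects events: a different query with different required fields could read the same stream and would simply see a different subset of it.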
