
Log everything!

Slides of a talk at the International PHP Conference 2012 on how we successfully mastered the challenge to log everything and transport the logged data into different sinks for different needs.


  1. Log everything!
     Dr. Stefan Schadwinkel and Mike Lohmann
  2. Who we are.
     - Dr. Stefan Schadwinkel: Analytics. Author (heise.de, Cereb. Cortex, EJN, J. Neurophysiol.)
     - Mike Lohmann: Architecture. Author (PHPMagazin, IX, heise.de)
  3. Agenda.
     - What we do. What we need to do. What we are doing.
     - Requirement: Log everything!
     - Infrastructure and technologies.
     - We want happy business users.
  4. ICANS GmbH
  5. Number facts of PokerStrategy.com
     - Education since 2005
     - 6,000,000 registered users
     - 19 languages
     - 7,600,000 requests/day
     - 2,800,000 PI/day
     - 700,000 posts/day
  6. Topics of this talk
     - How to use existing technologies and standards.
     - Out-of-the-box solution
     - Scalability and simplicity of the solution
     - Ready-to-use scripts
     - "Good enough" for now!
     - Showing the way from requirement to solution.
     - Open-source Sf2 bundles for logging.
     - Live demo.
  7. What we do.
     - We teach poker.
     - We create web applications.
     - We serve millions of users in different countries, respecting a multitude of market rules.
     - We make business decisions driven by complex data analytics.
  8. What we need to do.
     - We need to try out other teaching topics, fast.
     - We need to gather data from all of these "try-outs", accumulate it, and base business decisions on its analysis.
     - We need a bigger infrastructure to gather more data.
     - We need to hire more (good) people! :)
  9. What we are doing.
     - We build ECF (Education Community Framework).
     - We (can) log everything!
     - We (now) use Amazon S3 and Amazon EMR as a scalable storage and MapReduce solution.
     - We hire (good) people! :)
  10. Requirement: Log everything.
     - "Are you mad?!"
     - "Be more specific, please!"
     - "But what about the user's data?!"
  11. Logging Tools / Technologies
     - Producer: Symfony2 application, server and databases
     - Transport: now RabbitMQ with an Erlang consumer; was Flume
     - Storage: now S3 storage and Hadoop via Amazon EMR; was virtualized in-house Hadoop
     - Analytics: MapReduce, Hive, BI via QlikView
     (15.10.12)
  12. Logging Infrastructure
     [Diagram: Producer, Transport, Storage, Analytics. Components shown: databases, app servers 1-x, LB, reverse proxy, RabbitMQ, consumer, S3, Hadoop cluster, QlikView, Graylog, Zabbix]
  13. Producer
     [Diagram: /Home, Page Controller, PageHit event, PageHit event listener, Logger::log(), Monolog logger (processor, formatter, handler), LogMessage as JSON, local RabbitMQ, shovel]
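The producer slide lists a logger with a formatter that emits the log message as JSON and a handler that passes it on to a local queue. The talk's implementation is PHP on Monolog; the following is only a rough Python analog using the standard `logging` module, to make the formatter/handler split concrete. The field names, the `ListHandler` stand-in for a queue publisher, and the helper names are all hypothetical and do not come from the ICANS bundles.

```python
import json
import logging
from datetime import datetime, timezone


class JsonLogFormatter(logging.Formatter):
    """Serialize each record as one JSON object (field names are assumptions)."""

    def format(self, record):
        message = {
            "created": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "channel": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
            # Optional structured payload passed via `extra={"context": ...}`.
            "context": getattr(record, "context", {}),
        }
        return json.dumps(message, sort_keys=True)


class ListHandler(logging.Handler):
    """Stand-in for a handler that would publish to a local message queue."""

    def __init__(self):
        super().__init__()
        self.published = []

    def emit(self, record):
        # A real handler would publish the formatted string to the queue here.
        self.published.append(self.format(record))


def build_logger(name="example.channel"):
    """Wire formatter and handler together; returns (logger, handler)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = ListHandler()
    handler.setFormatter(JsonLogFormatter())
    logger.handlers = [handler]
    return logger, handler


def log_page_hit(logger, path, user_id):
    """What an event listener might do when a page-hit event fires."""
    logger.info("page_hit", extra={"context": {"path": path, "user_id": user_id}})
```

Calling `log_page_hit(logger, "/home", 42)` leaves one JSON string in `handler.published`; swapping `ListHandler` for a queue-publishing handler would not change the formatter.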
  14. Producer
     - LoggingComponent: Provides interfaces, filters and handlers
     - LoggingBundle: Glues it all together with Symfony2
     https://github.com/ICANS/IcansLoggingComponent
     https://github.com/ICANS/IcansLoggingBundle
  15. Transport – First Try
     - Hey, if we use Hadoop, why not use Flume?
       - Part of the ecosystem
       - Central config
       - Extensible via plugins
       - Flexible flow configuration
       - How? Flume Nodes → Flume Sinks
  16. Transport – First Try
     - But, .. wait!
       - Ecosystem? Just like Hadoop version numbers…
       - Admins say: Central config woes!
       - Issues: multi-master, logical vs. physical nodes, Java heap space, etc.
       - Will my plugin run with flume-ng?
       - Ever tried to keep your complex flow and switch reliability levels?
     Read: Our admins still hate me …
  17. Transport – Second Try
     - RabbitMQ vs. Flume Nodes
       - Each app server has its own local RabbitMQ
       - The local RabbitMQ shovels its data to a central RabbitMQ cluster
       - Similar to the Flume Node concept
       - Decentralized config: Producers and consumers simply connect
  18. Transport – Second Try
     - But, .. wait! We still need sinks.
       - Custom-crafted RabbitMQ consumers
       - We could write them in PHP, but ..
       - Erlang, teh awesome!
       - Battle-hardened OTP framework.
       - "Let it crash!" .. and recover.
       - Hot code change. If you want.
     Read: Runs forever.
  19. Storage – First Try
     - Use out-of-the-box Hadoop (Cloudera)
     - But:
       - Virtualized infrastructure
       - Unknown usage patterns
       - Must be cost-effective
       - Major Hadoop version upgrades
  20. Storage – Second Try
     - Use Amazon Web Services
     - Provides flexible virtualized infrastructure
     - Cost-effective storage: S3
     - Hadoop on demand: EMR
  21. Storage – Amazon S3
     - Erlang RabbitMQ consumer simply copies the incoming data to S3
       - Easy: replace the "hadoop" command with "s3cmd"
  22. Storage – Amazon S3
     - S3 bucket receives many small, compressed log file chunks
     - Amazon provides s3DistCp, which does distributed data copy:
       - Aggregate many small files into partitioned large chunks
       - Change compression
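The aggregation step on the slide above (many small chunks combined into larger, partitioned files) comes down to deciding which small objects belong together. The sketch below illustrates only that grouping decision in plain Python, assuming the grouping is driven by a regular expression over the object key. It is not how s3DistCp is implemented, and the key layout and pattern in the usage note are hypothetical.

```python
import re
from collections import defaultdict


def group_chunks(keys, pattern):
    """Group object keys by the first capture group of `pattern`.

    Keys that do not match the pattern are skipped. Each group would later
    be concatenated into one large output file.
    """
    regex = re.compile(pattern)
    groups = defaultdict(list)
    for key in keys:
        match = regex.search(key)
        if match:
            groups[match.group(1)].append(key)
    # Sort members so the concatenation order is deterministic.
    return {name: sorted(members) for name, members in groups.items()}
```

With hypothetical keys such as `incoming/web1/2012-10-01/chunk-0002.gz` and the pattern `incoming/[^/]+/(\d{4}-\d{2}-\d{2})/`, all chunks of one day end up in the same group regardless of which app server wrote them.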
  23. Analytics
     - We want happy business users.
     - We want to answer questions.
       - People want answers to questions they have. Now.
       - No, they couldn't tell you that question yesterday. If they had known, they would have already asked for the answer. Yesterday.
     - We also want data-driven applications.
       - Production system analysis.
       - Fraud prevention.
       - Recommendations.
       - Social metrics for our users.
  24. Analytics
     - Remember MapReduce.
       - Custom jobs.
         - Streaming: Use your favorite.
         - Java API: Cascading. Use your favorite: Java, Groovy, Clojure, Scala.
       - Data queries.
         - Hive: similar to SQL.
         - Pig: Data flow.
         - Cascalog: Datalog-like QL using Clojure and Cascading.
  25. Analytics
     - Cascalog is Clojure, Clojure is Lisp
       (?<- (stdout) [?person] (age ?person ?age) … (< ?age 30))
       - ?<- : query operator
       - (stdout) : Cascading output tap
       - [?person] : columns of the dataset generated by the query
       - (age ?person ?age) : "generator"
       - (< ?age 30) : "predicate"
     - Generators and predicates:
       - as many as you want
       - both can be any Clojure function
       - Clojure can call anything that is available within a JVM
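For readers who do not read Lisp, the visible parts of the Cascalog query on the slide above (a generator producing person/age pairs and a predicate keeping ages below 30) can be approximated in Python as below. This is only an analogy to show the generator/predicate split: it runs in memory rather than as a MapReduce job, it does not reproduce the part of the query elided with "…" on the slide, and the function name and sample data are hypothetical.

```python
def people_under(age_facts, limit):
    """Return the persons whose age is below `limit`.

    `age_facts` plays the role of the generator (person, age) pairs;
    the `age < limit` test plays the role of the predicate.
    """
    return [person for person, age in age_facts if age < limit]
```

For example, `people_under([("alice", 28), ("bob", 33)], 30)` keeps only `"alice"`.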
  26. Analytics
     - We use Cascalog to preprocess and organize the incoming flow of log messages:
  27. Analytics
     - Let's run the Cascalog processing on Amazon EMR:
       ./elastic-mapreduce --create --name "Log Message Compaction" \
         --bootstrap-action s3://[BUCKET]/mapreduce/configure-daemons \
         --num-instances $NUM \
         --slave-instance-type m1.large \
         --master-instance-type m1.large \
         --jar s3://[BUCKET]/mapreduce/compaction/icans-cascalog.jar \
         --step-action TERMINATE_JOB_FLOW \
         --step-name "Cascalog" \
         --main-class icans.cascalogjobs.processing.compaction \
         --args "s3://[BUCKET]/incoming/*/*/*/","s3://[BUCKET]/icanslog","s3://[BUCKET]/icanslog-error"
  28. Analytics
     - After the Cascalog query we have:
       s3://[BUCKET]/icanslog/[WEBSITE]/icans.content/year=2012/month=10/day=01/part-00000.lzo
     - The year=/month=/day= path segments: Hive partitioning!
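The `year=…/month=…/day=…` layout in the path above is what lets Hive treat each day as a partition. As a small illustration of that naming scheme, the sketch below builds and parses such a key prefix in Python. The helper names are hypothetical, and the prefix simply mirrors the path shown on the slide; it is not taken from the ICANS Cascalog job.

```python
from datetime import date


def partition_prefix(website, log_type, day):
    """Build a Hive-style partitioned key prefix for one day of logs."""
    return "icanslog/{}/{}/year={:04d}/month={:02d}/day={:02d}/".format(
        website, log_type, day.year, day.month, day.day
    )


def parse_partition(key):
    """Recover the date from the key=value segments of a partitioned key."""
    parts = dict(
        segment.split("=", 1) for segment in key.split("/") if "=" in segment
    )
    return date(int(parts["year"]), int(parts["month"]), int(parts["day"]))
```

Because the date is encoded in the path itself, a query restricted to one day only needs to read the objects under that day's prefix.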
  29. Analytics
     - Now we can access the log data within Hive:
  30. Analytics
     - Now we can run Hive queries on the [WEBSITE]_icanslog_content table!
     - But we also want to store the result to S3.
  31. Analytics
     - Now, get the stats:
  32. Analytics
     - We can now simply copy the data from S3 and import it into any local analytical tool, like:
       - Excel (It must really make business people happy…)
       - QlikView (Anyone can be happy with it…)
       - R (If I want an answer…)
  33. Thank you. Questions?
  34. Contacts.
     - Dr. Stefan Schadwinkel: stefan.schadwinkel@icans-gmbh.com, ICANS_StScha
     - Mike Lohmann: mike.lohmann@icans-gmbh.com, mikelohmann
  35. Tools/Technologies
  36. ICANS GmbH
     Valentinskamp 18, 20354 Hamburg, Germany
     Phone: +49 40 22 63 82 9-0
     Fax: +49 40 38 67 15 92
     Web: www.icans-gmbh.com
