
DataXDay - Real-Time Access log analysis


At BlaBlaCar we have built a streaming platform to get fast insights into the usage of our services. I will show you how BlaBlaCar built an automated access-log streaming analysis to improve security and gain fine-grained knowledge of platform usage.

Pierre Villard - BlaBlaCar

Published in: Technology

DataXDay - Real-Time Access log analysis

  1. 2018.05.17 Real-time Access-log Analysis @ BlaBlaCar @DataXDay
  2. Thomas Lamirault (@ThomasLamirault), Software Architect Data @ BlaBlaCar, Paris, since 2017
  3. (image-only slide)
  4. Over 60 million members and growing, with 1.5 million joining each month
  5. We are in 22 countries
  6. Today's agenda: use cases, global solution, component details, concrete example
  7. Use cases
  8. Use cases. Security: identify crawling of our website and attempted hacks. Product simplification: usage statistics for all our endpoints, simplifying migrations. API: help our partners use our API correctly.
  9. Global solution
  10. Global solution: NGINX, Flink, Kafka, Data Lake
  11. Components detail
  12. Components detail: NGINX, Hindsight (Lua), Flink, Kafka, Data Lake, Schema Registry, Kafka Connect
  13. Flink: standalone mode; applies a regex to free-format text; builds the schema dynamically and pushes it to the Schema Registry and Kafka; serializes into Avro format; containerized (rkt/Fleet, k8s migration ongoing)
  14. Flink

      ```java
      // Consumer
      FlinkKafkaConsumer010<String> consumer = new FlinkKafkaConsumer010<>(
          parameterTool.getRequired("topic") + topicVersion,
          new SimpleStringSchema(), consProps);

      DataStream<String> messageStream = env.addSource(consumer)
          .setParallelism(Integer.parseInt(
              customParamTool.getRequired("consumer_parallelism"), 10))
          .name("kafka_consumer");
      ```
  15. Flink

      ```java
      // FlatMap
      DataStream<byte[]> stream = messageStream
          .flatMap(new Regex2AvroFunction())
          .setParallelism(Integer.parseInt(
              customParamTool.getRequired("flatmap_parallelism"), 10))
          .name("avroserializer");

      // Producer
      FlinkKafkaProducer010<byte[]> producer = new FlinkKafkaProducer010<>(
          parameterTool.getRequired("bootstrap.servers"),
          parameterTool.getRequired("target_topic"),
          new SerializationSchema<byte[]>() { [...]
      ```
  16. Schema Registry
      → A Kafka message is represented by a key and a value
      → The Schema Registry holds two schemas per topic: one for the key, one for the value
      → A schema is represented as JSON
  17. Schema Registry - API
      GET http://schema-registry/subjects
      ["accesslogs_avro-value","accesslogs_avro-key"]
      GET http://schema-registry/subjects/accesslogs_avro-value/versions
      [1,2,3,4]
  18. Schema Registry - API
      GET http://schema-registry/subjects/accesslogs_avro-value/versions/4
      {"subject":"accesslogs_avro-value","version":4,"id":181,
       "schema":"{\"type\":\"record\",\"name\":\"accesslog\",\"fields\":[
         {\"name\":\"fieldname1\",\"type\":\"string\"},
         {\"name\":\"fieldname2\",\"type\":\"int\"}]}"}
  19. Concrete example
  20. Concrete example (screenshot)
  21. Concrete example (screenshot)
  22. Concrete example (screenshot)
  23. Concrete example: helps identify crawlers; fast identification of bugs or bad behavior in a new release; detects bad API usage; helps us integrate new functionality
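
The regex step on slide 13 (apply a regex to the free-format access-log text) can be sketched as below. This is a minimal illustration assuming NGINX's standard "combined" log format; the pattern, field names, and sample line are assumptions, not the internals of BlaBlaCar's actual Regex2AvroFunction.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of parsing one NGINX combined-format access-log line into fields,
// the raw material that slide 13's job would then serialize to Avro.
public class AccessLogParser {
    // Combined format: remote_addr - remote_user [time_local]
    // "request" status body_bytes_sent "referer" "user_agent"
    private static final Pattern COMBINED = Pattern.compile(
        "^(\\S+) \\S+ (\\S+) \\[([^\\]]+)\\] \"(\\S+) (\\S+) [^\"]*\" (\\d{3}) (\\d+|-)");

    // Returns {ip, timestamp, method, path, status, bytes}, or null on no match.
    public static String[] parse(String line) {
        Matcher m = COMBINED.matcher(line);
        if (!m.find()) return null;
        return new String[] { m.group(1), m.group(3), m.group(4),
                              m.group(5), m.group(6), m.group(7) };
    }

    public static void main(String[] args) {
        String line = "203.0.113.7 - - [17/May/2018:10:05:03 +0200] "
                    + "\"GET /api/trips?from=Paris HTTP/1.1\" 200 1534 "
                    + "\"-\" \"Mozilla/5.0\"";
        String[] f = parse(line);
        System.out.println(f[2] + " " + f[3] + " -> " + f[4]);
        // → GET /api/trips?from=Paris -> 200
    }
}
```

In a Flink job this logic would live inside the flatMap function shown on slide 15, emitting one record per matching log line and dropping (or side-outputting) lines that fail the regex.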
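
Slide 13's "build schema dynamically and push to the Schema Registry" step can be sketched as assembling an Avro record schema as JSON from the extracted field names. The `buildSchema` helper is hypothetical; the field names and subject come from slides 17–18, and the registration endpoint in the comment follows the Confluent Schema Registry convention, which is an assumption here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: build an Avro record-schema JSON from (field name -> avro type)
// pairs, e.g. the fields a regex extracted from the access log.
public class DynamicAvroSchema {

    public static String buildSchema(String recordName, Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"type\":\"record\",\"name\":\"").append(recordName)
          .append("\",\"fields\":[");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append("{\"name\":\"").append(e.getKey())
              .append("\",\"type\":\"").append(e.getValue()).append("\"}");
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("fieldname1", "string");
        fields.put("fieldname2", "int");
        System.out.println(buildSchema("accesslog", fields));
        // To register (assumed Confluent-style endpoint, as on slides 17-18):
        //   POST http://schema-registry/subjects/accesslogs_avro-value/versions
        //   body: {"schema": "<the JSON above, with quotes escaped>"}
    }
}
```

The output matches the `schema` payload shown on slide 18, which is the doubly-encoded form the registry returns: the `schema` field is itself a JSON string embedded in the response JSON.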