Logging infrastructure for Microservices using StreamSets Data Collector

  1. Logging Infrastructure for Microservices using StreamSets Data Collector. Presenter: Virag Kothari, Software Engineer at StreamSets
  2. Open-Source Continuous Ingest
  3. © 2015 StreamSets, Inc. All rights reserved. About StreamSets ● Headquartered in San Francisco, CA ● Deep expertise in enterprise data management and integration ○ Girish Pancha, CEO (formerly Chief Product Officer at Informatica) ○ Arvind Prabhakar, CTO (formerly Director, Engineering for Integration at Cloudera) ○ Team includes Apache PMC members for Flume, Sqoop, Hadoop, Oozie, Hive, Storm
  4. Containerized services ● Run batch jobs, application jobs, microservices ● Logging is key in dynamic environments. Diagram labels: Application, Docker Container, Docker Container, Flume/Logstash, Kafka, HBase/Cassandra, HDFS/S3, Elasticsearch.
  5. Challenges ● Semi-structured logs ● Semantic drift -> schema changes -> malformed records ● Infrastructure drift -> new apps with their own log formats
  6. StreamSets Data Collector (SDC). Diagram labels: Application, Docker container; Pipeline: Origin (log source), Processor, Destination (Kafka) on success, Kafka/Write to File on error.
  7. Handle semantic and infrastructure drift ● Built-in transformations ● Scripting support ● Troubleshoot using snapshots ● Rules and alerting
  8. Data at scale ● Streaming/batch cluster deployments ● Batch: MapReduce ● Streaming: Spark Streaming on Mesos and YARN ● Storm, Samza and others?
  9. Cluster pipeline. Diagram labels: Kafka, Spark executor, Task, Task, SDC, SDC, YARN/Mesos, HDFS/S3, HBase/Cassandra, Hive, Solr.
  10. Spark Streaming + Kafka direct approach ● One-to-one mapping between Kafka partitions and RDD partitions ● Allocate executors equal to the number of Kafka partitions ● Multiple tasks within an executor. Diagram labels: Kafka partition, RDD partition, SDC.
  11. Spark on YARN ● Client vs. cluster mode ● Fault-tolerant driver ● JARs available through the Distributed Cache ● Classloader isolation due to conflicting libraries
  12. Spark on Mesos ● Mesos is not a framework manager ● REST endpoint provided by Spark to manage the Mesos framework ● No Distributed Cache ● Fault tolerance through pipeline-level retries
  13. Thank you. We're hiring: http://streamsets.com/careers/ GitHub: https://github.com/streamsets
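
The pipeline shape on slide 6 (an origin feeding a processor, with records routed to one destination on success and to another on error) can be illustrated with a small sketch. This is plain Python, not the StreamSets Data Collector API: `parse_log_line`, `run_pipeline`, and the JSON log format with a required `level` field are all hypothetical names and assumptions chosen for illustration. It also shows how the malformed records from slide 5 might be kept aside rather than dropped.

```python
import json
from typing import Callable, Iterable, List, Tuple

Record = dict


def parse_log_line(line: str) -> Record:
    """Hypothetical processor: parse one JSON log line and require a 'level' field."""
    record = json.loads(line)  # raises ValueError (JSONDecodeError) on malformed input
    if not isinstance(record, dict) or "level" not in record:
        raise ValueError("missing 'level' field")
    return record


def run_pipeline(
    origin: Iterable[str],
    processor: Callable[[str], Record],
) -> Tuple[List[Record], List[Tuple[str, str]]]:
    """Route each line to the success list or, with its error message, to the error list."""
    good: List[Record] = []
    bad: List[Tuple[str, str]] = []
    for line in origin:
        try:
            good.append(processor(line))
        except ValueError as exc:
            # Keep the raw line so the record can be inspected or replayed later.
            bad.append((line, str(exc)))
    return good, bad


lines = [
    '{"level": "INFO", "msg": "started"}',
    '{"msg": "no level here"}',  # schema change: field missing
    "not json at all",           # new app with its own log format
]
good, bad = run_pipeline(lines, parse_log_line)
print(len(good), "records on the success path;", len(bad), "on the error path")
```

In a real deployment the two lists would stand in for the destinations named on the slide (Kafka on success, Kafka or a file on error); here they are in-memory lists so the sketch stays self-contained.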

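
The one-to-one mapping between Kafka partitions and RDD partitions on slide 10 can be sketched without Spark. In the direct approach each batch is described by one offset range per Kafka partition, and each range becomes one RDD partition. The `OffsetRange` dataclass below is only loosely modelled on that idea; `plan_batch`, `executors_needed`, the topic name, and the offsets are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class OffsetRange:
    """One Kafka partition's slice of a batch; maps to exactly one RDD partition."""
    topic: str
    partition: int
    from_offset: int   # inclusive
    until_offset: int  # exclusive


def plan_batch(
    topic: str,
    last_offsets: Dict[int, int],
    latest_offsets: Dict[int, int],
) -> List[OffsetRange]:
    """Build one offset range per Kafka partition, starting where the last batch ended."""
    return [
        OffsetRange(topic, p, last_offsets.get(p, 0), latest_offsets[p])
        for p in sorted(latest_offsets)
    ]


def executors_needed(ranges: List[OffsetRange]) -> int:
    """Slide 10 suggests allocating as many executors as there are Kafka partitions."""
    return len(ranges)


ranges = plan_batch("logs", {0: 10, 1: 5}, {0: 25, 1: 5, 2: 7})
for r in ranges:
    print(r.partition, r.until_offset - r.from_offset, "records in this RDD partition")
print("executors:", executors_needed(ranges))
```

A partition with no new data still yields an (empty) range, which keeps the mapping strictly one-to-one from batch to batch.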