xPatterns on Spark, Shark, Mesos, Tachyon


Seattle Spark Meetup May 2014


  1. xPatterns on Spark, Shark, Tachyon and Mesos (Seattle Spark Meetup, May 2014)
  2. Agenda (Atigeo Confidential)
     • xPatterns Architecture
     • xPatterns Infrastructure Evolution
     • Ingestion API & GUI (Demo)
     • Transformation API & GUI (Demo)
     • Jaws Http SharkServer API & GUI (Demo)
     • Export to NoSql API & GUI (Demo)
     • xPatterns dashboard application (Demo)
     • xPatterns monitoring and instrumentation (Demo)
     • ELT pipeline rebuilt on BDAS: from 0.8.0 to 0.9.1
     • Lessons Learned, Tips & Tricks
  5. xPatterns Infrastructure Evolution
     • Hadoop -> Spark: a faster distributed computing engine that leverages in-memory computation at a much lower operational cost. It brings machine learning primitives, a simpler programming model (Scala, Python, Java), faster job submission and a shell for quick prototyping and testing. Ideal for our iterative algorithms.
     • Hive -> Shark: interactive queries on large datasets have become reasonable requests (in-memory caching yields a 4-20x performance improvement). Migrating the ELT script base required minimal effort (same familiar HiveQL, with a few exceptions).
     • No resource manager -> Mesos: multiple workloads from multiple frameworks can co-exist and fairly consume the cluster resources (policy based). More mature than YARN, it allows us to separate production from experimentation workloads, to co-locate legacy Hadoop MR jobs, multiple Shark servers (Jaws), multiple Spark Job servers and mixed Hive and Shark queries (ELT), and to establish priority queues: no more unmanageable contention and delayed execution, while maximizing cluster utilization (dynamic scheduling).
     • No cache -> Tachyon: an in-memory distributed file system with HDFS backup and resilience through lineage rather than replication. It is our out-of-process cache that survives Spark JVM restarts, and it allows fine-tuning performance and experimenting against cached warehouse tables without a reload. It is faster than the in-process cache due to delayed GC. It provides data sharing between multiple Spark/Shark jobs and efficient in-memory columnar storage with compression support for a minimal footprint.
     • Cloudera Manager dashboards -> Ganglia: a distributed monitoring system for dashboards with historical metrics data (CPU, RAM, disk I/O, network I/O) and Spark/Hadoop metrics. This is a nice addition to our Nagios (monitoring and alerts) and Graphite (instrumentation dashboards).
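The Tachyon bullet mentions resilience through lineage rather than replication. As a rough illustration of that idea only (this is not Tachyon's API; `Dataset`, `LineageCache` and the sample data are hypothetical names invented for this sketch), a cache can rebuild a lost block from the recipe that produced it instead of reading a replica:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Dataset:
    """A dataset that remembers how it was derived (its lineage)."""
    name: str
    compute: Callable[..., List[int]]          # function that rebuilds the data
    parents: List["Dataset"] = field(default_factory=list)


class LineageCache:
    """Toy in-memory cache that recomputes lost entries from lineage."""

    def __init__(self) -> None:
        self._blocks: Dict[str, List[int]] = {}
        self.recomputations = 0

    def get(self, ds: Dataset) -> List[int]:
        if ds.name in self._blocks:
            return self._blocks[ds.name]
        # Cache miss: rebuild from the parents instead of reading a replica.
        inputs = [self.get(parent) for parent in ds.parents]
        self.recomputations += 1
        self._blocks[ds.name] = ds.compute(*inputs)
        return self._blocks[ds.name]

    def evict(self, name: str) -> None:
        """Simulate losing a cached block, for example after a node failure."""
        self._blocks.pop(name, None)


raw = Dataset("raw", lambda: [1, 2, 3, 4])
doubled = Dataset("doubled", lambda xs: [x * 2 for x in xs], [raw])

cache = LineageCache()
first = cache.get(doubled)     # computes raw, then doubled
cache.evict("doubled")         # the cached block is lost
second = cache.get(doubled)    # rebuilt from raw, which is still cached
```

Only the lost block is recomputed here; `raw` stays cached, so recovery costs one extra computation rather than a second stored copy.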
  6. Distributed Data Ingestion API & GUI
     • Highly available, scalable and resilient distributed download tool exposed through a RESTful API & GUI
     • Supports encryption/decryption, compression/decompression, automatic backup & restore (AWS S3) and geo-failover (hdfs and S3 in both us-east and us-west ec2 regions)
     • Supports multiple input sources: sftp, S3 and 450+ sources through Talend Integration
     • Configurable throughput (number of parallel Spark processors, in both fine-grained and coarse-grained Mesos modes)
     • File transfer log and file transition state history for auditing purposes (pluggable persistence model, Cassandra/hdfs), configurable alerts, reports
     • Ingest + Backup: download + decompression + hdfs persistence + encryption + S3 upload
     • Restore: S3 download + decryption + decompression + hdfs persistence
     • Geo-failover: backup on S3 us-east + restore from S3 us-east into west-coast hdfs + backup on S3 us-west
     • Ingestion jobs can be resumed from any stage after failure (# of Spark task retries exhausted)
     • Http streaming API exposed for high-throughput push model ingestion (ingestion into Kafka pub-sub, batch Spark job for transfer into hdfs)
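The slide says ingestion jobs can be resumed from any stage after a failure. A minimal sketch of that behavior, assuming each stage's output is recorded in a state store keyed by stage name (the slide's actual store is Cassandra or hdfs; a plain dict stands in here). `run_pipeline`, `xor_cipher` and `flaky_upload` are hypothetical helpers, and the XOR step is only a placeholder for real encryption:

```python
import zlib
from typing import Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[bytes], bytes]]


def run_pipeline(stages: List[Stage], payload: bytes,
                 state: Dict[str, bytes]) -> bytes:
    """Run stages in order, skipping any whose output is already in `state`."""
    data = payload
    for name, fn in stages:
        if name in state:          # finished in an earlier attempt
            data = state[name]
            continue
        data = fn(data)            # may raise; earlier outputs stay recorded
        state[name] = data
    return data


def xor_cipher(data: bytes, key: int = 0x5A) -> bytes:
    """Toy stand-in for encryption. NOT secure, illustration only."""
    return bytes(b ^ key for b in data)


attempts = {"count": 0}


def flaky_upload(data: bytes) -> bytes:
    """Stand-in for the S3 upload; fails on its first call."""
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise IOError("simulated upload failure")
    return data


calls: List[str] = []


def traced(name: str, fn: Callable[[bytes], bytes]) -> Stage:
    """Wrap a stage so that each execution is recorded in `calls`."""
    def wrapper(data: bytes) -> bytes:
        calls.append(name)
        return fn(data)
    return name, wrapper


stages = [
    traced("decompress", zlib.decompress),
    traced("encrypt", xor_cipher),
    traced("upload", flaky_upload),
]

state: Dict[str, bytes] = {}
source = zlib.compress(b"claims,2014")
try:
    run_pipeline(stages, source, state)       # first attempt fails at upload
except IOError:
    pass
result = run_pipeline(stages, source, state)  # second attempt resumes at upload
```

On the second attempt only the upload stage runs again; decompression and encryption are read back from the recorded state.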
  8. T-Component API & GUI
     • Data Transformation component for building a data pipeline with monitoring and quality gates
     • Exposes all of Oozie’s action types and adds Spark (Java & Scala) and Shark (QL) stages
     • Uses a patched Ooyala SparkJobServer (multiple contexts in same JVM bug fixed by us!)
     • A Spark stage is required to run code that accepts an xPatterns-managed Spark context (coarse-grained or fine-grained) as a parameter
     • DAG and job execution info persisted in the Hive Metastore
     • Exposes a full API for job, stage and resource management and for scheduled pipeline execution
     • Demo: submit a Spark driver program as a Spark stage for transforming ingested hdfs files, then submit a Shark stage for creating a Shark table and further transforming the datasets through HiveQL statements
  9.
     • T-component DAG executed by Oozie
     • Spark and Shark stages executed through ssh actions
     • Spark stage sent to SparkJobServer
     • Shark stage executed through the shark CLI for now (SharkServer2 in the future)
     • Support for a pySpark stage coming soon
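The T-component runs a DAG of stages with quality gates between them. The slides delegate execution to Oozie; the sketch below only illustrates the ordering and gating logic in plain Python, using the standard-library `graphlib` module (Python 3.9+). `run_dag`, the stage names and the gate rule are hypothetical:

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


def run_dag(deps: Dict[str, List[str]],
            stages: Dict[str, Callable[[Dict[str, object]], object]],
            gates: Dict[str, Callable[[object], bool]]) -> Dict[str, object]:
    """Execute stages in dependency order; stop when a quality gate fails.

    `deps` maps each stage name to the names of the stages it depends on.
    """
    results: Dict[str, object] = {}
    for name in TopologicalSorter(deps).static_order():
        results[name] = stages[name](results)
        gate = gates.get(name)
        if gate is not None and not gate(results[name]):
            raise ValueError(f"quality gate failed after stage {name!r}")
    return results


# Hypothetical three-stage pipeline: ingest, clean (Spark), load (Shark).
deps = {"ingest": [], "spark_clean": ["ingest"], "shark_table": ["spark_clean"]}
stages = {
    "ingest": lambda r: ["a,1", "b,2", "bad"],
    "spark_clean": lambda r: [row for row in r["ingest"] if "," in row],
    "shark_table": lambda r: dict(row.split(",") for row in r["spark_clean"]),
}
# Gate: at least two rows must survive cleaning.
gates = {"spark_clean": lambda rows: len(rows) >= 2}

results = run_dag(deps, stages, gates)
```

A stricter gate (for example requiring three clean rows) would raise before the `shark_table` stage runs, which is the point of placing gates between stages.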
  10. Jaws REST SharkServer & GUI
     • Jaws: a highly scalable and resilient RESTful (http) interface on top of a managed Shark session. It can concurrently and asynchronously submit Shark queries and return persisted results (automatically limited in size or paged), execution logs and job information (persisted in Cassandra or hdfs).
     • Jaws can be load balanced for higher availability and scalability, and it fuels a web-based GUI that is integrated in the xPatterns Management Console (Warehouse Explorer)
     • Jaws exposes configuration options for fine-tuning Spark & Shark performance and for running against a stand-alone Spark deployment, with or without Tachyon as an in-memory distributed file system on top of HDFS, and with or without Mesos as resource manager
     • Provides different deployment recipes for all combinations of Spark, Mesos and Tachyon
     • The Shark editor gives analysts and data scientists a view into the warehouse through a metadata explorer. It provides a query editor with intelligent features like auto-complete, a results viewer, a logs viewer and a query history for asynchronously retrieving persisted results, logs and query information for both running and historical queries
     • Adding web-style pagination and query cancellation, spray io http layer (REST on Akka)
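The submit-then-fetch pattern described for Jaws (asynchronous submission, then paged retrieval of results) can be sketched as follows. This is not the Jaws API: `QueryService`, its method names and the fake engine are assumptions made for illustration, and results are kept in memory rather than persisted to Cassandra or hdfs:

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List


class QueryService:
    """Toy asynchronous query front end with paged result retrieval."""

    def __init__(self, run_query: Callable[[str], List[tuple]],
                 workers: int = 4) -> None:
        self._run_query = run_query
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs: Dict[str, Future] = {}

    def submit(self, sql: str) -> str:
        """Start the query in the background and return its id immediately."""
        query_id = uuid.uuid4().hex
        self._jobs[query_id] = self._pool.submit(self._run_query, sql)
        return query_id

    def status(self, query_id: str) -> str:
        job = self._jobs[query_id]
        if not job.done():
            return "RUNNING"
        return "FAILED" if job.exception() else "DONE"

    def results(self, query_id: str, offset: int = 0,
                limit: int = 100) -> List[tuple]:
        """Return one page of rows; blocks until the query has finished."""
        rows = self._jobs[query_id].result()
        return rows[offset:offset + limit]

    def close(self) -> None:
        self._pool.shutdown()


def fake_engine(sql: str) -> List[tuple]:
    """Stand-in for a Shark session: returns ten single-column rows."""
    return [(i,) for i in range(10)]


service = QueryService(fake_engine)
query_id = service.submit("select id from providers")
page = service.results(query_id, offset=4, limit=3)
final_status = service.status(query_id)
service.close()
```

Returning an id from `submit` keeps the HTTP request short, and the `offset`/`limit` pair bounds how much of a large result set any single call returns.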
  11. Jaws REST SharkServer & GUI
  13. Export to NoSql API
     • Datasets in the warehouse need to be exposed to high-throughput, low-latency real-time APIs. Each application requires extra processing on top of the core datasets, hence additional transformations are executed for building data marts inside the warehouse
     • The exporter tool builds an efficient data model and runs an export of data from a Shark/Hive table to a Cassandra Column Family, through a custom Spark job with configurable throughput (configurable Spark processors against a Cassandra ring). An instrumentation dashboard is embedded; logs, progress and instrumentation events are pushed through SSE
     • Data modeling is driven by the read access patterns provided by an application engineer building dashboards and visualizations: lookup key, columns (record fields to read), paging, sorting, filtering
     • The end result of a job run is a REST API endpoint (instrumented, monitored, resilient, geo-replicated) that uses the underlying generated Cassandra data model and fuels the data in the dashboards
     • A configuration API is provided for creating export jobs and executing them (ad-hoc or scheduled)
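To make "data modeling driven by read access patterns" concrete, here is a small sketch that shapes rows around a lookup key, a fixed column list, a sort order and a page size. It uses plain Python dicts in place of a Cassandra column family; `build_model`, `read_page` and the sample rows are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_model(rows: Iterable[dict], lookup_key: str, sort_by: str,
                columns: List[str]) -> Dict[object, List[dict]]:
    """Group rows by lookup key, keep only requested columns, pre-sort.

    `sort_by` must be one of `columns`.
    """
    partitions: Dict[object, List[dict]] = defaultdict(list)
    for row in rows:
        partitions[row[lookup_key]].append({c: row[c] for c in columns})
    for part in partitions.values():
        part.sort(key=lambda r: r[sort_by])
    return dict(partitions)


def read_page(model: Dict[object, List[dict]], key: object,
              page: int, page_size: int) -> List[dict]:
    """Serve one page for a key; no sorting or scanning at read time."""
    part = model.get(key, [])
    start = page * page_size
    return part[start:start + page_size]


rows = [
    {"provider": "p1", "year": 2013, "claims": 40, "note": "x"},
    {"provider": "p1", "year": 2011, "claims": 10, "note": "y"},
    {"provider": "p2", "year": 2012, "claims": 7, "note": "z"},
    {"provider": "p1", "year": 2012, "claims": 25, "note": "w"},
]
model = build_model(rows, lookup_key="provider", sort_by="year",
                    columns=["year", "claims"])
first_page = read_page(model, "p1", page=0, page_size=2)
second_page = read_page(model, "p1", page=1, page_size=2)
```

The work of grouping and sorting is paid once at export time, so each read is a key lookup plus a slice, which is what a low-latency endpoint needs.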
  15. Mesos/Spark cluster
  16. Cassandra multi DC ring – write latency
  17. Nagios monitoring
  19. Referral Provider Network
     • One of the many applications that we built for our largest healthcare customers using the xPatterns APIs and tools on the new upgraded infrastructure: ELT Pipeline, Jaws, Export to NoSql API. The dashboard for the RPN application was built using D3.js and Angular against the generic API published by the export tool.
     • The application builds a graph of downstream and upstream referred and referring providers, grouped by specialty, with computed aggregates like patient counts, claim counts and total charged amounts. RPN is used both for fraud detection and for aiding a clinic buying decision, by following the busiest graph paths.
     • The dataset behind the app consists of 8 billion medical records, from which we extracted 1.7 million providers (Shark warehouse) and built 53 million relationships in the graph (persisted in Cassandra)
     • While we demo the graph building we will also look at the Graphite instrumentation dashboard for analyzing the runtime performance of the geo-replicated Cassandra read operations (latency in the 20-50ms range)
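The referral graph and its aggregates (patient counts, claim counts, total charged amounts) can be illustrated on a handful of records. This sketch assumes a simplified claim record with `referring`, `referred`, `patient` and `charged` fields, which are invented names, and it leaves out the grouping by specialty mentioned on the slide:

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

Edge = Tuple[str, str]


def build_referral_graph(claims: Iterable[dict]) -> Dict[Edge, dict]:
    """Aggregate claims into directed edges: referring -> referred provider."""
    patients: Dict[Edge, Set[str]] = defaultdict(set)
    edges: Dict[Edge, dict] = defaultdict(
        lambda: {"patients": 0, "claims": 0, "charged": 0.0})
    for claim in claims:
        edge = (claim["referring"], claim["referred"])
        patients[edge].add(claim["patient"])
        edges[edge]["claims"] += 1
        edges[edge]["charged"] += claim["charged"]
    for edge, seen in patients.items():
        edges[edge]["patients"] = len(seen)   # distinct patients per edge
    return dict(edges)


def busiest_downstream(graph: Dict[Edge, dict],
                       provider: str) -> Optional[str]:
    """Return the referred provider with the most claims from `provider`."""
    out = {dst: agg for (src, dst), agg in graph.items() if src == provider}
    if not out:
        return None
    return max(out, key=lambda dst: out[dst]["claims"])


claims = [
    {"referring": "A", "referred": "B", "patient": "p1", "charged": 100.0},
    {"referring": "A", "referred": "B", "patient": "p1", "charged": 50.0},
    {"referring": "A", "referred": "C", "patient": "p2", "charged": 75.0},
    {"referring": "A", "referred": "B", "patient": "p3", "charged": 25.0},
]
graph = build_referral_graph(claims)
top = busiest_downstream(graph, "A")
```

Following the edge with the most claims from each provider is one simple reading of "following the busiest graph paths"; the real application may rank paths differently.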
  21. Graphite – Cassandra multi DC ring
  22. ELT processing and data quality pipeline
     • 20 billion healthcare records, 200 TB of compressed hdfs data
     • Processing pipeline, a mixture of custom MR and mostly Hive scripts, converted to Spark and Shark, with performance gains from 3-4x (for disk-intensive operations) to 20-40x for queries on cached tables (Spark cache or Tachyon, which is slightly faster with added resilience benefits)
     • Daily processing reduced from 14 hours to 1.5 hours!
     • Shark 0.8.1 does not support map join auto-conversion, automatic calculation of the number of reducers, disk spills in the reducer or map output phase, skew joins etc. We either have to manually fine-tune the cluster and the query based on the specific dataset, or we are better off with Hive under these circumstances. So we use Mesos to manage Hadoop and Spark under the same cluster, mixing Hive and Shark workloads (demo)
     • 0.9.0 fixes many of the problems, but still requires patches! (spill & mesos fine-grained)
     • Tested against multiple cluster configurations of the same cost, using 3 types of instances: m1.xlarge (4c x 15GB), m2.4xlarge (8c x 68.4GB) and cc2.8xlarge (32c x 60.8GB)
     • Jaws config settings explained: set mapreduce.job.reduces=…, set shark.column.compress=true, spark.default.parallelism=384, spark.shuffle.memoryFraction=0.6, spark.shuffle.consolidateFiles=true, spark.shuffle.spill=false|true, spark.mesos.coarse=false, spark.scheduler.mode=FAIR
     • Multiple sparkContexts in the same JVM, Mesos framework starvation bug
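For reference, the settings named on this slide can be collected in one place, and the daily processing gain worked out (14 / 1.5 ≈ 9.3x). The helper names below are hypothetical, and rendering every property as a `set key=value;` statement is an assumption of this sketch: the slide only shows `set` in front of the first two. `mapreduce.job.reduces` is left out because its value is elided on the slide.

```python
from typing import Dict, List


def jaws_settings(spill: bool) -> Dict[str, str]:
    """Settings listed on the slide, with spark.shuffle.spill as a parameter."""
    return {
        "shark.column.compress": "true",
        "spark.default.parallelism": "384",
        "spark.shuffle.memoryFraction": "0.6",
        "spark.shuffle.consolidateFiles": "true",
        "spark.shuffle.spill": "true" if spill else "false",
        "spark.mesos.coarse": "false",
        "spark.scheduler.mode": "FAIR",
    }


def as_set_statements(settings: Dict[str, str]) -> List[str]:
    """Render settings as `set key=value;` lines (assumed syntax)."""
    return [f"set {key}={value};" for key, value in settings.items()]


def speedup(before_hours: float, after_hours: float) -> float:
    """Ratio of old to new running time."""
    return before_hours / after_hours


statements = as_set_statements(jaws_settings(spill=True))
daily_gain = speedup(14, 1.5)   # the daily run went from 14 h to 1.5 h
```

The overall daily gain sits between the 3-4x quoted for disk-intensive operations and the 20-40x quoted for cached tables, which is what one would expect from a pipeline that mixes both kinds of work.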
  23. Coming soon …
     • Export to Semantic Search API (solrCloud/lucene)
     • pySpark Job Server
     • pySpark <-> Shark/Tachyon interop (either)
     • pySpark <-> Spark SQL (1.0) interop (or)
     • Parquet columnar storage for warehouse data
  24. Q & A. Oh btw … we’re hiring!
  25. © 2013 Atigeo, LLC. All rights reserved. Atigeo and the xPatterns logo are trademarks of Atigeo. The information herein is for informational purposes only and represents the current view of Atigeo as of the date of this presentation. Because Atigeo must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Atigeo, and Atigeo cannot guarantee the accuracy of any information provided after the date of this presentation. ATIGEO MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.