How To Use Scala At Work - Airframe In Action at Arm Treasure Data

Published on

ScalaMatsuri 2019 presentation. June 29, 2019

Published in: Technology
  1. Taro L. Saito, Ph.D., Arm Treasure Data
     June 29, 2019, Scala Matsuri 2019, Tokyo
     How To Use Scala At Work: Airframe In Action At Arm Treasure Data
     Caption: Let's use Scala at work: Airframe use cases at Arm Treasure Data

  2. About Me: Taro L. Saito (Leo)
     ● Principal Software Engineer at Arm Treasure Data
     ● Building a distributed query engine service
     ● Living in the US for 4 years
     ● DBMS & Data Science Background
     ● Ph.D. in Computer Science
     ● Database Systems and Genome Sciences Research
     ● Assistant Professor at the University of Tokyo
     ● OSS Projects Around Scala
     ● sbt-sonatype: used for releasing 3000+ Scala projects
     ● snappy-java: a compression library used in Spark, Parquet, etc.
     Caption: Self-introduction
     Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved.

  3. New Release from O'Reilly Japan
     ● Helped with the Japanese translation of Designing Data-Intensive Applications
     ● Techniques and concepts around distributed data processing systems
     ● Available at the Amazon.co.jp and O'Reilly Japan web sites
     ● Will be published on July 18, 2019
     Caption: The translation of the definitive introduction to distributed data systems goes on sale next month

  4. Arm Treasure Data
     ● 400+ Customers
     ● Founded in 2011
     ● Raised $54M
     ● Security
     ● Acquired by Arm / Softbank (2018)
     Caption: Overview of Arm Treasure Data

  5. The Architecture of Arm Treasure Data
     Diagram labels: Data, Logs, Device Data, Batch Data; PlazmaDB, Table Schema; Data Collection, Cloud Storage, Distributed Data Processing; Jobs, Job Management, SQL Editor, Scheduler, Workflows, Machine Learning
     Diagram figures: 2 million records / sec., 130 trillion records, 1 billion rows processed / sec.
     Diagram legend: Treasure Data OSS, Third Party OSS
     Caption: Treasure Data's system architecture. Where is Scala?

  6. Airframe
     Diagram modules: airframe-launcher, airframe-log, airframe-config, airframe-codec, sbt-pack, airframe-fluentd, airframe-json, airframe-surface, airframe-tablet, airframe-jmx, airframe-jdbc, airframe-http, airframe-http-finagle, airframe DI
     Diagram labels: Module Mix-In, Packaging, HTTP Requests and Responses, Data, Scala Objects, Table Data (CSV, TSV), JSON, Monitor Runtime States, Generate Mapping Codec, Metrics & Log Data, JDBC ResultSets, Launch HTTP Services, Debug Logs, Schema-On-Read Mapping
     Diagram config sample: production: port: 10010 user: xxxx ...
     Caption: The Airframe modules (written in Scala) used behind the service

  7. Our OSS Strategy Around Scala
     ● Gather the best practices of Scala into Airframe OSS
     ● Gain real experience by operating 24/7 services
     Diagram labels: Knowledge, Experiences, Design Decisions, Products, 24/7 Services, Business Values, Programming, OSS (Airframe), Outcome
     Caption: OSS strategy around Scala with Airframe at its core
  8. 3 Years Ago...
     ● Various internal and third-party Scala/Java libraries
     ● Managed in different repositories, with different release cycles
     ● High learning cost
       ■ The knowledge was confined to engineers' heads
     Diagram labels: Knowledge, Experiences, Design Decisions, Products, 24/7 Services, Business Values, Programming, Various Libraries (logger, launcher, object mapper, JDBC reader, json4s, jackson, …), Outcome
     Caption: Three years ago, Airframe did not exist and various libraries were mixed together
  9. 5 Years Ago...
     ● No Scala engineer in the company
     ● Scala in 2014: Scala 2.9.x
     ● Was not good enough to use:
       ■ e.g., no string interpolation like s"... ${x} ..."
     Diagram labels: Knowledge, Experiences, Design Decisions, Products, 24/7 Services, Business Values, Programming, Ruby, Java, Outcome
     Caption: Five years ago, there were no Scala engineers and no Scala code

  10. Today's Agenda
      ● How to introduce Scala to your company
      ● Learn the best practices of using Scala at work
      ● From 20 Airframe modules
      Caption: What this talk covers
  11. How Can We Introduce Scala?
      ● Saying "I want to use Scala"
      ● It will not work, especially if you or your team are not familiar with Scala
      ● Your managers need more information on whether it is good enough or not
      ● Even if you are a tech lead:
      ● You need some confidence in using Scala in production
      ● How can we establish such confidence in using Scala?
      Caption: How do we introduce Scala? How do we gain the confidence that it is OK to use Scala?

  12. Start With A Small Investment in Scala
      ● Guidelines
      ● Think about how you can save your time with Scala
      ● If you can save 1 minute a day, you can spend 6 hours on this improvement
        ■ Save 1 minute / day = 365 minutes / year = about 6 hour investment
        ■ Save 10 minutes / week = 520 minutes / year = about 8.7 hour investment
        ■ Save 1 hour / week = 52 hours / year = about 2.2 day investment
      ● Time is your most valuable asset
      ● Save your time by using Scala
      Caption: Start a "small investment" to save time "by using Scala"
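The break-even arithmetic on this slide can be checked with a few lines of Scala. This is only an illustrative sketch: the object and method names (`InvestmentBudget`, `minutesPerYear`, `hoursPerYear`) are made up for this example and do not come from the talk or from Airframe.

```scala
object InvestmentBudget {
  // Total minutes saved in one year when a saving of `minutesSaved`
  // recurs `timesPerYear` times (365 for daily, 52 for weekly).
  def minutesPerYear(minutesSaved: Double, timesPerYear: Int): Double =
    minutesSaved * timesPerYear

  // The same yearly saving expressed in hours.
  def hoursPerYear(minutesSaved: Double, timesPerYear: Int): Double =
    minutesPerYear(minutesSaved, timesPerYear) / 60.0

  def main(args: Array[String]): Unit = {
    println(s"1 minute / day   : ${hoursPerYear(1, 365)} hours / year")
    println(s"10 minutes / week: ${hoursPerYear(10, 52)} hours / year")
    println(s"1 hour / week    : ${hoursPerYear(60, 52) / 24.0} days / year")
  }
}
```

The yearly saving is the upper bound on how much time is worth investing in the improvement if it should pay for itself within a year.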

  13. The First Scala Code in TD
      ● prestop (presto + top)
      ● Not production service code
      ● A handy query monitoring tool for Presto, written in Scala
      ● Displays complex JSON data with fancy ANSI colors
      Caption: The first Scala program at Treasure Data

  14. airframe-log
      ● Scala 2.10: My small investment to test Scala Macros and String interpolation
      ● A Modern Logging Library for Scala (article on Medium)
      ● ANSI color and source code location display
      ● Just add the LogSupport trait to your class
      Caption: Make program development more efficient with log messages

  15. airframe-launcher
      ● Needed to handle complex command line options and nested commands
      ● e.g., $ prestop -e production monitor (other options …)
      ● Enabled annotation-based command line definitions
      Caption: Make it easy to build complex command-line programs

  16. airframe-config: Application Configuration Flow
      ● YAML config (embedded into Docker)
      ● Override credentials, then bind to config objects
      Diagram, YAML: development: addr: api-dev.com / production: addr: api.com
      Diagram, selected by airframe-launcher (command: -e production): production: addr: api.com
      Diagram, Config Object: case class ServerConfig(addr: String, port: Int = 8080, password: String)
      Diagram steps: Credentials and Local Configurations, Merge, Default Parameters (e.g., port = 8080), Object Mapping, Immutable Object
      Caption: Turn the application configuration flow into a library
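The configuration flow on this slide (pick an environment, merge in credentials and local overrides, then bind to an immutable object with default parameters) can be sketched in plain Scala. This sketch does not use the airframe-config API: the `ConfigFlowSketch` object, the `load` function, and the in-memory `yamlConfig` map standing in for a parsed YAML file are all assumptions made for illustration. Only the `ServerConfig` case class and the sample addresses come from the slide.

```scala
object ConfigFlowSketch {
  // Config object from the slide: `port` falls back to a default value.
  case class ServerConfig(addr: String, port: Int = 8080, password: String)

  // Stand-in for the parsed YAML file, keyed by environment name (assumption).
  val yamlConfig: Map[String, Map[String, String]] = Map(
    "development" -> Map("addr" -> "api-dev.com"),
    "production"  -> Map("addr" -> "api.com")
  )

  // Select the environment, overlay credentials and local overrides,
  // then bind the merged key-value pairs to an immutable ServerConfig.
  // Throws NoSuchElementException if `addr` or `password` is missing.
  def load(env: String, overrides: Map[String, String]): ServerConfig = {
    val merged   = yamlConfig.getOrElse(env, Map.empty[String, String]) ++ overrides
    val addr     = merged("addr")
    val password = merged("password")
    merged.get("port") match {
      case Some(p) => ServerConfig(addr, p.toInt, password)
      case None    => ServerConfig(addr = addr, password = password) // default port = 8080
    }
  }
}
```

With this sketch, `ConfigFlowSketch.load("production", Map("password" -> "secret"))` binds `addr` from the YAML stand-in, `password` from the override, and `port` from the case class default.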
  17. sbt-pack plugin
      ● An sbt plugin to create standalone Scala packages
      ● A single folder package with bin and lib folders containing all dependent JARs
      ● Generates command-line launcher scripts
      ● My small investment in 2012 to save packaging time
      Diagram labels: airframe-launcher, airframe-config, YAML config file, sbt-pack, Standalone Scala Package, Dockerfile
      Caption: Package programs with sbt-pack and easily create Docker images
  18. Medium-Size Investment: Find A Common Pattern
      ● Extract a common problem pattern and create a solution
      ● Data -> Object Mapping
      ● How many data readers and object mappers do we need?
      ● How can we save our time when handling such varied data types?
      Diagram: YAML -> YAML Parser + Object Mapper -> Config Object
      Diagram: JDBC ResultSet -> Object-Relation Mapper -> Table Object
      Diagram: JSON -> JSON Parser + Object Mapper -> Object
      Caption: Mapping input data to Scala objects is a common need. A medium-term investment is required

  19. airframe-msgpack: MessagePack as Universal Data Format
      ● MessagePack (msgpack.org)
      ● Compact JSON-like binary format
      ● Describes data types and data values at the same time (self-describing)
      Diagram labels: Object, Pack, Unpack, MessagePack, Pack/Unpack, JDBC ResultSet, YAML, JSON
      Caption: With MessagePack as the intermediate format, only one object mapper implementation is needed
  20. PlazmaDB: MessagePack DBMS
      ● Fluentd -> MessagePack -> Arm Treasure Data
      ● Automatically generates table schema from MessagePack data
      ● Applies schema-on-read to provide table data for Presto/Hive/Spark, etc.
      Diagram labels: Data Collection, Schema-free Data, Update Schema, Table Schema, Generate Reader Set, Table Reader, Int Column Reader, String Column Reader, Distributed Data Processing
      Caption: Arm Treasure Data is a MessagePack-based schema-on-read system

  21. Schema-On-Read Data Processing with MessagePack
      ● Users can store arbitrarily typed data (no table design is required)
      ● Data can be read in the target type required by the application (e.g., a SQL query)
      Diagram inputs: Logs, CSV, command-line arguments
      Diagram types: Int, Float, Boolean, String, Array, Map, Binary; target: SQL BigInt
      Diagram labels: IntCodec, Pack, Unpack, parseInt, toInt, 0 or 1, Error or null
      Diagram example: "100" (string) -> 100 (int)
      Caption: At read time, data is adapted to the type the application requires (schema-on-read)
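The idea of reading loosely typed data as the type the application asks for can be sketched as a small coercion function. This is not the IntCodec from airframe-codec: `SchemaOnReadSketch` and `readAsInt` are hypothetical names, and the exact coercion rules (for example, truncating floating-point values and returning `None` instead of null or an error) are assumptions that loosely follow the labels in the diagram (parseInt, toInt, 0 or 1, error or null).

```scala
import scala.util.Try

object SchemaOnReadSketch {
  // Coerce a value of unknown type to Int at read time.
  // Returns None where the slide's diagram says "Error or null".
  def readAsInt(value: Any): Option[Int] = value match {
    case i: Int                  => Some(i)                      // already the target type
    case l: Long if l.isValidInt => Some(l.toInt)                // fits into Int
    case f: Float                => Some(f.toInt)                // toInt (truncates)
    case d: Double               => Some(d.toInt)                // toInt (truncates)
    case b: Boolean              => Some(if (b) 1 else 0)        // 0 or 1
    case s: String               => Try(s.trim.toInt).toOption   // parseInt
    case _                       => None                         // Array, Map, Binary, ...
  }
}
```

For example, `SchemaOnReadSketch.readAsInt("100")` yields `Some(100)`, matching the `"100" (string) -> 100 (int)` case in the diagram, while a non-numeric string yields `None`.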
  22. airframe-codec: Schema-On-Read Pack/Unpack Interface
      ● Apply schema-on-read for Scala objects
      Diagram labels: Input, Pack, MessagePack, Unpack, Output
      Caption: Apply the MessagePack-based schema-on-read data conversion interface to Scala

  23. Pre-defined Codecs in airframe-codec
      ● Primitive Codecs
      ● ByteCodec, CharCodec, ShortCodec, IntCodec, LongCodec
      ● FloatCodec, DoubleCodec
      ● StringCodec
      ● BooleanCodec
      ● TimeStampCodec
      ● Collection Codec
      ● ArrayCodec, SeqCodec, ListCodec, IndexedSeqCodec, MapCodec, etc.
      ● OptionCodec
      ● JsonCodec (airframe-json)
      ● Java-specific Codec
      ● FileCodec, ZonedDateTimeCodec, JDBCResultSetCodec, etc.
      ● Adding Custom Codecs
      ● Implement the MessageCodec[X] interface
      Caption: Supports mapping to almost all data types needed in Scala

  24. MessageCodec.of[A]: Combination of Codecs
      Diagram labels: MessageCodec.of[A], IntCodec, StringCodec, DoubleCodec, Pack, Unpack, MessagePack
      Caption: Codecs are composed according to the object's type

  25. airframe-surface
      ● Reading Type Signatures From ScalaSig
      ● The Scala compiler embeds Scala Type Signatures (ScalaSig) into class files
      ● Surface.of[A]
        ■ returns A's parameter names and types
      Diagram source: class A (data: List[B])
      Diagram, javac: class A, data: List[java.lang.Object] (type erasure removes generic type information)
      Diagram, scalac: class A, data: List[java.lang.Object], ScalaSig: data: List[B]
      Diagram, Surface.of[A]: data: List[B]
      Diagram label: scala.reflect.runtime.universe.TypeTag
      Caption: Obtain object type information from ScalaSig

  26. [WIP] Scala.js RPC
      ● Scala.js
      ● Compiling Scala code into JavaScript for Web Browsers
      ● airframe-codec: Passing model class data between Scala and Scala.js
      Diagram labels: Scala Server Side, UserInfo, Pack, Unpack, MessagePack, XML RPC, Scala.js Client Side, UserInfo
      Caption: airframe-codec can also be used to pass data to and from Scala.js (browser side)

  27. [WIP] airframe-sql
      ● Universal stream SQL engine
      ● Processing various types of data through MessagePack
      Diagram labels: JDBC ResultSet, YAML, JSON, Pack, MessagePack Stream, SQL, Query Processing (Filter/Aggregation/Join, etc.), MessagePack
      Caption: Process data of any format with SQL through MessagePack
  28. Scala In Production
  29. A Technical Debt In TD (2015-2016)
      ● Prestogres: PostgreSQL gateway to Presto
      ● Enabled using PostgreSQL JDBC/ODBC drivers to access Presto
      ● The so-called magic of Sada (founder)
      ● Was good for the first use cases
      ● Many Problems:
      ● Hacks around pgpool-II were hard to debug
      ● Hard to support customers upon errors
      ● Incompatible SQL with Presto
      ● Nobody could fix these issues
        ■ including the creator!
      Caption: The hack called Prestogres had become technical debt

  30. Replacing Prestogres with Prestobase
      ● Prototyped in Scala within a week after a quick chat with Sada
      ● Utilizing Airframe assets
      ● Deployed as a production service in 3 months
      Caption: Prototyped Prestobase in Scala. Released as a service three months later
  31. airframe-di
      ● Created a dependency injection library for Scala
      ● For Prestobase development
      ● Scala-friendly Syntax
      ● Useful for combining hundreds of modules
      ● Based on airframe-surface, airframe-log
      ● See also:
      ● Airframe Meetup #1 Report (2018)
      Caption: Airframe DI for Scala was born during Prestobase development

  32. Airframe OSS
      ● Lightweight Building Blocks for Scala
      ● Collection of our investments in Scala
      ● Repackaged into wvlet.airframe in 2016
      ● airframe-log
      ● airframe-launcher
      ● airframe-config
      ● airframe-surface
      ● airframe-di
      ● airframe-codec
      ● ...
      ● As of 2019, Airframe has 20 modules
      ● 35+ releases in 2018
      ● Already had 17+ releases in 2019
      ● Contributing to the Scala Community Build
      ● To test the latest Scala versions
      Caption: In 2016, the various tools were consolidated as Airframe. 20 modules, frequent release cycle
  33. Monorepo
      ● Cross build
      ● For 3 + 1 Scala versions
        ■ 2.13, 2.12, 2.11, and Scala.js
      ● 20 modules
        ■ 4 x 20 = 80 artifacts!
      ● Challenge
      ● Publishing took 3 hours with sbt-release
      ● Bottleneck
      ● Sequential run of compile -> test -> publish for all artifacts
      Caption: Airframe uses a single-repository layout to consolidate maintenance

  34. Release Automation on Travis CI
      ● Single-Step Release
      ● Triggered by git tag
      ● Running Tasks In Parallel
      ● Run tests for each Scala version
      ● Update doc & release notes
        ■ Generate release notes from git logs
      ● Publish
        ■ sbt-pgp & sbt-sonatype
          ○ GPG signature
          ○ Copy to Maven Central
      ● Finishes in about 10 to 20 minutes
      ● Blog: 3 Tips For Maintaining Scala Projects
      Caption: Releases are fully automated on Travis CI, enabling frequent releases

  35. sbt-sonatype plugin
      ● An sbt plugin for releasing projects to Maven Central
      ● open staging repository -> verify -> close -> promote -> drop
      ● A small investment
      ● During the 2015 New Year holiday => Paid off by saving Airframe release time
      ● 3000+ Scala projects are using sbt-sonatype
      Caption: sbt-sonatype was created during the New Year holidays. It is used by many Scala libraries

  36. airframe-http
      ● Created a simple HTTP framework
      ● Based on Airframe modules:
        ■ airframe-surface
        ■ airframe-codec
        ■ airframe-msgpack
        ■ etc.
      ● Blog
      ● Building Low-Friction Web Service Over Finagle
      ● Save the time for choosing a web framework:
      ● Many frameworks exist:
      ● e.g., Finatra, Finch, akka-http, spring, RESTeasy, open-api, swagger, etc.
      Caption: By leveraging Airframe assets, even a web framework is easy to build

  37. airframe-http-client
      ● Error handling of HTTP requests is difficult
      ● 4xx, 5xx status code
      ● Should we retry the request?
        ■ IOException, EOFException
        ■ TimeoutException
        ■ InterruptedException
        ■ SSLException
        ■ InvocationTargetException
      ● HTTP client
      ● request retries
      ● response mapping
        ■ JSON, MessagePack format
      ● airframe-codec
      Caption: Error-prone HTTP request error handling is packaged as a library

  38. airframe-control
      ● Everything can fail …
      ● Network disconnection
      ● Server crash
      ● ...
      ● Retry
      ● Exponential backoff
        ■ 2x, 4x, ...
      ● Jittering
        ■ 1 sec., 2 * rand, 4 * rand, …
      ● Customize error type classifiers
      ● retryable failures
      ● non-retryable failures
      Caption: Retry handling is turned into a reusable pattern
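The retry pattern listed on this slide (exponential backoff, jitter, and a classifier that separates retryable from non-retryable failures) can be sketched with the Scala standard library alone. This is not the airframe-control API: `RetrySketch`, `backoffMillis`, `jitteredMillis`, and `withRetry` are hypothetical names, and applying jitter to every wait, including the first one, is an assumption of this sketch.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Random, Success, Try}

object RetrySketch {
  // Upper bound of the wait before the n-th retry (n = 0, 1, 2, ...):
  // base, 2 * base, 4 * base, ...
  def backoffMillis(retry: Int, baseMillis: Long): Long = baseMillis << retry

  // Jittered wait: a random fraction of the exponential upper bound.
  def jitteredMillis(retry: Int, baseMillis: Long, rnd: Random): Long =
    (backoffMillis(retry, baseMillis) * rnd.nextDouble()).toLong

  // Run `body`, retrying up to `maxRetry` times while `retryable` accepts the error.
  // Non-retryable errors and the last retryable error are rethrown to the caller.
  def withRetry[A](maxRetry: Int, baseMillis: Long, retryable: Throwable => Boolean)(body: => A): A = {
    val rnd = new Random()
    @tailrec
    def loop(retry: Int): A =
      Try(body) match {
        case Success(v) => v
        case Failure(e) if retry < maxRetry && retryable(e) =>
          Thread.sleep(jitteredMillis(retry, baseMillis, rnd))
          loop(retry + 1)
        case Failure(e) => throw e
      }
    loop(0)
  }
}
```

A caller would pass its own classifier, for example `e => e.isInstanceOf[java.io.IOException]`, so that only failures considered transient are retried.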

  39. airframe-http-recorder
      ● Testing against actual web services is time consuming
      ● Record & Replay HTTP responses
      ● Reproducible results
      ● Runnable on small machines (e.g., Travis CI)
      Diagram, Recording Mode: HTTP Request -> HTTP Recorder -> Request -> Real Web Service -> Response (Recording Responses)
      Diagram, Replay Mode: HTTP Request -> HTTP Recorder -> Request -> Recorded Responses -> Response
      Caption: HTTP requests are recorded to make web service testing more efficient
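The two modes in the diagram can be illustrated with a small in-memory sketch. It is not the airframe-http-recorder API: `RecorderSketch`, `Request`, `Response`, and `Recorder` are hypothetical types, and keeping responses in a mutable map (rather than persisting them) is an assumption made to keep the example short.

```scala
import scala.collection.mutable

object RecorderSketch {
  final case class Request(method: String, path: String)
  final case class Response(status: Int, body: String)

  // Wraps a real service call and remembers the responses it has seen.
  class Recorder(realService: Request => Response) {
    private val recorded = mutable.Map.empty[Request, Response]

    // Recording mode: forward the request to the real service and store the response.
    def record(req: Request): Response = {
      val res = realService(req)
      recorded.update(req, res)
      res
    }

    // Replay mode: answer from recorded responses only, never calling the real service.
    def replay(req: Request): Option[Response] = recorded.get(req)
  }
}
```

In a test suite, the recording step would run once against the real service, and later runs would use only `replay`, which gives reproducible results without network access.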
  40. Data Analysis with Scala
  41. Data-Driven System Optimization
      ● TD is one of the biggest users of TD
      ● Query logs
      ● Collecting all Presto query logs since 2015
      ● Query statements, performance statistics, logs, etc.
      ● Logs are our valuable assets
      ● To understand user activities and enable data-driven optimizations
      Diagram labels: User, Query, Logs, Collect Query Logs, Analyze Query Logs, Machine Learning, Query Optimization, Optimize System
      Caption: Collecting and analyzing logs is important for system optimization

  42. airframe-fluentd
      ● Collect Scala Application Logs To Fluentd
      ● Scala Objects -> MessagePack -> Fluentd
      Diagram labels: Scala Objects, airframe-codec, airframe-fluentd, Collect Query Logs, Analyze Query Logs, Machine Learning, Query Optimization, Optimize System
      Caption: Fluentd accepts MessagePack, so the output of airframe-codec can be passed to it
  43. airframe-jmx
      ● Add the @JMX annotation to your application metrics
      ● It's also useful for checking the application version, configurations, etc.
      ● JMX clients can check these metrics
      ● e.g., jconsole
      Caption: With JMX, application state can be checked from outside the JVM and metrics can be collected

  44. airframe-metrics
      ● Human Readable Data Format (ElapsedTime, DataSize, etc.)
      ● Handy Time Window String Support
      Caption: Time spans, ranges, and data sizes are put into human-friendly formats to make log analysis more efficient

  45. Taking Snapshots of Data Analysis Tasks
      ● Save Long-Running Task Results As MessagePack (binary)
      ● Save the cost of re-computation
      Diagram, First run: Task Run -> Compute (e.g., 10 min) -> Result: Seq[A] -> Pack -> MessagePack -> Save -> Storage (Snapshot)
      Diagram, Second run: Load -> Unpack
      Caption: Airframe assets are used to cache data analysis results and make the work more efficient
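The snapshot idea (compute once, save the result, load it on later runs) can be sketched as follows. Note the substitution: the slide stores results as MessagePack via airframe-codec, while this sketch uses standard Java serialization so that it has no external dependencies. `SnapshotSketch` and `cached` are hypothetical names, and the result type is limited to values that Java serialization can handle (an `Array[Long]` is used in the usage note).

```scala
import java.io.{ObjectInputStream, ObjectOutputStream}
import java.nio.file.{Files, Path}

object SnapshotSketch {
  // Return the snapshot stored at `path` if it exists;
  // otherwise run `compute`, save its result to `path`, and return it.
  def cached[A <: AnyRef](path: Path)(compute: => A): A = {
    if (Files.exists(path)) {
      // Second run: load the saved result instead of recomputing.
      val in = new ObjectInputStream(Files.newInputStream(path))
      try in.readObject().asInstanceOf[A]
      finally in.close()
    } else {
      // First run: compute (possibly for a long time) and take a snapshot.
      val result = compute
      val out = new ObjectOutputStream(Files.newOutputStream(path))
      try out.writeObject(result)
      finally out.close()
      result
    }
  }
}
```

A call such as `SnapshotSketch.cached(path) { Array(1L, 2L, 3L) }` runs the block only when no snapshot file exists at `path`. Deleting the file forces a fresh computation.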

  46. Airframe
      Diagram modules: airframe-launcher, airframe-log, airframe-config, airframe-codec, sbt-pack, airframe-fluentd, airframe-json, airframe-surface, airframe-tablet, airframe-jmx, airframe-jdbc, airframe-http, airframe-http-finagle, airframe DI
      Diagram labels: Module Mix-In, Packaging, HTTP Requests and Responses, Data, Scala Objects, Table Data (CSV, TSV), JSON, Monitor Runtime States, Generate Mapping Codec, Metrics & Log Data, JDBC ResultSets, Launch HTTP Services, Debug Logs, Schema-On-Read Mapping
      Diagram config sample: production: port: 10010 user: xxxx ...
      Caption: Code assets have formed around Airframe

  47. Resolving Technical Debts with Airframe Upgrade
      ● Migrate common programming patterns into Airframe
      ● Upgrade Airframe Version
      ● YY.MM.patch versioning: 19.5.x, 19.6.x, …
        ■ Easy to see how far behind the latest version the project is
      ● Reduce code and logic duplications across components
      Diagram labels: Knowledge, Experiences, Design Decisions, Products, 24/7 Services, Business Values, Programming, OSS (Airframe), Outcome
      Caption: Technical debt is resolved as part of upgrading Airframe
  48. Scala At Arm Treasure Data
      ● Scala is now an official language at Arm Treasure Data
      ● 0 -> 10+ engineers who can write Scala
      ● Use cases are growing:
      ● Query optimization, API, Spark, data analysis, storage systems, service operation, etc.
      ● We are happy to share our Scala assets through Airframe!
      ● Add Your GitHub Star! wvlet/airframe
      Caption: Arm Treasure Data now has a solid group of Scala engineers, and the range of Scala use cases keeps growing

  49. Presto Conference Tokyo 2019
      ● July 11 (Thu), 2019, 13:30 ~ (Free)
      ● https://techplay.jp/event/733772
      ● Inviting Presto Creators (Martin, Dain, David)
      ● Presto Software Foundation
      ● Talks from big Presto users in Japan
      ● Yahoo! JAPAN, LINE, Arm Treasure Data
      ● Presto Source Code Navigation
      Caption: Presto Conference Tokyo 2019 will be held on 7/11 (Thu) from 13:30 (free admission)

  50. Thank You! Danke! Merci! 谢谢! ありがとう! Gracias! Kiitos!
      Confidential © Arm 2017