- SmartNews uses stream processing to deliver news quickly as the lifetime of news articles is very short. Kinesis Streams play an important role in processing user activity streams and metrics in near real-time.
- Data is ingested using Kinesis Producer and Consumer Libraries and processed using Spark Streaming to generate metrics for ranking articles. Metrics are stored in DynamoDB.
- An ETL workflow is used to transform log data and perform machine learning tasks to cluster users. PipelineDB is also used for real-time analytics on streams.
SmartNews TechNight Vol.5: AD Data Engineering in practice: SmartNews Ads裏のデ... (SmartNews, Inc.)
This document discusses the data management platform (DMP) used for ad targeting and delivery in SmartNews Ads. The DMP collects, cleans, and aggregates over 14 million user profiles and ad data from multiple sources. It uses this first-party data to perform user clustering, CTR and CVR prediction using machine learning models, and lookalike targeting. Future work may include targeting based on user interests and collecting negative feedback to optimize the user experience.
SmartNews has evolved its use of AWS over time from a monolithic application to microservices as its scale increased. It now uses over 300 EC2 instances, 80 ELBs, and many other AWS services. Configuration management has moved from pull-style deploys to tools like CodeDeploy, Auto Scaling Groups, and infrastructure as code. Future plans include further containerization and event aggregation to improve scalability, safety, and measurability across services.
This document discusses Apache Spark on EMR and best practices for using Spark. It introduces the speaker and their experience with Spark at SmartNews. It then covers recent Spark updates, how SmartNews uses Spark for tasks like AD targeting and recommendation, and 10 best practices for using Spark on EMR like running Spark on Yarn, tuning memory settings, minimizing data shuffle, and using dynamic scaling with Spark Streaming.
20. Reactive Streams: Processor

public interface Processor<T, R> extends Subscriber<T>, Publisher<R> {}

[Diagram: Component A → Component B → Component C, where Component B is a Processor<T1, T2> — the Subscriber of A and the Publisher for C; it consumes T1 and emits T2.]
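As a rough illustration of the Processor contract described above, here is a minimal sketch using the JDK's built-in java.util.concurrent.Flow interfaces, which mirror the Reactive Streams ones (the class names MapProcessor and ProcessorDemo are illustrative, not from the talk):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.function.Function;

public class ProcessorDemo {

    // A Processor is both a Subscriber (of the upstream) and a Publisher
    // (for the downstream). This one maps each upstream T to a downstream R.
    static class MapProcessor<T, R> extends SubmissionPublisher<R>
            implements Flow.Processor<T, R> {
        private final Function<T, R> fn;
        private Flow.Subscription subscription;

        MapProcessor(Function<T, R> fn) { this.fn = fn; }

        @Override public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1);               // request items one at a time (backpressure)
        }
        @Override public void onNext(T item) {
            submit(fn.apply(item));     // republish the transformed item downstream
            subscription.request(1);
        }
        @Override public void onError(Throwable t) { closeExceptionally(t); }
        @Override public void onComplete() { close(); }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> results = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(1);

        SubmissionPublisher<Integer> source = new SubmissionPublisher<>();
        MapProcessor<Integer, String> processor = new MapProcessor<>(i -> "item-" + i);

        source.subscribe(processor);                 // processor subscribes to the source...
        processor.subscribe(new Flow.Subscriber<String>() {  // ...and publishes downstream
            @Override public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
            @Override public void onNext(String item) { results.add(item); }
            @Override public void onError(Throwable t) { done.countDown(); }
            @Override public void onComplete() { done.countDown(); }
        });

        for (int i = 1; i <= 3; i++) source.submit(i);
        source.close();
        done.await();

        System.out.println(results);    // [item-1, item-2, item-3]
    }
}
```

Reactor's Flux and Mono sit on top of exactly this contract; in practice you compose operators rather than implement Processor by hand.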
41. RxJava1Adapter

Adapter for converting in both directions between RxJava 1.x Completable / Single / Observable and Reactor's Mono / Flux.

| Reactor | from RxJava | to RxJava |
|---|---|---|
| No value | completableToMono | publisherToCompletable |
| Single value | singleToMono | publisherToSingle |
| Multiple values | observableToFlux | publisherToObservable |
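A hedged sketch of what calling these adapter methods looks like, using the method names from the table above (the package of RxJava1Adapter and the exact artifact providing it depend on the Reactor / reactor-adapter version on the classpath):

```java
import rx.Observable;
import rx.Single;
import reactor.adapter.RxJava1Adapter;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class AdapterDemo {
    public static void main(String[] args) {
        // RxJava 1.x -> Reactor
        Flux<Integer> flux = RxJava1Adapter.observableToFlux(Observable.just(1, 2, 3));
        Mono<String> mono = RxJava1Adapter.singleToMono(Single.just("hello"));

        // Reactor -> RxJava 1.x
        Observable<Integer> observable = RxJava1Adapter.publisherToObservable(flux);
        Single<String> single = RxJava1Adapter.publisherToSingle(mono);

        System.out.println(observable.toList().toBlocking().single()); // [1, 2, 3]
        System.out.println(single.toBlocking().value());               // hello
    }
}
```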
46. Reactor and Spring 5

You can try it easily with Spring Initializr: select "Reactive Web" under Dependencies (with Spring Boot version 1.4.1 (SNAPSHOT)).

ref: SPRING INITIALIZR bootstrap your application now
https://start.spring.io/
48. Reactor API Hands-on

A hands-on for understanding each Reactor API. Parts 1 through 9 provide JUnit tests that exercise each API; you work through them by turning the tests green.

ref: Lite Rx API Hands-on
https://github.com/reactor/lite-rx-api-hands-on
Part 1: Creating a Flux
Part 2: Creating a Mono
Part 3: Transforming values
Part 4: Merging Fluxes
Part 5: Requests
Part 6: Other operations
Part 7: Converting Reactive -> Blocking
Part 8: Interoperability with RxJava
Part 9: Converting Blocking -> Reactive
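A compressed sketch of what the first parts exercise, assuming reactor-core 3.x on the classpath (the class name is illustrative):

```java
import java.util.List;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class HandsOnSketch {
    public static void main(String[] args) {
        Flux<String> flux = Flux.just("a", "b", "c");         // Part 1: creating a Flux
        Mono<String> mono = Mono.just("x");                   // Part 2: creating a Mono
        Flux<String> upper = flux.map(String::toUpperCase);   // Part 3: transforming values
        Flux<String> merged = Flux.merge(upper, mono.flux()); // Part 4: merging Fluxes
        List<String> result = merged.collectList().block();   // Part 7: Reactive -> Blocking
        System.out.println(result);
    }
}
```

With these synchronous cold sources, merge drains the first source before the second, so this prints [A, B, C, x].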