1) Yahoo Japan uses Apache Pulsar as a centralized messaging platform connecting various internal services.
2) Pulsar is now being used to build a large scale log pipeline where computing platforms publish logs/metrics and monitoring platforms consume them.
3) This architecture leverages Pulsar to decouple producers and consumers, enabling scalability and resiliency across the platform.
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ... (StreamNative)
We will introduce HerdDB, a distributed database written in Java.
We will see how a distributed database can be built using Apache BookKeeper as a write-ahead commit log.
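The write-ahead-log idea the talk builds on can be sketched in a few lines of plain Python. This is a conceptual illustration only, not the HerdDB or BookKeeper API: every mutation is appended to a durable log before it is applied to the table state, so a restarted node can rebuild identical state by replaying the log.

```python
# Conceptual sketch of a WAL-backed table (NOT the BookKeeper API).
# The "log" list stands in for a replicated BookKeeper ledger.

class WalTable:
    def __init__(self, log=None):
        self.log = log if log is not None else []
        self.state = {}
        for op, key, value in self.log:   # recovery: replay the log in order
            self._apply(op, key, value)

    def _apply(self, op, key, value):
        if op == "put":
            self.state[key] = value
        elif op == "delete":
            self.state.pop(key, None)

    def put(self, key, value):
        self.log.append(("put", key, value))   # 1. durable append first
        self._apply("put", key, value)         # 2. then apply to state

    def delete(self, key):
        self.log.append(("delete", key, None))
        self._apply("delete", key, None)


table = WalTable()
table.put("k1", "v1")
table.put("k2", "v2")
table.delete("k1")

# A "restarted" node rebuilds the same state from the log alone.
recovered = WalTable(log=list(table.log))
```

Because state is derived purely from the log, replication of the log (what BookKeeper provides) is enough to replicate the database.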
Interactive querying of streams using Apache Pulsar (Jerry Peng, StreamNative)
As applications become more reliant on real-time data, streaming/messaging platforms have become increasingly popular and crucial to any data pipeline. Currently, many streaming/messaging platforms are used only to access the most recent events in a stream; however, there is tremendous value to be unlocked if the full history of streams can be queried interactively. Pulsar SQL is a query layer built on top of Apache Pulsar, a next-generation messaging platform, that enables users to dynamically query all streams, old and new, stored inside Pulsar. Users can thus unlock insights from both new and historical streams of data in a single system. Pulsar SQL leverages Presto and Apache Pulsar's unique architecture to execute queries in a highly scalable fashion, regardless of the number of topic partitions that make up the streams. In this talk, we will examine the use cases and advantages of interactively querying events within a streaming/messaging platform, and how Pulsar enables users to do so in a user-friendly and efficient manner.
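To make the idea concrete, a Pulsar SQL session exposes topics as tables under a `pulsar` catalog, keyed by tenant/namespace. The topic and column names below are hypothetical; only `__publish_time__` (an internal column Pulsar SQL attaches to every row) comes from the actual system:

```sql
-- List the topics in a namespace as tables
SHOW TABLES IN pulsar."public/default";

-- Query old and new events alike, filtered by publish time
SELECT user_id, event_type, __publish_time__
FROM pulsar."public/default"."clickstream"
WHERE __publish_time__ > timestamp '2020-01-01 00:00:00'
LIMIT 10;
```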
Introducing Kafka-on-Pulsar: bring native Kafka protocol support to Apache Pu... (StreamNative)
Kafka-on-Pulsar has been one of the most anticipated features in the Pulsar ecosystem. The Kafka-on-Pulsar project was initiated by StreamNative and the OVHCloud team quickly joined the project to collaborate on its development. Kafka-on-Pulsar enables Kafka applications to leverage Pulsar’s powerful features, such as streamlined operations with enterprise-grade multi-tenancy, without modifying code.
In this webinar, Sijie Guo from StreamNative and Pierre Zemb from OVHCloud will introduce KoP and discuss the following:
1. What are the key benefits?
2. What is the protocol handler and how does it work?
3. How is KoP implemented?
4. What are the new use cases it unlocks?
5. Watch a Live Demo!
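The protocol handler in point 2 plugs into the broker purely through configuration. A sketch of what enabling KoP in `broker.conf` looks like (setting names and paths vary by version; treat these as illustrative and consult the KoP documentation for the exact keys):

```
# Load the Kafka protocol handler plugin
messagingProtocols=kafka
protocolHandlerDirectory=./protocols
# Address the broker advertises to Kafka clients
kafkaListeners=PLAINTEXT://127.0.0.1:9092
```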
Pulsar Storage on BookKeeper: Seamless Evolution (StreamNative)
Apache Pulsar has a distinct architecture from other messaging systems. There is a clear separation of the compute layer, which does message processing and dispatching, from the storage layer, which handles persistent message storage using Apache BookKeeper. This separation of concerns leads to a very efficient design, in terms of both performance and cost.
Messaging systems that provide guaranteed delivery, when used in production, place demands on the underlying storage that are very different from simple benchmark scenarios that test write throughput. Pulsar, with both I/O isolation and separation of concerns, performs better than other messaging systems in production use cases. I/O isolation extracts better performance from each storage node at lower cost, and the separation between compute and storage means that compute nodes can be scaled independently of storage nodes. Whatever the choice of storage, Pulsar can be configured to get the best performance out of that configuration.
This paper also discusses how some of the latest technologies, such as NVMe and persistent memory, can be leveraged by Pulsar at very low cost overhead, without any architectural or design changes, with data from real use cases. Our experience validates the fundamental choice of BookKeeper as the storage layer for Pulsar.
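The I/O isolation described above comes from BookKeeper's split between its write-ahead journal and its long-term ledger storage, which can be placed on separate devices so fsync-heavy journal appends never contend with backlog reads. A sketch of the relevant bookie settings in `bk_server.conf` (paths are examples only; check the BookKeeper configuration reference for your version):

```
# Journal on a fast, fsync-friendly device (e.g. NVMe)
journalDirectories=/mnt/nvme/bk-journal
# Ledger storage striped across cheaper, larger disks
ledgerDirectories=/mnt/hdd1/bk-ledgers,/mnt/hdd2/bk-ledgers
```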
Nozomi from Yahoo! Japan gave a presentation on how Yahoo! Japan uses Apache Pulsar to build its internal messaging platform, which processes tens of billions of messages every day. He explains why Yahoo! Japan chose Pulsar, the use cases it serves, and their best practices.
#PulsarBeijingMeetup
How Orange Financial combat financial frauds over 50M transactions a day usin... (JinfengHuang3)
You will learn how Orange Financial combats financial fraud across over 50M transactions a day using Apache Pulsar. The presentation was shared at the Strata Data Conference in New York, US, in September 2019.
Kafka on Pulsar: bringing native Kafka protocol support to Pulsar (Sijie & Pierre, StreamNative)
Apache Pulsar is a distributed, open-source pub-sub messaging system. It offers many advantages over Kafka, such as multi-tenancy, geo-replication, decoupled storage, and even SQL and FaaS directly integrated. The only thing missing for wide adoption is support for the de-facto standard for streaming: Kafka. And this is how our story begins.
In this talk, Sijie Guo from StreamNative and Pierre Zemb from OVHcloud will share the journey of building Kafka-on-Pulsar (KoP) to bring native Kafka protocol support to Pulsar. Before joining forces on KoP, OVHcloud implemented a Kafka proxy in Rust capable of translating the Kafka protocol to the Pulsar protocol on the fly, and encountered some challenges. After realizing that StreamNative was working on bringing the Kafka protocol natively to the Pulsar broker via a pluggable protocol handler mechanism, OVHcloud joined forces with StreamNative to bring Kafka protocol support to Pulsar brokers.
At the end of this talk, you will know more about the inner workings of Kafka and Pulsar. You'll also get feedback from both companies on their initial proofs of concept and the current implementation.
Scaling customer engagement with Apache Pulsar (StreamNative)
Iterable's platform is used by marketers to reach hundreds of millions of users every day, and those numbers are growing quickly. Iterable's infrastructure is built with pub-sub messaging at its core, so the reliability, scalability, and flexibility of that system are business critical.
In this talk we'll discuss why Iterable chose Pulsar as a pub-sub messaging system, as well as how Iterable is taking advantage of some of the more recently added features in Pulsar. We'll also talk about some of the challenges we encountered, where we think Pulsar can improve, and some contributions we've made to the open source community around Pulsar.
Lessons from managing a Pulsar cluster (Nutanix) (StreamNative)
In this presentation, we will cover:
- How to performance test and optimize a Pulsar cluster. We will present how we load tested Pulsar with Locust and, following this, how we tuned our configurations for our use cases.
- Event sourcing patterns with Apache Pulsar: the Avro schema usage, compatibility choices, and schema evolution on Pulsar topics that worked for us.
- Bonus: how we source Apache Flink from Apache Pulsar and run our workflows.
By attending this webinar, you can expect to come away with:
- How to performance test a Pulsar cluster for your use case.
- How to leverage the highly configurable broker and BookKeeper to suit your needs.
- Event sourcing patterns on top of Apache Pulsar.
- Avro schema usage, compatibility choices, and evolution.
- Familiarity with the Pulsar connector for Flink and possible use cases.
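The schema-compatibility choices mentioned above can be illustrated in plain Python (this is a schematic sketch, not the Avro library or Pulsar's schema registry): adding a new field *with a default* keeps the change backward compatible, because a reader holding the new schema can still decode records written under the old one by filling in the default.

```python
# Schematic model of Avro-style backward-compatible schema evolution.
# Schemas are simplified to a list of field descriptors.

OLD_SCHEMA = {"fields": [{"name": "user_id"},
                         {"name": "amount"}]}

NEW_SCHEMA = {"fields": [{"name": "user_id"},
                         {"name": "amount"},
                         {"name": "currency", "default": "USD"}]}  # added field

def read_with_schema(record, schema):
    """Resolve a record against a reader schema, filling in defaults."""
    out = {}
    for field in schema["fields"]:
        if field["name"] in record:
            out[field["name"]] = record[field["name"]]
        elif "default" in field:
            out[field["name"]] = field["default"]
        else:
            raise ValueError("missing field %r and no default" % field["name"])
    return out

old_record = {"user_id": 42, "amount": 9.99}        # written under OLD_SCHEMA
decoded = read_with_schema(old_record, NEW_SCHEMA)  # read under NEW_SCHEMA
```

Dropping a field or adding one without a default would make `read_with_schema` raise for old records, which is exactly the kind of change a BACKWARD compatibility rule on a topic is there to reject.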
At an Apache Pulsar meetup, Jia Zhai from StreamNative presented KoP (Kafka-on-Pulsar), which brings native Kafka protocol support to the Pulsar broker. He gave a demo showing how Kafka clients and Pulsar clients can work seamlessly on the same data, and how Kafka Connectors can work on a Pulsar cluster.
Query Pulsar Streams using Apache Flink (StreamNative)
Both Apache Pulsar and Apache Flink share a similar view on how the data and computation levels of an application can be "streaming-first", with batch as a special case of streaming. With Apache Pulsar's segmented-stream storage and Apache Flink's steps to unify batch and stream processing workloads under one framework, there are numerous ways of integrating the two technologies to provide elastic data processing at massive scale and build a real streaming warehouse.
In this talk, Sijie Guo from the Apache Pulsar community will share the latest integrations between Apache Pulsar and Apache Flink. He will explain how Apache Flink can integrate with and leverage Pulsar's built-in, efficient schemas to allow Flink SQL users to query Pulsar streams in real time.
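As a hypothetical illustration of what "Flink SQL over Pulsar" looks like in practice, a Pulsar topic can be declared as a Flink table and queried directly. Connector option names differ across connector versions, and the topic and columns here are invented, so treat this purely as a sketch:

```sql
CREATE TABLE clicks (
  user_id BIGINT,
  url STRING,
  ts TIMESTAMP(3)
) WITH (
  'connector'   = 'pulsar',
  'service-url' = 'pulsar://localhost:6650',
  'topics'      = 'persistent://public/default/clicks',
  'format'      = 'avro'
);

SELECT url, COUNT(*) AS hits
FROM clicks
GROUP BY url;
```

Because the table schema can be derived from the schema Pulsar already stores with the topic, the declaration stays thin and the query runs continuously over the stream.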
Serverless Event Streaming with Pulsar Functions (StreamNative)
The last few years have seen the emergence of serverless as a paradigm for event streaming. Its very simple programming model has attracted developers in droves, while its ability to scale elastically has simplified operations significantly. Combined with its ubiquity across all cloud providers, serverless has become the leading choice for event processing at scale for many companies.
In this talk, Sijie Guo from StreamNative will explore how the serverless paradigm is applied to event streaming in Apache Pulsar, a next-generation event streaming system. Pulsar provides native support for serverless functions, where events are processed as soon as they arrive, in a streaming manner, with flexible deployment options (thread, process, container). He will describe how these serverless functions make data engineering easier and share real-world usage of Pulsar Functions.
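The "very simple programming model" is essentially one function per event. In Pulsar's Python runtime, a plain function that maps an input event to an output event is enough; the runtime wires the topics around it. A minimal sketch (the deployment command and topic names are placeholders, and the exact `pulsar-admin` flags vary by version):

```python
# exclaim.py: a minimal Pulsar Function in "native" Python style.
# The runtime calls process() once per event on the input topic and
# publishes the return value to the output topic. Deployed with
# something along the lines of:
#   pulsar-admin functions create --py exclaim.py \
#       --inputs in-topic --output out-topic ...

def process(input):
    """Append an exclamation mark to each incoming string event."""
    return input + "!"
```

Because the function holds no connection or threading logic of its own, the runtime is free to run it as a thread, a process, or a container, which is exactly the deployment flexibility described above.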
Pulsar is a great technology, but it is also a new, less well-known technology competing against incumbent technologies, which is always a bit of a tough sell.
In this talk, we will go over the whole end-to-end process of how we researched, advocated, built, integrated, and established Apache Pulsar at Instructure in less than a year. We will share details of how Pulsar's capabilities differentiate it, how we deploy Pulsar, and how we focused on an ecosystem of tools to accelerate adoption. We will also discuss one major motivating use case: change-data-capture for hundreds of database servers at scale.
Jay Kreps is a Principal Staff Engineer at LinkedIn where he is the lead architect for online data infrastructure. He is among the original authors of several open source projects including a distributed key-value store called Project Voldemort, a messaging system called Kafka, and a stream processing system called Samza. This talk gives an introduction to Apache Kafka, a distributed messaging system. It will cover both how Kafka works, as well as how it is used at LinkedIn for log aggregation, messaging, ETL, and real-time stream processing.
Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_... (StreamNative)
Nowadays, real-time computation is heavily used in cases such as online product recommendation and online payment fraud detection. In the streaming pipeline, Kafka is normally used to store a day or a week of data, but not years of data for looking at trends historically, so a batch pipeline is needed for historical data computation. This is where the Lambda architecture comes in. Lambda has proved effective and strikes a good balance of speed and reliability, and we have run many systems on the Lambda architecture for many years. But its biggest drawback is the need to maintain two distinct (and possibly complex) systems for the batch and streaming layers. We have to split our business logic into many segments across different places, which is a challenge to maintain as the business grows and also increases communication overhead. Secondly, the data is duplicated in two different systems, and we have to move data between systems for processing. Facing these challenges, we searched for alternatives and found Apache Pulsar a great fit. In this talk, I will show how we solve those problems by making Pulsar a unified storage backend for both the batch and streaming pipelines, a solution that simplifies the software stack, improves our work efficiency, and lowers cost at the same time.
Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar - Pulsar Sum... (StreamNative)
The highest message delivery guarantee that Apache Pulsar provides is 'exactly-once' producing to a single partition via the Idempotent Producer: users are guaranteed that every message produced to a single partition through an Idempotent Producer is persisted exactly once, without data loss. However, there is no atomicity when a producer attempts to produce messages to multiple partitions. On the consumer side, acknowledgment is a best-effort operation that can result in message redelivery, so consumers may receive duplicate messages; Pulsar guarantees only 'at-least-once' consumption. This creates inconvenience and complexity when you use Pulsar to build mission-critical services (such as billing services).
We introduce transaction support in the Pulsar 2.8.0 release to simplify building reliable and fault-resilient services using Apache Pulsar and Pulsar Functions. It provides the capability to achieve end-to-end exactly-once semantics for streaming jobs in other stream processing engines.
This presentation dives deep into the details of Pulsar transactions and how they are applied to Pulsar Functions and other processing engines to achieve transactional event streaming. We will cover how Pulsar transactions work and how Pulsar Functions offer transaction support on top of them.
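The gap transactions close can be modeled in a few lines of plain Python (this is a conceptual model of the semantics, not the Pulsar client API): messages produced to several partitions inside a transaction become visible to consumers atomically on commit, and not at all on abort.

```python
# Conceptual model of transactional multi-partition produce.

class Partition:
    def __init__(self):
        self.visible = []   # what consumers are allowed to read
        self.pending = {}   # txn_id -> messages buffered for that txn

class Txn:
    _next_id = 0

    def __init__(self):
        Txn._next_id += 1
        self.id = Txn._next_id
        self.parts = set()

    def produce(self, partition, msg):
        """Buffer a message; it is invisible until the txn commits."""
        partition.pending.setdefault(self.id, []).append(msg)
        self.parts.add(partition)

    def commit(self):
        """All-or-nothing: publish every buffered message on every partition."""
        for p in self.parts:
            p.visible.extend(p.pending.pop(self.id, []))

    def abort(self):
        for p in self.parts:
            p.pending.pop(self.id, None)


p0, p1 = Partition(), Partition()
txn = Txn()
txn.produce(p0, "debit:10")
txn.produce(p1, "credit:10")
before = (list(p0.visible), list(p1.visible))  # nothing visible yet
txn.commit()                                   # both appear together
```

Without transactions, the two `produce` calls could each succeed or fail independently, which is precisely the missing atomicity the abstract describes for billing-style workloads.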
Both Apache Pulsar and Apache Flink share a similar view on how the data and computation levels of an application can be "streaming-first", with batch as a special case of streaming. With Apache Pulsar's segmented-stream storage and Apache Flink's steps to unify batch and stream processing workloads under one framework, there are numerous ways of integrating the two technologies to provide elastic data processing at massive scale and build a real streaming warehouse.
In this talk, Sijie Guo from the Apache Pulsar community will give an overview of Apache Pulsar and how it provides a unified data view that fully leverages Apache Flink's unified computation runtime for elastic data processing. He will share the latest integrations between Apache Pulsar and Apache Flink, especially around effectively-once processing and schema integration.
At Clever Cloud, we are working on extremely light virtual machines that run WebAssembly binaries. Since it is WASM, we can write the code in many languages. We use a custom unikernel to run this WASM as Function-as-a-Service, with one VM per function execution. These VMs can run on events from messages coming through Pulsar or from HTTP invocations; execution is on-demand, as only the consumers stay up. This could be a new model: Pulsar Functions with real isolation for multi-tenancy use cases. This talk will show the use case, explain the virtualization underneath, and demonstrate multi-tenancy in action.
How Apache Pulsar Helps Tencent Process Tens of Billions of Transactions Effi... (StreamNative)
As the largest provider of Internet products and services in China, Tencent serves billions of users and over a million merchants—and these numbers are growing fast! Tencent’s enterprises generate a huge volume of financial transactions, placing a tremendous load on their billing service, which processes hundreds of millions of dollars in revenue each day.
Because Tencent had been unable to scale its current billing service to handle its rapidly growing business, the possibility of data loss had become an escalating concern. To ensure data consistency, the company decided to redesign its system’s transaction processing pipeline. After evaluating the pros and cons of several messaging systems, Tencent chose to implement its billing service using Apache Pulsar. As a result, Tencent can now run their billing service on a very large scale with virtually no data loss.
In this talk, Ningguo Chen, Chief Architect of Tencent Billing, will share their journey of adopting Pulsar in their core transaction processing engine to process tens of billions of events every day. He will also discuss the problems they encountered in using Pulsar and the improvements they made to meet their scale.
Integrating Apache Pulsar with Big Data Ecosystem (StreamNative)
At the Apache Pulsar Beijing Meetup, Yijieshen gave a presentation on the current state of integrating Apache Pulsar with the big data ecosystem. He explains why and how Pulsar fits into current big data computing and query engines, and how Pulsar integrates with Spark, Flink, and Presto for a unified data processing system.
I Heart Log: Real-time Data and Apache Kafka (Jay Kreps)
This presentation discusses how logs and stream-processing can form a backbone for data flow, ETL, and real-time data processing. It will describe the challenges and lessons learned as LinkedIn built out its real-time data subscription and processing infrastructure. It will also discuss the role of real-time processing and its relationship to offline processing frameworks such as MapReduce.
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM (Yahoo! Developer Network)
Slides from LINE Developer Meetup #68 - Big Data Platform, covering the HDFS major version upgrade and the production rollout of Router-based Federation (RBF). Event page: https://line.connpass.com/event/188176/
Kafka on Pulsar:bringing native Kafka protocol support to Pulsar_Sijie&PierreStreamNative
Apache Pulsar is a distributed and open-source pub-sub messaging system. It offers many advantages over Kafka, such as multi-tenant, geo-replication, decoupled storage or even SQL and FaaS directly integrated. The only thing missing for wide adoption is support for the de-facto standard for streaming: Kafka. And this is how our story begins.
In this talk, Sijie Guo from StreamNative and Pierre Zemb from OVHcloud will share the journey on building Kafka-on-Pulsar (KoP) to bring native Kafka protocol support to Pulsar. Before joining the force on building KoP, OVHcloud implemented a Kafka proxy in Rust capable of transforming the Kafka protocol to that Pulsar on the fly and encountered some challenges. After realizing that StreamNative was working on bringing the Kafka protocol natively to Pulsar broker via a pluggable protocol handler mechanism. OVHCloud joined forces with StreamNative to work on brining Kafka protocol support to Pulsar brokers.
At the end of this talk, you will know more about the inner workings of Kafka and Pulsar. You'll also get feedback from both companies from their initial proofs of concepts and the current implementation.
Scaling customer engagement with apache pulsarStreamNative
Iterable's platform is used by marketers to reach hundreds of millions of users every day, and those numbers are quickly growing. Iterable's infrastructure is built with pub-sub messaging at it's core, so the reliability, scalability and flexibility provided by that system are business critical.
In this talk we'll discuss why Iterable chose Pulsar as a pub-sub messaging system, as well as how Iterable is taking advantage of some of more recently added features in Pulsar. We'll also talk about some of the challenges we encountered, where we think Pulsar can improve, and some contributions we've made to the open source community around Pulsar.
Lessons from managing a Pulsar cluster (Nutanix)StreamNative
In this presentation, we will cover:
- How to performance test and optimize a Pulsar cluster. We will present how we load tested Pulsar with locust and, following this, how we tuned our configurations for our use cases.
- Event sourcing pattern with Apache Pulsar. Avro schema usage, compatibility choices and schema evolution on pulsar topics that worked for us.
- Bonus: How we source Apache Flink from apache pulsar and run our workflows.
By attending this webinar, you can expect to come away with:
- How to performance test a Pulsar cluster for your use case.
- How to leverage the highly configurable broker and Bookkeeper to suit your needs.
- Event sourcing patterns on top of Apache Pulsar.
- Avro schema usage, compatibility choices, and evolution.
- Familiarise with pulsar connector for Flink and possible use cases.
In Apache Pulsar Meetup, Jia Zhai from StreamNative presents KoP (Kafka-on-Pulsar) which bring native Kafka protocol support on Pulsar broker. He gave a demo about how to use Kafka clients and Pulsar clients can work seamlessly on same data, and how Kafka Connectors can work on a Pulsar cluster.
Query Pulsar Streams using Apache FlinkStreamNative
Both Apache Pulsar and Apache Flink share a similar view on how the data and the computation level of an application can be “streaming-first” with batch as a special case streaming. With Apache Pulsar’s Segmented-Stream storage and Apache Flink’s steps to unify batch and stream processing workloads under one framework, there are numerous ways of integrating the two technologies to provide elastic data processing at massive scale, and build a real streaming warehouse.
In this talk, Sijie Guo from the Apache Pulsar community will share the latest integrations between Apache Pulsar and Apache Flink. He will explain how Apache Flink can integrate and leverage Pulsar’s built-in efficient schemas to allow users of Flink SQL query Pulsar streams in realtime.
Serverless Event Streaming with Pulsar FunctionsStreamNative
The last few years have seen the emergence of Serverless as a paradigm for event streaming. Its very simple programming model has attracted developers in droves. At the same time, its ability to elastically scale has simplified operations significantly. Combined together with the ubiquity of their presence across all cloud providers, serverless today has become the leading choice to do event processing at scale for a lot of companies.
In this talk, Sijie Guo from StreamNative will explore how the serverless paradigm is applied to event streaming in Apache Pulsar, a next-generation event streaming system. Pulsar provides native support for serverless functions where the events are processed as soon as they arrive in a streaming manner and that provides flexible deployment options (thread, process, container). He will describe how these serverless functions make data engineering easier and share the real world usage of Pulsar Functions.
Pulsar is a great technology, but it is also a new, less well-known technology competing against incumbent technologies, which is always a bit of a tough sell.
In this talk, we will go over the whole end-to-end process of how we researched, advocated, built, integrated, and established Apache Pulsar at Instructure in less than a year. We will share details of how Pulsar's capabilities differentiate it, how we deploy Pulsar, and how we focused on an ecosystem of tools to accelerate adoption. We will also discuss one major motivating use case of change-data-capture for hundreds of databases servers at scale.
Jay Kreps is a Principal Staff Engineer at LinkedIn where he is the lead architect for online data infrastructure. He is among the original authors of several open source projects including a distributed key-value store called Project Voldemort, a messaging system called Kafka, and a stream processing system called Samza. This talk gives an introduction to Apache Kafka, a distributed messaging system. It will cover both how Kafka works, as well as how it is used at LinkedIn for log aggregation, messaging, ETL, and real-time stream processing.
Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_...StreamNative
Nowadays, real-time computation is heavily used in cases such as online product recommendation, online payment fraud detection and etc.. In the streaming pipeline, Kafka is normally used to store a day/week data, but won't store years-long data, as in looking at the trend historically. So, a batch pipeline is needed for historical data computation. Thus, it's where the Lambda architecture comes in. Lambda has been proved to be effective, and a good balance of speed and reliability. We have been running many systems with Lambda architecture for many years. But the biggest detraction to Lambda architecture has been the need to maintain two distinct (and possibly complex) systems to generate both batch and streaming layers. With that, we have to split our business logic into many segments across different places, which is a challenge to maintain as the business grows and it also increases communication overhead. Secondly, the data are duplicated in two different systems, and we have to move data among different systems for processing. With those challenges, we have been searching for alternatives and found Apache Pulsar a great fit. In this topic, I will show how we solve those problems with Apache Pulsar by making pulsar a unified storage backend for both batch and streaming pipeline, a solution that simplifies the s/w stack, lifts up our work efficiency and lowers the cost at the same time.
Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar - Pulsar Sum...StreamNative
The highest message delivery guarantee that Apache Pulsar provides is 'exactly-once', producing at a single partition via Idempotent Producer. Users are guaranteed that every message produced to a single partition via an Idempotent Producer will be persisted exactly once, without data loss. However, there is no 'atomicity' when a producer attempts to produce messages to multiple partitions. From the consumer side, acknowledgment is a best-effort operation, which results in message redelivery, hence consumer will receive duplicate messages. Pulsar only guarantees 'at-least-once' consumption for consumers. It creates inconvenience and brings in complexity when you use Pulsar to build mission-critical services (such as billing services).
We introduce Transaction support in Pulsar 2.8.0 release, to simplify the process of building reliable and fault resilient services using Apache Pulsar and Pulsar Functions. It provides the capability to achieve end-to-end exactly-once for streaming jobs in other stream processing engines.
This presentation deep dives into the details of Pulsar transaction and how Pulsar transaction is applied to Pulsar Functions and other processing engines to achieve transactional event streaming. We will cover how Pulsar transaction works and how Pulsar Functions offers transaction support using Pulsar transaction.
Both Apache Pulsar and Apache Flink share a similar view on how the data and the computation level of an application can be “streaming-first” with batch as a special case streaming. With Apache Pulsar’s Segmented-Stream storage and Apache Flink’s steps to unify batch and stream processing workloads under one framework, there are numerous ways of integrating the two technologies to provide elastic data processing at massive scale, and build a real streaming warehouse.
In this talk, Sijie Guo from Apache Pulsar community will given an overview of Apache Pulsar and how it provides the unified data view to fully leverage Apache Flink unified computation runtime for elastic data processing. He will share the latest integrations between Apache Pulsar and Apache Flink, especially around effectively-once processing and schema integration.
At Clever Cloud, we are working on extremely light virtual machines to run WebAssembly binaries. Since it's WASM, we can write code in many languages. We use a custom unikernel to run this WASM as Function-as-a-Service, using one VM per function execution. These VMs can run on events from messages coming through Pulsar, or from HTTP invocations; the run is on-demand, as only the consumers stay up. This can be a new model: Pulsar functions with real isolation for multi-tenancy use cases. This talk will show the use case, explain the virtualization underneath and demonstrate the multi-tenancy use case.
How Apache Pulsar Helps Tencent Process Tens of Billions of Transactions Effi...StreamNative
As the largest provider of Internet products and services in China, Tencent serves billions of users and over a million merchants—and these numbers are growing fast! Tencent’s enterprises generate a huge volume of financial transactions, placing a tremendous load on their billing service, which processes hundreds of millions of dollars in revenue each day.
Because Tencent had been unable to scale its current billing service to handle its rapidly growing business, the possibility of data loss had become an escalating concern. To ensure data consistency, the company decided to redesign its system’s transaction processing pipeline. After evaluating the pros and cons of several messaging systems, Tencent chose to implement its billing service using Apache Pulsar. As a result, Tencent can now run their billing service on a very large scale with virtually no data loss.
In this talk, Ningguo Chen, the Chief Architect of Tencent Billing, will share their journey of adopting Pulsar in their core transaction processing engine to process tens of billions of events every day. He will also discuss the problems they encountered in using Pulsar and the improvements they made to meet their scale.
Integrating Apache Pulsar with Big Data EcosystemStreamNative
At the Apache Pulsar Beijing Meetup, Yijie Shen gave a presentation on the current state of Apache Pulsar's integration with the big data ecosystem. He explains why and how Pulsar fits into current big data computing and query engines, and how Pulsar integrates with Spark, Flink and Presto to form a unified data processing system.
I Heart Log: Real-time Data and Apache KafkaJay Kreps
This presentation discusses how logs and stream-processing can form a backbone for data flow, ETL, and real-time data processing. It will describe the challenges and lessons learned as LinkedIn built out its real-time data subscription and processing infrastructure. It will also discuss the role of real-time processing and its relationship to offline processing frameworks such as MapReduce.
Learning plan
Skillsets required for Koha maintenance
Hardware requirements
Software requirements
Koha release schedules
Types of Koha implementation
Methods of Koha installation
How to update with changes in Koha.
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DMYahoo! Developer Network
Slides from LINE Developer Meetup #68 - Big Data Platform, covering the HDFS major version upgrade and the rollout of Router-based Federation (RBF). Event page: https://line.connpass.com/event/188176/
What will $0.08 get you in storage? Typically, not much. But $0.08 will change the way you think about storage and cause you to question everything storage vendors have told you. Find out more in this presentation.
The Implementing AI: High Performance Architectures webinar, hosted by KTN and eFutures, was the fourth event in the Implementing AI summer webinar series.
Every business is increasing the use of artificial intelligence to gain efficiency and to make better decisions. These new demands for data processing are not well delivered by traditional computer architectures. Enterprises, developers, data scientists, and researchers need new platforms that unify all AI workloads, simplifying infrastructure and accelerating ROI. This has led to the development of high performance and specialised hardware devices to meet these new demands.
The focus of this webinar was the impact of processing AI data on data centres, particularly from the technology perspective. The webinar had four presentations from experts, covering opportunities, implementation techniques and case studies, followed by a panel Q&A session.
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...HostedbyConfluent
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam K Dey | Current 2022
Robinhood’s mission is to democratize finance for all. Data-driven decision making is key to achieving this goal. The data needed is hosted in various OLTP databases, and replicating it in near real time, in a reliable fashion, to the data lakehouse powers many critical use cases for the company. At Robinhood, CDC is not only used for ingestion into the data lake but is also being adopted for inter-system message exchanges between different online microservices.
In this talk, we will describe the evolution of change data capture based ingestion in Robinhood not only in terms of the scale of data stored and queries made, but also the use cases that it supports. We will go in-depth into the CDC architecture built around our Kafka ecosystem using open source system Debezium and Apache Hudi. We will cover online inter-system message exchange use-cases along with our experience running this service at scale in Robinhood along with lessons learned.
Big Data means big hardware, and the less of it we can use to do the job properly, the better the bottom line. Apache Kafka makes up the core of our data pipelines at many organizations, including LinkedIn, and we are on a perpetual quest to squeeze as much as we can out of our systems, from Zookeeper, to the brokers, to the various client applications. This means we need to know how well the system is running, and only then can we start turning the knobs to optimize it. In this talk, we will explore how best to monitor Kafka and its clients to assure they are working well. Then we will dive into how to get the best performance from Kafka, including how to pick hardware and the effect of a variety of configurations in both the broker and clients. We’ll also talk about setting up Kafka for no data loss.
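As a concrete illustration of the "no data loss" configuration this talk covers, a common combination of broker and producer settings looks like the sketch below. Exact values depend on cluster size and Kafka version, so treat these as a starting point rather than a recipe:

```properties
# Broker-side defaults (sketch): keep three replicas, require two in sync,
# and never elect an out-of-sync replica as leader
default.replication.factor=3
min.insync.replicas=2
unclean.leader.election.enable=false

# Producer-side: wait for all in-sync replicas and retry idempotently
acks=all
enable.idempotence=true
```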
Apache Pulsar Development 101 with PythonTimothy Spann
Apache Pulsar Development 101 with Python PS2022_Ecosystem_v0.0
There is always a fear that a speaker cannot make it. Since I was the MC for the ecosystem track, I put together a backup talk just in case. Here it is: never seen or presented.
“Quantum” Performance Effects: beyond the CoreC4Media
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2Sbd5Ws.
Sergey Kuksenko talks about how (and how much) CPU microarchitecture details may influence application performance. Could it be visible to end-users? How to avoid misjudgment when estimating code performance? The CPU is a huge topic, which is why the talk is limited to the parts located outside the computational core (mostly caches and memory access). Filmed at qconsf.com.
Sergey Kuksenko works as a Java Performance Engineer at Oracle. His primary goal is making the Oracle JVM faster, digging into the JVM runtime, JIT compilers, and class libraries. His favorite area is the interaction of Java with modern hardware, which he has been working on since 2005, when he worked at Intel on the Apache Harmony performance team.
Dive to get an idea about Apache Kafka.
Apache Kafka is an open-source stream-processing software platform originally developed at LinkedIn and later donated to the Apache Software Foundation.
Updates on webSpoon and other innovations from Hitachi R&DHiromu Hota
Updates on webSpoon and introduction of SpoonGit (Git client integrated with Spoon) at PCM17 (10th Pentaho Community Meeting in Mainz, Germany, Nov 11, 2017)
New Generation of IBM Power Systems Delivering value with Red Hat Enterprise ...Filipe Miranda
New Generation of IBM Power Systems Delivering value with Red Hat Enterprise Linux - Learn about the new IBM Power8 architecture, about Red Hat Enterprise Linux 7 for Power Systems, and additional information from EnterpriseDB on how to migrate from Oracle to PostgreSQL.
UPDATED!
This presentation provides an overview of DataCore's Software-defined Storage Platform and insights into DataCore's latest world-record setting performance achievements on the SPC-1 benchmark. DataCore Parallel I/O, which is at the heart of DataCore's technology, is a unique approach to increasing storage performance by orders of magnitude without the need to acquire more and more hardware.
What to Expect for Big Data and Apache Spark in 2017 Databricks
Big data remains a rapidly evolving field with new applications and infrastructure appearing every year. In this talk, Matei Zaharia will cover new trends in 2016 / 2017 and how Apache Spark is moving to meet them. In particular, he will talk about work Databricks is doing to make Apache Spark interact better with native code (e.g. deep learning libraries), support heterogeneous hardware, and simplify production data pipelines in both streaming and batch settings through Structured Streaming.
Speaker: Matei Zaharia
Video: http://go.databricks.com/videos/spark-summit-east-2017/what-to-expect-big-data-apache-spark-2017
This talk was originally presented at Spark Summit East 2017.
ApacheCon @Home 2020
StreamPipes is an open source self-service IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams.
https://streampipes.apache.org/
Similar to Large scale log pipeline using Apache Pulsar_Nozomi (20)
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022StreamNative
So, you are a responsible software engineer building microservices for Apache Kafka, and life is good. Eventually, you hear the community talking about the outstanding experience they are having with Apache Pulsar features. They talk about infinite event stream retention, a rebalance-free architecture, native support for event processing, and multi-tenancy. Exciting, right? Most people would want to migrate their code to Pulsar, especially knowing that Pulsar also supports Kafka clients natively via the protocol handler known as KoP, which enables the Kafka client APIs on Pulsar. But, as said before, you are responsible, and you don't believe in fairy tales, just like you don't believe that migrations like this happen effortlessly. This session will discuss the architecture behind protocol handlers, what it means to have one enabled on Pulsar, and how KoP works. It will detail the effort required to migrate a microservice written for Kafka to Pulsar, and whether the code needs to change at all.
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...StreamNative
This talk describes Klaviyo’s internal messaging system, an asynchronous application framework built around Pulsar that provides a set of high-quality tools for building business-critical asynchronous data flows in unreliable environments. This framework includes: a Pulsar ORM and schema migrator for topic configuration; a retry/replay system; a versioned schema registry; a consumer framework oriented around preventing message loss in hostile environments while maximizing observability; an experimental “online schema change” for topics; and more. Development of this system was informed by lessons learned during heavy use of datastores like RabbitMQ and Kafka, and frameworks like Celery, Spark, and Flink. In addition to the capabilities of this system, this talk will also cover (sometimes painful) lessons learned about the process of converting a heterogeneous async-computing environment onto Pulsar and a unified model.
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...StreamNative
In this talk, learn how Toast leverages our Envoy control-plane to manage blue-green deploys of Pulsar consumers, and how this has helped drive adoption across the engineering organization. Dive into the history of Pulsar at Toast, starting from its introduction in 2019 to provide event-driven architecture across a rapidly scaling restaurant software platform. We will detail some of the hurdles that we encountered gaining buy-in across a diverse set of teams, and dive deep into how we enforce best practices and integrate with our service control plane.
Distributed Database Design Decisions to Support High Performance Event Strea...StreamNative
Event streaming architectures launched a reexamination of applications and systems architectures across the board. We live in a world where answers are needed now in a constant real-time flow. Yet beyond the event streaming system itself, what are the corequisites to ensure our large scale distributed database systems can keep pace with this always-on, always-current real time flow of data? What are the requirements and expectations for this next tech cycle?
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022StreamNative
Pulsar Functions is a succinct framework provided by Apache Pulsar to conduct real-time data processing. Its use cases include ETL pipeline, event-driven applications, and simple data analytics. While Pulsar Functions already provides an extremely simple programming interface, we want to further lower the barrier for users to access real-time data. Since SQL is one of the universal languages in the technology world and well accepted by the vast majority of data engineers, we decided to add a SQL expressing layer on top of Pulsar Functions runtime. In this talk, we will discuss the architecture and implementation of this new service. We will see how SQL syntax, Pulsar Functions, and Function Mesh can work together to deliver a unique user development experience for real-time data jobs in the cloud environment. We will also walk through use cases like filtering, routing, and projecting messages as well as integrating with the Pulsar IO Connectors framework.
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022StreamNative
Starting with version 2.10, the Apache ZooKeeper dependency has been eliminated and replaced with a pluggable framework that enables you to reduce the infrastructure footprint of Apache Pulsar by leveraging alternative metadata and coordination systems based on your deployment environment. In this talk, walk through the steps required to utilize the existing etcd service running inside Kubernetes to act as Pulsar's metadata store, thereby eliminating the need to run ZooKeeper entirely, leaving you with a Zookeeper-less Pulsar.
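Per the talk description, the switch is driven by the pluggable metadata store configuration in broker.conf. The keys below exist in Pulsar 2.10+, but the etcd endpoint is a placeholder and the exact URL syntax should be verified against your Pulsar version's documentation:

```properties
# broker.conf (Pulsar 2.10+): use etcd instead of ZooKeeper as the
# metadata store. "my-etcd:2379" is a placeholder endpoint.
metadataStoreUrl=etcd:http://my-etcd:2379
configurationMetadataStoreUrl=etcd:http://my-etcd:2379
```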
Apache Pulsar is a highly available, distributed messaging system that provides guarantees of no message loss and strong message ordering with predictable read and write latency. In this talk, learn how this can be validated for Apache Pulsar Kubernetes deployments. Various failures are injected using Chaos Mesh to simulate network and other infrastructure failure conditions. There are many questions that are asked about failure scenarios, but it could be hard to find answers to these important questions. When a failure happens, how long does it take to recover? Does it cause unavailability? How does it impact throughput and latency? Are the guarantees of no message loss and strong message ordering kept, even when components fail? If a complete availability zone fails, is the system configured correctly to handle AZ failures? This talk will help you find answers to these questions and apply the tooling and practices to your own testing and validation.
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...StreamNative
Despite what the Ghostbusters said, we’re going to go ahead and cross (or, join) the streams. This session covers getting started with streaming data pipelines, maximizing Pulsar’s messaging system alongside one of the most flexible streaming frameworks available, Apache Flink. Specifically, we’ll demonstrate the use of Flink SQL, which provides various abstractions and allows your pipeline to be language-agnostic. So, if you want to leverage the power of a high-speed, highly customizable stream processing engine without the usual overhead and learning curves of the technologies involved (and their interconnected relationships), then this talk is for you. Watch the step-by-step demo to build a unified batch and streaming pipeline from scratch with Pulsar, via the Flink SQL client. This means you don’t need to be familiar with Flink, (or even a specific programming language). The examples provided are built for highly complex systems, but the talk itself will be accessible to any experience level.
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022StreamNative
Apache Pulsar depends upon message acknowledgments to provide at-least-once or exactly-once processing guarantees. With these guarantees, any transmission between the broker and its producers and consumers requires an acknowledgment. But what happens if an acknowledgment is not received? Resending the message introduces the potential for duplicate processing and increases the likelihood of out-of-order processing. Therefore, it is critical to understand Pulsar's message redelivery semantics in order to prevent either of these conditions. In this talk, we will walk you through the redelivery semantics of Apache Pulsar and highlight some of the control mechanisms available to application developers to control this behavior. Finally, we will present best practices for configuring message redelivery to suit various use cases.
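The lost-acknowledgment scenario described above can be modeled in a few lines. This is an illustrative simulation, not the Pulsar client or broker implementation; names like `Subscription` and `ack_timeout_expired` are invented for the sketch:

```python
# Illustrative model of at-least-once redelivery: a message delivered but
# not acknowledged before the ack timeout goes back onto the backlog, so
# the consumer can observe (and must tolerate) duplicates.

from collections import deque

class Subscription:
    def __init__(self):
        self.backlog = deque()
        self.unacked = {}   # message id -> payload awaiting ack

    def publish(self, msg_id, payload):
        self.backlog.append((msg_id, payload))

    def receive(self):
        msg_id, payload = self.backlog.popleft()
        self.unacked[msg_id] = payload
        return msg_id, payload

    def acknowledge(self, msg_id):
        self.unacked.pop(msg_id, None)

    def ack_timeout_expired(self):
        # every unacked message returns to the backlog for redelivery
        for msg_id, payload in self.unacked.items():
            self.backlog.append((msg_id, payload))
        self.unacked.clear()

sub = Subscription()
sub.publish(1, "charge-card")
seen = []

m, _ = sub.receive(); seen.append(m)   # delivered, but the ack is "lost"
sub.ack_timeout_expired()              # broker redelivers
m, _ = sub.receive(); seen.append(m)
sub.acknowledge(m)

assert seen == [1, 1]   # the same message was processed twice
```

This is why consumers of at-least-once subscriptions are typically written to be idempotent.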
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...StreamNative
Lakehouses are quickly growing in popularity as a new approach to Data Platform Architecture bringing some of the long-established benefits from OLTP world to OLAP, including transactions, record-level updates/deletes, and changes streaming. In this talk, we will discuss Apache Hudi and how it unlocks possibilities of building your own fully open-source Lakehouse featuring a rich set of integrations with existing technologies, including Apache Pulsar. In this session, we will present: - What Lakehouses are, and why they are needed. - What Apache Hudi is and how it works. - Provide a use-case and demo that applies Apache Hudi’s DeltaStreamer tool to ingest data from Apache Pulsar.
Understanding Broker Load Balancing - Pulsar Summit SF 2022StreamNative
Pulsar is a horizontally scalable messaging system, so the traffic in a logical cluster must be balanced across all the available Pulsar brokers as evenly as possible, in order to ensure full utilization of the broker layer. You can use multiple settings and tools to control the traffic distribution which requires a bit of context to understand how the traffic is managed in Pulsar. In this talk, we will walk you through the load balancing capabilities of Apache Pulsar, and highlight some of the control mechanisms available to control the distribution of load across the Pulsar brokers. Finally, we will discuss the various loading shedding strategies that are available. At the end of the talk, you will have a better understanding of how Pulsar's broker level auto-balancing works, and how to properly configure it to meet your workload demands.
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022StreamNative
In today’s world, we are seeing a big shift toward the Cloud. With this shift comes a big shift in the expectations we have for a messaging system, especially when the messaging system is presented as managed service in a large-scale, multi-tenant environment. For any large-scale enterprise, it’s very important to evaluate messaging system and be confident before expanding complex distributed data systems like Apache Pulsar from on-premise to elastically scalable, fully managed services on cloud services. We must consider aspects such as: migration from and integration with large-scale on-premise clusters, security, cost efficiency, and the cloud friendliness of the architecture, modeling cost and capacity, tenant isolation, deployment robustness, availability, monitoring, etc. Not every messaging system is built to be cloud-native and run as a managed service with cost efficiency. We have been running large-scale Apache Pulsar at Yahoo for the last 8 years on various platforms and hardware configurations while meeting application SLAs and serving more than 1M topics in a cluster. In this talk, we will talk about Pulsar’s journey in Yahoo! from an on-premise platform to a hybrid cloud and on-premise system. We will talk about Pulsar’s architecture and features that make Pulsar a good cloud-native messaging-system choice for any enterprise.
Event-Driven Applications Done Right - Pulsar Summit SF 2022StreamNative
Pulsar Summit San Francisco is the event dedicated to Apache Pulsar. This one-day, action-packed event will include 5 keynotes, 12 breakout sessions, and 1 amazing happy hour. Speakers are from top companies, including Google, AWS, Databricks, Onehouse, StarTree, Intel, ScyllaDB, and more! It’s the perfect opportunity to network with Pulsar thought leaders in person.
Join developers, architects, data engineers, DevOps professionals, and anyone who wants to learn about messaging and event streaming for this one-day, in-person event. Pulsar Summit San Francisco brings the Apache Pulsar Community together to share best practices and discuss the future of streaming technologies.
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022StreamNative
Our services team creates, builds, and maintains the as-a-service offering for base platform services within our organization. Several thousand applications use these custom services daily, generating more than 700 million requests per minute. One of these services was our publish/subscribe offering, BQ, with a custom SDK and custom metrics, based on Apache Pulsar. BQ is the core communication service within our organization, handling more than 200M RPM. All the core processes of the organization depend on this service for operation: the CDC of any of our RDBMS or NoSQL offerings, all the eventing efforts of the organization, async communication between apps, notification systems, etc. The backend of the solution was Apache Pulsar running on EC2 on AWS, and on top of that we built several components as wrappers of the actual backend, creating our own SDKs and abstractions and in many ways extending the features provided by Pulsar. We had a multi-cluster setup 100% on AWS, with custom Pulsar Docker images running on large ASG setups, along with our own wrapping and admin APIs and DBs. All of this, in turn, made the solution volatile.
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022StreamNative
There is an increasing need to unleash analytical capabilities directly to the end-users to democratize decision-making. User-Facing Analytics is a new frontier that will shape the products of tomorrow and push the limits of existing technology. It demands a solution that will scale to millions of users to provide fast, real-time insights. In this session, Xiang will talk about his journey to build Apache Pinot to tackle the analytics problem space with the architectural changes and technology inventions made over the past decade. He will also talk about how other big data companies such as LinkedIn, Uber, and Stripe power their user-facing analytical applications.
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022StreamNative
Welcome and Opening Remarks - Pulsar Summit SF 2022StreamNative
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...StreamNative
Milvus is an open-source vector database that leverages a novel data fabric to build and manage vector similarity search applications. As the world's most popular vector database, it has already been adopted in production by thousands of companies around the world, including Lucidworks, Shutterstock, and Cloudinary. With the launch of Milvus 2.0, the community aims to introduce a cloud-native, highly scalable and extendable vector similarity solution, and the key design concept is log as data.
Milvus relies on Pulsar as its log pub/sub system. Pulsar helps Milvus reduce system complexity by loosely decoupling each microservice, and makes the system stateless by disaggregating log storage from computation, which also makes it further extendable. In this talk, we will introduce the overall design, the implementation details of Milvus, and its roadmap.
Takeaways:
1) Get a general idea about what is a vector database and its real-world use cases.
2) Understand the major design principles of Milvus 2.0.
3) Learn how to build a complex system with the help of a modern log system like Pulsar.
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...StreamNative
MQTT (Message Queuing Telemetry Transport) is a messaging protocol based on the pub/sub model, with the advantages of a compact message structure, low resource consumption, and high efficiency, making it suitable for IoT applications with low bandwidth and unstable network environments.
This session will introduce MQTT on Pulsar, which allows developers and users of the MQTT protocol to use Apache Pulsar. I will share the architecture, principles, and future plans of MoP to help you understand Apache Pulsar's capabilities and practices in the IoT industry.
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Large scale log pipeline using Apache Pulsar_Nozomi
1. Large scale log pipeline using Apache Pulsar
Yahoo Japan Corporation
Nozomi Kurihara
June 18th, 2020
2. Copyright (C) 2020 Yahoo Japan Corporation. All Rights Reserved. 2
Who am I?
Nozomi Kurihara
• Software engineer at Yahoo! JAPAN (April 2012 ~)
• Working on internal messaging platform using Apache Pulsar
• Committer of Apache Pulsar
• (Hobby: Board / video games!)
3. Agenda
1. Apache Pulsar at Yahoo! JAPAN
- About Yahoo! JAPAN
- Why Pulsar was chosen
- Architecture and performance
- Use cases
2. Large scale log pipeline
4. Apache Pulsar at Yahoo! JAPAN
5. Yahoo! JAPAN
https://www.yahoo.co.jp/
6. Yahoo! JAPAN – 3 numbers
• 100+ services
• 150,000+ servers (physical)
• 49,010,000+ login users per month (as of June 2019)
7. Pulsar at Yahoo! JAPAN
• We have used Apache Pulsar as a centralized messaging platform for 3.5 years
• One Pulsar maintainer team; many other teams (services) use Pulsar as “tenants”
[Diagram: Services A, B, and C each run their own producers and consumers, publishing to and consuming from Topics A, B, and C on a shared Pulsar cluster operated by the Pulsar team.]
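The tenant/topic structure shown here corresponds to Pulsar's topic naming scheme, `persistent://<tenant>/<namespace>/<topic>`. The parser below is only a sketch for illustration, not part of any Pulsar client library:

```python
# Split a fully qualified Pulsar topic name into its components.
# Tenant and namespace are the two levels that make multi-tenancy work:
# quotas, permissions, and policies attach to them, not to single topics.

def parse_topic(full_name):
    scheme, rest = full_name.split("://", 1)          # persistent | non-persistent
    tenant, namespace, topic = rest.split("/", 2)
    return {"persistence": scheme, "tenant": tenant,
            "namespace": namespace, "topic": topic}

t = parse_topic("persistent://service-a/logs/topic-a")
assert t["tenant"] == "service-a"
assert t["namespace"] == "logs"
assert t["topic"] == "topic-a"
```

The service names used here are illustrative, matching the diagram rather than real Yahoo! JAPAN tenants.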
8. Pulsar at Yahoo! JAPAN - Users
More and more services start to use Pulsar!
• 270+ tenants
• 4400+ topics
• ~50K publishes/s
• ~150K consumes/s
Typical use cases:
• Notification
• Job queueing
• Log pipeline
9. Pulsar community in Japan
TechBlog
- https://techblog.yahoo.co.jp/entry/20200312818173/
- https://techblog.yahoo.co.jp/entry/20200413827977/
- https://techblog.yahoo.co.jp/entry/2020060330002394/
Apache Pulsar Meetup Japan (in Tokyo)
- https://japan-pulsar-user-group.connpass.com/
10. Why Pulsar was chosen
11. Why did Yahoo! JAPAN choose Pulsar?
Large number of customers → High performance & scalability
Large number of services → Multi-tenancy
Sensitive/mission-critical messages → Security & Durability
Multiple data centers → Geo-replication
Pulsar meets all requirements!
12. Multi-tenancy
Share 1 Pulsar with all YJ services → low hardware and labor costs
[Diagram: before, Services A, B, and C each operate their own MQ; after, each service simply uses its own topic on the single Pulsar cluster operated by the Pulsar team]
13. Multi-tenancy – self-service
Users can create/configure/delete their topics by themselves
→ management of topics is delegated to users
Internal Web UI tool to manage topics (will be replaced with pulsar-manager):
[Screenshots: create tenant, create namespace, see topic stats]
14. Architecture and performance
15. Clusters in Yahoo! JAPAN
Two clusters, East and West, connected via geo-replication.
Each cluster runs:
• 20 WebSocket proxies
• 15 Brokers
• 10 Bookies
• 5 ZooKeepers
16. Performance – experimental settings
• Pulsar version: 2.3.2 (Broker) / 2.4.1 (Client)
• Tool: openmessaging-benchmark
• Message size: 1 KB
• Partitions: 1, 16, 32
• Attempted rate: 100,000 / 500,000 msg/s
• Server spec:
  - Broker: 2.00 GHz x2 CPU, 768 GB memory, SATA SSD 240 GB x2 (RAID1), 10GBase-T NIC
  - Bookie: 2.00 GHz x2 CPU, 768 GB memory, Journal: SATA SSD 240 GB x2 (RAID1), Ledger: SATA HDD 10 TB x12 (RAID1+0), 10GBase-T NIC
17. Performance – experimental results
- 16 and 32 partitions achieve 500,000 msg/s, whereas 1 partition does not
- The maximum publish rate with 1 partition appears to be about 200,000 msg/s
18. Tuning example (Bookie)
Problem:
• As the number of users increases, so do the writes to the journal SSD
• That shortens the SSD's lifespan (we actually saw frequent SSD failures)
Solution:
• Increase journalMaxGroupWaitMSec from 1 to 2
→ Writes decreased by 30%, at the cost of slightly higher latency
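For reference, the change is a single line in the Bookie configuration file. A sketch of the bookkeeper.conf fragment (only the journalMaxGroupWaitMSec setting is from this deck; the comments are ours):

```properties
# Wait up to 2 ms to group-commit journal entries before forcing a write
# (default: 1 ms). Larger values batch more entries per flush, reducing
# SSD writes at the cost of slightly higher publish latency.
journalMaxGroupWaitMSec=2
```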
19. Use cases
20. Case 1 – Notification of contents update
• Partner companies push various content files (weather, map, news, etc.) to Yahoo! JAPAN's FTP server
• ① When content is updated, a notification is sent to a topic
• ② Services A, B, and C receive the notification via their Consumers
• ③ They then fetch the content files from the file server
21. Case 2 – Job queuing in mail service
• Heavy jobs such as mail indexing are executed asynchronously
• Mail BE servers (Producers) register jobs to a Pulsar topic, and re-register a job if it fails
• Handlers for indexing (Consumers) take jobs from Pulsar and process them at their own pace
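The register / take / re-register flow can be sketched with a plain in-memory queue standing in for the Pulsar topic (the function names, retry limit, and toy indexing job are all illustrative, not the actual mail-service code):

```python
import queue

def register_job(topic: queue.Queue, job: dict) -> None:
    """Producer side: register a job on the topic."""
    topic.put(job)

def handle_jobs(topic: queue.Queue, process, max_retries: int = 3) -> list:
    """Consumer side: take jobs at our own pace; re-register a job if it fails."""
    results = []
    while not topic.empty():
        job = topic.get()
        try:
            results.append(process(job))
        except Exception:
            job["attempts"] = job.get("attempts", 0) + 1
            if job["attempts"] < max_retries:
                register_job(topic, job)  # re-register the failed job
    return results

# Toy "indexing" handler that fails on its first attempt
def index_mail(job: dict) -> str:
    if job.get("attempts", 0) == 0 and job["flaky"]:
        raise RuntimeError("transient failure")
    return f"indexed:{job['mail_id']}"

topic = queue.Queue()
register_job(topic, {"mail_id": 42, "flaky": True})
print(handle_jobs(topic, index_mail))  # ['indexed:42']
```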
22. Case 3 – Kafka alternative
We have an internal FaaS system based on Apache OpenWhisk
Problem: the FaaS team had to maintain its own Apache Kafka
Solution: migrate from Kafka to our internal Pulsar
The Pulsar Kafka Wrapper requires only a few configuration changes (pom.xml, topic names, etc.):
<dependency>
- <groupId>org.apache.kafka</groupId>
- <artifactId>kafka-clients</artifactId>
- <version>0.10.2.1</version>
+ <groupId>org.apache.pulsar</groupId>
+ <artifactId>pulsar-client-kafka</artifactId>
+ <version>2.4.0</version>
</dependency>
23. Large scale log pipeline
24. Situation
[Diagram: service developers deploy apps to computing PFs (PaaS, CaaS, FaaS, …) and monitor their logs/metrics]
25. Yamas
• Metrics monitoring / alerting platform (SaaS)
• Originally developed at Verizon Media
• Will be open-sourced soon!
26. Scale
• Total log volume: 1.4 to 3.8 TB/h
• Peak traffic: 10+ Gbps
• The number of PFs will keep increasing
27. Legacy architecture
[Diagram: apps on each computing PF (PaaS, CaaS, …) run both a Yamas agent and a Splunk agent, which send data directly to the monitoring PFs (Yamas, Splunk)]
☹ Need to install a dedicated “agent” for each Monitoring PF
☹ Difficult to scale out
☹ Traffic spikes directly influence the Monitoring PFs
28. Motivation
Remove the dedicated agent for each monitoring PF:
- No platform-specific knowledge or extra components needed
- Easier troubleshooting
Decouple sender/receiver PFs by introducing a message queueing layer:
- Scalability
- Resiliency
29. New architecture
[Diagram: apps on each computing PF (PaaS, CaaS, …) share a single Pulsar producer, which publishes to per-consumer topics (Splunk topic, Yamas topic) in Pulsar; Pulsar consumers deliver the data to the monitoring PFs (Splunk, Yamas)]
☺ Single library
☺ Easy to scale out
☺ Traffic spikes are mitigated by the queueing layer
30. Topic design – 3 patterns
① Producer-centric: messages are filtered/transformed on the Producer side
  ☺ Consumers don't care about Producers
  ☹ Producers care about Consumers
② Consumer-centric: messages are filtered/transformed on the Consumer side
  ☺ Producers don't care about Consumers
  ☹ Consumers care about Producers
③ Function: messages are filtered/transformed by a Function inside Pulsar
  ☺ Producers and Consumers don't care about each other
  ☹ Extra load: traffic, computing, storage, etc.
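Pattern ③ maps naturally onto Pulsar Functions. A minimal sketch in Python, assuming the native-function style where a plain process(input) is deployed between the producer-side and consumer-side topics; the message fields and the drop-debug-logs rule are illustrative, not from this deck:

```python
import json

def process(input):
    """Filter/transform a raw log message inside Pulsar so that neither
    Producers nor Consumers need to know about each other.
    Returning None publishes nothing to the output topic."""
    msg = json.loads(input)
    if msg.get("level") == "debug":
        return None  # drop debug logs before they reach the monitoring PF
    return json.dumps({
        "time": msg["time"],
        "domain": msg["domain"],
        "message": msg["body"]["message"],
    })

# Illustrative invocation (Pulsar would call process() once per message)
raw = json.dumps({
    "time": "2018-10-25T08:36:47.000Z",
    "domain": "paas",
    "body": {"message": "hello splunk"},
})
print(process(raw))
```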
31. Topic format and message format
Topic name: {consumer_pf}/{region}/{message_type}-{num}
  consumer_pf: splunk, yamas, … / region: west, east / message_type: log, metric, …
Examples:
  Pulsar (west): splunk/west/log-0, splunk/west/log-1, splunk/west/metric-0, yamas/west/metric-0, …
  Pulsar (east): splunk/east/log-0, splunk/east/log-1, splunk/east/metric-0, yamas/east/metric-0, …
Message format:
{
  "time": "2018-10-25T08:36:47.000Z",
  "producer": "paas-producer.example.com",
  "origin": "app.space.org.cluster.dc.nwseg",
  "domain": "paas",
  "body": {
    "message": "hello splunk",
    …
  }
}
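The naming rule can be captured in a small helper (the function name is ours, not from the deck):

```python
def topic_name(consumer_pf: str, region: str, message_type: str, num: int) -> str:
    """Build a topic name following {consumer_pf}/{region}/{message_type}-{num}."""
    return f"{consumer_pf}/{region}/{message_type}-{num}"

print(topic_name("splunk", "west", "log", 0))    # splunk/west/log-0
print(topic_name("yamas", "east", "metric", 0))  # yamas/east/metric-0
```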
32. Use case: Pulsar stats on Yamas
A Pulsar producer fetches /admin/v2/broker-stats/topics and publishes the stats to the Yamas topic, so Pulsar itself is monitored on Yamas.
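A sketch of how such a producer might turn the broker-stats response into per-topic metric messages for Yamas. The nested payload shape below is a simplified assumption (namespace → topic → stats), not the exact schema of /admin/v2/broker-stats/topics:

```python
def flatten_broker_stats(stats: dict) -> list:
    """Emit one flat record per numeric (topic, stat) pair,
    tagged with its namespace, ready to publish as a metric message."""
    records = []
    for namespace, topics in stats.items():
        for topic, topic_stats in topics.items():
            for name, value in topic_stats.items():
                if isinstance(value, (int, float)):
                    records.append({
                        "namespace": namespace,
                        "topic": topic,
                        "metric": name,
                        "value": value,
                    })
    return records

# Hypothetical, simplified stats payload
sample = {
    "splunk/west": {
        "persistent://splunk/west/log-0": {"msgRateIn": 1200.5, "storageSize": 4096},
    }
}
print(flatten_broker_stats(sample))
```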
33. Conclusion
34. Conclusion
Conclusion:
• Yahoo! JAPAN uses Pulsar as a centralized platform for various services
• Recently we started using Pulsar to build a large scale log pipeline where
computing PFs publish their logs/metrics and monitoring PFs consume them
• Pulsar plays an important role in connecting various PFs and making the
whole system scalable and resilient
Future plan:
• More Producer PFs and Consumer PFs
• Visualize SLIs (message delivery rate, latency, etc.)