Pulsar Virtual Summit North America 2021
Why Micro Focus Selected Pulsar for Data Ingestion
Srikanth Natarajan
Fellow, CTO, IT Operations Management Product Group, Micro Focus
Speaker Bio
Srikanth Natarajan is an experienced technology leader in the IT
Operations Management software industry, a Micro Focus Fellow and
CTO for the ITOM Product Group, and a former HP Distinguished
Technologist. He has engineered many successful products, led multiple
architectural transformations of products, and most recently, led the
transformation of a major product portfolio into a modern
containerized/cloud native architecture. He has also been responsible for
the recent market introduction of the OPTIC Data Lake from Micro
Focus. He has been granted over 20 patents for his contributions to
various inventions. He lives in Fort Collins, Colorado.
Agenda
This session will cover the experience of Micro Focus in consuming from and contributing to Apache Pulsar, the lessons learned, and the collaboration with a development support partner that helped us along the journey.
Micro Focus IT Operations Management (ITOM) Context
• Large portfolio of operations management products generating many forms of data
• Needed to process and store a variety of operational data in near real time
• Data is primarily time series, structured and semi-structured
• Data needed to be processed both in motion and at rest
• Micro Focus has Vertica1 technology for long-term storage/analytics but also needed a streaming engine for real-time transport and analysis of data
1https://www.vertica.com/
Our Technical Requirements for a Streaming Engine
• Enterprise/SaaS ready
• Scalable, multi-tenant, durable, extensible, and easy to productize
• Observable
• Low latency and high throughput across a variety of data
• Tiered storage support
• Easy to deploy and operate in production as containers in a Kubernetes cluster without any professional services support
• Ready to use, simple to deploy, and easy to operate both in the cloud and on-prem
Apache Pulsar provided us a great start. We integrated it with Vertica and created the Micro Focus OPTIC2 Data Lake.
2https://community.microfocus.com/it_ops_mgt/b/sws-571/posts/announcing-optic---the-operations-platform-for-transformation-intelligence-and-cloud
High Level Architecture of Our OPTIC Data Lake
[Architecture diagram: the streaming components run in a Kubernetes Cluster alongside a Vertica Cluster]
• (1) Data Input (streaming) enters through the Streaming Pipeline; (2) Batch Input (Express Load) enters through the Express Pipeline, with bulk load (S3 based) into an Object Store (Amazon S3 compatible)
• Messaging Bus (Pulsar based) providing brokering and storage: Scalable | Durable | Available | Observable | Multi-tenancy | ...
• Data Processing (Flink based), processing data in motion: Baselining, Forecasting, Aggregation, Advanced Event Correlation; Scalable | Extensible | Distributed
• Vertica Cluster, processing data at rest: Big Data | Analytics | Database, with BI Tools on top
• REST API Layer for data access, used by ITOM Capabilities (ITOM internal)
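The messaging bus above relies on Pulsar's multi-tenancy and on tiered storage into the S3-compatible object store. As a minimal sketch of what that kind of namespace configuration can look like with the Pulsar Java admin client (ours for illustration, not the actual OPTIC setup; the namespace name, retention values, and offload threshold are assumptions):

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.RetentionPolicies;

public class NamespaceSetup {
    public static void main(String[] args) throws Exception {
        // Hypothetical admin endpoint; the tenant "itom" is assumed to already exist.
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://pulsar-broker:8080")
                .build()) {

            String namespace = "itom/metrics";            // one namespace per data domain
            admin.namespaces().createNamespace(namespace);

            // Retain acknowledged data for up to 24 hours or 10 GB (illustrative values).
            admin.namespaces().setRetention(namespace, new RetentionPolicies(24 * 60, 10 * 1024));

            // Offload ledgers beyond ~1 GB per topic to the S3-compatible object store
            // (tiered storage, illustrative threshold).
            admin.namespaces().setOffloadThreshold(namespace, 1024L * 1024 * 1024);
        }
    }
}
```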
Vertica Ingestion Micro Service
[Diagram components: Data Sources, HTTP Receiver, Pulsar Client, Config Client, Pulsar Proxy, Broker, BookKeeper, ZooKeeper, Administration, Vertica Scheduler, Vertica]
1. Configure streaming
2. Create topic and subscription
3. Stream data
4. Push configuration
5. Get backlog
6. Invoke COPY (load) command
7. Read messages (Reader API) and store in DB
8. Send load status
9. Update cursor of subscription
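As a rough illustration of steps 2 and 3 above, here is a minimal sketch using the Pulsar Java admin and client APIs; the service URLs, topic, and subscription names are assumptions for illustration, not the actual OPTIC configuration.

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class StreamingSetup {
    public static void main(String[] args) throws Exception {
        String topic = "persistent://itom/metrics/opsb-metrics";   // illustrative topic name

        // Step 2: create the topic and a durable subscription whose cursor can be advanced later.
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://pulsar-broker:8080").build()) {
            admin.topics().createNonPartitionedTopic(topic);
            admin.topics().createSubscription(topic, "vertica-loader", MessageId.earliest);
        }

        // Step 3: the HTTP receiver side streams incoming payloads onto the topic via a producer.
        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://pulsar-proxy:6650").build();
             Producer<String> producer = client.newProducer(Schema.STRING)
                     .topic(topic).create()) {
            producer.send("{\"metric\":\"cpu.util\",\"value\":42}");   // illustrative payload
        }
    }
}
```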
Scheduler Message Streaming Overview
[Diagram components: Producer 1..n (Data Collectors), Receiver, Pulsar Message Bus, Administration Service, Scheduler for Pulsar Streaming, and a Vertica 3-node cluster with a UDx (reader) on each node]
0. Get config
1. Message ingestion
2. Get backlog
3. The scheduler periodically (frame by frame) schedules the micro-batch (µB) COPY commands and asks the UDx to read the messages from Pulsar for a topic

Pulsar readers are message processors much like Pulsar consumers, but with two crucial differences:
• you can specify where on a topic readers begin processing messages (consumers always begin with the latest available unacked message);
• readers don't retain data or acknowledge messages.

UDx: User-Defined Extensions
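A minimal sketch of the Reader API behaviour described above, using the Pulsar Java client; the topic and service URL are illustrative assumptions, and this is not the actual UDx code, which runs inside Vertica.

```java
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Reader;
import java.util.concurrent.TimeUnit;

public class BacklogReader {
    public static void main(String[] args) throws Exception {
        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://pulsar-proxy:6650").build();
             // Unlike a consumer, a reader is told exactly where to start on the topic
             // and never acknowledges anything; the caller tracks its own position.
             Reader<byte[]> reader = client.newReader()
                .topic("persistent://itom/metrics/opsb-metrics")
                .startMessageId(MessageId.earliest)       // or a previously stored cursor position
                .create()) {

            Message<byte[]> msg;
            while ((msg = reader.readNext(1, TimeUnit.SECONDS)) != null) {
                // In the real pipeline this batch would be handed to a Vertica COPY micro-batch.
                System.out.println(msg.getMessageId() + " -> " + msg.getData().length + " bytes");
            }
        }
    }
}
```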
Use Case: Event Correlation Using Pulsar and Flink
[Diagram: Data Sources send raw events through an HTTP Receiver into the Raw Event Topic on Apache Pulsar; a Pulsar Source Connector feeds a Flink Task Manager running Auto Event Correlation (coordinated by the Flink Job Manager and using ML Artifacts); correlated events flow through a Pulsar Sink Connector into the Correlated Event Topic, which is consumed by the Internal Notification Service and Micro Focus Operations Bridge Manager (Event Manager); raw and correlated events are also stored in the Vertica Database]
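In the deployed pipeline this topology is a Flink job wired up through the Pulsar Flink source and sink connectors. As a connector-free sketch of the topic topology only, the snippet below uses the plain Pulsar Java client; the topic names, subscription name, and the correlation step are all assumptions for illustration, not the actual correlation logic.

```java
import org.apache.pulsar.client.api.*;

public class CorrelationSketch {
    public static void main(String[] args) throws Exception {
        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://pulsar-proxy:6650").build();
             Consumer<String> rawEvents = client.newConsumer(Schema.STRING)
                .topic("persistent://itom/events/raw-events")            // illustrative topic
                .subscriptionName("auto-event-correlation")              // illustrative subscription
                .subscribe();
             Producer<String> correlated = client.newProducer(Schema.STRING)
                .topic("persistent://itom/events/correlated-events")     // illustrative topic
                .create()) {

            while (true) {
                Message<String> msg = rawEvents.receive();
                // Placeholder for the Flink auto-correlation operator, which groups
                // related raw events using the trained ML artifacts.
                String correlatedEvent = "correlated:" + msg.getValue();
                correlated.send(correlatedEvent);
                rawEvents.acknowledge(msg);
            }
        }
    }
}
```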
Our Results
● We verified in our lab ingesting streams at approximately 100 MB/s, i.e. 360 GB/hr or 8.64 TB/day, with our default setup of Kubernetes worker nodes
● We plan to double this in the near future
● With linear scalability, we should be able to support much more with additional resources added proportionally
Support from StreamNative
● Issues resolved across different versions of the connector and Pulsar
○ Data loss observed when using the Pulsar Flink connector (three releases had this issue, with multiple root causes)
○ Resolution for thread leaks in the Pulsar Flink connector
○ Flink connector stops streaming if a topic is recreated or when state is restored in Backup/Restore and DR scenarios
○ Security fixes for all the STAT issues observed (last two releases, with quick turnaround)
○ NullPointerException in the Flink connector
● Additional help/tools
○ Dynamic linking of libraries in the Pulsar C++ client
○ Multiple CA certificate support in Pulsar
○ Formula to compute storage given ingestion rate (see the sizing sketch below)
○ Performance tuning across cloud and on-prem deployments
○ State migration utility from StreamNative helped us migrate while upgrading from Flink 1.9 to Flink 1.11
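The exact formula StreamNative provided is not reproduced in these slides. As a back-of-the-envelope sketch of the kind of sizing it covers, the snippet below estimates BookKeeper storage from ingestion rate, retention window, and write quorum; every input value is an assumption, not an OPTIC or StreamNative number.

```java
public class StorageEstimate {
    public static void main(String[] args) {
        // All inputs are illustrative assumptions.
        double ingestionMBps = 100;   // sustained ingestion rate, as on the results slide
        double retentionHours = 24;   // how long data stays in BookKeeper before offload/expiry
        int writeQuorum = 2;          // copies of each entry kept on bookies
        double overhead = 1.3;        // journal, indexing, and compaction headroom

        double storageGB = ingestionMBps * 3600 * retentionHours * writeQuorum * overhead / 1024;
        System.out.printf("Estimated BookKeeper storage: %.0f GB%n", storageGB);
    }
}
```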
Current Open Issues
● NullPointerException in the broker; workaround provided, yet to validate
● Backup/Restore and DR use case not working when using Flink connector 2.4.28.4
● ER: Different TLS certificate configuration for geo-replication; PR already created by Sijie (https://github.com/apache/pulsar/pull/10710)

Note: In the session video recording, there was a reference to a Micro Focus internal page that contains the issues listed above. It was recorded in error. Please ignore that aspect when you listen to the recording.
Thank You.
