Finagle and Java Service Framework
@pavan
#service-framework
● Finagle Overview
● Programming with Futures
● Java Service Framework
Agenda
● Asynchronous, protocol-agnostic RPC framework for JVM languages
● Provides async client/server abstractions and hides Netty's low-level
network details
● Out-of-the-box support for protocols such as HTTP, Thrift, Redis, and
Memcached, but can be extended
● Implemented in Scala
● Open-sourced by Twitter
Part 1. What is Finagle?
Blocking/Non-blocking/Asynchronous APIs
Blocking:
read(fd, …) // Thread is blocked until some data is available to read
Non Blocking:
read(fd, …) // Returns immediately either with data or EAGAIN
Asynchronous:
read(fd, callback(data),...)
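The three styles can be sketched with the JDK's CompletableFuture (a minimal analogy, not Finagle's API; a pre-resolved future stands in for the file descriptor read):

```java
import java.util.concurrent.CompletableFuture;

public class ReadStyles {
    // Blocking style: the calling thread parks until the value is ready.
    static String blockingRead(CompletableFuture<String> pending) {
        return pending.join(); // thread is blocked here
    }

    // Asynchronous style: attach a callback and return immediately;
    // the transformation runs whenever the data arrives.
    static CompletableFuture<Integer> asyncRead(CompletableFuture<String> pending) {
        return pending.thenApply(String::length);
    }

    // Demo: feed the same pre-resolved "read" through both styles.
    static String demo() {
        CompletableFuture<String> data = CompletableFuture.completedFuture("hello");
        return blockingRead(data) + ":" + asyncRead(data).join();
    }
}
```

Finagle sits on the asynchronous end of this spectrum: callers register continuations instead of parking threads.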
Finagle Architecture
Netty:
Event-based
Low-level network I/O
Finagle:
Service-oriented,
Functional abstractions
[3] Thanks @vkostyukov for the diagram
Server Modules
● Servers are simple and optimized for high-throughput
Client Modules
● Clients are tricky and carry the bulk of the fault-tolerance logic. By default,
they are optimized for high success rate and low latency.
● Designed for handling workloads that have a mix of Compute and I/O.
● Each server can handle thousands of requests.
● Uses just two threads per core (Netty’s default, but it’s configurable).
How does it scale?
Programming with Futures
Part 2. Programming with Futures
● What is a Future?
○ A container for the result of an asynchronous computation, which may be
either a success or a failure.
● History - Introduced as part of Java 1.5 (java.util.concurrent.Future) but with
limited functionality: isDone() and get() [blocking]
● Twitter Futures - more powerful, and they add composability!
● Part of the util-core package and not tied to any thread pool model.
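The difference composability makes can be illustrated by analogy with the JDK itself, contrasting the Java 1.5 Future with Java 8's CompletableFuture (a sketch; Twitter Futures expose a similar chaining style):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureComparison {
    // Java 1.5 style: java.util.concurrent.Future offers only isDone()
    // and a blocking get() - to use the result you must block.
    static int oldStyle() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> f = pool.submit(() -> 21);
            return f.get() * 2; // blocks the calling thread
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Composable style: chain transformations without blocking in between.
    static int composed() {
        return CompletableFuture.supplyAsync(() -> 21)
                .thenApply(x -> x * 2)
                .join(); // joined only at the very end, to observe the result
    }
}
```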
Futures - continued
It’s a simple state machine
Pending
Failed
Succeeded
Done
How to consume Futures?
● Someone gives you a future, you act on it and pass it on (kinda hot potato)
Typical actions:
● Transform the value [map(), handle()]
● Log it, update stats [side-effect/callbacks - onSuccess(), onFailure()]
● Trigger another async computation and return that result [flatMap(), rescue()]
Most of the handlers are variations of the basic handler transform()
Future<B> transform(Function<Try<A>, Future<B>>);
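As an analogy (not the Finagle API), the JDK's CompletableFuture.handle() plays the same role as transform(): a single hook that sees either the value or the failure, much like Try<A> in the signature above.

```java
import java.util.concurrent.CompletableFuture;

public class TransformDemo {
    // transform() analogue: one continuation observes both outcomes.
    static CompletableFuture<String> describe(CompletableFuture<Integer> f) {
        return f.handle((value, error) ->
                error == null ? "ok: " + value : "failed: " + error.getMessage());
    }

    static String demoSuccess() {
        return describe(CompletableFuture.completedFuture(3)).join();
    }

    static String demoFailure() {
        CompletableFuture<Integer> bad = new CompletableFuture<>();
        bad.completeExceptionally(new RuntimeException("boom"));
        return describe(bad).join();
    }
}
```

map/handle/flatMap/rescue can all be thought of as transform() specialized to one side of the Try.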
Example 1
The backend I am calling returns an int, but I need to return a
string to my caller. What do I use?
Example 1
The backend I am calling returns an int, but I need to return a
string to my caller. What do I use?
Answer: map!
public Future<String> foo() {
return backend.foo().map(new Function<Integer, String>() {
public String apply(Integer i) {
return i.toString();
}
});
}
Example 1
The backend I am calling returns an int, but I need to return a
string to my caller. What do I use?
Answer: map!
import static com.twitter.util.Function.func;
public Future<String> foo() {
return backend.foo().map(func(i -> i.toString()));
}
Example 2
I consult a cache for a value, but on a miss, need to talk to a
database. What do I use?
Example 2
I consult a cache for a value, but on a miss, need to talk to a
database. What do I use?
Answer: flatMap!
public Future<Value> fetch(Key k) {
  return cache.fetch(k).flatMap(
      new Function<Value, Future<Value>>() {
        public Future<Value> apply(Value v) {
          if (v != null) return Future.value(v);
          return db.fetch(k); // cache miss: fall through to the database
        }
      });
}
Handling Exceptions
● Don’t forget: map/flatMap will only execute for successful
futures
● To deal with exceptions, handle/rescue are the analogous
equivalents:
Future<A> handle(Function<Throwable, A>)
Future<A> rescue(Function<Throwable, Future<A>>)
Example 1
If the backend I am calling throws an exception, I want to return
an error code. What do I use?
Example 1
If the backend I am calling throws an exception, I want to return
an error code. What do I use?
Answer: handle!
public Future<Result> foo() {
  return backend.foo().handle(
      new Function<Throwable, Result>() {
        public Result apply(Throwable t) {
          Result r = new Result();
          r.setErrorCode(errorCodeFromThrowable(t));
          return r;
        }
      });
}
Example 2
I consult a cache for a value, but if that failed, need to talk to a
database. What do I use?
Example 2
I consult a cache for a value, but if that failed, need to talk to a
database. What do I use?
Answer: rescue!
public Future<Value> get(Key k) {
  return cache.fetch(k).rescue(
      new Function<Throwable, Future<Value>>() {
        public Future<Value> apply(Throwable t) {
          LOG.error("Cache lookup failed", t);
          return db.fetch(k); // fall back to the database
        }
      });
}
Other handlers
Concurrent composition, wait for all to be satisfied (discarding results) - join()
Concurrent composition, return after all are satisfied - collect()
Concurrent composition, return when any one future is satisfied - select()
Finish within a timeout - within()
Delayed execution - delayed()
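The JDK's allOf/anyOf offer rough analogues of collect() and select() (a sketch, not the Finagle API; allOf loses the result values, so they are re-joined once everything is known to be complete):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class Composition {
    // collect() analogue: wait for all futures and gather results in order.
    static CompletableFuture<List<Integer>> collectAll(List<CompletableFuture<Integer>> fs) {
        return CompletableFuture.allOf(fs.toArray(new CompletableFuture[0]))
                .thenApply(done -> fs.stream()
                        .map(CompletableFuture::join) // safe: all are complete here
                        .collect(Collectors.toList()));
    }

    // select() analogue: complete as soon as any one future completes.
    static CompletableFuture<Object> selectAny(List<CompletableFuture<Integer>> fs) {
        return CompletableFuture.anyOf(fs.toArray(new CompletableFuture[0]));
    }

    static String demo() {
        List<CompletableFuture<Integer>> fs = List.of(
                CompletableFuture.completedFuture(1),
                CompletableFuture.completedFuture(2));
        return collectAll(fs).join() + "/" + selectAny(fs).join();
    }
}
```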
Common Pitfalls
● Never block on a Future in production code (ok for unit tests)
○ Avoid future.get(), future.apply(), and Await.result(future): they tie up I/O processing
threads and degrade Finagle’s performance considerably.
○ If you really need to block because you are dealing with synchronous libraries such as JDBC
or Jedis, use a dedicated FuturePool.
● Avoid ThreadLocal<T>. Use com.twitter.finagle.context.LocalContext instead
● Don't use parallel streams in Java 8
● Request concurrency leak - never return null instead of a Future<A>
Future<String> getPinJson(long pinId) {
  return null; // Bad: callers composing on this future will hit a NullPointerException
  // Instead: return Future.value(null);
}
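The "dedicated pool for blocking calls" pitfall fix can be sketched with the JDK's supplyAsync on a separate executor (an analogy for FuturePool, not Finagle's API; blockingLookup is a hypothetical stand-in for a JDBC/Jedis call):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BlockingBridge {
    // Dedicated pool for blocking work, so event-loop/I/O threads stay free.
    // Daemon threads here only so this sketch does not block JVM exit.
    private static final ExecutorService BLOCKING_POOL = Executors.newFixedThreadPool(4,
            r -> { Thread t = new Thread(r); t.setDaemon(true); return t; });

    // Hypothetical blocking call, standing in for a JDBC or Jedis lookup.
    static String blockingLookup(long id) {
        return "row-" + id;
    }

    // Wrap the blocking call so callers get a future instead of a stalled thread.
    static CompletableFuture<String> lookupAsync(long id) {
        return CompletableFuture.supplyAsync(() -> blockingLookup(id), BLOCKING_POOL);
    }
}
```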
Part 3. Java Service Framework Features
Standardized Metrics - per client, per method success/fail counts and latency stats
Logging - slow log, exception log
Rate limiting - Enforce quotas for clients
Genesis - Tool to generate the required stubs to bootstrap a finagle-thrift service
Warm up hook
Graceful shutdown
You need to enable these options via the Proxy builder:
ServiceFrameworkProxy<UserService.ServiceIface> serviceFrameworkProxy =
new ServiceFrameworkProxyBuilder<UserService.ServiceIface>()
.setHandler(serviceHandler)
.setServiceName(serviceName)
.setClusterName(serviceName.toLowerCase())
.setServerSetPath(serverSetPath)
.setClientNameProvider(new DefaultClientNameProvider())
.setRootLog(LOG)
.setFailureLog(FAILURE_LOG)
.enableExceptionTypeForFailureCount()
.disableLoggingForThrowable(ClientDiscardedRequestException.class)
.disableThrowablesAsServiceFailure(
Arrays.asList(ClientDiscardedRequestException.class,
DataValidationException.class))
.enableMethodNameForSuccessCountV2()
.enableMethodNameForFailureCountV2()
.enableMethodNameForResponseTimeMetricsV2()
.enableClientNameTagForSuccessCount()
.enableClientNameTagForFailureCount()
.enableClientNameTagForResponseTimeMetrics()
.enableExceptionLog()
.build();
Complaint 1:
● Clients notice higher latency or timeouts during deploys or restarts. The
first few requests take longer than at steady state due to connection
establishment, Java’s HotSpot JIT warm-up, etc.
Solution: Use the warmUp hook, then join the serverset
public static boolean warmUp(Callable<Boolean> warmUpCall)
// By default, invokes warmUpCall 100 times concurrently and expects at least 80%
// of the calls to succeed
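The warm-up policy described above (N concurrent attempts, success when a threshold ratio passes) can be sketched with CompletableFuture; this is an illustration of the logic, not the framework's implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;

public class WarmUpSketch {
    // Issue `attempts` concurrent warm-up calls; report success when at
    // least `requiredRatio` of them return true (the slide's default is
    // 100 attempts with an 80% threshold).
    static boolean warmUp(Callable<Boolean> warmUpCall, int attempts, double requiredRatio) {
        List<CompletableFuture<Boolean>> calls = new ArrayList<>();
        for (int i = 0; i < attempts; i++) {
            calls.add(CompletableFuture.supplyAsync(() -> {
                try {
                    return warmUpCall.call();
                } catch (Exception e) {
                    return false; // a throwing call counts as a failure
                }
            }));
        }
        long successes = calls.stream().filter(CompletableFuture::join).count();
        return successes >= attempts * requiredRatio;
    }
}
```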
Graceful Shutdown
● Unjoins from the serverset, waits for gracePeriod/2, and then tries to gracefully
shut down the server by draining in-flight requests within the remaining
gracePeriod/2
ServiceShutdownHook.register(server, Duration.fromSeconds(10), status)
public static void register(final Server server, final Duration gracePeriod,
final ServerSet.EndpointStatus endpointStatus)
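The split-grace-period sequencing can be sketched as follows (a plain-JDK illustration of the ordering, with hypothetical Runnable hooks standing in for the serverset unjoin and request draining):

```java
import java.time.Duration;

public class GracefulShutdownSketch {
    // Split the grace period in half: the first half lets clients notice
    // the unjoin, the second half is the budget for draining requests.
    static void shutdown(Runnable unjoinServerset, Runnable drainRequests, Duration grace) {
        unjoinServerset.run();              // stop being advertised to clients
        sleepQuietly(grace.dividedBy(2));   // wait out half the grace period
        drainRequests.run();                // finish requests already in flight
    }

    private static void sleepQuietly(Duration d) {
        try {
            Thread.sleep(d.toMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```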
Complaint 2:
The client sees rate-limiting exceptions even though rate limits are set to a high
value.
Happens when the server cluster is huge, so the local per-node rate limit becomes
small, and the Python client is running on only a few nodes (pinlater, offline jobs, etc.)
Solution: Try reducing max_connection_lifespan_ms if it’s the Python thriftmux client
Next steps
● Finagle upgrade to 6.43
○ Unlocks Retry Budgets
○ Defaults to the P2C load balancer instead of the heap-based one
○ Toggle between Netty3 and Netty4
○ A couple of performance fixes in the Future scheduler
○ Many more...
Resources:
“Your server as a function” paper - https://dl.acm.org/citation.cfm?id=2525538
Source code: https://github.com/twitter/finagle
Finaglers - https://groups.google.com/d/forum/finaglers
Blogs:
[1] https://twitter.github.io/scala_school/finagle.html
[2] https://twitter.github.io/finagle/guide/developers/Futures.html
[3] http://vkostyukov.net/posts/finagle-101/
Thank you!

Finagle and Java Service Framework at Pinterest

Editor's Notes

  • #6 Server modules/ client modules
  • #11 history
  • #12 Internal state machine - Waiting, Interruptible, Transforming, Interrupted, Linked, and Done.
  • #13 How is a future fulfilled? Stack traces