User defined-functions-cassandra-summit-eu-2014
Overview of User Defined Functions in Apache Cassandra 3.0. Presentation from Cassandra Summit Europe 2014 in London.

Published in: Software

  1. User Defined Functions … new in Apache Cassandra 3.0
     CASSANDRA-7395 + many more…
  2. Me, Robert…
     • Contribute code to Apache Cassandra (UDFs, row cache + more)
     • Help customers to build Cassandra solutions
     • Freelancer, Coder
     Robert Stupp, Cologne, Germany
     @snazy snazy@snazy.de
  3. Disclaimer
     • Apache Cassandra 3.0 is the next major release
     • Everything is under development
     • Things may change
     • Things may be different in the final 3.0 release
  4. Apache Cassandra 3.0
     • … will bring a lot of cool, new features!
     • … will bring a lot of great improvements!
     • UDFs are just one of these features :)
  5. UDF - What's that??
  6. UDF
     • UDF means User Defined Function
     • You write the code that's executed on Cassandra nodes
     • Functions are distributed transparently to the whole cluster
     • You may not have to wait for a new release for new functionality :)
  7. UDF Characteristics
     • "Pure"
       • just input parameters
       • no state, side effects, dependencies on other code, etc
     • Usually deterministic
  8. Consider a Java function like…
       // import nothing
       public final class MyClass
       {
           public static int myFunction ( int argument )
           {
               return argument * 42;
           }
       }
     This would be your UDF
  9. Example
       CREATE FUNCTION sinPlusFoo (
           valueA double,
           valueB double
       )
       RETURNS double
       LANGUAGE java
       AS 'return Math.sin(valueA) + valueB;';
     • valueA double, valueB double: arguments
     • RETURNS double: return type
     • LANGUAGE java: define the UDF language
     • AS '…': Java code
     Java works out of the box!
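The body of sinPlusFoo on slide 9 is ordinary Java, so it can be tried outside Cassandra before it is registered. The following is a minimal sketch under that assumption: the wrapper class `SinPlusFooSketch` is hypothetical and is not the class Cassandra generates behind the scenes; only the expression in the method body comes from the slide.

```java
// Hypothetical stand-alone wrapper around the UDF body from slide 9.
// It only illustrates that the body is a pure function of its arguments.
final class SinPlusFooSketch {

    // Same parameter names and expression as in the CREATE FUNCTION statement.
    static double sinPlusFoo(double valueA, double valueB) {
        return Math.sin(valueA) + valueB;
    }

    public static void main(String[] args) {
        // sin(0.0) is exactly 0.0, so this prints 1.5
        System.out.println(sinPlusFoo(0.0, 1.5));
    }
}
```

Because the function has no state and no side effects, calling it repeatedly with the same arguments returns the same result, which is the "pure" property described on slide 7.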
  10. Next Example
        CREATE FUNCTION sin (
            value double )
        RETURNS double
        LANGUAGE javascript
        AS 'Math.sin(value);';
      • AS '…': JavaScript code
      JavaScript works, too - out of the box! Cassandra 3.0 targets Java 8, so it's "Nashorn".
  11. JSR 223
      • "Scripting for the Java Platform"
      • UDFs can be written in Java and JavaScript
      • Optionally: Groovy, JRuby, Jython, Scala
      • Not: Clojure (its JSR 223 implementation is wrong)
  12. Behind the scenes
      • Builds Java (or script) source
      • Compiles that code (Java class, or compiled script)
      • Loads the compiled code
      • Migrates the function to all other nodes
      • Done - UDF is executable on any node
  13. Types for UDFs
      • Support for all Cassandra types for arguments and return value
      • All means
        • Primitives (boolean, int, double, uuid, etc)
        • Collections (list, set, map)
        • Tuple types, User Defined Types
  14. UDF - For what?
  15. UDF invocation
        SELECT sumThat ( colA, colB ) FROM myTable WHERE key = ...
        SELECT sin ( foo ) FROM myCircle WHERE pk = ...
      Now your application can sum two values in one row - or compute the sine of a value!
      GREAT NEW FEATURES! Okay - not really…
  16. UDFs are good for…
      • UDFs on their own are just "nice to have"
      • Nothing you couldn't do better in your application
  17. Real Use Case for UDFs?
  18. User Defined Aggregates!
      CASSANDRA-8053
  19. User Defined Aggregates
      Use UDFs to code your own aggregation functions
      (Aggregates are things like SUM, AVG, MIN, MAX, etc)
      Aggregates: consume values from multiple rows & produce a single result
  20. Example
        CREATE AGGREGATE minimum ( int )
        STYPE int
        SFUNC minimumState;
      • minimum: name of the aggregate
      • ( int ): function argument types
      • STYPE int: state type
      • SFUNC minimumState: name of the "state" UDF
      Syntax similar to Postgres.
  21. How an aggregate works
        SELECT minimum ( val ) FROM foo …
      1. Initial state is set to null
      2. For each row the state function is called with current state and column value - returns new state
      3. After all rows the aggregate returns the last state
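The three steps on slide 21 amount to a fold over the selected rows. The sketch below simulates that loop in plain Java, outside Cassandra. The slides name the state function `minimumState` but do not show its body, so the body here is an assumption, as are the class name `MinimumAggregateSketch` and the `aggregate` helper that stands in for the server-side loop.

```java
import java.util.Arrays;
import java.util.List;

// Simulation of the aggregate loop from slide 21; not Cassandra code.
final class MinimumAggregateSketch {

    // Assumed body of the "state" UDF: a null state means "no value seen yet".
    static Integer minimumState(Integer state, Integer value) {
        if (value == null) {
            return state;            // ignore missing column values
        }
        if (state == null) {
            return value;            // first value becomes the state
        }
        return Math.min(state, value);
    }

    // Stand-in for what happens while the SELECT runs.
    static Integer aggregate(List<Integer> columnValues) {
        Integer state = null;                      // 1. initial state is null
        for (Integer value : columnValues) {
            state = minimumState(state, value);    // 2. state function per row
        }
        return state;                              // 3. last state is the result
    }

    public static void main(String[] args) {
        System.out.println(aggregate(Arrays.asList(7, 3, 9))); // prints 3
    }
}
```

With no rows the state function is never called, so in this sketch the result stays at the initial null state.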
  22. More sophisticated
        CREATE AGGREGATE average ( int )
        SFUNC averageState
        STYPE tuple<long,int>
        FINALFUNC averageFinal
        INITCOND (0, 0);
      • FINALFUNC averageFinal: UDF called after last row
      • INITCOND (0, 0): initial state value
      FINALFUNC + INITCOND are optional
  23. How that aggregate works
        SELECT average ( val ) FROM foo …
      1. Initial state is set to (0,0)
      2. For each row the state function is called with current state + column value - returns new state
      3. After all rows the final function is called with last state
      4. Final function calculates the aggregate
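The four steps on slide 23 can be simulated the same way. This is a sketch under several assumptions: the slides do not show the bodies of `averageState` and `averageFinal`, so both are guesses at a natural implementation; the `State` class is a stand-in for the tuple state type, read here as (running sum, row count); and returning a `Double` from the final function is an illustrative choice, since the slides do not state the aggregate's return type.

```java
import java.util.Arrays;
import java.util.List;

// Simulation of the aggregate loop from slide 23; not Cassandra code.
final class AverageAggregateSketch {

    // Stand-in for the tuple state: (running sum, number of values).
    static final class State {
        final long sum;
        final int count;

        State(long sum, int count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // Assumed body of the state UDF: add the value and bump the counter.
    static State averageState(State state, Integer value) {
        if (value == null) {
            return state;            // ignore missing column values
        }
        return new State(state.sum + value, state.count + 1);
    }

    // Assumed body of the final UDF: turn (sum, count) into the average.
    static Double averageFinal(State state) {
        if (state.count == 0) {
            return null;             // no rows, no average
        }
        return (double) state.sum / state.count;
    }

    // Stand-in for what happens while the SELECT runs.
    static Double aggregate(List<Integer> columnValues) {
        State state = new State(0L, 0);            // 1. INITCOND (0, 0)
        for (Integer value : columnValues) {
            state = averageState(state, value);    // 2. state function per row
        }
        return averageFinal(state);                // 3. + 4. final function
    }

    public static void main(String[] args) {
        System.out.println(aggregate(Arrays.asList(1, 2, 3, 4))); // prints 2.5
    }
}
```

Keeping the sum and the count separate until the final function runs is what makes the intermediate state cheap to update row by row.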
  24. Now everybody can execute evil code on your cluster :)
  25. UDF permissions
      • There will be permissions to restrict (allow)
        • UDF creation (DDL)
        • UDF execution (DML)
      CASSANDRA-7557
  26. Built in functions
      • All known built-in functions are called native functions
      • Native functions belong to SYSTEM keyspace
      • Native functions cannot be modified (or dropped)
      Note: you already know native functions like now, count, unixtimestampof
  27. UDFs belong to a keyspace
      • User Defined Functions and
      • User Defined Aggregates
      • belong to a keyspace
      • SYSTEM keyspace is searched first for functions (then the current keyspace) if function/aggregate is not fully qualified
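The lookup order on slide 27 can be illustrated with a small registry. This is only a sketch of the rule as stated on the slide (SYSTEM keyspace first, then the current keyspace, unless the name is fully qualified). The class `FunctionLookupSketch`, its methods, and the `keyspace.function` string form are hypothetical and do not reflect Cassandra's internal classes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical registry illustrating the lookup order from slide 27.
final class FunctionLookupSketch {

    static final String SYSTEM_KEYSPACE = "system";

    private final Map<String, Set<String>> functionsByKeyspace = new HashMap<>();

    void register(String keyspace, String... functionNames) {
        functionsByKeyspace
                .computeIfAbsent(keyspace, k -> new HashSet<>())
                .addAll(Arrays.asList(functionNames));
    }

    // Returns "keyspace.function" for the function that would be used, or null.
    String resolve(String name, String currentKeyspace) {
        int dot = name.indexOf('.');
        if (dot >= 0) {
            // Fully qualified: only the named keyspace is consulted.
            String keyspace = name.substring(0, dot);
            String function = name.substring(dot + 1);
            return exists(keyspace, function) ? name : null;
        }
        if (exists(SYSTEM_KEYSPACE, name)) {
            return SYSTEM_KEYSPACE + "." + name;   // native functions win
        }
        if (exists(currentKeyspace, name)) {
            return currentKeyspace + "." + name;   // then the current keyspace
        }
        return null;
    }

    private boolean exists(String keyspace, String function) {
        Set<String> functions = functionsByKeyspace.get(keyspace);
        return functions != null && functions.contains(function);
    }
}
```

Under this rule an unqualified call never reaches a user function that shares its name with a native one; in the sketch, qualifying the name with the keyspace is the only way to select it.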
  28. UDF - some final words…
      Keep in mind:
      • JSR-223 has overhead - Java UDFs are much faster
      • Do not allow everyone to create UDFs (in production)
      • Keep your UDFs "pure"
      • Test your UDFs and user defined aggregates thoroughly
  29. For the geeks :)
      • UDFs and user defined aggregates are executed on the coordinator node
      • Prefer to use Java-UDFs for performance reasons
  30. Let a man dream…
      UDFs could be useful for…
      • Functional indexes
      • Partial indexes
      • Filtering
      • Distributed GROUP BY
      • etc etc
  31. Q & A
      THANK YOU FOR YOUR ATTENTION :)
      Robert Stupp, Cologne, Germany
      @snazy snazy@snazy.de
