Abstract Algebra and Category Theory
Naveen Muguda
• Data processing is, generally, "the collection and manipulation of items of data to produce meaningful information".
• Validation – ensuring that supplied data is correct and relevant.
• Sorting – "arranging items in some sequence and/or in different sets."
• Summarization – reducing detail data to its main points.
• Aggregation – combining multiple pieces of data.
• Analysis – the "collection, organization, analysis, interpretation and
presentation of data."
• Reporting – listing detail or summary data or computed information.
• Classification – separation of data into various categories.
Multiple perspectives
Different shapes, same view
              Bubble Sort    Quick Sort
Sorts         yes            yes
Complexity    O(n^2)         O(n lg n) average, O(n^2) worst case
Progression
• Data structures, Complexity Theory … Algebraic Structures, Category Theory
// Recursive sum: 0 for the empty list, otherwise head + sum of the tail
def sum(list: List[Int]): Int = {
  if (list.isEmpty) 0
  else list.head + sum(list.tail)
}
Dimensions of Data processing
data
container
logic
Abstract Algebra
Category Theory
Closure
• 2 + 3 = 5 (a whole number)
• 2 - 3 = -1 (not a whole number, so whole numbers are not closed under subtraction)
• When an operation is performed on any two things of a kind, it
results in another thing of the same kind.
associativity
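• def fold[A1 >: A](z: A1)(op: (A1, A1) => A1): A1
• fold requires op to be associative and z to be a neutral element for op
• val l = List(1, 3, 5, 11)
  println(l.fold(100)(_ - _))
• Subtraction is not associative: (1 - 3) - 5 = -7, but 1 - (3 - 5) = 3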
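A minimal Scala check of why fold wants an associative operation (the object name FoldOrder is illustrative). With + the left and right groupings agree; with - they do not.

```scala
object FoldOrder {
  val l = List(1, 3, 5, 11)

  // Addition is associative: both groupings give 20
  val leftSum  = l.foldLeft(0)(_ + _)   // ((((0 + 1) + 3) + 5) + 11)
  val rightSum = l.foldRight(0)(_ + _)  // 1 + (3 + (5 + (11 + 0)))

  // Subtraction is not: the grouping changes the answer
  val leftSub  = l.foldLeft(0)(_ - _)   // ((((0 - 1) - 3) - 5) - 11) = -20
  val rightSub = l.foldRight(0)(_ - _)  // 1 - (3 - (5 - (11 - 0))) = -8

  def main(args: Array[String]): Unit = {
    println(s"sum: $leftSum vs $rightSum") // sum: 20 vs 20
    println(s"sub: $leftSub vs $rightSub") // sub: -20 vs -8
  }
}
```

On a sequential List, the standard library's fold evaluates left to right, so the slide's l.fold(100)(_ - _) prints 80. On a parallel collection the grouping is unspecified, which is why the operation is required to be associative.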
Algebraic Structure
• a set together with a finite number of operations and a set of laws that these operations obey
Algebraic Structures
• a magma: a set equipped with a single closed binary operation
• a semigroup: an associative magma
• a monoid: a semigroup with an identity element
• a group: a monoid with a unary inverse operation, so every element has an inverse
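The hierarchy above can be sketched as a chain of Scala traits, one per structure. This is an illustrative encoding (the trait and object names are assumptions): the types only capture the operations, so the laws are spot-checked on sample values.

```scala
object Structures {
  // A magma: a set (type A) with a closed binary operation
  trait Magma[A] { def op(x: A, y: A): A }

  // A semigroup adds the law op(op(x, y), z) == op(x, op(y, z))
  trait Semigroup[A] extends Magma[A]

  // A monoid adds an identity element: op(id, x) == x == op(x, id)
  trait Monoid[A] extends Semigroup[A] { def id: A }

  // A group adds an inverse: op(x, inverse(x)) == id
  trait Group[A] extends Monoid[A] { def inverse(x: A): A }

  // Integers under addition form a group
  val intAdd: Group[Int] = new Group[Int] {
    def op(x: Int, y: Int): Int = x + y
    def id: Int = 0
    def inverse(x: Int): Int = -x
  }

  // The type system cannot enforce the laws, so check them on sample values
  def lawsHold(g: Group[Int], xs: List[Int]): Boolean =
    xs.forall { x =>
      g.op(g.id, x) == x && g.op(x, g.id) == x && g.op(x, g.inverse(x)) == g.id
    } && (for (x <- xs; y <- xs; z <- xs)
      yield g.op(g.op(x, y), z) == g.op(x, g.op(y, z))).forall(b => b)
}
```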
Loop invariants
int result = 0;
for (int index = 0; index < a.length; index++) {
    // invariant (top of each iteration): result == sum(a, [0, index))
    result += a[index];
}
• [Int, 0, +] is a monoid: 0 is the sum of the empty range [0, 0), and + extends the range by one element
void mergeSort(List list, int l, int r)
{
    if (l < r)
    {
        // Same as (l+r)/2, but avoids overflow for
        // large l and r
        int m = l + (r - l) / 2;
        // Sort first and second halves
        mergeSort(list, l, m);
        mergeSort(list, m + 1, r);
        // Combine the sorted runs [l, m] and [m+1, r]
        merge(list, l, m, r);
    }
}
[SortedList, Merge, Empty List] is a monoid
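To make the claim concrete, here is a Scala sketch (the object and method names are illustrative) where merge is the monoid operation on sorted lists and Nil is its identity. Merge sort then splits the input, sorts each half, and combines the halves with merge.

```scala
object SortedMerge {
  // Merge two already-sorted lists into one sorted list (the monoid operation).
  // Recursive, so suited to modest list sizes.
  def merge(xs: List[Int], ys: List[Int]): List[Int] = (xs, ys) match {
    case (Nil, _) => ys
    case (_, Nil) => xs
    case (x :: xt, y :: yt) =>
      if (x <= y) x :: merge(xt, ys) else y :: merge(xs, yt)
  }

  // Identity element: the empty list
  val empty: List[Int] = Nil

  // Merge sort: lists of length 0 or 1 are sorted; halves are combined with merge
  def mergeSort(xs: List[Int]): List[Int] =
    if (xs.lengthCompare(1) <= 0) xs
    else {
      val (left, right) = xs.splitAt(xs.length / 2)
      merge(mergeSort(left), mergeSort(right))
    }

  // The same result as a plain left fold over singleton lists (O(n^2), illustration only)
  def sortByFold(xs: List[Int]): List[Int] =
    xs.map(List(_)).foldLeft(empty)(merge)
}
```

Because merge is associative, the balanced grouping used by mergeSort and the left-leaning grouping used by sortByFold produce the same sorted list. Only the cost differs.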
Representation
• Set in maths => java.util.Set, scala.collection.
Typeclass
• A type class is a class (group) of types that satisfy a contract defined by a trait
• Functionality (the trait and its instances) can be added to existing types without changing their original code
Typeclass in Scala
a type class consists of three components:
1. The type class itself, which is defined as a trait that takes at least
one generic parameter
2. Instances of the type class for the data types you want to extend
3. Interface methods you expose to users of your new API
trait Monoid[A] {
  // an identity element: op(id, x) == x == op(x, id)
  def id: A
  // an associative operation: op(op(x, y), z) == op(x, op(y, z))
  def op(x: A, y: A): A
}
implicit val intAddition: Monoid[Int] = new Monoid[Int] {
  def id = 0
  def op(x: Int, y: Int) = x + y
}
// Interface method: an ordinary def that asks for the instance implicitly
def fold[A](la: List[A])(implicit am: Monoid[A]): A =
  la.foldLeft(am.id)(am.op)
fold(List(1, 3, 4)) // => 8
monoids
• Boolean, True, &&
• Boolean, False, ||
• Integer, 0, +
• Integer, 1, *
• Integer, Integer.MIN_VALUE, max
• Integer, Integer.MAX_VALUE, min
• fold(l)(integerAdditionMonoid) => sum
• fold(l)(integerMultiplicationMonoid) => product
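Each triple above can be written as one Monoid instance. This sketch passes the instance explicitly so the same list can be folded in different ways. The helper monoid(...) is a hypothetical convenience, not part of any library.

```scala
object MonoidInstances {
  trait Monoid[A] {
    def id: A
    def op(x: A, y: A): A
  }

  // Hypothetical helper: build an instance from an identity and an operation
  def monoid[A](zero: A)(f: (A, A) => A): Monoid[A] = new Monoid[A] {
    def id: A = zero
    def op(x: A, y: A): A = f(x, y)
  }

  val boolAnd: Monoid[Boolean]                 = monoid(true)(_ && _)
  val boolOr: Monoid[Boolean]                  = monoid(false)(_ || _)
  val integerAdditionMonoid: Monoid[Int]       = monoid(0)(_ + _)
  val integerMultiplicationMonoid: Monoid[Int] = monoid(1)(_ * _)
  val intMax: Monoid[Int] = monoid(Int.MinValue)((a, b) => math.max(a, b))
  val intMin: Monoid[Int] = monoid(Int.MaxValue)((a, b) => math.min(a, b))

  // Same fold as the type class version, but the instance is chosen explicitly
  def fold[A](l: List[A])(m: Monoid[A]): A = l.foldLeft(m.id)(m.op)
}
```

Folding an empty list returns the identity. For max, that identity is Int.MinValue, which is why it has to be the smallest possible value.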
Monoids, monoids
• Binary Search Tree
• Priority Queue
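• Associativity enables parallelism: the data can be split into chunks, each chunk folded independently, and the partial results combined in order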
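A minimal sketch of that idea with Scala Futures, assuming the global execution context is acceptable for illustration. Each chunk is folded in its own Future, and the partial results are then folded in their original order. The names ParallelFold and parFold are illustrative.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelFold {
  trait Monoid[A] {
    def id: A
    def op(x: A, y: A): A
  }

  val intAddition: Monoid[Int] = new Monoid[Int] {
    def id: Int = 0
    def op(x: Int, y: Int): Int = x + y
  }

  // Valid only because op is associative and id is an identity:
  // chunks are folded concurrently, then partial results are folded in order.
  def parFold[A](xs: Vector[A], chunkSize: Int)(m: Monoid[A]): A = {
    require(chunkSize > 0, "chunkSize must be positive")
    val partials: Iterator[Future[A]] =
      xs.grouped(chunkSize).map(chunk => Future(chunk.foldLeft(m.id)(m.op)))
    val all: Future[List[A]] = Future.sequence(partials.toList)
    Await.result(all, 10.seconds).foldLeft(m.id)(m.op)
  }
}
```

Future.sequence keeps the results in input order, so this also works for monoids that are not commutative, such as list concatenation.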
Interesting monoids
• List, empty list, concatenation
• Set, {}, union
• Sorted list, empty list, merge
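Two of these written as generic Monoid instances, in a self-contained sketch (the object and instance names are illustrative). The empty list and the empty set act as identities.

```scala
object CollectionMonoids {
  trait Monoid[A] {
    def id: A
    def op(x: A, y: A): A
  }

  // Lists under concatenation: identity is the empty list
  def listConcat[T]: Monoid[List[T]] = new Monoid[List[T]] {
    def id: List[T] = Nil
    def op(x: List[T], y: List[T]): List[T] = x ++ y
  }

  // Sets under union: identity is the empty set {}
  def setUnion[T]: Monoid[Set[T]] = new Monoid[Set[T]] {
    def id: Set[T] = Set.empty
    def op(x: Set[T], y: Set[T]): Set[T] = x union y
  }

  def fold[A](l: List[A])(m: Monoid[A]): A = l.foldLeft(m.id)(m.op)
}
```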
Find if an element is present in an array
implicit val boolOr: Monoid[Boolean] = new Monoid[Boolean] {
  def id = false
  def op(x: Boolean, y: Boolean) = x || y
}
val list = List(1, 3, 4)
// Map each element to a Boolean (is it 2?), then fold the Booleans with ||
println(fold(list.map(x => x == 2))) // false
• combine => max, reduce => max
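If we want to calculate the mean
• mean(0, 20, 10, 25, 15) = 70 / 5 = 14
• mean(mean(0, 20, 10), mean(25, 15)) = mean(10, 20) = 15
• Averaging partial means gives the wrong answer, so the mean by itself cannot be combined like a monoid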
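monoidify
public class Pair {
    final int sum;
    final int count;

    Pair(int sum, int count) {
        this.sum = sum;
        this.count = count;
    }
}
// Associative, with new Pair(0, 0) as identity; the mean is sum / count, computed once at the end
Pair merge(Pair first, Pair second) {
    return new Pair(first.sum + second.sum, first.count + second.count);
}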
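The same (sum, count) idea in Scala, as a sketch with illustrative names. Each value is mapped into the monoid, partial results are merged, and the mean is computed once at the end. Unlike averaging partial means, this gives 14 for the example above however the input is split.

```scala
object MeanMonoid {
  // Keep (sum, count) instead of a running mean
  final case class SumCount(sum: Int, count: Int) {
    // Only meaningful when count > 0 (0 / 0 yields NaN for Doubles)
    def mean: Double = sum.toDouble / count
  }

  // Identity element
  val id: SumCount = SumCount(0, 0)

  // Associative, with id as identity
  def merge(a: SumCount, b: SumCount): SumCount =
    SumCount(a.sum + b.sum, a.count + b.count)

  // Map each value into the monoid, then fold
  def summarize(xs: List[Int]): SumCount =
    xs.map(x => SumCount(x, 1)).foldLeft(id)(merge)
}
```

For example, MeanMonoid.merge(summarize(List(0, 20, 10)), summarize(List(25, 15))).mean is 70 / 5 = 14.0.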
Monoid homomorphisms
• List of integers => list of Booleans => fold
• Which computing platforms have this kind of behavior?
• List<String> => List<Integer> : word count
• List<String> => String : longest word
• List<String> => Map<String, Integer> : word frequency
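One possible reading of the three examples above, sketched in Scala as "map into a monoid, then fold". The sample words and all names are illustrative. Word count also shows the homomorphism property: counting a concatenation equals adding the two counts.

```scala
object Homomorphisms {
  val words = List("to", "be", "or", "not", "to", "be")

  // Word count: map each word to 1, fold with (0, +)
  def wordCount(ws: List[String]): Int = ws.map(_ => 1).foldLeft(0)(_ + _)

  // Longest word: fold with ("", keep the longer); ties keep the earlier word
  def longestWord(ws: List[String]): String =
    ws.foldLeft("")((a, b) => if (b.length > a.length) b else a)

  // Word frequency: the monoid is (empty map, add counts per key)
  def mergeCounts(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    b.foldLeft(a) { case (acc, (w, n)) => acc.updated(w, acc.getOrElse(w, 0) + n) }

  def wordFrequency(ws: List[String]): Map[String, Int] =
    ws.map(w => Map(w -> 1)).foldLeft(Map.empty[String, Int])(mergeCounts)
}
```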
PACELC
Streams Sketches
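• Probabilistic data structures
• Bloom filters, HyperLogLog, etc.
• They are monoidal: two sketches built with the same parameters can be merged (bitwise OR for Bloom filters, register-wise max for HyperLogLog)
• Used in stream processing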
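A toy Bloom filter in Scala showing the monoid structure: the empty filter is the identity and merge is bitwise OR. This is an illustrative sketch, not a production implementation. The size m = 1024, the k = 3 hash functions, and deriving the hashes from MurmurHash3 with different seeds are all assumptions.

```scala
import scala.collection.immutable.BitSet
import scala.util.hashing.MurmurHash3

object BloomSketch {
  // m bits, k hash functions; filters merge only if they share m and k
  final case class Bloom(m: Int, k: Int, bits: BitSet) {
    // Assumption: k hashes derived from MurmurHash3 with seeds 0 until k
    private def positions(s: String): Seq[Int] =
      (0 until k).map(seed => Math.floorMod(MurmurHash3.stringHash(s, seed), m))

    def add(s: String): Bloom = copy(bits = bits ++ positions(s))

    // May return false positives, never false negatives
    def mightContain(s: String): Boolean = positions(s).forall(bits.contains)

    // The monoid operation: bitwise OR of the two bit sets
    def merge(other: Bloom): Bloom = {
      require(m == other.m && k == other.k, "incompatible filters")
      copy(bits = bits | other.bits)
    }
  }

  // Identity element: an empty filter of the chosen shape
  def empty(m: Int = 1024, k: Int = 3): Bloom = Bloom(m, k, BitSet.empty)
}
```

Because OR is associative and commutative, filters built on different partitions of a stream can be merged in any order.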
Lambda Architecture, Summingbird
CRDT
• Grow-only counter (G-Counter)
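A minimal G-Counter sketch in Scala (the replica ids and names are illustrative). Each replica increments only its own slot. Merging takes the per-replica maximum, which is associative, commutative, and idempotent, with the empty counter as identity. Replicas can therefore exchange state in any order, and even repeatedly.

```scala
object GCounterDemo {
  final case class GCounter(counts: Map[String, Int]) {
    // A replica only ever increases its own entry
    def increment(replica: String, by: Int = 1): GCounter = {
      require(by >= 0, "a grow-only counter cannot decrease")
      GCounter(counts.updated(replica, counts.getOrElse(replica, 0) + by))
    }

    // The counter's value is the sum over all replicas
    def value: Int = counts.values.sum

    // Per-replica maximum
    def merge(other: GCounter): GCounter =
      GCounter((counts.keySet ++ other.counts.keySet).map { r =>
        r -> math.max(counts.getOrElse(r, 0), other.counts.getOrElse(r, 0))
      }.toMap)
  }

  val empty: GCounter = GCounter(Map.empty)
}
```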
Category
• Category of numbers and <=
• Category of all sets and subset relation
• Category of sets and functions
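• Objects and arrows
• If f: A → B and g: B → C exist, then the composite g ∘ f: A → C must exist
• Every object has an identity arrow, and composition is associative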
• Totality is not assumed
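The category of Scala types and functions, sketched with illustrative names. Objects are types and arrows are functions. Functions cannot be compared for equality directly, so the identity and associativity laws are checked on sample inputs.

```scala
object FunctionCategory {
  // Identity arrow for any object (type) A
  def id[A]: A => A = a => a

  // Composition: given f: A => B and g: B => C, g ∘ f is A => C
  def compose[A, B, C](g: B => C, f: A => B): A => C = a => g(f(a))

  val f: Int => Int = _ + 1
  val g: Int => String = n => "#" * n
  val h: String => Int = _.length

  // h ∘ (g ∘ f) and (h ∘ g) ∘ f agree on every input
  def associative(xs: List[Int]): Boolean =
    xs.forall(x => compose(h, compose(g, f))(x) == compose(compose(h, g), f)(x))

  // f ∘ id == f == id ∘ f
  def identities(xs: List[Int]): Boolean =
    xs.forall(x => compose(f, id[Int])(x) == f(x) && compose(id[Int], f)(x) == f(x))
}
```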
Functor
• trait Functor[F[_]] { def map[A, B](fa: F[A])(f: A => B): F[B] }
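Two instances of this Functor trait, written out by hand as a sketch (the object and instance names are illustrative). A lawful map preserves the container's shape and changes only the values: mapping the identity function changes nothing.

```scala
object Functors {
  trait Functor[F[_]] {
    def map[A, B](fa: F[A])(f: A => B): F[B]
  }

  // Option: None stays None; Some(a) becomes Some(f(a))
  val optionFunctor: Functor[Option] = new Functor[Option] {
    def map[A, B](fa: Option[A])(f: A => B): Option[B] = fa match {
      case Some(a) => Some(f(a))
      case None    => None
    }
  }

  // List: same length and order, each element transformed
  val listFunctor: Functor[List] = new Functor[List] {
    def map[A, B](fa: List[A])(f: A => B): List[B] = fa match {
      case Nil    => Nil
      case h :: t => f(h) :: map(t)(f)
    }
  }
}
```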
Monads
trait Monad[F[_]] {
  def flatMap[A, B](fa: F[A])(f: A => F[B]): F[B]
  def pure[A](x: A): F[A]
}
Player    IPL team    isSpinner
Kohli     RCB         no
Chahal    RCB         yes
Kuldeep   KKR         yes
Pujara    null        no

Team    Won IPL in
RCB     null
KKR     2012, 2014
Different containers, different behaviors
• Optional.of("pujara").map(getIplTeam).map(firstWonIn) => Optional.empty
• Optional.of("kohli").map(getIplTeam).map(firstWonIn) => Optional.empty
• Set.of("kohli", "chahal", "kuldeep").map(getIplTeam) => Set.of("RCB", "KKR")
• Set.of("kohli", "chahal", "kuldeep").map(getIplTeam).flatMap(wonIn) => Set.of(2012, 2014)
• List.of("kohli", "chahal", "kuldeep").map(getIplTeam) => List.of("RCB", "RCB", "KKR")
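The same lookups sketched in Scala, using the tables above as Maps (the names are illustrative). One difference from Java: Java's Optional.map turns a null result into Optional.empty, but Scala's Option.map does not, so missing values are modelled as absent keys and looked up with Map.get, which already returns an Option.

```scala
object Containers {
  // The slide tables as Maps; a missing key stands in for null
  val iplTeam: Map[String, String] =
    Map("kohli" -> "RCB", "chahal" -> "RCB", "kuldeep" -> "KKR")
  val wonIn: Map[String, List[Int]] = Map("KKR" -> List(2012, 2014))

  // Option: a missing value anywhere short-circuits the chain to None
  def firstWonIn(player: String): Option[Int] =
    iplTeam.get(player).flatMap(team => wonIn.getOrElse(team, Nil).headOption)

  val players: List[String] = List("kohli", "chahal", "kuldeep")
  val playerSet: Set[String] = players.toSet

  // List keeps order and duplicates; Set collapses duplicates
  val teamsAsList: List[String] = players.flatMap(p => iplTeam.get(p))
  val teamsAsSet: Set[String] = playerSet.flatMap(p => iplTeam.get(p))

  // flatMap flattens each team's list of titles into one Set
  val titles: Set[Int] = teamsAsSet.flatMap(t => wonIn.getOrElse(t, Nil))
}
```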
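Containers: Optional, List, Set
Operations: map, filter, flatMap, fold
7 constructs vs 12: 3 containers + 4 operations, instead of 3 × 4 container-specific implementations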
“Structure” preserving operations
Set.of("kohli", "chahal", "kuldeep")
  .filter(isSpinner)
  .map(getIplTeam)
  .flatMap(wonIn) => Set.of(2012, 2014)
Implement a Monad
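One possible answer, sketched in Scala: a hand-rolled optional container (Maybe, Just and Empty are hypothetical names) with an instance of the Monad trait from the earlier slide. map is derived from flatMap and pure. The lookup tables mirror the IPL example, but the data and helper names are illustrative.

```scala
object MaybeMonad {
  trait Monad[F[_]] {
    def flatMap[A, B](fa: F[A])(f: A => F[B]): F[B]
    def pure[A](x: A): F[A]
    // map comes for free from flatMap and pure
    def map[A, B](fa: F[A])(f: A => B): F[B] = flatMap(fa)(a => pure(f(a)))
  }

  // Hypothetical minimal optional container, standing in for Optional/Option
  sealed trait Maybe[+A]
  final case class Just[+A](value: A) extends Maybe[A]
  case object Empty extends Maybe[Nothing]

  val maybeMonad: Monad[Maybe] = new Monad[Maybe] {
    def pure[A](x: A): Maybe[A] = Just(x)
    def flatMap[A, B](fa: Maybe[A])(f: A => Maybe[B]): Maybe[B] = fa match {
      case Just(a) => f(a)
      case Empty   => Empty
    }
  }

  // Illustrative lookups that may fail, in the spirit of getIplTeam / firstWonIn
  val teams = Map("kohli" -> "RCB", "kuldeep" -> "KKR")
  val firstWins = Map("KKR" -> 2012)

  def lookup[K, V](m: Map[K, V], k: K): Maybe[V] = m.get(k) match {
    case Some(v) => Just(v)
    case None    => Empty
  }

  // Chaining with flatMap: an Empty at any step makes the whole result Empty
  def firstWonIn(player: String): Maybe[Int] =
    maybeMonad.flatMap(lookup(teams, player))(team => lookup(firstWins, team))
}
```

A quick sanity check is the left-identity law: flatMap(pure(x))(f) should equal f(x).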
