SlideShare a Scribd company logo
1 of 22
Presented By :
Simarpreet Kaur Monga
Software Consultant
Knoldus Software LLP
Introduction To
Algebird: Abstract algebra for analytics
Agenda
● What is Algebird ?
● Problems in computation in Big Data Analysis
● Advantages of Abstract Algebra in computing world.
● Why Algebird ?
● Projects using Algebird
● Example Usage - Counting With Algebird and Spark
● Demo
● References
What is Algebird ?
● Algebird is a library which provides abstractions for abstract
algebra in the Scala programming language.
● Twitter recently open-sourced Algebird, which provides you
with a JVM library to work with algebraic data structures
● Note that algebird was originally developed to support
scalding, which is Twitter's Scala library for Hadoop.
Problems in Computation in Big Data
Analysis
➢ How do you combine or add 2 instances of complex data
structure in an easy way ?
val first = BloomFilter(...)
val second = BloomFilter(...)
first + second == uh?
Problems in Computation in Big Data
Analysis
➢ What about performing other operations on those Bloom
filter instances, notably data processing pipelines based on
common functions such as map, flatMap, foldLeft,
reduceLeft?
val filters = Seq[BloomFilter](...)
val summary = filters flatMap { /* magic happens here */ }
reduceLeft { /* more magic */ } ...
Discussion involing
Ted Dunning of MapR
and his work on a new
data structure called t-
digest:
Why was Ted being asked whether t-digest is associative?
And how does all this relate to semigroups and monoids?
Advantages Of Abstract Algebra in
Computing World
● Abstract algebra allows you to write reusable code.
● Data analysis involves lots of concepts that can be described
using abstract algebra.
● Using abstract algebra abstractions (specifically
commutative operations) allows you to merge distributed
computations without concern for the order in which they
are merged.
Why Algebird ?
If you can turn a data structure into a monoid (or semigroup, or
…), then Algebird allows you to put it to good use. You can then
work with your data structure just as nicely as you are so used to
when dealing with Int, Double or List. And you can use it with
large-scale data processing tools such as Hadoop and Storm,
too.
Monoids & Monads
As a grossly simplified rule of thumb:
➢ Monoid: If you want to “attach” operations such as +, -, *,
/ or <= to data objects – say, adding two Bloom filters –
then you want to provide monoid forms for those data
objects (e.g. a monoid for your Bloom filter data structure).
This way you can combine and juggle your custom data
structures just like you would do with plain integer
numbers.
➢ Monad: If you want to create data processing pipelines that turn
data objects step-by-step into the desired, final output (e.g.
aggregating raw records into summary statistics), then you want to
build one or more monads to model these data pipelines. Particularly
if you want to run those pipelines in large-scala data processing
platforms such as Hadoop or Storm.
Projects using Algebird
Spark
Scio
Scalding
SummingBird
Simmer
Packetloop
Counting With AlgeBird and Spark
Summing Integers
Counting With Algebird and Spark
Summing Integers
1 + 2 + 3 + 4 + ….. + 30
Counting With Algebird and Spark
Summing Integers
1 + 2 + 3 + 4 + ….. + 30
(1 + 2) + (3 + 4) + ….. + (28 + 29 + 30)
Counting With Algebird and Spark
Summing Integers
1 + 2 + 3 + 4 + ….. + 30
(1 + 2) + (3 + 4) + ….. + (28 + 29 + 30)
Counting With Algebird and Spark
Algebraic Properties Of Integers and Addition
Associative: (a + b) + c = a + (b + c) <= can be partitioned/parallelized
Closed: Int + Int = Int <= reduce in multiple stages
Identity: a + 0 = a <= support “empty”/”zero” items
Counting With Algebird and Spark
Algebraic Properties Of Integers and Addition
Associative: (a + b) + c = a + (b + c) <= can be partitioned/parallelized
Closed: Int + Int = Int <= reduce in multiple stages
Identity: a + 0 = a <= support “empty”/”zero” items
Integers form a Monoid under Addition
Counting With Algebird and Spark
Algebraic Properties Of Integers and Addition
Associative: (a + b) + c = a + (b + c) <= can be partitioned/parallelized
Closed: Int + Int = Int <= reduce in multiple stages
Identity: a + 0 = a <= support “empty”/”zero” items
Integers form a Monoid under Addition
Monoid computations can be effectively run on Spark
DEMO
References
➢ http://www.michael-noll.com/blog/2013/12/02/twitter-algebird-m
➢ https://gist.github.com/johnynek/814fc1e77aad1d295bb7
➢ https://twitter.github.io/algebird/
Introduction To Algebird: Abstract algebra for analytics

More Related Content

What's hot

Elementary data structure
Elementary data structureElementary data structure
Elementary data structureBiswajit Mandal
 
Bca ii dfs u-1 introduction to data structure
Bca ii dfs u-1 introduction to data structureBca ii dfs u-1 introduction to data structure
Bca ii dfs u-1 introduction to data structureRai University
 
Introduction of data structure
Introduction of data structureIntroduction of data structure
Introduction of data structureeShikshak
 
2nd puc computer science chapter 3 data structures 1
2nd puc computer science chapter 3 data structures 12nd puc computer science chapter 3 data structures 1
2nd puc computer science chapter 3 data structures 1Aahwini Esware gowda
 
Data Structure In C#
Data Structure In C#Data Structure In C#
Data Structure In C#Shahzad
 
Data Structures (CS8391)
Data Structures (CS8391)Data Structures (CS8391)
Data Structures (CS8391)Elavarasi K
 
Data structure lecture 2
Data structure lecture 2Data structure lecture 2
Data structure lecture 2Kumar
 
Arrays Data Structure
Arrays Data StructureArrays Data Structure
Arrays Data Structurestudent
 
02 Arrays And Memory Mapping
02 Arrays And Memory Mapping02 Arrays And Memory Mapping
02 Arrays And Memory MappingQundeel
 
Data structure and its types
Data structure and its typesData structure and its types
Data structure and its typesNavtar Sidhu Brar
 
Set data structure
Set data structure Set data structure
Set data structure Tech_MX
 
Lecture 01 Intro to DSA
Lecture 01 Intro to DSALecture 01 Intro to DSA
Lecture 01 Intro to DSANurjahan Nipa
 
Fundamentals of data structures
Fundamentals of data structuresFundamentals of data structures
Fundamentals of data structuresNiraj Agarwal
 
Data structure using c module 1
Data structure using c module 1Data structure using c module 1
Data structure using c module 1smruti sarangi
 

What's hot (20)

Elementary data structure
Elementary data structureElementary data structure
Elementary data structure
 
Data structure ppt
Data structure pptData structure ppt
Data structure ppt
 
Bca ii dfs u-1 introduction to data structure
Bca ii dfs u-1 introduction to data structureBca ii dfs u-1 introduction to data structure
Bca ii dfs u-1 introduction to data structure
 
Introduction of data structure
Introduction of data structureIntroduction of data structure
Introduction of data structure
 
2nd puc computer science chapter 3 data structures 1
2nd puc computer science chapter 3 data structures 12nd puc computer science chapter 3 data structures 1
2nd puc computer science chapter 3 data structures 1
 
Data Structure In C#
Data Structure In C#Data Structure In C#
Data Structure In C#
 
Array ppt
Array pptArray ppt
Array ppt
 
Data Structures (CS8391)
Data Structures (CS8391)Data Structures (CS8391)
Data Structures (CS8391)
 
Data structure lecture 2
Data structure lecture 2Data structure lecture 2
Data structure lecture 2
 
DATASTRUCTURES UNIT-1
DATASTRUCTURES UNIT-1DATASTRUCTURES UNIT-1
DATASTRUCTURES UNIT-1
 
Arrays Data Structure
Arrays Data StructureArrays Data Structure
Arrays Data Structure
 
Lecture 2a arrays
Lecture 2a arraysLecture 2a arrays
Lecture 2a arrays
 
02 Arrays And Memory Mapping
02 Arrays And Memory Mapping02 Arrays And Memory Mapping
02 Arrays And Memory Mapping
 
Data Structures (BE)
Data Structures (BE)Data Structures (BE)
Data Structures (BE)
 
Data structure and its types
Data structure and its typesData structure and its types
Data structure and its types
 
Set data structure
Set data structure Set data structure
Set data structure
 
Lecture 01 Intro to DSA
Lecture 01 Intro to DSALecture 01 Intro to DSA
Lecture 01 Intro to DSA
 
Fundamentals of data structures
Fundamentals of data structuresFundamentals of data structures
Fundamentals of data structures
 
Data structures
Data structuresData structures
Data structures
 
Data structure using c module 1
Data structure using c module 1Data structure using c module 1
Data structure using c module 1
 

Similar to Introduction To Algebird: Abstract algebra for analytics

Aggregating In Accumulo
Aggregating In AccumuloAggregating In Accumulo
Aggregating In AccumuloBill Slacum
 
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalRMADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalRPivotalOpenSourceHub
 
Scalding: Twitter's New DSL for Hadoop
Scalding: Twitter's New DSL for HadoopScalding: Twitter's New DSL for Hadoop
Scalding: Twitter's New DSL for HadoopDataWorks Summit
 
Scalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/CascadingScalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/Cascadingjohnynek
 
(2) collections algorithms
(2) collections algorithms(2) collections algorithms
(2) collections algorithmsNico Ludwig
 
10 things I’ve learnt In the clouds
10 things I’ve learnt In the clouds10 things I’ve learnt In the clouds
10 things I’ve learnt In the cloudsStuart Lodge
 
MLconf NYC Shan Shan Huang
MLconf NYC Shan Shan HuangMLconf NYC Shan Shan Huang
MLconf NYC Shan Shan HuangMLconf
 
Basic terminologies & asymptotic notations
Basic terminologies & asymptotic notationsBasic terminologies & asymptotic notations
Basic terminologies & asymptotic notationsRajendran
 
Rethinking metrics: metrics 2.0 @ Lisa 2014
Rethinking metrics: metrics 2.0 @ Lisa 2014Rethinking metrics: metrics 2.0 @ Lisa 2014
Rethinking metrics: metrics 2.0 @ Lisa 2014Dieter Plaetinck
 
Spark ml streaming
Spark ml streamingSpark ml streaming
Spark ml streamingAdam Doyle
 
Xxx treme aggregation
Xxx treme aggregationXxx treme aggregation
Xxx treme aggregationBill Slacum
 
lec_4_data_structures_and_algorithm_analysis.ppt
lec_4_data_structures_and_algorithm_analysis.pptlec_4_data_structures_and_algorithm_analysis.ppt
lec_4_data_structures_and_algorithm_analysis.pptSourabhPal46
 
lec_4_data_structures_and_algorithm_analysis.ppt
lec_4_data_structures_and_algorithm_analysis.pptlec_4_data_structures_and_algorithm_analysis.ppt
lec_4_data_structures_and_algorithm_analysis.pptMard Geer
 
C++ STL (quickest way to learn, even for absolute beginners).pptx
C++ STL (quickest way to learn, even for absolute beginners).pptxC++ STL (quickest way to learn, even for absolute beginners).pptx
C++ STL (quickest way to learn, even for absolute beginners).pptxAbhishek Tirkey
 
C++ STL (quickest way to learn, even for absolute beginners).pptx
C++ STL (quickest way to learn, even for absolute beginners).pptxC++ STL (quickest way to learn, even for absolute beginners).pptx
C++ STL (quickest way to learn, even for absolute beginners).pptxGauravPandey43518
 
Sorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifySorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifyNeville Li
 
A Gentle Introduction to Coding ... with Python
A Gentle Introduction to Coding ... with PythonA Gentle Introduction to Coding ... with Python
A Gentle Introduction to Coding ... with PythonTariq Rashid
 

Similar to Introduction To Algebird: Abstract algebra for analytics (20)

Aggregating In Accumulo
Aggregating In AccumuloAggregating In Accumulo
Aggregating In Accumulo
 
Java 8
Java 8Java 8
Java 8
 
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalRMADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
 
Scalding: Twitter's New DSL for Hadoop
Scalding: Twitter's New DSL for HadoopScalding: Twitter's New DSL for Hadoop
Scalding: Twitter's New DSL for Hadoop
 
Scalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/CascadingScalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/Cascading
 
(2) collections algorithms
(2) collections algorithms(2) collections algorithms
(2) collections algorithms
 
10 things I’ve learnt In the clouds
10 things I’ve learnt In the clouds10 things I’ve learnt In the clouds
10 things I’ve learnt In the clouds
 
MLconf NYC Shan Shan Huang
MLconf NYC Shan Shan HuangMLconf NYC Shan Shan Huang
MLconf NYC Shan Shan Huang
 
Basic terminologies & asymptotic notations
Basic terminologies & asymptotic notationsBasic terminologies & asymptotic notations
Basic terminologies & asymptotic notations
 
Rethinking metrics: metrics 2.0 @ Lisa 2014
Rethinking metrics: metrics 2.0 @ Lisa 2014Rethinking metrics: metrics 2.0 @ Lisa 2014
Rethinking metrics: metrics 2.0 @ Lisa 2014
 
C++.pptx
C++.pptxC++.pptx
C++.pptx
 
Spark ml streaming
Spark ml streamingSpark ml streaming
Spark ml streaming
 
Xxx treme aggregation
Xxx treme aggregationXxx treme aggregation
Xxx treme aggregation
 
lec_4_data_structures_and_algorithm_analysis.ppt
lec_4_data_structures_and_algorithm_analysis.pptlec_4_data_structures_and_algorithm_analysis.ppt
lec_4_data_structures_and_algorithm_analysis.ppt
 
lec_4_data_structures_and_algorithm_analysis.ppt
lec_4_data_structures_and_algorithm_analysis.pptlec_4_data_structures_and_algorithm_analysis.ppt
lec_4_data_structures_and_algorithm_analysis.ppt
 
C++ STL (quickest way to learn, even for absolute beginners).pptx
C++ STL (quickest way to learn, even for absolute beginners).pptxC++ STL (quickest way to learn, even for absolute beginners).pptx
C++ STL (quickest way to learn, even for absolute beginners).pptx
 
C++ STL (quickest way to learn, even for absolute beginners).pptx
C++ STL (quickest way to learn, even for absolute beginners).pptxC++ STL (quickest way to learn, even for absolute beginners).pptx
C++ STL (quickest way to learn, even for absolute beginners).pptx
 
Sorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifySorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at Spotify
 
What`s New in Java 8
What`s New in Java 8What`s New in Java 8
What`s New in Java 8
 
A Gentle Introduction to Coding ... with Python
A Gentle Introduction to Coding ... with PythonA Gentle Introduction to Coding ... with Python
A Gentle Introduction to Coding ... with Python
 

More from Knoldus Inc.

Robusta -Tool Presentation (DevOps).pptx
Robusta -Tool Presentation (DevOps).pptxRobusta -Tool Presentation (DevOps).pptx
Robusta -Tool Presentation (DevOps).pptxKnoldus Inc.
 
Optimizing Kubernetes using GOLDILOCKS.pptx
Optimizing Kubernetes using GOLDILOCKS.pptxOptimizing Kubernetes using GOLDILOCKS.pptx
Optimizing Kubernetes using GOLDILOCKS.pptxKnoldus Inc.
 
Azure Function App Exception Handling.pptx
Azure Function App Exception Handling.pptxAzure Function App Exception Handling.pptx
Azure Function App Exception Handling.pptxKnoldus Inc.
 
CQRS Design Pattern Presentation (Java).pptx
CQRS Design Pattern Presentation (Java).pptxCQRS Design Pattern Presentation (Java).pptx
CQRS Design Pattern Presentation (Java).pptxKnoldus Inc.
 
ETL Observability: Azure to Snowflake Presentation
ETL Observability: Azure to Snowflake PresentationETL Observability: Azure to Snowflake Presentation
ETL Observability: Azure to Snowflake PresentationKnoldus Inc.
 
Scripting with K6 - Beyond the Basics Presentation
Scripting with K6 - Beyond the Basics PresentationScripting with K6 - Beyond the Basics Presentation
Scripting with K6 - Beyond the Basics PresentationKnoldus Inc.
 
Getting started with dotnet core Web APIs
Getting started with dotnet core Web APIsGetting started with dotnet core Web APIs
Getting started with dotnet core Web APIsKnoldus Inc.
 
Introduction To Rust part II Presentation
Introduction To Rust part II PresentationIntroduction To Rust part II Presentation
Introduction To Rust part II PresentationKnoldus Inc.
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Configuring Workflows & Validators in JIRA
Configuring Workflows & Validators in JIRAConfiguring Workflows & Validators in JIRA
Configuring Workflows & Validators in JIRAKnoldus Inc.
 
Advanced Python (with dependency injection and hydra configuration packages)
Advanced Python (with dependency injection and hydra configuration packages)Advanced Python (with dependency injection and hydra configuration packages)
Advanced Python (with dependency injection and hydra configuration packages)Knoldus Inc.
 
Azure Databricks (For Data Analytics).pptx
Azure Databricks (For Data Analytics).pptxAzure Databricks (For Data Analytics).pptx
Azure Databricks (For Data Analytics).pptxKnoldus Inc.
 
The Power of Dependency Injection with Dagger 2 and Kotlin
The Power of Dependency Injection with Dagger 2 and KotlinThe Power of Dependency Injection with Dagger 2 and Kotlin
The Power of Dependency Injection with Dagger 2 and KotlinKnoldus Inc.
 
Data Engineering with Databricks Presentation
Data Engineering with Databricks PresentationData Engineering with Databricks Presentation
Data Engineering with Databricks PresentationKnoldus Inc.
 
Databricks for MLOps Presentation (AI/ML)
Databricks for MLOps Presentation (AI/ML)Databricks for MLOps Presentation (AI/ML)
Databricks for MLOps Presentation (AI/ML)Knoldus Inc.
 
NoOps - (Automate Ops) Presentation.pptx
NoOps - (Automate Ops) Presentation.pptxNoOps - (Automate Ops) Presentation.pptx
NoOps - (Automate Ops) Presentation.pptxKnoldus Inc.
 
Mastering Distributed Performance Testing
Mastering Distributed Performance TestingMastering Distributed Performance Testing
Mastering Distributed Performance TestingKnoldus Inc.
 
MLops on Vertex AI Presentation (AI/ML).pptx
MLops on Vertex AI Presentation (AI/ML).pptxMLops on Vertex AI Presentation (AI/ML).pptx
MLops on Vertex AI Presentation (AI/ML).pptxKnoldus Inc.
 
Introduction to Ansible Tower Presentation
Introduction to Ansible Tower PresentationIntroduction to Ansible Tower Presentation
Introduction to Ansible Tower PresentationKnoldus Inc.
 
CQRS with dot net services presentation.
CQRS with dot net services presentation.CQRS with dot net services presentation.
CQRS with dot net services presentation.Knoldus Inc.
 

More from Knoldus Inc. (20)

Robusta -Tool Presentation (DevOps).pptx
Robusta -Tool Presentation (DevOps).pptxRobusta -Tool Presentation (DevOps).pptx
Robusta -Tool Presentation (DevOps).pptx
 
Optimizing Kubernetes using GOLDILOCKS.pptx
Optimizing Kubernetes using GOLDILOCKS.pptxOptimizing Kubernetes using GOLDILOCKS.pptx
Optimizing Kubernetes using GOLDILOCKS.pptx
 
Azure Function App Exception Handling.pptx
Azure Function App Exception Handling.pptxAzure Function App Exception Handling.pptx
Azure Function App Exception Handling.pptx
 
CQRS Design Pattern Presentation (Java).pptx
CQRS Design Pattern Presentation (Java).pptxCQRS Design Pattern Presentation (Java).pptx
CQRS Design Pattern Presentation (Java).pptx
 
ETL Observability: Azure to Snowflake Presentation
ETL Observability: Azure to Snowflake PresentationETL Observability: Azure to Snowflake Presentation
ETL Observability: Azure to Snowflake Presentation
 
Scripting with K6 - Beyond the Basics Presentation
Scripting with K6 - Beyond the Basics PresentationScripting with K6 - Beyond the Basics Presentation
Scripting with K6 - Beyond the Basics Presentation
 
Getting started with dotnet core Web APIs
Getting started with dotnet core Web APIsGetting started with dotnet core Web APIs
Getting started with dotnet core Web APIs
 
Introduction To Rust part II Presentation
Introduction To Rust part II PresentationIntroduction To Rust part II Presentation
Introduction To Rust part II Presentation
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Configuring Workflows & Validators in JIRA
Configuring Workflows & Validators in JIRAConfiguring Workflows & Validators in JIRA
Configuring Workflows & Validators in JIRA
 
Advanced Python (with dependency injection and hydra configuration packages)
Advanced Python (with dependency injection and hydra configuration packages)Advanced Python (with dependency injection and hydra configuration packages)
Advanced Python (with dependency injection and hydra configuration packages)
 
Azure Databricks (For Data Analytics).pptx
Azure Databricks (For Data Analytics).pptxAzure Databricks (For Data Analytics).pptx
Azure Databricks (For Data Analytics).pptx
 
The Power of Dependency Injection with Dagger 2 and Kotlin
The Power of Dependency Injection with Dagger 2 and KotlinThe Power of Dependency Injection with Dagger 2 and Kotlin
The Power of Dependency Injection with Dagger 2 and Kotlin
 
Data Engineering with Databricks Presentation
Data Engineering with Databricks PresentationData Engineering with Databricks Presentation
Data Engineering with Databricks Presentation
 
Databricks for MLOps Presentation (AI/ML)
Databricks for MLOps Presentation (AI/ML)Databricks for MLOps Presentation (AI/ML)
Databricks for MLOps Presentation (AI/ML)
 
NoOps - (Automate Ops) Presentation.pptx
NoOps - (Automate Ops) Presentation.pptxNoOps - (Automate Ops) Presentation.pptx
NoOps - (Automate Ops) Presentation.pptx
 
Mastering Distributed Performance Testing
Mastering Distributed Performance TestingMastering Distributed Performance Testing
Mastering Distributed Performance Testing
 
MLops on Vertex AI Presentation (AI/ML).pptx
MLops on Vertex AI Presentation (AI/ML).pptxMLops on Vertex AI Presentation (AI/ML).pptx
MLops on Vertex AI Presentation (AI/ML).pptx
 
Introduction to Ansible Tower Presentation
Introduction to Ansible Tower PresentationIntroduction to Ansible Tower Presentation
Introduction to Ansible Tower Presentation
 
CQRS with dot net services presentation.
CQRS with dot net services presentation.CQRS with dot net services presentation.
CQRS with dot net services presentation.
 

Recently uploaded

React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfkalichargn70th171
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 

Recently uploaded (20)

React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 

Introduction To Algebird: Abstract algebra for analytics

  • 1. Presented By : Simarpreet Kaur Monga Software Consultant Knoldus Software LLP Introduction To Algebird: Abstract algebra for analytics
  • 2. Agenda ● What is Algebird ? ● Problems in computation in Big Data Analysis ● Advantages of Abstract Algebra in computing world. ● Why Algebird ? ● Projects using Algebird ● Example Usage - Counting With Algebird and Spark ● Demo ● References
  • 3. What is Algebird ? ● Algebird is a library which provides abstractions for abstract algebra in the Scala programming language. ● Twitter recently open-sourced Algebird, which provides you with a JVM library to work with algebraic data structures ● Note that algebird was originally developed to support scalding, which is Twitter's Scala library for Hadoop.
  • 4. Problems in Computation in Big Data Analysis ➢ How do you combine or add 2 instances of complex data structure in an easy way ? val first = BloomFilter(...) val second = BloomFilter(...) first + second == uh?
  • 5. Problems in Computation in Big Data Analysis ➢ What about performing other operations on those Bloom filter instances, notably data processing pipelines based on common functions such as map, flatMap, foldLeft, reduceLeft? val filters = Seq[BloomFilter](...) val summary = filters flatMap { /* magic happens here */ } reduceLeft { /* more magic */ } ...
  • 6. Discussion involing Ted Dunning of MapR and his work on a new data structure called t- digest:
  • 7. Why was Ted being asked whether t-digest is associative? And how does all this relate to semigroups and monoids?
  • 8. Advantages Of Abstract Algebra in Computing World ● Abstract algebra allows you to write reusable code. ● Data analysis involves lots of concepts that can be described using abstract algebra. ● Using abstract algebra abstractions (specifically commutative operations) allows you to merge distributed computations without concern for the order in which they are merged.
  • 9. Why Algebird ? If you can turn a data structure into a monoid (or semigroup, or …), then Algebird allows you to put it to good use. You can then work with your data structure just as nicely as you are so used to when dealing with Int, Double or List. And you can use it with large-scale data processing tools such as Hadoop and Storm, too.
  • 10. Monoids & Monads As a grossly simplified rule of thumb: ➢ Monoid: If you want to “attach” operations such as +, -, *, / or <= to data objects – say, adding two Bloom filters – then you want to provide monoid forms for those data objects (e.g. a monoid for your Bloom filter data structure). This way you can combine and juggle your custom data structures just like you would do with plain integer numbers.
  • 11. ➢ Monad: If you want to create data processing pipelines that turn data objects step-by-step into the desired, final output (e.g. aggregating raw records into summary statistics), then you want to build one or more monads to model these data pipelines. Particularly if you want to run those pipelines in large-scala data processing platforms such as Hadoop or Storm.
  • 13. Counting With AlgeBird and Spark Summing Integers
  • 14. Counting With Algebird and Spark Summing Integers 1 + 2 + 3 + 4 + ….. + 30
  • 15. Counting With Algebird and Spark Summing Integers 1 + 2 + 3 + 4 + ….. + 30 (1 + 2) + (3 + 4) + ….. + (28 + 29 + 30)
  • 16. Counting With Algebird and Spark Summing Integers 1 + 2 + 3 + 4 + ….. + 30 (1 + 2) + (3 + 4) + ….. + (28 + 29 + 30)
  • 17. Counting With Algebird and Spark Algebraic Properties Of Integers and Addition Associative: (a + b) + c = a + (b + c) <= can be partitioned/parallelized Closed: Int + Int = Int <= reduce in multiple stages Identity: a + 0 = a <= support “empty”/”zero” items
  • 18. Counting With Algebird and Spark Algebraic Properties Of Integers and Addition Associative: (a + b) + c = a + (b + c) <= can be partitioned/parallelized Closed: Int + Int = Int <= reduce in multiple stages Identity: a + 0 = a <= support “empty”/”zero” items Integers form a Monoid under Addition
  • 19. Counting With Algebird and Spark Algebraic Properties Of Integers and Addition Associative: (a + b) + c = a + (b + c) <= can be partitioned/parallelized Closed: Int + Int = Int <= reduce in multiple stages Identity: a + 0 = a <= support “empty”/”zero” items Integers form a Monoid under Addition Monoid computations can be effectively run on Spark
  • 20. DEMO