SlideShare a Scribd company logo
1 of 12
Download to read offline
val ScAlH2O =
Scala ++ H2O
San Francisco Data Science
Why Scala & H2O ?
●

H 2 O ~ fa s t, d is trib u te d , la rg e s c a le c o m p
p la tfo rm p ro v id in g ric h J a v a A P I
–

B u t lo w -le v e l a n d fo r m a n y u s e r s t o o c o m p li c a t e d
public class ShuffleTask extends MRTask2<ShuffleTask> {
@Override public void map(Chunk ic, Chunk oc) {
if (ic._len==0) return;
// Each vector is shuffled in the same way
Random rng = Utils.getRNG(0xe031e74f321f7e29L + (ic.cidx() << 32L));
oc.set0(0,ic.at0(0));
for (int row=1; row<ic._len; row++) {
int j = rng.nextInt(row+1); // inclusive upper bound <0,row>
if (j!=row) oc.set0(row, oc.at0(j));
oc.set0(j, ic.at0(row));
}
}
}
What we provides
●

ScAlH2O - Scala library providing a DSL
–
–

Easy data manipulation and distributed computation

–

●

Abstracting of H2O low-level API
BUT still inside JVM

Scala REPL integration into H2O
–

Console for experimenting with ScAlH2O
Basic concepts
●

First-class entities
–

Scalars

–

Frames

●

Scala expressions

●

Access to H2O aglos
–

And still preserving access to low-level H2O API)
Frame operations
●

Parse data

●

Basic slicing
–

val f = parse("smalldata/cars.csv")

val f1 = f("name") ++ f(*, 5 to 7)

Column/Rows selectors, append

●

Scalar operations

●

Support head/tail/ncols/nrows/...

●

Cooperation with H2O distributed KV store
–

Load/save operations

val f2 = f1("year") + 1900

val g = load("cars.hex")
val g1 = g ++ g("year") > 80
save("cars.hex", g1)
Map/filter/collect operations
●

M ap
–

P e r v a lu e /r o w

●

F ilte r

●

C o lle c t

// Returns a boolean vector
val fm = f map ( new FAOp {
def apply(rhs: Array[scala.Double]):Boolean = rhs(2) > 4;
});

// Collect all cars with more than 4 cylinders
val ff = f filter ( new FAOp {
def apply(rhs: Array[scala.Double]):Boolean = rhs(2) > 4;
});

// Compute sum of 2. column
val fc = f collect ( 0.0, new CDOp() {
def apply(acc:scala.Double rhs:Array[scala.Double]) =
acc + rhs(2)
def reduce(l:scala.Double,r:scala.Double) = l+r
} )
Internals
It's magic
Internals
●

No magic, BUT there are key-tricks
–

connect H2O classloaders with Scala ecosystem
●

–

Make sure that all distrib. objects are correctly iced

make translation of Scala code into calls of Java API
●

●

Pass operations around t he cloud

●

–

Create H2O MR tasks
Create new frames

preserve primitives types
●

do not introduce overhead of boxing/unboxing
Internals – translation to H2O MR tasks
// Collect all cars with more than 4 cylinders
val f5 = f filter ( new FAOp {
def apply(rhs: Array[scala.Double]):Boolean = rhs(2) > 4;
});

T_A2B_Transf has to be water.Freezable
def filter(af: T_A2B_Transf[scala.Double]):T = {
val f = frame()
val mrt = new MRTask2() {
override def map(in:Array[Chunk], out:Array[NewChunk]) = {
val rlen = in(0)._len
val tmprow = new Array[scala.Double](in.length)
for (row:Int <- 0 until rlen ) {
if (af(Utils.readRow(in,row,tmprow))) {
for (i:Int <- 0 until in.length) out(i).addNum(tmprow(i))
}
}
}
}
mrt.doAll(f.numCols(), f)
val result = mrt.outputFrame(f.names(), f.domains())
apply(result) // return the DFrame
}
Party demo time!
Towards Scalding-like API
●

V is io n is to p ro v id e S c a ld in g -lik e s y n ta x
Input scheme

Output scheme

f map ( ('name, 'cylinders) -> ('name, 'moreThan4) )
{ (n:String, c:Int) => (n, if (c>4) 1 else 0) }

●

B u t s o fa r D S L is s till u g ly
f map (f, ('name, 'cylinders) -> ('name, 'moreThan4) )
{ new IcedFunctor2to2[Double,Int,Double,Int] {
def apply(n:Double, c:Int) = (n, if (c>4) 1 else 0) }
}

Transformation
Try and contribute !

> git clone git@github.com:0xdata/h2o.git
> git checkout -b h2oscala origin/h2oscala
> cd h2o-scala && ./depl.sh # or sbt compile
=== Welcome to the world of ScAlH2O ===
Type `help` or `example` to begin...
h2o>

Thank you!

More Related Content

What's hot

StrataGEM: A Generic Petri Net Verification Framework
StrataGEM: A Generic Petri Net Verification FrameworkStrataGEM: A Generic Petri Net Verification Framework
StrataGEM: A Generic Petri Net Verification FrameworkEdmundo López Bóbeda
 
Pert management
Pert management Pert management
Pert management Ahmed Gamal
 
Ece512 h1 20139_621386735458ece512_test2_solutions
Ece512 h1 20139_621386735458ece512_test2_solutionsEce512 h1 20139_621386735458ece512_test2_solutions
Ece512 h1 20139_621386735458ece512_test2_solutionsnadia abd
 
Generating and Analyzing Events
Generating and Analyzing EventsGenerating and Analyzing Events
Generating and Analyzing Eventsztellman
 
Coq for ML users
Coq for ML usersCoq for ML users
Coq for ML userstmiya
 
Algorithm analysis basics - Seven Functions/Big-Oh/Omega/Theta
Algorithm analysis basics - Seven Functions/Big-Oh/Omega/ThetaAlgorithm analysis basics - Seven Functions/Big-Oh/Omega/Theta
Algorithm analysis basics - Seven Functions/Big-Oh/Omega/ThetaPriyanka Rana
 
Call report from x++
Call report from x++Call report from x++
Call report from x++Ahmed Farag
 
13. quadratic formtemplatetouchpad
13. quadratic formtemplatetouchpad13. quadratic formtemplatetouchpad
13. quadratic formtemplatetouchpadMedia4math
 
Contrastive Divergence Learning
Contrastive Divergence LearningContrastive Divergence Learning
Contrastive Divergence Learningpenny 梁斌
 
OpenTuesday: Neues aus der RRDtool Welt
OpenTuesday: Neues aus der RRDtool WeltOpenTuesday: Neues aus der RRDtool Welt
OpenTuesday: Neues aus der RRDtool WeltDigicomp Academy AG
 
Connected hubs: an analysis of the Lufthansa network in Europe
Connected hubs: an analysis of the Lufthansa network in EuropeConnected hubs: an analysis of the Lufthansa network in Europe
Connected hubs: an analysis of the Lufthansa network in EuropeSau Yee Chan
 

What's hot (20)

StrataGEM: A Generic Petri Net Verification Framework
StrataGEM: A Generic Petri Net Verification FrameworkStrataGEM: A Generic Petri Net Verification Framework
StrataGEM: A Generic Petri Net Verification Framework
 
Pert management
Pert management Pert management
Pert management
 
Ece512 h1 20139_621386735458ece512_test2_solutions
Ece512 h1 20139_621386735458ece512_test2_solutionsEce512 h1 20139_621386735458ece512_test2_solutions
Ece512 h1 20139_621386735458ece512_test2_solutions
 
Matematica CS-GOTHIC
Matematica CS-GOTHICMatematica CS-GOTHIC
Matematica CS-GOTHIC
 
Virtual reality
Virtual realityVirtual reality
Virtual reality
 
Generating and Analyzing Events
Generating and Analyzing EventsGenerating and Analyzing Events
Generating and Analyzing Events
 
Rumus vb
Rumus vbRumus vb
Rumus vb
 
Coq for ML users
Coq for ML usersCoq for ML users
Coq for ML users
 
Arrays
ArraysArrays
Arrays
 
Lec20 dimension1
Lec20 dimension1Lec20 dimension1
Lec20 dimension1
 
Google V8 engine
Google V8 engineGoogle V8 engine
Google V8 engine
 
Algorithm analysis basics - Seven Functions/Big-Oh/Omega/Theta
Algorithm analysis basics - Seven Functions/Big-Oh/Omega/ThetaAlgorithm analysis basics - Seven Functions/Big-Oh/Omega/Theta
Algorithm analysis basics - Seven Functions/Big-Oh/Omega/Theta
 
Push down automata
Push down automataPush down automata
Push down automata
 
Distributed DBMS - Unit - 4 - Data Distribution Alternatives
Distributed DBMS - Unit - 4 - Data Distribution Alternatives Distributed DBMS - Unit - 4 - Data Distribution Alternatives
Distributed DBMS - Unit - 4 - Data Distribution Alternatives
 
Call report from x++
Call report from x++Call report from x++
Call report from x++
 
13. quadratic formtemplatetouchpad
13. quadratic formtemplatetouchpad13. quadratic formtemplatetouchpad
13. quadratic formtemplatetouchpad
 
Contrastive Divergence Learning
Contrastive Divergence LearningContrastive Divergence Learning
Contrastive Divergence Learning
 
OpenTuesday: Neues aus der RRDtool Welt
OpenTuesday: Neues aus der RRDtool WeltOpenTuesday: Neues aus der RRDtool Welt
OpenTuesday: Neues aus der RRDtool Welt
 
Faster Python, FOSDEM
Faster Python, FOSDEMFaster Python, FOSDEM
Faster Python, FOSDEM
 
Connected hubs: an analysis of the Lufthansa network in Europe
Connected hubs: an analysis of the Lufthansa network in EuropeConnected hubs: an analysis of the Lufthansa network in Europe
Connected hubs: an analysis of the Lufthansa network in Europe
 

Similar to Michal Malohlava presents: Open Source H2O and Scala

Introduction to Functional Programming with Haskell and JavaScript
Introduction to Functional Programming with Haskell and JavaScriptIntroduction to Functional Programming with Haskell and JavaScript
Introduction to Functional Programming with Haskell and JavaScriptWill Kurt
 
talk at Virginia Bioinformatics Institute, December 5, 2013
talk at Virginia Bioinformatics Institute, December 5, 2013talk at Virginia Bioinformatics Institute, December 5, 2013
talk at Virginia Bioinformatics Institute, December 5, 2013ericupnorth
 
[FT-11][suhorng] “Poor Man's” Undergraduate Compilers
[FT-11][suhorng] “Poor Man's” Undergraduate Compilers[FT-11][suhorng] “Poor Man's” Undergraduate Compilers
[FT-11][suhorng] “Poor Man's” Undergraduate CompilersFunctional Thursday
 
Being functional in PHP (PHPDay Italy 2016)
Being functional in PHP (PHPDay Italy 2016)Being functional in PHP (PHPDay Italy 2016)
Being functional in PHP (PHPDay Italy 2016)David de Boer
 
A Multidimensional Distributed Array Abstraction for PGAS (HPCC'16)
A Multidimensional Distributed Array Abstraction for PGAS (HPCC'16)A Multidimensional Distributed Array Abstraction for PGAS (HPCC'16)
A Multidimensional Distributed Array Abstraction for PGAS (HPCC'16)Menlo Systems GmbH
 
Go Says WAT?
Go Says WAT?Go Says WAT?
Go Says WAT?jonbodner
 
Spark by Adform Research, Paulius
Spark by Adform Research, PauliusSpark by Adform Research, Paulius
Spark by Adform Research, PauliusVasil Remeniuk
 
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryDatabricks
 
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryDatabricks
 
Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with ClojureDmitry Buzdin
 
All I know about rsc.io/c2go
All I know about rsc.io/c2goAll I know about rsc.io/c2go
All I know about rsc.io/c2goMoriyoshi Koizumi
 
When RV Meets CEP (RV 2016 Tutorial)
When RV Meets CEP (RV 2016 Tutorial)When RV Meets CEP (RV 2016 Tutorial)
When RV Meets CEP (RV 2016 Tutorial)Sylvain Hallé
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Iterative Spark Developmen...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Iterative Spark Developmen...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Iterative Spark Developmen...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Iterative Spark Developmen...Data Con LA
 

Similar to Michal Malohlava presents: Open Source H2O and Scala (20)

Spark workshop
Spark workshopSpark workshop
Spark workshop
 
Introduction to Functional Programming with Haskell and JavaScript
Introduction to Functional Programming with Haskell and JavaScriptIntroduction to Functional Programming with Haskell and JavaScript
Introduction to Functional Programming with Haskell and JavaScript
 
talk at Virginia Bioinformatics Institute, December 5, 2013
talk at Virginia Bioinformatics Institute, December 5, 2013talk at Virginia Bioinformatics Institute, December 5, 2013
talk at Virginia Bioinformatics Institute, December 5, 2013
 
Full Stack Clojure
Full Stack ClojureFull Stack Clojure
Full Stack Clojure
 
[FT-11][suhorng] “Poor Man's” Undergraduate Compilers
[FT-11][suhorng] “Poor Man's” Undergraduate Compilers[FT-11][suhorng] “Poor Man's” Undergraduate Compilers
[FT-11][suhorng] “Poor Man's” Undergraduate Compilers
 
Being functional in PHP (PHPDay Italy 2016)
Being functional in PHP (PHPDay Italy 2016)Being functional in PHP (PHPDay Italy 2016)
Being functional in PHP (PHPDay Italy 2016)
 
A Multidimensional Distributed Array Abstraction for PGAS (HPCC'16)
A Multidimensional Distributed Array Abstraction for PGAS (HPCC'16)A Multidimensional Distributed Array Abstraction for PGAS (HPCC'16)
A Multidimensional Distributed Array Abstraction for PGAS (HPCC'16)
 
Go Says WAT?
Go Says WAT?Go Says WAT?
Go Says WAT?
 
Spark by Adform Research, Paulius
Spark by Adform Research, PauliusSpark by Adform Research, Paulius
Spark by Adform Research, Paulius
 
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love Story
 
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love Story
 
Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with Clojure
 
C++11
C++11C++11
C++11
 
All I know about rsc.io/c2go
All I know about rsc.io/c2goAll I know about rsc.io/c2go
All I know about rsc.io/c2go
 
Seminar PSU 10.10.2014 mme
Seminar PSU 10.10.2014 mmeSeminar PSU 10.10.2014 mme
Seminar PSU 10.10.2014 mme
 
When RV Meets CEP (RV 2016 Tutorial)
When RV Meets CEP (RV 2016 Tutorial)When RV Meets CEP (RV 2016 Tutorial)
When RV Meets CEP (RV 2016 Tutorial)
 
Scala by Luc Duponcheel
Scala by Luc DuponcheelScala by Luc Duponcheel
Scala by Luc Duponcheel
 
Spark_Documentation_Template1
Spark_Documentation_Template1Spark_Documentation_Template1
Spark_Documentation_Template1
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Iterative Spark Developmen...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Iterative Spark Developmen...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Iterative Spark Developmen...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Iterative Spark Developmen...
 
R meets Hadoop
R meets HadoopR meets Hadoop
R meets Hadoop
 

More from Sri Ambati

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxSri Ambati
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek Sri Ambati
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thSri Ambati
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionSri Ambati
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Sri Ambati
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMsSri Ambati
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the WaySri Ambati
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OSri Ambati
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Sri Ambati
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersSri Ambati
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Sri Ambati
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Sri Ambati
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...Sri Ambati
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability Sri Ambati
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email AgainSri Ambati
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Sri Ambati
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...Sri Ambati
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...Sri Ambati
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneySri Ambati
 

More from Sri Ambati (20)

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
 

Recently uploaded

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 

Recently uploaded (20)

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

Michal Malohlava presents: Open Source H2O and Scala

  • 1. val ScAlH2O = Scala ++ H2O San Francisco Data Science
  • 2. Why Scala & H2O ? ● H 2 O ~ fa s t, d is trib u te d , la rg e s c a le c o m p p la tfo rm p ro v id in g ric h J a v a A P I – B u t lo w -le v e l a n d fo r m a n y u s e r s t o o c o m p li c a t e d public class ShuffleTask extends MRTask2<ShuffleTask> { @Override public void map(Chunk ic, Chunk oc) { if (ic._len==0) return; // Each vector is shuffled in the same way Random rng = Utils.getRNG(0xe031e74f321f7e29L + (ic.cidx() << 32L)); oc.set0(0,ic.at0(0)); for (int row=1; row<ic._len; row++) { int j = rng.nextInt(row+1); // inclusive upper bound <0,row> if (j!=row) oc.set0(row, oc.at0(j)); oc.set0(j, ic.at0(row)); } } }
  • 3. What we provides ● ScAlH2O - Scala library providing a DSL – – Easy data manipulation and distributed computation – ● Abstracting of H2O low-level API BUT still inside JVM Scala REPL integration into H2O – Console for experimenting with ScAlH2O
  • 4. Basic concepts ● First-class entities – Scalars – Frames ● Scala expressions ● Access to H2O aglos – And still preserving access to low-level H2O API)
  • 5. Frame operations ● Parse data ● Basic slicing – val f = parse("smalldata/cars.csv") val f1 = f("name") ++ f(*, 5 to 7) Column/Rows selectors, append ● Scalar operations ● Support head/tail/ncols/nrows/... ● Cooperation with H2O distributed KV store – Load/save operations val f2 = f1("year") + 1900 val g = load("cars.hex") val g1 = g ++ g("year") > 80 save("cars.hex", g1)
  • 6. Map/filter/collect operations ● M ap – P e r v a lu e /r o w ● F ilte r ● C o lle c t // Returns a boolean vector val fm = f map ( new FAOp { def apply(rhs: Array[scala.Double]):Boolean = rhs(2) > 4; }); // Collect all cars with more than 4 cylinders val ff = f filter ( new FAOp { def apply(rhs: Array[scala.Double]):Boolean = rhs(2) > 4; }); // Compute sum of 2. column val fc = f collect ( 0.0, new CDOp() { def apply(acc:scala.Double rhs:Array[scala.Double]) = acc + rhs(2) def reduce(l:scala.Double,r:scala.Double) = l+r } )
  • 8. Internals ● No magic, BUT there are key-tricks – connect H2O classloaders with Scala ecosystem ● – Make sure that all distrib. objects are correctly iced make translation of Scala code into calls of Java API ● ● Pass operations around t he cloud ● – Create H2O MR tasks Create new frames preserve primitives types ● do not introduce overhead of boxing/unboxing
  • 9. Internals – translation to H2O MR tasks // Collect all cars with more than 4 cylinders val f5 = f filter ( new FAOp { def apply(rhs: Array[scala.Double]):Boolean = rhs(2) > 4; }); T_A2B_Transf has to be water.Freezable def filter(af: T_A2B_Transf[scala.Double]):T = { val f = frame() val mrt = new MRTask2() { override def map(in:Array[Chunk], out:Array[NewChunk]) = { val rlen = in(0)._len val tmprow = new Array[scala.Double](in.length) for (row:Int <- 0 until rlen ) { if (af(Utils.readRow(in,row,tmprow))) { for (i:Int <- 0 until in.length) out(i).addNum(tmprow(i)) } } } } mrt.doAll(f.numCols(), f) val result = mrt.outputFrame(f.names(), f.domains()) apply(result) // return the DFrame }
  • 11. Towards Scalding-like API ● V is io n is to p ro v id e S c a ld in g -lik e s y n ta x Input scheme Output scheme f map ( ('name, 'cylinders) -> ('name, 'moreThan4) ) { (n:String, c:Int) => (n, if (c>4) 1 else 0) } ● B u t s o fa r D S L is s till u g ly f map (f, ('name, 'cylinders) -> ('name, 'moreThan4) ) { new IcedFunctor2to2[Double,Int,Double,Int] { def apply(n:Double, c:Int) = (n, if (c>4) 1 else 0) } } Transformation
  • 12. Try and contribute ! > git clone git@github.com:0xdata/h2o.git > git checkout -b h2oscala origin/h2oscala > cd h2o-scala && ./depl.sh # or sbt compile === Welcome to the world of ScAlH2O === Type `help` or `example` to begin... h2o> Thank you!