Smart Scalable Feature Reduction with Random Forests with Erik Erlandson

Databricks
Erik Erlandson
Red Hat, Inc.
Smart Scalable Feature Reduction with Random Forests
Erik Erlandson
• Software Engineer
• Radanalytics.io community
• Apache Spark on OpenShift
• Intelligent Applications in the cloud
Talk
• Motivate Feature Reduction
• Random Forest Clustering
• T-Digest Feature Sketching
• RF Feature Reduction
• Example: Tox21 Assay Data
Features
2.3 1.0 0.0 1.0 3.1 4.2 6.9 0.0 7.3
Model Training → Evaluation → Results
Measurable Properties!
Feature Reduction
Full Feature Set → Identify Useful Features → Reduced Feature Set
Feature Sets Can Be Very Large
hundreds → thousands → ... → millions
Features Cost Resources
Memory
Disk
Network
Time
Energy
Features Inject Noise
[Figure: model training without reduction vs. with feature reduction]
Features Impact Model Size
[Figure: model without reduction vs. with reduction]
Representation & Transfer Learning
Random Forests
Leo Breiman (2001)
Ensemble of Decision Tree Models
Each tree trains on random subset of data
Each split considers random subset of features
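The two sources of randomness listed above can be sketched in plain Scala. This is a minimal sketch, not Spark MLlib's implementation: `ForestRandomness`, `bootstrapRows`, `splitCandidates`, and `defaultK` are hypothetical names, and the ceil(sqrt(m)) subset size is assumed here as one common choice for classification, not a value taken from the talk.

```scala
import scala.util.Random

// Hypothetical helpers showing how a random forest randomizes each tree.
// Not Spark MLlib's implementation; names are illustrative only.
object ForestRandomness {
  // Bootstrap sample: n row indices drawn uniformly WITH replacement,
  // so each tree trains on a random subset of the data (with repeats).
  def bootstrapRows(n: Int, rng: Random): Vector[Int] =
    Vector.fill(n)(rng.nextInt(n))

  // Candidate features for one split: k of the m features, WITHOUT replacement.
  def splitCandidates(m: Int, k: Int, rng: Random): Vector[Int] = {
    require(k >= 1 && k <= m, "k must be in [1, m]")
    rng.shuffle((0 until m).toVector).take(k).sorted
  }

  // Assumed default subset size: ceil(sqrt(m)), common for classification.
  def defaultK(m: Int): Int =
    math.max(1, math.ceil(math.sqrt(m.toDouble)).toInt)
}
```

With 801 features, `defaultK(801)` gives 29 candidate features per split, so any single split sees only a small random slice of the feature set.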
Random Forest Clustering
Features (F1, F2, ..., Fm) → RF model → Leaf Node IDs (34, 12, ..., 73)
Cluster these!
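The mapping from a feature vector to its leaf IDs can be sketched in a few lines. This is a minimal sketch under stated assumptions: the `Tree`, `Leaf`, and `Split` types and the `RFCluster` helpers are hypothetical, a split sends `x(feature) <= threshold` to the left (matching the rule notation used later), and the mismatch-fraction distance is one simple way to compare leaf-ID vectors, not necessarily the one used in the talk.

```scala
// Hypothetical minimal decision-tree representation.
sealed trait Tree
final case class Leaf(id: Int) extends Tree
final case class Split(feature: Int, threshold: Double, left: Tree, right: Tree) extends Tree

object RFCluster {
  // Walk one tree: x(feature) <= threshold goes left, otherwise right.
  @annotation.tailrec
  def leafId(t: Tree, x: Vector[Double]): Int = t match {
    case Leaf(id)           => id
    case Split(f, th, l, r) => leafId(if (x(f) <= th) l else r, x)
  }

  // One leaf ID per tree: this vector is what gets clustered.
  def leafIds(forest: Seq[Tree], x: Vector[Double]): Vector[Int] =
    forest.map(leafId(_, x)).toVector

  // Fraction of trees in which two points land in different leaves.
  // Points that often share leaves are "close" for clustering purposes.
  def leafDistance(a: Vector[Int], b: Vector[Int]): Double = {
    require(a.size == b.size && a.nonEmpty, "need equal-length, non-empty vectors")
    a.zip(b).count { case (p, q) => p != q }.toDouble / a.size
  }
}
```

Any clustering algorithm that accepts a distance (for example k-medoids) could then run on these leaf-ID vectors instead of the raw features.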
2 Key Benefits of RF Clustering
• Features used by RF model << full feature set
• RF training ignores unhelpful features
Data with a Joint Distribution in R^2
[Figure: the data, then "Data with Synthetic" with synthetic points overlaid]
RF Rules for Data (non-synthetic)
List((x2 <= -1.32), (x1 <= 0.87))
List((x1 > -1.37), (x2 > 1.03))
List((x2 <= 2.09), (x1 <= 0.87))
List((x1 <= 2.13), (x2 <= -1.32))
List((x2 <= -2.31), (x1 <= 0.87))
[Figure: region covered by the rule x1 > -1.37, x2 > 1.03]
RF Rules in Feature Space
What Features Did the RF Use?
List((x2 <= -1.32), (x1 <= 0.87))
List((x1 > -1.37), (x2 > 1.03))
List((x2 <= 2.09), (x1 <= 0.87))
List((x1 <= 2.13), (x2 <= -1.32))
List((x2 <= -2.31), (x1 <= 0.87))
reduced = {"x1", "x2"}
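Reading the reduced feature set off the rules amounts to collecting every feature name that appears in any condition. A minimal sketch, assuming a hypothetical `Cond` type for one threshold condition; `RulesToFeatures` and its methods are illustrative names, and the five rules are copied from the slide.

```scala
// Hypothetical representation of one condition, e.g. (x2 <= -1.32).
final case class Cond(feature: String, op: String, threshold: Double)

object RulesToFeatures {
  // Every feature named in any condition of any rule.
  def featuresUsed(rules: Seq[List[Cond]]): Set[String] =
    rules.flatMap(_.map(_.feature)).toSet

  // Features that never appear in a rule: candidates to drop.
  def featuresUnused(all: Seq[String], rules: Seq[List[Cond]]): Seq[String] = {
    val used = featuresUsed(rules)
    all.filterNot(used)
  }

  // The five rules shown on the slide.
  val slideRules: Seq[List[Cond]] = Seq(
    List(Cond("x2", "<=", -1.32), Cond("x1", "<=", 0.87)),
    List(Cond("x1", ">", -1.37), Cond("x2", ">", 1.03)),
    List(Cond("x2", "<=", 2.09), Cond("x1", "<=", 0.87)),
    List(Cond("x1", "<=", 2.13), Cond("x2", "<=", -1.32)),
    List(Cond("x2", "<=", -2.31), Cond("x1", "<=", 0.87))
  )
}
```

On the slide's rules, `featuresUsed` returns `Set("x1", "x2")`; if the data also had an unhelpful column such as a hypothetical `x3`, `featuresUnused` would report it for removal.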
T-Digest Sketches a Distribution
Data: 3.4, 6.0, 2.5, ⋮
[Figure: sketch of the CDF, P(X <= x), over the data domain X]
Inverse Transform Sampling
Sample U[0,1] => q, then return x where CDF(x) = q
[Figure: CDF rising from 0 to 1]
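Inverse transform sampling can be sketched without a t-digest by using an exact empirical CDF over sorted data. This substitution is an assumption for illustration: a t-digest answers the same quantile query approximately, in bounded memory. `EmpiricalCdf` is a hypothetical name, and "x where CDF(x) = q" is taken in the usual generalized-inverse sense (smallest x with CDF(x) >= q).

```scala
import scala.util.Random

// Exact empirical CDF standing in for a t-digest (illustration only).
final class EmpiricalCdf(data: Seq[Double]) {
  require(data.nonEmpty, "need at least one value")
  private val sorted: Vector[Double] = data.sorted.toVector

  // CDF(x) = P(X <= x): fraction of values <= x.
  def cdf(x: Double): Double = sorted.count(_ <= x).toDouble / sorted.size

  // Generalized inverse: smallest data value v with CDF(v) >= q.
  def quantile(q: Double): Double = {
    require(q >= 0.0 && q <= 1.0, "q must be in [0, 1]")
    val k = math.max(1, math.ceil(q * sorted.size).toInt) // 1-based rank
    sorted(k - 1)
  }

  // Sample q ~ U[0,1], then return the x whose CDF reaches q.
  def sample(rng: Random): Double = quantile(rng.nextDouble())
}
```

Samples drawn this way follow the sketched distribution: values are returned in proportion to how much CDF mass they carry.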
T-Digests Can Aggregate
Data in Spark partitions P1, P2, ..., Pn
Map each partition to a t-digest, then combine the t-digests with |+| into one result
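The map-then-merge pattern only needs a sketch whose merge is associative, so partitions can be summarized independently and combined in any order. A minimal sketch, assuming a fixed-bin histogram over a known range as a stand-in for the t-digest (a t-digest adapts its centroids to the data; this histogram does not). `Hist` and `sketchPartitions` are hypothetical names; the shape mirrors Spark's `RDD.aggregate(zero)(seqOp, combOp)`.

```scala
// Fixed-bin histogram standing in for a t-digest (illustration only).
final case class Hist(lo: Double, hi: Double, counts: Vector[Long]) {
  private def bin(x: Double): Int = {
    val b = ((x - lo) / (hi - lo) * counts.size).toInt
    math.min(counts.size - 1, math.max(0, b)) // clamp out-of-range values
  }

  // Per-partition step: add one value.
  def +(x: Double): Hist = {
    val b = bin(x)
    copy(counts = counts.updated(b, counts(b) + 1))
  }

  // Merge step (|+|): combine two sketches built with the same bins.
  def |+|(that: Hist): Hist = {
    require(lo == that.lo && hi == that.hi && counts.size == that.counts.size)
    copy(counts = counts.zip(that.counts).map { case (a, b) => a + b })
  }

  def total: Long = counts.sum
}

object Hist {
  def empty(lo: Double, hi: Double, bins: Int): Hist =
    Hist(lo, hi, Vector.fill(bins)(0L))

  // Sketch each partition independently, then merge the partial sketches.
  def sketchPartitions(parts: Seq[Seq[Double]], lo: Double, hi: Double, bins: Int): Hist =
    parts
      .map(_.foldLeft(empty(lo, hi, bins))(_ + _))
      .foldLeft(empty(lo, hi, bins))(_ |+| _)
}
```

Because merging is associative, sketching partitions separately gives the same result as sketching all the data in one pass, which is what lets the work scale out across a cluster.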
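Sketching a Feature
// assumes feature is an RDD[Double] holding one column of the data
feature.aggregate(TDigest.empty())(
  (td, x) => td + x,        // within each partition: add every value to the digest
  (td1, td2) => td1 ++ td2  // across partitions: merge the partial digests
)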
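Synthesizing Data from TDigests
// draw n synthetic feature vectors from per-feature t-digests
// (assumes a SparkContext named sc is in scope)
def synthesize(tdVec: Vector[TDigest],
               n: Int) = {
  // broadcast the sketches once per executor instead of with every task
  val tdVecBC = sc.broadcast(tdVec)
  sc.parallelize(1 to n).map { _ =>
    // each feature is sampled independently from its own sketch
    tdVecBC.value.map(_.sample)
  }
}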
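Random Forest Training Data
// one t-digest per feature column of the real training vectors
val fvSketches = sketchFV(trainFV)
// 48000 synthetic vectors drawn from those per-feature sketches
val synthFV = synthesize(fvSketches, 48000)
// real data is labeled 1.0, synthetic data 0.0
val trainLab = trainFV.map(_.toLabeledPoint(1.0))
val synthLab = synthFV.map(_.toLabeledPoint(0.0))
// the RF is trained to tell real vectors from synthetic ones
val trainFR = trainLab ++ synthLab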
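Why the synthetic class is useful can be checked on a toy example. Each synthetic row samples every feature independently, so per-feature marginals are preserved while the dependence between features is destroyed; in principle, a classifier can then separate real from synthetic only through features that carry joint structure. In this sketch, resampling each column's observed values stands in for drawing from per-feature t-digests (an assumption), and `SynthMarginals` with its methods are hypothetical names.

```scala
import scala.util.Random

object SynthMarginals {
  // n synthetic rows; each column is sampled independently from its observed values.
  def synthesize(data: Vector[Vector[Double]], n: Int, rng: Random): Vector[Vector[Double]] = {
    val columns = data.transpose
    Vector.fill(n)(columns.map(col => col(rng.nextInt(col.size))))
  }

  // Pearson correlation between columns i and j.
  def corr(data: Vector[Vector[Double]], i: Int, j: Int): Double = {
    val xs = data.map(_(i)); val ys = data.map(_(j))
    val mx = xs.sum / xs.size; val my = ys.sum / ys.size
    val cov = xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum
    val sx = math.sqrt(xs.map(x => (x - mx) * (x - mx)).sum)
    val sy = math.sqrt(ys.map(y => (y - my) * (y - my)).sum)
    cov / (sx * sy)
  }

  // Real rows labeled 1.0, synthetic rows 0.0, as in the training-data code.
  def labeled(real: Vector[Vector[Double]],
              synth: Vector[Vector[Double]]): Vector[(Double, Vector[Double])] =
    real.map(v => (1.0, v)) ++ synth.map(v => (0.0, v))
}
```

For two strongly correlated columns, the real data shows a correlation near 1, while the synthetic copy shows a correlation near 0 even though each column's distribution is unchanged.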
Random Forest Feature Reduction
{"f1", "f2", … }
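Once the reduced set of feature names is known, applying it is a column projection. A minimal sketch, assuming features are identified by name and kept in their original order; `ReduceFeatures`, `keptIndices`, and `project` are hypothetical names.

```scala
object ReduceFeatures {
  // Indices of the features in the reduced set, in original column order.
  def keptIndices(names: Vector[String], reduced: Set[String]): Vector[Int] =
    names.indices.filter(i => reduced(names(i))).toVector

  // Project one full feature vector onto the reduced feature set.
  def project(v: Vector[Double], kept: Vector[Int]): Vector[Double] =
    kept.map(v)
}
```

Computing the kept indices once and reusing them for every vector keeps the projection cheap; the projected vectors are what the downstream models train on.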
National Institutes of Health (2014)
12 Toxicity Assays
12060 compounds + 647 hold-out
https://tripod.nih.gov/tox21/challenge/index.jsp
Tox21 Data Challenge
DeepTox
Johannes Kepler University Linz
Institute of Bioinformatics
http://bioinf.jku.at/research/DeepTox/tox21.html
[Mayr2016] Mayr, A., Klambauer, G., Unterthiner, T., & Hochreiter, S. (2016). DeepTox: Toxicity Prediction using Deep Learning. Frontiers in Environmental Science, 3:80.
[Huang2016] Huang, R., Xia, M., Nguyen, D. T., Zhao, T., Sakamuru, S., Zhao, J., Shahane, S., Rossoshek, A., & Simeonov, A. (2016). Tox21Challenge to build predictive models of nuclear receptor and stress response pathways as mediated by exposure to environmental chemicals and drugs. Frontiers in Environmental Science, 3:85.
Tox21 Data
801 Dense Features
272K Sparse Features
Each assay is labeled on a different subset of compounds (NA = no label)
+---------------+------+-----+---------+------------+-----
|       compound|NR.AhR|NR.AR|NR.AR.LBD|NR.Aromatase| ...
+---------------+------+-----+---------+------------+-----
|NCGC00261900-01|     0|    1|       NA|           0|
|NCGC00260869-01|     0|    1|       NA|          NA|
|NCGC00261776-01|     1|    1|        0|          NA|
|NCGC00261380-01|    NA|    0|       NA|           1|
|NCGC00261842-01|     0|    0|        0|          NA|
|NCGC00261662-01|     1|    0|        0|          NA|
|NCGC00261190-01|    NA|    0|        0|          NA|
I used the 801 dense features.
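Because each assay is labeled on a different subset of compounds, a per-assay training set keeps only the rows whose label for that assay is not NA. A minimal sketch using the rows shown in the table above; `AssaySubsets`, `parse`, and `subset` are hypothetical names, and the whitespace-separated row format is an assumption for illustration.

```scala
object AssaySubsets {
  val header: Vector[String] = Vector("NR.AhR", "NR.AR", "NR.AR.LBD", "NR.Aromatase")

  // "compound l1 l2 ..." -> (compound, labels), with None for NA.
  def parse(line: String): (String, Vector[Option[Int]]) = {
    val cells = line.trim.split("\\s+").toVector
    (cells.head, cells.tail.map(c => if (c == "NA") None else Some(c.toInt)))
  }

  // The rows shown in the table.
  val rows: Vector[(String, Vector[Option[Int]])] = Vector(
    "NCGC00261900-01 0 1 NA 0",
    "NCGC00260869-01 0 1 NA NA",
    "NCGC00261776-01 1 1 0 NA",
    "NCGC00261380-01 NA 0 NA 1",
    "NCGC00261842-01 0 0 0 NA",
    "NCGC00261662-01 1 0 0 NA",
    "NCGC00261190-01 NA 0 0 NA"
  ).map(parse)

  // (compound, label) pairs for one assay, dropping NA labels.
  def subset(assay: String): Vector[(String, Int)] = {
    val j = header.indexOf(assay)
    require(j >= 0, s"unknown assay: $assay")
    rows.flatMap { case (id, labels) => labels(j).map(l => (id, l)) }
  }
}
```

Even on these seven rows the subsets differ sharply: NR.AR has a label for all seven compounds, while NR.Aromatase has only two.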
Experiment
Train models on all 12 assays
Perform Random Forest Feature Reduction
Train similar models on reduced feature set
Compare models on each assay
85 of 801 Features Were Used
Feature       Number of trees used
RNCS          21
MRVSA7        20
VSAEstate2    19
VSAEstate3    18
slogPVSA8     18
VSAEstate0    17
slogPVSA6     16
RDFM29        12
slogPVSA3     12
RDFM30        12
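A ranking like the one above can be computed by counting, for each feature, the trees that split on it. A minimal sketch, assuming "Number of trees used" counts a feature at most once per tree, and representing each tree only by the feature names of its splits (a simplification); `TreeUsage` and `treesUsing` are hypothetical names.

```scala
object TreeUsage {
  // For each feature, the number of trees with at least one split on it,
  // sorted most-used first (ties broken by name).
  def treesUsing(forest: Seq[Seq[String]]): Vector[(String, Int)] =
    forest
      .flatMap(_.distinct) // count each feature at most once per tree
      .groupBy(identity)
      .map { case (f, occ) => (f, occ.size) }
      .toVector
      .sortBy { case (f, n) => (-n, f) }
}
```

The union of all names in this ranking is the reduced feature set; the counts add a rough measure of how heavily the forest relied on each feature.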
Full vs Reduced (Logistic Reg)
Full vs Reduced (Boosted DTE)
Full vs Reduced (SVM)
Training Times (seconds)
Model                 Full (801)   Reduced (85)
Logistic Regression   68.5         46.8
SVM                   35.3         33.8
GB Tree Ensemble      247          65.0
Evaluation Times (seconds)
Model                 Full (801)   Reduced (85)
Logistic Regression   32.1         3.88
SVM                   0.59         0.23
GB Tree Ensemble      1.33         0.88
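The timing tables can be summarized as speedups, full time divided by reduced time. This is plain arithmetic on the numbers above; `Speedups` is a hypothetical name.

```scala
object Speedups {
  // (model, full seconds, reduced seconds), copied from the two tables.
  val training = Vector(
    ("Logistic Regression", 68.5, 46.8),
    ("SVM", 35.3, 33.8),
    ("GB Tree Ensemble", 247.0, 65.0)
  )
  val evaluation = Vector(
    ("Logistic Regression", 32.1, 3.88),
    ("SVM", 0.59, 0.23),
    ("GB Tree Ensemble", 1.33, 0.88)
  )

  // Speedup = full time / reduced time.
  def speedups(rows: Vector[(String, Double, Double)]): Vector[(String, Double)] =
    rows.map { case (model, full, reduced) => (model, full / reduced) }
}
```

The largest gains are GB tree ensemble training (247 / 65.0 = 3.8x) and logistic regression evaluation (32.1 / 3.88, about 8.3x); SVM training barely changes (about 1.04x).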
Erik Erlandson
eje@redhat.com
@manyangled
https://github.com/erikerlandson/feature-reduction-talk
Thank You