Submit Search
Upload
Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher Bradford, Open Source Connections)
•
4 likes
•
2,651 views
Spark Summit
Follow
Presentation at Spark Summit 2015
Read less
Read more
Data & Analytics
Report
Share
Report
Share
1 of 23
Download now
Download to read offline
Recommended
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks
Databricks
Spark Machine Learning: Adding Your Own Algorithms and Tools with Holden Kara...
Spark Machine Learning: Adding Your Own Algorithms and Tools with Holden Kara...
Databricks
Spark DataFrames: Simple and Fast Analytics on Structured Data at Spark Summi...
Spark DataFrames: Simple and Fast Analytics on Structured Data at Spark Summi...
Databricks
Web-Scale Graph Analytics with Apache® Spark™
Web-Scale Graph Analytics with Apache® Spark™
Databricks
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
Anyscale
Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...
Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...
Databricks
Introduction to Spark Training
Introduction to Spark Training
Spark Summit
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Databricks
Recommended
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks
Databricks
Spark Machine Learning: Adding Your Own Algorithms and Tools with Holden Kara...
Spark Machine Learning: Adding Your Own Algorithms and Tools with Holden Kara...
Databricks
Spark DataFrames: Simple and Fast Analytics on Structured Data at Spark Summi...
Spark DataFrames: Simple and Fast Analytics on Structured Data at Spark Summi...
Databricks
Web-Scale Graph Analytics with Apache® Spark™
Web-Scale Graph Analytics with Apache® Spark™
Databricks
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
Anyscale
Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...
Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...
Databricks
Introduction to Spark Training
Introduction to Spark Training
Spark Summit
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Databricks
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscape
Paco Nathan
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Databricks
SparkApplicationDevMadeEasy_Spark_Summit_2015
SparkApplicationDevMadeEasy_Spark_Summit_2015
Lance Co Ting Keh
Apache Spark MLlib's Past Trajectory and New Directions with Joseph Bradley
Apache Spark MLlib's Past Trajectory and New Directions with Joseph Bradley
Databricks
A Journey into Databricks' Pipelines: Journey and Lessons Learned
A Journey into Databricks' Pipelines: Journey and Lessons Learned
Databricks
Web-Scale Graph Analytics with Apache® Spark™
Web-Scale Graph Analytics with Apache® Spark™
Databricks
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Databricks
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
Databricks
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Spark Summit
Spark r under the hood with Hossein Falaki
Spark r under the hood with Hossein Falaki
Databricks
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Cloudera, Inc.
Introduction to Spark ML
Introduction to Spark ML
Holden Karau
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)
Databricks
Operational Tips For Deploying Apache Spark
Operational Tips For Deploying Apache Spark
Databricks
Writing Continuous Applications with Structured Streaming in PySpark
Writing Continuous Applications with Structured Streaming in PySpark
Databricks
Microservices, Containers, and Machine Learning
Microservices, Containers, and Machine Learning
Paco Nathan
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Databricks
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Databricks
Spark Community Update - Spark Summit San Francisco 2015
Spark Community Update - Spark Summit San Francisco 2015
Databricks
What's New in Apache Spark 2.3 & Why Should You Care
What's New in Apache Spark 2.3 & Why Should You Care
Databricks
Spark at NASA/JPL-(Chris Mattmann, NASA/JPL)
Spark at NASA/JPL-(Chris Mattmann, NASA/JPL)
Spark Summit
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
Spark Summit
More Related Content
What's hot
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscape
Paco Nathan
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Databricks
SparkApplicationDevMadeEasy_Spark_Summit_2015
SparkApplicationDevMadeEasy_Spark_Summit_2015
Lance Co Ting Keh
Apache Spark MLlib's Past Trajectory and New Directions with Joseph Bradley
Apache Spark MLlib's Past Trajectory and New Directions with Joseph Bradley
Databricks
A Journey into Databricks' Pipelines: Journey and Lessons Learned
A Journey into Databricks' Pipelines: Journey and Lessons Learned
Databricks
Web-Scale Graph Analytics with Apache® Spark™
Web-Scale Graph Analytics with Apache® Spark™
Databricks
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Databricks
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
Databricks
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Spark Summit
Spark r under the hood with Hossein Falaki
Spark r under the hood with Hossein Falaki
Databricks
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Cloudera, Inc.
Introduction to Spark ML
Introduction to Spark ML
Holden Karau
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)
Databricks
Operational Tips For Deploying Apache Spark
Operational Tips For Deploying Apache Spark
Databricks
Writing Continuous Applications with Structured Streaming in PySpark
Writing Continuous Applications with Structured Streaming in PySpark
Databricks
Microservices, Containers, and Machine Learning
Microservices, Containers, and Machine Learning
Paco Nathan
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Databricks
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Databricks
Spark Community Update - Spark Summit San Francisco 2015
Spark Community Update - Spark Summit San Francisco 2015
Databricks
What's New in Apache Spark 2.3 & Why Should You Care
What's New in Apache Spark 2.3 & Why Should You Care
Databricks
What's hot
(20)
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscape
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
SparkApplicationDevMadeEasy_Spark_Summit_2015
SparkApplicationDevMadeEasy_Spark_Summit_2015
Apache Spark MLlib's Past Trajectory and New Directions with Joseph Bradley
Apache Spark MLlib's Past Trajectory and New Directions with Joseph Bradley
A Journey into Databricks' Pipelines: Journey and Lessons Learned
A Journey into Databricks' Pipelines: Journey and Lessons Learned
Web-Scale Graph Analytics with Apache® Spark™
Web-Scale Graph Analytics with Apache® Spark™
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Spark r under the hood with Hossein Falaki
Spark r under the hood with Hossein Falaki
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Introduction to Spark ML
Introduction to Spark ML
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)
Operational Tips For Deploying Apache Spark
Operational Tips For Deploying Apache Spark
Writing Continuous Applications with Structured Streaming in PySpark
Writing Continuous Applications with Structured Streaming in PySpark
Microservices, Containers, and Machine Learning
Microservices, Containers, and Machine Learning
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Community Update - Spark Summit San Francisco 2015
Spark Community Update - Spark Summit San Francisco 2015
What's New in Apache Spark 2.3 & Why Should You Care
What's New in Apache Spark 2.3 & Why Should You Care
Viewers also liked
Spark at NASA/JPL-(Chris Mattmann, NASA/JPL)
Spark at NASA/JPL-(Chris Mattmann, NASA/JPL)
Spark Summit
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
Spark Summit
Open Stack Cheat Sheet V1
Open Stack Cheat Sheet V1
Anuchit Chalothorn
Tachyon-2014-11-21-amp-camp5
Tachyon-2014-11-21-amp-camp5
Haoyuan Li
Linux Filesystems, RAID, and more
Linux Filesystems, RAID, and more
Mark Wong
The Hot Rod Protocol in Infinispan
The Hot Rod Protocol in Infinispan
Galder Zamarreño
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Ceph Community
Advanced Data Retrieval and Analytics with Apache Spark and Openstack Swift
Advanced Data Retrieval and Analytics with Apache Spark and Openstack Swift
Daniel Krook
Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAM
fnothaft
ELC-E 2010: The Right Approach to Minimal Boot Times
ELC-E 2010: The Right Approach to Minimal Boot Times
andrewmurraympc
Velox: Models in Action
Velox: Models in Action
Dan Crankshaw
Naïveté vs. Experience
Naïveté vs. Experience
Mike Fogus
SparkR: Enabling Interactive Data Science at Scale
SparkR: Enabling Interactive Data Science at Scale
jeykottalam
SampleClean: Bringing Data Cleaning into the BDAS Stack
SampleClean: Bringing Data Cleaning into the BDAS Stack
jeykottalam
OpenStack Cheat Sheet V2
OpenStack Cheat Sheet V2
Anuchit Chalothorn
A Curious Course on Coroutines and Concurrency
A Curious Course on Coroutines and Concurrency
David Beazley (Dabeaz LLC)
Lab 5: Interconnecting a Datacenter using Mininet
Lab 5: Interconnecting a Datacenter using Mininet
Zubair Nabi
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark Summit
Best Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache Hadoop
Hortonworks
Python in Action (Part 2)
Python in Action (Part 2)
David Beazley (Dabeaz LLC)
Viewers also liked
(20)
Spark at NASA/JPL-(Chris Mattmann, NASA/JPL)
Spark at NASA/JPL-(Chris Mattmann, NASA/JPL)
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
Open Stack Cheat Sheet V1
Open Stack Cheat Sheet V1
Tachyon-2014-11-21-amp-camp5
Tachyon-2014-11-21-amp-camp5
Linux Filesystems, RAID, and more
Linux Filesystems, RAID, and more
The Hot Rod Protocol in Infinispan
The Hot Rod Protocol in Infinispan
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Advanced Data Retrieval and Analytics with Apache Spark and Openstack Swift
Advanced Data Retrieval and Analytics with Apache Spark and Openstack Swift
Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAM
ELC-E 2010: The Right Approach to Minimal Boot Times
ELC-E 2010: The Right Approach to Minimal Boot Times
Velox: Models in Action
Velox: Models in Action
Naïveté vs. Experience
Naïveté vs. Experience
SparkR: Enabling Interactive Data Science at Scale
SparkR: Enabling Interactive Data Science at Scale
SampleClean: Bringing Data Cleaning into the BDAS Stack
SampleClean: Bringing Data Cleaning into the BDAS Stack
OpenStack Cheat Sheet V2
OpenStack Cheat Sheet V2
A Curious Course on Coroutines and Concurrency
A Curious Course on Coroutines and Concurrency
Lab 5: Interconnecting a Datacenter using Mininet
Lab 5: Interconnecting a Datacenter using Mininet
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Best Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache Hadoop
Python in Action (Part 2)
Python in Action (Part 2)
Similar to Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher Bradford, Open Source Connections)
Jump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with Databricks
Anyscale
Jumpstart on Apache Spark 2.2 on Databricks
Jumpstart on Apache Spark 2.2 on Databricks
Databricks
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
CloudxLab
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
Databricks
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache Spark
Databricks
A Tale of Three Apache Spark APIs: RDDs, DataFrames and Datasets by Jules Damji
A Tale of Three Apache Spark APIs: RDDs, DataFrames and Datasets by Jules Damji
Data Con LA
Spark to DocumentDB connector
Spark to DocumentDB connector
Denny Lee
Jdbc drivers
Jdbc drivers
Prabhat gangwar
Apache Spark Fundamentals Training
Apache Spark Fundamentals Training
Eren Avşaroğulları
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache Spark
Databricks
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache Spark
Patrick Wendell
Spark SQL - 10 Things You Need to Know
Spark SQL - 10 Things You Need to Know
Kristian Alexander
USF Seminar Series: Apache Spark, Machine Learning, Recommendations Feb 05 2016
USF Seminar Series: Apache Spark, Machine Learning, Recommendations Feb 05 2016
Chris Fregly
Building a modern Application with DataFrames
Building a modern Application with DataFrames
Databricks
Building a modern Application with DataFrames
Building a modern Application with DataFrames
Spark Summit
2 rel-algebra
2 rel-algebra
Mahesh Jeedimalla
Quick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skills
Ravindra kumar
Engineering Document Collaboration with Office 365
Engineering Document Collaboration with Office 365
JoAnna Cheshire
Scylla Summit 2018: Building Recoverable (and optionally Async) Spark Pipelines
Scylla Summit 2018: Building Recoverable (and optionally Async) Spark Pipelines
ScyllaDB
Spark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and Furure
DataStax Academy
Similar to Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher Bradford, Open Source Connections)
(20)
Jump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with Databricks
Jumpstart on Apache Spark 2.2 on Databricks
Jumpstart on Apache Spark 2.2 on Databricks
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache Spark
A Tale of Three Apache Spark APIs: RDDs, DataFrames and Datasets by Jules Damji
A Tale of Three Apache Spark APIs: RDDs, DataFrames and Datasets by Jules Damji
Spark to DocumentDB connector
Spark to DocumentDB connector
Jdbc drivers
Jdbc drivers
Apache Spark Fundamentals Training
Apache Spark Fundamentals Training
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache Spark
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache Spark
Spark SQL - 10 Things You Need to Know
Spark SQL - 10 Things You Need to Know
USF Seminar Series: Apache Spark, Machine Learning, Recommendations Feb 05 2016
USF Seminar Series: Apache Spark, Machine Learning, Recommendations Feb 05 2016
Building a modern Application with DataFrames
Building a modern Application with DataFrames
Building a modern Application with DataFrames
Building a modern Application with DataFrames
2 rel-algebra
2 rel-algebra
Quick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skills
Engineering Document Collaboration with Office 365
Engineering Document Collaboration with Office 365
Scylla Summit 2018: Building Recoverable (and optionally Async) Spark Pipelines
Scylla Summit 2018: Building Recoverable (and optionally Async) Spark Pipelines
Spark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and Furure
More from Spark Summit
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
Spark Summit
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
Spark Summit
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Spark Summit
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Spark Summit
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
Spark Summit
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
Spark Summit
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Spark Summit
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Spark Summit
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Spark Summit
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
Spark Summit
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
Spark Summit
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Spark Summit
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Spark Summit
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
Spark Summit
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spark Summit
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
Spark Summit
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Spark Summit
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Spark Summit
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Spark Summit
More from Spark Summit
(20)
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Recently uploaded
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
Rafezzaman
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
Boston Institute of Analytics
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
Emmanuel Dauda
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Social Samosa
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
limedy534
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
fhwihughh
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
dajasot375
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
MYRABACSAFRA2
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
Boston Institute of Analytics
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
soniya singh
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
Colleen Farrelly
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
sapnasaifi408
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Universitat Politècnica de Catalunya
How we prevented account sharing with MFA
How we prevented account sharing with MFA
Andrei Kaleshka
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
jennyeacort
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
F sss
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Boston Institute of Analytics
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
Jeremy Anderson
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
208367051
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
thyngster
Recently uploaded
(20)
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
How we prevented account sharing with MFA
How we prevented account sharing with MFA
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher Bradford, Open Source Connections)
1.
Lessons Learned with Spark
at the US Patent & Trademark Office Christopher Bradford Big Data Architect at OpenSource Connections
2.
Christopher Bradford Twitter: @bradfordcp GitHub:
bradfordcp
3.
OpenSource Connections
4.
Exploring Search Technologies
- EST
5.
EST – Technology
Stack
6.
EST – Data
Loading CSS Ingestion (CSS2C) Solr Ingestion (C2S)
7.
EST – C2S
Process Note: some connections are omitted for clarity
8.
EST – C2S
Process (Scaled Out) Note: some connections are omitted for clarity
9.
EST – C2S
Review Did it work? Why change it? How could we make it better?
10.
11.
EST – Old
C2S Process Note: some connections are omitted for clarity
12.
EST – Spark
C2S Process Note: some connections are omitted for clarity
13.
How did this
work out? Poorly
14.
Poor Performance joinedRDD =
… joinedRDD.foreach() document = … // build document sc = new SolrConnection() sc.push(document) sc.disconnect() // Job is done
15.
Poor Performance sc =
new SolrConnection() sc.push(document) sc.disconnect()
16.
Optimum Performance joinedRDD =
… sc = new SolrConnection() joinedRDD.foreach() document = … // build document sc.push(document) sc.disconnect() // Job is done joinedRDD = … joinedRDD.foreachPartition() sc = new SolrConnection() partition.foreach() document = … // build document sc.push(document) sc.disconnect() // Job is done Almost
17.
The Solution! joinedRDD =
… joinedRDD.mapPartitions() sc = new SolrConnection() partition.foreach() document = … // build document sc.push(document) sc.close() return partition.rows .collect() joinedRDD = … joinedRDD.mapPartitions() sc = new SolrConnection() partition.foreach() document = … // build document sc.push(document) sc.close() return partitions.rows.count .collect()
18.
Results?
19.
Solr Indexing
20.
Better Solr Indexing Note:
some connections are omitted for clarity
21.
EST – Spark
C2S Process v2 Note: some connections are omitted for clarity
22.
Success? YUP 5x faster than
the original C2S process (with optimizations)
23.
What’s Next? • Optimization
of the C2S Spark job • More Spark jobs • Newer version of Spark & DSE • Scala Spark jobs instead of Java
Download now