20180417 hivemall meetup#4
Takeshi Yamamuro
A slide used in Hivemall Meetup#4
1.
An Introduction to Spark v2.3 & Hivemall-on-Spark v0.5.0
Takeshi Yamamuro @ NTT Lab.
Copyright © 2018 NTT corp. All Rights Reserved.
2.
Self-Introduction
• R&D/OSS engineer
• Ph.D. in CS (Database Systems)
• Love OSS activities
  • Apache Spark
  • Apache Hivemall
  • PostgreSQL
  • ...
• My Active GitHub Projects
  • spark-sql-server
    • Yet Another Spark SQL JDBC/ODBC server based on the PostgreSQL V3 protocol
    • https://github.com/maropu/spark-sql-server
  • lljvm-translator
    • A lightweight library to inject LLVM bitcode into JVMs
    • https://github.com/maropu/lljvm-translator
3.
HIVEMALL ON SPARK v0.5.0
4.
What's Hivemall on Spark?
• Hivemall wrapper for Spark
  • Wrapper implementations for DataFrame/SQL
  • Plus some utilities for ease of use in Spark
• The wrapper lets you...
  • run most Hivemall functions in Spark
  • try Hivemall examples easily on your laptop
  • improve the performance of some Hivemall functions in Spark
5.
Why Hivemall on Spark?
• Hivemall already has many fascinating ML algorithms and useful utilities
• High barriers to adding newer algorithms to MLlib
  • https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
6.
Status of Hivemall-on-Spark v0.5.0
• Supported Spark Versions
  • v2.0, v2.1, and v2.2
  • Upcoming release will support v2.3
• Custom Operations
  • Top-K Join SparkPlan: https://bit.ly/2HnaeG1
  • Utility Functions: https://bit.ly/2qlk8zH
  • ...
• Installation via Spark Packages
  • https://spark-packages.org
  • ./bin/spark-shell --packages apache-hivemall:apache-hivemall:0.5.1-spark2.2
7.
Example) Top-K Join Processing
• Joins Top-K entries only
• "Vanilla Join + Rank Over" is too slow
[Figure: leftDf (join key x) and rightDf (join key y) are joined; the join returns only the top-K rows that have higher score values, e.g., f(x, y)]
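The idea on this slide can be illustrated without Spark. The following is a minimal plain-Scala sketch, not Hivemall's actual Top-K Join SparkPlan: `TopKJoinSketch`, `topKJoin`, the integer join key, and the score function `f` are all hypothetical names chosen for this example. For each left row it keeps only the k matching right rows with the highest score, using a bounded heap rather than materializing the full join result and ranking it afterwards.

```scala
import scala.collection.mutable

object TopKJoinSketch {
  // Hypothetical helper: joins `left` and `right` on an integer key and keeps,
  // for each left row, only the k joined rows with the highest score f(x, y).
  def topKJoin[A, B](
      left: Seq[(Int, A)],
      right: Seq[(Int, B)],
      k: Int)(f: (A, B) => Double): Seq[(Int, A, B, Double)] = {
    // Hash the right side by join key once
    val rightByKey: Map[Int, Seq[B]] =
      right.groupBy(_._1).map { case (key, rows) => key -> rows.map(_._2) }

    left.flatMap { case (key, x) =>
      // Min-heap on score: the head is always the weakest kept candidate
      val heap = mutable.PriorityQueue.empty[(Double, B)](
        Ordering.by[(Double, B), Double](_._1).reverse)
      for (y <- rightByKey.getOrElse(key, Seq.empty)) {
        heap.enqueue((f(x, y), y))
        if (heap.size > k) heap.dequeue() // drop the lowest score
      }
      // Emit the kept rows, highest score first
      heap.toSeq.sortBy(-_._1).map { case (score, y) => (key, x, y, score) }
    }
  }
}
```

Because at most k candidates per left row are held at any time, memory stays bounded by k regardless of how many right rows share the join key, which is the property a "join then rank" plan lacks.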
8.
Quick Example
• 1. Download a Spark binary
• 2. Fetch training and test data
• 3. Load these data in Spark
• 4. Build a model
• 5. Do predictions
9.
1. Download a Spark binary
• Download a Spark v2.2.1 binary
  • https://spark.apache.org/downloads.html
10.
2. Fetch training and test data
• E2006 tfidf regression dataset
  • https://bit.ly/2GOC0di

    $ wget http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression/E2006.train.bz2
    $ wget http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression/E2006.test.bz2
11.
3. Load training data in Spark

    $ <SPARK_HOME>/bin/spark-shell --packages apache-hivemall:apache-hivemall:0.5.1-spark2.2

    scala> import org.apache.spark.sql.hive.HivemallOps._
    scala> import org.apache.spark.sql._
    scala> :paste
    // Creates a DataFrame from the bzip2-compressed, libsvm-formatted file
    val rawTrainDf = spark.read.format("libsvm").load("E2006.train.bz2")

    // Since `label` must be in [0.0, 1.0], rescale it first
    val maxmin = rawTrainDf.select(max($"label"), min($"label")).collect.map {
      case Row(max: Double, min: Double) => (max, min)
    }.head

    val trainDf = rawTrainDf.select(
      rescale($"label", lit(maxmin._2), lit(maxmin._1)).as("label"),
      $"features")
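The rescaling step is ordinary min-max normalization. As a minimal sketch in plain Scala (the names `RescaleSketch` and `rescaleAll` and the sample labels are hypothetical), this assumes that `rescale(value, min, max)` computes `(value - min) / (max - min)`, which matches the argument order used on the slide. Returning 0.5 when `min == max` is a choice made for this sketch only.

```scala
object RescaleSketch {
  // Min-max normalization: maps [min, max] linearly onto [0.0, 1.0].
  def rescale(value: Double, min: Double, max: Double): Double =
    if (max == min) 0.5 // degenerate range; 0.5 is this sketch's own choice
    else (value - min) / (max - min)

  // Computes min and max over all labels first, then rescales each label,
  // mirroring the two-pass structure of the DataFrame code.
  def rescaleAll(labels: Seq[Double]): Seq[Double] = {
    val (lo, hi) = (labels.min, labels.max)
    labels.map(rescale(_, lo, hi))
  }
}
```

Note that the test data on the next slide is rescaled with the min and max taken from the training data, so test labels are not guaranteed to fall inside [0.0, 1.0].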
12.
3. Load test data in Spark

    scala> val rawTestDf = spark.read.format("libsvm").load("E2006.test.bz2")
    scala> :paste
    val testDf = rawTestDf.select(
        rowid(),
        rescale($"label", lit(maxmin._2), lit(maxmin._1)).as("label"),
        $"features")
      .explode_vector($"features")
      .select($"rowid", $"label".as("target"), $"feature", $"weight".as("value"))
      .cache
13.
4. Build a model - DataFrame

    scala> :paste
    val modelDf = trainDf.train_logistic_regr($"features", $"label")
      .groupBy("feature")
      .agg("weight" -> "avg")
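The `groupBy("feature")` followed by an average merges multiple weights for the same feature into a single one. A minimal plain-Scala sketch of that aggregation follows; it assumes the trainer emits one `(feature, weight)` row per feature from each partial model, and the names `ModelAveragingSketch` and `averageWeights` and the string-typed feature keys are hypothetical.

```scala
object ModelAveragingSketch {
  // Averages per-feature weights over rows emitted by several partial models.
  def averageWeights(rows: Seq[(String, Double)]): Map[String, Double] =
    rows.groupBy(_._1).map { case (feature, ws) =>
      feature -> ws.map(_._2).sum / ws.size
    }
}
```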
14.
5. Do predictions - DataFrame

    // Do predictions
    scala> :paste
    val predictDf = testDf
      .join(modelDf, testDf("feature") === modelDf("feature"), "LEFT_OUTER")
      .select($"rowid", ($"avg(weight)" * $"value").as("value"))
      .groupBy("rowid").sum("value")
      .select(
        $"rowid",
        sigmoid($"sum(value)").as("predicted"))
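The prediction query computes, for each test row, the sigmoid of the dot product between the row's feature values and the model weights. The same logic as a minimal plain-Scala sketch (the names `PredictSketch` and `predict` and the string-typed row ids and features are hypothetical):

```scala
object PredictSketch {
  def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

  // testRows: (rowid, feature, value); model: feature -> averaged weight.
  // Features missing from the model contribute nothing, like the null
  // products that the left outer join produces and sum() ignores.
  // (In the DataFrame version, a row with no matching feature at all
  // would yield a null sum rather than 0.0.)
  def predict(
      testRows: Seq[(String, String, Double)],
      model: Map[String, Double]): Map[String, Double] =
    testRows.groupBy(_._1).map { case (rowid, rows) =>
      val margin = rows.flatMap { case (_, feature, value) =>
        model.get(feature).map(_ * value)
      }.sum
      rowid -> sigmoid(margin)
    }
}
```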
15.
Current Work for Future Releases
• Feature Selection + Spark Optimizer = Fast Data Extraction
  • HIVEMALL-181: Plan rewriting rules to filter meaningful training data before feature selections
[Figure: tables (key, v0) and (key, v1, v2) → Data Extraction (e.g., by SQL) → (key, v0, v1, v2) → Feature Selection (e.g., by scikit-learn) → Selected Features]
Arun Kumar, Jeffrey Naughton, Jignesh M. Patel, and Xiaojin Zhu, To Join or Not to Join?: Thinking Twice about Joins before Feature Selection, Proceedings of SIGMOD, 2016.
16.
Current Work for Future Releases
• Feature Selection + Spark Optimizer = Fast Data Extraction
  • HIVEMALL-181: Plan rewriting rules to filter meaningful training data before feature selections
[Figure: tables (key, v0) and (key, v1, v2) → Data Extraction + Feature Selection, with Join Pruning by Data Statistics → (key, v1, v2)]
Arun Kumar, Jeffrey Naughton, Jignesh M. Patel, and Xiaojin Zhu, To Join or Not to Join?: Thinking Twice about Joins before Feature Selection, Proceedings of SIGMOD, 2016.
17.
SPARK v2.3
18.
What's Apache Spark
• Distributed data analytics engine, generalizing MapReduce
[Figure: Spark GitHub]
19.
What's Apache Spark
• 1. Unified Engine
  • supports end-to-end APIs, e.g., MLlib and Streaming
• 2. High-level APIs
  • easy to use, with rich optimization
• 3. Integrates broadly
  • storage systems, libraries, ...
20.
Spark Release History
• v2.3.0 was released in February 2018
• v2.x releases focus on API stability
  • minor releases: 4 months of development + 1 month of QA
• Community discussion for v3.0 started recently
  • time for Apache Spark 3.0?: https://bit.ly/2qjcd6f
[Figure: release timeline, 2012 to 2018. Events: the original paper (RDD) published; incubated in ASF; became an ASF top-level project. Releases: v0.6 to v0.9, v1.0 to v1.6, v2.0 to v2.3. Milestones: DataFrame APIs, Codegen Support, Dataset APIs, Structured Streaming. Today's talk covers v2.3.]
21.
An Introduction to Spark v2.3
[Figure cited from: What's New in Upcoming Apache Spark 2.3, https://bit.ly/2GNS2nP]
22.
An Introduction to Spark v2.3
[Figure cited from: What's New in Upcoming Apache Spark 2.3, https://bit.ly/2GNS2nP]
23.
An Introduction to Spark v2.3
• Presented using the slides: What's New in Upcoming Apache Spark 2.3
  • https://bit.ly/2GNS2nP
24.
Recap
• Hivemall on Spark
  • Wrapper implementations for DataFrame/SQL
  • Plus some utilities for ease of use in Spark
• Feature Selection + Spark Optimizer = Fast Data Extraction
  • WIP for future Hivemall releases
• Spark v2.3
  • Structured Streaming
  • Image support
  • Pandas UDF performance improvement
  • Spark on Kubernetes
  • ...