SlideShare a Scribd company logo
1 of 12
高階関数(Higher-Order Functions)を
ちょっと深掘りしてみた
Spark Meetup Tokyo #2
Kamuela Lau (@kamu_lau)
Software Engineer, Consultant at Rondhuit
自己紹介
Kamuela Lau (twitter: @kamu_lau)
● 株式会社ロンウイット
○ ソフトウェアエンジニア・コンサルタント
● Georgia Institute of Technology
○ コンピュータサイエンス・機械学習特化修士
課程に在学中
● 記事執筆
○ https://codezine.jp/author/1834
○ https://jp.kamulau.com
高階関数(Higher-order function)とは?
● 以下の条件のいずれかを満たすような関数は高階関数
(higher-order function)と呼ぶ
○ 引数として関数を受け取る
○ 戻りとして関数を返す
● Apache Spark 2.4.0 で以下の高階関数が追加された
○ transform
○ filter
○ aggregate
○ exists
○ zip_with
LT の目的
GOALS:
● 高階関数を含めていくつかの手法で同じ処理を
実行
● 処理結果と explain の出力を比較
● Spark 3.0.0 の高階関数
NON-GOALS:
● UDF と Built-in Functions (Spark の標準のカラ
ム関数)のパフォーマンス差の詳細な解説・検証 https://github.com/Kamulau/higher-order-functions-exercise
本LTの高階関数のScala コードは以下の GitHub レポジトリ
で公開中
高階関数と他手法の比較
● “key” という文字列型カラムと “numbers”という整数配列型カラムが存在する
DataFrame があるとする
● 高階関数「aggregate」を含めて、いくつかの方法で “numbers” の平均値を
持つカラムを追加し比較する
Spark における整数配列の平均計算方法
● explode と groupBy
● map
● udf
● aggregate (高階関数)
explode と groupBy による結果と explain の出力
map による結果と explain の出力
udf による結果と explain の出力
高階関数による結果と explain の出力
Spark 3.0.0 で高階関数の Scala API が追加
Spark 3.0.0 で高階関数の Scala API が追加される。以下のように簡単に呼べる
参考文献・投稿
● Introducing New Built-in and Higher-Order Functions for Complex Data Types
in Apache Spark 2.4
https://databricks.com/blog/2018/11/16/introducing-new-built-in-functions-and-higher-order-functions-for-complex-data-types-in-
apache-spark.html
● Working with Nested Data Using Higher Order Functions in SQL on Databricks
https://databricks.com/blog/2017/05/24/working-with-nested-data-using-higher-order-functions-in-sql-on-databricks.html
● SPARK-27297 Add higher order functions to Scala API
https://issues.apache.org/jira/browse/SPARK-27297
● SPARK-27297 PR
https://github.com/apache/spark/pull/24232

More Related Content

Recently uploaded

Recently uploaded (7)

失忆方法迷幻药【购买网址: GHB1.com】 拍肩粉货到付款购买【商城网址: 91miwan.com☆】
失忆方法迷幻药【购买网址:  GHB1.com】 拍肩粉货到付款购买【商城网址:  91miwan.com☆】失忆方法迷幻药【购买网址:  GHB1.com】 拍肩粉货到付款购买【商城网址:  91miwan.com☆】
失忆方法迷幻药【购买网址: GHB1.com】 拍肩粉货到付款购买【商城网址: 91miwan.com☆】
 
QQ实名号自助购买【☆出售网址:fk578。com☆】【☆购买网址:fk578。com☆】实名微博号购买商城
QQ实名号自助购买【☆出售网址:fk578。com☆】【☆购买网址:fk578。com☆】实名微博号购买商城QQ实名号自助购买【☆出售网址:fk578。com☆】【☆购买网址:fk578。com☆】实名微博号购买商城
QQ实名号自助购买【☆出售网址:fk578。com☆】【☆购买网址:fk578。com☆】实名微博号购买商城
 
sm支付宝号购买【☆出售网址:fk578。com☆】【☆购买网址:fk578。com☆】抖音小号自动发号
sm支付宝号购买【☆出售网址:fk578。com☆】【☆购买网址:fk578。com☆】抖音小号自动发号sm支付宝号购买【☆出售网址:fk578。com☆】【☆购买网址:fk578。com☆】抖音小号自动发号
sm支付宝号购买【☆出售网址:fk578。com☆】【☆购买网址:fk578。com☆】抖音小号自动发号
 
境外手机卡商城【☆购买网站:fk578.com】【☆购买网址:fk578.com☆】☆☆快手账号交易平台
境外手机卡商城【☆购买网站:fk578.com】【☆购买网址:fk578.com☆】☆☆快手账号交易平台境外手机卡商城【☆购买网站:fk578.com】【☆购买网址:fk578.com☆】☆☆快手账号交易平台
境外手机卡商城【☆购买网站:fk578.com】【☆购买网址:fk578.com☆】☆☆快手账号交易平台
 
微信实名号哪里买【☆出售网址:fk578。com☆】【☆购买网址:fk578。com☆】soul实名账号批发
微信实名号哪里买【☆出售网址:fk578。com☆】【☆购买网址:fk578。com☆】soul实名账号批发微信实名号哪里买【☆出售网址:fk578。com☆】【☆购买网址:fk578。com☆】soul实名账号批发
微信实名号哪里买【☆出售网址:fk578。com☆】【☆购买网址:fk578。com☆】soul实名账号批发
 
购买官网听话水【购买网址: GHB1.com】 迷幻药货到付款【商城网址: 91miwan.com☆】
购买官网听话水【购买网址:  GHB1.com】 迷幻药货到付款【商城网址:  91miwan.com☆】购买官网听话水【购买网址:  GHB1.com】 迷幻药货到付款【商城网址:  91miwan.com☆】
购买官网听话水【购买网址: GHB1.com】 迷幻药货到付款【商城网址: 91miwan.com☆】
 
微信白号怎么买【☆出售网址:fk578。com☆】【☆购买网址:fk578。com☆】soul账号交易平台
微信白号怎么买【☆出售网址:fk578。com☆】【☆购买网址:fk578。com☆】soul账号交易平台微信白号怎么买【☆出售网址:fk578。com☆】【☆购买网址:fk578。com☆】soul账号交易平台
微信白号怎么买【☆出售网址:fk578。com☆】【☆购买网址:fk578。com☆】soul账号交易平台
 

Featured

How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
ThinkNow
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
 

Featured (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Higher order functions Lightning Talk

Editor's Notes

  1. Exchange hashpartitioning: shuffle Unique key が必要 元の配列を求める場合、collect_list が必要だが、元の順番は保証されない Numbers カラムに値がなければならない
  2. Entire row, black box Serialization/deserialization cost
  3. Black box として扱われるところがあり、標準カラム関数ほど早くない Spark SQL で利用するには、register する必要がある
  4. Udf より速い Spark SQL で使える Scala API がない
  5. expr や withExpr を使用せずに簡単に 高階関数を使えるようになる。 Spark 3.0.0
  6. https://www.slideshare.net/databricks/an-introduction-to-higher-order-functions-in-spark-sql-with-herman-van-hovell