4. What is Apache Spark?
Apache Spark is a Hadoop-compatible
computing system that makes big data analysis
drastically faster, through in-memory
computation, and simpler to write, through
easy APIs in Java, Scala and Python.
6. BDAS: Berkeley Data Analytics Stack
• Is an open source software stack that
integrates software components being built by
the AMPLab to make sense of Big Data.
16. Spark Summit 2014
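• Dates: 2014-06-30 to 2014-07-02
– Day 1, 2: Talks
– Day 3: Training
• Venue: The Westin St. Francis in San Francisco
• Attendees: more than 1,000
– A few Japanese attendees were there as well
– Training was split into two courses, basic and advanced
• Each course had more than 100 participants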
23. Apache Spark is one of the most active projects
• Spark’s Role in the Big Data Ecosystem
– Matei Zaharia (CTO, Databricks)
• Compared with other projects over the past six months,
Spark has by far the most commits
34. How to be a contributor
• Read papers about distributed systems, machine
learning, and data structures
• Read the documentation about “Contributing to
Spark”
– https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
• Create issues on the Apache Spark JIRA
– https://issues.apache.org/jira/browse/SPARK/
• Communicate with other developers on the Apache
Spark Developers List
– http://apache-spark-developers-list.1001551.n3.nabble.com/
• Send your pull requests to the Spark GitHub repository
– https://github.com/apache/spark
35. Committing to MLlib (or trying to)
• MLlib: a common machine learning library that runs on Spark
• Requirements for MLlib algorithms
– Be widely known
– Be used and accepted (academic citations and concrete use cases can
help justify this)
– Be highly scalable
– Be well documented
– Have APIs consistent with other algorithms in MLlib that accomplish
the same thing
– Come with a reasonable expectation of developer support
36. Issues I am currently working on
• [SPARK-2335] k-Nearest Neighbor classification and
regression for MLLib
– https://issues.apache.org/jira/browse/SPARK-2335
• [SPARK-2966] Add an approximation algorithm for
hierarchical clustering to MLlib
– https://issues.apache.org/jira/browse/SPARK-2966
• [SPARK-3012] Standardized Distance Functions between two
Vectors for MLlib
– https://issues.apache.org/jira/browse/SPARK-3012
• Translating the official documentation into Japanese, perhaps?
– Apache Spark Developers List - Can I translate the
documentations of Spark in Japanese?
• http://apache-spark-developers-list.1001551.n3.nabble.com/Can-I-translate-the-documentations-of-Spark-in-Japanese-td7538.html
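
To give a rough idea of what SPARK-3012 and SPARK-2335 are about, here is a minimal single-machine sketch in plain Python: a few interchangeable distance functions between two vectors, and a k-nearest-neighbor classifier that accepts any of them. This is not MLlib code and not the API proposed in those issues. Every name here (`euclidean`, `manhattan`, `cosine_distance`, `knn_classify`) is hypothetical and chosen only for illustration, and nothing is distributed.

```python
import math
from collections import Counter
from typing import Callable, Hashable, Sequence, Tuple

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def _check_same_length(a: Vector, b: Vector) -> None:
    # zip() would silently truncate, so reject mismatched dimensions.
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")


def euclidean(a: Vector, b: Vector) -> float:
    """Straight-line (L2) distance."""
    _check_same_length(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    """Sum of absolute coordinate differences (L1)."""
    _check_same_length(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 minus cosine similarity; undefined for zero vectors."""
    _check_same_length(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def knn_classify(
    train: Sequence[Tuple[Vector, Hashable]],
    query: Vector,
    k: int = 3,
    distance: Distance = euclidean,
) -> Hashable:
    """Return the majority label among the k training points closest to query."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    # Brute force: sort all training points by distance to the query.
    neighbors = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    # On a tied vote, Counter keeps first-seen order, so the label that
    # appears earliest among the nearest neighbors wins.
    return votes.most_common(1)[0][0]
```

With training data such as `[((0.0, 0.0), "a"), ((5.0, 5.0), "b")]`, a call like `knn_classify(train, (0.5, 0.5), k=1, distance=manhattan)` swaps the metric without touching the classifier. That separation is the point of having one shared set of distance functions; the brute-force sort is only for readability and says nothing about how a scalable version would be built.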