To Infinity and Beyond - OSDConf2014

Graph Computing with JanusGraph

Real-Time Analytics with Apache Storm

Taewoo Kim

Koalas: Unifying Spark and pandas APIs

Xiao Li

An Insider’s Guide to Maximizing Spark SQL Performance

Takuya UESHIN

Start Flying with Python & Apache TinkerPop

Gremlin, the graph traversal language from Apache TinkerPop, continues to evolve in support of the growing graph ecosystem. In this session, we'll take a deep dive into Gremlin Language Variants (GLV) to see how TinkerPop enables modern programming languages to leverage Gremlin natively. By converting Gremlin into bytecode, the same instructions can be transmitted and interpreted by graph systems from different vendors. We'll uncover the benefits of this approach by demonstrating a Python-based graph architecture built to empower your application developers and data scientists. Using popular open-source Python packages, like the Flask microframework and Jupyter notebooks, we'll see how you can easily transition your app development from your machine to the IBM Cloud. Presented at Graph Day SF on June 17, 2017.

FME-Based Tool for Automatic Updating of Geographical Git Repositories (Pushi...

Safe Software

Apache Storm - Real Time Analytics

Edureka!

Graph Processing with Apache TinkerPop and Gremlin

Presented at the NVIDIA GPU-Accelerated Graph Ecosystem Roundtable. "Come share and learn more about how NVIDIA is accelerating the graph ecosystem and collaborating with the community on joint development opportunities. Join us to get the latest update on nvGraph, cuSTINGER, Gunrock, and query languages. Don't miss out on a great opportunity to provide feedback and take an active part in shaping the future of GPU-accelerated graph analytics." GPU Technology Conference, May 8, 2017, San Jose, California.

Asynchronous Hyperparameter Optimization with Apache Spark

For the past two years, the open-source Hopsworks platform has used Spark to distribute hyperparameter optimization tasks for machine learning. Hopsworks provides some basic optimizers (grid search, random search, differential evolution) to propose combinations of hyperparameters (trials) that are run synchronously in parallel on executors as map functions. However, many such trials perform poorly, and we waste a lot of CPU and hardware accelerator cycles on trials that could be stopped early, freeing up the resources for other trials. In this talk, we present our work on Maggy, an open-source asynchronous hyperparameter optimization framework built on Spark that transparently schedules and manages hyperparameter trials, increasing resource utilization and massively increasing the number of trials that can be performed in a given period of time on a fixed amount of resources. Maggy is also used to support parallel ablation studies using Spark. We have commercial users evaluating Maggy, and we will report on the gains they have seen in reduced time to find good hyperparameters and improved utilization of GPU hardware. Finally, we will perform a live demo in a Jupyter notebook, showing how to integrate Maggy into existing PySpark applications.
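The early-stopping idea in the abstract above can be illustrated with a small sketch. This is not Maggy's API or its scheduler: it runs trials sequentially in plain Python rather than asynchronously on Spark executors, and it uses a median stopping rule, which is an assumption chosen for illustration. The function names and the learning-curve data are hypothetical.

```python
from statistics import median


def should_stop(trial_history, finished_histories, step):
    """Median stopping rule (assumed for illustration): stop a trial when its
    best metric so far is below the median of earlier trials at the same step."""
    peers = [h[step] for h in finished_histories if len(h) > step]
    if not peers:
        return False  # nothing to compare against yet
    return max(trial_history[: step + 1]) < median(peers)


def run_trials(learning_curves):
    """Replay hypothetical per-step accuracies, stopping poor trials early.

    learning_curves maps a trial name to its list of per-step accuracies.
    Returns a dict mapping each trial name to the number of steps it ran.
    """
    finished = []
    steps_used = {}
    for name, curve in learning_curves.items():
        history = []
        for step, metric in enumerate(curve):
            history.append(metric)
            if should_stop(history, finished, step):
                break  # free the (simulated) resources for the next trial
        steps_used[name] = len(history)
        finished.append(history)
    return steps_used


if __name__ == "__main__":
    curves = {
        "a": [0.5, 0.6, 0.7, 0.8],
        "b": [0.1, 0.1, 0.1, 0.1],
        "c": [0.6, 0.7, 0.8, 0.9],
    }
    print(run_trials(curves))
```

In this toy run the poorly performing trial is cut off after its first step, while the other two run to completion. An asynchronous framework would make the same kind of decision while trials are still running in parallel.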

Data Science with Elastic MapReduce (EMR) at Netflix

Kurt Brown

How to create a personal knowledge graph IBM Meetup Big Data Madrid 2017

Juantomás García Molina

Next Generation Big Data Platform at Netflix 2014

Eva Tse

Apache Spark & Scala

Edureka!

DevFest Nantes 2018 - Créer un data pipeline en 20 minutes avec Kafka Connect

EdwardBloom

SparkApplicationDevMadeEasy_Spark_Summit_2015

Lance Co Ting Keh

Computing at scale

jerjou

Twisting Data into Cool Shapes

Shane Coughlan

Random Walks on Large Scale Graphs with Apache Spark with Min Shen

Random walks on graphs are a useful technique in machine learning, with applications in personalized PageRank, representation learning and others. This session will describe a novel algorithm for enumerating walks on large-scale graphs that benefits from several unique capabilities of Apache Spark. The algorithm generates a recursive branching DAG of stages that separates out the “closed” and “open” walks. Spark’s shuffle file management system is used to accumulate the walks while the computation is progressing. In-memory caching over multi-core executors enables moving the walks several “steps” forward before shuffling to the next stage. See performance benchmarks, and hear about LinkedIn’s experience with Spark in production clusters. The session will conclude with an observation of how Spark’s constructs open up new models of computation, not possible with the previous state of the art, for developing high-performance and scalable algorithms in data science and machine learning.
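As a rough illustration of the personalized PageRank application mentioned in the abstract above, the sketch below estimates scores from random walks with restart on a tiny in-memory graph. It is not the Spark-based walk enumeration algorithm from the talk. The function name, the parameters, and the example graph are all hypothetical.

```python
import random
from collections import Counter


def personalized_pagerank(graph, source, num_walks=2000, restart_prob=0.15, seed=0):
    """Estimate personalized PageRank by counting visits of random walks.

    graph maps each node to a list of its out-neighbors. Every walk starts
    at `source` and ends with probability `restart_prob` at each step, or
    when it reaches a node with no out-neighbors.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    visits = Counter()
    for _ in range(num_walks):
        node = source
        while True:
            visits[node] += 1
            neighbors = graph.get(node, [])
            if not neighbors or rng.random() < restart_prob:
                break
            node = rng.choice(neighbors)
    total = sum(visits.values())
    # Normalize visit counts into scores that sum to 1.
    return {node: count / total for node, count in visits.items()}


if __name__ == "__main__":
    toy_graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": ["a"]}
    for node, score in sorted(personalized_pagerank(toy_graph, "a").items()):
        print(node, round(score, 3))
```

Nodes that cannot be reached from the source never receive a score, which is what makes the result "personalized" to the starting node.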

Solidry @ bakheda2

#comments

Pranav Prakash

Webtech1b

Pranav Prakash

Ibm haifa.mq.final

Pranav Prakash

Test document

How to Create an Engaging Social Media Experience

Arun

Apple banana oranges_peaches

Pranav Prakash

Banana peaches

Pranav Prakash

Viewers also liked

Implementing Ajax In ColdFusion 7

The Social Semantic Web

John Breslin

A Hybrid Recommendation system