MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library for Apache Spark with Miruna Oprescu


With the rapid growth of available datasets, it is imperative to have good tools for extracting insight from big data. The Spark ML library has excellent support for at-scale data processing and machine learning experiments, but data scientists often find themselves struggling with low-level data manipulation, a lack of support for image processing, text analytics, and deep learning, and the inability to use Spark alongside other popular machine learning libraries. To address these pain points, Microsoft recently released the Microsoft Machine Learning Library for Apache Spark (MMLSpark), an open-source machine learning library built on top of SparkML that seeks to simplify the data science process and integrate SparkML Pipelines with deep learning and computer vision libraries such as the Microsoft Cognitive Toolkit (CNTK) and OpenCV. With MMLSpark, data scientists can build models with 1/10th of the code through Pipeline objects that compose seamlessly with other parts of the SparkML ecosystem. In this session, we explore some of the main lessons learned from building MMLSpark. Join us if you would like to know how to extend Pipelines to ensure seamless integration with SparkML, how to auto-generate Python and R wrappers from Scala Transformers and Estimators, how to integrate and use previously non-distributed libraries in a distributed manner, and how to efficiently deploy a Spark library across multiple platforms.


Miruna Oprescu (moprescu@microsoft.com)
Microsoft
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library for Apache Spark
#EUai7

MMLSpark
Microsoft Machine Learning Library for Apache Spark
GitHub: https://github.com/Azure/mmlspark
Spark Package:
pyspark/spark-shell/spark-submit --packages Azure:mmlspark:0.9
Docker:
docker run -it -p 8888:8888 -e ACCEPT_EULA=yes microsoft/mmlspark
Navigate to http://localhost:8888 to view example Jupyter notebooks

Why MMLSpark?
• The (typical) data science workflow with Spark:
[Diagram: Data, Data transforms, ML algorithm, Model]

Why MMLSpark?
• Doesn’t look familiar? Maybe this does:
[Diagram: Data, Data massaging, Data transforms, Python/R UDFs, External libraries (CNTK, OpenCV), ML algorithms, Pipeline]

Why MMLSpark?
• Workflow can be:
– Slow & time-consuming
– Intractable
– Difficult to debug, reproduce
– Hard to put in production

MMLSpark Goals
• Stay in the Spark ecosystem as much as possible by integrating domain-specific libraries (vision, text analytics, etc.)
• Have better model management support
• Bring cutting-edge ML algorithms to Spark
• Reduce the overhead from UDFs and other custom functions
• Run on every platform and language supported by Spark

MMLSpark
Lesson #1: Follow the SparkML Pipeline model for composability.
• MMLSpark consists of Transformers, Estimators, and Models that can be combined with existing SparkML components into pipelines.
• These abstractions provide composability, reusability via serialization, logging, and ease of use across languages.
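
The contract behind this lesson can be sketched in plain Python. This is only an analogy for illustration, not the SparkML or MMLSpark API: the class names (Lowercase, VocabularyEstimator, and so on) are hypothetical, and rows are plain dictionaries rather than DataFrames. The point it shows is that a Pipeline is itself an Estimator and a fitted pipeline is itself a Transformer, which is what lets new stages mix freely with existing ones.

```python
from dataclasses import dataclass
from typing import Dict, List

Row = Dict[str, object]


class Transformer:
    """A stage that maps rows to rows without learning anything."""

    def transform(self, rows: List[Row]) -> List[Row]:
        raise NotImplementedError


class Estimator:
    """A stage that learns from rows and returns a fitted Transformer."""

    def fit(self, rows: List[Row]) -> Transformer:
        raise NotImplementedError


@dataclass
class Lowercase(Transformer):
    input_col: str
    output_col: str

    def transform(self, rows):
        return [{**r, self.output_col: str(r[self.input_col]).lower()} for r in rows]


@dataclass
class VocabularyModel(Transformer):
    input_col: str
    output_col: str
    vocabulary: List[str]

    def transform(self, rows):
        # Count occurrences of each vocabulary word, in vocabulary order.
        out = []
        for r in rows:
            words = str(r[self.input_col]).split()
            out.append({**r, self.output_col: [words.count(w) for w in self.vocabulary]})
        return out


@dataclass
class VocabularyEstimator(Estimator):
    input_col: str
    output_col: str

    def fit(self, rows):
        words = sorted({w for r in rows for w in str(r[self.input_col]).split()})
        return VocabularyModel(self.input_col, self.output_col, words)


@dataclass
class PipelineModel(Transformer):
    stages: list

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


@dataclass
class Pipeline(Estimator):
    stages: list

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            # Estimators are fitted on the data seen so far; Transformers pass through.
            model = stage.fit(rows) if isinstance(stage, Estimator) else stage
            rows = model.transform(rows)
            fitted.append(model)
        return PipelineModel(fitted)


reviews = [{"text": "Great book"}, {"text": "great plot great cast"}]
pipeline = Pipeline([Lowercase("text", "clean"), VocabularyEstimator("clean", "counts")])
model = pipeline.fit(reviews)
scored = model.transform([{"text": "Great GREAT book"}])
print(scored[0]["counts"])  # [1, 0, 2, 0]
```

Because every stage honors the same two methods, a stage written by a library author and a stage shipped with the framework are interchangeable from the pipeline's point of view.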

MMLSpark: Before and After
Example: Book Reviews

MMLSpark
Lesson #2: Leverage SparkML abstractions to auto-generate Python and R interfaces.
• Decreases development time, ensures feature parity, reduces errors and improves testing.
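
A minimal sketch of the idea, assuming each stage can be described by a class name plus a list of parameters (name, documentation, default). Everything here is hypothetical: ParamSpec, generate_wrapper, and the com.example.stages.ImageResizer class name are made up for illustration, and the real generator works from the Scala classes themselves rather than a hand-written list. The sketch only shows why uniform parameter metadata makes wrapper code mechanical to emit.

```python
from typing import List, NamedTuple


class ParamSpec(NamedTuple):
    """Hypothetical description of one stage parameter."""
    name: str
    doc: str
    default: object


def generate_wrapper(class_name: str, jvm_class: str, params: List[ParamSpec]) -> str:
    """Render Python source for a wrapper class from stage metadata."""
    lines = [
        f"class {class_name}:",
        f'    """Generated wrapper for {jvm_class}."""',
        "",
        f"    JVM_CLASS = {jvm_class!r}",
        "",
        "    def __init__(self):",
        "        self._params = {",
    ]
    for p in params:
        lines.append(f"            {p.name!r}: {p.default!r},")
    lines.append("        }")
    for p in params:
        cap = p.name[0].upper() + p.name[1:]
        lines += [
            "",
            f"    def set{cap}(self, value):",
            f'        """Set {p.name}: {p.doc}"""',
            f"        self._params[{p.name!r}] = value",
            "        return self",
            "",
            f"    def get{cap}(self):",
            f"        return self._params[{p.name!r}]",
        ]
    return "\n".join(lines) + "\n"


source = generate_wrapper(
    "ImageResizer",
    "com.example.stages.ImageResizer",  # hypothetical class name
    [
        ParamSpec("inputCol", "name of the input column", "image"),
        ParamSpec("height", "target height in pixels", 224),
    ],
)

# Load the generated source to confirm it is valid Python.
namespace = {}
exec(source, namespace)
stage = namespace["ImageResizer"]().setHeight(60)
```

Since every wrapper comes from the same template, a parameter added on the Scala side shows up in the other languages on the next build, which is where the feature-parity benefit comes from.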

MMLSpark Architecture
[Diagram components: Pre-trained DNN models, OpenCV Java bindings, CNTK Java bindings, Spark core, Scala API, Wrapper generation, PySpark/R wrappers]

Python Wrappers: Example
[Diagram: Scala source, Generator, Python wrapper]

MMLSpark
Lesson #3: Turn parallelizable algorithms from external libraries (e.g. OpenCV and CNTK) into Scala Pipeline Stages.
• No data transfer overhead, all operations happen at the JVM level.
• Enables cutting-edge techniques such as Transfer Learning.
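
One common way to run a non-distributed library in a distributed setting is to load its model once per data partition and then score rows one at a time. The sketch below illustrates that pattern in plain Python. It is an assumption about the general approach, not a description of MMLSpark's internals: FakeNativeModel stands in for an expensive-to-load model, and map_partitions is a local stand-in for a distributed per-partition call.

```python
from typing import Callable, Iterable, Iterator, List


class FakeNativeModel:
    """Stand-in for an expensive-to-load model from a non-distributed library."""

    loads = 0  # counts how many times the model was constructed

    def __init__(self):
        FakeNativeModel.loads += 1

    def predict(self, features: List[float]) -> float:
        return sum(features) / len(features)


def score_partition(rows: Iterable[List[float]]) -> Iterator[float]:
    """Load the model once, then score every row in the partition."""
    model = FakeNativeModel()
    for features in rows:
        yield model.predict(features)


def map_partitions(partitions: List[list], fn: Callable) -> List[list]:
    """Local stand-in for a distributed per-partition map."""
    return [list(fn(iter(p))) for p in partitions]


partitions = [[[1.0, 3.0], [2.0, 2.0]], [[0.0, 10.0]]]
scores = map_partitions(partitions, score_partition)
print(scores)                 # [[2.0, 2.0], [5.0]]
print(FakeNativeModel.loads)  # 2, one load per partition rather than per row
```

Wrapping a function like score_partition inside a pipeline stage is what lets the external library take part in a pipeline without user-written UDFs.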

MMLSpark: Example
• DNN Featurization with OpenCV and CNTK

MMLSpark
Lesson #4: Run on every platform supported by Spark. Test at the highest possible level (Jupyter Notebooks).
• Publish self-contained packages that can be used from a variety of targets.
• Avoid common integration issues by testing directly with Jupyter Notebooks.
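
As a rough illustration of notebook-level testing, the sketch below pulls the code cells out of a notebook file (which is JSON with a top-level "cells" list) and runs them in order in one shared namespace, failing on the first cell that raises. The helper names code_cells and run_notebook are hypothetical, and this is not the project's actual test harness; it also assumes cells contain plain Python with no IPython magics.

```python
import json


def code_cells(notebook_json: str):
    """Return the source of each code cell in an nbformat-4 notebook."""
    nb = json.loads(notebook_json)
    cells = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            src = cell.get("source", "")
            # "source" may be a single string or a list of line strings.
            cells.append("".join(src) if isinstance(src, list) else src)
    return cells


def run_notebook(notebook_json: str) -> dict:
    """Execute code cells in order in a shared namespace; fail on the first error."""
    namespace = {}
    for index, source in enumerate(code_cells(notebook_json)):
        try:
            exec(compile(source, f"<cell {index}>", "exec"), namespace)
        except Exception as err:
            raise AssertionError(f"code cell {index} failed: {err!r}") from err
    return namespace


# A tiny in-memory notebook used only for this example.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
        {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": None,
         "source": ["values = [1, 2, 3]\n", "total = sum(values)"]},
        {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": None,
         "source": ["assert total == 6"]},
    ],
})
result = run_notebook(example)
```

Running the same notebooks that users open exercises packaging, wrappers, and the runtime together, which is the kind of integration problem unit tests tend to miss.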

MMLSpark: Bonus
• Image Schema unification: https://github.com/apache/spark/pull/19439
• Uses DataFrames as common format for reading images.
• Standardizes handling of images as a datatype used by different algorithms.
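
To make the idea of "an image as a datatype" concrete, here is a simplified record in plain Python. The field names are modeled on my understanding of the image schema that went into Spark (origin, height, width, number of channels, raw bytes) and should be treated as an assumption; the ImageRow class and its pixel helper are hypothetical and omit details such as the pixel-type code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ImageRow:
    """Simplified image record: metadata plus row-major interleaved pixel bytes."""

    origin: str       # where the image came from, e.g. a file path
    height: int
    width: int
    n_channels: int
    data: bytes       # height * width * n_channels bytes

    def __post_init__(self):
        expected = self.height * self.width * self.n_channels
        if len(self.data) != expected:
            raise ValueError(f"expected {expected} bytes, got {len(self.data)}")

    def pixel(self, row: int, col: int) -> Tuple[int, ...]:
        """Return the channel values stored at (row, col)."""
        start = (row * self.width + col) * self.n_channels
        return tuple(self.data[start:start + self.n_channels])


# A 2x2 image with 3 channels; bytes 0..11 stand in for real pixel data.
img = ImageRow(origin="memory://demo", height=2, width=2, n_channels=3, data=bytes(range(12)))
print(img.pixel(1, 0))  # (6, 7, 8)
```

With one agreed layout, a reader, a resizing stage, and a featurizer can each be written against the same record instead of converting between library-specific formats.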

Use case: Snow Leopards
• 3,900-6,500 individuals left in the wild
• Little is known about their behavior, movement patterns, and survival rates
• Camera trapping since 2009 (~1.3 million images)

Use case: Snow Leopards
• Automatic Image Classification with MMLSpark:
– Thousands of hours of researcher and volunteer time saved
– Resources redeployed to science and conservation vs image sorting
– Much more accurate data on range and population

Thank You!
• Star our repo:
https://github.com/Azure/mmlspark
• Contact us:
Myself: Miruna Oprescu (moprescu@microsoft.com)
Dev Lead: Sudarshan Raghunathan (susudars@microsoft.com)
PM: Roope Astala (roastala@microsoft.com)

What's hot

Ryan Blue explains how Netflix is building on Parquet to enhance its 40+ petabyte warehouse, combining Parquet’s features with Presto and Spark to boost ETL and interactive queries. Information about tuning Parquet is hard to find. Ryan shares what he’s learned, creating the missing guide you need. Topics include: * The tools and techniques Netflix uses to analyze Parquet tables * How to spot common problems * Recommendations for Parquet configuration settings to get the best performance out of your processing platform * The impact of this work in speeding up applications like Netflix’s telemetry service and A/B testing platform

Parquet performance tuning: the missing guide

Ryan Blue

Spark overview

Lisa Hua

Spark Performance Tuning .pdf

Amit Raj

Modern object storage offers the opportunity to combine software and hardware to create high performance, disaggregated data infrastructure. By decoupling compute and storage, enterprises can tune their environments to meet an expanded set of use cases including machine learning/big data. These modern object storage solutions boast throughput that is capable of saturating a 100 GBe switches, changing how we perceive, and how we ultimately deploy object storage.

Using S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud

Databricks

Apache Spark 101

Abdullah Çetin ÇAVDAR

PGConf.ASIA 2019 Bali - Setup a High-Availability and Load Balancing PostgreS...

Equnix Business Solutions

Advanced Apache Spark Meetup Project Tungsten Nov 12 2015

Chris Fregly

"During development and automated tests, it is common to create Kafka clusters from scratch and run workloads against those short-lived clusters. Starting a Kafka broker typically takes several seconds, and those seconds add up to precious time and resources. How about spinning up a Kafka broker in less than 0.2 seconds with less memory overhead? In this session, we will talk about kafka-native, which leverages GraalVM native image for compiling Kafka broker to native executable using Quarkus framework. After going through some implementation details, we will focus on how it can be used in a Docker container with Testcontainers to speed up integration testing of Kafka applications. We will finally discuss some current caveats and future opportunities of a native-compiled Kafka for cloud-native production clusters."

Running Kafka as a Native Binary Using GraalVM with Ozan Günalp

HostedbyConfluent

Après la petite intro sur le stockage distribué et la description de Ceph, Jian Zhang réalise dans cette présentation quelques benchmarks intéressants : tests séquentiels, tests random et surtout comparaison des résultats avant et après optimisations. Les paramètres de configuration touchés et optimisations (Large page numbers, Omap data sur un disque séparé, ...) apportent au minimum 2x de perf en plus.

Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...

Odinot Stanislas

Like many other messaging systems, Kafka has put limit on the maximum message size. User will fail to produce a message if it is too large. This limit makes a lot of sense and people usually send to Kafka a reference link which refers to a large message stored somewhere else. However, in some scenarios, it would be good to be able to send messages through Kafka without external storage. At LinkedIn, we have a few use cases that can benefit from such feature. This talk covers our solution to send large message through Kafka without additional storage.

Handle Large Messages In Apache Kafka

Jiangjie Qin

Apache Spark is a next-generation processing engine optimized for speed, ease of use, and advanced analytics well beyond batch. The Spark framework supports streaming data and complex, iterative algorithms, enabling applications to run 100x faster than traditional MapReduce programs. With Spark, developers can write sophisticated parallel applications for faster business decisions and better user outcomes, applied to a wide variety of architectures and industries. Learn What Apache Spark is and how it compares to Hadoop MapReduce, How to filter, map, reduce, and save Resilient Distributed Datasets (RDDs), Who is best suited to attend the course and what prior knowledge you should have, and the benefits of building Spark applications as part of an enterprise data hub.

Introduction to Apache Spark Developer Training

Cloudera, Inc.

PySpark dataframe

Jaemun Jung

Yahoo! is one of the most-visited web sites in the world. It runs one of the largest private cloud infrastructures, one that operates on petabytes of data every day. Being able to store and manage that data well is essential to the efficient functioning of Yahoo!`s Hadoop clusters. A key component that enables this efficient operation is data compression. With regard to compression algorithms, there is an underlying tension between compression ratio and compression performance. Consequently, Hadoop provides support for several compression algorithms, including gzip, bzip2, Snappy, LZ4 and others. This plethora of options can make it difficult for users to select appropriate codecs for their MapReduce jobs. This paper attempts to provide guidance in that regard. Performance results with Gridmix and with several corpuses of data are presented. The paper also describes enhancements we have made to the bzip2 codec that improve its performance. This will be of particular interest to the increasing number of users operating on “Big Data” who require the best possible ratios. The impact of using the Intel IPP libraries is also investigated; these have the potential to improve performance significantly. Finally, a few proposals for future enhancements to Hadoop in this area are outlined.

Compression Options in Hadoop - A Tale of Tradeoffs

DataWorks Summit

Enabling Vectorized Engine in Apache Spark

Kazuaki Ishizaki

The process of optimizing shard-aware drivers for ScyllaDB has involved several initiatives, often necessitating a complete rewrite from the ground up. Discover the efforts put into enhancing the performance of ScyllaDB drivers with a focus on Rust, and how its code base will serve as a foundation for drivers using other language bindings in the future. This session emphasizes the performance gains achieved by harnessing the power of the asynchronous Tokio framework as the backbone of a new, high-performance driver while thoughtfully architecting and optimizing various components of the driver.

Optimizing Performance in Rust for Low-Latency Database Drivers

ScyllaDB

Introduction to PySpark

Russell Jurney

LinkedIn leverages the Apache Hadoop ecosystem for its big data analytics. Steady growth of the member base at LinkedIn along with their social activities results in exponential growth of the analytics infrastructure. Innovations in analytics tooling lead to heavier workloads on the clusters, which generate more data, which in turn encourage innovations in tooling and more workloads. Thus, the infrastructure remains under constant growth pressure. Heterogeneous environments embodied via a variety of hardware and diverse workloads make the task even more challenging. This talk will tell the story of how we doubled our Hadoop infrastructure twice in the past two years. • We will outline our main use cases and historical rates of cluster growth in multiple dimensions. • We will focus on optimizations, configuration improvements, performance monitoring and architectural decisions we undertook to allow the infrastructure to keep pace with business needs. • The topics include improvements in HDFS NameNode performance, and fine tuning of block report processing, the block balancer, and the namespace checkpointer. • We will reveal a study on the optimal storage device for HDFS persistent journals (SATA vs. SAS vs. SSD vs. RAID). • We will also describe Satellite Cluster project which allowed us to double the objects stored on one logical cluster by splitting an HDFS cluster into two partitions without the use of federation and practically no code changes. • Finally, we will take a peek at our future goals, requirements, and growth perspectives. SPEAKERS Konstantin Shvachko, Sr Staff Software Engineer, LinkedIn Erik Krogen, Senior Software Engineer, LinkedIn

Scaling Hadoop at LinkedIn

DataWorks Summit

This presentation about Apache Spark covers all the basics that a beginner needs to know to get started with Spark. It covers the history of Apache Spark, what is Spark, the difference between Hadoop and Spark. You will learn the different components in Spark, and how Spark works with the help of architecture. You will understand the different cluster managers on which Spark can run. Finally, you will see the various applications of Spark and a use case on Conviva. Now, let's get started with what is Apache Spark. Below topics are explained in this Spark presentation: 1. History of Spark 2. What is Spark 3. Hadoop vs Spark 4. Components of Apache Spark 5. Spark architecture 6. Applications of Spark 7. Spark usecase What is this Big Data Hadoop training course about? The Big Data Hadoop and Spark developer course have been designed to impart an in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab. What are the course objectives? Simplilearn’s Apache Spark and Scala certification training are designed to: 1. Advance your expertise in the Big Data Hadoop Ecosystem 2. Help you master essential Apache and Spark skills, such as Spark Streaming, Spark SQL, machine learning programming, GraphX programming and Shell Scripting Spark 3. Help you land a Hadoop developer job requiring Apache Spark expertise by giving you a real-life industry project coupled with 30 demos What skills will you learn? By completing this Apache Spark and Scala course you will be able to: 1. Understand the limitations of MapReduce and the role of Spark in overcoming these limitations 2. Understand the fundamentals of the Scala programming language and its features 3. Explain and master the process of installing Spark as a standalone cluster 4. Develop expertise in using Resilient Distributed Datasets (RDD) for creating applications in Spark 5. 
Master Structured Query Language (SQL) using SparkSQL 6. Gain a thorough understanding of Spark streaming features 7. Master and describe the features of Spark ML programming and GraphX programming Who should take this Scala course? 1. Professionals aspiring for a career in the field of real-time big data analytics 2. Analytics professionals 3. Research professionals 4. IT developers and testers 5. Data scientists 6. BI and reporting professionals 7. Students who wish to gain a thorough understanding of Apache Spark Learn more at https://www.simplilearn.com/big-data-and-analytics/apache-spark-scala-certification-training

What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...

Simplilearn

Processing Large Data with Apache Spark -- HasGeek

Venkata Naga Ravi

Spark SQL

Joud Khattab

What's hot (20)

Parquet performance tuning: the missing guide

Spark overview

Spark Performance Tuning .pdf

Using S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud

Apache Spark 101

PGConf.ASIA 2019 Bali - Setup a High-Availability and Load Balancing PostgreS...

Advanced Apache Spark Meetup Project Tungsten Nov 12 2015

Running Kafka as a Native Binary Using GraalVM with Ozan Günalp

Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...

Handle Large Messages In Apache Kafka

Introduction to Apache Spark Developer Training

PySpark dataframe

Compression Options in Hadoop - A Tale of Tradeoffs

Enabling Vectorized Engine in Apache Spark

Optimizing Performance in Rust for Low-Latency Database Drivers

Introduction to PySpark

Scaling Hadoop at LinkedIn

What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...

Processing Large Data with Apache Spark -- HasGeek

Spark SQL

Similar to MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library for Apache Spark with Miruna Oprescu

Sparkly Notebook: Interactive Analysis and Visualization with Spark

felixcss

Introduction to Apache Spark and MLlib

pumaranikar

Profiling & Testing with Spark

Roger Rafanell Mas

Spark tutorial

Sahan Bulathwela

Apache Spark Tutorial

Ahmet Bulut

As Apache Spark applications move to a containerized environment, there are many questions about how to best configure server systems in the container world. In this talk we will demonstrate a set of tools to better monitor performance and identify optimal configuration settings. We will demonstrate how Prometheus, a project that is now part of the Cloud Native Computing Foundation (CNCF: https://www.cncf.io/projects/), can be applied to monitor and archive system performance data in a containerized spark environment. In our examples, we will gather spark metric output through Prometheus and present the data with Grafana dashboards. We will use our examples to demonstrate how performance can be enhanced through different tuned configuration settings. Our demo will show how to configure settings across the cluster as well as within each node.

Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...

Databricks

Apache spark presentation

Mahboob Hussain

Apache Spark’s machine learning library provides a simple, elegant, yet powerful framework for creating scalable machine learning pipelines. It provides out of the box components for feature extraction and transformation, as well as various machine learning algorithms. However, in recent years specialized systems (such as TensorFlow, Caffe, PyTorch and Apache MXNet) have been dominant in the domain of AI and deep learning, as they allow greater performance and flexibility for training complex models. While there are a few deep learning frameworks that are Spark specific, in most cases these frameworks are separate from Spark and the ease of integration and feature set exposed varies considerably. This session will explore the role of Spark within the AI landscape, the current state of deep learning on top of Spark and the most recent developments in the Spark project to better integrate Spark with the deep learning ecosystem.

AI and Spark - IBM Community AI Day

Nick Pentreath

At Databricks, we have a unique view into over a hundred different companies trying out Spark for development and production use-cases, from their support tickets and forum posts. Having seen so many different workflows and applications, some discernible patterns emerge when looking at common performance and scalability issues that our users run into. This talk will discuss some of these common common issues from an engineering and operations perspective, describing solutions and clarifying misconceptions.

Spark Summit EU 2015: Lessons from 300+ production users

Databricks

Apache spark-melbourne-april-2015-meetup

Ned Shawa

"Apache Spark is today’s fastest growing Big Data analysis platform. Spark workloads typically maintain a persistent data set in memory, which is accessed multiple times over the network. Consequently, networking IO performance is a critical component in Spark systems. RDMA’s performance characteristics, such as high bandwidth, low latency, and low CPU overhead, offer a good opportunity for accelerating Spark by improving its data transfer facilities." "In this talk, we present a Java-based, RDMA network layer for Apache Spark. The implementation optimized both the RPC and the Shuffle mechanisms for RDMA. Initial benchmarking shows up to 25% improvement for Spark Applications." Watch the video presentation: http://wp.me/p3RLHQ-gzN Learn more: http://mellanox.com Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter

Accelerating apache spark with rdma

inside-BigData.com

In machine learning projects, the preparation of large datasets is a key phase which can be complex and expensive. It was traditionally done by data engineers before the handover to data scientists or ML engineers. They operated in different environments due to the differences in the tools, frameworks and runtimes required in each phase. Spark's support for different types of workloads brought data engineering closer to the downstream activities like machine learning that depended on the data. Unifying data acquisition, preprocessing, training models and batch inferencing under a single platform enabled by Spark not only provided seamless experience between different phases and helped accelerate the end-to-end ML lifecycle but also lowered the TCO in the building, managing the infrastructure to cover different phases. With that, the needs of a shared infrastructure expanded to include specialized hardware like GPUs and support deep learning workloads as well. Spark can effectively make use of such infrastructure as it integrates with popular deep learning frameworks and supports acceleration of deep learning jobs using GPUs. In this talk, we share learnings and experiences in supporting different types of workloads in shared clusters equipped for doing deep learning as well as data engineering. We will cover the following topics: * Considerations for sharing the infrastructure for big data and deep learning in Spark * Deep learning in Spark in clusters with and without GPUs * Differences between distributed data processing and distributed machine learning * Multitenancy and isolation in shared infrastructure

Infrastructure for Deep Learning in Apache Spark

Databricks

Open Source Lambda Architecture for deep learning

Patrick Nicolas

Spark summit 2019 infrastructure for deep learning in apache spark 0425

Wee Hyong Tok

In the framework of the Intel Parallel Computing Centre at the Research Campus Garching in Munich, our group at LRZ presents recent results on performance optimization of Gadget-3, a widely used community code for computational astrophysics. We identify and isolate a sample code kernel, which is representative of a typical Smoothed Particle Hydrodynamics (SPH) algorithm and focus on threading parallelism optimization, change of the data layout into Structure of Arrays (SoA), compiler auto-vectorization and algorithmic improvements in the particle sorting. We measure lower execution time and improved threading scalability both on Intel Xeon (2.6× on Ivy Bridge) and Xeon Phi (13.7× on Knights Corner) systems. First tests on second generation Xeon Phi (Knights Landing) demonstrate the portability of the devised optimization solutions to upcoming architectures.

Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures

Dr. Fabio Baruffa

In-Memory Evolution in Apache Spark

Kazuaki Ishizaki

Emiliano Martinez | Deep learning in Spark Slides | Codemotion Madrid 2018

Codemotion

Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark

Databricks

Apache spark

TEJPAL GAUTAM

Overview and extended description: AI is expected to be the engine of technological advancements in the healthcare industry, especially in the areas of radiology and image processing. The purpose of this session is to demonstrate how we can build a AI-based Radiologist system using Apache Spark and Analytics Zoo to detect pneumonia and other diseases from chest x-ray images. The dataset, released by the NIH, contains around 110,00 X-ray images of around 30,000 unique patients, annotated with up to 14 different thoracic pathology labels. Stanford University developed a state-of-the-art model using CNN and exceeds average radiologist performance on the F1 metric. This talk focuses on how we can build a multi-label image classification model in a distributed Apache Spark infrastructure, and demonstrate how to build complex image transformations and deep learning pipelines using BigDL and Analytics Zoo with scalability and ease of use. Some practical image pre-processing procedures and evaluation metrics are introduced. We will also discuss runtime configuration, near-linear scalability for training and model serving, and other general performance topics.

Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...

Databricks

Similar to MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library for Apache Spark with Miruna Oprescu (20)

Sparkly Notebook: Interactive Analysis and Visualization with Spark

Introduction to Apache Spark and MLlib

Profiling & Testing with Spark

Spark tutorial

Apache Spark Tutorial

Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...

Apache spark presentation

AI and Spark - IBM Community AI Day

Spark Summit EU 2015: Lessons from 300+ production users

Apache spark-melbourne-april-2015-meetup

Accelerating apache spark with rdma

Infrastructure for Deep Learning in Apache Spark

Open Source Lambda Architecture for deep learning

Spark summit 2019 infrastructure for deep learning in apache spark 0425

Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures

In-Memory Evolution in Apache Spark

Emiliano Martinez | Deep learning in Spark Slides | Codemotion Madrid 2018

Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark

Apache spark

Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...

More from Spark Summit

In this session we will present a Configurable FPGA-Based Spark SQL Acceleration Architecture. It is target to leverage FPGA highly parallel computing capability to accelerate Spark SQL Query and for FPGA’s higher power efficiency than CPU we can lower the power consumption at the same time. The Architecture consists of SQL query decomposition algorithms, fine-grained FPGA based Engine Units which perform basic computation of sub string, arithmetic and logic operations. Using SQL query decomposition algorithm, we are able to decompose a complex SQL query into basic operations and according to their patterns each is fed into an Engine Unit. SQL Engine Units are highly configurable and can be chained together to perform complex Spark SQL queries, finally one SQL query is transformed into a Hardware Pipeline. We will present the performance benchmark results comparing the queries with FGPA-Based Spark SQL Acceleration Architecture on XEON E5 and FPGA to the ones with Spark SQL Query on XEON E5 with 10X ~ 100X improvement and we will demonstrate one SQL query workload from a real customer.

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang

Spark Summit

In this talk, we’ll present techniques for visualizing large scale machine learning systems in Spark. These are techniques that are employed by Netflix to understand and refine the machine learning models behind Netflix’s famous recommender systems that are used to personalize the Netflix experience for their 99 millions members around the world. Essential to these techniques is Vegas, a new OSS Scala library that aims to be the “missing MatPlotLib” for Spark/Scala. We’ll talk about the design of Vegas and its usage in Scala notebooks to visualize Machine Learning Models.

VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...

Spark Summit

This presentation introduces how we design and implement a real-time processing platform using latest Spark Structured Streaming framework to intelligently transform the production lines in the manufacturing industry. In the traditional production line there are a variety of isolated structured, semi-structured and unstructured data, such as sensor data, machine screen output, log output, database records etc. There are two main data scenarios: 1) Picture and video data with low frequency but a large amount; 2) Continuous data with high frequency. They are not a large amount of data per unit. However the total amount of them is very large, such as vibration data used to detect the quality of the equipment. These data have the characteristics of streaming data: real-time, volatile, burst, disorder and infinity. Making effective real-time decisions to retrieve values from these data is critical to smart manufacturing. The latest Spark Structured Streaming framework greatly lowers the bar for building highly scalable and fault-tolerant streaming applications. Thanks to the Spark we are able to build a low-latency, high-throughput and reliable operation system involving data acquisition, transmission, analysis and storage. The actual user case proved that the system meets the needs of real-time decision-making. The system greatly enhance the production process of predictive fault repair and production line material tracking efficiency, and can reduce about half of the labor force for the production lines.

Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu

Spark Summit

As common sense would suggest, weather has a definite impact on traffic. But how much? And under what circumstances? Can we improve traffic (congestion) prediction given weather data? Predictive traffic is envisioned to significantly impact how driver’s plan their day by alerting users before they travel, find the best times to travel, and over time, learn from new IoT data such as road conditions, incidents, etc. This talk will cover the traffic prediction work conducted jointly by IBM and the traffic data provider. As a part of this work, we conducted a case study over five large metropolitans in the US, 2.58 billion traffic records and 262 million weather records, to quantify the boost in accuracy of traffic prediction using weather data. We will provide an overview of our lambda architecture with Apache Spark being used to build prediction models with weather and traffic data, and Spark Streaming used to score the model and provide real-time traffic predictions. This talk will also cover a suite of extensions to Spark to analyze geospatial and temporal patterns in traffic and weather data, as well as the suite of machine learning algorithms that were used with Spark framework. Initial results of this work were presented at the National Association of Broadcasters meeting in Las Vegas in April 2017, and there is work to scale the system to provide predictions in over a 100 cities. Audience will learn about our experience scaling using Spark in offline and streaming mode, building statistical and deep-learning pipelines with Spark, and techniques to work with geospatial and time-series data.

Improving Traffic Prediction Using Weather Data with Ramya Raghavendra

Spark Summit

Graph is on the rise and it’s time to start learning about scalable graph analytics! In this session we will go over two Spark-based graph analytics frameworks: Tinkerpop and GraphFrames. While both frameworks can express very similar traversals, they have different performance characteristics and APIs. In this deep-dive-by-example presentation, we will demonstrate some common traversals and explain how, at the Spark level, each traversal is actually computed under the hood! Learn both the fluent Gremlin API and the powerful GraphFrames motif API as we show examples of both side by side. No need to be familiar with graphs or Spark for this presentation, as we’ll be explaining everything from the ground up!

A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...

Spark Summit

Building accurate machine learning models has been an art of data scientists, involving algorithm selection, hyperparameter tuning, feature selection and so on. Recently, efforts to break through these “black arts” have begun. In cooperation with our partner, NEC Laboratories America, we have developed a Spark-based automatic predictive modeling system. The system automatically searches for the best algorithm, parameters and features without any manual work. In this talk, we will share how the automation system is designed to exploit the advantages of Spark. An evaluation with real open data demonstrates that our system can explore hundreds of predictive models and discover the most accurate ones in minutes on an Ultra High Density Server, which packs 272 CPU cores, 2TB of memory and 17TB of SSD into a 3U chassis. We will also share open challenges in learning such a large number of models on Spark, particularly from reliability and stability standpoints. This talk will cover the presentation already shown at Spark Summit SF’17 (#SFds5), but from a more technical perspective.

No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...

Spark Summit

In Sweden, from the Rise ICE Data Center at www.hops.site, we are providing researchers with both Spark-as-a-Service and, more recently, Tensorflow-as-a-Service as part of the Hops platform. In this talk, we examine the different ways in which Tensorflow can be included in Spark workflows, from batch to streaming to structured streaming applications. We will analyse the different frameworks for integrating Spark with Tensorflow, from Tensorframes to TensorflowOnSpark to Databricks’ Deep Learning Pipelines. We introduce the different programming models supported and highlight the importance of cluster support for managing different versions of Python libraries on behalf of users. We will also present cluster management support for sharing GPUs, including Mesos and YARN (in Hops Hadoop). Finally, we will perform a live demonstration of training and inference for a TensorflowOnSpark application written in Jupyter that can read data from either HDFS or Kafka, transform the data in Spark, and train a deep neural network on Tensorflow. We will show how to debug the application using both the Spark UI and Tensorboard, and how to examine logs and monitor training.

Apache Spark and Tensorflow as a Service with Jim Dowling

Spark Summit

The Next Accelerator Logging Service (NXCALS) is a new Big Data project at CERN aiming to replace the existing Oracle-based service. The main purpose of the system is to store and present Controls/Infrastructure-related data gathered from thousands of devices across the whole accelerator complex. The data is used to operate the machines, improve their performance and conduct studies for new beam types or future experiments. During this talk, Jakub will speak about the NXCALS requirements and the design choices that led to the selected architecture based on Hadoop and Spark. He will present the Ingestion API, the abstractions behind the Meta-data Service and the Spark-based Extraction API, where simple changes to the schema handling greatly improved the overall usability of the system. The system itself is not CERN-specific and can be of interest to other companies or institutes confronted with similar Big Data problems.

Next CERN Accelerator Logging Service with Jakub Wozniak

Spark Summit

At Between (a mobile app for couples, downloaded 20 million times globally), Spark is widely used by engineers and data analysts for daily batch jobs that extract metrics, for analysis and for dashboards. Thanks to the performance and expandability of Spark, data operations have become extremely efficient. The entire team, including Biz Dev, Global Operations and Designers, makes use of the data results, so Spark is empowering the whole company in data-driven operation and thinking. Kevin, co-founder and Data Team leader of Between, will present how things are going at Between. After this presentation, listeners will know how a small and agile team lives with data (how we build organization, culture and technical base).

Powering a Startup with Apache Spark with Kevin Kim

Spark Summit

In many cases, Big Data becomes just another buzzword because of a lack of tools that can support both the technological requirements for developing and deploying projects and fluent communication between the different profiles of people involved in them. In this talk, we will present Moriarty, a set of tools for fast prototyping of Big Data applications that can be deployed in an Apache Spark environment. These tools support the creation of Big Data workflows from existing functional blocks, as well as the creation of new functional blocks. The resulting workflow can then be deployed on a Spark infrastructure and used through a REST API. To give a better understanding of Moriarty, the prototyping process and the way it hides the Spark environment from Big Data users and developers, we will present it together with a couple of examples, one based on an Industry 4.0 success case and the other on a logistics success case.

Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...

Spark Summit

Large-scale testing of new data products or enhancements to existing products in a research and development environment can be a technical challenge for data scientists. In some cases, tools available to data scientists lack production-level capacity, whereas other tools do not provide the algorithms needed to run the methodology. At Nielsen, the Databricks platform provided a solution to both of these challenges. This breakout session will cover a specific Nielsen business case where two methodology enhancements were developed and tested at large-scale using the Databricks platform. Development and large-scale testing of these enhancements would not have been possible using standard database tools.

How Nielsen Utilized Databricks for Large-Scale Research and Development with...

Spark Summit

Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...

Spark Summit

Since the invention of SQL and relational databases, data production has been about specifying how data is transformed through queries. While Apache Spark can certainly be used as a general distributed query engine, the power and granularity of Spark’s APIs enables a revolutionary increase in data engineering productivity: goal-based data production. Goal-based data production concerns itself with specifying WHAT the desired result is, leaving the details of HOW the result is achieved to a smart data warehouse running on top of Spark. That not only substantially increases productivity, but also significantly expands the audience that can work directly with Spark: from developers and data scientists to technical business users. With specific data and architecture patterns spanning the range from ETL to machine learning data prep and with live demos, this session will demonstrate how Spark users can gain the benefits of goal-based data production.

Goal Based Data Production with Sim Simeonov

Spark Summit

Have you imagined a simple machine learning solution able to prevent revenue leakage and monitor your distributed application? To answer this question, we offer a practical and simple machine learning solution for building an intelligent monitoring application based on simple data analysis using Apache Spark MLlib. Our application uses linear regression models to make predictions and check whether the platform is experiencing any operational problems that could result in revenue losses. The application monitors distributed systems and sends notifications describing the problem detected, so that users can act quickly to avoid serious problems that directly impact the company’s revenue, and can reduce the time to action. We will present an architecture not only for a monitoring system, but also for an active actor in our outage recoveries. At the end of the presentation you will have access to our training program source code and will be able to adapt and implement it in your company. This solution already helped to prevent about US$3 million in losses last year.

Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...

Spark Summit

Getting Ready to Use Redis with Apache Spark is a technical tutorial on integrating Redis with an Apache Spark deployment to increase the performance of serving complex decision models. To set the context for the session, we start with a quick introduction to Redis and the capabilities it provides. We cover the basic data types provided by Redis and the module system. Using an ad-serving use case, we look at how Redis can improve the performance and reduce the cost of using complex ML models in production. Attendees will be guided through the key steps of setting up and integrating Redis with Spark, including how to train a model using Spark and then load and serve it using Redis, as well as how to work with the Spark Redis module. The capabilities of the Redis Machine Learning Module (redis-ml) will be discussed, focusing primarily on decision trees and regression (linear and logistic), with code examples to demonstrate how to use these features. At the end of the session, developers should feel confident building a prototype or proof-of-concept application using Redis and Spark. Attendees will understand how Redis complements Spark and how to use Redis to serve complex ML models with high performance.

Getting Ready to Use Redis with Apache Spark with Dvir Volk

Spark Summit

Here we present a general supervised framework for record deduplication and author disambiguation via Spark. This work differentiates itself in the following ways:
– The use of Databricks and AWS makes this a scalable implementation. Compute resource usage is comparatively lower than with traditional legacy technology running big boxes 24/7. Scalability is crucial, as Elsevier’s Scopus data, the biggest scientific abstract repository, covers roughly 250 million authorships from 70 million abstracts spanning a few hundred years.
– We create a fingerprint for each piece of content using deep learning and/or word2vec algorithms to expedite pairwise similarity calculation. These encoders substantially reduce compute time while maintaining semantic similarity (unlike traditional TFIDF or predefined taxonomies). We will briefly discuss how to optimize word2vec training with high parallelization. Moreover, we show how these encoders can be used to derive a standard representation for all our entities, such as documents, authors, users and journals. This standard representation can reduce the recommendation problem to a pairwise similarity search, and hence can offer a basic recommender for cross-product applications where we may not have a dedicated recommender engine.
– Traditional author-disambiguation or record-deduplication algorithms are batch processes with little to no training data. However, we have roughly 25 million authorships that have been manually curated or corrected based on user feedback. It is therefore crucial to maintain historical profiles, and we have developed a machine learning implementation that deals with data streams and processes them in mini-batches or one document at a time. We will discuss how to measure the accuracy of such a system, how to tune it and how to process the raw output of the pairwise similarity function into final clusters.
Lessons learned from this talk can help all sorts of companies that want to integrate their data or deduplicate their user, customer or product databases.

Deduplication and Author-Disambiguation of Streaming Records via Supervised M...

Spark Summit

The use of large-scale machine learning and data mining methods is becoming ubiquitous in many application domains, ranging from business intelligence and bioinformatics to self-driving cars. These methods rely heavily on matrix computations, so it is critical to make these computations scalable and efficient. Matrix computations are often complex and involve multiple steps that need to be optimized and sequenced properly for efficient execution. This work presents new efficient and scalable matrix processing and optimization techniques based on Spark. The proposed techniques estimate the sparsity of intermediate matrix-computation results and optimize communication costs. An evaluation plan generator for complex matrix computations is introduced, as well as a distributed plan optimizer that exploits dynamic cost-based analysis and rule-based heuristics. The result of a matrix operation will often serve as an input to another matrix operation, thus defining the matrix data dependencies within a matrix program. The matrix query plan generator produces query execution plans that minimize memory usage and communication overhead by partitioning the matrix based on the data dependencies in the execution plan. We implemented the proposed matrix techniques inside Spark SQL, and optimize the matrix execution plan using the Spark SQL Catalyst optimizer. We conduct case studies on a series of ML models and matrix computations with special features on different datasets: PageRank, GNMF, BFGS, sparse matrix chain multiplications, and a biological data analysis. The open-source library ScaLAPACK and the array-based database SciDB are used for performance evaluation. Our experiments are performed on six real-world datasets: social network data (e.g., soc-pokec, cit-Patents, LiveJournal), Twitter2010, Netflix recommendation data, and a 1000 Genomes Project sample. Experiments demonstrate that our proposed techniques achieve up to an order-of-magnitude performance improvement.

MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...

Spark Summit

Kapil Malik and Arvind Heda will discuss a solution for interactive querying of large-scale structured data, stored in a distributed file system (HDFS / S3), in a scalable and reliable manner using a unique combination of Spark SQL, Apache Zeppelin and Spark Job-server (SJS) on YARN. The solution is production tested and can serve thousands of queries processing terabytes of data every day. It contains the following components:
1. Zeppelin server: A custom interpreter is deployed, which decouples the Spark context from the user notebooks. It connects to the remote Spark context on Spark Job-server. A rich set of APIs is exposed to users. The user input is parsed, validated and executed remotely on SJS.
2. Spark Job-server: A custom application is deployed, which implements the set of APIs exposed by the Zeppelin custom interpreter as one or more Spark jobs.
3. Context router: It routes user queries from the custom interpreter to one of many Spark Job-servers / contexts.
The solution has the following characteristics:
* Multi-tenancy: There are hundreds of users, each having one or more Zeppelin notebooks. All these notebooks connect to the same set of Spark contexts for running a job.
* Fault tolerance: The notebooks do not use the Spark interpreter, but a custom interpreter that connects to a remote context. If one Spark context fails, the context router sends user queries to another context.
* Load balancing: The context router identifies which contexts are under heavy load or responding slowly, and selects the best context for serving a user query.
* Efficiency: We use Alluxio for caching common datasets.
* Elastic resource usage: We use Spark dynamic allocation for the contexts. This ensures that the application holds cluster resources only when it is doing actual work.

Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...

Spark Summit


Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...

amitlee9823

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore Booking Contact Details :- WhatsApp Chat :- +91-7737669865 2-May-2024(SMW) Call Girls In Model Towh Bangalore +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in Bangalore NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in Bangalore, Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service Bangalore NCRWelcome To Bangalore Escorts Service – An All Over New Bangalore Very Sexy Hot Call Girls Agency Service Escorts In South BangaloreNCRBangalore’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Bangalore Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...

amitlee9823

Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore Escorts Service Booking Contact Details :- WhatsApp Chat :- +91-7737669865 4-May-2024(SMW) Call Girls In Model Towh +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in , Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service NCRWelcome To Escorts Service – An All Over New Very Sexy Hot Call Girls Agency Service Escorts In South NCR’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...

amitlee9823

Probability Grade 10 Third Quarter Lessons

JoseMangaJr1

Recently uploaded (20)

CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE

Accredited-Transport-Cooperatives-Jan-2021-Web.pdf

Generative AI on Enterprise Cloud with NiFi and Milvus

Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore

Discover Why Less is More in B2B Research

Mature dropshipping via API with DroFx.pptx

Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...

Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand

Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...

Smarteg dropshipping via API with DroFx.pptx

April 2024 - Crypto Market Report's Analysis

Invezz.com - Grow your wealth with trading signals

Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec

Ravak dropshipping via API with DroFx.pptx

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...

Capstone Project on IBM Data Analytics Program

Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...

Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...

Probability Grade 10 Third Quarter Lessons

MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library for Apache Spark with Miruna Oprescu

1. Miruna Oprescu (moprescu@microsoft.com) Microsoft MMLSPARK: Lessons From Building A SparkML-Compatible Machine Learning Library For Apache Spark #EUai7

2. MMLSpark: Microsoft Machine Learning Library for Apache Spark • GitHub: https://github.com/Azure/mmlspark • Spark Package: pyspark/spark-shell/spark-submit --packages Azure:mmlspark:0.9 • Docker: docker run -it -p 8888:8888 -e ACCEPT_EULA=yes microsoft/mmlspark • Navigate to http://localhost:8888 to view example Jupyter notebooks

3. Why MMLSpark? • The (typical) data science workflow with Spark: [Diagram labels: Data, Data transforms, ML algorithm, Model]

4. Why MMLSpark? • Doesn't look familiar? Maybe this does? [Diagram labels: Data, Data massaging, Data transforms, Python/R UDFs, External libraries (CNTK, OpenCV), ML algorithms, Pipeline]

5. Why MMLSpark? • Workflow can be: – Slow & time-consuming – Intractable – Difficult to debug and reproduce – Hard to put in production

6. MMLSpark Goals • Stay in the Spark ecosystem as much as possible by integrating domain-specific libraries (vision, text analytics, etc.) • Have better model management support • Bring cutting-edge ML algorithms to Spark • Reduce the overhead from UDFs and other custom functions • Run on every platform & language supported by Spark

7. MMLSpark Lesson #1: Follow the SparkML Pipeline model for composability. • MMLSpark consists of Transformers, Estimators and Models that can be combined with existing SparkML components into pipelines. • These abstractions ensure composability, reusability via serialization, logging, and ease of use across languages.
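
The Transformer / Estimator / Pipeline contract behind Lesson #1 can be sketched in a few lines of plain Python. This is a toy model of the idea only: it does not use Spark, rows are plain dicts rather than DataFrames, and every class name here (Tokenize, CountWords, VocabModel) is a hypothetical stand-in rather than an MMLSpark or SparkML class.

```python
from dataclasses import dataclass
from typing import Dict, List

Row = Dict[str, object]  # toy stand-in for a DataFrame row


class Transformer:
    """A stage that maps rows to rows without learning anything."""

    def transform(self, rows: List[Row]) -> List[Row]:
        raise NotImplementedError


class Estimator:
    """A stage that learns from rows and returns a fitted Transformer."""

    def fit(self, rows: List[Row]) -> Transformer:
        raise NotImplementedError


@dataclass
class Tokenize(Transformer):
    input_col: str
    output_col: str

    def transform(self, rows):
        # Lower-case and split on whitespace, adding a new column.
        return [{**r, self.output_col: str(r[self.input_col]).lower().split()}
                for r in rows]


@dataclass
class VocabModel(Transformer):
    vocab: List[str]
    input_col: str
    output_col: str

    def transform(self, rows):
        # Count how often each vocabulary word occurs in the token list.
        return [{**r, self.output_col: [r[self.input_col].count(w) for w in self.vocab]}
                for r in rows]


@dataclass
class CountWords(Estimator):
    input_col: str
    output_col: str

    def fit(self, rows):
        # "Training" here is just collecting the sorted vocabulary.
        vocab = sorted({w for r in rows for w in r[self.input_col]})
        return VocabModel(vocab, self.input_col, self.output_col)


class PipelineModel(Transformer):
    def __init__(self, stages: List[Transformer]):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline(Estimator):
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            # Estimators are fitted on the data seen so far; Transformers pass through.
            model = stage.fit(rows) if isinstance(stage, Estimator) else stage
            rows = model.transform(rows)
            fitted.append(model)
        return PipelineModel(fitted)


reviews = [{"text": "Great book"}, {"text": "Boring book"}]
model = Pipeline([Tokenize("text", "tokens"), CountWords("tokens", "counts")]).fit(reviews)
scored = model.transform([{"text": "great great book"}])
```

Because every stage exposes the same two methods, a new stage composes with existing ones without either side knowing about the other, which is the property the slide attributes to the SparkML abstractions.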

8. MMLSpark: Before and After. Example: Book Reviews

9. MMLSpark: Before

10. MMLSpark: After

11. MMLSpark Lesson #2: Leverage SparkML abstractions to auto-generate Python and R interfaces. • Decreases development time, ensures feature parity, reduces errors and improves testing.

12. MMLSpark Architecture [Diagram labels: Pre-trained DNN models, OpenCV Java Bindings, CNTK Java Bindings, Spark core, Scala API, Wrapper generation, PySpark/R wrappers]

13. Python Wrappers: Example [Diagram labels: Scala source, Generator, Python wrapper]
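
The slide images for the generator are not in this transcript, so the following is only a rough sketch of the general idea: given a declarative list of a stage's parameters, emit the source of a Python class with matching setters and getters. The ParamSpec type, the generate_wrapper function and the ImageResizer stage name are all hypothetical; a real generator would presumably read the parameters from the Scala class and make the wrapper delegate to the JVM object, which this toy version does not attempt.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParamSpec:
    name: str        # parameter name as declared on the (hypothetical) Scala stage
    doc: str         # one-line description; assumed to contain no triple quotes
    default: object  # default value, rendered with repr()


def generate_wrapper(class_name: str, params: List[ParamSpec]) -> str:
    """Return Python source for a wrapper class with one setter/getter per parameter."""
    lines = [
        f"class {class_name}:",
        f'    """Auto-generated wrapper for the Scala stage {class_name}."""',
        "    def __init__(self):",
        "        self._params = {",
    ]
    for p in params:
        lines.append(f"            {p.name!r}: {p.default!r},")
    lines.append("        }")
    for p in params:
        cap = p.name[0].upper() + p.name[1:]
        lines += [
            "",
            f"    def set{cap}(self, value):",
            f'        """Set {p.name}: {p.doc}"""',
            f"        self._params[{p.name!r}] = value",
            "        return self",  # returning self allows chained setters
            "",
            f"    def get{cap}(self):",
            f'        """Get {p.name}: {p.doc}"""',
            f"        return self._params[{p.name!r}]",
        ]
    return "\n".join(lines) + "\n"


source = generate_wrapper("ImageResizer", [
    ParamSpec("height", "target height in pixels", 224),
    ParamSpec("width", "target width in pixels", 224),
])

# Load the generated source to check that it is valid Python.
namespace = {}
exec(source, namespace)
stage = namespace["ImageResizer"]().setHeight(128)
```

Since the wrapper is derived mechanically from one parameter list, a parameter added on the Scala side appears in Python without hand-written code, which is how generation can support the feature parity claimed in Lesson #2.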

14. MMLSpark Lesson #3: Turn parallelizable algorithms from external libraries (e.g. OpenCV and CNTK) into Scala Pipeline Stages. • No data transfer overhead: all operations happen at the JVM level. • Enables cutting-edge techniques such as Transfer Learning.
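
The slides do not show how a non-distributed library is applied inside a stage. One common pattern, sketched here as an assumption rather than as MMLSpark's implementation, is to load the expensive native object once per partition and then stream that partition's rows through it. FakeModel and both functions are hypothetical, and partitions are plain Python lists instead of Spark partitions.

```python
from typing import Callable, Iterable, Iterator, List


class FakeModel:
    """Hypothetical stand-in for a native model (e.g. a DNN) that is costly to load."""

    loads = 0  # class-level counter so the example can show how often loading happens

    def __init__(self):
        FakeModel.loads += 1

    def evaluate(self, x: float) -> float:
        return 2.0 * x


def score_partition(rows: Iterable[float],
                    load_model: Callable[[], FakeModel]) -> Iterator[float]:
    model = load_model()  # paid once per partition, not once per row
    for x in rows:
        yield model.evaluate(x)


def score_dataset(partitions: List[List[float]],
                  load_model: Callable[[], FakeModel]) -> List[List[float]]:
    # Each partition is independent, so in a cluster these calls could run in parallel.
    return [list(score_partition(p, load_model)) for p in partitions]


partitions = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
scored = score_dataset(partitions, FakeModel)
```

Six rows are scored with three model loads, one per partition. The function only needs the library call to be independent across rows, which is what makes an algorithm "parallelizable" in the sense of the slide.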

15. MMLSpark Architecture [Diagram labels: Pre-trained DNN models, OpenCV Java Bindings, CNTK Java Bindings, Spark core, Scala API, Wrapper generation, PySpark/R wrappers]

16. MMLSpark: Example • DNN Featurization with OpenCV and CNTK

17. MMLSpark Lesson #4: Run on every platform supported by Spark. Test at the highest possible level (Jupyter Notebooks). • Publish self-contained packages that can be used from a variety of targets. • Avoid common integration issues by testing directly with Jupyter Notebooks.
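
As a rough illustration of testing at the notebook level, the sketch below treats a notebook as a test case: it parses the notebook JSON, runs every code cell in order in one shared namespace, and fails on the first cell that raises. The run_notebook helper is hypothetical and deliberately minimal. It executes cells in the current interpreter rather than through a Jupyter kernel, so it would not handle IPython magics, and it is not a description of how MMLSpark runs its notebook tests.

```python
import json
from typing import Dict


def run_notebook(nb_json: str) -> Dict[str, object]:
    """Execute the code cells of a notebook (nbformat 4 JSON) and return the namespace."""
    nb = json.loads(nb_json)
    namespace: Dict[str, object] = {}
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are not executed
        source = cell.get("source", "")
        if isinstance(source, list):  # the format allows a list of lines
            source = "".join(source)
        try:
            exec(compile(source, f"<cell {index}>", "exec"), namespace)
        except Exception as err:
            raise AssertionError(f"notebook cell {index} failed: {err!r}") from err
    return namespace


# A tiny in-memory notebook standing in for an example notebook file.
notebook = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Example"]},
        {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": None,
         "source": ["x = 20\n", "y = x + 1\n"]},
        {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": None,
         "source": ["total = x + y"]},
    ],
})
result = run_notebook(notebook)
```

Running the same notebooks that users see exercises packaging, language bindings and the library together, which is the kind of integration issue the slide says this level of testing catches.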

18. MMLSpark: Bonus • Image Schema unification: https://github.com/apache/spark/pull/19439 • Uses DataFrames as a common format for reading images. • Standardizes handling of images as a datatype used by different algorithms.
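
To make "images as a datatype" concrete, here is a small sketch of a single image record and one algorithm that consumes and produces it. The field names (origin, height, width, nChannels, mode, data) are assumed to match the image schema as it later shipped in Spark. The 8-bit-per-channel restriction, the OpenCV-style mode codes used (16 for three 8-bit channels, 0 for one) and the to_grayscale helper are illustrative assumptions, not part of the linked pull request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRow:
    origin: str     # where the image was read from
    height: int
    width: int
    nChannels: int
    mode: int       # OpenCV-style type code (assumed: 16 = 8-bit x 3, 0 = 8-bit x 1)
    data: bytes     # pixel bytes, row by row, channels interleaved

    def __post_init__(self):
        # Assumes one byte per channel; other pixel depths would need a different check.
        expected = self.height * self.width * self.nChannels
        if len(self.data) != expected:
            raise ValueError(f"expected {expected} bytes, got {len(self.data)}")


def to_grayscale(img: ImageRow) -> ImageRow:
    """Average the channels of each pixel and return a single-channel image."""
    c = img.nChannels
    pixels = [sum(img.data[i:i + c]) // c for i in range(0, len(img.data), c)]
    return ImageRow(img.origin, img.height, img.width, 1, 0, bytes(pixels))


# A 1x2 image with three channels per pixel.
color = ImageRow("memory://example", 1, 2, 3, 16, bytes([10, 20, 30, 0, 0, 255]))
gray = to_grayscale(color)
```

Because the output is the same record type as the input, any other stage written against this schema can consume it directly, which is the standardization the slide describes.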

19. Use case: Snow Leopards

20. Use case: Snow Leopards • 3,900-6,500 individuals left in the wild • Little is known about their behavior, movement patterns and survival rates • Camera trapping since 2009 (~1.3 million images)

21. Use case: Snow Leopards • Automatic Image Classification with MMLSpark: – Thousands of hours of researcher and volunteer time saved – Resources redeployed to science and conservation instead of image sorting – Much more accurate data on range and population

22. Thank You! • Star our repo: https://github.com/Azure/mmlspark • Contact us: Myself: Miruna Oprescu (moprescu@microsoft.com); Dev Lead: Sudarshan Raghunathan (susudars@microsoft.com); PM: Roope Astala (roastala@microsoft.com)
