Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher Burdorf, NBC Universal)

•

9 likes•5,534 views

This document discusses using Spark MLlib to predict which digital media files should be offlined from storage to free up space. It describes using k-means clustering, naive Bayes classification, and support vector machines (SVM) on features like file size, age, and airing schedule. SVM performed best and allowed building a predictive system in under an hour. The system is run twice daily on a Spark cluster to select files for purging from a large storage system based on predictions. Some initial issues were addressed and the system is now running robustly in production.

Data & Analytics

USE OF SPARK MLLIB
FOR PREDICTING
OFFLINING OF DIGITAL
MEDIA
Christopher Burdorf
NBCUniversal

OVERVIEW
λ Problem Definition
λ Cluster Configurations
λ Parameters
λ MLLib libraries utilized
λ Results
λ Conclusions

Problem Definition
λ Digital media files distributed internationally 24/7 over internet to
cable TV channels in APAC, Europe, and Asia.
λ TV Shows, Commercials
λ Fast Isilon storage fills up frequently, thus offlining is necessary to
create space for new files
λ Multiple parameters are used to pick candidates
λ Previous system works well, but is slow and has large overhead

Cluster Configurations
λ Development
- 1 Linux System with 16 cores and 16 Gigs RAM
- 3 Mac OS Systems each with 8 cores and 16 Gigs RAM
- Mesos cluster manager
λ Production
- 3 Linux systems with 32 cores and 64 Gigs of RAM
- Mesos master and slaves running in Docker containers
-

Features/Parameters
λ File size
λ File age
λ Days since last airing
λ Days until next airing
λ Immediately remove files that have not been scheduled for
more than 3 weeks (was 2 weeks to start with).

K-Means Clustering
,
where xi(j) -cj is a chosen distance measure between a data point and the cluster centre , is an indicator of the
distance of the n data points from their respective cluster centers.

K means clustering results
λ Attempted with multiple centroid points
λ No meaningful clusters for any sets of data attempted
λ Results appeared to be tied to the nature of the data
λ Multiple parameters created difficulties

Naive Bayes classifier
λ From Bayes theorem: p(C_k|x)=(p(C_k)p(x|C_k))/p(x)
λ  posterior = (prior X likelihood)/evidence
λ Thus, the attempt was to classify media files for removal or
retention on the file system based on the different parameter
values and their expected results
λ Had difficulties training for many factors
- Adding new factors to training set altered results for
previously trained classifier and required retraining the entire
classification system

SVM Results
λ Found to be most robust method
λ Compensated well for new added features
λ Built training set from production data and auto-generated data
that fit the criteria
λ Spark Mllib optimization allows us to build a predictive system in
just over an hour
- Run twice daily due to constant changes in schedules and
online media.

Production environment
λ Spark Mllib SVM is trained and predictions are generated for
each media file online twice daily. Predictions are stored in
HBase
λ Media manager daemon regularly scans the file system and if
available space requires purging, it will select files based on
predictions stored in Hbase
λ Backup system checks to see if schedule has changed for a
given media file selected to be purged, because Spark SVM
predictions are only generated twice daily

Production issues
λ System was run in test mode for 6 weeks without a problem
λ Once switched to production, issues arose
- 2 weeks was too short a time to immediately offline files
λ Switched to 3 weeks
- Docker containers filled up with Spark temp files and
crashed over the weekend (ouch!)
λ Solved with cron job periodically removing them.
- Web service access to Hbase locked up.
λ Solved by having thread timeout on web services call.

Conclusions
λ System has been running in production mode for some time
now.
- Fine tuning appears to be complete
λ Spark SVM has performed well
- Fast and robust
λ Good application for machine learning though critical aspect of
the system increased task complexity.

What's hot

Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...

Spark Summit

Spark Summit EU talk by Luca Canali

Spark Summit

Extreme Apache Spark: how in 3 months we created a pipeline that can process ...

Josef A. Habdank

Stepping beyond ETL in batches, large enterprises are looking at ways to generate more up-to-date insights. As we step into the age of Continuous Application, this session will explore the ever more popular Structure Streaming API in Apache Spark, its application to R, and building examples of machine learning use cases. Starting with an introduction to the high-level concepts, the session will dive into the core of the execution plan internals and examine how SparkR extends the existing system to add the streaming capability. Learn how to build various data science applications on data streams integrating with R packages to leverage the rich R ecosystem of 10k+ packages. Session hashtag: #SFdev2

SSR: Structured Streaming for R and Machine Learning

felixcss

Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung

Spark Summit

Spark Summit EU talk by Qifan Pu

Spark Summit

Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)

Spark Summit

The majority of reported Spark deployments are now in the cloud. In such an environment, it is preferable for Spark to access data directly from services such as Amazon S3, thereby decoupling storage and compute. However, there are limitations to object stores such as S3. Chained or concurrent ETL jobs often run into issues on S3 due to inconsistent file listings and the lack of atomic rename support. Metadata performance also becomes an issue when running jobs over many thousands to millions of files. Speaker: Eric Liang This talk was originally presented at Spark Summit East 2017.

Robust and Scalable ETL over Cloud Storage with Apache Spark

Databricks

Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg

Spark Summit

spark-timeseries is a Scala / Java / Python library for interacting with time series data on Apache Spark. Time-series are an important part of data science applications, but are notoriously difficult in the context of distributed systems, due to their sequential nature. Getting this right is therefore a challenging but important element of progress in the universe of distributed systems applied to data science. This talk will cover the current overall design of spark-timeseries, the current functionalities, and will provide some usage examples. Because the project is still at an early stage, the talk will also cover the current weaknesses and future improvements that are in the spark-timeseries project roadmap.

Time Series Analytics with Spark: Spark Summit East talk by Simon Ouellette

Spark Summit

Spark Summit EU talk by Mikhail Semeniuk Hollin Wilkins

Spark Summit

Relationship Extraction from Unstructured Text-Based on Stanford NLP with Spa...

Spark Summit

Spark Summit EU talk by Kent Buenaventura and Willaim Lau

Spark Summit

Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...

Spark Summit

Low Latency Execution For Apache Spark

Jen Aman

Re-Architecting Spark For Performance Understandability

Jen Aman

Swisscom is the leading mobile-service provider in Switzerland, with a market share high enough to enable us to model and understand the collective mobility in every area of the country. To accomplish that, we built an urban planning tool that helps cities better manage their infrastructure based on data-based insights, produced with Apache Spark, YARN, Kafka and a good dose of machine learning. In this talk, we will explain how building such a tool involves mining a massive amount of raw data (1.5E9 records/day) to extract fine-grained mobility features from raw network traces. These features are obtained using different machine learning algorithms. For example, we built an algorithm that segments a trajectory into mobile and static periods and trained classifiers that enable us to distinguish between different means of transport. As we sketch the different algorithmic components, we will present our approach to continuously run and test them, which involves complex pipelines managed with Oozie and fuelled with ground truth data. Finally, we will delve into the streaming part of our analytics and see how network events allow Swisscom to understand the characteristics of the flow of people on roads and paths of interest. This requires making a link between network coverage information and geographical positioning in the space of milliseconds and using Spark streaming with libraries that were originally designed for batch processing. We will conclude on the advantages and pitfalls of Spark involved in running this kind of pipeline on a multi-tenant cluster. Audiences should come back from this talk with an overall picture of the use of Apache Spark and related components of its ecosystem in the field of trajectory mining.

Mobility insights at Swisscom - Understanding collective mobility in Switzerland

François Garillot

Apache Flink vs Apache Spark - Reproducible experiments on cloud.

Shelan Perera

Taking Spark Streaming to the Next Level with Datasets and DataFrames

Databricks

Machine learning is being deployed in a growing number of applications which demand real-time, accurate, and robust predictions under heavy serving loads. However, most machine learning frameworks and systems only address model training and not deployment. Clipper is a general-purpose model-serving system that addresses these challenges. Interposing between applications that consume predictions and the machine-learning models that produce predictions, Clipper simplifies the model deployment process by isolating models in their own containers and communicating with them over a lightweight RPC system. This architecture allows models to be deployed for serving in the same runtime environment as that used during training. Further, it provides simple mechanisms for scaling out models to meet increased throughput demands and performing fine-grained physical resource allocation for each model. In this talk, I will provide an overview of the Clipper serving system and then discuss how to get started using Clipper to serve Spark and TensorFlow models in a production serving environment.

Clipper: A Low-Latency Online Prediction Serving System

Databricks

What's hot (20)

Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...

Spark Summit EU talk by Luca Canali

Extreme Apache Spark: how in 3 months we created a pipeline that can process ...

SSR: Structured Streaming for R and Machine Learning

Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung

Spark Summit EU talk by Qifan Pu

Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)

Robust and Scalable ETL over Cloud Storage with Apache Spark

Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg

Time Series Analytics with Spark: Spark Summit East talk by Simon Ouellette

Spark Summit EU talk by Mikhail Semeniuk Hollin Wilkins

Relationship Extraction from Unstructured Text-Based on Stanford NLP with Spa...

Spark Summit EU talk by Kent Buenaventura and Willaim Lau

Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...

Low Latency Execution For Apache Spark

Re-Architecting Spark For Performance Understandability

Mobility insights at Swisscom - Understanding collective mobility in Switzerland

Apache Flink vs Apache Spark - Reproducible experiments on cloud.

Taking Spark Streaming to the Next Level with Datasets and DataFrames

Clipper: A Low-Latency Online Prediction Serving System

Similar to Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher Burdorf, NBC Universal)

ResumeJagannath

Jagannath Timma

S104872 spectrum nas-one-day-jburg-v1809e

Tony Pearson

071410 sun a_1515_feldman_stephen

Steve Feldman

Dipesh Singh 01112016

Dipesh Singh

Dynamic Provisioning of Data Intensive Computing Middleware Frameworks

Linh Ngo

Slides presented at Percona Live Europe Open Source Database Conference 2019, Amsterdam, 2019-10-01. Imagine a world where all Wikipedia articles disappear due to a human error or software bug. Sounds unreal? According to some estimations, it would take an excess of hundreds of million person-hours to be written again. To prevent that scenario from ever happening, our SRE team at Wikimedia recently refactored the relational database recovery system. In this session, we will discuss how we backup 550TB of MariaDB data without impacting the 15 billion page views per month we get. We will cover what were our initial plans to replace the old infrastructure, how we achieved recovering 2TB databases in less than 30 minutes while maintaining per-table granularity, as well as the different types of backups we implemented. Lastly, we will talk about lessons learned, what went well, how our original plans changed and future work.

Backing up Wikipedia Databases

Jaime Crespo

Testing Delphix: easy data virtualization

Franck Pachot

Update on Trinity System Procurement and Plans

inside-BigData.com

The rapid growth of in-memory compute applications is not surprising given the tremendous performance gains they can offer. Jobs that used to take hours can now take minutes or seconds as they are no longer subject to the rotational and seek latencies of spinning media. While Flash memory provide some relief, it is still a hundred times slower than the DRAM that in-memory compute applications utilize as their primary storage. One drawback to in-memory compute applications is the high cost associated with DRAM. Not only are the acquisition costs an order of magnitude more expensive than Flash, DRAM consumes far more power. Power can be a significant issue in data centers besides contributing a major part to the operational costs. In addition, a single server has limited capacity for DRAM and datasets that are larger, need to find an alternate solution or cope with the nuisance of sharding. Furthermore, in order to utilize the maximum capacity of DRAM in a server, higher cost DRAM needs to be installed further escalating the cost of compute. We discuss a paradigm to allow in-memory computing applications to extend their capacity by utilizing Flash memory; often with minimal performance loss. We give examples of applications that have been modified to utilize the paradigm and show performance comparisons. We also discuss TCO and the relative cost per transaction of the different solutions.

IMCSummit 2015 - Day 2 General Session - Flash-Extending In-Memory Computing

In-Memory Computing Summit

Combining Machine Learning frameworks with Apache Spark

DataWorks Summit/Hadoop Summit

At SpotX, we have built and maintained a portfolio of Spark Streaming applications -- all of which process records in the millions per minute. From pure data ingestion, to ETL, to real-time reporting, to live customer-facing products and features, continuous applications are in our DNA. Come along with us as we outline our journey from square one to present in the world of Spark Streaming. We'll detail what we've learned about efficient processing and monitoring, reliability and stability, and long term support of a streaming app. Come learn from our mistakes, and leave with some handy settings and designs you can implement in your own streaming apps.

Headaches and Breakthroughs in Building Continuous Applications

Databricks

Taking Splunk to the Next Level - Architecture Breakout Session

Splunk

Daya_DBA

Dayasekhar Babu Kurakalva

CD presentation march 12th, 2018

Ran Levy

Good practices to design and implement IT architecture based on AWS

LCloud

Jacek pracuje z chmurą obliczeniową AWS od 2008 roku, na początku jako niezależny ekspert/administrator AWS, obecnie architekt AWS/właściciel firmy, która jest pierwszym AWS Partnerem w Polsce. Wykorzystując swoje wieloletnie doświadczenie, pomaga klientom w różnej wielkości projektach, od start-ups po rozwiązania typu Enterprise, we wdrażaniu najlepszych rozwiązań na chmurze AWS, które spełnią wszystkie wymagania biznesowe.

Dobre praktyki projektowania architektury i wdrażania systemów IT dla chmury ...

Jacek Biernat

Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...

Landon Robinson

For increasing business opportunity, the Financial industry companies requires the power, flexibility and scalability of latest enterprise technologies for its 24/7 services. Rakuten Card, one of the largest credit card companies in Japan, recently renewed their credit card core processing systems utilizing with Java EE. Among the myriad of available technologies, why did we choose Exalogic and Exadata, with Apache Spark distributed configuration? How did we ported from one of the oldest COBOL based mainframe in Japan? What were the key of the success factors into launching and operating this mission critical service? This session unveils our great results, and how our selections are effective for financial enterprise systems.

Case Study: Credit Card Core System with Exalogic, Exadata, Oracle Cloud Mach...

Hirofumi Iwasaki

Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...

LarryZaman

BGOUG "Agile Data: revolutionizing database cloning'

Kyle Hailey

Similar to Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher Burdorf, NBC Universal) (20)

ResumeJagannath

S104872 spectrum nas-one-day-jburg-v1809e

071410 sun a_1515_feldman_stephen

Dipesh Singh 01112016

Dynamic Provisioning of Data Intensive Computing Middleware Frameworks

Backing up Wikipedia Databases

Testing Delphix: easy data virtualization

Update on Trinity System Procurement and Plans

IMCSummit 2015 - Day 2 General Session - Flash-Extending In-Memory Computing

Combining Machine Learning frameworks with Apache Spark

Headaches and Breakthroughs in Building Continuous Applications

Taking Splunk to the Next Level - Architecture Breakout Session

Daya_DBA

CD presentation march 12th, 2018

Good practices to design and implement IT architecture based on AWS

Dobre praktyki projektowania architektury i wdrażania systemów IT dla chmury ...

Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...

Case Study: Credit Card Core System with Exalogic, Exadata, Oracle Cloud Mach...

Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...

BGOUG "Agile Data: revolutionizing database cloning'

More from Spark Summit

In this session we will present a Configurable FPGA-Based Spark SQL Acceleration Architecture. It is target to leverage FPGA highly parallel computing capability to accelerate Spark SQL Query and for FPGA’s higher power efficiency than CPU we can lower the power consumption at the same time. The Architecture consists of SQL query decomposition algorithms, fine-grained FPGA based Engine Units which perform basic computation of sub string, arithmetic and logic operations. Using SQL query decomposition algorithm, we are able to decompose a complex SQL query into basic operations and according to their patterns each is fed into an Engine Unit. SQL Engine Units are highly configurable and can be chained together to perform complex Spark SQL queries, finally one SQL query is transformed into a Hardware Pipeline. We will present the performance benchmark results comparing the queries with FGPA-Based Spark SQL Acceleration Architecture on XEON E5 and FPGA to the ones with Spark SQL Query on XEON E5 with 10X ~ 100X improvement and we will demonstrate one SQL query workload from a real customer.

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang

Spark Summit

In this talk, we’ll present techniques for visualizing large scale machine learning systems in Spark. These are techniques that are employed by Netflix to understand and refine the machine learning models behind Netflix’s famous recommender systems that are used to personalize the Netflix experience for their 99 millions members around the world. Essential to these techniques is Vegas, a new OSS Scala library that aims to be the “missing MatPlotLib” for Spark/Scala. We’ll talk about the design of Vegas and its usage in Scala notebooks to visualize Machine Learning Models.

VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...

Spark Summit

This presentation introduces how we design and implement a real-time processing platform using latest Spark Structured Streaming framework to intelligently transform the production lines in the manufacturing industry. In the traditional production line there are a variety of isolated structured, semi-structured and unstructured data, such as sensor data, machine screen output, log output, database records etc. There are two main data scenarios: 1) Picture and video data with low frequency but a large amount; 2) Continuous data with high frequency. They are not a large amount of data per unit. However the total amount of them is very large, such as vibration data used to detect the quality of the equipment. These data have the characteristics of streaming data: real-time, volatile, burst, disorder and infinity. Making effective real-time decisions to retrieve values from these data is critical to smart manufacturing. The latest Spark Structured Streaming framework greatly lowers the bar for building highly scalable and fault-tolerant streaming applications. Thanks to the Spark we are able to build a low-latency, high-throughput and reliable operation system involving data acquisition, transmission, analysis and storage. The actual user case proved that the system meets the needs of real-time decision-making. The system greatly enhance the production process of predictive fault repair and production line material tracking efficiency, and can reduce about half of the labor force for the production lines.

Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu

Spark Summit

As common sense would suggest, weather has a definite impact on traffic. But how much? And under what circumstances? Can we improve traffic (congestion) prediction given weather data? Predictive traffic is envisioned to significantly impact how driver’s plan their day by alerting users before they travel, find the best times to travel, and over time, learn from new IoT data such as road conditions, incidents, etc. This talk will cover the traffic prediction work conducted jointly by IBM and the traffic data provider. As a part of this work, we conducted a case study over five large metropolitans in the US, 2.58 billion traffic records and 262 million weather records, to quantify the boost in accuracy of traffic prediction using weather data. We will provide an overview of our lambda architecture with Apache Spark being used to build prediction models with weather and traffic data, and Spark Streaming used to score the model and provide real-time traffic predictions. This talk will also cover a suite of extensions to Spark to analyze geospatial and temporal patterns in traffic and weather data, as well as the suite of machine learning algorithms that were used with Spark framework. Initial results of this work were presented at the National Association of Broadcasters meeting in Las Vegas in April 2017, and there is work to scale the system to provide predictions in over a 100 cities. Audience will learn about our experience scaling using Spark in offline and streaming mode, building statistical and deep-learning pipelines with Spark, and techniques to work with geospatial and time-series data.

Improving Traffic Prediction Using Weather Data with Ramya Raghavendra

Spark Summit

Graph is on the rise and it’s time to start learning about scalable graph analytics! In this session we will go over two Spark-based Graph Analytics frameworks: Tinkerpop and GraphFrames. While both frameworks can express very similar traversals, they have different performance characteristics and APIs. In this Deep-Dive by example presentation, we will demonstrate some common traversals and explain how, at a Spark level, each traversal is actually computed under the hood! Learn both the fluent Gremlin API as well as the powerful GraphFrame Motif api as we show examples of both simultaneously. No need to be familiar with Graphs or Spark for this presentation as we’ll be explaining everything from the ground up!

A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...

Spark Summit

Building accurate machine learning models has been an art of data scientists, i.e., algorithm selection, hyper parameter tuning, feature selection and so on. Recently, challenges to breakthrough this “black-arts” have got started. In cooperation with our partner, NEC Laboratories America, we have developed a Spark-based automatic predictive modeling system. The system automatically searches the best algorithm, parameters and features without any manual work. In this talk, we will share how the automation system is designed to exploit attractive advantages of Spark. The evaluation with real open data demonstrates that our system can explore hundreds of predictive models and discovers the most accurate ones in minutes on a Ultra High Density Server, which employs 272 CPU cores, 2TB memory and 17TB SSD in 3U chassis. We will also share open challenges to learn such a massive amount of models on Spark, particularly from reliability and stability standpoints. This talk will cover the presentation already shown on Spark Summit SF’17 (#SFds5) but from more technical perspective.

No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...

Spark Summit

In Sweden, from the Rise ICE Data Center at www.hops.site, we are providing to reseachers both Spark-as-a-Service and, more recently, Tensorflow-as-a-Service as part of the Hops platform. In this talk, we examine the different ways in which Tensorflow can be included in Spark workflows, from batch to streaming to structured streaming applications. We will analyse the different frameworks for integrating Spark with Tensorflow, from Tensorframes to TensorflowOnSpark to Databrick’s Deep Learning Pipelines. We introduce the different programming models supported and highlight the importance of cluster support for managing different versions of python libraries on behalf of users. We will also present cluster management support for sharing GPUs, including Mesos and YARN (in Hops Hadoop). Finally, we will perform a live demonstration of training and inference for a TensorflowOnSpark application written on Jupyter that can read data from either HDFS or Kafka, transform the data in Spark, and train a deep neural network on Tensorflow. We will show how to debug the application using both Spark UI and Tensorboard, and how to examine logs and monitor training.

Apache Spark and Tensorflow as a Service with Jim Dowling

Spark Summit

Apache Spark and Tensorflow as a Service with Jim Dowling

Spark Summit

With the rapid growth of available datasets, it is imperative to have good tools for extracting insight from big data. The Spark ML library has excellent support for performing at-scale data processing and machine learning experiments, but more often than not, Data Scientists find themselves struggling with issues such as: low level data manipulation, lack of support for image processing, text analytics and deep learning, as well as the inability to use Spark alongside other popular machine learning libraries. To address these pain points, Microsoft recently released The Microsoft Machine Learning Library for Apache Spark (MMLSpark), an open-source machine learning library built on top of SparkML that seeks to simplify the data science process and integrate SparkML Pipelines with deep learning and computer vision libraries such as the Microsoft Cognitive Toolkit (CNTK) and OpenCV. With MMLSpark, Data Scientists can build models with 1/10th of the code through Pipeline objects that compose seamlessly with other parts of the SparkML ecosystem. In this session, we explore some of the main lessons learned from building MMLSpark. Join us if you would like to know how to extend Pipelines to ensure seamless integration with SparkML, how to auto-generate Python and R wrappers from Scala Transformers and Estimators, how to integrate and use previously non-distributed libraries in a distributed manner and how to efficiently deploy a Spark library across multiple platforms.

MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...

Spark Summit

The Next Accelerator Logging Service (NXCALS) is a new Big Data project at CERN aiming to replace the existing Oracle-based service. The main purpose of the system is to store and present Controls/Infrastructure related data gathered from thousands of devices in the whole accelerator complex. The data is used to operate the machines, improve their performance and conduct studies for new beam types or future experiments. During this talk, Jakub will speak about NXCALS requirements and design choices that lead to the selected architecture based on Hadoop and Spark. He will present the Ingestion API, the abstractions behind the Meta-data Service and the Spark-based Extraction API where simple changes to the schema handling greatly improved the overall usability of the system. The system itself is not CERN specific and can be of interest to other companies or institutes confronted with similar Big Data problems.

Next CERN Accelerator Logging Service with Jakub Wozniak

Spark Summit

In Between (A mobile App for couples, downloaded 20M in Global), from daily batch for extracting metrics, analysis and dashboard. Spark is widely used by engineers and data analysts in Between, thanks to the performance and expendability of Spark, data operating has become extremely efficient. Entire team including Biz Dev, Global Operation, Designers are enjoying data results so Spark is empowering entire company for data driven operation and thinking. Kevin, Co-founder and Data Team leader of Between will be presenting how things are going in Between. Listeners will know how small and agile team is living with data (how we build organization, culture and technical base) after this presentation.

Powering a Startup with Apache Spark with Kevin Kim

Spark Summit

Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra

Spark Summit

In many cases, Big Data becomes just another buzzword because of the lack of tools that can support both the technological requirements for developing and deploying of the projects and/or the fluency of communication between the different profiles of people involved in the projects. In this talk, we will present Moriarty, a set of tools for fast prototyping of Big Data applications that can be deployed in an Apache Spark environment. These tools support the creation of Big Data workflows using the already existing functional blocks or supporting the creation of new functional blocks. The created workflow can then be deployed in a Spark infrastructure and used through a REST API. For better understanding of Moriarty, the prototyping process and the way it hides the Spark environment to the Big Data users and developers, we will present it together with a couple of examples based on a Industry 4.0 success cases and other on a logistic success case.

Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...

Spark Summit

Large-scale testing of new data products or enhancements to existing products in a research and development environment can be a technical challenge for data scientists. In some cases, tools available to data scientists lack production-level capacity, whereas other tools do not provide the algorithms needed to run the methodology. At Nielsen, the Databricks platform provided a solution to both of these challenges. This breakout session will cover a specific Nielsen business case where two methodology enhancements were developed and tested at large-scale using the Databricks platform. Development and large-scale testing of these enhancements would not have been possible using standard database tools.

How Nielsen Utilized Databricks for Large-Scale Research and Development with...

Spark Summit

Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...

Spark Summit

Since the invention of SQL and relational databases, data production has been about specifying how data is transformed through queries. While Apache Spark can certainly be used as a general distributed query engine, the power and granularity of Spark’s APIs enables a revolutionary increase in data engineering productivity: goal-based data production. Goal-based data production concerns itself with specifying WHAT the desired result is, leaving the details of HOW the result is achieved to a smart data warehouse running on top of Spark. That not only substantially increases productivity, but also significantly expands the audience that can work directly with Spark: from developers and data scientists to technical business users. With specific data and architecture patterns spanning the range from ETL to machine learning data prep and with live demos, this session will demonstrate how Spark users can gain the benefits of goal-based data production.

Goal Based Data Production with Sim Simeonov

Spark Summit

Have you imagined a simple machine learning solution able to prevent revenue leakage and monitor your distributed application? To answer this question, we offer a practical and a simple machine learning solution to create an intelligent monitoring application based on simple data analysis using Apache Spark MLlib. Our application uses linear regression models to make predictions and check if the platform is experiencing any operational problems that can impact in revenue losses. The application monitor distributed systems and provides notifications stating the problem detected, that way users can operate quickly to avoid serious problems which directly impact the company’s revenue and reduce the time for action. We will present an architecture for not only a monitoring system, but also an active actor for our outages recoveries. At the end of the presentation you will have access to our training program source code and you will be able to adapt and implement in your company. This solution already helped to prevent about US$3mi in losses last year.

Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...

Spark Summit

Getting Ready to use Redis with Apache Spark is a technical tutorial designed to address integrating Redis with an Apache Spark deployment to increase the performance of serving complex decision models. To set the context for the session, we start with a quick introduction to Redis and the capabilities Redis provides. We cover the basic data types provided by Redis and cover the module system. Using an ad serving use-case, we look at how Redis can improve the performance and reduce the cost of using complex ML-models in production. Attendees will be guided through the key steps of setting up and integrating Redis with Spark, including how to train a model using Spark then load and serve it using Redis, as well as how to work with the Spark Redis module. The capabilities of the Redis Machine Learning Module (redis-ml) will be discussed focusing primarily on decision trees and regression (linear and logistic) with code examples to demonstrate how to use these feature. At the end of the session, developers should feel confident building a prototype/proof-of-concept application using Redis and Spark. Attendees will understand how Redis complements Spark and how to use Redis to serve complex, ML-models with high performance.

Getting Ready to Use Redis with Apache Spark with Dvir Volk

Spark Summit

Here we present a general supervised framework for record deduplication and author-disambiguation via Spark. This work differentiates itself by – Application of Databricks and AWS makes this a scalable implementation. Compute resources are comparably lower than traditional legacy technology using big boxes 24/7. Scalability is crucial as Elsevier’s Scopus data, the biggest scientific abstract repository, covers roughly 250 million authorships from 70 million abstracts covering a few hundred years. – We create a fingerprint for each content by deep learning and/or word2vec algorithms to expedite pairwise similarity calculation. These encoders substantially reduce compute time while maintaining semantic similarity (unlike traditional TFIDF or predefined taxonomies). We will briefly discuss how to optimize word2vec training with high parallelization. Moreover, we show how these encoders can be used to derive a standard representation for all our entities namely such as documents, authors, users, journals, etc. This standard representation can simplify the recommendation problem into a pairwise similarity search and hence it can offer a basic recommender for cross-product applications where we may not have a dedicate recommender engine designed. – Traditional author-disambiguation or record deduplication algorithms are batch-processing with small to no training data. However, we have roughly 25 million authorships that are manually curated or corrected upon user feedback. Hence, it is crucial to maintain historical profiles and hence we have developed a machine learning implementation to deal with data streams and process them in mini batches or one document at a time. We will discuss how to measure the accuracy of such a system, how to tune it and how to process the raw data of pairwise similarity function into final clusters. Lessons learned from this talk can help all sort of companies where they want to integrate their data or deduplicate their user/customer/product databases.

Deduplication and Author-Disambiguation of Streaming Records via Supervised M...

Spark Summit

The use of large-scale machine learning and data mining methods is becoming ubiquitous in many application domains ranging from business intelligence and bioinformatics to self-driving cars. These methods heavily rely on matrix computations, and it is hence critical to make these computations scalable and efficient. These matrix computations are often complex and involve multiple steps that need to be optimized and sequenced properly for efficient execution. This work presents new efficient and scalable matrix processing and optimization techniques based on Spark. The proposed techniques estimate the sparsity of intermediate matrix-computation results and optimize communication costs. An evaluation plan generator for complex matrix computations is introduced as well as a distributed plan optimizer that exploits dynamic cost-based analysis and rule-based heuristics The result of a matrix operation will often serve as an input to another matrix operation, thus defining the matrix data dependencies within a matrix program. The matrix query plan generator produces query execution plans that minimize memory usage and communication overhead by partitioning the matrix based on the data dependencies in the execution plan. We implemented the proposed matrix techniques inside the Spark SQL, and optimize the matrix execution plan based on Spark SQL Catalyst. We conduct case studies on a series of ML models and matrix computations with special features on different datasets. These are PageRank, GNMF, BFGS, sparse matrix chain multiplications, and a biological data analysis. The open-source library ScaLAPACK and the array-based database SciDB are used for performance evaluation. Our experiments are performed on six real-world datasets are: social network data ( e.g., soc-pokec, cit-Patents, LiveJournal), Twitter2010, Netflix recommendation data, and 1000 Genomes Project sample. Experiments demonstrate that our proposed techniques achieve up to an order-of-magnitude performance.

MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...

Spark Summit

More from Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang

VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...

Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu

Improving Traffic Prediction Using Weather Data with Ramya Raghavendra

A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...

No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...

Apache Spark and Tensorflow as a Service with Jim Dowling

MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...

Next CERN Accelerator Logging Service with Jakub Wozniak

Powering a Startup with Apache Spark with Kevin Kim

Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra

Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...

How Nielsen Utilized Databricks for Large-Scale Research and Development with...

Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...

Goal Based Data Production with Sim Simeonov

Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...

Getting Ready to Use Redis with Apache Spark with Dvir Volk

Deduplication and Author-Disambiguation of Streaming Records via Supervised M...

MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...

Recently uploaded

Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore Booking Contact Details :- WhatsApp Chat :- +91-7737669865 2-May-2024(SMW) Call Girls In Model Towh Bangalore +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in Bangalore NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in Bangalore, Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service Bangalore NCRWelcome To Bangalore Escorts Service – An All Over New Bangalore Very Sexy Hot Call Girls Agency Service Escorts In South BangaloreNCRBangalore’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Bangalore Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...

amitlee9823

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore Booking Contact Details :- WhatsApp Chat :- +91-7737669865 2-May-2024(SMW) Call Girls In Model Towh Bangalore +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in Bangalore NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in Bangalore, Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service Bangalore NCRWelcome To Bangalore Escorts Service – An All Over New Bangalore Very Sexy Hot Call Girls Agency Service Escorts In South BangaloreNCRBangalore’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Bangalore Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...

amitlee9823

Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore Booking Contact Details :- WhatsApp Chat :- +91-7737669865 2-May-2024(SMW) Call Girls In Model Towh Bangalore +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in Bangalore NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in Bangalore, Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service Bangalore NCRWelcome To Bangalore Escorts Service – An All Over New Bangalore Very Sexy Hot Call Girls Agency Service Escorts In South BangaloreNCRBangalore’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Bangalore Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...

amitlee9823

👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amritsar No💰Advance Cash On Delivery Service Escorts Service Available Whatsapp Chaya ☎️ : [+91-6367187148] Escorts Service Amritsar are always ready to make their clients happy. Their exotic looks and sexy personalities are sure to turn heads. You can enjoy with them, including massages and erotic encounters.#P12Our area Escorts are young and sexy, so you can expect to have an exotic time with them. They are trained to satiate your naughty nerves and they can handle anything that you want. They are also intelligent, so they know how to make you feel comfortable and relaxed SERVICE ✅ ❣️ ⭐➡️HOT & SEXY MODELS // COLLEGE GIRLS HOUSE WIFE RUSSIAN , AIR HOSTES ,VIP MODELS . AVAILABLE FOR COMPLETE ENJOYMENT WITH HIGH PROFILE INDIAN MODEL AVAILABLE HOTEL & HOME ★ SAFE AND SECURE HIGH CLASS SERVICE AFFORDABLE RATE ★ SATISFACTION,UNLIMITED ENJOYMENT. ★ All Meetings are confidential and no information is provided to any one at any cost. ★ EXCLUSIVE PROFILes Are Safe and Consensual with Most Limits Respected ★ Service Available In: - HOME & HOTEL Star Hotel Service .In Call & Out call SeRvIcEs : ★ A-Level (star escort) ★ Strip-tease ★ BBBJ (Bareback Blowjob)Receive advanced sexual techniques in different mode make their life more pleasurable. ★ Spending time in hotel rooms ★ BJ (Blowjob Without a Condom) ★ Completion (Oral to completion) ★ Covered (Covered blowjob Without condom ★ANAL SERVICES.

👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...

karishmasinghjnh

Building Real-Time Pipelines With FLaNK Timothy Spann, Principal Developer Advocate, Streaming - Cloudera Future of Data meetup, startup grind, AI Camp The combination of Apache Flink, Apache NiFi, and Apache Kafka for building real-time data processing pipelines is extremely powerful, as demonstrated by this case study using the FLaNK-MTA project. The project leverages these technologies to process and analyze real-time data from the New York City Metropolitan Transportation Authority (MTA). FLaNK-MTA demonstrates how to efficiently collect, transform, and analyze high-volume data streams, enabling timely insights and decision-making. Apache NiFi Apache Kafka Apache Flink Apache Iceberg LLM Generative AI Slack Postgresql

DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK

Timothy Spann

Ashok Vihar Call Girls in Delhi (–9953330565) Escort Service In Delhi NCR PROVIDE 100% REAL GIRLS ALL ARE GIRLS LOOKING MODELS AND RAM MODELS ALL GIRLS” INDIAN , RUSSIAN ,KASMARI ,PUNJABI HOT GIRLS AND MATURED HOUSE WIFE BOOKING ONLY DECENT GUYS AND GENTLEMAN NO FAKE PERSON FREE HOME SERVICE IN CALL FULL AC ROOM SERVICE IN SOUTH DELHI Ultimate Destination for finding a High Profile Independent Escorts in Delhi.Gurgaon.Noida..!.Like You Feel 100% Real Girl Friend Experience. We are High Class Delhi Escort Agency offering quality services with discretion. We only offer services to gentlemen people. We have lots of girls working with us like students, Russian, models, house wife, and much More We Provide Short Time and Full Night Service Call ☎☎+91–9953330565 ❤꧂ • In Call and Out Call Service in Delhi NCR • 3* 5* 7* Hotels Service in Delhi NCR • 24 Hours Available in Delhi NCR • Indian, Russian, Punjabi, Kashmiri Escorts • Real Models, College Girls, House Wife, Also Available • Short Time and Full Time Service Available • Hygienic Full AC Neat and Clean Rooms Avail. In Hotel 24 hours • Daily New Escorts Staff Available • Minimum to Maximum Range Available. Location;- Delhi, Gurgaon, NCR, Noida, and All Over in Delhi Hotel and Home Services HOTEL SERVICE AVAILABLE :-REDDISSON BLU,ITC WELCOM DWARKA,HOTEL-JW MERRIOTT,HOLIDAY INN MAHIPALPUR AIROCTY,CROWNE PLAZA OKHALA,EROSH NEHRU PLACE,SURYAA KALKAJI,CROWEN PLAZA ROHINI,SHERATON PAHARGANJ,THE AMBIENC,VIVANTA,SURAJKUND,ASHOKA CONTINENTAL , LEELA CHANKYAPURI,_ALL 3* 5* 7* STARTS HOTEL SERVICE BOOKING CALL Call WHATSAPP Call ☎+91–9953330565❤꧂ NIGHT SHORT TIME BOTH ARE AVAILABLE

Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service

9953056974 Low Rate Call Girls In Saket, Delhi NCR

➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Service Booking Contact Details :- WhatsApp Chat :- +91-7737669865 Call Girls In Model Towh +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in , India,Sexy Indian Female Escorts Service NCRWelcome To Escorts Service – An All Over New Very Sexy Hot Call Girls Agency Service Escorts In South NCR’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At #K09 Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...

amitlee9823

➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Escorts Service Booking Contact Details :- WhatsApp Chat :- +91-7737669865 Call Girls In Model Towh +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in , Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service NCRWelcome To Escorts Service – An All Over New Very Sexy Hot Call Girls Agency Service Escorts In South NCR’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At #K09 Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...

amitlee9823

Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (Mysore ) Booking Contact Details :- WhatsApp Chat :- +91-9352988975 If you're looking for India Call girls you've come to the right place. You'll find some of the most beautiful call girls in our location with. These ladies have pleasing personalities, hot figures, and a passion for physical pleasure. Call girls in India Lucknow Many men have booked them for their erotic and soul-mixing performances, which are sure to leave you with unforgettable memories. #K09 Escort Service India is available in the city for men and women of all ages. They can satisfy your sexual needs and will make your experience even more enjoyable and memorable. Whether you're looking for a blow-job, stripping, lovemaking, or other dirty acts, you'll be able to find a match for your tastes and budget. These highly trained professionals will help you have an unforgettable night. One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —9352988975 We are available 24*7 all days of the year. Call us — 9352988975 Thank you for Visiting.

Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...

gajnagarg

Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (Bangalore) Booking Contact Details :- WhatsApp Chat :- +91-7737669865 2-May-2024(SMW) Call Girls In Model Towh Bangalore +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in Bangalore NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in Bangalore, Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service Bangalore NCRWelcome To Bangalore Escorts Service – An All Over New Bangalore Very Sexy Hot Call Girls Agency Service Escorts In South BangaloreNCRBangalore’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Bangalore Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...

amitlee9823

Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl (Palakkad ) Booking Contact Details :- WhatsApp Chat :- +91-9352988975 If you're looking for India Call girls you've come to the right place. You'll find some of the most beautiful call girls in our location with. These ladies have pleasing personalities, hot figures, and a passion for physical pleasure. Call girls in India Lucknow Many men have booked them for their erotic and soul-mixing performances, which are sure to leave you with unforgettable memories. #K09 Escort Service India is available in the city for men and women of all ages. They can satisfy your sexual needs and will make your experience even more enjoyable and memorable. Whether you're looking for a blow-job, stripping, lovemaking, or other dirty acts, you'll be able to find a match for your tastes and budget. These highly trained professionals will help you have an unforgettable night. One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —9352988975 We are available 24*7 all days of the year. Call us — 9352988975 Thank you for Visiting.

Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...

gajnagarg

Saudi Arabia [ Abortion pills) Jeddah/riaydh/dammam/+966572737505☎️] cytotec tablets uses abortion pills 💊💊 How effective is the abortion pill? 💊💊 +966572737505) "Abortion pills in Jeddah" how to get cytotec tablets in Riyadh " Abortion pills in dammam*💊💊 The abortion pill is very effective. If you’re taking mifepristone and misoprostol, it depends on how far along the pregnancy is, and how many doses of medicine you take:💊💊 +966572737505) how to buy cytotec pills At 8 weeks pregnant or less, it works about 94-98% of the time. +966572737505[ 💊💊💊 At 8-9 weeks pregnant, it works about 94-96% of the time. +966572737505) At 9-10 weeks pregnant, it works about 91-93% of the time. +966572737505)💊💊 If you take an extra dose of misoprostol, it works about 99% of the time. At 10-11 weeks pregnant, it works about 87% of the time. +966572737505) If you take an extra dose of misoprostol, it works about 98% of the time. In general, taking both mifepristone and+966572737505 misoprostol works a bit better than taking misoprostol only. +966572737505 Taking misoprostol alone works to end the+966572737505 pregnancy about 85-95% of the time — depending on how far along the+966572737505 pregnancy is and how you take the medicine. +966572737505 The abortion pill usually works, but if it doesn’t, you can take more medicine or have an in-clinic abortion. +966572737505 When can I take the abortion pill?+966572737505 In general, you can have a medication abortion up to 77 days (11 weeks)+966572737505 after the first day of your last period. If it’s been 78 days or more since the first day of your last+966572737505 period, you can have an in-clinic abortion to end your pregnancy.+966572737505 Why do people choose the abortion pill? Which kind of abortion you choose all depends on your personal+966572737505 preference and situation. With+966572737505 medication+966572737505 abortion, some people like that you don’t need to have a procedure in a doctor’s office. You can have your medication abortion on your own+966572737505 schedule, at home or in another comfortable place that you choose.+966572737505 You get to decide who you want to be with during your abortion, or you can go it alone. Because+966572737505 medication abortion is similar to a miscarriage, many people feel like it’s more “natural” and less invasive. And some+966572737505 people may not have an in-clinic abortion provider close by, so abortion pills are more available to+966572737505 them. +966572737505 Your doctor, nurse, or health center staff can help you decide which kind of abortion is best for you. +966572737505 More questions from patients: Saudi Arabia+966572737505 CYTOTEC Misoprostol Tablets. Misoprostol is a medication that can prevent stomach ulcers if you also take NSAID medications. It reduces the amount of acid in your stomach, which protects your stomach lining. The brand name of this medication is Cytotec®.+966573737505) Unwanted Kit is a combination of two medici

Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec

Abortion pills in Riyadh +966572737505 get cytotec

Anomaly detection and data imputation within time series

Paris Women in Machine Learning and Data Science

Digital advertising, or paid media, encompasses the strategic deployment of online advertisements to reach target audiences efficiently and effectively. This includes any digital platform that supports advertising to deliver unique messages for any objective. Understanding the mechanics of digital advertising platforms, along with insights into audience behaviors and preferences, allows marketers to optimize their ad spend and achieve significant engagement and conversion rates. This lecture is for Advanced Digital & Social Media Strategy (MGMTX 466.05) at UCLA Extension.

Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...

Valters Lauzums

Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand Booking Contact Details :- WhatsApp Chat :- +91-7737669865 Call Girls In Model Towh +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in , Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service NCRWelcome To Escorts Service – An All Over New Very Sexy Hot Call Girls Agency Service Escorts In South NCR’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At #K09 Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand

amitlee9823

Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand Booking Contact Details :- WhatsApp Chat :- +91-7737669865 Call Girls In Model Towh +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in , Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service NCRWelcome To Escorts Service – An All Over New Very Sexy Hot Call Girls Agency Service Escorts In South NCR’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At #K09 Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand

amitlee9823

VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K to 25K High Profile Escorts In Pune Booking Now open +91- 8005736733 Why you Choose Us- +91- 8005736733 HOT⇄ 8005736733 Mr ashu ji Call Mr ashu Ji +91- 8005736733 (V030524]N) 𝐇𝐨𝐭𝐞𝐥 𝐑𝐨𝐨𝐦𝐬 𝐈𝐧𝐜𝐥𝐮𝐝𝐢𝐧𝐠 𝐑𝐚𝐭𝐞 𝐒𝐡𝐨𝐭𝐬/𝐇𝐨𝐮𝐫𝐲🆓 .█▬█⓿▀█▀ 𝐈𝐍𝐃𝐄𝐏𝐄𝐍𝐃𝐄𝐍𝐓 𝐆𝐈𝐑𝐋 𝐕𝐈𝐏 𝐄𝐒𝐂𝐎𝐑𝐓 Hello Guys ! High Profiles young Beauties and Good Looking standard Profiles Available , Enquire Now if you are interested in Hifi Service and want to get connect with someone who can understand your needs. Service offers you the most beautiful High Profile sexy independent female Escorts in genuine ✔✔✔ To enjoy with hot and sexy girls ✔✔✔ ★providing:- • Models • vip Models • Russian Models • Foreigner Models • TV Actress and Celebrities • Receptionist • Air Hostess • Call Center Working Girls/Women • Hi-Tech Co. Girls/Women • Housewife

VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...

SUHANI PANDEY

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore Escorts Service Booking Contact Details :- WhatsApp Chat :- +91-7737669865 2-May-2024(SMW) Call Girls In Model Towh Bangalore +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in Bangalore NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in Bangalore, Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service Bangalore NCRWelcome To Bangalore Escorts Service – An All Over New Bangalore Very Sexy Hot Call Girls Agency Service Escorts In South BangaloreNCRBangalore’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Bangalore Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...

amitlee9823

Discover Why Less is More in B2B Research

michael115558

➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Escorts Service Booking Contact Details :- WhatsApp Chat :- +91-7737669865 Call Girls In Model Towh +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in , Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service NCRWelcome To Escorts Service – An All Over New Very Sexy Hot Call Girls Agency Service Escorts In South NCR’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At #K09 Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...

amitlee9823

Recently uploaded (20)

Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...

Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...

👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...

DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK

Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service

➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...

➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...

Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...

Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...

Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...

Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec

Anomaly detection and data imputation within time series

Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...

Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand

Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand

VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...

Discover Why Less is More in B2B Research

➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...

Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher Burdorf, NBC Universal)

1. USE OF SPARK MLLIB FOR PREDICTING OFFLINING OF DIGITAL MEDIA Christopher Burdorf NBCUniversal

2. OVERVIEW λ Problem Definition λ Cluster Configurations λ Parameters λ MLLib libraries utilized λ Results λ Conclusions

3. Problem Definition λ Digital media files distributed internationally 24/7 over internet to cable TV channels in APAC, Europe, and Asia. λ TV Shows, Commercials λ Fast Isilon storage fills up frequently, thus offlining is necessary to create space for new files λ Multiple parameters are used to pick candidates λ Previous system works well, but is slow and has large overhead

4. Cluster Configurations λ Development - 1 Linux System with 16 cores and 16 Gigs RAM - 3 Mac OS Systems each with 8 cores and 16 Gigs RAM - Mesos cluster manager λ Production - 3 Linux systems with 32 cores and 64 Gigs of RAM - Mesos master and slaves running in Docker containers - 

5. Features/Parameters λ File size λ File age λ Days since last airing λ Days until next airing λ Immediately remove files that have not been scheduled for more than 3 weeks (was 2 weeks to start with).

6. K-Means Clustering , where xi(j) -cj is a chosen distance measure between a data point and the cluster centre , is an indicator of the distance of the n data points from their respective cluster centers.

7. K means clustering results λ Attempted with multiple centroid points λ No meaningful clusters for any sets of data attempted λ Results appeared to be tied to the nature of the data λ Multiple parameters created difficulties

8. Naive Bayes classifier λ From Bayes theorem: p(C_k|x)=(p(C_k)p(x|C_k))/p(x) λ  posterior = (prior X likelihood)/evidence λ Thus, the attempt was to classify media files for removal or retention on the file system based on the different parameter values and their expected results λ Had difficulties training for many factors - Adding new factors to training set altered results for previously trained classifier and required retraining the entire classification system

9. Support Vector Machines

10. SVM Results λ Found to be most robust method λ Compensated well for new added features λ Built training set from production data and auto-generated data that fit the criteria λ Spark Mllib optimization allows us to build a predictive system in just over an hour - Run twice daily due to constant changes in schedules and online media.

11. Production environment λ Spark Mllib SVM is trained and predictions are generated for each media file online twice daily. Predictions are stored in HBase λ Media manager daemon regularly scans the file system and if available space requires purging, it will select files based on predictions stored in Hbase λ Backup system checks to see if schedule has changed for a given media file selected to be purged, because Spark SVM predictions are only generated twice daily

12. Production issues λ System was run in test mode for 6 weeks without a problem λ Once switched to production, issues arose - 2 weeks was too short a time to immediately offline files λ Switched to 3 weeks - Docker containers filled up with Spark temp files and crashed over the weekend (ouch!) λ Solved with cron job periodically removing them. - Web service access to Hbase locked up. λ Solved by having thread timeout on web services call.

13. Conclusions λ System has been running in production mode for some time now. - Fine tuning appears to be complete λ Spark SVM has performed well - Fast and robust λ Good application for machine learning though critical aspect of the system increased task complexity.

Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher Burdorf, NBC Universal)

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher Burdorf, NBC Universal)

Similar to Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher Burdorf, NBC Universal) (20)

More from Spark Summit

More from Spark Summit (20)

Recently uploaded

Recently uploaded (20)

Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher Burdorf, NBC Universal)