The document discusses the challenges of continuous integration for Apache Spark applications and presents a solution developed by Uncharted Software. It describes squeezing Spark, tests, and other tools into Docker containers to enable building and testing Spark apps across branches in a shared environment. This approach enables automated testing of Spark code commits, early detection of issues, and visibility into test results.
Devoxx 2016: Using Jenkins, Gerrit and Spark for Continuous Delivery Analytics, by Luca Milanesio
Our journey and experience in dealing with the collection/analysis of Continuous Delivery log events using Gerrit Code Review, Jenkins with Apache Flume, ElasticSearch, Kibana and Spark
Feedback on the implementation of an architecture enabling Continuous Delivery in both PHP and Java environments at a leading company in the telecom industry. We describe the methods and tools we implemented, as well as the limitations and difficulties we encountered. Keywords: Industrialization, DevOps, Continuous Delivery, Ansible, Rundeck, Infrastructure as Code
A short overview of the main principles of Continuous Integration (CI), describing the benefits of CI and showing a smooth path for integrating CI into your development cycle, finishing with a short introduction to Xinc, a PHP CI server, and how to use it in your projects.
This is the presentation for the capstone workshop for the Houston R User Group. We walk through package creation, use GitHub to collaborate on code, and build a helper package in R for HRUG members.
A presentation on the fundamentals of the DevOps workflow and CI/CD practices, which I gave at Centroida (https://centroida.ai/) as a back-end development intern.
Setting up Notifications, Alerts & Webhooks with Flux v2, by Alison Dowdney, Weaveworks
Watch the recording here: https://youtu.be/cakxixc-yQk
❗️ Notifications & Alerts ⚠️
When operating a cluster, different teams may wish to receive notifications about the status of their GitOps pipelines. For example, the on-call team would receive alerts about reconciliation failures in the cluster, while the dev team may wish to be alerted when a new version of an app is deployed and whether the deployment is healthy.
Webhook Receivers
The GitOps toolkit controllers are pull-based by design. To notify the controllers about changes in Git or Helm repositories, you can set up webhooks and trigger a cluster reconciliation every time a source changes. Using webhook receivers, you can build push-based GitOps pipelines that react to external events.
Alison Dowdney, Developer Experience Engineer at Weaveworks and CNCF Ambassador, walks through how to define a provider, an alert, git commit status, exposing the webhook receiver and defining a git repository and receiver.
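As a rough sketch, triggering a reconciliation through a webhook receiver is just an HTTP POST to the receiver's exposed endpoint. The hostname and path below are placeholders (Flux generates the real receiver path, and most receiver types also validate a token or signature); see the webhook-receivers guide under Resources.

```python
# Hypothetical sketch: simulate a Git server's push webhook hitting a Flux
# notification-controller receiver endpoint. The hostname and /hook/... path
# are placeholders; Flux generates the real path when the Receiver object is
# created, and real receivers also verify a secret token or signature.
import json
import urllib.request

payload = json.dumps({"ref": "refs/heads/main"}).encode()
req = urllib.request.Request(
    "https://flux-webhooks.example.com/hook/REPLACE_WITH_RECEIVER_PATH",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)  # 200: the source reconciliation was triggered
```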
Resources
Flux2 Documentation: https://fluxcd.io/docs/
Flux Guide: Setup Notifications: https://fluxcd.io/docs/guides/notifications/
Flux Guide: Setup Webhook receivers: https://fluxcd.io/docs/guides/webhook-receivers/
Flux Roadmap: https://fluxcd.io/docs/roadmap/
Alison's Demo Repo: https://github.com/alisondy/flux-demos
DevOps is the next buzzword that every organization is expected to adopt. This presentation gives an overview of all the DevOps technologies you need to learn to transform your organization into a DevOps organization.
OSEDA 2017 Seminar at Kasetsart University on December 16, 2017
Hands-on GitOps Patterns for Helm Users. YouTube recording: https://youtu.be/ljouUBPtnuI
There are a lot of opinions on how to structure Flux 2 manifests the "GitOps Way." Flux maintainers have given specific examples of how to do this properly in the Flux user guides, demos, and example repos. But most of these focus on Kustomize, and not yet on patterns for users who want to use only Helm.
In this session, Scott Rigby, Developer Experience Engineer at Weaveworks, shares current work towards GitOps patterns for those who want to use only Helm with Flux 2. We welcome your feedback about use cases and challenges!
Slide counterparts of the video course can be found here: https://www.youtube.com/playlist?list=PLDshL1Z581YYxLsjYwM25HkIYrymXb7H_
Follow along or just sit back and enjoy a live, hands-on tutorial on the power routines of experienced Git users. We'll explore with real-world examples how to amend commits, do an interactive rebase (and why you would want to do one in the first place), how to solve conflicts without any merge tools, the power of less-known merge strategies, how to do interactive commits, and much more.
Course notes
Part 1/8: Introduction
Who am I?
Content overview.
Choice of test project.
Part 2/8: Housekeeping
Find my aliases here.
Enhance your shell with liquidprompt.
Start with an empty commit.
Shortcuts to list commits with nice annotations:
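(The course's specific aliases aren't reproduced here; as a purely hypothetical example, a shortcut of this kind typically looks like the following.)

```
# hypothetical example, not the course's actual alias
git log --oneline --graph --decorate --all
```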
Part 3/8: Amending and rebasing
Amend a commit.
Reset a commit to perform a rename.
Different resets affect different parts of a git repository.
Part 4/8: Interactive rebase
How to perform an interactive rebase to remove a binary file stuck in the repository.
Part 5/8: Solving conflicts
In this 5th part of the course we'll show a few concepts useful when solving merge and rebase conflicts with an interactive example on how to solve one.
We'll explain --ours, --theirs, conflict markers, and the process needed to solve a conflict using Git.
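For reference, this is the shape of what Git leaves in a conflicted file (branch names invented): everything between the markers is the two competing versions, and resolving means editing the file to its final content, or taking one side wholesale with git checkout --ours <file> or git checkout --theirs <file>, then staging the result.

```
<<<<<<< HEAD
the version from our branch
=======
the version from their branch
>>>>>>> feature-branch
```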
Part 6/8: What is a merge and alternative merge strategies
We cover the basics of what a merge is in Git and we show the use of a couple of less known merge strategies like the "ours" merge strategy and the "octopus" strategy.
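(As hypothetical one-liners, not material from the course, the two strategies look like this.)

```
# hypothetical examples, not from the course
git merge -s ours feature-branch    # keep our tree, but record the merge
git merge featureA featureB         # merging 2+ branches yields an octopus merge
```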
Part 7/8: Git interactive add
In this part we show how to perform an interactive add using Git, that is, splitting the contents of a change across multiple, more semantically meaningful commits.
Part 8/8: How to use Git stash
How to use git stash to solve common workflow situations, switch context with ease, and look cool to your colleagues.
Git is not just a version control system. Git can change the way you interact with your team members. Lots of teams don't think about reflecting their development workflow in Git and just use it out of the box. Git, however, can be much more powerful, giving your team a boost in productivity, protecting your delivery pipeline, and helping you work better together.
In this session we will start with a central workflow that is used by a lot of Subversion teams. You will learn how to practically integrate ALM solutions like continuous deployment, code reviews, and change tracking into your individual workflow. You will find out how to protect your master branch from accidental commits, broken builds, and unreviewed code. This presentation will help you discover the best way to work together as a team, whether you have yet to migrate to Git or are already an experienced Git user.
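Branch protection of this kind is normally enforced server-side by the hosting or review system, but as a hedged sketch, a local client-side guard against accidental direct pushes can be as small as the following pre-push hook (save as .git/hooks/pre-push and make it executable; the protected ref name is an assumption):

```python
#!/usr/bin/env python3
# Hypothetical client-side pre-push hook: Git feeds it lines of the form
# "<local ref> <local sha> <remote ref> <remote sha>" on stdin; exiting
# non-zero aborts the push. Here we refuse direct pushes to master.
import sys

PROTECTED = "refs/heads/master"  # assumed protected branch

for line in sys.stdin:
    local_ref, local_sha, remote_ref, remote_sha = line.split()
    if remote_ref == PROTECTED:
        sys.stderr.write("Direct pushes to master are blocked; push a review branch instead.\n")
        sys.exit(1)
sys.exit(0)
```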
We dreamt of a future where our whole department embraced TDD. A future where the quality of our code and the product was elevated.
We had the best intentions, however, this story does NOT have a happy ending.
Learn from my experience working with the leaders of a department of 40 to attempt 100% TDD adoption. I will contrast this with our successful adoption and spread of SOLID design practices, and look at what we would have done differently with our TDD advocacy if we were to repeat it. Some of our lessons learned: Don't try to mandate TDD, bring in outside experts, and refactoring skills are key.
Apigee Deploy Grunt Plugin: an API management lifecycle tool that makes your life easier by providing a pluggable JavaScript framework for API development.
Gradle is an open-source build automation tool focused on flexibility, build reproducibility and performance. Over the years, this tool has evolved and introduced new concepts and features around dependency management, publication and other aspects on build and release of artifacts for the Java platform.
Keeping up to date with all these features across several projects can be challenging. How do you make sure that all your projects can be upgraded to the latest version of Gradle? What if you have thousands of projects and hundreds of engineers? How can you abstract common tasks for them and make sure that new releases work as expected?
At Netflix, we built Nebula, a collection of Gradle plugins that helps engineers remove boilerplate in Gradle build files, and makes building software the Netflix way easy. This reduces the cognitive load on developers, allowing them to focus on writing code.
In this talk, I’ll share with you our philosophy on how to build JVM artifacts and the pieces that help us boost the productivity of engineers at Netflix. I’ll talk about:
- What is Nebula
- What are the common problems we face and try to solve
- How we distribute it to every JVM engineer
- How we ensure that Nebula/Gradle changes do not break builds so we can ship new features with confidence at Netflix
Our DevOps Journey
Transforming 6 Month Waterfalls to 1 Hour Code Deploys
https://info.dynatrace.com/17q3_wc_from_agile_to_cloudy_devops_na_registration.html
In the 2nd part of our webinar series, Anita Engleder, DevOps Lead at Dynatrace reviews and dissects lessons learned during the transformational journey moving Dynatrace from an on-prem culture to one that is cloud native. She will lend her perspective as a key member of the team that executed on the original vision: to “implement a new cloud native offering and deploy a new feature release every 2 weeks. Additionally, be able to support a 1-hour lead time from Code Change to Production”.
On November 17th at 1pm/10am PT Anita will present the challenges she and her team faced transforming 6-month waterfalls into 1-hour code deploys.
In this webinar Anita will discuss:
How to enable a complete cultural shift across multiple teams, in terms of thought process AND execution
What the specific role of her DevOps team is and how it played into the transformation
The role of Feature teams and why continuous feedback is critical for them
How to successfully influence key stakeholders for complete alignment
Today Anita's team runs 170 production changes every day across several AWS data centers as well as on-premise, something that would have been thought impossible only a few years prior.
Continuous Load Testing with CloudTest and Jenkins, by SOASTA
Two key challenges to continuous load testing are provisioning a test system to handle the load and accessing load generators to drive the traffic.
In this webinar from SOASTA & CloudBees, you will learn how to:
Build realistic automated web performance tests and run them in Jenkins
Architect and launch a test environment that auto-provisions in the cloud
Manage a load generation grid to drive load tests in a lights-out mode
Establish a performance baseline in your daily Jenkins reports
From 0 to DevOps in 80 Days [Webinar Replay], by Dynatrace
Link to the webinar replay: https://info.dynatrace.com/apm_dtm_ops_17q3_wc_from_enterprise_tocloud_native_na_registration.html
“Innovate or die” may sound extreme, but it's the only way to thrive in today's ever-competitive market. Bernd Greifeneder, CTO of Dynatrace, wanted to ensure that the company would still be relevant 5 years from now, so he formed an internal incubator with one goal: transform Dynatrace into a cloud native DevOps organization.
The incubator focused on what the company needed to do in order to integrate nascent cloud technologies so that they wouldn’t be left in the dust when the inevitable tipping point to cloud arrives. Transforming into a cloud native company would allow for rapid release cycles and provide an embedded feedback loop.
The results: Dynatrace now has 99.998% availability for its SaaS service and can deploy changes within an hour if necessary. In parallel, a new SaaS and managed offering is released every 2 weeks, with 170 production updates per day.
Watch this recorded webinar as Bernd Greifeneder shares the lessons learned moving Dynatrace from an on-prem company to one that is cloud native.
Bernd discusses:
• The driving factors that led to the transformation
• The goals set for the engineering team back in 2011
• How to sell such a transformation project in a large enterprise organization
• How to support this multi-year project from top down without impacting regular operations
• What's next on the innovator's mind
Emulators as an Emerging Best Practice for API Providers, by Cisco DevNet
Modern software leverages a lot of external APIs, which raises several issues: how can development teams safely build and test their software when some components they rely on keep evolving? And what if some of these components are key for the whole software to even fire up? These are some of the challenges faced today by Cisco internal development teams, but also by partner and enterprise developers using Cisco's software. The industry proposes two main strategies to circumvent these challenges: API mocking and service virtualization. Both of these strategies have pros and cons. I came up with the idea of proposing API emulators as a 3rd option. From another perspective, serverless platforms are recognized to be unique as they are scalable and straightforward to use. It turns out that major industry vendors (Google, Amazon, Microsoft…) have invested in creating emulators as companions to their serverless platforms, in order to attract and on-board developers and drive adoption of those platforms. In this talk, I'll explain the motivation behind API emulators from the perspective of DevOps, CI/CD, development processes, and serverless/microservices architectures. I'll illustrate the talk with several prototypes: Mini-Spark, an emulator for the Cisco Spark API which mimics the execution environment of the Cisco Spark Cloud platform, and Tropo-Ready, an emulator for the Tropo serverless telephony platform, tied to Visual Studio Code in order to reach developer communities. I'll dive into the details of the process of creating such emulators, and elaborate on the consideration that emulators are emerging as key components of API providers' strategy.
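To make the concept concrete, here is a minimal, hypothetical sketch of an API emulator (not Mini-Spark or Tropo-Ready themselves): a local process that mimics a remote API's contract so software can build and test against it offline. The endpoint and payload are invented.

```python
# Minimal sketch of an API emulator: serve canned responses that match the
# shape of a remote API so tests can run without the real service.
# The /v1/rooms endpoint and its payload are invented for illustration.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class EmulatedAPI(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/v1/rooms":
            body = json.dumps({"items": [{"id": "r1", "title": "demo room"}]})
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body.encode())
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    # Point the application under test at http://localhost:8080 instead of
    # the real API host.
    HTTPServer(("localhost", 8080), EmulatedAPI).serve_forever()
```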
Continuous delivery requires more than DevOps. It also requires one to think differently about product design, development and testing, and the overall structure of the organization. This presentation will help you understand what it takes, and why one would want to deliver value to customers multiple times each day. #CIC
Jeff "Cheezy" Morgan and Ardita Karaj
Join Perfecto & CloudBees for a presentation on how to drive mobile app quality feedback in every build, on real devices. Watch a demo featuring the CloudBees Jenkins Workflow showcasing automated testing with Perfecto's Continuous Quality Lab.
Code Coverage for Total Security in Application Migrations, by Dana Luther
So the time has come to take the leap and upgrade your application to a new major version of the underlying framework, or perhaps to an entirely different framework. How do you ensure that none of your functionality or usability is impacted by a potentially drastic rewrite of the underlying systems? How can you move forward with 100% confidence in your migrated codebase? Testing, testing and more testing. Using a combination of unit, functional and acceptance tests can give you the certainty you need. In this talk, we will go over key strategies for ensuring that you begin with full code coverage and move forward with confidence.
Presentation of the SAPUI5/OpenUI5 Continuous Integration infrastructure for the DSAG (German-speaking SAP user group) workgroup for UI technologies, January 25th, 2017.
FPGA-Based Acceleration Architecture for Spark SQL, by Qi Xie and Quanfu Wang (Spark Summit)
In this session we will present a configurable FPGA-based Spark SQL acceleration architecture. It aims to leverage the FPGA's highly parallel computing capability to accelerate Spark SQL queries and, thanks to the FPGA's higher power efficiency compared to a CPU, to lower power consumption at the same time. The architecture consists of SQL query decomposition algorithms and fine-grained FPGA-based engine units which perform basic computations of substring, arithmetic, and logic operations. Using the SQL query decomposition algorithm, we are able to decompose a complex SQL query into basic operations, and according to their patterns each is fed into an engine unit. SQL engine units are highly configurable and can be chained together to perform complex Spark SQL queries, so that finally one SQL query is transformed into a hardware pipeline. We will present performance benchmark results comparing queries on the FPGA-based Spark SQL acceleration architecture (XEON E5 plus FPGA) to Spark SQL queries on the XEON E5 alone, showing 10X ~ 100X improvement, and we will demonstrate one SQL query workload from a real customer.
VEGAS: The Missing Matplotlib for Scala/Apache Spark, with DB Tsai and Roger M... (Spark Summit)
In this talk, we'll present techniques for visualizing large-scale machine learning systems in Spark. These are techniques that are employed by Netflix to understand and refine the machine learning models behind Netflix's famous recommender systems, which are used to personalize the Netflix experience for their 99 million members around the world. Essential to these techniques is Vegas, a new OSS Scala library that aims to be the “missing Matplotlib” for Spark/Scala. We'll talk about the design of Vegas and its usage in Scala notebooks to visualize machine learning models.
This presentation introduces how we design and implement a real-time processing platform using the latest Spark Structured Streaming framework to intelligently transform production lines in the manufacturing industry. In a traditional production line there is a variety of isolated structured, semi-structured, and unstructured data, such as sensor data, machine screen output, log output, database records, etc. There are two main data scenarios: 1) picture and video data, low-frequency but large in size; 2) continuous high-frequency data, where each record is small but the total volume is very large, such as vibration data used to detect equipment quality. These data have the characteristics of streaming data: real-time, volatile, bursty, disordered, and unbounded. Making effective real-time decisions to retrieve value from these data is critical to smart manufacturing. The latest Spark Structured Streaming framework greatly lowers the bar for building highly scalable and fault-tolerant streaming applications. Thanks to Spark we were able to build a low-latency, high-throughput, and reliable operational system covering data acquisition, transmission, analysis, and storage. The actual user case proved that the system meets the needs of real-time decision-making. The system greatly enhances predictive fault repair in the production process and the efficiency of production-line material tracking, and can reduce the labor force needed for the production lines by about half.
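As a minimal sketch of a pipeline in this spirit (topic name, schema, and alert threshold are all invented, not the presenters' system), Structured Streaming can ingest high-frequency sensor readings from Kafka, aggregate them per machine in short event-time windows, and flag anomalies in near real time:

```python
# Minimal sketch (invented topic, schema, and threshold): ingest sensor
# readings from Kafka, average vibration per machine over 10-second
# event-time windows, and emit rows that cross an alert threshold.
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col, from_json, window
from pyspark.sql.types import (DoubleType, StringType, StructField,
                               StructType, TimestampType)

spark = SparkSession.builder.appName("production-line-monitor").getOrCreate()

schema = StructType([
    StructField("machine_id", StringType()),
    StructField("vibration", DoubleType()),
    StructField("ts", TimestampType()),
])

readings = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # assumed address
    .option("subscribe", "sensor-readings")             # assumed topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("r"))
    .select("r.*"))

alerts = (readings
    .withWatermark("ts", "30 seconds")
    .groupBy(window(col("ts"), "10 seconds"), col("machine_id"))
    .agg(avg("vibration").alias("avg_vibration"))
    .where(col("avg_vibration") > 0.8))                 # assumed threshold

query = alerts.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```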
Improving Traffic Prediction Using Weather Data, with Ramya Raghavendra (Spark Summit)
As common sense would suggest, weather has a definite impact on traffic. But how much? And under what circumstances? Can we improve traffic (congestion) prediction given weather data? Predictive traffic is envisioned to significantly impact how drivers plan their day by alerting users before they travel, finding the best times to travel, and, over time, learning from new IoT data such as road conditions, incidents, etc. This talk will cover the traffic prediction work conducted jointly by IBM and the traffic data provider. As part of this work, we conducted a case study over five large metropolitan areas in the US, 2.58 billion traffic records, and 262 million weather records, to quantify the boost in accuracy of traffic prediction using weather data. We will provide an overview of our lambda architecture, with Apache Spark being used to build prediction models from weather and traffic data, and Spark Streaming used to score the model and provide real-time traffic predictions. This talk will also cover a suite of extensions to Spark to analyze geospatial and temporal patterns in traffic and weather data, as well as the suite of machine learning algorithms that were used with the Spark framework. Initial results of this work were presented at the National Association of Broadcasters meeting in Las Vegas in April 2017, and there is work to scale the system to provide predictions in over 100 cities. The audience will learn about our experience scaling with Spark in offline and streaming modes, building statistical and deep-learning pipelines with Spark, and techniques for working with geospatial and time-series data.
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP, with Artem... (Spark Summit)
Graph is on the rise and it's time to start learning about scalable graph analytics! In this session we will go over two Spark-based graph analytics frameworks: Tinkerpop and GraphFrames. While both frameworks can express very similar traversals, they have different performance characteristics and APIs. In this deep-dive-by-example presentation, we will demonstrate some common traversals and explain how, at the Spark level, each traversal is actually computed under the hood! Learn both the fluent Gremlin API as well as the powerful GraphFrame motif API as we show examples of both simultaneously. No need to be familiar with graphs or Spark for this presentation, as we'll be explaining everything from the ground up!
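As a small, hedged illustration of the GraphFrames side (toy data; assumes the graphframes package is available to PySpark), the motif API expresses a traversal such as "find mutual follows" declaratively:

```python
# Toy GraphFrames motif example: find pairs that follow each other.
# Assumes pyspark was started with the graphframes package on the classpath.
from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.appName("graph-demo").getOrCreate()

vertices = spark.createDataFrame(
    [("a", "Alice"), ("b", "Bob"), ("c", "Carol")], ["id", "name"])
edges = spark.createDataFrame(
    [("a", "b"), ("b", "a"), ("b", "c")], ["src", "dst"])

g = GraphFrame(vertices, edges)

# Motif: x follows y AND y follows x (edges left anonymous).
mutual = g.find("(x)-[]->(y); (y)-[]->(x)")
mutual.select("x.name", "y.name").show()
```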
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark, with Marcin... (Spark Summit)
Building accurate machine learning models has been an art of data scientists: algorithm selection, hyperparameter tuning, feature selection, and so on. Recently, challenges to break through these “black arts” have gotten started. In cooperation with our partner, NEC Laboratories America, we have developed a Spark-based automatic predictive modeling system. The system automatically searches for the best algorithm, parameters, and features without any manual work. In this talk, we will share how the automation system is designed to exploit the attractive advantages of Spark. The evaluation with real open data demonstrates that our system can explore hundreds of predictive models and discover the most accurate ones in minutes on an Ultra High Density Server, which employs 272 CPU cores, 2TB memory, and 17TB SSD in a 3U chassis. We will also share open challenges in learning such a massive number of models on Spark, particularly from reliability and stability standpoints. This talk will cover the presentation already shown at Spark Summit SF'17 (#SFds5) but from a more technical perspective.
Apache Spark and Tensorflow as a Service, with Jim Dowling (Spark Summit)
In Sweden, from the Rise ICE Data Center at www.hops.site, we are providing to researchers both Spark-as-a-Service and, more recently, Tensorflow-as-a-Service as part of the Hops platform. In this talk, we examine the different ways in which Tensorflow can be included in Spark workflows, from batch to streaming to structured streaming applications. We will analyse the different frameworks for integrating Spark with Tensorflow, from Tensorframes to TensorflowOnSpark to Databricks' Deep Learning Pipelines. We introduce the different programming models supported and highlight the importance of cluster support for managing different versions of Python libraries on behalf of users. We will also present cluster management support for sharing GPUs, including Mesos and YARN (in Hops Hadoop). Finally, we will perform a live demonstration of training and inference for a TensorflowOnSpark application written in Jupyter that can read data from either HDFS or Kafka, transform the data in Spark, and train a deep neural network on Tensorflow. We will show how to debug the application using both the Spark UI and Tensorboard, and how to examine logs and monitor training.
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library... (Spark Summit)
With the rapid growth of available datasets, it is imperative to have good tools for extracting insight from big data. The Spark ML library has excellent support for performing at-scale data processing and machine learning experiments, but more often than not, data scientists find themselves struggling with issues such as low-level data manipulation, lack of support for image processing, text analytics, and deep learning, and the inability to use Spark alongside other popular machine learning libraries. To address these pain points, Microsoft recently released the Microsoft Machine Learning Library for Apache Spark (MMLSpark), an open-source machine learning library built on top of SparkML that seeks to simplify the data science process and integrate SparkML Pipelines with deep learning and computer vision libraries such as the Microsoft Cognitive Toolkit (CNTK) and OpenCV. With MMLSpark, data scientists can build models with 1/10th of the code through Pipeline objects that compose seamlessly with other parts of the SparkML ecosystem. In this session, we explore some of the main lessons learned from building MMLSpark. Join us if you would like to know how to extend Pipelines to ensure seamless integration with SparkML, how to auto-generate Python and R wrappers from Scala Transformers and Estimators, how to integrate and use previously non-distributed libraries in a distributed manner, and how to efficiently deploy a Spark library across multiple platforms.
Next CERN Accelerator Logging Service, with Jakub Wozniak (Spark Summit)
The Next Accelerator Logging Service (NXCALS) is a new Big Data project at CERN aiming to replace the existing Oracle-based service.
The main purpose of the system is to store and present Controls/Infrastructure related data gathered from thousands of devices in the whole accelerator complex.
The data is used to operate the machines, improve their performance and conduct studies for new beam types or future experiments.
During this talk, Jakub will speak about NXCALS requirements and the design choices that led to the selected architecture based on Hadoop and Spark. He will present the Ingestion API, the abstractions behind the Meta-data Service, and the Spark-based Extraction API, where simple changes to the schema handling greatly improved the overall usability of the system. The system itself is not CERN-specific and can be of interest to other companies or institutes confronted with similar Big Data problems.
Powering a Startup with Apache Spark, with Kevin Kim (Spark Summit)
At Between (a mobile app for couples, downloaded 20M times globally), Spark powers everything from daily batch jobs for extracting metrics to analysis and dashboards. Spark is widely used by engineers and data analysts at Between; thanks to the performance and extensibility of Spark, data operations have become extremely efficient. The entire team, including biz dev, global operations, and designers, enjoys the data results, so Spark is empowering the whole company toward data-driven operation and thinking. Kevin, co-founder and data team leader of Between, will present how things are going at Between. Listeners will learn how a small and agile team lives with data (how they built the organization, culture, and technical base) after this presentation.
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications... (Spark Summit)
In many cases, Big Data becomes just another buzzword because of the lack of tools that can support both the technological requirements for developing and deploying the projects and the fluency of communication between the different profiles of people involved in the projects.
In this talk, we will present Moriarty, a set of tools for fast prototyping of Big Data applications that can be deployed in an Apache Spark environment. These tools support the creation of Big Data workflows using the already existing functional blocks or supporting the creation of new functional blocks. The created workflow can then be deployed in a Spark infrastructure and used through a REST API.
For a better understanding of Moriarty, the prototyping process, and the way it hides the Spark environment from Big Data users and developers, we will present it together with a couple of examples, one based on an Industry 4.0 success case and another on a logistics success case.
How Nielsen Utilized Databricks for Large-Scale Research and Development... (Spark Summit)
Large-scale testing of new data products or enhancements to existing products in a research and development environment can be a technical challenge for data scientists. In some cases, tools available to data scientists lack production-level capacity, whereas other tools do not provide the algorithms needed to run the methodology. At Nielsen, the Databricks platform provided a solution to both of these challenges. This breakout session will cover a specific Nielsen business case where two methodology enhancements were developed and tested at large-scale using the Databricks platform. Development and large-scale testing of these enhancements would not have been possible using standard database tools.
Spline: Apache Spark Lineage, not Only for the Banking Industry, with Marek Nov... (Spark Summit)
Data lineage tracking is one of the significant problems that financial institutions face when using modern big data tools. This presentation describes Spline – a data lineage tracking and visualization tool for Apache Spark. Spline captures and stores lineage information from internal Spark execution plans and visualizes it in a user-friendly manner.
Goal-Based Data Production, with Sim Simeonov (Spark Summit)
Since the invention of SQL and relational databases, data production has been about specifying how data is transformed through queries. While Apache Spark can certainly be used as a general distributed query engine, the power and granularity of Spark’s APIs enables a revolutionary increase in data engineering productivity: goal-based data production. Goal-based data production concerns itself with specifying WHAT the desired result is, leaving the details of HOW the result is achieved to a smart data warehouse running on top of Spark. That not only substantially increases productivity, but also significantly expands the audience that can work directly with Spark: from developers and data scientists to technical business users. With specific data and architecture patterns spanning the range from ETL to machine learning data prep and with live demos, this session will demonstrate how Spark users can gain the benefits of goal-based data production.
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le... (Spark Summit)
Have you imagined a simple machine learning solution able to prevent revenue leakage and monitor your distributed application? To answer this question, we offer a practical and simple machine learning solution to create an intelligent monitoring application based on simple data analysis using Apache Spark MLlib. Our application uses linear regression models to make predictions and check whether the platform is experiencing any operational problems that can result in revenue losses. The application monitors distributed systems and provides notifications stating the problem detected; that way users can act quickly to avoid serious problems which directly impact the company's revenue, reducing the time to action. We will present an architecture for not only a monitoring system, but also an active actor in our outage recoveries. At the end of the presentation you will have access to our training program source code and you will be able to adapt and implement it in your company. This solution already helped prevent about US$3M in losses last year.
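A minimal sketch of this regression-based monitoring idea (toy data, invented column names and tolerance, not the presenters' actual code):

```python
# Sketch: fit a linear model on historical platform metrics, then flag
# observations that deviate from the prediction by more than an (assumed)
# 20% tolerance. Data and column names are invented for illustration.
from pyspark.sql import SparkSession
from pyspark.sql.functions import abs as sql_abs, col
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("revenue-monitor").getOrCreate()

history = spark.createDataFrame(
    [(1000.0, 98.0), (2000.0, 205.0), (3000.0, 310.0), (4000.0, 395.0)],
    ["requests", "transactions"])
assembler = VectorAssembler(inputCols=["requests"], outputCol="features")

model = LinearRegression(labelCol="transactions").fit(assembler.transform(history))

# Latest observation: request volume is normal but transactions collapsed,
# which is exactly the revenue-leakage signature described above.
latest = assembler.transform(
    spark.createDataFrame([(3500.0, 120.0)], ["requests", "transactions"]))
scored = model.transform(latest)  # adds a "prediction" column

anomalies = scored.where(
    sql_abs(col("transactions") - col("prediction")) > 0.2 * col("prediction"))
anomalies.show()
```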
Getting Ready to Use Redis with Apache Spark, with Dvir Volk (Spark Summit)
Getting Ready to Use Redis with Apache Spark is a technical tutorial designed to address integrating Redis with an Apache Spark deployment to increase the performance of serving complex decision models. To set the context for the session, we start with a quick introduction to Redis and the capabilities Redis provides. We cover the basic data types provided by Redis and cover the module system. Using an ad-serving use case, we look at how Redis can improve the performance and reduce the cost of using complex ML models in production. Attendees will be guided through the key steps of setting up and integrating Redis with Spark, including how to train a model using Spark, then load and serve it using Redis, as well as how to work with the Spark-Redis module. The capabilities of the Redis Machine Learning Module (redis-ml) will be discussed, focusing primarily on decision trees and regression (linear and logistic), with code examples to demonstrate how to use these features. At the end of the session, developers should feel confident building a prototype/proof-of-concept application using Redis and Spark. Attendees will understand how Redis complements Spark and how to use Redis to serve complex ML models with high performance.
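As a hedged sketch of the train-in-Spark, serve-from-Redis pattern (plain redis-py with an invented key; the redis-ml module's own commands are deliberately not shown here):

```python
# Hedged sketch: train a model with Spark ML, publish its parameters to Redis
# with plain redis-py, and score a request from Redis without touching Spark.
# The key name, features, and data are invented for illustration.
import json
import redis
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("train-and-publish").getOrCreate()

df = spark.createDataFrame(
    [(10.0, 100.0, 1.0), (20.0, 150.0, 3.0), (40.0, 400.0, 5.0)],
    ["clicks", "impressions", "conversions"])
features = VectorAssembler(
    inputCols=["clicks", "impressions"], outputCol="features").transform(df)
model = LinearRegression(labelCol="conversions").fit(features)

r = redis.Redis(host="localhost", port=6379)
r.set("model:conversion", json.dumps({
    "coefficients": model.coefficients.toArray().tolist(),
    "intercept": model.intercept,
}))

# Serving side: a dot product against the stored parameters.
params = json.loads(r.get("model:conversion"))
x = [15.0, 120.0]  # clicks, impressions
score = sum(c * xi for c, xi in zip(params["coefficients"], x)) + params["intercept"]
print(score)
```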
Deduplication and Author-Disambiguation of Streaming Records via Supervised M... (Spark Summit)
Here we present a general supervised framework for record deduplication and author-disambiguation via Spark. This work differentiates itself in three ways:
– The use of Databricks and AWS makes this a scalable implementation. Compute resources are comparably lower than traditional legacy technology using big boxes 24/7. Scalability is crucial, as Elsevier's Scopus data, the biggest scientific abstract repository, covers roughly 250 million authorships from 70 million abstracts spanning a few hundred years.
– We create a fingerprint for each piece of content using deep learning and/or word2vec algorithms to expedite pairwise similarity calculation. These encoders substantially reduce compute time while maintaining semantic similarity (unlike traditional TF-IDF or predefined taxonomies). We will briefly discuss how to optimize word2vec training with high parallelization. Moreover, we show how these encoders can be used to derive a standard representation for all our entities, such as documents, authors, users, journals, etc. This standard representation can simplify the recommendation problem into a pairwise similarity search, and hence it can offer a basic recommender for cross-product applications where we may not have a dedicated recommender engine designed.
– Traditional author-disambiguation or record deduplication algorithms are batch-processing, with little to no training data. However, we have roughly 25 million authorships that are manually curated or corrected upon user feedback. Hence, it is crucial to maintain historical profiles, and we have therefore developed a machine learning implementation that deals with data streams and processes them in mini-batches or one document at a time. We will discuss how to measure the accuracy of such a system, how to tune it, and how to process the raw pairwise similarities into final clusters. Lessons learned from this talk can help all sorts of companies that want to integrate their data or deduplicate their user/customer/product databases.
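A small, hedged sketch of the fingerprint idea (Spark ML's Word2Vec averages a document's word vectors into a single vector; the data and the 0.9 similarity threshold are invented):

```python
# Minimal sketch of document "fingerprints": embed each record's text with
# Spark ML Word2Vec, then compare candidate pairs by cosine similarity.
import numpy as np
from pyspark.sql import SparkSession
from pyspark.ml.feature import Word2Vec

spark = SparkSession.builder.appName("dedup-fingerprints").getOrCreate()

docs = spark.createDataFrame([
    ("r1", "deep learning for record deduplication".split()),
    ("r2", "record deduplication with deep learning".split()),
    ("r3", "weather impact on traffic prediction".split()),
], ["id", "words"])

w2v = Word2Vec(vectorSize=16, minCount=1, inputCol="words", outputCol="vec")
rows = w2v.fit(docs).transform(docs).select("id", "vec").collect()

def cosine(u, v):
    u, v = np.asarray(u), np.asarray(v)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Pairs above the (assumed) threshold become deduplication candidates.
for i in range(len(rows)):
    for j in range(i + 1, len(rows)):
        sim = cosine(rows[i].vec.toArray(), rows[j].vec.toArray())
        if sim > 0.9:
            print(rows[i].id, rows[j].id, round(sim, 3))
```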
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization... (Spark Summit)
The use of large-scale machine learning and data mining methods is becoming ubiquitous in many application domains, ranging from business intelligence and bioinformatics to self-driving cars. These methods heavily rely on matrix computations, and it is hence critical to make these computations scalable and efficient. These matrix computations are often complex and involve multiple steps that need to be optimized and sequenced properly for efficient execution. This work presents new efficient and scalable matrix processing and optimization techniques based on Spark. The proposed techniques estimate the sparsity of intermediate matrix-computation results and optimize communication costs. An evaluation plan generator for complex matrix computations is introduced, as well as a distributed plan optimizer that exploits dynamic cost-based analysis and rule-based heuristics. The result of a matrix operation will often serve as an input to another matrix operation, thus defining the matrix data dependencies within a matrix program. The matrix query plan generator produces query execution plans that minimize memory usage and communication overhead by partitioning the matrix based on the data dependencies in the execution plan. We implemented the proposed matrix techniques inside Spark SQL, and optimize the matrix execution plan based on the Spark SQL Catalyst. We conduct case studies on a series of ML models and matrix computations with special features on different datasets: PageRank, GNMF, BFGS, sparse matrix chain multiplications, and a biological data analysis. The open-source library ScaLAPACK and the array-based database SciDB are used for performance evaluation. Our experiments are performed on six real-world datasets: social network data (e.g., soc-pokec, cit-Patents, LiveJournal), Twitter2010, Netflix recommendation data, and a 1000 Genomes Project sample. Experiments demonstrate that our proposed techniques achieve up to an order-of-magnitude performance improvement.
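MatFast's actual planner is not reproduced here, but the classic matrix-chain-ordering dynamic program below illustrates why an execution-plan generator matters: the parenthesization of a chain changes its cost by orders of magnitude.

```python
# Classic O(n^3) dynamic program over a matrix chain whose i-th matrix is
# dims[i] x dims[i+1]; returns the minimal number of scalar multiplications.
# An illustration of plan optimization in general, not MatFast's algorithm.
def matrix_chain_cost(dims):
    n = len(dims) - 1                      # number of matrices in the chain
    cost = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j))
    return cost[0][n - 1]

# (10x1000) * (1000x5) * (5x1000): the best order costs 100,000 scalar
# multiplications, while the worst order costs 15,000,000.
print(matrix_chain_cost([10, 1000, 5, 1000]))  # -> 100000
```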
Adjusting primitives for graph: SHORT REPORT / NOTES, by Subhajit Sahu
Notes on primitives for graph algorithms such as PageRank. Compressed Sparse Row (CSR) is an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues (a minimal sketch follows this list).
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
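A minimal sketch of such an automated data-quality gate (pandas, with toy data and invented rules):

```python
# Toy data-quality gate: rows failing any rule are quarantined before they
# reach downstream tables. Column names and rules are invented.
import pandas as pd

df = pd.DataFrame({
    "order_id": [1, 2, 2, 4],
    "amount":   [9.99, -5.00, 12.50, None],
})

rules = {
    "non_null_amount": df["amount"].notna(),
    "positive_amount": df["amount"] > 0,
    "unique_order_id": ~df["order_id"].duplicated(keep=False),
}

passed = pd.concat(rules, axis=1).all(axis=1)
quarantined, clean = df[~passed], df[passed]
print(f"quarantined {len(quarantined)} of {len(df)} rows")
```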
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Opendatabay - Open Data Marketplace.pptx, by Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. The marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, i.e. those with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
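As an illustrative sketch of the first optimization above (plain Python on an invented toy graph; not the STICD implementation), freezing vertices whose rank has stopped moving shrinks the work done by later iterations:

```python
# PageRank with converged-vertex skipping: once a vertex's rank change falls
# below tol, it is frozen and later iterations skip recomputing it. A toy
# illustration; stricter variants re-check frozen vertices when their
# in-neighbours change. The graph is assumed dangling-free.
def pagerank_skip_converged(adj, alpha=0.85, tol=1e-6, max_iter=100):
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    incoming = {v: [] for v in adj}          # invert adjacency once (CSR-style)
    for u, outs in adj.items():
        for v in outs:
            incoming[v].append(u)
    converged = set()
    for _ in range(max_iter):
        new_rank = {}
        for v in adj:
            if v in converged:               # skip: saves per-iteration work
                new_rank[v] = rank[v]
                continue
            s = sum(rank[u] / len(adj[u]) for u in incoming[v])
            new_rank[v] = (1 - alpha) / n + alpha * s
            if abs(new_rank[v] - rank[v]) < tol:
                converged.add(v)
        rank = new_rank
        if len(converged) == n:
            break
    return rank

print(pagerank_skip_converged({"a": ["b"], "b": ["c"], "c": ["a", "b"]}))
```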
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.