End-to-End Data Pipelines with Apache Spark


This presentation is about building a data product backed by Apache Spark. The source code for the demo can be found at http://brkyvz.github.io/spark-pipeline


Burak Yavuz
December 27, 2015

Outline
• Intro - Spark & Ecosystem
• Build an End-to-End Data Product
• Step 1: Understand your Data
• Spark SQL - DataFrames
• Step 2: Build your Service
• Spark MLlib - ML Pipelines
• Step 3: Monitor your Service
• Spark Streaming
• Kafka

Timeline of Spark
• 2010: a research paper
• 2010-13: a project under github/mesos
• 2013-14: Apache incubating -> TLP
• 2014: the most active project in the ASF

Spark Ecosystem
• 770 contributors
• 6000+ forks on GitHub
• 14000+ commits!
https://github.com/apache/spark

Spark Survey 2015 infographic (three slides): http://go.databricks.com/hubfs/DataBricks_Surveys_-_Content/Spark-Survey-2015-Infographic.pdf

Spark Packages
• a community index of 3rd-party packages
• helps users find packages
• helps package developers meet users
• users provide feedback through voting and commenting
• index maintained by Databricks
http://spark-packages.org

Types of Packages Currently Available
• Data Source Connectors: spark-avro, spark-redshift, spark-mongodb, spark-sequoiadb, spark-cassandra-connector, …
• Deployment Scripts: spark_azure, spark_gce, sbt-spark-ec2
• Machine Learning Algorithms: spark-hash, spark-mrmr-feature-selection, streaming-matrix-factorization, generalized-kmeans-clustering
• and many more…

What’s new in Spark 1.6
• Dataset API
• Automatic memory configuration
• Optimized state storage in Spark Streaming
• Pipeline persistence in Spark ML

Demo
Source Code: http://brkyvz.github.io/spark-pipeline
Scenario: As an e-commerce company, we would like to recommend
products that users may like in order to increase sales and profit.
Dataset: http://jmcauley.ucsd.edu/data/amazon/
- 18 GB
- 82.83 million reviews
We will use a subset with 24 million reviews
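
A first look at review data like this usually starts with a few simple aggregates. The sketch below uses plain Python on a handful of made-up records rather than Spark or the real dataset. The field names `reviewerID`, `asin` and `overall`, and the one-JSON-object-per-line layout, are assumptions that should be checked against the actual files.

```python
import json
from collections import Counter

# Hypothetical sample records (not taken from the real dataset).
# Assumption: one JSON object per line with reviewerID, asin, overall.
SAMPLE_LINES = [
    '{"reviewerID": "u1", "asin": "p1", "overall": 5.0}',
    '{"reviewerID": "u1", "asin": "p2", "overall": 3.0}',
    '{"reviewerID": "u2", "asin": "p1", "overall": 4.0}',
    '{"reviewerID": "u3", "asin": "p3", "overall": 5.0}',
]


def summarize(lines):
    """Return basic counts and the rating distribution for JSON-lines reviews."""
    users, items = set(), set()
    rating_counts = Counter()
    total = 0.0
    n = 0
    for line in lines:
        review = json.loads(line)
        users.add(review["reviewerID"])
        items.add(review["asin"])
        rating_counts[review["overall"]] += 1
        total += review["overall"]
        n += 1
    return {
        "reviews": n,
        "users": len(users),
        "items": len(items),
        "mean_rating": total / n if n else None,
        "rating_counts": dict(rating_counts),
    }


summary = summarize(SAMPLE_LINES)
print(summary)
```

The same aggregates (distinct users, distinct items, rating distribution) are the kind of questions the DataFrame step of the demo is meant to answer at full scale.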

Recommendation Engines
• Finding Similar Items
• Clustering using:
• Metadata
• Matrix Factorization
• Frequent Itemsets
• Ranking
• Rating Prediction using:
• Matrix Factorization

Architecture
[Diagram components: Web Service 1, Web Service 2, Web Service 3; Cassandra; Sales Data (Database); Spark (Sales + Ratings); Rating Data; ML Model; Recommendations; Request]

Solution Proposal
Use Matrix Factorization to understand customers
and items.
Then:
1) Predict the rating for a product for a given user
2) Find similar products, and show top k
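
The two steps above can be sketched in plain Python once a factorization has produced one vector per user and per item. The factor values and names below are made up for illustration, and using cosine similarity for "similar products" is an assumption, not necessarily what the demo does.

```python
import math

# Hypothetical latent factors, of the kind a matrix factorization model produces.
USER_FACTORS = {"alice": [1.0, 0.5], "bob": [0.2, 1.5]}
ITEM_FACTORS = {
    "book": [4.0, 1.0],
    "ebook": [3.8, 1.1],
    "blender": [0.5, 3.0],
    "toaster": [0.6, 2.8],
}


def predict_rating(user, item):
    """Step 1: predicted rating is the dot product of user and item factors."""
    return sum(u * v for u, v in zip(USER_FACTORS[user], ITEM_FACTORS[item]))


def cosine(a, b):
    """Cosine similarity between two factor vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(item, k):
    """Step 2: rank all other items by similarity to `item` and keep the top k."""
    target = ITEM_FACTORS[item]
    scored = [
        (other, cosine(target, vec))
        for other, vec in ITEM_FACTORS.items()
        if other != item
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:k]]


print(predict_rating("alice", "book"))
print(top_k_similar("book", 2))
```

With real data the item-to-item comparison is the expensive part, since it compares every pair of items; that is one reason to run it on a cluster rather than in a single process.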

Matrix Factorization
https://databricks-training.s3.amazonaws.com/slides/Spark_Summit_MLlib_070214_v2.pdf
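
As a rough illustration of the idea, the sketch below factorizes a tiny, made-up rating set into user and item vectors with stochastic gradient descent in plain Python. This is a swapped-in technique chosen for brevity: Spark MLlib provides an ALS-based implementation for doing this at scale, and the training method used in the demo is not shown here. All data, names and hyperparameters are hypothetical.

```python
import random

# Hypothetical (user, item, rating) triples.
RATINGS = [
    ("u1", "a", 5.0), ("u1", "b", 4.0),
    ("u2", "a", 4.0), ("u2", "c", 1.0),
    ("u3", "b", 5.0), ("u3", "c", 2.0),
]


def factorize(ratings, rank=2, epochs=500, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] ~ rating."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.uniform(0.1, 0.5) for _ in range(rank)] for u in users}
    Q = {i: [rng.uniform(0.1, 0.5) for _ in range(rank)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for f in range(rank):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    """Root mean squared error of the factor model on the given ratings."""
    squared = [
        (r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2 for u, i, r in ratings
    ]
    return (sum(squared) / len(squared)) ** 0.5


before = rmse(RATINGS, *factorize(RATINGS, epochs=0))  # untrained factors
after = rmse(RATINGS, *factorize(RATINGS))             # after training
print(before, after)
```

The error here is measured on the training ratings only; a real evaluation would hold out part of the data.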


Apache Kafka
• Distributed messaging system
• High-throughput
• Fast
• Scalable
• Durable
• http://kafka.apache.org/

Architecture
[Diagram components: Web Service 1, Web Service 2, Web Service 3; Kafka; Spark Streaming]
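
To make the monitoring flow more concrete, here is a plain-Python sketch of the micro-batch idea behind Spark Streaming: events are grouped into fixed time intervals and aggregated per interval. It does not use the Kafka or Spark APIs. The event fields (`ts`, `service`, `latency_ms`), the sample events and the 10-second interval are all hypothetical.

```python
from collections import defaultdict

# Hypothetical events, as web services might publish them to a message queue.
EVENTS = [
    {"ts": 0.5, "service": "web1", "latency_ms": 120},
    {"ts": 3.0, "service": "web2", "latency_ms": 80},
    {"ts": 9.9, "service": "web1", "latency_ms": 100},
    {"ts": 10.0, "service": "web3", "latency_ms": 300},
    {"ts": 14.2, "service": "web1", "latency_ms": 140},
    {"ts": 25.0, "service": "web2", "latency_ms": 60},
]


def micro_batches(events, interval_s):
    """Group events into consecutive batches of `interval_s` seconds."""
    batches = defaultdict(list)
    for event in events:
        batches[int(event["ts"] // interval_s)].append(event)
    return dict(sorted(batches.items()))


def summarize_batch(events):
    """Per-service request count and mean latency for one batch."""
    latencies = defaultdict(list)
    for event in events:
        latencies[event["service"]].append(event["latency_ms"])
    return {
        service: {
            "requests": len(values),
            "mean_latency_ms": sum(values) / len(values),
        }
        for service, values in latencies.items()
    }


report = {
    batch: summarize_batch(batch_events)
    for batch, batch_events in micro_batches(EVENTS, 10.0).items()
}
for batch, stats in report.items():
    print(batch, stats)
```

Intervals with no events simply do not appear in the report; a production monitor would usually treat a silent interval as a signal in its own right.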


What's hot

A Thorough Comparison of Delta Lake, Iceberg and Hudi

Databricks

"Structured Streaming has proven to be the best platform for building distributed stream processing applications. Its unified SQL/Dataset/DataFrame APIs and Spark's built-in functions make it easy for developers to express complex computations. However, expressing the business logic is only part of the larger problem of building end-to-end streaming pipelines that interact with a complex ecosystem of storage systems and workloads. It is important for the developer to truly understand the business problem needs to be solved. What are you trying to consume? Single source? Joining multiple streaming sources? Joining streaming with static data? What are you trying to produce? What is the final output that the business wants? What type of queries does the business want to run on the final output? When do you want it? When does the business want to the data? What is the acceptable latency? Do you really want to millisecond-level latency? How much are you willing to pay for it? This is the ultimate question and the answer significantly determines how feasible is it solve the above questions. These are the questions that we ask every customer in order to help them design their pipeline. In this talk, I am going to go through the decision tree of designing the right architecture for solving your problem."

Designing Structured Streaming Pipelines—How to Architect Things Right

Databricks

Introduction to Apache Calcite

Jordan Halterman

"The common use cases of Spark SQL include ad hoc analysis, logical warehouse, query federation, and ETL processing. Spark SQL also powers the other Spark libraries, including structured streaming for stream processing, MLlib for machine learning, and GraphFrame for graph-parallel computation. For boosting the speed of your Spark applications, you can perform the optimization efforts on the queries prior employing to the production systems. Spark query plans and Spark UIs provide you insight on the performance of your queries. This talk discloses how to read and tune the query plans for enhanced performance. It will also cover the major related features in the recent and upcoming releases of Apache Spark. "

Understanding Query Plans and Spark UIs

Databricks

Apache Spark Core – Practical Optimization

Databricks

The transaction log is key to understanding Delta Lake because it is the common thread that runs through many of its most important features, including ACID transactions, scalable metadata handling, time travel, and more. In this session, we’ll explore what the Delta Lake transaction log is, how it works at the file level, and how it offers an elegant solution to the problem of multiple concurrent reads and writes.

Diving into Delta Lake: Unpacking the Transaction Log

Databricks

Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service, that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.

Azure Synapse Analytics Overview (r2)

James Serra

Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -

Yoshiyasu SAEKI

An informational, or statistical, constraint is a constraint such as a unique, primary key, foreign key, or check constraint that can be used by Apache Spark to improve query performance. Informational constraints are not enforced by the Spark SQL engine; rather, they are used by Catalyst to optimize the query processing. Informational constraints will be primarily targeted to applications that load and analyze data that originated from a data warehouse. For such applications, the conditions for a given constraint are known to be true, so the constraint does not need to be enforced during data load operations. This session will cover the support for primary and foreign key (referential integrity) constraints in Spark. You’ll learn about the constraint specification, metastore storage, constraint validation and maintenance. You’ll also see examples of query optimizations that utilize referential integrity constraints, such as Join and Distinct elimination and Star Schema detection.

Informational Referential Integrity Constraints Support in Apache Spark with ...

Databricks

Query Compilation in Impala

Cloudera, Inc.

On Improving Broadcast Joins in Apache Spark SQL

Databricks

Bucketing is a partitioning technique that can improve performance in certain data transformations by avoiding data shuffling and sorting. The general idea of bucketing is to partition, and optionally sort, the data based on a subset of columns while it is written out (a one-time cost), while making successive reads of the data more performant for downstream jobs if the SQL operators can make use of this property. Bucketing can enable faster joins (i.e. single stage sort merge join), the ability to short circuit in FILTER operation if the file is pre-sorted over the column in a filter predicate, and it supports quick data sampling. In this session, you’ll learn how bucketing is implemented in both Hive and Spark. In particular, Patil will describe the changes in the Catalyst optimizer that enable these optimizations in Spark for various bucketing scenarios. Facebook’s performance tests have shown bucketing to improve Spark performance from 3-5x faster when the optimization is enabled. Many tables at Facebook are sorted and bucketed, and migrating these workloads to Spark have resulted in a 2-3x savings when compared to Hive. You’ll also hear about real-world applications of bucketing, like loading of cumulative tables with daily delta, and the characteristics that can help identify suitable candidate jobs that can benefit from bucketing.

Hive Bucketing in Apache Spark with Tejas Patil

Databricks

In data analytics frameworks such as Spark it is important to detect and avoid scanning data that is irrelevant to the executed query, an optimization which is known as partition pruning. Dynamic partition pruning occurs when the optimizer is unable to identify at parse time the partitions it has to eliminate. In particular, we consider a star schema which consists of one or multiple fact tables referencing any number of dimension tables. In such join operations, we can prune the partitions the join reads from a fact table by identifying those partitions that result from filtering the dimension tables. In this talk we present a mechanism for performing dynamic partition pruning at runtime by reusing the dimension table broadcast results in hash joins and we show significant improvements for most TPCDS queries.

Dynamic Partition Pruning in Apache Spark

Databricks

Bucketing 2.0: Improve Spark SQL Performance by Removing Shuffle

Databricks

Apache Spark 2.2 ships with a state-of-art cost-based optimization framework that collects and leverages a variety of per-column data statistics (e.g., cardinality, number of distinct values, NULL values, max/min, avg/max length, etc.) to improve the quality of query execution plans. Leveraging these reliable statistics helps Spark to make better decisions in picking the most optimal query plan. Examples of these optimizations include selecting the correct build side in a hash-join, choosing the right join type (broadcast hash-join vs. shuffled hash-join) or adjusting a multi-way join order, among others. In this talk, we’ll take a deep dive into Spark’s cost based optimizer and discuss how we collect/store these statistics, the query optimizations it enables, and its performance impact on TPC-DS benchmark queries.

Cost-Based Optimizer in Apache Spark 2.2

Databricks

Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark

Bo Yang

Join operations in Apache Spark is often the biggest source of performance problems and even full-blown exceptions in Spark. After this talk, you will understand the two most basic methods Spark employs for joining DataFrames – to the level of detail of how Spark distributes the data within the cluster. You’ll also find out how to work out common errors and even handle the trickiest corner cases we’ve encountered! After this talk, you should be able to write performance joins in Spark SQL that scale and are zippy fast! This session will cover different ways of joining tables in Apache Spark. Speaker: Vida Ha This talk was originally presented at Spark Summit East 2017.

Optimizing Apache Spark SQL Joins

Databricks

At Spark Summit 2017, we described our framework to migrate production Hive workload to Spark with minimal user intervention. After a year of migration, Spark now powers an important part of our batch processing workload. The migration framework supports syntax compatibility analysis, offline/online shadowing, and data validation. In this session, we first introduce new features and improvements in the migration framework to support bucketed tables and increase automation. Next, we will deep dive into the top technical challenges we encountered and how we addressed them. We improved the the syntax compatibility between Hive and Spark from around 51% to 85% by identifying/developing top missing features, fixing incompatible UDFs, and implementing a UDF testing framework. In addition, we developed reliable join operators to improve Spark stability in production when leveraging optimizations such as ShuffledHashJoin. Finally, we will share an update on our overall migration effort and examples of migrations wins. For example, we were able to migrate one of the most complicated workloads in Facebook from Hive to Spark with more than 2.5X performance gain.

Migrating Apache Hive Workload to Apache Spark: Bridge the Gap with Zhan Zhan...

Databricks

A Deeper Understanding of Spark Internals (Hadoop Conference Japan 2014)

Hadoop / Spark Conference Japan

Understanding and Improving Code Generation

Databricks

What's hot (20)

A Thorough Comparison of Delta Lake, Iceberg and Hudi

Designing Structured Streaming Pipelines—How to Architect Things Right

Introduction to Apache Calcite

Understanding Query Plans and Spark UIs

Apache Spark Core – Practical Optimization

Diving into Delta Lake: Unpacking the Transaction Log

Azure Synapse Analytics Overview (r2)

Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -

Informational Referential Integrity Constraints Support in Apache Spark with ...

Query Compilation in Impala

On Improving Broadcast Joins in Apache Spark SQL

Hive Bucketing in Apache Spark with Tejas Patil

Dynamic Partition Pruning in Apache Spark

Bucketing 2.0: Improve Spark SQL Performance by Removing Shuffle

Cost-Based Optimizer in Apache Spark 2.2

Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark

Optimizing Apache Spark SQL Joins

Migrating Apache Hive Workload to Apache Spark: Bridge the Gap with Zhan Zhan...

A Deeper Understanding of Spark Internals (Hadoop Conference Japan 2014)

Understanding and Improving Code Generation

Viewers also liked

JessicaKleinresume

Jessica Klein

Straight edge Tic

Alejandro Ramirez

Enterprise resource planning p point

hendricks89

Fall Seminar Brochure 2014

Jennifer Mackall

Ovit_Brochure

Drivas Kostas

Consultant profile

Arijit Basu

Privatization Performance over Transition

GRAPE

Guia de-observación-a-un-barrio

Ernesto Sánchez Suárez

ICS2208 lecture2

Vanessa Camilleri

Timo Honkela: Turning quantity into quality and making concepts visible using...

Timo Honkela

Moviments forces i màquines

Eva Puertes

Female access to the labor market and wages over transition

GRAPE

Polityczna (nie)stabilność reform systemów emerytalnych

GRAPE

день святого валентина

alic_o

Viewers also liked (14)

JessicaKleinresume

Straight edge Tic

Enterprise resource planning p point

Fall Seminar Brochure 2014

Ovit_Brochure

Consultant profile

Privatization Performance over Transition

Guia de-observación-a-un-barrio

ICS2208 lecture2

Timo Honkela: Turning quantity into quality and making concepts visible using...

Moviments forces i màquines

Female access to the labor market and wages over transition

Polityczna (nie)stabilność reform systemów emerytalnych

день святого валентина

Similar to End-to-End Data Pipelines with Apache Spark

Spark Hsinchu meetup

Yung-An He

Databricks Meetup @ Los Angeles Apache Spark User Group

Paco Nathan

Media_Entertainment_Veriticals

Peyman Mohajerian

An Insider’s Guide to Maximizing Spark SQL Performance

Takuya UESHIN

Fighting Fraud with Apache Spark

Miklos Christine

Getting started with SparkSQL - Desert Code Camp 2016

clairvoyantllc

In this one-hour webinar, you will be introduced to Spark, the data engineering that supports it, and the data science advances that it has spurned. You’ll discover the interesting story of its academic origins and then get an overview of the organizations who are using the technology. After being briefed on some impressive Spark case studies, you’ll come to know of the next-generation Spark 2.0 (to be released in just a few months). We will also tell you about the tremendous impact that learning Spark can have upon your current salary, and the best ways to get trained in this ground-breaking new technology.

Big Data 2.0 - How Spark technologies are reshaping the world of big data ana...

Lillian Pierson

Strata EU 2014: Spark Streaming Case Studies

Paco Nathan

Building and deploying an analytic service on Cloud is a challenge. A bigger challenge is to maintain the service. In a world where users are gravitating towards a model where cluster instances are to provisioned on the fly, in order for these to be used for analytics or other purposes, and then to have these cluster instances shut down when the jobs get done, the relevance of containers and container orchestration is more important than ever. In short Customers are looking for Serverless Spark Clusters. The Intent of this presentation is to share what is Serverless Spark and what are the benefits of running Spark in serverless manner.

Serverless spark

MamathaBusi

Apache Spark 2.0 and subsequent releases of Spark 2.1 and 2.2 have laid the foundation for many new features and functionality. Its main three themes—easier, faster, and smarter—are pervasive in its unified and simplified high-level APIs for Structured data. In this introductory part lecture and part hands-on workshop, you’ll learn how to apply some of these new APIs using Databricks Community Edition. In particular, we will cover the following areas: Agenda: • Overview of Spark Fundamentals & Architecture • What’s new in Spark 2.x • Unified APIs: SparkSessions, SQL, DataFrames, Datasets • Introduction to DataFrames, Datasets and Spark SQL • Introduction to Structured Streaming Concepts • Four Hands On Labs You will use Databricks Community Edition, which will give you unlimited free access to a ~6 GB Spark 2.x local mode cluster. And in the process, you will learn how to create a cluster, navigate in Databricks, explore a couple of datasets, perform transformations and ETL, save your data as tables and parquet files, read from these sources, and analyze datasets using DataFrames/Datasets API and Spark SQL. Level: Beginner to intermediate, not for advanced Spark users. Prerequisite: You will need a laptop with Chrome or Firefox browser installed with at least 8 GB. Introductory or basic knowledge Scala or Python is required, since the Notebooks will be in Scala; Python is optional. Bio: Jules S. Damji is an Apache Spark Community Evangelist with Databricks. He is a hands-on developer with over 15 years of experience and has worked at leading companies, such as Sun Microsystems, Netscape, LoudCloud/Opsware, VeriSign, Scalix, and ProQuest, building large-scale distributed systems. Before joining Databricks, he was a Developer Advocate at Hortonworks.

Jump Start on Apache® Spark™ 2.x with Databricks

Databricks

In this introductory part lecture and part hands-on workshop, you’ll learn how to apply some of these new APIs using Databricks Community Edition. In particular, we will cover the following areas: Agenda: • Overview of Spark Fundamentals & Architecture • What’s new in Spark 2.x • Unified APIs: SparkSessions, SQL, DataFrames, Datasets • Introduction to DataFrames, Datasets and Spark SQL • Introduction to Structured Streaming Concepts • Four Hands On Labs You will use Databricks Community Edition, which will give you unlimited free access to a ~6 GB Spark 2.x local mode cluster. And in the process, you will learn how to create a cluster, navigate in Databricks, explore a couple of datasets, perform transformations and ETL, save your data as tables and parquet files, read from these sources, and analyze datasets using DataFrames/Datasets API and Spark SQL. Level: Beginner to intermediate, not for advanced Spark users. Prerequisite: You will need a laptop with Chrome or Firefox browser installed with at least 8 GB. Introductory or basic knowledge Scala or Python is required, since the Notebooks will be in Scala; Python is optional. Bio: Jules S. Damji is an Apache Spark Community Evangelist with Databricks. He is a hands-on developer with over 15 years of experience and has worked at leading companies, such as Sun Microsystems, Netscape, LoudCloud/Opsware, VeriSign, Scalix, and ProQuest, building large-scale distributed systems. Before joining Databricks, he was a Developer Advocate at Hortonworks.

Jumpstart on Apache Spark 2.2 on Databricks

Databricks

Yao Yao Mooyoung Lee https://github.com/yaowser/learn-spark/tree/master/Final%20project https://www.youtube.com/watch?v=IVMbSDS4q3A https://www.academia.edu/35646386/Teaching_Apache_Spark_Demonstrations_on_the_Databricks_Cloud_Platform https://www.slideshare.net/YaoYao44/teaching-apache-spark-demonstrations-on-the-databricks-cloud-platform-86063070/ Apache Spark is a fast and general engine for big data analytics processing with libraries for SQL, streaming, and advanced analytics Cloud Computing, Structured Streaming, Unified Analytics Integration, End-to-End Applications

Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform

Yao Yao

Interest is growing in the Apache Spark community in using Deep Learning techniques and in the Deep Learning community in scaling algorithms with Apache Spark. A few of them to note include: · Databrick’s efforts in scaling Deep learning with Spark · Intel announcing the BigDL: A Deep learning library for Spark · Yahoo’s recent efforts to opensource TensorFlowOnSpark In this lecture we will discuss the key use cases and developments that have emerged in the last year in using Deep Learning techniques with Spark.

Deep learning and Apache Spark

QuantUniversity

As presented at the CloudBrew 2019 conference in Dec 14, 2019. Love cognitive services but not sure how to use them at scale? Enjoy working with Apache Spark but always searching for a way to integrate AI and better machine learning algorithms? Now you can do it all. Run Azure Cognitive Services within Azure Databricks. Curious how? Come to this talk and learn how, what does it mean, performance tuning and best practices.

AI at Scale

Adi Polak

Spark meetup2 final (Taboola)

tsliwowicz

In this in-depth workshop you will gain hands on experience with using Spark and Cassandra inside the DataStax Enterprise Platform. The focus of the workshop will be working through data analytics exercises to understand the major developer developer considerations. You will also gain an understanding of the internals behind the integration that allow for large scale data loading and analysis. It will also review some of the major machine learning libraries in Spark as an example of data analysis. The workshop will start with a review the basics of how Spark and Cassandra are integrated. Then we will work through a series of exercises that will show how to perform large scale Data Analytics with Spark and Cassandra. A major part of the workshop will be to understand effective data modeling techniques in Cassandra that allow for fast parallel loading of the data into Spark to perform large scale analytics on that data. The exercises will also look at how to how to use the open source Spark Notebook to run interactive data analytics with the DataStax Enterprise Platform.

DataStax & O'Reilly Media: Large Scale Data Analytics with Spark and Cassandr...

DataStax Academy

Splice Machine is an ANSI-SQL Relational Database Management System (RDBMS) on Apache Spark. It has proven low-latency transactional processing (OLTP) as well as analytical processing (OLAP) at petabyte scale. It uses Spark for all analytical computations and leverages HBase for persistence. This talk highlights a new Native Spark Datasource - which enables seamless data movement between Spark Data Frames and Splice Machine tables without serialization and deserialization. This Spark Datasource makes machine learning libraries such as MLlib native to the Splice RDBMS . Splice Machine has now integrated MLflow into its data platform, creating a flexible Data Science Workbench with an RDBMS at its core. The transactional capabilities of Splice Machine integrated with the plethora of DataFrame-compatible libraries and MLflow capabilities manages a complete, real-time workflow of data-to-insights-to-action. In this presentation we will demonstrate Splice Machine's Data Science Workbench and how it leverages Spark and MLflow to create powerful, full-cycle machine learning capabilities on an integrated platform, from transactional updates to data wrangling, experimentation, and deployment, and back again.

Splice Machine's use of Apache Spark and MLflow

Databricks

Introduction to Apache Spark 2.0

Knoldus Inc.

Combining Machine Learning frameworks with Apache Spark

DataWorks Summit/Hadoop Summit

Apache Spark - A High Level overview

Karan Alang

Similar to End-to-End Data Pipelines with Apache Spark (20)

Spark Hsinchu meetup

Databricks Meetup @ Los Angeles Apache Spark User Group

Media_Entertainment_Veriticals

An Insider’s Guide to Maximizing Spark SQL Performance

Fighting Fraud with Apache Spark

Getting started with SparkSQL - Desert Code Camp 2016

Big Data 2.0 - How Spark technologies are reshaping the world of big data ana...

Strata EU 2014: Spark Streaming Case Studies

Serverless spark

Jump Start on Apache® Spark™ 2.x with Databricks

Jumpstart on Apache Spark 2.2 on Databricks

Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform

Deep learning and Apache Spark

AI at Scale

Spark meetup2 final (Taboola)

DataStax & O'Reilly Media: Large Scale Data Analytics with Spark and Cassandr...

Splice Machine's use of Apache Spark and MLflow

Introduction to Apache Spark 2.0

Combining Machine Learning frameworks with Apache Spark

Apache Spark - A High Level overview

Recently uploaded

Direct Style Effect Systems -The Print[A] Example- A Comprehension Aid

Philip Schwarz

WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...

WSO2

%in Soweto+277-882-255-28 abortion pills for sale in soweto

masabamasaba

WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation

WSO2

In the past six months, the AI landscape has undergone a massive transformation, ushering in a new era of productivity with the latest in Large Language Models (LLMs) and AI technology. This deep dive unlocks how to: Create CustomGPT Models: No coding needed to tailor AI for your unique projects. Integrate your own data, including PDFs and Excel sheets, making information handling a breeze. Plus, discover how to call your own actions/integrations for even more personalized utility. Navigate Advanced Prompting: Overcome AI's memory limits and utilize Retrieval-Augmented Generation for accessing your personalized data, streamlining how you interact with AI. Stay Ahead with AI Trends: Peek into the evolving world of LLMs, featuring newcomers like Google Gemini, Anthropic Claude, Open Sora, and Twitter Grok, and understand what their advancements mean for your productivity. Witness Real-Life Transformations: Through examples and prompt demonstrations, see firsthand how these AI strategies revolutionize routine tasks, from data analysis to content creation. Learn to leverage image output and input for advanced practical use cases, adding a new dimension to your productivity toolkit. No previous coding or AI experience is needed for this talk. Stay ahead in the fast-evolving world of work. Embrace the AI revolution and transform your workflow with advanced LLM techniques. Join us to ensure you're not left behind in the productivity race.

AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques

VictorSzoltysek

Craft an AI & Machine Learning Pitch with our Editable Professional PowerPoint Template. Ignite your AI & Machine Learning pitch with our cutting-edge PowerPoint template tailored for the industry. Perfect for AI conferences, investor presentations, sales pitches to tech-focused companies, training sessions, and educational programs. - 20+ editable slides: Get a variety of options to choose from for your presentation. - Time-saving solution: Download, replace text/images with a few clicks. - User-friendly customization: Easy to use and personalize. - Modern and attractive design: Captivating visuals, sleek layout. - Tailored to your requirements: Fully alterable for customization. - Well-organized slides: Complete control over content. - Thematic specificity: Reflects healthcare industry with relevant graphics. - Showcase your business idea: Communicate value proposition effectively.

AI & Machine Learning Presentation Template

Presentation.STUDIO

VTU technical seminar 8Th Sem on Scikit-learn

AmarnathKambale

%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview

masabamasaba

WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...

WSO2

%in kempton park+277-882-255-28 abortion pills for sale in kempton park

masabamasaba

End-to-End Data Pipelines with Apache Spark

1. End-to-End Data Pipelines with Apache Spark
Burak Yavuz
December 27, 2015

2. Who Am I?
• Software Engineer at Databricks
• MS Management Science & Eng. @ Stanford University
• BS Mechanical Eng. @ Bogazici University, Istanbul
• Contributor to Spark Core, MLlib, SQL, and Streaming
• Maintainer of Spark Packages

3. Outline
• Intro - Spark & Ecosystem
• Build an End-to-End Data Product
• Step 1: Understand your Data
• SparkSQL - DataFrames
• Step 2: Build your Service
• SparkMLlib - ML Pipelines
• Step 3: Monitor your Service
• Spark Streaming
• Kafka

4. Timeline of Spark
• 2010: a research paper
• 2010-13: a project under github/mesos
• 2013-14: Apache incubating -> TLP
• 2014: the most active project in the ASF

5. Apache Spark

6. Spark Ecosystem
• 770 contributors
• 6000+ forks on GitHub
• 14000+ commits!
https://github.com/apache/spark

7. http://go.databricks.com/hubfs/DataBricks_Surveys_-_Content/Spark-Survey-2015-Infographic.pdf

8. http://go.databricks.com/hubfs/DataBricks_Surveys_-_Content/Spark-Survey-2015-Infographic.pdf

9. http://go.databricks.com/hubfs/DataBricks_Surveys_-_Content/Spark-Survey-2015-Infographic.pdf

11. Spark Packages
http://spark-packages.org
3rd Party Packages, Community
• a community index of 3rd-party packages
• helps users find packages
• helps package developers meet users
• users provide feedback through voting and commenting
• index maintained by Databricks

12. Types of Packages Currently Available
• Data Source Connectors
• spark-avro, spark-redshift, spark-mongodb, spark-sequoiadb, spark-cassandra-connector, …
• Deployment Scripts
• spark_azure, spark_gce, sbt-spark-ec2
• Machine Learning Algorithms
• spark-hash, spark-mrmr-feature-selection, streaming-matrix-factorization, generalized-kmeans-clustering
• and many more…

13. What’s new in Spark 1.6
• Dataset API
• Automatic memory configuration
• Optimized state storage in Spark Streaming
• Pipeline persistence in Spark ML

14. Demo
Source Code: http://brkyvz.github.io/spark-pipeline
Scenario: As an e-commerce company, we would like to recommend products that users may like in order to increase sales and profit.
Dataset: http://jmcauley.ucsd.edu/data/amazon/
- 18 GB
- 82.83 million reviews
We will use a subset with 24 million reviews

17. Recommendation Engines
• Finding Similar Items
  • Clustering using:
    • Metadata
    • Matrix Factorization
    • Frequent Itemsets
• Ranking
  • Rating Prediction using:
    • Matrix Factorization
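
As a rough illustration of the "Frequent Itemsets" route to finding similar items, the sketch below counts how often two products appear in the same basket and keeps the pairs that reach a minimum support. The baskets, the product names, and the `frequent_pairs` helper are made up for illustration; the demo itself is built on Spark, and a job at that scale would presumably use a distributed algorithm rather than this in-memory loop.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return {(item_a, item_b): count} for pairs seen together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so every pair has a single canonical ordering.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical purchase baskets (not taken from the talk's dataset).
baskets = [
    ["camera", "sd_card", "tripod"],
    ["camera", "sd_card"],
    ["sd_card", "tripod"],
    ["camera", "sd_card", "case"],
]
pairs = frequent_pairs(baskets, min_support=2)
```

Items that share a frequent pair can then be treated as "similar" for recommendation purposes; raising `min_support` trades coverage for confidence.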

18. Architecture
(diagram: Web Service 1, Web Service 2, Web Service 3, Cassandra, Sales Data, Database, Spark, Sales + Ratings, Rating Data, ML Model, Recommendations, Request)

19. Step 1: Understand your Data

20. Step 2: Build your Service

21. Solution Proposal
Use Matrix Factorization to understand customers and items. Then:
1) Predict the rating for a product for a given user
2) Find similar products, and show top k
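
The two steps of the proposal can be sketched in plain Python, assuming the factorization has already produced one latent vector per user and per product. The factor values, the names, and the helper functions below are hypothetical; cosine similarity is one common choice for "similar products" and is an assumption here, not something the slides specify.

```python
import math


def predict_rating(user_vec, item_vec):
    """Predicted rating = dot product of the user's and the item's latent factors."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


def cosine(a, b):
    """Cosine similarity between two factor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(item_id, item_factors, k):
    """Rank all other items by cosine similarity to item_id and keep the best k."""
    target = item_factors[item_id]
    scored = [(other, cosine(target, vec))
              for other, vec in item_factors.items() if other != item_id]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:k]]


# Hypothetical 2-dimensional factors; a trained model would supply these.
user_factors = {"alice": [1.0, 2.0]}
item_factors = {
    "camera": [2.0, 0.5],
    "tripod": [1.9, 0.6],
    "novel": [0.1, 2.0],
    "cookbook": [0.2, 1.8],
}

rating = predict_rating(user_factors["alice"], item_factors["novel"])  # step 1
similar = top_k_similar("camera", item_factors, k=2)                   # step 2
```

Both steps reuse the same item factors, which is why a single factorization can serve rating prediction and similar-product lookup at once.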

22. Matrix Factorization
https://databricks-training.s3.amazonaws.com/slides/Spark_Summit_MLlib_070214_v2.pdf

23. Matrix Factorization
https://databricks-training.s3.amazonaws.com/slides/Spark_Summit_MLlib_070214_v2.pdf

24. https://databricks-training.s3.amazonaws.com/slides/Spark_Summit_MLlib_070214_v2.pdf
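
The matrix factorization slides rely on figures from the linked deck. As a minimal sketch of the underlying idea, the code below learns user and item factors from a handful of ratings so that their dot products approximate the observed values. It swaps in plain stochastic gradient descent for brevity, whereas Spark MLlib's matrix factorization is based on alternating least squares, so this is not the demo's implementation. The ratings, the hyperparameters, and the `factorize` and `rmse` helpers are all assumptions for illustration.

```python
import random


def factorize(ratings, n_factors=2, epochs=200, lr=0.01, reg=0.05, seed=0):
    """Learn user and item factor vectors from (user, item, rating) triples with SGD."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    # Small random starting factors (sorted keys keep the run reproducible).
    P = {u: [rng.uniform(0.0, 0.5) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.uniform(0.0, 0.5) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for f in range(n_factors):
                p, q = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * q - reg * p)
                Q[i][f] += lr * (err * p - reg * q)
    return P, Q


def rmse(ratings, P, Q):
    """Root-mean-square error of the factor model on the given ratings."""
    total = sum((r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2
                for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5


# Hypothetical ratings on a 1-5 scale (not from the talk's dataset).
ratings = [
    ("alice", "camera", 5.0), ("alice", "tripod", 4.0), ("alice", "novel", 1.0),
    ("bob", "camera", 4.0), ("bob", "novel", 2.0),
    ("carol", "novel", 5.0), ("carol", "cookbook", 4.0), ("carol", "camera", 1.0),
]
P, Q = factorize(ratings)
```

Calling `rmse(ratings, P, Q)` before and after training is a quick way to check that the factors are actually fitting the observed ratings.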

25. Step 3: Monitor your Service

26. Apache Kafka
• Distributed messaging system
• High-throughput
• Fast
• Scalable
• Durable
• http://kafka.apache.org/

27. Architecture
(diagram: Web Service 1, Web Service 2, Web Service 3, Kafka, Spark Streaming)

28. Architecture
(diagram: Web Service 1, Web Service 2, Web Service 3, Kafka, Spark Streaming)
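
The diagram suggests that the web services feed events into Kafka and that a Spark Streaming job consumes them for monitoring. The sketch below mimics only the aggregation such a job might perform, in plain Python: it buckets events into fixed, non-overlapping time windows and counts them per service. The event tuples, the `window_counts` helper, and the 10-second window are assumptions for illustration, not details from the talk, and no Kafka or Spark API is used.

```python
from collections import defaultdict


def window_counts(events, window_seconds):
    """Count events per (window_start, service) using fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for timestamp, service in events:
        # Align each timestamp to the start of the window it falls into.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, service)] += 1
    return dict(counts)


# Hypothetical (timestamp_in_seconds, service_name) request events.
events = [
    (0, "web-1"), (3, "web-2"), (9, "web-1"),
    (10, "web-1"), (14, "web-3"), (19, "web-3"),
    (21, "web-2"),
]
per_window = window_counts(events, window_seconds=10)
```

In a streaming setting the same grouping would run continuously over each incoming batch, so a sudden drop or spike in a service's per-window count becomes visible shortly after it happens.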

29. Thank you. burak@databricks.com