Apache Spark for .Net Devs

The document discusses Apache Spark, a unified analytics engine for large-scale data processing. It introduces .Net for Apache Spark, which provides .Net language bindings for Spark. It also mentions using the MovieLens dataset with Spark on Azure Synapse Analytics. Key components of Spark include RDDs, DataFrames, SparkSession, and transformations/actions. The document provides an overview of Spark and demonstrates it through a movie recommendation example on Azure Synapse Analytics.

Nilesh Gule
@nileshgule | www.HandsOnArchitect.com
Big Data for .Net Devs
with
Apache Spark

$whoami
{
"name" : "Nilesh Gule",
"website" : "https://www.HandsOnArchitect.com",
"github" : "https://github.com/NileshGule",
"twitter" : "@nileshgule",
"linkedin" : "https://www.linkedin.com/in/nileshgule",
"likes" : "Technical Evangelism, Cricket",
"co-organizer" : "Azure Singapore UG"
}

What is Apache Spark
https://spark.apache.org/

Apache Spark Data Sources
https://posts.specterops.io/threat-hunting-with-jupyter-notebooks-part-3-querying-elasticsearch-via-apache-spark-670054cd9d47

Benefits of using Apache Spark
• Speed
  • Up to 100x faster than MapReduce
• Ease of use
  • Easy-to-use APIs
  • Multi-language support
  • 100+ operators
• Unified engine
  • Higher-level libraries and support for SQL queries, streaming data, machine learning, and graph processing
• Runs everywhere
  • Hadoop, standalone, Mesos, Kubernetes, cloud
https://databricks.com/blog/2014/11/05/spark-officially-sets-a-new-record-in-large-scale-sorting.html

Apache Spark Components
• Dataset, DataFrame, RDD
  • Distributed collections of data
• SparkSession
  • Entry point into the Spark API
  • SparkContext, SQLContext, StreamingContext unified into one
• Executors
  • Handle distributed processing
• Transformations & Actions
  • Transformations: lazy operations that return immutable data structures
  • Actions: apply operations and return a value or write data to external storage
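The difference between lazy transformations and actions can be sketched without a cluster. The following is a plain-Python analogy using generators, not the Spark API; the names lazy_map, lazy_filter, collect and log are hypothetical helpers invented for this illustration.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical helper: records every element actually processed.
log: List[str] = []


def lazy_map(func: Callable[[int], int], source: Iterable[int]) -> Iterator[int]:
    """Stand-in for a transformation: returns a generator, so nothing runs yet."""
    for item in source:
        log.append(f"map({item})")
        yield func(item)


def lazy_filter(pred: Callable[[int], bool], source: Iterable[int]) -> Iterator[int]:
    """Another transformation stand-in, chained on top of the previous one."""
    for item in source:
        log.append(f"filter({item})")
        if pred(item):
            yield item


def collect(source: Iterable[int]) -> List[int]:
    """Stand-in for an action: forces evaluation and returns a concrete value."""
    return list(source)


numbers = [1, 2, 3, 4]

# Building the pipeline only describes the work; no element is touched.
pipeline = lazy_filter(lambda x: x > 4, lazy_map(lambda x: x * 2, numbers))
work_before_action = len(log)

# The action pulls every element through the whole chain.
result = collect(pipeline)
work_after_action = len(log)

print(work_before_action, result, work_after_action)
```

Nothing is logged until collect runs, which mirrors how Spark defers transformations until an action asks for a result.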

Spark Common Transformations
• map
• flatMap
• filter
• distinct
• sample(withReplacement, ..)
• union
• intersection
• subtract
• cartesian
• reduceByKey
• groupByKey
• sortByKey
• join
• repartition
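What a few of these transformations compute can be shown on an ordinary in-memory list. This is a local, single-machine model in plain Python and not Spark code; reduce_by_key and group_by_key are hypothetical helpers that imitate the semantics of reduceByKey and groupByKey.

```python
from collections import defaultdict
from itertools import chain
from typing import Callable, Dict, Iterable, List, Tuple

lines = ["spark is fast", "spark is unified"]

# flatMap: each input element may produce zero or more output elements.
words = list(chain.from_iterable(line.split() for line in lines))

# map: exactly one output element per input element.
pairs = [(word, 1) for word in words]

# filter: keep only the elements that satisfy a predicate.
long_pairs = [(word, n) for (word, n) in pairs if len(word) > 2]


def reduce_by_key(func: Callable[[int, int], int],
                  kv_pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Merge the values of each key pairwise with func (models reduceByKey)."""
    merged: Dict[str, int] = {}
    for key, value in kv_pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


def group_by_key(kv_pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Collect all values of each key into a list (models groupByKey)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in kv_pairs:
        grouped[key].append(value)
    return dict(grouped)


counts = reduce_by_key(lambda a, b: a + b, long_pairs)
grouped = group_by_key(long_pairs)

# distinct: remove duplicates (sorted here only to make the output stable).
distinct_words = sorted(set(words))

print(counts, grouped, distinct_words)
```

The model also hints at why reduceByKey is usually preferred for aggregations: it keeps one running value per key, whereas groupByKey materializes every value before anything is combined.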

Spark Common Actions
• collect
• count
• countByValue
• take(num)
• top(num)
• reduce(func)
• fold(zero)(func)
• saveAsTextFile(path)
• saveAsSequenceFile(path)
• countByKey()
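The values these actions return can be modelled locally. The sketch below is plain Python over hand-made "partitions", not Spark code; the partitions list and the fold helper are assumptions for illustration. It also shows why the zero value passed to fold should be the identity of the function: Spark applies it once per partition and once more when merging the partition results.

```python
import heapq
from collections import Counter
from functools import reduce
from typing import Callable, List

# Hypothetical data, split into two partitions as a cluster might hold it.
partitions: List[List[int]] = [[3, 1, 4, 1], [5, 9, 2, 6]]
data = [x for part in partitions for x in part]

count = len(data)                          # count
count_by_value = Counter(data)             # countByValue
take_3 = data[:3]                          # take(3): first elements, unsorted
top_3 = heapq.nlargest(3, data)            # top(3): largest elements
total = reduce(lambda a, b: a + b, data)   # reduce(func)


def fold(zero: int, func: Callable[[int, int], int],
         parts: List[List[int]]) -> int:
    """Model of fold: zero seeds every partition and the final merge."""
    per_partition = [reduce(func, part, zero) for part in parts]
    return reduce(func, per_partition, zero)


folded_identity = fold(0, lambda a, b: a + b, partitions)
folded_non_identity = fold(1, lambda a, b: a + b, partitions)

print(count, take_3, top_3, total, folded_identity, folded_non_identity)
```

With zero set to 1 the result drifts by one for each partition plus one for the merge, so the outcome would depend on how the data happens to be partitioned.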

What is .Net for Apache Spark
• .Net bindings for Spark, written on the Spark interop layer
• Provides high-performance bindings for C# and F#
• Compliant with .NET Standard
https://devblogs.microsoft.com/dotnet/introducing-net-for-apache-spark/#performance

Demo
• MovieLens Dataset
• CSV files in Azure Data Lake Storage
• Spark pools using Azure Synapse Analytics
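The slides do not include the demo code, so the following is only a guess at the kind of aggregation such a demo might run: an average rating per movie. It is plain Python over a few made-up rows, assuming the userId,movieId,rating,timestamp column layout of the MovieLens ratings file; the sample values and the average_rating_per_movie helper are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict

# Made-up sample rows in the assumed MovieLens ratings column layout.
SAMPLE_RATINGS_CSV = """userId,movieId,rating,timestamp
1,10,4.0,964982703
2,10,5.0,964981247
1,20,3.0,964982224
3,20,2.0,964983815
3,30,4.5,964982931
"""


def average_rating_per_movie(csv_text: str) -> Dict[str, float]:
    """Group ratings by movieId and average them (a local group-and-aggregate)."""
    totals = defaultdict(lambda: [0.0, 0])  # movieId -> [sum of ratings, count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = totals[row["movieId"]]
        entry[0] += float(row["rating"])
        entry[1] += 1
    return {movie: rating_sum / n for movie, (rating_sum, n) in totals.items()}


averages = average_rating_per_movie(SAMPLE_RATINGS_CSV)
print(averages)
```

On a Spark pool the same shape of computation would read the CSV files from the data lake into a DataFrame and group by movieId, with the work spread across executors instead of a single loop.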

Summary
• Apache Spark is great for Big Data Analytics
• .Net for Apache Spark provides .Net language bindings to Spark
• Azure Synapse Analytics has native support for C#

• Apache Spark
• .Net for Apache Spark
• MovieLens datasets
• Azure Synapse Analytics

https://youtu.be/KhMKXQkIzKw
https://channel9.msdn.com/Series/NET-for-Apache-Spark-101

Thank you very much
Code with Passion and Strive for Excellence
https://www.slideshare.net/nileshgule/presentations
https://speakerdeck.com/nileshgule/

Nilesh Gule
ARCHITECT | MICROSOFT MVP
“Code with Passion and
Strive for Excellence”
nileshgule @nileshgule Nilesh Gule
NileshGule
www.handsonarchitect.com

Q&A
