SlideShare a Scribd company logo
Spark SQL
Code Examples
Background
• Spark SQL is Spark's module for
working with structured data.
• Spark SQL lets you query structured
data inside Spark programs, using
either SQL or a familiar DataFrame API.
Usable in Java, Scala, Python and R.
• Born out of Shark project at Berkeley
Assumptions
These slides and examples assume you
already have at least a basic understanding
of Spark constructs such as RDDs, Actions,
Transformers.
Resources
To learn more about Spark, checkout
supergloo’s free Spark Tutorials
Introduction
• DataFrames are a kind of Resilient Distributed Data Set
• DataFrames are composed of Row objects accompanied
with schema which describes the data types of each
column.
• A DataFrame may be considered similar to a table in a
traditional relational database
1. $SPARK_HOME/bin/spark-shell --packages
com.databricks:spark-csv_2.10:1.3.0
2. scala>val baby_names =
sqlContext.read.format("com.databricks.spark.csv").option("he
ader", "true").option("inferSchema",
“true").load("baby_names.csv")
3. scala> baby_names.registerTempTable(“names")
4. scala> val distinctYears = sqlContext.sql("select distinct Year
from names”)
5. scala> distinctYears.collect.foreach(println)
Spark SQL with CSV
JSON in following examples:
{"first_name":"James", "last_name":"Butterburg", "address":
{"street": "6649 N Blue Gum St", "city": "New Orleans","state":
"LA", "zip": "70116" }}
{"first_name":"Josephine", "last_name":"Darakjy", "address":
{"street": "4 B Blue Ridge Blvd", "city": "Brighton","state": "MI",
"zip": "48116" }}
{"first_name":"Art", "last_name":"Chemel", "address": {"street": "8
W Cerritos Ave #54", "city": "Bridgeport","state": "NJ", "zip":
"08014" }}
Spark SQL with JSON (slide 1 of 2)
1. $SPARK_HOME/bin/spark-shell
2. scala> val customers =
sqlContext.jsonFile(“customers.json")
3. scala> customers.registerTempTable(“customers")
4. scala> val firstCityState = sqlContext.sql("SELECT
first_name, address.city, address.state FROM
customers")
Spark SQL with JSON (slide 2 of 2)
Requirements
1. MySQL instance
2. MySQL JDBC driver
Spark SQL with JDBC mySQL (slide 1 of 2)
1. $SPARK_HOME/bin/spark-shell –jars mysql-connector-
java-5.1.26.jar
2. val dataframe_mysql = sqlContext.read.format("jdbc").option("url",
"jdbc:mysql://localhost/sparksql").option("driver",
"com.mysql.jdbc.Driver").option("dbtable",
"baby_names").option("user", "root").option("password",
“root").load()
3. scala> dataframe_mysql.registerTempTable(“names")
4. scala> dataframe_mysql.sqlContext.sql("select * from
names”).collect.foreach(println)
Spark SQL with JDBC mySQL (slide 2 of 2)
Conclusion
For more Spark SQL and other Spark tutorials visit:
http://www.supergloo.com/
Credit
Title slide image: https://flic.kr/p/8wFrUX

More Related Content

What's hot

Spark SQL - 10 Things You Need to Know
Spark SQL - 10 Things You Need to KnowSpark SQL - 10 Things You Need to Know
Spark SQL - 10 Things You Need to Know
Kristian Alexander
 
Tachyon-2014-11-21-amp-camp5
Tachyon-2014-11-21-amp-camp5Tachyon-2014-11-21-amp-camp5
Tachyon-2014-11-21-amp-camp5
Haoyuan Li
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
Caserta
 
Spark meetup v2.0.5
Spark meetup v2.0.5Spark meetup v2.0.5
Spark meetup v2.0.5
Yan Zhou
 
DataEngConf SF16 - Spark SQL Workshop
DataEngConf SF16 - Spark SQL WorkshopDataEngConf SF16 - Spark SQL Workshop
DataEngConf SF16 - Spark SQL Workshop
Hakka Labs
 
Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFrames
Spark Summit
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
Databricks
 
Spark sql
Spark sqlSpark sql
Spark sql
Freeman Zhang
 
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on TutorialsSparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Databricks
 
Intro to Spark and Spark SQL
Intro to Spark and Spark SQLIntro to Spark and Spark SQL
Intro to Spark and Spark SQL
jeykottalam
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
Databricks
 
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
Databricks
 
Spark sql
Spark sqlSpark sql
Spark sql
Zahra Eskandari
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
Joud Khattab
 
Pivoting Data with SparkSQL by Andrew Ray
Pivoting Data with SparkSQL by Andrew RayPivoting Data with SparkSQL by Andrew Ray
Pivoting Data with SparkSQL by Andrew Ray
Spark Summit
 
Spark etl
Spark etlSpark etl
Spark etl
Imran Rashid
 
Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...
 Lessons from the Field, Episode II: Applying Best Practices to Your Apache S... Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...
Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...
Databricks
 
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst OptimizerDeep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Sachin Aggarwal
 
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
Edureka!
 
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
Jason L Brugger
 

What's hot (20)

Spark SQL - 10 Things You Need to Know
Spark SQL - 10 Things You Need to KnowSpark SQL - 10 Things You Need to Know
Spark SQL - 10 Things You Need to Know
 
Tachyon-2014-11-21-amp-camp5
Tachyon-2014-11-21-amp-camp5Tachyon-2014-11-21-amp-camp5
Tachyon-2014-11-21-amp-camp5
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 
Spark meetup v2.0.5
Spark meetup v2.0.5Spark meetup v2.0.5
Spark meetup v2.0.5
 
DataEngConf SF16 - Spark SQL Workshop
DataEngConf SF16 - Spark SQL WorkshopDataEngConf SF16 - Spark SQL Workshop
DataEngConf SF16 - Spark SQL Workshop
 
Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFrames
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Spark sql
Spark sqlSpark sql
Spark sql
 
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on TutorialsSparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
 
Intro to Spark and Spark SQL
Intro to Spark and Spark SQLIntro to Spark and Spark SQL
Intro to Spark and Spark SQL
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
 
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
 
Spark sql
Spark sqlSpark sql
Spark sql
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 
Pivoting Data with SparkSQL by Andrew Ray
Pivoting Data with SparkSQL by Andrew RayPivoting Data with SparkSQL by Andrew Ray
Pivoting Data with SparkSQL by Andrew Ray
 
Spark etl
Spark etlSpark etl
Spark etl
 
Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...
 Lessons from the Field, Episode II: Applying Best Practices to Your Apache S... Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...
Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...
 
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst OptimizerDeep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
 
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
 
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
 

Viewers also liked

Interview questions on Apache spark [part 2]
Interview questions on Apache spark [part 2]Interview questions on Apache spark [part 2]
Interview questions on Apache spark [part 2]
knowbigdata
 
10 Popular Hadoop Technical Interview Questions
10 Popular Hadoop Technical Interview Questions10 Popular Hadoop Technical Interview Questions
10 Popular Hadoop Technical Interview Questions
ZaranTech LLC
 
Hadoop hdfs interview questions
Hadoop hdfs interview questionsHadoop hdfs interview questions
Hadoop hdfs interview questions
Kalyan Hadoop
 
Top 10 architect interview questions and answers
Top 10 architect interview questions and answersTop 10 architect interview questions and answers
Top 10 architect interview questions and answers
WhitneyHouston012
 
5 things one must know about spark!
5 things one must know about spark!5 things one must know about spark!
5 things one must know about spark!
Edureka!
 
Hadoop interview questions
Hadoop interview questionsHadoop interview questions
Hadoop interview questions
Kalyan Hadoop
 
Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answers
Kalyan Hadoop
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Cloudera, Inc.
 

Viewers also liked (8)

Interview questions on Apache spark [part 2]
Interview questions on Apache spark [part 2]Interview questions on Apache spark [part 2]
Interview questions on Apache spark [part 2]
 
10 Popular Hadoop Technical Interview Questions
10 Popular Hadoop Technical Interview Questions10 Popular Hadoop Technical Interview Questions
10 Popular Hadoop Technical Interview Questions
 
Hadoop hdfs interview questions
Hadoop hdfs interview questionsHadoop hdfs interview questions
Hadoop hdfs interview questions
 
Top 10 architect interview questions and answers
Top 10 architect interview questions and answersTop 10 architect interview questions and answers
Top 10 architect interview questions and answers
 
5 things one must know about spark!
5 things one must know about spark!5 things one must know about spark!
5 things one must know about spark!
 
Hadoop interview questions
Hadoop interview questionsHadoop interview questions
Hadoop interview questions
 
Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answers
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
 

Similar to Spark SQL with Scala Code Examples

Learning spark ch09 - Spark SQL
Learning spark ch09 - Spark SQLLearning spark ch09 - Spark SQL
Learning spark ch09 - Spark SQL
phanleson
 
Building highly scalable data pipelines with Apache Spark
Building highly scalable data pipelines with Apache SparkBuilding highly scalable data pipelines with Apache Spark
Building highly scalable data pipelines with Apache Spark
Martin Toshev
 
Intro to Spark
Intro to SparkIntro to Spark
Intro to Spark
Kyle Burke
 
Spark core
Spark coreSpark core
Spark core
Prashant Gupta
 
Enabling exploratory data science with Spark and R
Enabling exploratory data science with Spark and REnabling exploratory data science with Spark and R
Enabling exploratory data science with Spark and R
Databricks
 
Let's start with Spark
Let's start with SparkLet's start with Spark
Let's start with Spark
Milos Milovanovic
 
Spark from the Surface
Spark from the SurfaceSpark from the Surface
Spark from the Surface
Josi Aranda
 
Cassandra Lunch #89: Semi-Structured Data in Cassandra
Cassandra Lunch #89: Semi-Structured Data in CassandraCassandra Lunch #89: Semi-Structured Data in Cassandra
Cassandra Lunch #89: Semi-Structured Data in Cassandra
Anant Corporation
 
Apache spark online training - GoLogica
Apache spark online training - GoLogicaApache spark online training - GoLogica
Apache spark online training - GoLogica
GoLogica Technologies
 
Introduction to SparkR
Introduction to SparkRIntroduction to SparkR
Introduction to SparkR
Olgun Aydın
 
Introduction to SparkR
Introduction to SparkRIntroduction to SparkR
Introduction to SparkR
Ankara Big Data Meetup
 
Strata NYC 2015 - Supercharging R with Apache Spark
Strata NYC 2015 - Supercharging R with Apache SparkStrata NYC 2015 - Supercharging R with Apache Spark
Strata NYC 2015 - Supercharging R with Apache Spark
Databricks
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
sudhakara st
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
Venkata Naga Ravi
 
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Simplilearn
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Databricks
 
실시간 Streaming using Spark and Kafka 강의교재
실시간 Streaming using Spark and Kafka 강의교재실시간 Streaming using Spark and Kafka 강의교재
실시간 Streaming using Spark and Kafka 강의교재
hkyoon2
 
Apache Spark Tutorial
Apache Spark TutorialApache Spark Tutorial
Apache Spark Tutorial
Farzad Nozarian
 
Apache Spark Tutorial
Apache Spark TutorialApache Spark Tutorial
Apache Spark Tutorial
Ahmet Bulut
 
Machine Learning with SparkR
Machine Learning with SparkRMachine Learning with SparkR
Machine Learning with SparkR
Olgun Aydın
 

Similar to Spark SQL with Scala Code Examples (20)

Learning spark ch09 - Spark SQL
Learning spark ch09 - Spark SQLLearning spark ch09 - Spark SQL
Learning spark ch09 - Spark SQL
 
Building highly scalable data pipelines with Apache Spark
Building highly scalable data pipelines with Apache SparkBuilding highly scalable data pipelines with Apache Spark
Building highly scalable data pipelines with Apache Spark
 
Intro to Spark
Intro to SparkIntro to Spark
Intro to Spark
 
Spark core
Spark coreSpark core
Spark core
 
Enabling exploratory data science with Spark and R
Enabling exploratory data science with Spark and REnabling exploratory data science with Spark and R
Enabling exploratory data science with Spark and R
 
Let's start with Spark
Let's start with SparkLet's start with Spark
Let's start with Spark
 
Spark from the Surface
Spark from the SurfaceSpark from the Surface
Spark from the Surface
 
Cassandra Lunch #89: Semi-Structured Data in Cassandra
Cassandra Lunch #89: Semi-Structured Data in CassandraCassandra Lunch #89: Semi-Structured Data in Cassandra
Cassandra Lunch #89: Semi-Structured Data in Cassandra
 
Apache spark online training - GoLogica
Apache spark online training - GoLogicaApache spark online training - GoLogica
Apache spark online training - GoLogica
 
Introduction to SparkR
Introduction to SparkRIntroduction to SparkR
Introduction to SparkR
 
Introduction to SparkR
Introduction to SparkRIntroduction to SparkR
Introduction to SparkR
 
Strata NYC 2015 - Supercharging R with Apache Spark
Strata NYC 2015 - Supercharging R with Apache SparkStrata NYC 2015 - Supercharging R with Apache Spark
Strata NYC 2015 - Supercharging R with Apache Spark
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
 
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
 
실시간 Streaming using Spark and Kafka 강의교재
실시간 Streaming using Spark and Kafka 강의교재실시간 Streaming using Spark and Kafka 강의교재
실시간 Streaming using Spark and Kafka 강의교재
 
Apache Spark Tutorial
Apache Spark TutorialApache Spark Tutorial
Apache Spark Tutorial
 
Apache Spark Tutorial
Apache Spark TutorialApache Spark Tutorial
Apache Spark Tutorial
 
Machine Learning with SparkR
Machine Learning with SparkRMachine Learning with SparkR
Machine Learning with SparkR
 

Recently uploaded

DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
SaffaIbrahim1
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
xclpvhuk
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
 
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
wyddcwye1
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 

Recently uploaded (20)

DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 

Spark SQL with Scala Code Examples

  • 2. Background • Spark SQL is Spark's module for working with structured data. • Spark SQL lets you query structured data inside Spark programs, using either SQL or a familiar DataFrame API. Usable in Java, Scala, Python and R. • Born out of Shark project at Berkeley
  • 3. Assumptions These slides and examples assume you already have at least a basic understanding of Spark constructs such as RDDs, Actions, Transformers.
  • 4. Resources To learn more about Spark, checkout supergloo’s free Spark Tutorials
  • 5. Introduction • DataFrames are a kind of Resilient Distributed Data Set • DataFrames are composed of Row objects accompanied with schema which describes the data types of each column. • A DataFrame may be considered similar to a table in a traditional relational database
  • 6. 1. $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.3.0 2. scala>val baby_names = sqlContext.read.format("com.databricks.spark.csv").option("he ader", "true").option("inferSchema", “true").load("baby_names.csv") 3. scala> baby_names.registerTempTable(“names") 4. scala> val distinctYears = sqlContext.sql("select distinct Year from names”) 5. scala> distinctYears.collect.foreach(println) Spark SQL with CSV
  • 7. JSON in following examples: {"first_name":"James", "last_name":"Butterburg", "address": {"street": "6649 N Blue Gum St", "city": "New Orleans","state": "LA", "zip": "70116" }} {"first_name":"Josephine", "last_name":"Darakjy", "address": {"street": "4 B Blue Ridge Blvd", "city": "Brighton","state": "MI", "zip": "48116" }} {"first_name":"Art", "last_name":"Chemel", "address": {"street": "8 W Cerritos Ave #54", "city": "Bridgeport","state": "NJ", "zip": "08014" }} Spark SQL with JSON (slide 1 of 2)
  • 8. 1. $SPARK_HOME/bin/spark-shell 2. scala> val customers = sqlContext.jsonFile(“customers.json") 3. scala> customers.registerTempTable(“customers") 4. scala> val firstCityState = sqlContext.sql("SELECT first_name, address.city, address.state FROM customers") Spark SQL with JSON (slide 2 of 2)
  • 9. Requirements 1. MySQL instance 2. MySQL JDBC driver Spark SQL with JDBC mySQL (slide 1 of 2)
  • 10. 1. $SPARK_HOME/bin/spark-shell –jars mysql-connector- java-5.1.26.jar 2. val dataframe_mysql = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://localhost/sparksql").option("driver", "com.mysql.jdbc.Driver").option("dbtable", "baby_names").option("user", "root").option("password", “root").load() 3. scala> dataframe_mysql.registerTempTable(“names") 4. scala> dataframe_mysql.sqlContext.sql("select * from names”).collect.foreach(println) Spark SQL with JDBC mySQL (slide 2 of 2)
  • 11. Conclusion For more Spark SQL and other Spark tutorials visit: http://www.supergloo.com/
  • 12. Credit Title slide image: https://flic.kr/p/8wFrUX