What is Spark?
Features of Apache Spark
Speed
Supports multiple languages
Advanced Analytics
Spark Built on Hadoop
Components of Spark
What is RDD?
Iterative Operations on Spark RDD
Interactive Operations on Spark RDD
Spark Streaming
What is SparkContext?
Initializing a SparkContext in Scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
// Run locally on a single thread and give the application a name.
val conf = new SparkConf().setMaster("local").setAppName("My App")
val sc = new SparkContext(conf)
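Once a SparkContext exists, it is the entry point for creating RDDs. The following is a minimal sketch of that step, not part of the original slides; the object name `RddSketch` and the sample numbers are illustrative assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddSketch {
  // Squares of the even numbers in 1..10, computed as an RDD pipeline.
  def evenSquares(sc: SparkContext): Array[Int] = {
    // Distribute a small local collection as an RDD.
    val numbers = sc.parallelize(1 to 10)
    // Transformations are lazy: nothing runs until an action is called.
    val squares = numbers.filter(_ % 2 == 0).map(n => n * n)
    // collect() is an action: it runs the job and returns the results to the driver.
    squares.collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("RDD Sketch")
    val sc = new SparkContext(conf)
    try println(evenSquares(sc).mkString(", "))
    finally sc.stop()
  }
}
```

In spark-shell the context is already available as `sc`, so only the body of `evenSquares` would need to be typed.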
Limitations of Hive
• Hive runs queries as MapReduce jobs, which perform poorly on small and medium-sized datasets (< 200 GB).
• No resume capability: if processing fails partway through a workflow, it cannot be resumed from the point of failure.
• Hive cannot drop encrypted databases.
(Spark SQL, which runs on top of Spark, was built to overcome these limitations of Apache Hive.)
What is Spark SQL?
Features of Spark SQL
Integrated
Unified Data Access
Hive Compatibility
Standard Connectivity
Scalability
Advantages of Spark SQL
Spark SQL Architecture
Spark SQL-DataFrames
Spark SQLContext
Spark SQL – Data Sources
The DataFrame interface lets Spark SQL work with different data sources. A DataFrame can be operated on like a normal RDD, and it can also be registered as a temporary table. Registering a DataFrame as a table allows you to run SQL queries over its data.
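As a rough illustration of registering a DataFrame and querying it with SQL, here is a small sketch. It assumes Spark 2.0 or later, where `SparkSession` is the entry point and `createOrReplaceTempView` replaces the older `registerTempTable`. The `people` data and the object name are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object TempViewSketch {
  // Registers a small DataFrame as a temporary view and queries it with SQL.
  def adultNames(spark: SparkSession): Array[String] = {
    import spark.implicits._
    // Hypothetical in-memory rows standing in for a real data source.
    val people = Seq(("Asha", 31), ("Ravi", 24), ("Meena", 45)).toDF("name", "age")
    // Give the DataFrame a table name that SQL statements can refer to.
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age > 30 ORDER BY name")
      .as[String]
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("Temp View Sketch").getOrCreate()
    try println(adultNames(spark).mkString(", "))
    finally spark.stop()
  }
}
```

The temporary view lives only as long as the session that created it; nothing is written to disk.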
Spark SQL supports several types of data sources, including:
• JSON datasets
• Hive tables
• Parquet files
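The sources listed above are all read through the same `spark.read` interface. The sketch below assumes Spark 2.0 or later and writes its own small files to a temporary directory so that it is self-contained; the column names and the object name are illustrative only.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object DataSourcesSketch {
  // Writes a small DataFrame as JSON and as Parquet under `dir`, then reads both back.
  def roundTrip(spark: SparkSession, dir: String): (DataFrame, DataFrame) = {
    import spark.implicits._
    val df = Seq((1, "a"), (2, "b")).toDF("id", "label")
    // JSON: one JSON object per line; the schema is inferred on read.
    df.write.json(s"$dir/json")
    // Parquet: columnar format; the schema is stored in the files.
    df.write.parquet(s"$dir/parquet")
    (spark.read.json(s"$dir/json"), spark.read.parquet(s"$dir/parquet"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("Data Sources Sketch").getOrCreate()
    try {
      val dir = Files.createTempDirectory("spark-sources").toString
      val (fromJson, fromParquet) = roundTrip(spark, dir)
      fromJson.printSchema()
      fromParquet.printSchema()
      // Hive tables: with a session built using .enableHiveSupport() (this needs the
      // spark-hive module on the classpath), an existing table can be loaded by name
      // with spark.table(...).
    } finally spark.stop()
  }
}
```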
Data Analysis Flow Diagram
Fig: Spark SQL Flow Chart
SWITCHING TO SPARK-SHELL
SWITCHING FROM SPARKCONTEXT TO SQLCONTEXT
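The slide for this step is a screenshot, so the exact command is not recoverable. One plausible form of it, wrapping the existing SparkContext in a `SQLContext`, is sketched below. The `SQLContext(sc)` constructor is deprecated since Spark 2.0 in favour of `SparkSession`, but it matches the API used elsewhere in these slides. The object and method names are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SqlContextSketch {
  // Wraps an existing SparkContext in a SQLContext so SQL can be run against it.
  def sqlContextFor(sc: SparkContext): SQLContext = new SQLContext(sc)

  def main(args: Array[String]): Unit = {
    // In spark-shell, `sc` already exists; it is built here only to keep the sketch self-contained.
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("SQLContext Sketch"))
    try {
      val sqlContext = sqlContextFor(sc)
      sqlContext.sql("SELECT 1 AS one").show()
    } finally sc.stop()
  }
}
```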
IMPORTING ALL PACKAGES (TO CONVERT RDD TO DATASET)
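The import that enables this conversion is the session's `implicits` object, which adds `toDF` and `toDS` to RDDs of case classes. The sketch below assumes Spark 2.0 or later; the `Attack` case class and its fields are hypothetical stand-ins for the real record type, which the slides do not show.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type; the case class fields become the column names.
case class Attack(country: String, gang: String)

object RddToDatasetSketch {
  def toDataset(spark: SparkSession): Dataset[Attack] = {
    // Brings toDF/toDS and the encoders for case classes into scope.
    import spark.implicits._
    // Placeholder values only, not real data.
    val rdd = spark.sparkContext.parallelize(Seq(Attack("CountryA", "GroupA"), Attack("CountryB", "GroupB")))
    rdd.toDS()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("RDD to Dataset Sketch").getOrCreate()
    try toDataset(spark).printSchema()
    finally spark.stop()
  }
}
```

With the older `SQLContext` API, the equivalent import would be `import sqlContext.implicits._`.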
READING THE FILE AND CHECKING THE SCHEMA
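The file format and path used in the slides are not given. Assuming a CSV file with a header row, reading it and checking the schema might look like the sketch below (Spark 2.0 or later). The tiny file written here, including its column names, is a hypothetical stand-in for the real dataset.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadCsvSketch {
  // Reads a CSV file, taking column names from the header and inferring column types.
  def readCsv(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("Read CSV Sketch").getOrCreate()
    try {
      // Stand-in file with placeholder rows; the real dataset path is not given in the slides.
      val path = Files.createTempFile("attacks", ".csv")
      Files.write(path, "country,gang,killed\nCountryA,GroupA,3\nCountryA,GroupB,0\n".getBytes("UTF-8"))
      val df = readCsv(spark, path.toString)
      // Schema check: prints each column with its inferred type.
      df.printSchema()
    } finally spark.stop()
  }
}
```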
scala> df.registerTempTable("Terror")
The DataFrame has to be registered as a temporary table before it can be queried with SQL.
Classify attacks on the basis of gang name.
Classify the motive of attack on India.
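The queries behind these two slides are not shown in the text. A sketch of how they could be written against the `Terror` table follows, assuming Spark 2.0 or later. The column names `country`, `gang_name` and `motive`, and all row values, are hypothetical placeholders rather than the real dataset's schema or contents.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object AttackQueriesSketch {
  // Registers placeholder rows under the table name used in the slides.
  def registerSample(spark: SparkSession): Unit = {
    import spark.implicits._
    Seq(
      ("India", "GroupA", "MotiveX"),
      ("India", "GroupB", "MotiveY"),
      ("India", "GroupA", "MotiveX"),
      ("Other", "GroupA", "MotiveZ")
    ).toDF("country", "gang_name", "motive").createOrReplaceTempView("Terror")
  }

  // Number of attacks per gang, most active first.
  def attacksByGang(spark: SparkSession): DataFrame =
    spark.sql("SELECT gang_name, COUNT(*) AS attacks FROM Terror GROUP BY gang_name ORDER BY attacks DESC")

  // Motives recorded for attacks in India, with how often each occurs.
  def motivesInIndia(spark: SparkSession): DataFrame =
    spark.sql("SELECT motive, COUNT(*) AS attacks FROM Terror WHERE country = 'India' GROUP BY motive ORDER BY attacks DESC")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("Attack Queries Sketch").getOrCreate()
    try {
      registerSample(spark)
      attacksByGang(spark).show()
      motivesInIndia(spark).show()
    } finally spark.stop()
  }
}
```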
Conclusion
References
Useful Links on Spark SQL
• Spark SQL Wiki: Wikipedia reference for Spark SQL.
• https://data-flair.training/blogs/spark-sql-tutorial/
Useful Books on Spark SQL
• O'Reilly: Learning Spark by Holden Karau, Andy Konwinski, Patrick Wendell & Matei Zaharia
• O'Reilly: Advanced Analytics with Spark by Sandy Ryza, Uri Laserson, Sean Owen & Josh Wills