Apache Spark is an open-source cluster computing framework for processing large datasets in parallel. It supports multiple languages (Scala, Java, Python, and R) and provides advanced analytics capabilities. Spark SQL was built to overcome limitations of Apache Hive: it runs on Spark and provides a unified data access layer, SQL support, and better performance on small and medium-sized datasets. Spark SQL uses DataFrames and a SQLContext (superseded by SparkSession as the entry point since Spark 2.0) to run SQL queries against different data sources such as JSON files, Hive tables, and Parquet files. Its architecture is scalable, and it integrates with Spark's RDD API, so a DataFrame can be converted to an RDD and back.