Apache Spark is an in-memory cluster computing framework for processing large datasets across clusters of computers using simple programming models. It was developed at UC Berkeley in 2009 and became an Apache project in 2013. Spark is now the most active big data project within the Apache Software Foundation and provides APIs for Scala, Java, and Python, as well as an interface for SQL queries. For iterative and interactive jobs, Spark can run up to 100 times faster than Hadoop MapReduce when the working data is held in memory, and up to 10 times faster when the data is read from disk.