In the data analytics space, few would argue that Spark has become the preferred tool for data scientists, business analysts, and developers alike. At Intel, Spark is widely used across the organization to interact with Hive, to process streaming data, and to ingest data from diverse sources for machine learning and data analytics. In this presentation, we want to share how reusable ingestion components built on Spark SQL have accelerated our application development. We will discuss the challenges we faced at Intel when running Spark-on-YARN applications. Have you ever wondered why your Spark SQL query was running slowly, or pondered different methods for ingesting data faster from an RDBMS? We will review Spark-on-YARN deployment and configuration, and describe the challenges posed by handling and processing large datasets. Finally, we will share recommendations on how to tune Spark jobs for optimal performance by properly allocating resources.
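As background for the RDBMS ingestion topic: when Spark SQL reads a table over JDBC with the `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions` options, it parallelizes the scan by issuing one query per partition, each restricted by a range predicate on the partition column. The sketch below is a simplified, hypothetical reimplementation of that stride logic in plain Python, for illustration only; Spark's actual implementation also guards against numeric overflow and handles additional column types.

```python
def jdbc_partition_predicates(column, lower, upper, num_partitions):
    """Sketch of how a JDBC read can be split into range predicates.

    Simplified illustration of Spark SQL's partitioned-read idea:
    the first partition also captures NULLs and values below lowerBound,
    and the last partition captures values at or above the final stride.
    """
    stride = (upper - lower) // num_partitions
    predicates = []
    current = lower
    for i in range(num_partitions):
        lower_clause = f"{column} >= {current}" if i > 0 else None
        current += stride
        upper_clause = f"{column} < {current}" if i < num_partitions - 1 else None
        if lower_clause and upper_clause:
            predicates.append(f"{lower_clause} AND {upper_clause}")
        elif lower_clause:
            predicates.append(lower_clause)
        else:
            predicates.append(f"{upper_clause} OR {column} IS NULL")
    return predicates

# Splitting ids 0..100 across 4 partitions yields 4 non-overlapping scans:
for p in jdbc_partition_predicates("id", 0, 100, 4):
    print(p)
```

In a real job, the equivalent effect comes from `spark.read.jdbc(...)` with those four options, so each executor fetches only its slice of the table instead of a single task pulling the entire result set.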