Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.
Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.
Published on
This talk was presented at Spark Summit East 2016. Below is the abstract:
In this talk, we’ll discuss the challenges of analyzing large-scale time series data sets and introduce the TS-for-Spark library. Whether we need to build models over data coming in every second from thousands of sensors of dig into the histories of millions of financial instruments, large scale time series data shows up in a variety of domains. Time series data has an innate structure not found in other data sets, and thus presents both unique challenges and opportunities. The open source Spark-TS package provides both Scala and Python APIs for munging, manipulating, and modeling time series data, on top of Spark. We'll cover its core abstractions, like the TimeSeriesRDD and DateTimeIndex, as well as some of the statistical modeling functionality it provides on top of them.