SlideShare a Scribd company logo
1 of 49
Download to read offline
www.twosigma.com
Huohua 火花
Distributed Time Series Analysis
Framework For Spark
August 28, 2017
Wenbo Zhao
Spark Summit 2016
About Me
August 28, 2017
 Software Engineer @
 Focus on analytics related tools, libraries and Systems
$0.0
$500.0
$1,000.0
$1,500.0
$2,000.0
$2,500.0
1/3/1950
1/3/1953
1/3/1956
1/3/1959
1/3/1962
1/3/1965
1/3/1968
1/3/1971
1/3/1974
1/3/1977
1/3/1980
1/3/1983
1/3/1986
1/3/1989
1/3/1992
1/3/1995
1/3/1998
1/3/2001
1/3/2004
1/3/2007
1/3/2010
1/3/2013
1/3/2016
S&P 500
We view everything as a time series
August 28, 2017
 Stock market prices
 Temperatures
 Sensor logs
 Presidential polls
 …
50°F
55°F
60°F
65°F
70°F
75°F
80°F
85°F
90°F
95°F
100°F
New York
San Francisco
What is a time series?
August 28, 2017
 A sequence of observations obtained in successive time order
 Our goal is to forecast future values given past observations
$8.90
$8.95
$8.90
$9.06
$9.10
8:00 11:00 14:00 17:00 20:00
corn price
?
Multivariate time series
August 28, 2017
 We can forecast better by joining multiple time series
 Temporal join is a fundamental operation for time series analysis
 Huohua enables fast distributed temporal join of large scale unaligned time series
$8.90
$8.95
$8.90
$9.06
$9.10
8:00 11:00 14:00 17:00 20:00
corn price
75°F
72°F
71°F
72°F
68°F
67°F
65°F
temperature
What is temporal join?
August 28, 2017
 A particular join function defined by a matching criteria over time
 Examples of criteria
 look-backward – find the most recent observation in the past
 look-forward – find the closest observation in the future
time series 1 time series 2
look-forward
time series 1 time series 2
look-backward
observation
Temporal join with look-backward criteria
August 28, 2017
time weather
08:00 AM 60 °F
10:00 AM 70 °F
12:00 AM 80 °F
time corn price
08:00 AM
11:00 AM
time weather corn price
08:00 AM
10:00 AM
12:00 AM
Temporal join with look-backward criteria
August 28, 2017
time weather
08:00 AM
10:00 AM
12:00 AM
time corn price
08:00 AM
11:00 AM
time weather corn price
08:00 AM 60 °F
10:00 AM
12:00 AM
time weather
08:00 AM 60 °F
10:00 AM 70 °F
12:00 AM 80 °F
Temporal join with look-backward criteria
August 28, 2017
time weather
08:00 AM 60 °F
10:00 AM 70 °F
12:00 AM 80 °F
time corn price
08:00 AM
11:00 AM
time weather corn price
08:00 AM 60 °F
10:00 AM 70 °F
12:00 AM
Temporal join with look-backward criteria
August 28, 2017
time weather
08:00 AM 60 °F
10:00 AM 70 °F
12:00 AM 80 °F
time corn price
08:00 AM
11:00 AM
time weather corn price
08:00 AM 60 °F
10:00 AM 70 °F
12:00 AM 80 °F
time corn price
08:00 AM
11:00 AM
time corn price
08:00 AM
11:00 AM
time corn price
08:00 AM
11:00 AM
time weather
08:00 AM 60 °F
10:00 AM 70 °F
12:00 AM 80 °F
time weather
08:00 AM 60 °F
10:00 AM 70 °F
12:00 AM 80 °F
time weather
08:00 AM 60 °F
10:00 AM 70 °F
12:00 AM 80 °F
Temporal join with look-backward criteria
August 28, 2017
time weather
08:00 AM 60 °F
10:00 AM 70 °F
12:00 AM 80 °F
time corn price
08:00 AM
11:00 AM
time weather corn price
08:00 AM 60 °F
10:00 AM 70 °F
12:00 AM 80 °F
…
…
Hundreds of thousands of data
sources with unaligned timestamps
Thousands of market data sets
We need fast and scalable distributed temporal join
Issues with existing solutions
August 28, 2017
 A single time series may not fit into a single machine
 Forecasting may involve hundreds of time series
 Existing packages don’t support temporal join or can’t handle large time series
 MatLab, R, SAS, Pandas
 Even Spark based solutions fall short
 PairRDDFunctions, DataFrame/Dataset, spark-ts
Huohua – a new time series library for Spark
August 28, 2017
 Goal
 provide a collection of functions to manipulate and analyze time series at scale
 group, temporal join, summarize, aggregate …
 How
 build a time series aware data structure
 extending RDD to TimeSeriesRDD
 optimize using temporal locality
 reduce shuffling
 reduce memory pressure by streaming
What is a TimeSeriesRDD in Huohua?
August 28, 2017
 TimeSeriesRDD extends RDD to represent time series data
 associates a time range to each partition
 tracks partitions’ time-ranges through operations
 preserves the temporal order
TimeSeriesRDD
operations
time series
functions
TimeSeriesRDD– an RDD representing time series
August 28, 2017
time temperature
6:00 AM 60°F
6:01 AM 61°F
… …
7:00 AM 70°F
7:01 AM 71°F
… …
8:00 AM 80°F
8:01 AM 81°F
… …
(6:00 AM, 60°F)
(6:01 AM, 61°F)
…
RDD
(7:00 AM, 70°F)
(7:01 AM, 71°F)
…
(8:00 AM, 80°F)
(8:01 AM, 81°F)
…
TimeSeriesRDD– an RDD representing time series
August 28, 2017
range:
[06:00 AM, 07:00 AM)
range:
[07:00 AM, 8:00 AM)
range:
[8:00 AM, ∞)
TimeSeriesRDDtime temperature
6:00 AM 60°F
6:01 AM 61°F
… …
7:00 AM 70°F
7:01 AM 71°F
… …
8:00 AM 80°F
8:01 AM 81°F
… …
(6:00 AM, 60°F)
(6:01 AM, 61°F)
…
(7:00 AM, 70°F)
(7:01 AM, 71°F)
…
(8:00 AM, 80°F)
(8:01 AM, 81°F)
…
Group function
August 28, 2017
 A group function groups rows with exactly the same timestamps
time city temperature
1:00 PM New York 70°F
1:00 PM San Francisco 60°F
2:00 PM New York 71°F
2:00 PM San Francisco 61°F
3:00 PM New York 72°F
3:00 PM San Francisco 62°F
4:00 PM New York 73°F
4:00 PM San Francisco 63°F
group 1
group 2
group 3
group 4
Group function
August 28, 2017
 A group function groups rows with nearby timestamps
time city temperature
1:00 PM New York 70°F
1:00 PM San Francisco 60°F
2:00 PM New York 71°F
2:00 PM San Francisco 61°F
3:00 PM New York 72°F
3:00 PM San Francisco 62°F
4:00 PM New York 73°F
4:00 PM San Francisco 63°F
group 1
group 2
Group in Spark
August 28, 2017
 Groups rows with exactly the same timestamps
RDD
1:00PM
2:00PM
2:00PM
1:00PM
3:00PM
3:00PM
4:00PM
4:00PM
 Data is shuffled and materialized
Group in Spark
August 28, 2017
RDD
groupBy
RDD
1:00PM
2:00PM
2:00PM
1:00PM
3:00PM
3:00PM
4:00PM
4:00PM
Group in Spark
August 28, 2017
 Data is shuffled and materialized
RDD
groupBy
RDD
1:00PM 1:00PM
3:00PM 3:00PM
2:00PM
4:00PM
2:00PM
4:00PM
Group in Spark
August 28, 2017
 Data is shuffled and materialized
RDD
groupBy
RDD
1:00PM 1:00PM
2:00PM 2:00PM
3:00PM 3:00PM
4:00PM 4:00PM
Group in Spark
August 28, 2017
 Temporal order is not preserved
RDD
groupBy
RDD
1:00PM
2:00PM
2:00PM
1:00PM
3:00PM
3:00PM
4:00PM
4:00PM
1:00PM 1:00PM
2:00PM 2:00PM
3:00PM 3:00PM
4:00PM 4:00PM
Group in Spark
August 28, 2017
 Another sort is required
RDD
groupBy sortBy
RDD RDD
1:00PM
2:00PM
2:00PM
1:00PM
3:00PM
3:00PM
4:00PM
4:00PM
1:00PM 1:00PM
2:00PM 2:00PM
3:00PM 3:00PM
4:00PM 4:00PM
Group in Spark
August 28, 2017
 Another sort is required
RDD
groupBy sortBy
RDD RDD
1:00PM
2:00PM
2:00PM
1:00PM
3:00PM
3:00PM
4:00PM
4:00PM
2:00PM 2:00PM
4:00PM 4:00PM
1:00PM 1:00PM
3:00PM 3:00PM
Group in Spark
August 28, 2017
 Back to correct temporal order
RDD
groupBy sortBy
RDD RDD
1:00PM
2:00PM
2:00PM
1:00PM
3:00PM
3:00PM
4:00PM
4:00PM
1:00PM 1:00PM
2:00PM 2:00PM
3:00PM 3:00PM
4:00PM 4:00PM
Group in Spark
August 28, 2017
 Back to temporal order
RDD
groupBy sortBy
RDD RDD
1:00PM
2:00PM
2:00PM
1:00PM
3:00PM
3:00PM
4:00PM
4:00PM
1:00PM 1:00PM
2:00PM 2:00PM
3:00PM 3:00PM
4:00PM 4:00PM
1:00PM 1:00PM
2:00PM 2:00PM
3:00PM 3:00PM
4:00PM 4:00PM
Group in Huohua
August 28, 2017
 Data is grouped locally as streams
TimeSeriesRDD
1:00PM
2:00PM
2:00PM
1:00PM
3:00PM
3:00PM
4:00PM
4:00PM
Group in Huohua
August 28, 2017
 Data is grouped locally as streams
TimeSeriesRDD
1:00PM
2:00PM
2:00PM
1:00PM
3:00PM
3:00PM
4:00PM
4:00PM
Group in Huohua
August 28, 2017
 Data is grouped locally as streams
TimeSeriesRDD
1:00PM
2:00PM
1:00PM
3:00PM 3:00PM
4:00PM
4:00PM
2:00PM
Group in Huohua
August 28, 2017
 Data is grouped locally as streams
TimeSeriesRDD
1:00PM
2:00PM
1:00PM
3:00PM 3:00PM
4:00PM 4:00PM
2:00PM
Benchmark for group
August 28, 2017
 Running time of count after group
 16 executors (10G memory and 4 cores per executor)
 data is read from HDFS
0s
20s
40s
60s
80s
100s
20M 40M 60M 80M 100M
RDD DataFrame TimeseriesRDD
Temporal join
August 28, 2017
 A temporal join function is defined by a matching criteria over time
 A typical matching criteria has two parameters
 direction – whether it should look-backward or look-forward
 window - how much it should look-backward or look-forward
look-backward temporal join
window
Temporal join
August 28, 2017
 A temporal join function is defined by a matching criteria over time
 A typical matching criteria has two parameters
 direction – whether it should look-backward or look-forward
 window - how much it should look-backward or look-forward
look-backward temporal join
window
Temporal join
August 28, 2017
 Temporal join with criteria look-back and window of length 1
2:00AM
1:00AM
4:00AM
5:00AM
1:00AM
3:00AM
5:00AM
time series time series
Temporal join
August 28, 2017
 Temporal join with criteria look-back and window of length 1
 How do we do temporal join in TimeSeriesRDD?
TimeSeriesRDD TimeSeriesRDD
2:00AM
1:00AM
4:00AM
5:00AM
1:00AM
3:00AM
5:00AM
Temporal join in Huohua
August 28, 2017
 Temporal join with criteria look-back and window of length 1
 partition time space into disjoint intervals
TimeSeriesRDD TimeSeriesRDDjoined
2:00AM
1:00AM
4:00AM
5:00AM
1:00AM
3:00AM
5:00AM
Temporal join in Huohua
August 28, 2017
 Temporal join with criteria look-back and window of length 1
 Build dependency graph for the joined TimeSeriesRDD
TimeSeriesRDD TimeSeriesRDDjoined
2:00AM
1:00AM
4:00AM
5:00AM
1:00AM
3:00AM
5:00AM
Temporal join in Huohua
August 28, 2017
 Temporal join with criteria look-back and window 1
 Join data as streams per partition
1:00AM 1
TimeSeriesRDD TimeSeriesRDDjoined
1:00AM 1:00AM1:00AM
2:00AM
4:00AM
5:00AM
3:00AM
5:00AM
Temporal join in Huohua
August 28, 2017
 Temporal join with criteria look-back and window 1
 Join data as streams
2:00AM
1:00AM
4:00AM
5:00AM
1:00AM
3:00AM
5:00AM
TimeSeriesRDD TimeSeriesRDDjoined
1:00AM 1:00AM1:00AM
2:00AM
Temporal join in Huohua
August 28, 2017
 Temporal join with criteria look-back and window 1
 Join data as streams
2:00AM
1:00AM
4:00AM
5:00AM
1:00AM
3:00AM
5:00AM
TimeSeriesRDD TimeSeriesRDDjoined
1:00AM
1:00AM
1:00AM
2:00AM
4:00AM
3:00AM
Temporal join in Huohua
August 28, 2017
 Temporal join with criteria look-back and window 1
 Join data as streams
2:00AM
1:00AM
4:00AM
5:00AM
1:00AM
3:00AM
5:00AM
TimeSeriesRDD TimeSeriesRDDjoined
1:00AM
1:00AM
1:00AM
2:00AM
4:00AM 3:00AM
5:00AM 5:00AM
Benchmark for temporal join
August 28, 2017
 Running time of count after temporal join
 16 executors (10G memory and 4 cores per executor)
 data is read from HDFS
0s
20s
40s
60s
80s
100s
20M 40M 60M 80M 100M
RDD DataFrame TimeseriesRDD
Functions over TimeSeriesRDD
August 28, 2017
 group functions such as window, intervalization etc.
 temporal joins such as look-forward, look-backward etc.
 summarizers such as average, variance, z-score etc. over
 windows
 Intervals
 cycles
Open Source
August 28, 2017
 Not quite yet …
 https://github.com/twosigma
Future work
August 28, 2017
 Dataframe / Dataset integration
 Speed up
 Richer APIs
 Python bindings
 More summarizers
Key contributors
August 28, 2017
 Christopher Aycock
 Jonathan Coveney
 Jin Li
 David Medina
 David Palaitis
 Ris Sawyer
 Leif Walsh
 Wenbo Zhao
Thank you
August 28, 2017
 QA
This document is being distributed for informational and educational purposes only and is not an offer to sell or the solicitation of an offer to buy
any securities or other instruments. The information contained herein is not intended to provide, and should not be relied upon for investment
advice. The views expressed herein are not necessarily the views of Two Sigma Investments, LP or any of its affiliates (collectively, “Two Sigma”).
Such views reflect significant assumptions and subjective of the author(s) of the document and are subject to change without notice. The
document may employ data derived from third-party sources. No representation is made as to the accuracy of such information and the use of
such information in no way implies an endorsement of the source of such information or its validity.
The copyrights and/or trademarks in some of the images, logos or other material used herein may be owned by entities other than Two Sigma. If
so, such copyrights and/or trademarks are most likely owned by the entity that created the material and are used purely for identification and
comment as fair use under international copyright and/or trademark laws. Use of such image, copyright or trademark does not imply any
association with such organization (or endorsement of such organization) by Two Sigma, nor vice versa.

More Related Content

More from Two Sigma

Archival Storage at Two Sigma - Josh Leners
Archival Storage at Two Sigma - Josh LenersArchival Storage at Two Sigma - Josh Leners
Archival Storage at Two Sigma - Josh LenersTwo Sigma
 
Smooth Storage - A distributed storage system for managing structured time se...
Smooth Storage - A distributed storage system for managing structured time se...Smooth Storage - A distributed storage system for managing structured time se...
Smooth Storage - A distributed storage system for managing structured time se...Two Sigma
 
The Language of Compression - Leif Walsh
The Language of Compression - Leif WalshThe Language of Compression - Leif Walsh
The Language of Compression - Leif WalshTwo Sigma
 
Identifying Emergent Behaviors in Complex Systems - Jane Adams
Identifying Emergent Behaviors in Complex Systems - Jane AdamsIdentifying Emergent Behaviors in Complex Systems - Jane Adams
Identifying Emergent Behaviors in Complex Systems - Jane AdamsTwo Sigma
 
Algorithmic Data Science = Theory + Practice
Algorithmic Data Science = Theory + PracticeAlgorithmic Data Science = Theory + Practice
Algorithmic Data Science = Theory + PracticeTwo Sigma
 
Improving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache ArrowImproving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache ArrowTwo Sigma
 
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...Two Sigma
 
Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incar...
Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incar...Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incar...
Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incar...Two Sigma
 
Graph Summarization with Quality Guarantees
Graph Summarization with Quality GuaranteesGraph Summarization with Quality Guarantees
Graph Summarization with Quality GuaranteesTwo Sigma
 
Rademacher Averages: Theory and Practice
Rademacher Averages: Theory and PracticeRademacher Averages: Theory and Practice
Rademacher Averages: Theory and PracticeTwo Sigma
 
Credit-Implied Volatility
Credit-Implied VolatilityCredit-Implied Volatility
Credit-Implied VolatilityTwo Sigma
 
Principles of REST API Design
Principles of REST API DesignPrinciples of REST API Design
Principles of REST API DesignTwo Sigma
 

More from Two Sigma (12)

Archival Storage at Two Sigma - Josh Leners
Archival Storage at Two Sigma - Josh LenersArchival Storage at Two Sigma - Josh Leners
Archival Storage at Two Sigma - Josh Leners
 
Smooth Storage - A distributed storage system for managing structured time se...
Smooth Storage - A distributed storage system for managing structured time se...Smooth Storage - A distributed storage system for managing structured time se...
Smooth Storage - A distributed storage system for managing structured time se...
 
The Language of Compression - Leif Walsh
The Language of Compression - Leif WalshThe Language of Compression - Leif Walsh
The Language of Compression - Leif Walsh
 
Identifying Emergent Behaviors in Complex Systems - Jane Adams
Identifying Emergent Behaviors in Complex Systems - Jane AdamsIdentifying Emergent Behaviors in Complex Systems - Jane Adams
Identifying Emergent Behaviors in Complex Systems - Jane Adams
 
Algorithmic Data Science = Theory + Practice
Algorithmic Data Science = Theory + PracticeAlgorithmic Data Science = Theory + Practice
Algorithmic Data Science = Theory + Practice
 
Improving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache ArrowImproving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache Arrow
 
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...
 
Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incar...
Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incar...Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incar...
Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incar...
 
Graph Summarization with Quality Guarantees
Graph Summarization with Quality GuaranteesGraph Summarization with Quality Guarantees
Graph Summarization with Quality Guarantees
 
Rademacher Averages: Theory and Practice
Rademacher Averages: Theory and PracticeRademacher Averages: Theory and Practice
Rademacher Averages: Theory and Practice
 
Credit-Implied Volatility
Credit-Implied VolatilityCredit-Implied Volatility
Credit-Implied Volatility
 
Principles of REST API Design
Principles of REST API DesignPrinciples of REST API Design
Principles of REST API Design
 

Recently uploaded

VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 

Recently uploaded (20)

VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 

HUOHUA: A Distributed Time Series Analysis Framework For Spark

  • 1. www.twosigma.com Huohua 火花 Distributed Time Series Analysis Framework For Spark August 28, 2017 Wenbo Zhao Spark Summit 2016
  • 2. About Me August 28, 2017  Software Engineer @  Focus on analytics related tools, libraries and Systems
  • 3. $0.0 $500.0 $1,000.0 $1,500.0 $2,000.0 $2,500.0 1/3/1950 1/3/1953 1/3/1956 1/3/1959 1/3/1962 1/3/1965 1/3/1968 1/3/1971 1/3/1974 1/3/1977 1/3/1980 1/3/1983 1/3/1986 1/3/1989 1/3/1992 1/3/1995 1/3/1998 1/3/2001 1/3/2004 1/3/2007 1/3/2010 1/3/2013 1/3/2016 S&P 500 We view everything as a time series August 28, 2017  Stock market prices  Temperatures  Sensor logs  Presidential polls  … 50°F 55°F 60°F 65°F 70°F 75°F 80°F 85°F 90°F 95°F 100°F New York San Francisco
  • 4. What is a time series? August 28, 2017  A sequence of observations obtained in successive time order  Our goal is to forecast future values given past observations $8.90 $8.95 $8.90 $9.06 $9.10 8:00 11:00 14:00 17:00 20:00 corn price ?
  • 5. Multivariate time series August 28, 2017  We can forecast better by joining multiple time series  Temporal join is a fundamental operation for time series analysis  Huohua enables fast distributed temporal join of large scale unaligned time series $8.90 $8.95 $8.90 $9.06 $9.10 8:00 11:00 14:00 17:00 20:00 corn price 75°F 72°F 71°F 72°F 68°F 67°F 65°F temperature
  • 6. What is temporal join? August 28, 2017  A particular join function defined by a matching criteria over time  Examples of criteria  look-backward – find the most recent observation in the past  look-forward – find the closest observation in the future time series 1 time series 2 look-forward time series 1 time series 2 look-backward observation
  • 7. Temporal join with look-backward criteria August 28, 2017 time weather 08:00 AM 60 °F 10:00 AM 70 °F 12:00 AM 80 °F time corn price 08:00 AM 11:00 AM time weather corn price 08:00 AM 10:00 AM 12:00 AM
  • 8. Temporal join with look-backward criteria August 28, 2017 time weather 08:00 AM 10:00 AM 12:00 AM time corn price 08:00 AM 11:00 AM time weather corn price 08:00 AM 60 °F 10:00 AM 12:00 AM time weather 08:00 AM 60 °F 10:00 AM 70 °F 12:00 AM 80 °F
  • 9. Temporal join with look-backward criteria August 28, 2017 time weather 08:00 AM 60 °F 10:00 AM 70 °F 12:00 AM 80 °F time corn price 08:00 AM 11:00 AM time weather corn price 08:00 AM 60 °F 10:00 AM 70 °F 12:00 AM
  • 10. Temporal join with look-backward criteria August 28, 2017 time weather 08:00 AM 60 °F 10:00 AM 70 °F 12:00 AM 80 °F time corn price 08:00 AM 11:00 AM time weather corn price 08:00 AM 60 °F 10:00 AM 70 °F 12:00 AM 80 °F
  • 11. time corn price 08:00 AM 11:00 AM time corn price 08:00 AM 11:00 AM time corn price 08:00 AM 11:00 AM time weather 08:00 AM 60 °F 10:00 AM 70 °F 12:00 AM 80 °F time weather 08:00 AM 60 °F 10:00 AM 70 °F 12:00 AM 80 °F time weather 08:00 AM 60 °F 10:00 AM 70 °F 12:00 AM 80 °F Temporal join with look-backward criteria August 28, 2017 time weather 08:00 AM 60 °F 10:00 AM 70 °F 12:00 AM 80 °F time corn price 08:00 AM 11:00 AM time weather corn price 08:00 AM 60 °F 10:00 AM 70 °F 12:00 AM 80 °F … … Hundreds of thousands of data sources with unaligned timestamps Thousands of market data sets We need fast and scalable distributed temporal join
  • 12. Issues with existing solutions August 28, 2017  A single time series may not fit into a single machine  Forecasting may involve hundreds of time series  Existing packages don’t support temporal join or can’t handle large time series  MatLab, R, SAS, Pandas  Even Spark based solutions fall short  PairRDDFunctions, DataFrame/Dataset, spark-ts
  • 13. Huohua – a new time series library for Spark August 28, 2017  Goal  provide a collection of functions to manipulate and analyze time series at scale  group, temporal join, summarize, aggregate …  How  build a time series aware data structure  extending RDD to TimeSeriesRDD  optimize using temporal locality  reduce shuffling  reduce memory pressure by streaming
  • 14. What is a TimeSeriesRDD in Huohua? August 28, 2017  TimeSeriesRDD extends RDD to represent time series data  associates a time range to each partition  tracks partitions’ time-ranges through operations  preserves the temporal order TimeSeriesRDD operations time series functions
  • 15. TimeSeriesRDD– an RDD representing time series August 28, 2017 time temperature 6:00 AM 60°F 6:01 AM 61°F … … 7:00 AM 70°F 7:01 AM 71°F … … 8:00 AM 80°F 8:01 AM 81°F … … (6:00 AM, 60°F) (6:01 AM, 61°F) … RDD (7:00 AM, 70°F) (7:01 AM, 71°F) … (8:00 AM, 80°F) (8:01 AM, 81°F) …
  • 16. TimeSeriesRDD– an RDD representing time series August 28, 2017 range: [06:00 AM, 07:00 AM) range: [07:00 AM, 8:00 AM) range: [8:00 AM, ∞) TimeSeriesRDDtime temperature 6:00 AM 60°F 6:01 AM 61°F … … 7:00 AM 70°F 7:01 AM 71°F … … 8:00 AM 80°F 8:01 AM 81°F … … (6:00 AM, 60°F) (6:01 AM, 61°F) … (7:00 AM, 70°F) (7:01 AM, 71°F) … (8:00 AM, 80°F) (8:01 AM, 81°F) …
  • 17. Group function August 28, 2017  A group function groups rows with exactly the same timestamps time city temperature 1:00 PM New York 70°F 1:00 PM San Francisco 60°F 2:00 PM New York 71°F 2:00 PM San Francisco 61°F 3:00 PM New York 72°F 3:00 PM San Francisco 62°F 4:00 PM New York 73°F 4:00 PM San Francisco 63°F group 1 group 2 group 3 group 4
  • 18. Group function August 28, 2017  A group function groups rows with nearby timestamps time city temperature 1:00 PM New York 70°F 1:00 PM San Francisco 60°F 2:00 PM New York 71°F 2:00 PM San Francisco 61°F 3:00 PM New York 72°F 3:00 PM San Francisco 62°F 4:00 PM New York 73°F 4:00 PM San Francisco 63°F group 1 group 2
  • 19. Group in Spark August 28, 2017  Groups rows with exactly the same timestamps RDD 1:00PM 2:00PM 2:00PM 1:00PM 3:00PM 3:00PM 4:00PM 4:00PM
  • 20.  Data is shuffled and materialized Group in Spark August 28, 2017 RDD groupBy RDD 1:00PM 2:00PM 2:00PM 1:00PM 3:00PM 3:00PM 4:00PM 4:00PM
  • 21. Group in Spark August 28, 2017  Data is shuffled and materialized RDD groupBy RDD 1:00PM 1:00PM 3:00PM 3:00PM 2:00PM 4:00PM 2:00PM 4:00PM
  • 22. Group in Spark August 28, 2017  Data is shuffled and materialized RDD groupBy RDD 1:00PM 1:00PM 2:00PM 2:00PM 3:00PM 3:00PM 4:00PM 4:00PM
  • 23. Group in Spark August 28, 2017  Temporal order is not preserved RDD groupBy RDD 1:00PM 2:00PM 2:00PM 1:00PM 3:00PM 3:00PM 4:00PM 4:00PM 1:00PM 1:00PM 2:00PM 2:00PM 3:00PM 3:00PM 4:00PM 4:00PM
  • 24. Group in Spark August 28, 2017  Another sort is required RDD groupBy sortBy RDD RDD 1:00PM 2:00PM 2:00PM 1:00PM 3:00PM 3:00PM 4:00PM 4:00PM 1:00PM 1:00PM 2:00PM 2:00PM 3:00PM 3:00PM 4:00PM 4:00PM
  • 25. Group in Spark August 28, 2017  Another sort is required RDD groupBy sortBy RDD RDD 1:00PM 2:00PM 2:00PM 1:00PM 3:00PM 3:00PM 4:00PM 4:00PM 2:00PM 2:00PM 4:00PM 4:00PM 1:00PM 1:00PM 3:00PM 3:00PM
  • 26. Group in Spark August 28, 2017  Back to correct temporal order RDD groupBy sortBy RDD RDD 1:00PM 2:00PM 2:00PM 1:00PM 3:00PM 3:00PM 4:00PM 4:00PM 1:00PM 1:00PM 2:00PM 2:00PM 3:00PM 3:00PM 4:00PM 4:00PM
  • 27. Group in Spark August 28, 2017  Back to temporal order RDD groupBy sortBy RDD RDD 1:00PM 2:00PM 2:00PM 1:00PM 3:00PM 3:00PM 4:00PM 4:00PM 1:00PM 1:00PM 2:00PM 2:00PM 3:00PM 3:00PM 4:00PM 4:00PM 1:00PM 1:00PM 2:00PM 2:00PM 3:00PM 3:00PM 4:00PM 4:00PM
  • 28. Group in Huohua August 28, 2017  Data is grouped locally as streams TimeSeriesRDD 1:00PM 2:00PM 2:00PM 1:00PM 3:00PM 3:00PM 4:00PM 4:00PM
  • 29. Group in Huohua August 28, 2017  Data is grouped locally as streams TimeSeriesRDD 1:00PM 2:00PM 2:00PM 1:00PM 3:00PM 3:00PM 4:00PM 4:00PM
  • 30. Group in Huohua August 28, 2017  Data is grouped locally as streams TimeSeriesRDD 1:00PM 2:00PM 1:00PM 3:00PM 3:00PM 4:00PM 4:00PM 2:00PM
  • 31. Group in Huohua August 28, 2017  Data is grouped locally as streams TimeSeriesRDD 1:00PM 2:00PM 1:00PM 3:00PM 3:00PM 4:00PM 4:00PM 2:00PM
  • 32. Benchmark for group August 28, 2017  Running time of count after group  16 executors (10G memory and 4 cores per executor)  data is read from HDFS 0s 20s 40s 60s 80s 100s 20M 40M 60M 80M 100M RDD DataFrame TimeseriesRDD
  • 33. Temporal join August 28, 2017  A temporal join function is defined by a matching criteria over time  A typical matching criteria has two parameters  direction – whether it should look-backward or look-forward  window - how much it should look-backward or look-forward look-backward temporal join window
  • 34. Temporal join August 28, 2017  A temporal join function is defined by a matching criteria over time  A typical matching criteria has two parameters  direction – whether it should look-backward or look-forward  window - how much it should look-backward or look-forward look-backward temporal join window
  • 35. Temporal join August 28, 2017  Temporal join with criteria look-back and window of length 1 2:00AM 1:00AM 4:00AM 5:00AM 1:00AM 3:00AM 5:00AM time series time series
  • 36. Temporal join August 28, 2017  Temporal join with criteria look-back and window of length 1  How do we do temporal join in TimeSeriesRDD? TimeSeriesRDD TimeSeriesRDD 2:00AM 1:00AM 4:00AM 5:00AM 1:00AM 3:00AM 5:00AM
  • 37. Temporal join in Huohua August 28, 2017  Temporal join with criteria look-back and window of length 1  partition time space into disjoint intervals TimeSeriesRDD TimeSeriesRDDjoined 2:00AM 1:00AM 4:00AM 5:00AM 1:00AM 3:00AM 5:00AM
  • 38. Temporal join in Huohua August 28, 2017  Temporal join with criteria look-back and window of length 1  Build dependency graph for the joined TimeSeriesRDD TimeSeriesRDD TimeSeriesRDDjoined 2:00AM 1:00AM 4:00AM 5:00AM 1:00AM 3:00AM 5:00AM
  • 39. Temporal join in Huohua August 28, 2017  Temporal join with criteria look-back and window 1  Join data as streams per partition 1:00AM 1 TimeSeriesRDD TimeSeriesRDDjoined 1:00AM 1:00AM1:00AM 2:00AM 4:00AM 5:00AM 3:00AM 5:00AM
  • 40. Temporal join in Huohua August 28, 2017  Temporal join with criteria look-back and window 1  Join data as streams 2:00AM 1:00AM 4:00AM 5:00AM 1:00AM 3:00AM 5:00AM TimeSeriesRDD TimeSeriesRDDjoined 1:00AM 1:00AM1:00AM 2:00AM
  • 41. Temporal join in Huohua August 28, 2017  Temporal join with criteria look-back and window 1  Join data as streams 2:00AM 1:00AM 4:00AM 5:00AM 1:00AM 3:00AM 5:00AM TimeSeriesRDD TimeSeriesRDDjoined 1:00AM 1:00AM 1:00AM 2:00AM 4:00AM 3:00AM
  • 42. Temporal join in Huohua August 28, 2017  Temporal join with criteria look-back and window 1  Join data as streams 2:00AM 1:00AM 4:00AM 5:00AM 1:00AM 3:00AM 5:00AM TimeSeriesRDD TimeSeriesRDDjoined 1:00AM 1:00AM 1:00AM 2:00AM 4:00AM 3:00AM 5:00AM 5:00AM
  • 43. Benchmark for temporal join August 28, 2017  Running time of count after temporal join  16 executors (10G memory and 4 cores per executor)  data is read from HDFS 0s 20s 40s 60s 80s 100s 20M 40M 60M 80M 100M RDD DataFrame TimeseriesRDD
  • 44. Functions over TimeSeriesRDD August 28, 2017  group functions such as window, intervalization etc.  temporal joins such as look-forward, look-backward etc.  summarizers such as average, variance, z-score etc. over  windows  Intervals  cycles
  • 45. Open Source August 28, 2017  Not quite yet …  https://github.com/twosigma
  • 46. Future work August 28, 2017  Dataframe / Dataset integration  Speed up  Richer APIs  Python bindings  More summarizers
  • 47. Key contributors August 28, 2017  Christopher Aycock  Jonathan Coveney  Jin Li  David Medina  David Palaitis  Ris Sawyer  Leif Walsh  Wenbo Zhao
  • 48. Thank you August 28, 2017  QA
  • 49. This document is being distributed for informational and educational purposes only and is not an offer to sell or the solicitation of an offer to buy any securities or other instruments. The information contained herein is not intended to provide, and should not be relied upon for investment advice. The views expressed herein are not necessarily the views of Two Sigma Investments, LP or any of its affiliates (collectively, “Two Sigma”). Such views reflect significant assumptions and subjective of the author(s) of the document and are subject to change without notice. The document may employ data derived from third-party sources. No representation is made as to the accuracy of such information and the use of such information in no way implies an endorsement of the source of such information or its validity. The copyrights and/or trademarks in some of the images, logos or other material used herein may be owned by entities other than Two Sigma. If so, such copyrights and/or trademarks are most likely owned by the entity that created the material and are used purely for identification and comment as fair use under international copyright and/or trademark laws. Use of such image, copyright or trademark does not imply any association with such organization (or endorsement of such organization) by Two Sigma, nor vice versa.