Jaipaul Agonus & Daniel Monteiro, FINRA Technology
SCALING VISUALIZATION FOR BIG DATA AND ANALYTICS IN THE CLOUD
Investor Protection
Market Integrity
Firms: 3,800
Brokers: 634,000
Markets/Exchanges: 12
Events on average each day: 37 billion
Up to 100 billion events per day
25+ PB of storage
100s of surveillance programs
Trillions of nodes and edges
5000+ running instances
150+ applications
Session Takeaways
• How FINRA Leverages AWS Infrastructure
• Scaling Techniques for Cloud Resources
• Data Visualization Principles
• Challenges and Strategies in Big Data Visualization
• Data Visualization in Market Surveillance
Surveillance Analyst Needs
Interactive Data Access
• Drill down into data using hierarchical dataset models
• Export datasets to Excel with custom formatting
Visual Analysis
• Highlight interesting events and outliers
• Support for contextual visualization
Review and Feedback
• Workflow for reviewing datasets
• Support for adding feedback “tags” and comments on datasets
Visual display of data plays a fundamental role in articulating ideas and knowledge.
The ability to “see” the data can enhance and transform our perception of it.
Scaling data visualization is not only about displaying more pixels, but about displaying the right ones in the right context.
“15000 Galaxies In One Image.”
Image credit: NASA, ESA, P. Oesch of the University of Geneva, and M. Montes of the University of New South Wales.
Anaximander (~500 BC)
John Mansley Robinson, An Introduction to Early Greek Philosophy, Houghton Mifflin, 1968.
Mercator (1570)
Atlas of Europe, British Library
Google Maps (2019)
google.com
“A map does not just chart, it unlocks and formulates meaning; it forms bridges between here and there, between disparate ideas that we did not know were previously connected.”
Reif Larsen, The Selected Works of T.S. Spivet
“And what is the use of a book,” thought Alice, “without pictures or conversations?”
Alice in Wonderland
Insights through visually apparent patterns and trends
Problem
• Question
• Theory
• Goal
Model
• Algorithms
• Experimentation
• Validation
Data
• Collect
• Explore
• Prepare
Results
• Decisions
• Reports
• Communication
“It seems that perfection is attained not when there is nothing more to add, but when there is nothing more to remove.”
Antoine de Saint-Exupéry, Terre des Hommes (1939)
Visual Elements
Visual encoding is the process of transforming data into a visual element to be displayed in any kind of visualization.
• Position
• Shape
• Color
• Size
[Figure: chart with X, Y, and Z axes and a Size legend]
Data
Variable    Values
A           1, 2, 3, 4
B           Low, High, Medium
C           2018-03-27, 2018-03-28, 2018-03-29
Events are positioned according to their original time sequence.
Shape size reflects the event's relative volume.
Groups (Firms, Symbols, Other Classifiers) are displayed in different colors.
Event volume can also be indicated by its position.
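The encoding rules on this slide (time to position, relative volume to size, group to color) can be sketched in a few lines of Python. This is only an illustration and not FINRA's implementation: the `Event` and `Mark` types, the `encode` function, the palette, and the size range are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical input record: one market event.
@dataclass
class Event:
    timestamp: datetime
    group: str      # e.g. a firm or symbol
    volume: int

# Hypothetical output record: one visual mark.
@dataclass
class Mark:
    x: float        # horizontal position in [0, 1], derived from time
    size: float     # marker size, derived from relative volume
    color: str      # hex color, derived from the group

# Assumed palette; colors repeat if there are more groups than entries.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]

def encode(events, min_size=4.0, max_size=40.0):
    """Map events to marks: time -> position, volume -> size, group -> color."""
    if not events:
        return []
    ordered = sorted(events, key=lambda e: e.timestamp)
    t0 = ordered[0].timestamp
    # Fall back to 1.0 so a single-timestamp dataset does not divide by zero.
    span = (ordered[-1].timestamp - t0).total_seconds() or 1.0
    max_volume = max(e.volume for e in ordered) or 1
    groups = sorted({e.group for e in ordered})
    color_of = {g: PALETTE[i % len(PALETTE)] for i, g in enumerate(groups)}
    marks = []
    for e in ordered:
        x = (e.timestamp - t0).total_seconds() / span
        size = min_size + (max_size - min_size) * e.volume / max_volume
        marks.append(Mark(x, size, color_of[e.group]))
    return marks
```

The marks are independent of any plotting library, so the same encoding step could feed a scatter plot, a timeline, or a custom renderer.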
“The First Rule of Data Visualization is that ”
“The Second Rule of Data Visualization is that you stay true to the data”
“If you don't know where you want to go, then it doesn't matter which path you take.”
The Cheshire Cat, Alice in Wonderland
Purpose
• Theory
• Question
• Story
• Exploratory vs Explanatory
Emphasis on Data
• Events over Time
• Relationships
• Patterns
Form Follows Function
• Simplicity
• Meaning
• Context
We look across market data where we can see hot spots and outliers.
Interesting events can be visualized with additional details and context.
Exploratory
• Patterns
• Trends
• Outliers
Explanatory
• Specific Violations
• Surveillance Oriented
• Context
Firms: 3,700
Brokers: 634,000
Markets/Exchanges: 12
Up to 100 billion events per day
Production & Experimentation
Perception & Interaction
Volume & Dimensionality
Scalable Blueprints
Filtering
Sampling
Aggregation
Data Prep
Navigation
Onboarding
“The problem is not the problem. The problem is your attitude about the problem.”
Jack Sparrow, Pirates of the Caribbean
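Filtering, sampling, and aggregation, as listed on this slide, are three ways to cut the number of points before anything is drawn. The sketch below shows one possible form of each, assuming events arrive as plain dictionaries with hypothetical `symbol`, `ts` (seconds), and `volume` keys. The function names are illustrative and do not come from the talk.

```python
import random
from collections import Counter

def filter_events(events, symbol):
    """Filtering: keep only the events for one symbol."""
    return [e for e in events if e["symbol"] == symbol]

def sample_events(events, k, seed=0):
    """Sampling: keep a uniform random subset of at most k events."""
    if k >= len(events):
        return list(events)
    # A seeded generator keeps the sample reproducible between renders.
    return random.Random(seed).sample(events, k)

def aggregate_by_minute(events):
    """Aggregation: total volume per one-minute bucket, ordered by time."""
    totals = Counter()
    for e in events:
        totals[e["ts"] // 60] += e["volume"]
    return dict(sorted(totals.items()))
```

Aggregation keeps totals exact but hides individual events, while sampling keeps individual events but can miss rare outliers, so the right choice depends on whether the view is exploratory or explanatory.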
Parallel Coordinates
Multi-dimensional
Distribution and concentration of
features
Feature selection and order
Raw and scaled values
Feature distribution “shapes”
Parallel coordinates chart
http://www.math.tau.ac.il/~aiisreal/
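The "raw and scaled values" point matters because each axis in a parallel coordinates chart usually has its own unit and range. One common approach, shown here as an assumption rather than as the method used in the talk, is min-max scaling per feature so every axis spans 0 to 1. The function name and the dictionary row format are hypothetical.

```python
def scale_features(rows, features):
    """Min-max scale each feature to [0, 1] for a parallel coordinates chart.

    rows is a non-empty list of dicts; features fixes the axis order.
    Returns one list of scaled values per row, in axis order.
    """
    lo = {f: min(r[f] for r in rows) for f in features}
    hi = {f: max(r[f] for r in rows) for f in features}
    scaled = []
    for r in rows:
        scaled.append([
            # A constant feature has no range, so place it mid-axis.
            (r[f] - lo[f]) / (hi[f] - lo[f]) if hi[f] > lo[f] else 0.5
            for f in features
        ])
    return scaled
```

Because the `features` list also sets the axis order, reordering it is a cheap way to experiment with the "feature selection and order" point on the same slide.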
Horizon Graphs
Time series comparison
Small multiples
Space efficient
Patterns over time
Hot Spot detection
Horizon graph
panopticon.com
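A horizon graph saves vertical space by slicing each series into fixed-height bands and drawing the bands on top of one another, typically with darker shades for higher bands and a second hue for negative values. The sketch below covers only the band-slicing step. The function name, the default of three bands, and the return format are assumptions made for illustration.

```python
def horizon_bands(values, band_height, n_bands=3):
    """Split each value into stacked band heights for a horizon graph.

    Returns one (is_positive, heights) pair per value, where heights[i]
    is how much of band i the value fills. Magnitudes above
    n_bands * band_height are clipped to the top band.
    """
    layers = []
    for v in values:
        magnitude = abs(v)
        heights = [
            min(max(magnitude - i * band_height, 0.0), band_height)
            for i in range(n_bands)
        ]
        layers.append((v >= 0, heights))
    return layers
```

A renderer would then draw band 0 in the lightest shade and overlay each higher band in a darker one, so a row only ever needs `band_height` of vertical space.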
Multi-dimensional
Network and Relationships
Added elements for visual encoding
* Experimental in SuRF
Hive plot
hiveplot.com
Multiple contexts
Space efficient
Incremental display
* Experimental in SuRF
https://www.usgs.gov/media/images/gis-data-layers-visualization
Q & A

Strata Data Conference 2019 : Scaling Visualization for Big Data in the Cloud

Editor's Notes

  • #28 Excel charts: color schemas