Information Classification: GENERAL
BIG DATA
(on Data you don’t have)
HOW DO WE DEAL WITH INFINITE-DIMENSIONAL DATA?
BY GENERALIZING THE TRADITIONAL MAP-REDUCE PARADIGM.
DISCLAIMER
THERE ARE FOUR SOURCES OF DATA
• Data I have (traditional “Big Data”)
• Data I can model
• Data I can acquire
• Data someone else can acquire or model
HOW WE REPRESENT THESE ITEMS
• Pre-Calculated Data
• Formulas you have
• Services
• Things People can Share with Me
MSCI Beon™
• Pre-Calculated Data
• Formulas
• Services
• Things People can Share with Me
Jim Burns
David Clark
MSCI PLATFORM – A NEXT GENERATION LEAP
Traditional Big Data “Data you Have” Paradigm
• Big Data Repository (Hadoop / Cloudera etc.)
• The Morning Load
• Slice/Dice
NEW Big Data Paradigm
• Beon
• New Front End
• Calculation and Data Services
• On Demand Data
• Expressions
• Virtual fields
• Dynamic new data
COMPLEX QUESTIONS
WHAT IS A COMPLEX QUESTION VERSUS A SPECIFIC QUESTION?
Specific questions can be hard, for example:
• What happens to sea level if the temperature goes up 1.5 degrees by 2035?
• What properties are on the beach and over x meters above sea level in Marbella?
• What are the biggest real estate bargains in a portfolio?
Complex questions are combinations of specific questions.
• What should I buy if I believe that temperatures are going to rise 1.5 degrees by 2035 and I only want property that will be at least 1 meter above sea level in 2035 but still on the beach?
HOW TO ANSWER A COMPLEX QUESTION
To answer a complex question, you need something that can evaluate an expression like this:
Let Portfolio = All the houses in Marbella
safeHouses = Filter( SeaLevel >= 1.0 + seaLevelRise(1.5 °C)) Portfolio
BestBargains = BargainFinder safeHouses
It does this by calling the services below for certain calculations.
• Platform: executes the question above, filtering, etc.
• Marbella Houses: house database
• Planet Simulator: sea level rise
• Bargain Finder
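The way the platform combines these services can be sketched in Python. This is only an illustration: the three functions are hypothetical stand-ins for the Marbella Houses, Planet Simulator, and Bargain Finder services, the house data is made up, and the linear sea level model is a toy, not a climate estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class House:
    name: str
    elevation_m: float  # height above today's sea level (made-up data)
    price: float
    fair_value: float


def marbella_houses():
    # Hypothetical stand-in for the house database service.
    return [
        House("Casa A", 0.8, 900_000, 1_000_000),
        House("Casa B", 2.5, 700_000, 1_000_000),
        House("Casa C", 1.9, 950_000, 1_000_000),
    ]


def sea_level_rise(delta_t_celsius):
    # Hypothetical stand-in for the planet simulator (toy linear model).
    return 0.4 * delta_t_celsius


def bargain_finder(houses):
    # Hypothetical stand-in: cheapest relative to fair value comes first.
    return sorted(houses, key=lambda h: h.price / h.fair_value)


def best_bargains(delta_t_celsius, margin_m=1.0):
    # The complex question is a composition of the specific questions.
    threshold = margin_m + sea_level_rise(delta_t_celsius)
    safe_houses = [h for h in marbella_houses() if h.elevation_m >= threshold]
    return bargain_finder(safe_houses)


bargains = best_bargains(1.5)
```

The platform's own job is only the composition in `best_bargains`; each specific question stays behind its own service.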
GENERALIZING MAP-REDUCE
UH OH – SOME MATH……
Standard map-reduce, where Γ is your class object/structure/thing:
\[ \mathrm{map}\ x \quad \text{where } x : \Gamma \to \mathbb{R} \]
Just for simplicity, let's assume we only care about real numbers (obviously, we could have tuples, strings, dictionaries, or any valid type).
First things first, we need a context ℂ:
\[ \mathrm{map}\ x \quad \text{where } x : (\Gamma, \mathbb{C}) \to \{\mathbb{R}, \mathbb{C}\} \]
                    Yesterday    Today      Result
Portfolio (USD)     $43          $40        I lost $3
Portfolio (EUR)     €35.83       €36.36     I made €0.53

Converting the dollar result directly: I lost $3 / 1.1 ≈ €2.73
The reason for the error is that “I lost $3” is a lie. You DID NOT LOSE $3.
The answer is “I have made or lost ($40 in today's context - $43 in yesterday's context)”.
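A minimal Python sketch of why the value has to carry its context. The class names are hypothetical, and the exchange rates of 1.20 and 1.10 dollars per euro are inferred from the slide's figures (43 / 35.83 and 40 / 36.36), not stated on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    label: str
    usd_per_eur: float  # assumed rate, inferred from the slide's numbers


@dataclass(frozen=True)
class Valued:
    amount_usd: float
    context: Context

    def in_eur(self):
        # Convert using the rate of the context the value was observed in.
        return self.amount_usd / self.context.usd_per_eur


yesterday = Valued(43.0, Context("yesterday", 1.20))
today = Valued(40.0, Context("today", 1.10))

# Wrong: subtract first, then convert the bare number with today's rate.
naive_eur = (today.amount_usd - yesterday.amount_usd) / today.context.usd_per_eur

# Right: convert each value in its own context, then subtract.
contextual_eur = today.in_eur() - yesterday.in_eur()
```

The naive figure is a loss of about €2.73, while the contextual figure is a gain of about €0.53, which is the discrepancy the slide points at.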
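Now we also toss in some services.
\[ \mathrm{map}\ x \quad \text{where } x : (\Gamma, \mathbb{C}) \to \{\mathbb{R}, \mathbb{C}\} \]
becomes
\[ \mathrm{map}\ x \quad \text{where } x : (\Gamma, S, \mathbb{C}) \to \{\mathbb{R}, \mathbb{C}\}, \qquad S = \{S_1, S_2, \dots, S_n\} \text{ (our services)} \]
But what are our services? This is a functional language conference, so we use functions to access services.
\[ \text{let } F = \{F_{i,j}\} \text{ for all } i, j, \quad \text{with } F_{i,j} : \{\Gamma, S_1, S_2, \dots, S_i, \mathbb{C}\} \to \{\mathbb{R}, \mathbb{C}\} \]
\[ \mathrm{map}\ x \quad \text{where } x : (\Gamma, F, \mathbb{C}) \to \{\mathbb{R}, \mathbb{C}\} \]
So new services can leverage old services.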
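One way to read the signature (Γ, F, ℂ) → {ℝ, ℂ} is sketched below in Python. Everything here is hypothetical: `spot_price` stands in for an older service, `market_value` for a newer one built on top of it, and the prices are made up. Each function takes a record, the function library, and a context, and returns a value together with the context.

```python
def spot_price(record, library, context):
    # Stand-in for an existing service; prices are made-up data.
    prices = {"2019-05-14": 9.0, "2019-05-15": 10.0}
    return prices[context["date"]], context


def market_value(record, library, context):
    # A newer service that leverages the older one through the library.
    price, ctx = library["spot_price"](record, library, context)
    return record["quantity"] * price, ctx


library = {"spot_price": spot_price, "market_value": market_value}


def map_with_context(name, records, library, context):
    # map x where x : (record, library, context) -> (value, context)
    return [library[name](record, library, context) for record in records]


positions = [{"quantity": 3}, {"quantity": 5}]
results = map_with_context(
    "market_value", positions, library, {"date": "2019-05-15"}
)
```

Because services are only reached through the library, a new function can be added without the caller knowing which older functions it depends on.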
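Obvious Extensions…
\[ \mathrm{map}\ x \quad \text{where } x : (\Gamma, F, \mathbb{C}) \to \{\mathbb{R}, \mathbb{C}\} \]
\[ z : \bigoplus_{i=1}^{n} (\Gamma, F, \mathbb{C}) \to \bigoplus_{i=1}^{m} \{\mathbb{R}, \mathbb{C}\} \]
• Data You Have
• Data You Can Acquire
• Data You Can Model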
\[ \mathrm{map}\ x \quad \text{where } x : (\Gamma, F, \mathbb{C}) \to \{\mathbb{R}, \mathbb{C}\} \]
(Γ, F, ℂ) is an abstract object space, but there are others.
Example (Customer Space):
• Γ = Customer Records
• F = purchasesOfWine(tenor)
• ℂ = Date
TRANSFORM
Example (Country Space):
• Γ = CountryList + wineSales
• F = weather(), totalWineSales(tenor)
• ℂ = Date, weather
(Γ1, F1, ℂ1)
Customer   Location   Wine: purchasesOfWine(tenor)
Bob        Spain      1/1/2019 – 3 btl; 15/3/2019 – 2 btl
Mary       France     15/1/2019 – 2 btl
Juan       Spain      12/5/2019 – 6 btl
Edward     England    13/4/2019 – 8 btl

TRANSFORM

(Γ2, F2, ℂ2)
Country    Purchases: totalWineSales(tenor)   Weather()
Spain      11 bottles
France     2 bottles
England    8 bottles

\[ (\Gamma_2, F_2, \mathbb{C}_2) = \mathcal{T}_1 \circ (\Gamma_1, F_1, \mathbb{C}_1) \]
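The transform from the customer table to the country table can be sketched in Python as a plain group-and-sum. The function names mirror the slide, but their signatures are assumptions: in particular, the tenor is modeled here as an inclusive date range, which the slide does not specify.

```python
from collections import defaultdict

# Customer records from the slide, with dates rewritten as ISO strings.
customers = [
    {"customer": "Bob", "location": "Spain",
     "purchases": [("2019-01-01", 3), ("2019-03-15", 2)]},
    {"customer": "Mary", "location": "France",
     "purchases": [("2019-01-15", 2)]},
    {"customer": "Juan", "location": "Spain",
     "purchases": [("2019-05-12", 6)]},
    {"customer": "Edward", "location": "England",
     "purchases": [("2019-04-13", 8)]},
]


def purchases_of_wine(record, start, end):
    # Bottles bought by one customer within the (assumed) tenor window.
    # ISO date strings compare correctly as plain strings.
    return sum(b for day, b in record["purchases"] if start <= day <= end)


def to_country_space(records, start, end):
    # The transform: regroup customer-level data into country-level totals.
    totals = defaultdict(int)
    for record in records:
        totals[record["location"]] += purchases_of_wine(record, start, end)
    return dict(totals)


total_wine_sales = to_country_space(customers, "2019-01-01", "2019-12-31")
```

Over the whole of 2019 this reproduces the country table: 11 bottles for Spain, 2 for France, and 8 for England.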
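\[ z : \bigoplus_{i=1}^{n} (\Gamma, F, \mathbb{C}) \to \bigoplus_{i=1}^{m} \{\mathbb{R}, \mathbb{C}\} \]
Step 1: transform the object space.
\[ \mathcal{T}_k : \bigoplus_{i=1}^{n_k} (\Gamma_k, F_k, \mathbb{C}_k) \to \bigoplus_{i=1}^{n_{k+1}} (\Gamma_{k+1}, F_{k+1}, \mathbb{C}_{k+1}), \qquad n_1 = n \]
\[ \bigoplus_{i=1}^{n_{k+1}} (\Gamma_{k+1}, F_{k+1}, \mathbb{C}_{k+1}) = \mathcal{T}_k \circ \mathcal{T}_{k-1} \circ \cdots \circ \mathcal{T}_1 \circ \bigoplus_{i=1}^{n} (\Gamma_1, F_1, \mathbb{C}_1) \]
Step 2: evaluate.
\[ z : \bigoplus_{i=1}^{n_{k+1}} (\Gamma_{k+1}, F_{k+1}, \mathbb{C}_{k+1}) \to \bigoplus_{i=1}^{m} \{\mathbb{R}, \mathbb{C}\} \]
THE FINAL FORMULA
\[ \bigoplus_{i=1}^{m} \{\mathbb{R}, \mathbb{C}\} = z \circ \mathcal{T}_k \circ \mathcal{T}_{k-1} \circ \cdots \circ \mathcal{T}_1 \circ \bigoplus_{i=1}^{n} (\Gamma_1, F_1, \mathbb{C}_1) \]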
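The shape of the final formula, an evaluation applied after a chain of transforms, can be sketched in Python with ordinary function composition. The two transforms and the evaluation below are hypothetical examples, and the context and function library are left out to keep the sketch short.

```python
from functools import reduce


def compose(*fns):
    # compose(f, g, h)(x) == f(g(h(x))), the same right-to-left order as ∘.
    return reduce(lambda f, g: lambda x: f(g(x)), fns)


def t1(purchases):
    # Transform 1: (country, bottles) rows -> total bottles per country.
    totals = {}
    for country, bottles in purchases:
        totals[country] = totals.get(country, 0) + bottles
    return totals


def t2(totals):
    # Transform 2: totals per country -> share of all sales per country.
    grand_total = sum(totals.values())
    return {country: b / grand_total for country, b in totals.items()}


def z(shares):
    # Evaluation: collapse the final object space to a real number.
    return max(shares.values())


x = compose(z, t2, t1)
largest_share = x(
    [("Spain", 3), ("Spain", 2), ("France", 2), ("Spain", 6), ("England", 8)]
)
```

Here `largest_share` is Spain's share, 11 of 21 bottles. Adding another transform only lengthens the argument list of `compose`; the caller still applies a single function `x`.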
WEBSMACK FRAMEWORK
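\[ x = z \circ \mathcal{T}_k \circ \mathcal{T}_{k-1} \circ \cdots \circ \mathcal{T}_1 : \bigoplus_{i=1}^{n} (\Gamma_1, F_1, \mathbb{C}_1) \to \bigoplus_{i=1}^{m} \{\mathbb{R}, \mathbb{C}\} \]
Traditional evaluation:
\[ \mathrm{map}\ x \quad \text{where } x : \Gamma \to \mathbb{R} \]
New and improved evaluation:
\[ \mathrm{map}\ x \quad \text{where } x : \Gamma \to \mathbb{R} \]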
rootVFP
|> scenario (asOf(15-05-2019))
|> Load "position" (filter("MSCI USA – Daily"))
|> filter (instrument.ESG.WomenOnBoard = true)

THIS IS NOT AN IMPERATIVE ORDERING!

Result: Companies with Women on Board
• MSCI
• IBM
• Apple

|> scenario (timeseries(Date(1,1,2019), Date(15,5,2019)))

Result: Companies with Women on Board
• 1/1/2019 – {list of companies}
• 2/1/2019 – {list of companies}
• 3/1/2019 – {list of companies}
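One possible reading of "not an imperative ordering" is that each pipeline step only adds to a description of the question, and nothing runs until the whole description is evaluated. The Python sketch below illustrates that reading; the `Query` type and the step functions are hypothetical and are not the Beon API.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Query:
    scenario: Optional[str] = None
    source: Optional[str] = None
    filters: Tuple[str, ...] = ()


def scenario(as_of):
    # Each step returns a function that refines the query description.
    return lambda q: replace(q, scenario=as_of)


def load(source):
    return lambda q: replace(q, source=source)


def where(predicate):
    return lambda q: replace(q, filters=q.filters + (predicate,))


def pipe(query, *steps):
    # Rough equivalent of chaining steps with |>.
    for step in steps:
        query = step(query)
    return query


scenario_first = pipe(
    Query(),
    scenario("2019-05-15"),
    load("MSCI USA – Daily"),
    where("instrument.ESG.WomenOnBoard"),
)
scenario_last = pipe(
    Query(),
    load("MSCI USA – Daily"),
    where("instrument.ESG.WomenOnBoard"),
    scenario("2019-05-15"),
)
same_question = scenario_first == scenario_last
```

Under this reading, moving the scenario step to the end produces the same question, so swapping an as-of date for a time series changes what is asked without rewriting the rest of the pipeline.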
THIS NATURALLY LETS YOU MAKE A 5TH GENERATION FRONT END
HOW THE MACHINE WORKS
MSCI BEON – A NEW PARADIGM
Framework based on the Beon Engine
Service API layer
• Process X: “I'm Process X and I can provide x”
• Process Y: “I'm Process Y and I can provide y”
• Process S, Process T, Process C
Functions Library
• x -> ProcessX
• y -> ProcessY
• s -> ProcessS
• t -> ProcessT
• c -> ProcessC
Beon Engine
• a = x + y
• b = s / t
Everything starts with a question …
• Query API: ResultSpec request
The question is then expanded, compiled into bytecode, and then parameterized with a context …
[Diagram: Query API (ResultSpec request), Compiler, Execution Engine, and Context, with graphs of nodes labeled a, s, w, d, t, m, o, u, c, h, p]
Then executed against the various data services. Results are then recombined and presented back.
[Diagram: Compiler, Execution Engine, and Context, with the status “Processing …”]
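The flow on these slides (a functions library that maps names to processes, engine expressions such as a = x + y, and a request evaluated in a context) can be sketched in Python. This is a toy under stated assumptions: the registry, the lambdas standing in for Process X, Y, S and T, and the returned numbers are all hypothetical, and the compile-to-bytecode step is not modeled.

```python
function_library = {}


def register(name, provider):
    # A process announces that it can provide `name`.
    function_library[name] = provider


# Hypothetical stand-ins for Process X, Y, S and T; values are made up.
register("x", lambda context: 2.0)
register("y", lambda context: 3.0)
register("s", lambda context: 10.0)
register("t", lambda context: 4.0)

# Engine-level expressions from the slide: a = x + y, b = s / t.
expressions = {
    "a": lambda value: value("x") + value("y"),
    "b": lambda value: value("s") / value("t"),
}


def evaluate(name, context):
    # Expand a requested name into its inputs, fetch each input from the
    # process that provides it, then recombine the results.
    def value(symbol):
        if symbol in expressions:
            return expressions[symbol](value)
        return function_library[symbol](context)

    return value(name)


request = ["a", "b"]
answers = {name: evaluate(name, {"as_of": "2019-05-15"}) for name in request}
```

The caller only names what it wants; which process supplies x, y, s or t is decided by the library lookup.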
