Domain Expertise and Unstructured Data
William D. MacMillan and Evan A. Schnidman
Open Data Science Conference, Boston 2015
@opendatasci
▶ Everyone seems to love collecting and mining unstructured data.
▶ How to make decisions based on it?
Big Data -> Consequential Decisions?
Tools to Find Paths
Machine Learning
Structural Methods
Forest or Trees?
Data Without Domain Expertise
Domain Expertise Without Data
▶ Expertise allows us to impose structure on otherwise messy results.
Imposing Structure
▶ Data is not limited to numbers.
▶ Information, not data
▶ How to analyze:
-Corporate Communications?
-Central Bank Communications?
▶ Need to know things that are not easily vectorized.
▶ Dimension reduction by applying information.
Data is Everywhere
▶ Good Buzzword minus Bad Buzzword == Sentiment
Traditional Sentiment Analysis
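A minimal sketch of the "good buzzword minus bad buzzword" approach described on this slide. The word lists and the function name `buzzword_sentiment` are hypothetical, chosen only for illustration; real sentiment dictionaries are much larger.

```python
import re

# Hypothetical word lists, chosen only for illustration.
POSITIVE = {"growth", "strong", "improve", "gain"}
NEGATIVE = {"weak", "decline", "risk", "loss"}

def buzzword_sentiment(text: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    good = sum(token in POSITIVE for token in tokens)
    bad = sum(token in NEGATIVE for token in tokens)
    return good - bad

# Two positive hits ("strong", "growth") and one negative hit ("risk").
print(buzzword_sentiment("Strong growth despite some risk."))
```

Every word outside the two lists contributes nothing, so the score ignores word order, context, and any term the dictionary author did not anticipate.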
▶ Domain expertise allows for much more refined analysis
▶ Not a pure data science solution
▶ Time for experts to embrace tech and data science to utilize experts!
▶ Central Bank communications are complex and important
▶ Focus today is Federal Reserve
Example: Central Banks
The Briefcase Watch
Traditional Fed Watching
Not Much has Changed
Failed Attempts
▶ Experts are biased and fail to be comprehensive
▶ Simple text analysis dictionaries don’t work for Fed Speak and other complex language
▶ Example: “modest” vs. “moderate”
Necessary Components
▶ Must use expertise to train the system based on whole communications
▶ Market response matters (Hawkish v. Dovish)
Experts in “Fed Speak”
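One way to picture training on whole communications is sketched below. This is not the presenters' method, which the slides do not specify. It is a deliberately simple stand-in that learns a weight for each word from expert scores attached to entire documents. The two training sentences, the +1 (hawkish) / -1 (dovish) labels, and all function names are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def learn_word_weights(scored_docs):
    """Average the expert score of every document in which a word appears."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, score in scored_docs:
        for word in set(tokenize(text)):
            totals[word] += score
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}

def score_document(text, weights):
    """Mean learned weight over the known words in a new document."""
    known = [weights[w] for w in tokenize(text) if w in weights]
    return sum(known) / len(known) if known else 0.0

# Hypothetical expert labels on whole documents: +1 = hawkish, -1 = dovish.
training = [
    ("Growth has been moderate and inflation pressures are building.", 1.0),
    ("Growth has been modest and inflation remains subdued.", -1.0),
]
weights = learn_word_weights(training)
print(weights["moderate"], weights["modest"], weights["growth"])
print(score_document("Activity expanded at a moderate pace.", weights))
```

Because the labels apply to whole documents, "modest" and "moderate" end up with different weights without anyone hand-coding them, while words shared by both documents cancel out to zero.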
Scaling Data
Enough documents can eliminate bias
+ Expertise allows scaling based on whole documents
= End result is whole communications scored in orderly fashion
Resulting Data:
▶ Comprehensive
▶ Unbiased
▶ Quantitative
▶ Fast
Many Possible Uses
▶ Eliminate post-hoc hedging on CB policy
▶ Forecast based on established correlations
▶ Add as a signal in multifactor model
Qual Turned Quant
Trend matters more than value!
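A small sketch of what "trend, not value" could mean in practice: compare each new score with the average of the scores just before it. The window length, the sample scores, and the function name `trend` are assumptions for illustration, not taken from the slides.

```python
def trend(scores, window=3):
    """Each score minus the mean of the previous `window` scores."""
    out = []
    for i in range(window, len(scores)):
        baseline = sum(scores[i - window:i]) / window
        out.append(scores[i] - baseline)
    return out

# Hypothetical sentiment scores: the latest value is still positive,
# but it sits well below the recent average, so the trend is negative.
scores = [0.9, 0.8, 0.7, 0.4]
print(trend(scores))
```

Read this way, a positive score that is falling relative to its recent history points in the opposite direction from the raw value.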
▶ Alpha across asset classes, not just Fixed Income
▶ Mitigates downside risk, especially with Equities.
▶ Beats Buy and Hold and Trend Following
▶ Low correlation to commonly used strategies
▶ Better performance with FOREX because central bank data is available for both sides of the currency pair trade.
Backtested Data
Graph courtesy of Mavenomics
▶ Method translates across wide variety of financially important texts
▶ Regulatory and shareholder documents for individual equities
▶ Other regulatory information (Dodd-Frank, FDA, EPA etc.)
Other Applications
Data + Domain Expertise
Prattle Analytics: Tradable Data From Market Chatter
QUESTIONS?
Evan A. Schnidman
eas@prattle-analytics.com
EXTRA SLIDES
List of Central Banks
U.S. Federal Reserve
European Central Bank
Bank of England
Bank of Canada
Bank of Japan
Reserve Bank of Australia
Bank of Korea
Reserve Bank of India
Swedish Riksbank
Reserve Bank of New Zealand
Central Bank of Mexico
Central Bank of Brazil
Central Bank of Russia
South African Reserve Bank
Bank of Israel
Central Bank of Turkey
Central Bank of Taiwan
Swiss National Bank
Tree Maps
Courtesy of Mavenomics
Forecasting
Backtesting
Independent Backtesting Results
The following results are from a fund that independently tested the Fed Playbook data in January of 2015. This fund primarily utilized a standard return-to-volatility futures trading strategy based on a common risk parity model to test the FPSI data from January 2000 to December 2014. All transaction costs are built into the testing. Their findings indicated the following:
• The FPSI is a superior trade signal to both of the most common trading strategies, “Trend Following” and “Buy and Hold.”
EQUITIES
• Using a simple portfolio of the S&P 500, both Trend Following and Buy and Hold generate returns of roughly 27% over the testing period.
• The FPSI generates risk-adjusted returns of 58%, more than double those of the most commonly used trading strategies.
• FPSI returns were generated with almost perfect long/short balance.
• The FPSI has only a 0.3 correlation to Trend Following and just a 0.1 correlation to Buy and Hold, so the FPSI can be used in concert with these established strategies to generate even higher returns.
• The FPSI also proved to be a superb indicator of downside risk, even beating Trend Following.
• The optimal holding period for an equity portfolio traded on FPSI data is 2-3 months.
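The correlations quoted above matter because of the standard two-asset portfolio variance formula: the lower the correlation between two strategies, the lower the volatility of a blend of them. The sketch below illustrates only that risk effect. The 10% volatility assigned to each strategy and the 50/50 weighting are assumptions; the source reports correlations but no volatilities.

```python
import math

def blend_volatility(vol_a, vol_b, corr, weight_a=0.5):
    """Volatility of a two-strategy blend (standard portfolio variance formula)."""
    weight_b = 1.0 - weight_a
    variance = ((weight_a * vol_a) ** 2 + (weight_b * vol_b) ** 2
                + 2 * weight_a * weight_b * corr * vol_a * vol_b)
    return math.sqrt(variance)

# Assumed: both strategies have 10% volatility and are weighted equally.
# A correlation of 1.0 is the no-diversification baseline.
for corr in (1.0, 0.3, 0.1):
    print(corr, round(blend_volatility(0.10, 0.10, corr), 4))
```

Under these assumptions the blend's volatility falls from 10% at a correlation of 1.0 to roughly 8% at 0.3, which is why a low-correlation signal can be worth adding alongside established strategies.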
FOREX
• Examining only the U.S. Dollar and Euro based on just U.S. data indicates that the FPSI outperforms existing currency trading models.
• Trend Following tends to dominate the currency trading space because over the sample period it generated a 55% return.
• Over the same period the FPSI generates over 70% returns.
• The FPSI has only a 0.17 correlation to Trend Following, so these two strategies could be used in concert to generate even higher returns.
• The optimal holding period for a currency trade based on the FPSI data is 10-15 days.
• These returns take into account only Prattle Analytics’ data on the U.S. Federal Reserve. Since Prattle also has data on the European Central Bank (along with more than a dozen other central banks), this information could be used to better understand the other side of the currency pair trade and generate even greater returns.
Using Domain Expertise To Improve Text Analysis
--Evan A. Schnidman
eas@prattle-analytics.com

Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 

Domain Expertise and Unstructured Data

  • 1. Domain Expertise and Unstructured Data William D. MacMillan and Evan A. Schnidman Open Data Science Conference, Boston 2015 @opendatasci
  • 2. ▶ Everyone seems to love collecting and mining unstructured data. ▶ How to make decisions based on it? Big Data -> Consequential Decisions?
  • 3. Tools to Find Paths Machine Learning Structural Methods
  • 4. Forest or Trees? Data Without Domain Expertise Domain Expertise Without Data
  • 5. ▶ Expertise allows us to impose structure on otherwise messy results. Imposing Structure
  • 6. ▶ Data is not limited to numerical. ▶ Information not Data ▶ How to analyze: -Corporate Communications? -Central Bank Communications? ▶ Need to know things not easily vectorized. ▶ Dimension reduction by applying information. Data is Everywhere
  • 7. ▶ Good Buzzword minus Bad Buzzword == Sentiment Traditional Sentiment Analysis ▶ Domain expertise allows for much more refined analysis ▶ Not a pure data science solution ▶ Time for experts to embrace tech and data science to utilize experts!
  • 8. ▶ Central Bank communications are complex and important ▶ Focus today is Federal Reserve Example: Central Banks
  • 9. The Briefcase Watch Traditional Fed Watching Not Much has Changed
  • 10. Failed Attempts ▶ Experts are biased and fail to be comprehensive ▶ Simple text analysis dictionaries don’t work for Fed Speak and other complex language ▶ Ex. “modest” v. “moderate” Necessary Components ▶ Must use expertise to train the system based on whole communications ▶ Market response matters (Hawkish v. Dovish) Experts in “Fed Speak”
  • 11. Scaling Data + = Enough documents can eliminate bias Expertise allows scaling based on whole documents End result is whole communications scored in orderly fashion
  • 12. Resulting Data: ▶ Comprehensive ▶ Unbiased ▶ Quantitative ▶ Fast Many Possible Uses ▶ Eliminate post-hoc hedging on CB policy ▶ Forecast based on established correlations ▶ Add as a signal in multifactor model Qual Turned Quant Trend matters more than value!
  • 13. ▶ Alpha across asset classes, not just Fixed Income ▶ Mitigates downside risk, especially with Equities. ▶ Beats Buy and Hold and Trend Following ▶ Low correlation to commonly used strategies ▶ Better performance with FOREX because both sides of currency pair trade. Backtested Data Graph Courtesy of Mavenomics
  • 14. ▶ Method translates across wide variety of financially important texts ▶ Regulatory and shareholder documents for individual equities ▶ Other regulatory information (Dodd-Frank, FDA, EPA etc.) Other Applications
  • 15. Data + Domain Expertise
  • 16. Prattle Analytics: Tradable Data From Market Chatter QUESTIONS? Evan A. Schnidman eas@prattle-analytics.com
  • 17. Prattle Analytics: Tradable Data From Market Chatter EXTRA SLIDES Evan A. Schnidman eas@prattle-analytics.com
  • 18. List of Central Banks: U.S. Federal Reserve, European Central Bank, Bank of England, Bank of Canada, Bank of Japan, Reserve Bank of Australia, Bank of Korea, Reserve Bank of India, Swedish Riksbank, Reserve Bank of New Zealand, Central Bank of Mexico, Central Bank of Brazil, Central Bank of Russia, South African Reserve Bank, Bank of Israel, Central Bank of Turkey, Central Bank of Taiwan, Swiss National Bank
  • 21. Backtesting: Independent Backtesting Results. The following results are from a fund that independently tested the Fed Playbook data in January 2015. The fund primarily used a standard return-to-volatility futures trading strategy based on a common risk parity model to test the FPSI data from January 2000 to December 2014. All transaction costs are built into the testing. Its findings indicated the following:
      • The FPSI is a superior trade signal to both of the most common trading strategies, “Trend Following” and “Buy and Hold.”
    EQUITIES
      • Using a simple portfolio of the S&P 500, both Trend Following and Buy and Hold generate returns of roughly 27% over the testing period.
      • The FPSI generates risk-adjusted returns of 58%, more than double those of the most commonly used trading strategies.
      • FPSI returns were generated with almost perfect long/short balance.
      • The FPSI has only a 0.3 correlation to Trend Following and just a 0.1 correlation to Buy and Hold, so the FPSI can be used in concert with these established strategies to generate even higher returns.
      • The FPSI also proved to be a superb indicator of downside risk, even beating Trend Following.
      • The optimal holding period for an equity portfolio traded on FPSI data is 2-3 months.
    FOREX
      • Examining only the U.S. Dollar and Euro based on just U.S. data indicates that the FPSI outperforms existing currency trading models.
      • Trend Following tends to dominate the currency trading space because over the sample period it generated a 55% return.
      • Over the same period, the FPSI generates returns of over 70%.
      • The FPSI has only a 0.17 correlation to Trend Following, so these two strategies could be used in concert to generate even higher returns.
      • The optimal holding period for a currency trade based on the FPSI data is 10-15 days.
      • These returns take into account only Prattle Analytics’ data on the U.S. Federal Reserve. Because Prattle also has data on the European Central Bank (along with more than a dozen other central banks), this information could be used to better understand the other side of the currency pair trade and generate even greater returns.
  • 22. Prattle Analytics: Tradable Data From Market Chatter Using Domain Expertise To Improve Text Analysis --Evan A. Schnidman eas@prattle-analytics.com
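
Slide 7 summarizes traditional sentiment analysis as "Good Buzzword minus Bad Buzzword == Sentiment", and slide 10 notes that simple dictionaries fail on Fed Speak such as “modest” v. “moderate”. The sketch below is one minimal way to illustrate that point, not the presenters' method. The word lists, the function name `dictionary_sentiment`, and the two sample sentences are all hypothetical and written only for this illustration.

```python
import re

# Hypothetical word lists, chosen only for illustration.
POSITIVE = {"strong", "solid", "improving"}
NEGATIVE = {"weak", "declining", "uncertain"}


def dictionary_sentiment(text: str) -> int:
    """Return the count of positive words minus the count of negative words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    good = sum(token in POSITIVE for token in tokens)
    bad = sum(token in NEGATIVE for token in tokens)
    return good - bad


# Made-up sentences that differ only in "modest" v. "moderate".
modest = "Economic activity expanded at a modest pace."
moderate = "Economic activity expanded at a moderate pace."

# Neither word is in a generic dictionary, so both sentences score the same.
print(dictionary_sentiment(modest), dictionary_sentiment(moderate))
```

A generic dictionary gives both sentences the same score, so the distinction the slides call out is invisible to it. This is the gap the deck proposes to close by having experts score whole communications.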