apidays LIVE LONDON - The Road to Embedded Finance, Banking and Insurance with APIs
Enriching Decision with News Data
Miguel Ballesteros, Customer Solutions Engineer at Dow Jones
2. | Who am I?
Customer Solutions Engineer
● Computer Scientist + Data Scientist
● Translating business needs to technical solutions
● Based in Barcelona, and covering EMEA
miguel.ballesteros@dowjones.com
4. | News data have proved valuable to...
● Better understand the business context
● Assess the risk of the whole economy or even particular entities
● Monitor the changing conditions around entities
● Anticipate economic trends
● Anticipate the potential performance of a security
● Deliver high-speed facts that are later confirmed by reports or official announcements (aka “Nowcasting”)
5. | Quality News
● Follow strict editorial guidelines
● Reduce noise and misinformation
● Use reliable sources
● Some sources focus on Economic and Financial events
Available behind a paywall!
6. | Uncertainty boosts news consumption
● Higher need to anticipate trends and risks
● Reliable source to explain behaviours
7. | Use Cases
● Credit Risk Assessment
● Credit Risk Monitoring
● Economic Research
● Due Diligence
● Compliance
● Securities Sentiment
● Insurance Risk Assessment
● ...
9. | Consolidate Multiple Premium Sources
● Simplify subscription management
● Simplify workflows
● Unify data structure
● Single point of access
● Data enrichment with unified criteria
10. | API - Data Schema
title
snippet
body
subject_codes: ,m11,mcroil,c1522,m14,mcat,mnasdq,ncmac,nenac,neqac,nfiac,
company_codes: ,eurcb,eurcb,fed,fed,jyskb,jyskb,nbkden,nbkden,rryce,rryce,
region_codes: ,saarab,usa,asiaz,gulfstz,meastz,namz,wasiaz,
word_count: 525
an: DJDN000020190920ef9k001ey
byline
modification_datetime
...
Symbologies: ISIN, CUSIP, SEDOL, Market+Ticker
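The code fields in the schema above are comma-delimited strings that start and end with a comma and may repeat values. A minimal sketch of turning such a string into a clean list, assuming order should be kept and duplicates dropped; `split_codes` is a hypothetical helper, not part of any Dow Jones library:

```python
from __future__ import annotations


def split_codes(raw: str) -> list[str]:
    """Split a comma-delimited code string into unique codes, keeping order."""
    seen = set()
    codes = []
    for code in raw.split(","):
        code = code.strip()
        # Skip the empty pieces produced by leading/trailing commas
        if code and code not in seen:
            seen.add(code)
            codes.append(code)
    return codes


# Value copied from the company_codes example on the slide
company_codes = ",eurcb,eurcb,fed,fed,jyskb,jyskb,nbkden,nbkden,rryce,rryce,"
print(split_codes(company_codes))
# ['eurcb', 'fed', 'jyskb', 'nbkden', 'rryce']
```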
13. | Human-Driven Content Consumption
Disadvantages
● Not scalable in volume or speed
● Operational decisions require human intervention
● Hard to combine with proprietary or third-party data
18. | Low-Volume API Use Cases
● Corporate Credit Risk Assessment
● Corporate Credit Risk Monitoring
● Due Diligence
● Compliance
● CRM Integration
● Insurance Risk Assessment
● Platforms B2B4C
[Diagram: Aggregator DB, Low-Volume API, Workflow Automation]
19. | Low-Volume API - Corp. Credit Risk Assessment
● Simplifies the data collection for research
● Highlights relevant results according to predefined criteria
● Increases productivity
[Diagram: Proprietary Data, Other Data Providers, Aggregator DB, Low-Volume API, On-Demand Data Aggregator]
20. | Low-Volume API - Corp. Credit Risk Monitoring
● Automatic process checking for specific conditions
● Requires attention only when necessary
[Diagram: Proprietary Data, Other Data Providers, Aggregator DB, Low-Volume API, Automated Monitor]
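The monitoring idea on this slide, an automatic check that only surfaces articles meeting specific conditions, could be sketched as follows. This is an illustration under assumptions: the `Article` record, the `find_alerts` helper, the entity codes, and the sample headlines are all made up. Only the field names `company_codes` and `subject_codes` and the subject codes `c11` and `c16` come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical record using two field names from the data schema slide
    title: str
    company_codes: list
    subject_codes: list


def find_alerts(articles, watched_companies, watched_subjects):
    """Return (company, subject, title) for articles matching both watch lists."""
    alerts = []
    for article in articles:
        companies = watched_companies.intersection(article.company_codes)
        subjects = watched_subjects.intersection(article.subject_codes)
        for company in sorted(companies):
            for subject in sorted(subjects):
                alerts.append((company, subject, article.title))
    return alerts


# Made-up sample data; 'cmpa' and 'cmpb' are placeholder entity codes
articles = [
    Article("Sample headline A", ["cmpa"], ["c11"]),
    Article("Sample headline B", ["cmpb"], ["c16"]),
]
alerts = find_alerts(articles, {"cmpa", "cmpb"}, {"c16"})
for company, subject, title in alerts:
    print(f"ALERT {company} {subject}: {title}")
```

Only the second article triggers an alert, which is the "attention only when necessary" behaviour described above.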
21. | Low-Volume API - News Radar Visualisation
# newsradar: helper used in the talk; its import is not shown on the slide
# c11 = "Corporate Strategy/Planning"
# c15 = "Financial Performance"
# c16 = "Bankruptcy"
entities = ['aapl', 'msft', 'tsla']
subjects = ['c11', 'c15', 'c16']
symbology = 'DJTicker'
nro = newsradar.get_ex(entities, subjects, symbology)
# nro contains volumes for different timeframes for each entity and subject
Not only article data!
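The slide does not show the shape of the returned volumes. Purely as an illustration, the sketch below assumes a nested dictionary keyed by entity, subject, and timeframe, and flags entity/subject pairs whose daily volume is well above their weekly average. The structure, the timeframe names, the counts, the placeholder entities, and the `find_spikes` helper are all hypothetical and not part of the Dow Jones API.

```python
# Hypothetical data: entity -> subject -> timeframe -> article count
volumes = {
    "ent_a": {"c15": {"day": 12, "week": 35}, "c16": {"day": 0, "week": 1}},
    "ent_b": {"c15": {"day": 4, "week": 28}, "c16": {"day": 3, "week": 3}},
}


def find_spikes(volumes, ratio=2.0):
    """Return (entity, subject) pairs whose daily count is at least
    `ratio` times the average daily count over the week."""
    spikes = []
    for entity, by_subject in volumes.items():
        for subject, counts in by_subject.items():
            daily_average = counts["week"] / 7
            if counts["day"] > 0 and counts["day"] >= ratio * daily_average:
                spikes.append((entity, subject))
    return spikes


print(find_spikes(volumes))
# [('ent_a', 'c15'), ('ent_b', 'c16')]
```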
23. | High-Volume API Use Cases
● Algorithmic Trading (No online HFT)
● Portfolio Management
● Economic Research
● Platforms B2B4C
● Asset Management
● ESG Scoring
● Credit Risk Assessment / Monitoring
[Diagram: Aggregator DB*, High-Volume API, Extracted News DB, NLP, Aggregate Statistics, ML Models, Business Workflow, X-aaS]
24. | High-Volume API - Portfolio Management
● Fits well in automated Quant Workflows
● Updated in near real-time
● Allows Machine-Processing
[Diagram: Aggregator DB*, High-Volume API, Extracted News DB, ML Models + NLP, Alpha Signals, Risk Signals, Quant Workflow]
25. | High-Volume API - 2 Services
Snapshots (up to now)
● Train Models
● Understand past events
● Summarise facts and events
● Identify patterns
● Backtesting
Delivered as files (AVRO, CSV, JSON)
Streams (from now on)
● Predict
● Monitor in near real time
● Get notifications
● Robo-advice
● Calculate signals
Delivered as messages
26. | High-Volume - Long Running Operation Pattern
Snapshot
● Submits a job instead of receiving an immediate response
● Requires checking the job until it completes
● Results are collected by downloading the generated files
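import time

# Snapshot and const come from the Dow Jones client library used in the talk;
# their imports are not shown on the slide.
where_statement = (
    "publication_date >= '2020-09-01 00:00:00' AND "
    "language_code IN ('en')"
)
snp = Snapshot(query=where_statement)
snp.submit_extraction_job()
while True:
    # Refresh the job state on every pass so the loop can terminate
    snp.get_extraction_job_results()
    if snp.last_extraction_job.job_state == 'JOB_STATE_DONE':
        break
    # ... (handling of other job states is elided on the slide)
    time.sleep(const.API_JOB_ACTIVE_WAIT_SPACING)
snp.download_extraction_files()
# Files are downloaded to a folder named after the snapshot ID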
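The long-running operation pattern above (submit, poll, collect) can also be sketched independently of any client library. The example below is a generic illustration, not the Dow Jones implementation: `FakeJob`, `wait_for_job`, the `JOB_STATE_PENDING` state name, and the timeout values are assumptions introduced here. It adds a deadline so the polling loop cannot run forever if a job never completes.

```python
import time


class FakeJob:
    """Stand-in for a remote job: reports DONE after a fixed number of checks."""

    def __init__(self, checks_until_done):
        self.checks_left = checks_until_done
        self.state = "JOB_STATE_PENDING"  # hypothetical state name

    def refresh(self):
        # A real client would call the API here to fetch the current state
        self.checks_left -= 1
        if self.checks_left <= 0:
            self.state = "JOB_STATE_DONE"


def wait_for_job(job, poll_seconds=0.01, timeout_seconds=1.0):
    """Poll the job until it is done; return False if the deadline passes first."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        job.refresh()
        if job.state == "JOB_STATE_DONE":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)


print(wait_for_job(FakeJob(checks_until_done=3)))
```

Using `time.monotonic()` for the deadline keeps the timeout unaffected by system clock changes during a long wait.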
28. | More information at
This Slide Deck: http://bit.ly/hackzurichcscdata
Q&A Channel: #HackZurichCSCData
Snippets + Samples: https://github.com/dowjones/developer-platform
Developer Portal: https://developer.dowjones.com
Miguel Ballesteros
miguel.ballesteros@dowjones.com