Big Data in Today’s Businesses
Presenter: Salman Jaffer, CFA
March 22nd 2018
REUTERS / Salman Jaffer, CFA
Big Data in Today’s Businesses – Salman Jaffer, CFA. Thomson Reuters
Contents
• Introduction
• What is Big Data?
• Big, Open and Linked Data (BOLD)
• Application Programming Interfaces
Introduction
Salman Jaffer, CFA
Singapore Technology Lead, TMS
Office: (65) 6870 3563
salman.jaffer@thomsonreuters.com
Salman is a Chartered Financial Analyst and has led many financial services risk, trading and technology
implementations from concept to finish across the globe for over 15 years.
Previously, as Head of Data Science at Sentifi, Salman combined his rare skill set of strong knowledge in
technology and finance to formulate and deliver unique solutions to classical machine learning and
advanced deep learning problems. Salman holds a degree and a number of professional qualifications in
the fields of Computer Science, Machine Learning, Big Data and Finance.
He is currently the Head of TMS / BOLD in Singapore focusing on NLP Problems for clients.
In his spare time, Salman likes rock climbing, Muay Thai and running.
Figure 1: What is Data Science? (Diagram labels: Domain Expertise; Statistical Skills; Software Engineering Skills; Data Science + some other skills)
What is Big Data?
• 2.5 million price updates per second
• More than 3,000 data experts managing TR Data Globally
• Over 12,000 software engineers, systems architects, operations
experts, information security specialists, technical support analysts
and data scientists
• Over 30 Billion triples in our Knowledge Graph
• 1 Billion people worldwide read or see Reuters news every day
• Over 30 years of expertise managing People data such as PEPs
• Training Intelligent Tagging Models since 2007
• FX Trading Community of 4,000+ institutions and 15,000+ users in
more than 120 countries
• 2,500 journalists in 200 locations worldwide in 16 languages
• 60,000 terabytes of data in our data centers (The U.S. Library of
Congress contains 200 terabytes of data, and the total size of
Wikipedia is 3 terabytes).
• Over 50,000 developers use our APIs globally
• 850,000 photos and images are captured and published by
Reuters every year
• Thomson Reuters Regulatory Intelligence includes global
coverage of over 750 regulatory bodies
Big, Open and Linked Data
• BOLD stands for Big, Open and Linked Data
• Big. Large sets of data. Reuters has over 60,000 TBs of data
• Open. Publicly available data such as Google, Reddit, Stanford Core NLP, Thomson Reuters NewsScope Data
• Linked. Using PermID and methodologies to link Organizations, People, Topics, Events and Facts
• Data. News Feeds, Analyst Research, Global Filings, Call Transcripts
TRIT (Thomson Reuters Intelligent Tagging)
• Open Calais
• PermID
• Contextual Tagging
• DIY
Knowledge Graph (Knowledge Graph in RDF available via the Graph Feed)
• Subject-Predicate-Object triples
• Draw Relationships
• Distance and Relevance
• Provide via a Graph Feed
Data Fusion (graph management, integration and analytics platform)
• Map. Public and Private Data
• Stitch. Un/Structured Data
• Tag. Using TRIT
• Index. Speed and Scale
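The knowledge graph ideas above (triples, relationships, distance) can be illustrated with a minimal sketch in plain Python. It stores facts as subject-predicate-object triples and measures distance as the number of hops between two entities. The entity names, predicates and helper functions are hypothetical, and this is not the format of the Thomson Reuters Graph Feed.

```python
from collections import defaultdict, deque

# Hypothetical triples: (subject, predicate, object). Not real Graph Feed data.
TRIPLES = [
    ("AcmeCorp", "hasCEO", "JaneDoe"),
    ("AcmeCorp", "isSupplierOf", "GlobexInc"),
    ("GlobexInc", "isHeadquarteredIn", "Singapore"),
]

def build_adjacency(triples):
    """Index triples as an undirected graph so relationships can be walked."""
    adjacency = defaultdict(set)
    for subject, _predicate, obj in triples:
        adjacency[subject].add(obj)
        adjacency[obj].add(subject)
    return adjacency

def hops(triples, start, goal):
    """Breadth-first search: number of relationships between two entities."""
    adjacency = build_adjacency(triples)
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for neighbour in adjacency[node] - seen:
            seen.add(neighbour)
            queue.append((neighbour, distance + 1))
    return None  # the two entities are not connected

def objects_of(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]
```

A call such as `hops(TRIPLES, "JaneDoe", "Singapore")` walks the chain of relationships between a person and a location, which is one simple way to think about distance in a graph. Relevance scoring would need additional signals that this sketch does not model.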
Application Programming Interfaces
Big Data in Businesses Today:
Salman Jaffer, CFA
Technology Lead, TMS
Thomson Reuters
Big Data Landscape 2018
Contents
• Why Now for Big Data?
• Trends in Big Data
• Big Data Technologies
• Applications of Big Data Technologies
https://xkcd.com/1897/
Why Now for Big Data?
Evolution of Storage, Processing, Networks and Bandwidth
Evolution of Mobile Hardware
Evolution of Internet of Things
Trends in Big Data
Trends in Big Data – Mobile Technology
• Accelerometer
• Gyroscope
• Linear Acceleration
• Magnetometer
• GPS
• Barometer
• Proximity Sensor
• Ambient Light Sensor
• Infrared Sensor
• Ambient Temperature
• Relative Humidity
• Fingerprint
• Microphone
• Camera
Combine sensor data from:
• Watches
• Phones
• Computers
• Home
• Transport
• Work
• Environment
• Interactions
• Internet
Internet vs. Artificial Intelligence Era
• Strategic Data Acquisition - Almost any reputable data science team can get their hands on great
computing power via Nvidia, AWS, GCP or Azure, and papers on approaches to building the deepest and
widest neural networks are widely published. What AI companies such as Google, WeChat, Baidu and
Facebook have done is build a moat around themselves by capturing their users' data.
• Many of the ways in which organizations need to approach these problems have changed. Just as
the shift from waterfall to agile took a generation, the shift from the internet era to the AI era
will also take time.
Internet Era | AI Era
A/B Testing | Strategic Data Acquisition
Bricks and Mortar -> eCommerce | Tech Company -> AI Company
Decision making from CxOs | Decisions made by Product Managers and Engineers
Internet Company | AI Company
Many Databases | Unified Data Lakes
Short Cycles | Training Epochs
Traditional Job Descriptions | New Job Descriptions
Wireframes | Design Thinking
Big Data Technologies
Big Data Technologies – Elasticsearch
• Distributed search and analytics engine designed for horizontal scalability
• Important Terms
• Cluster – Collection of nodes
• Node – Single server, part of a cluster
• Index – Collection of shards akin to a `database`
• Shard – Collection of documents
• Type – Category within an index akin to a `database table`
• Document – A record, JSON object
More Info: slideshare.net/duydo/elasticsearch-for-data-engineers
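To make the terms above concrete, the following sketch models a document as a JSON object and assigns it to a shard by hashing its ID. Elasticsearch routes documents with a formula of this general shape (a hash of the routing value modulo the number of primary shards), but it uses its own Murmur3-based hash, so the MD5 hash here is only a stand-in. The index name, field names, shard count and helper functions are assumptions made for illustration.

```python
import hashlib
import json

NUM_PRIMARY_SHARDS = 5  # assumed shard count for a hypothetical index

def shard_for(doc_id: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Pick a shard from the document ID (MD5 is a stand-in hash)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# A document is a JSON object; these fields are made up for illustration.
document = {
    "headline": "Central bank holds rates",
    "language": "en",
    "published": "2018-03-22",
}

def index_document(index: dict, doc_id: str, doc: dict) -> int:
    """Store the serialized document in the shard chosen by shard_for."""
    shard = shard_for(doc_id)
    index.setdefault(shard, {})[doc_id] = json.dumps(doc)
    return shard

news_index = {}  # shard number -> {document ID -> JSON string}
assigned = index_document(news_index, "story-1", document)
```

Because the shard is derived from the ID alone, any node can work out where a document lives without a central lookup, which is part of what makes horizontal scaling practical.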
Relational | Non-Relational
SQL | NoSQL
SQL Server, Oracle | BigTable, Elasticsearch
Enterprise | Open Source
Pre-defined Sizes | Elastic Scalability
Column-Row Store | Document Store
Pre-defined Data Model | Dynamic Mapping
ACID
• Atomicity
• Consistency
• Isolation
• Durability
Brewer’s CAP
• Consistency
• Availability
• Partition Tolerance
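Atomicity, the first ACID property listed above, can be sketched with Python's built-in `sqlite3` module: a transfer either applies both updates or neither. The table, account names and the `transfer` helper are hypothetical and only illustrate the idea on a single-node relational store.

```python
import sqlite3

def make_bank() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn

def transfer(conn, source, target, amount) -> bool:
    """Move funds in one transaction; roll back both updates on failure."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn) -> dict:
    """Return the current balance of every account as a dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = make_bank()
ok = transfer(conn, "alice", "bob", 30)        # both updates are committed
failed = transfer(conn, "alice", "bob", 500)   # violates the CHECK constraint
```

In the failing transfer the credit to the target account is applied first and then undone when the debit breaks the balance constraint, so no partial update survives. Distributed stores that favour availability and partition tolerance often relax guarantees of this kind.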
Applications of Big Data Technologies
Computer Vision
• Captcha
• OCR
• Images
• Video
Speech Recognition
• Speech to Text
• Sentiment Analysis
• Security
Natural Language Processing
• Named Entity Recognition
• Sentiment Analysis
• Natural Language Generation
Robotics
• Warehouse Logistics
• Assisted Living
• Drones
Robot and Frank, 2012
www.loseit.com
Applications of Big Data in Financial Services
• Satellite Images
• Credit Card Transaction Data
• Trade Processing
• High Frequency Trading
• Algorithmic Trading
• Investment decision making support
• Customer churn prediction
• Retail sales trends analysis and prediction
• Research Automation
• Fraud Analysis
How can I learn more about Big Data?
Question: I want to learn more about…
• Big Data Leaders
– Andrew Ng
– Fei-Fei Li
– Yann LeCun
– Richard Socher
– Demis Hassabis
– Jürgen Schmidhuber
– Alex Pentland
– Corinna Cortes
– Daphne Koller
– Hilary Mason
– Doug Cutting
– Kirk Borne
– Gilberto Titericz Jr.
– Stanislav Semenov
– Monica Rogati
– Chris Manning
– Yoshua Bengio
– Geoff Hinton
– David Blei
– Nando de Freitas
– Andrej Karpathy
– Ian Goodfellow
– Ilya Sutskever
– Daniela Rus
– Yoav Goldberg
• Research Papers, Blogs and Videos
– CBInsights.com
– arxiv.org - Recent trends in Deep Learning Based NLP
– NLP News
– Heroes of Deep Learning - YouTube
– Thomson Reuters – Harvard Business Publishing
– Stanford University School of Engineering
• Online Courses and Competitions
– Coursera
– Stanford Online
– Kaggle
– DataCamp
– Hortonworks
– Codecademy
• Tools
– Jupyter Notebook
– Google Cloud Platform
– Amazon SageMaker
THANK YOU
