Analyzing Power of Tweets in Predicting Commodity Futures
Srivatsan Ramanujam
Director of Software Engineering, ML@Salesforce Einstein at Salesforce
Data & Analytics
Extracting signals from tweets to predict commodity futures.
1.
Analyzing the power of Tweets in predicting Commodity Futures
Mar 17, 2014
Srivatsan Ramanujam, Senior Data Scientist, Pivotal
@gopivotal @being_bayesian
© Copyright 2013 Pivotal. All rights reserved.
2.
Problem Definition
• Can we predict corn, soybean and wheat futures based on social chatter on Twitter?
• The customer: a major agricultural cooperative
3.
Data
4.
Obtaining Data
• GNIP (see next slide) was used to fetch five years of historical tweets matching any of a list of keywords of interest
[Figure: Tweets Table and Poster Information]
5.
GNIP
• As plugged-in partners, we had worked with GNIP before, and the experience was great
• We needed historical data, and GNIP’s Historical PowerTrack came in handy
• Clean API, quick quotes, and convenient download of the results of historical jobs
6.
Grain Futures vs. Volume of Tweets
[Figure]
7.
The Platform
8.
Data Science Toolkit
• Appliance
  – Full Rack DCA with Greenplum Database
• ETL
  – Python
• Modeling
  – SQL
  – MADlib
  – PL/Python, PL/Java
  – Ark-Tweet-NLP (1) with PL/Java wrappers
• Visualization
  – Tableau
1: CMU ARK Twitter part-of-speech tagger: http://www.ark.cs.cmu.edu/TweetNLP (GPL 2)
9.
Pivotal Greenplum MPP DB
• Think of it as multiple PostgreSQL servers: a master plus segments (workers)
• Rows are distributed across segments by a particular field (or randomly)
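The row-distribution idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CRC32 hash, the `segment_for` and `distribute` helpers, and the sample rows are all hypothetical, and Greenplum's actual hashing scheme is not described in the slides.

```python
import zlib
from collections import defaultdict


def segment_for(key, num_segments):
    """Map a distribution-key value to a segment index.

    CRC32 is used only because it is stable across runs; it is an
    assumption for this sketch, not Greenplum's real hash function.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, key_field, num_segments):
    """Group rows (dicts) by the segment their distribution key hashes to."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key_field], num_segments)].append(row)
    return segments


# Hypothetical sample rows; the field names are illustrative only.
rows = [
    {"tweet_id": 1, "poster": "alice", "body": "corn prices up"},
    {"tweet_id": 2, "poster": "bob", "body": "wheat looks weak"},
    {"tweet_id": 3, "poster": "alice", "body": "soybean exports rise"},
    {"tweet_id": 4, "poster": "carol", "body": "corn harvest delayed"},
]

segments = distribute(rows, key_field="poster", num_segments=4)
for index in sorted(segments):
    print(index, [row["tweet_id"] for row in segments[index]])
```

Because the segment is a pure function of the key, all rows that share a key value (here, the same poster) land on the same segment, which is what makes per-key work local to one worker.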
10.
PL/X: X in {pgsql, R, Python, Java, Perl, C, etc.}
• Allows users to write Greenplum/PostgreSQL functions in the R, Python, Java, Perl, pgsql or C languages
• The interpreter/VM of the language ‘X’ is installed on each node of the Greenplum Database cluster
• Data parallelism: PL/X piggybacks on Greenplum’s MPP architecture
[Diagram: SQL arrives at the master host (with a standby master), which connects over the interconnect to segment hosts, each running several segments]
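The data-parallel pattern described above (the same user function runs on every segment over that segment's rows, and the results are combined) can be imitated in plain Python. This is a single-process sketch, not PL/Python itself: `count_keyword`, `run_on_segments`, and the sample tweets are hypothetical, and on a real cluster the per-segment calls would run concurrently on separate hosts.

```python
def count_keyword(rows, keyword):
    """User-defined function: count the rows whose text mentions the keyword."""
    return sum(1 for text in rows if keyword in text.lower())


def run_on_segments(segments, func, *args):
    """Apply the same function to each segment's rows independently.

    In an MPP database each call would execute on its own segment;
    here they simply run one after another.
    """
    return [func(rows, *args) for rows in segments]


# Hypothetical tweets, already split across three segments.
segments = [
    ["Corn futures rally", "Wheat steady"],
    ["corn yields look weak", "Soybean exports rise"],
    ["No grain news today"],
]

partial_counts = run_on_segments(segments, count_keyword, "corn")
total = sum(partial_counts)  # the master combines the per-segment results
print(partial_counts, total)
```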
11.
Scalable, in-database ML
• Open source: https://github.com/madlib/madlib
• Works on Greenplum DB and PostgreSQL
• Active development by Pivotal
  – Latest release: 1.4 (Dec 2014)
• Downloads and docs: http://madlib.net/
12.
MADlib In-Database Functions

Predictive Modeling Library
• Generalized Linear Models
  – Linear Regression
  – Logistic Regression
  – Multinomial Logistic Regression
  – Cox Proportional Hazards Regression
  – Elastic Net Regularization
  – Sandwich Estimators (Huber-White, clustered, marginal effects)
• Matrix Factorization
  – Singular Value Decomposition (SVD)
  – Low-Rank
• Machine Learning Algorithms
  – Principal Component Analysis (PCA)
  – Association Rules (Affinity Analysis, Market Basket)
  – Topic Modeling (Parallel LDA)
  – Decision Trees
  – Ensemble Learners (Random Forests)
  – Support Vector Machines
  – Conditional Random Field (CRF)
  – Clustering (K-means)
  – Cross Validation
• Linear Systems
  – Sparse and Dense Solvers

Descriptive Statistics
• Sketch-based Estimators
  – CountMin (Cormode-Muthukrishnan)
  – FM (Flajolet-Martin)
  – MFV (Most Frequent Values)
• Correlation
• Summary

Support Modules
• Array Operations
• Sparse Vectors
• Random Sampling
• Probability Functions
13.
The Models
14.
The Approach
• In addition to identifying textual cues in tweets that were correlated with commodity futures, we also wanted to analyze whether tweet sentiment was correlated with commodity futures
15.
Sentiment Analysis – Challenges
• Language on Twitter doesn’t adhere to rules of grammar, syntax or spelling
• We don’t have labeled data for our problem: the tweets aren’t tagged with sentiment
• Semi-supervised sentiment prediction can be achieved by dictionary look-ups of tokens in a tweet, but without context, sentiment prediction is futile!
[Figure: “Cool”]
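The last challenge above can be made concrete with a toy dictionary look-up scorer. Everything here is an assumption for illustration: the `LEXICON` entries, the `lexicon_score` helper, and the two example tweets are hypothetical and do not come from the project.

```python
# Hypothetical word-level sentiment lexicon (not from the original study).
LEXICON = {"cool": 1, "great": 1, "bad": -1, "poor": -1}


def lexicon_score(tweet):
    """Sum the lexicon polarity of each token, ignoring all context."""
    tokens = (token.strip(".,!?") for token in tweet.lower().split())
    return sum(LEXICON.get(token, 0) for token in tokens)


# Two hypothetical tweets that use "cool" in very different senses.
praise = "That harvest report was cool!"
warning = "Cool weather is hurting the corn crop"

print(lexicon_score(praise), lexicon_score(warning))
```

Both tweets receive the same positive score because the scorer only sees the token "cool", even though the second tweet describes bad news for the crop. That is the sense in which context-free look-ups fall short.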
16.
Sentiment Analysis – Approach
• Parallelized ArkTweetNLP to achieve fast part-of-speech tagging on tweets
• Custom (patent pending) algorithm to extract contextual cues and score sentiment of tweets
Pipeline stages:
• Part-of-speech tagger (1): break up tweets into tokens and tag their parts of speech
• Phrase Extraction
• Semi-Supervised Sentiment Classification
• Phrasal Polarity Scoring: use learned phrasal polarities to score sentiment of new tweets
• Output: Sentiment Scored Tweets
1: Part-of-speech tagger: Gp-Ark-Tweet-NLP (http://vatsan.github.io/gp-ark-tweet-nlp/)
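The slides do not disclose the custom (patent pending) algorithm, so the following is not the authors' method. It is only a minimal sketch of what scoring a tweet from phrase-level polarities might look like, assuming tokens arrive already tagged with single-letter tags (A = adjective, N = noun, R = adverb, V = verb), a hypothetical set of tag patterns, and a made-up table of learned phrase polarities.

```python
# Tag-pair patterns treated as phrases; chosen for illustration only.
PHRASE_PATTERNS = {("A", "N"), ("R", "A")}


def extract_phrases(tagged_tokens):
    """Return adjacent word pairs whose tag pair matches a phrase pattern."""
    phrases = []
    for (word1, tag1), (word2, tag2) in zip(tagged_tokens, tagged_tokens[1:]):
        if (tag1, tag2) in PHRASE_PATTERNS:
            phrases.append(f"{word1.lower()} {word2.lower()}")
    return phrases


def score_tweet(tagged_tokens, phrase_polarities):
    """Average the polarity of the known phrases; 0.0 if none are known."""
    known = [
        phrase_polarities[phrase]
        for phrase in extract_phrases(tagged_tokens)
        if phrase in phrase_polarities
    ]
    return sum(known) / len(known) if known else 0.0


# Hypothetical tagger output and hypothetical learned polarities.
tagged = [("Cool", "A"), ("weather", "N"), ("is", "V"),
          ("hurting", "V"), ("corn", "N")]
polarities = {"cool weather": -0.6, "record harvest": 0.8}

score = score_tweet(tagged, polarities)
print(extract_phrases(tagged), score)
```

Scoring at the phrase level lets "cool weather" carry a different polarity than "cool" alone, which is one way contextual cues could enter the score.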
17.
Text Analytics Pipeline with GNIP stream
• Tweet stream
• Stored on HDFS (gpfdist)
• Loaded as external tables into GPDB
• Parallel parsing of JSON and extraction of fields using PL/Python
• Topic analysis through MADlib pLDA
• Sentiment analysis through custom PL/Python functions
• D3.js
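The JSON parsing step in this pipeline could look roughly like the function below, which is the kind of logic one might place inside a PL/Python function body. It is a sketch under assumptions: the field names (`id`, `postedTime`, `body`, `actor.preferredUsername`, `actor.followersCount`) are assumed to match the tweet payload, and `extract_fields` is a hypothetical helper, not code from the talk.

```python
import json


def extract_fields(raw_line):
    """Parse one JSON-encoded tweet and keep a few fields of interest.

    Returns None for malformed or non-object lines so that a bad
    record does not abort the whole load.
    """
    try:
        record = json.loads(raw_line)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not isinstance(record, dict):
        return None
    actor = record.get("actor") or {}
    return {
        "id": record.get("id"),
        "posted_time": record.get("postedTime"),
        "body": record.get("body"),
        "poster": actor.get("preferredUsername"),
        "followers": actor.get("followersCount"),
    }


# Hypothetical input line; the values are made up for illustration.
sample = json.dumps({
    "id": "tag:example:1",
    "postedTime": "2013-06-01T12:00:00.000Z",
    "body": "Corn planting is behind schedule",
    "actor": {"preferredUsername": "farmer_jo", "followersCount": 120},
})

parsed = extract_fields(sample)
print(parsed)
print(extract_fields("{not valid json"))
```

Since each row is parsed independently of every other row, this step parallelizes naturally across segments.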
18.
Key Take-Aways
• There is significant signal in tweets for predicting commodity futures
• Sentiment analysis of tweets can provide an additional signal in predicting commodity futures. Twitter sentiment was negatively correlated with commodity futures in the sample we analyzed
• A blended model of text regression, sentiment analysis and tweet actor information gave us encouraging results, and we believe that combining it with market fundamentals like weather or yield will give better models
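To make "negatively correlated" concrete, here is a minimal Pearson correlation sketch. The slides do not say which correlation measure was used, so Pearson's r is an assumption, and the daily sentiment and futures numbers below are invented toy values, not results from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


# Toy values for illustration only (NOT data from the talk).
daily_sentiment = [0.4, 0.1, -0.2, -0.5, 0.3]
daily_futures = [430.0, 445.0, 460.0, 475.0, 438.0]

r = pearson(daily_sentiment, daily_futures)
print(round(r, 3))
```

A value of r below zero means that days with higher average sentiment tended to coincide with lower futures prices in the sample, which is the direction the take-away describes.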
19.
What’s in it for me?
20.
Pivotal Open Source Contributions
http://gopivotal.com/pivotal-products/open-source-software
• MADlib – in-database parallel ML: https://github.com/madlib/madlib
• PyMADlib – Python wrapper for MADlib: https://github.com/gopivotal/pymadlib
• PivotalR – R wrapper for MADlib: https://github.com/madlib-internal/PivotalR
• Part-of-speech tagger for Twitter via SQL: http://vatsan.github.io/gp-ark-tweet-nlp/
Questions? @being_bayesian