@MargrietGr
Margriet Groenendijk, PhD
Developer Advocate for IBM Cloud Data Services
Open Data Science Conference UK
9 October 2016, London
How To Analyse Weather Data and Twitter
Sentiment with Spark and Watson
Analyse Weather Data and Twitter Sentiment using
Spark and Watson
People love to talk about the weather on Twitter
What insights can you find when combining the data?
What is the weather sentiment related to?
Bluemix
Where to find the data?
Insights for Twitter
Weather Company Data
API services
from IBM Bluemix
https://console.ng.bluemix.net/
Bluemix
Where to store the data?
Available
from IBM Bluemix
Cloudant NoSQL DB
Where to analyse the data?
http://datascience.ibm.com
[Workflow diagram: Weather Company Data, Watson Tone Analyser, Tweets, Weather, Sentiment]
Exploring the options
Bluemix
IBM Bluemix
▪ Free 30-day trial
▪ “Big Blue Box” containing all of IBM's services
https://console.ng.bluemix.net/
Add a service in Bluemix
Add a service
Search for weather
Account + spaces
Weather Company Data for IBM Bluemix
Weather Company Data for IBM Bluemix
1 Add service
2 Credentials
3 Ready to use the REST APIs
Your own weather forecast in a Python notebook
Weather Company Data API
Show json weather file
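A minimal sketch of what a forecast request could look like from a Python notebook. The host, URL pattern and query parameters are assumptions recalled from the service documentation of the time, not taken from the slides; `forecast_url` and `fetch_forecast` are hypothetical helper names, and the username and password stand for the values on the service's Credentials tab.

```python
from urllib.parse import urlencode

# Assumed base URL of the Weather Company Data service on Bluemix.
BASE_URL = "https://twcservice.mybluemix.net/api/weather/v1"


def forecast_url(lat, lon, product="forecast/hourly/48hour", units="m", language="en-US"):
    """Build the forecast URL for a latitude/longitude (assumed URL pattern)."""
    query = urlencode({"units": units, "language": language})
    return f"{BASE_URL}/geocode/{lat}/{lon}/{product}.json?{query}"


def fetch_forecast(lat, lon, username, password):
    """Request the forecast as parsed JSON; needs `requests` and network access."""
    import requests  # imported lazily so the URL helper works without it

    response = requests.get(forecast_url(lat, lon), auth=(username, password), timeout=30)
    response.raise_for_status()
    return response.json()


# Approximate coordinates for London
print(forecast_url(51.51, -0.13))
```

The JSON that comes back is nested, so it is worth printing one response in full before deciding which fields to keep.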
Your own weather forecast in a Python notebook: London
https://developer.ibm.com/clouddataservices/2016/10/06/your-own-weather-forecast-in-a-python-notebook/
Weather in UK on Friday evening 7 October
Store weather data in Cloudant
Python script run daily on a Bluemix VM service
https://python-cloudant.readthedocs.io
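The daily script could be sketched roughly as follows, using the python-cloudant client linked above. The document layout, the database name `weather` and the helper names `make_weather_doc` and `store_docs` are assumptions made for illustration; the script used for the talk is not shown on the slides.

```python
from datetime import datetime, timezone


def make_weather_doc(city, observation, fetched_at=None):
    """Wrap one API response in a document with the metadata needed to query it later."""
    fetched_at = fetched_at or datetime.now(timezone.utc)
    return {
        "type": "weather",           # hypothetical field, handy for views and indexes
        "city": city,
        "fetched_at": fetched_at.isoformat(),
        "observation": observation,  # the nested JSON as returned by the weather API
    }


def store_docs(docs, username, password, account, dbname="weather"):
    """Write the documents to Cloudant in one bulk request (needs the `cloudant` package)."""
    from cloudant.client import Cloudant  # imported lazily; only needed when storing

    client = Cloudant(username, password, account=account, connect=True)
    try:
        db = client.create_database(dbname, throw_on_exists=False)
        db.bulk_docs(docs)
    finally:
        client.disconnect()


doc = make_weather_doc("London", {"temp": 14})
print(doc["city"], doc["fetched_at"])
```

Stamping each document with the fetch time matters here because the weather API only serves recent data, so the stored copy becomes the historical record.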
Add crontab job to run daily
Cloudant
▪ cloudant
▪ etc
▪ geospatial index
▪ show map with 100 cities :-)
[Workflow diagram: Weather Company Data, Watson Tone Analyser, Tweets, Weather, Sentiment]
Insights for Twitter
Only a 100…
dashDB
Add the dashDB service in Bluemix
Add a service
Search for dashDB
Use an existing service
Search for tweets:
posted:2016-08-01,2016-10-01 followers_count:3000 friends_count:3000
(weather OR sun OR sunny OR rain OR hail OR storm OR rainy OR drought OR
flood OR hurricane OR tornado OR cold OR snow OR drizzle OR cloudy OR
thunder OR lightning OR wind OR windy OR heatwave)
REST API docs:
https://new-console.ng.bluemix.net/docs/services/Twitter/twitter_rest_apis.html#rest_apis
Select table
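The search string on this slide follows a regular pattern (date range, follower and friend counts, then keywords joined with OR), so it can be assembled in Python rather than typed by hand. A small sketch; `build_tweet_query` is a hypothetical helper, and the query grammar is inferred only from the example on the slide.

```python
def build_tweet_query(start, end, keywords, followers_count=None, friends_count=None):
    """Assemble a search string in the style shown on the slide."""
    parts = [f"posted:{start},{end}"]
    if followers_count is not None:
        parts.append(f"followers_count:{followers_count}")
    if friends_count is not None:
        parts.append(f"friends_count:{friends_count}")
    # Keywords are alternatives, so they are joined with OR inside parentheses.
    parts.append("(" + " OR ".join(keywords) + ")")
    return " ".join(parts)


weather_words = ["weather", "sun", "sunny", "rain", "hail", "storm"]
print(build_tweet_query("2016-08-01", "2016-10-01", weather_words, 3000, 3000))
```

Keeping the keyword list in one place makes it easy to rerun the same search for a different period or topic.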
[Workflow diagram: Weather Company Data, Watson Tone Analyser, Tweets, Weather, Sentiment]
Explore the data
IBM Data Science Experience
Nested data…
Load tweets from dashDB with Spark SQL
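One way this step might look in a notebook is a JDBC read into a Spark DataFrame. This is a sketch under assumptions: it is written for a `SparkSession` (Spark 2.x), the port and database name are typical DB2 values that should be checked against the dashDB credentials, and the table name is a placeholder. The notebook used for the talk may have done this differently.

```python
def dashdb_jdbc_options(host, user, password, table, port=50000, database="BLUDB"):
    """Collect the JDBC options for reading a dashDB (DB2) table with Spark."""
    return {
        "url": f"jdbc:db2://{host}:{port}/{database}",  # assumed port and database name
        "driver": "com.ibm.db2.jcc.DB2Driver",
        "dbtable": table,
        "user": user,
        "password": password,
    }


def load_tweets(spark, options):
    """Load the table as a Spark DataFrame and register it for Spark SQL queries."""
    df = spark.read.format("jdbc").options(**options).load()
    df.createOrReplaceTempView("tweets")
    return df


# Placeholder values; the real ones come from the dashDB service credentials.
opts = dashdb_jdbc_options("dashdb.example.com", "user", "secret", "MYSCHEMA.TWEETS")
print(opts["url"])
```

Once the view is registered, the tweets can be queried with `spark.sql("SELECT ... FROM tweets")`.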
Clean data, summarise and load into pandas DataFrame
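A rough sketch of the clean-and-summarise step in pandas, for example after calling `toPandas()` on a Spark DataFrame. The column names `MESSAGE_POSTED_TIME` and `MESSAGE_BODY` and the sample tweets are invented for illustration; the real table columns should be substituted.

```python
import pandas as pd


def daily_tweet_counts(tweets):
    """Drop incomplete rows and count tweets per calendar day."""
    cleaned = tweets.dropna(subset=["MESSAGE_POSTED_TIME", "MESSAGE_BODY"]).copy()
    cleaned["date"] = pd.to_datetime(cleaned["MESSAGE_POSTED_TIME"]).dt.date
    return cleaned.groupby("date").size().rename("n_tweets").reset_index()


# Made-up sample rows; the last one has no timestamp and is dropped.
sample = pd.DataFrame({
    "MESSAGE_POSTED_TIME": ["2016-10-01 09:00:00", "2016-10-01 17:30:00",
                            "2016-10-02 08:15:00", None],
    "MESSAGE_BODY": ["Rain again", "Sunny at last", "So windy", "no timestamp"],
})
print(daily_tweet_counts(sample))
```

Summarising to one row per day gives the tweets the same granularity as a weather record collected once a day.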
Add weather to tweets
Weather data is nested, which pyspark.sql struggles with
The tweets carry no location data
Only 10% of all tweets are available in the free plan, through the Decahose stream
The Weather API only has 24 hours of data available
[Workflow diagram: Weather Company Data, Watson Tone Analyser, Tweets, Weather, Sentiment; one step marked with an X]
[Workflow diagram: Weather Company Data, Tweets, Weather, Sentiment]
crontab -e
0 23 * * * /path/to/file/do_something.sh
python do_something.py
Watson Tone Analyser
Add sentiment - example
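A hedged sketch of scoring one tweet with Watson Tone Analyser. The endpoint, version date and response layout are assumptions recalled from the 2016 release of the service and should be checked against the service documentation; the sample response is invented, and `analyse_tone` and `top_tone` are hypothetical helpers.

```python
def analyse_tone(text, username, password):
    """Send text to the Tone Analyser REST API (assumed endpoint; needs `requests`)."""
    import requests  # imported lazily; only needed for the live call

    response = requests.post(
        "https://gateway.watsonplatform.net/tone-analyzer/api/v3/tone",
        params={"version": "2016-05-19"},
        auth=(username, password),
        json={"text": text},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


def top_tone(response, category_id="emotion_tone"):
    """Return (tone_id, score) of the strongest tone in one category, or None."""
    for category in response["document_tone"]["tone_categories"]:
        if category["category_id"] == category_id:
            best = max(category["tones"], key=lambda tone: tone["score"])
            return best["tone_id"], best["score"]
    return None


# Invented response in the assumed layout, standing in for a real API reply.
sample_response = {"document_tone": {"tone_categories": [{
    "category_id": "emotion_tone",
    "tones": [{"tone_id": "anger", "score": 0.1},
              {"tone_id": "joy", "score": 0.7},
              {"tone_id": "sadness", "score": 0.2}],
}]}}
print(top_tone(sample_response))
```

Reducing each tweet to its strongest emotion gives a single label that can be counted per day alongside the weather.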
#Matthew
Use an existing service
#matthew tweets:
posted:2016-08-26,2016-10-06 followers_count:1000 friends_count:1000
(matthew OR hurricane matthew OR hurricane)
REST API docs:
https://new-console.ng.bluemix.net/docs/services/Twitter/twitter_rest_apis.html#rest_apis
Select table
Some lessons learned
APIs are great!
Can extend and build on this, as all data is in the cloud
Weather data is only available for 24 hours: great for weather apps, but harder to combine with historical tweets, so a daily script is needed
Now ready to build a more efficient workflow that can easily handle millions of tweets
Start a more in-depth analysis in the Data Science Experience
▪ analyse data!
▪ pretty plots
https://github.com/ibm-cds-labs/pixiedust
Margriet Groenendijk, PhD
Developer Advocate for IBM Cloud Data Services
https://developer.ibm.com/clouddataservices/author/mgroenen/
Thank you!
Slides will be available on
http://www.slideshare.net/MargrietGroenendijk

ODSC UK 2016: How To Analyse Weather Data and Twitter Sentiment with Spark and Watson