Weather scraper for your data warehouse

This is a fairly easy project, but a worthwhile one. I created this scraper using a free API to get weather information for a data warehouse. With detailed weather information by date and zip code at your disposal, you can tie it to location information in your database or data warehouse for extensive querying and analytics. For example:
• How does rain affect my sales by region?
• How does humidity affect sales?
• How does cloud cover affect sales?
• How does weather affect tips?
• How does weather affect employee productivity?
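
As a rough illustration of the first question, here is a minimal Python sketch that joins weather rows to sales rows by date and zip code. It uses an in-memory SQLite database purely as a stand-in for the warehouse. The Sales table, its columns, and every data value are hypothetical; only the Weather column names come from the table defined in the slides further down.

```python
import sqlite3

# In-memory SQLite stands in for the data warehouse (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Trimmed-down Weather table: only the columns this query needs.
    CREATE TABLE Weather (InsertDate TEXT, ZipCode TEXT, PrecipitationMode TEXT);
    -- Hypothetical Sales table; not part of the original write-up.
    CREATE TABLE Sales (SaleDate TEXT, ZipCode TEXT, Amount REAL);

    -- Made-up sample rows.
    INSERT INTO Weather VALUES ('2014-03-01', '55441', 'rain'),
                               ('2014-03-02', '55441', 'no');
    INSERT INTO Sales VALUES ('2014-03-01', '55441', 100.0),
                             ('2014-03-01', '55441', 50.0),
                             ('2014-03-02', '55441', 300.0);
""")

# Average sale amount and sale count per precipitation mode.
query = """
    SELECT w.PrecipitationMode,
           ROUND(AVG(s.Amount), 2) AS AvgSale,
           COUNT(*) AS SaleCount
    FROM Sales AS s
    JOIN Weather AS w
      ON w.ZipCode = s.ZipCode
     AND w.InsertDate = s.SaleDate
    GROUP BY w.PrecipitationMode
    ORDER BY w.PrecipitationMode
"""
rows = conn.execute(query).fetchall()
for mode, avg_sale, sale_count in rows:
    print(mode, avg_sale, sale_count)
```

If the job runs more than once a day, there will be several weather rows per zip code and date, so a real query would first reduce them to one row per day before joining.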

The sky (pun intended) is virtually the limit on this.

Good Luck.

Published in: Technology, News & Politics
1 Comment
  • A writer at University of St. Thomas wrote an interesting piece about the impact of weather on retail sales. I have pasted the link below, as it could be especially relevant to this topic.
    http://www.stthomas.edu/news/2014/02/28/impact-weather-retail-sales/?utm_source=business-all-rec-app&utm_medium=email&utm_campaign=lyris&utm_content=2%2F28%2F2014%209%3A53%3A52%20AM


  1. Weather Scraper

     Get weather information for your data warehouse and reporting/analytical needs.

     Create a SQL table to store the weather information:

         CREATE TABLE [dbo].[Weather](
             [ID] [int] IDENTITY(1,1) NOT NULL,
             [InsertDate] [varchar](255) NULL,
             [ZipCode] [varchar](255) NULL,
             [CityID] [varchar](255) NULL,
             [CityName] [varchar](255) NULL,
             [CoordLong] [varchar](255) NULL,
             [CoordLat] [varchar](255) NULL,
             [Country] [varchar](255) NULL,
             [SunriseStart] [varchar](255) NULL,
             [SunriseSet] [varchar](255) NULL,
             [TemperatureAvg] [varchar](255) NULL,
             [TemperatureMin] [varchar](255) NULL,
             [TemperatureMax] [varchar](255) NULL,
             [TemperatureUnit] [varchar](255) NULL,
             [HumidityValue] [varchar](255) NULL,
             [HumidityUnit] [varchar](255) NULL,
             [PressureValue] [varchar](255) NULL,
             [PressureUnit] [varchar](255) NULL,
             [WindSpeedValue] [varchar](255) NULL,
             [WindSpeedName] [varchar](255) NULL,
             [WindDirectionValue] [varchar](255) NULL,
             [WindDirectionCode] [varchar](255) NULL,
             [WindDirectionName] [varchar](255) NULL,
             [CloudValue] [varchar](255) NULL,
             [CloudName] [varchar](255) NULL,
             [PrecipitationMode] [varchar](255) NULL,
             [WeatherNumber] [varchar](255) NULL,
             [WeatherValue] [varchar](255) NULL,
             [WeatherIcon] [varchar](255) NULL,
             [LastUpdateValue] [varchar](255) NULL,
             PRIMARY KEY CLUSTERED ( [ID] ASC )
                 WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
                       ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
         ) ON [PRIMARY]
         GO

     After the table is created, we will use the http://api.openweathermap.org RESTful API to access and store the weather information. You can see sample weather information returned from a query by accessing this link: http://api.openweathermap.org/data/2.5/weather?q=55441&mode=xml
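
To give a feel for how a response maps onto the table, here is a minimal Python sketch (Python rather than the SSIS script's own language, purely for illustration) that parses an XML document into a dictionary keyed by the column names above. The sample XML is hand-written with made-up values, and its element and attribute names are assumptions inferred from the column names, so check them against a real response before relying on them.

```python
import xml.etree.ElementTree as ET

# Hand-written sample with invented values; the structure is an assumption.
SAMPLE_XML = """<current>
  <city id="123" name="Example City">
    <coord lon="-93.4" lat="45.0"/>
    <country>US</country>
    <sun rise="2014-03-01T12:50:00" set="2014-03-01T23:58:00"/>
  </city>
  <temperature value="270.1" min="268.2" max="272.0" unit="kelvin"/>
  <humidity value="80" unit="%"/>
  <pressure value="1020" unit="hPa"/>
  <wind>
    <speed value="4.1" name="Gentle Breeze"/>
    <direction value="140" code="SE" name="SouthEast"/>
  </wind>
  <clouds value="90" name="overcast clouds"/>
  <precipitation mode="snow"/>
  <weather number="600" value="light snow" icon="13d"/>
  <lastupdate value="2014-03-01T15:00:00"/>
</current>"""

# Column name -> (element path, attribute name); None means "use element text".
FIELD_MAP = {
    "CityID": ("city", "id"),
    "CityName": ("city", "name"),
    "CoordLong": ("city/coord", "lon"),
    "CoordLat": ("city/coord", "lat"),
    "Country": ("city/country", None),
    "SunriseStart": ("city/sun", "rise"),
    "SunriseSet": ("city/sun", "set"),
    "TemperatureAvg": ("temperature", "value"),
    "TemperatureMin": ("temperature", "min"),
    "TemperatureMax": ("temperature", "max"),
    "TemperatureUnit": ("temperature", "unit"),
    "HumidityValue": ("humidity", "value"),
    "HumidityUnit": ("humidity", "unit"),
    "PressureValue": ("pressure", "value"),
    "PressureUnit": ("pressure", "unit"),
    "WindSpeedValue": ("wind/speed", "value"),
    "WindSpeedName": ("wind/speed", "name"),
    "WindDirectionValue": ("wind/direction", "value"),
    "WindDirectionCode": ("wind/direction", "code"),
    "WindDirectionName": ("wind/direction", "name"),
    "CloudValue": ("clouds", "value"),
    "CloudName": ("clouds", "name"),
    "PrecipitationMode": ("precipitation", "mode"),
    "WeatherNumber": ("weather", "number"),
    "WeatherValue": ("weather", "value"),
    "WeatherIcon": ("weather", "icon"),
    "LastUpdateValue": ("lastupdate", "value"),
}


def parse_weather(xml_text):
    """Return a dict of column name -> string value (None where data is missing)."""
    root = ET.fromstring(xml_text)
    row = {}
    for column, (path, attribute) in FIELD_MAP.items():
        node = root.find(path)
        if node is None:
            row[column] = None
        elif attribute is None:
            row[column] = node.text
        else:
            row[column] = node.get(attribute)
    return row


weather = parse_weather(SAMPLE_XML)
print(weather["CityName"], weather["TemperatureAvg"], weather["TemperatureUnit"])
```

Missing elements come back as None instead of raising, which lines up with every column in the table being nullable.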
  2. SSIS Package

     The package is very simple:
       1. Get the list of all the ZipCodes.
       2. Loop through each ZipCode and get the current weather information for it.

     Details:
  3. SELECT DISTINCT [ZipCode] FROM [dbo].[ZipCodes] ORDER BY ZipCode DESC
  4. Store the results in a variable. Loop through each individual ZipCode in the ForEachLoop container.
  5. Map the individual zip codes to a ZipCode variable.
  6. Pass the “loaded” ZipCode variable in the ForEachLoop container to the Script Task so that it pulls the weather information for that particular ZipCode.
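
Outside SSIS, the control flow of the zip-code query, the ForEachLoop container, and the Script Task boils down to a plain loop. The Python sketch below shows that shape. `collect_weather` and `fake_fetch` are hypothetical names, the fetch step is passed in as a function so the loop can be tried without network access, and the per-zip error handling is an addition for illustration, not something the original package is stated to do.

```python
def collect_weather(zip_codes, fetch):
    """Call fetch(zip_code) for each zip code, one at a time.

    Returns (results, failures): results maps zip code -> fetched data,
    failures maps zip code -> error message, so one bad zip code does
    not stop the rest of the run.
    """
    results = {}
    failures = {}
    for zip_code in zip_codes:
        try:
            results[zip_code] = fetch(zip_code)
        except Exception as exc:  # keep going; record what went wrong
            failures[zip_code] = str(exc)
    return results, failures


def fake_fetch(zip_code):
    """Stand-in for the real API call; returns made-up data."""
    if zip_code == "00000":
        raise ValueError("no data for " + zip_code)
    return {"ZipCode": zip_code, "WeatherValue": "clear sky"}


# Same ordering as the query: distinct zip codes, descending.
zip_codes = sorted({"55441", "10001", "00000", "55441"}, reverse=True)
results, failures = collect_weather(zip_codes, fake_fetch)
print(sorted(results), failures)
```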
  7. Edit the Script Task. To see the code:

     This is the key:
     • Build your URL using the ZipCode in order to get the result.
     • I specify USA in the string to return only US results. There is plenty of documentation on the openweathermap website on how to search for specific data.
     • http://api.openweathermap.org/API#search_city

     var url = @"http://api.openweathermap.org/data/2.5/weather?q="+ZipCode+",USA&mode=xml";

     The two methods in my implementation are the main() method and the SaveWeatherData() method.
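
The same URL construction can be sketched in Python, again only as an illustration outside SSIS. `build_weather_url` is a hypothetical helper. Its optional `api_key` parameter is an addition not found in the original: it assumes the service expects a key passed as `appid`, which is worth checking against the current OpenWeatherMap documentation.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.openweathermap.org/data/2.5/weather"


def build_weather_url(zip_code, country="USA", api_key=None):
    """Build the current-weather URL for one zip code, asking for XML."""
    params = {"q": f"{zip_code},{country}", "mode": "xml"}
    if api_key:
        # Assumption: the key travels in the "appid" query parameter.
        params["appid"] = api_key
    # safe="," keeps the comma between zip code and country readable;
    # anything else unusual in the values is percent-encoded.
    return BASE_URL + "?" + urlencode(params, safe=",")


url = build_weather_url("55441")
print(url)
```

Using `urlencode` instead of string concatenation means a stray space or ampersand in a zip code value cannot corrupt the query string.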
  8. The main() method builds the URL, makes the call to the API, and parses the resulting XML. The SaveWeatherData() method is called by main(); it takes the parameter values and persists them in the database table.
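
The body of SaveWeatherData() is not shown in the transcript, so the following is only a guess at its general shape: a parameterized INSERT. It is sketched in Python against an in-memory SQLite table that stands in for SQL Server and carries just a handful of the Weather columns. `save_weather_data` and the sample values are hypothetical.

```python
import sqlite3

# A subset of the Weather columns, enough to show the pattern.
COLUMNS = ["InsertDate", "ZipCode", "CityName", "TemperatureAvg", "HumidityValue"]


def save_weather_data(conn, row):
    """Insert one weather row; keys missing from `row` are stored as NULL."""
    column_list = ", ".join(COLUMNS)
    placeholders = ", ".join("?" for _ in COLUMNS)
    sql = f"INSERT INTO Weather ({column_list}) VALUES ({placeholders})"
    # Values are bound as parameters, never pasted into the SQL text.
    conn.execute(sql, [row.get(column) for column in COLUMNS])
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute(
    "CREATE TABLE Weather (ID INTEGER PRIMARY KEY AUTOINCREMENT, "
    + ", ".join(f"{column} TEXT" for column in COLUMNS)
    + ")"
)

save_weather_data(conn, {
    "InsertDate": "2014-03-01",
    "ZipCode": "63366",
    "CityName": "O'Fallon",  # the apostrophe is harmless with bound parameters
    "TemperatureAvg": "270.1",
})
saved = conn.execute(
    "SELECT ZipCode, CityName, TemperatureAvg, HumidityValue FROM Weather"
).fetchall()
print(saved)
```

Binding the values as parameters matters here because city names and weather descriptions come from an external service and may contain quotes.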
  9. Each time the Script Task is called, the weather data for that ZipCode is returned and inserted into your table. The execution looks like this. Your results should look like this.
  10. Now you have detailed weather information by date and zip code at your disposal. You can tie it to location information in your database or data warehouse to do extensive querying. E.g.
      • How does rain affect my sales by region?
      • How does humidity affect sales?
      • How does cloud cover affect sales?
      • How does weather affect tips?
      • How does weather affect employee productivity?

      The job can be scheduled to run hourly, daily, weekly, or at whatever frequency you want.

      The sky (pun intended) is virtually the limit on this. Good luck.

      About: Fru Louis is a developer, blogger, and all-around technology enthusiast. Fru is also a contributor and principal of the BIWizzard blog. He writes about and stays abreast of the latest innovative ideas, news, and trends. Have a tip, comment, or critique? Email him at fru.louis.gmail.com.
