5. What’s Weather Prediction?
• Weather forecasting (or weather prediction) looks at models built from past observations of weather and predicts the likelihood of future weather.
[Diagram: past observations (temperature, precipitation, wind speed) feed a model for pattern extraction; the extracted rules (e.g. temp < x & wind speed > y → no rain; temp < x & wind < y & precipitation < z → rain) are applied to current values, plus whether it has rained in the past, to make a prediction.]
Machine learning works out:
- What features, and which groups of them?
- What thresholds?
- What correlations?
Canonical example for data analysis/machine learning.
Observations can be at various granularities. There are many ways to get weather data, from both commercial entities and national bodies.
https://weatherspark.com/h/m/145212/2018/11/Historical-Weather-in-November-2018-at-San-Francisco-International-Airport-California-United-States
Quick snapshot of the observations about weather data.
Understand the domain of the problem, data characteristics
What are we trying to predict – precipitation?
Focus in this workshop is to arrive at a model derived from data analysis rather than physics of the atmosphere etc.
Data
Set of values about a subject that describe it qualitatively or quantitatively. Features are the various components of the data.
Data science
Take data – understand it, process it, extract value from it…then communicate or act on the derived value
Pattern recognition
Auto-discovery of regularities in data. Once discovered, take action. E.g. classify data into categories.
Is it humanly possible to infer all patterns in the data? This is where algorithmic techniques come in
Walk through the Excel sheet. A look at the various variables; maybe filter and show the weathertype column. Are the classes clearly seen?
What to do when you see empty columns? Do we discard them or impute an equivalent value?
- How do you deal with outliers?
Quick look at the data. See how the variables vary.
Need for pre-processing and cleaning the data
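As a minimal sketch of the cleaning questions above (discard vs. impute, clipping outliers) — column names and values here are illustrative, not from the actual dataset:

```python
# Minimal sketch of cleaning a small weather table with pandas.
import pandas as pd

df = pd.DataFrame({
    "temp":   [14.0, 15.5, None, 99.0, 13.2],   # 99.0 is an implausible outlier
    "precip": [0.0, 1.2, 0.4, None, 0.0],
})

# Option 1: discard rows with missing values.
dropped = df.dropna()

# Option 2: impute an equivalent value (here, the column median).
imputed = df.fillna(df.median(numeric_only=True))

# Outliers: e.g. clip temperatures to a plausible range.
imputed["temp"] = imputed["temp"].clip(lower=-30, upper=50)

print(dropped.shape)
print(imputed["temp"].max())
```

Which option is right depends on how much data you have and whether the missing values are random or systematic.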
Understand the relationship between the multiple variables and attributes in your dataset.
If your dataset has perfectly positively or negatively correlated attributes, there is a high chance the model's performance will be impacted by a problem called "multicollinearity". Multicollinearity happens when one predictor variable in a multiple regression model can be linearly predicted from the others with a high degree of accuracy. This can lead to skewed or misleading results.
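One quick way to spot such pairs is a correlation matrix. A sketch with made-up column names (e.g. the same temperature recorded in both Celsius and Fahrenheit is a perfectly linear pair):

```python
# Spotting multicollinearity candidates via a pandas correlation matrix.
import pandas as pd

df = pd.DataFrame({
    "temp_c": [10, 12, 15, 20, 22],
    "temp_f": [50.0, 53.6, 59.0, 68.0, 71.6],  # exactly temp_c * 1.8 + 32
    "wind":   [5, 3, 8, 2, 7],
})

corr = df.corr()
# temp_c and temp_f are perfectly correlated -> drop one of them
# before training a linear or logistic regression model.
print(corr.loc["temp_c", "temp_f"])
```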
Having seen the weather data, what kind of patterns do you see?
Talk of supervised vs unsupervised.
Where would this problem fit? Why?
Talk of classification.
Talk of decision tree
- Decision tree and boosted tree algorithms are by nature immune to multicollinearity:
when deciding a split, the tree will choose only one of the perfectly correlated features. Other algorithms, like logistic regression or linear regression, are not immune to the problem, and you should fix it before training the model.
The J48 decision tree is the WEKA project team's implementation of the C4.5 algorithm, a successor to ID3 (Iterative Dichotomiser 3). R includes this nice work via the RWeka package.
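The workshop uses J48 via WEKA/RWeka; as a rough Python analogue (a different implementation, not the one above), scikit-learn's DecisionTreeClassifier sketches the same idea on toy, made-up weather data:

```python
# Illustrative decision tree on toy data (not the workshop's J48/RWeka setup).
from sklearn.tree import DecisionTreeClassifier

# Toy features: [temp, wind_speed, precipitation]; labels: 1 = rain, 0 = no rain.
X = [[12, 2, 5], [25, 8, 0], [10, 1, 7], [28, 9, 0], [11, 3, 6], [26, 7, 0]]
y = [1, 0, 1, 0, 1, 0]

clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(X, y)

# Cool, calm, damp day -> the tree classifies it as rain.
print(clf.predict([[13, 2, 4]]))
```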
Go through DT, look at rules, see the accuracy, TP etc
TP – true positive – correctly predicted a positive (e.g. cat)
TN – true negative – correctly predicted a negative (not a cat)
FP – incorrectly predicted a positive – (dog as cat)
FN – incorrectly predicted a negative (cat predicted as not cat)
Precision: What proportion of positive identifications was actually correct?
P = TP / (TP + FP)
Recall: What proportion of actual positives was identified correctly?
R = TP / (TP + FN)
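Worked numbers for the two formulas above (the counts are illustrative):

```python
# Precision and recall from illustrative confusion-matrix counts.
tp, fp, fn = 40, 10, 20

precision = tp / (tp + fp)  # 40 / 50 = 0.8
recall = tp / (tp + fn)     # 40 / 60 ~= 0.667

print(precision, recall)
```

Note the trade-off: lowering the classifier's threshold tends to raise recall (fewer FN) while lowering precision (more FP).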
Techcon ML Challenge – sets the context for the talk
https://hpe.sharepoint.com/sites/F5/CTO/Office/tech-con/Pages/2020-tech-con-challenge.aspx
Challenge:
Given relatively limited historical weather reports such as those available for The San Francisco International Airport up to a certain day, predict whether it will rain on the next day at that location.
From a paper on weather modelling:
“Making inferences and predictions about weather has been an omnipresent challenge throughout human history. Challenges with accurate meteorological modeling brings to the fore difficulties with reasoning about the complex dynamics of Earth's atmospheric system.”
How many attributes to pick? Do all of them matter?
With more features, data density reduces and it is easier to find hyperplanes that separate the classes. But this could result in overfitting.
A technique for dimensionality reduction is feature extraction
Pre-processing: standardization, normalization, discretization, signal enhancement, extraction of local features, etc.
All 3 below are part of feature selection
2. feature subset generation (search strategy)
3. evaluation criterion definition (relevance/predictive power)
4. evaluation criterion estimation (assessment method)
https://towardsdatascience.com/why-how-and-when-to-apply-feature-selection-e9c69adfabf2
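One concrete way to sketch the "evaluation criterion" steps is univariate feature selection with scikit-learn's SelectKBest; the synthetic data below (one informative column, three noise columns) is purely illustrative:

```python
# Feature selection sketch: score each feature against the label,
# keep the k best (here k = 1).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100)
informative = y + rng.normal(scale=0.1, size=100)  # strongly tied to the label
noise = rng.normal(size=(100, 3))                  # irrelevant features
X = np.column_stack([informative, noise])

selector = SelectKBest(score_func=f_classif, k=1)
selector.fit(X, y)
print(selector.get_support())  # only the informative column is kept
```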