Concrete Compressive
Strength
Group_8
Shanzhang Nong
Xinpeng Li
Liang Zhang
Qi Wang
Outline
• Problem definition
• Analysis method
• Data mining process
• Baseline regression
• Dimension reduction
• Tree-based models
• Evaluation
How we analyze the dataset
• Problem Definition
• Data Gathering & Preparation
  • Data Access
  • Data Sampling
  • Data Transformation
• Model Building & Evaluation
  • Create Model
  • Dimension Reduction
  • Test Model
  • Evaluate Model
• Knowledge Deployment
  • Model Apply
  • Report
Problem definition: Attributes vs. Response
Attributes: Cement, Blast Furnace Slag, Fly Ash, Water, Superplasticizer, Coarse Aggregate, Fine Aggregate, Age
Response: Concrete (compressive strength)
Concrete Compressive Strength Data Set
• Source: https://archive.ics.uci.edu/ml/machine-learning-databases/concrete/compressive/
• Observations: 1030
• Attributes: 8 (7 numeric, 1 integer)
• Response: 1 (concrete compressive strength)
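The R-style names used later in the deck (poly(Water, 4), Fine.Aggregate:Age, residual mean deviance) suggest the analysis was done in R; the sketches that follow assume this. As a starting point, here is one way the data could be loaded. The file name and the short column names are our own assumptions, chosen to match the slides and the column order of the UCI file.

```r
# Hypothetical loading step: assumes Concrete_Data.xls has been downloaded
# from the source URL above. The short names are our own shorthand.
library(readxl)
concrete <- as.data.frame(read_excel("Concrete_Data.xls"))
names(concrete) <- c("Cement", "Slag", "Ash", "Water", "Superplasticizer",
                     "Coarse.Aggregate", "Fine.Aggregate", "Age", "Concrete")
str(concrete)   # 1030 observations: 8 predictors plus the response
```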
Correlation between response and predictors
Response: Concrete (compressive strength)
Predictors: Cement, Blast Furnace Slag, Fly Ash, Water, Superplasticizer, Coarse Aggregate, Fine Aggregate, Age
Baseline Linear Regression
Conclusion
1. The regression is significant.
Concrete = 2.637325 + 0.11630(Cement) + 0.101271(Slag) + 0.077125(Ash) - 0.198928(Water) + 0.251661(Superplasticizer) - 0.011350(Coarse) + 0.009328(Fine) + 0.114203(Age)
2. Coarse Aggregate, Fine Aggregate and the intercept are not significant.
3. 63.84% of the variation in Concrete is explained by variation in the predictors (R² = SSR/SST = 63.84%).
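A minimal sketch of how the baseline fit could be obtained, assuming the `concrete` data frame from the loading sketch above; summary() reports the coefficients, p-values and R² quoted on this slide.

```r
# Baseline multiple linear regression on all eight predictors.
baseline <- lm(Concrete ~ ., data = concrete)
summary(baseline)   # coefficient estimates, p-values and R^2 (~0.64 per the slide)
```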
Linear Regression: VIF Estimates
Conclusion
All VIF values are below 5, so there is no serious collinearity among the predictors.
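A sketch of the VIF computation, assuming the `baseline` model above and the car package.

```r
# Variance inflation factors for the baseline predictors (car package).
library(car)
vif(baseline)   # values below 5 indicate no serious collinearity
```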
Linear Regression with Interaction
Conclusion
1. The new regression is significant (p-value < 2.2e-16).
2. The interaction term (Fine Aggregate × Age) is significant (p-value = 0.0199).
3. Coarse Aggregate, Fine Aggregate, Age and the intercept are not significant.
4. R² = 64.11% > 63.84% (baseline).
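A sketch of the interaction fit, using the Fine.Aggregate:Age term named on the next slide; column names follow the loading sketch above.

```r
# All main effects plus the Fine.Aggregate x Age interaction.
inter <- lm(Concrete ~ . + Fine.Aggregate:Age, data = concrete)
summary(inter)   # interaction p-value and R^2 as quoted on the slide
```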
Testing Error: 10-fold Cross Validation
Regression 1 (baseline):
Concrete = 2.637325 + 0.11630(Cement) + 0.101271(Slag) + 0.077125(Ash) - 0.198928(Water) + 0.251661(Superplasticizer) - 0.011350(Coarse) + 0.009328(Fine) + 0.114203(Age)
Regression 2 (with interaction term):
Concrete = 1.372e+01 + 1.148e-01(Cement) - 1.999e-01(Water) + 9.872e-02(Slag) + 7.464e-02(Ash) + 2.606e-01(Superplasticizer) + 8.917e-03(Coarse) - 9.024e-04(Fine) + 8.386e-03(Age) + 1.420e-04(Fine Aggregate × Age)
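One way the 10-fold cross-validated test errors compared here could be computed, assuming the boot package; cv.glm() needs glm objects, so both linear models are refit with glm() (Gaussian family by default).

```r
library(boot)
set.seed(1)                                     # fold assignment is random
fit1 <- glm(Concrete ~ ., data = concrete)                       # baseline
fit2 <- glm(Concrete ~ . + Fine.Aggregate:Age, data = concrete)  # with interaction
cv.glm(concrete, fit1, K = 10)$delta[1]         # 10-fold CV estimate of test MSE
cv.glm(concrete, fit2, K = 10)$delta[1]
```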
Polynomial Regression (Water)
Conclusion
d = 4 is the best choice: its average cross-validated MSE is within 0.3 standard errors of the smallest value.
Polynomial Regression (Water)
Conclusion
1. The polynomial regression is significant (p-value < 2.2e-16).
Concrete = 35.8039 + 140.9707(Water) + 102.3895(Water^2) + 95.6686(Water^3) - 38.8344(Water^4)
2. All polynomial terms are significant.
3. 20.09% of the variation in Concrete is explained by variation in Water.
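A sketch of how the degree could be chosen by 10-fold cross-validation and the quartic model fit, assuming the `concrete` data frame and the boot package; the range of candidate degrees (1 to 6) is our assumption.

```r
library(boot)
set.seed(1)
cv.err <- rep(0, 6)
for (d in 1:6) {                      # candidate polynomial degrees
  fit <- glm(Concrete ~ poly(Water, d), data = concrete)
  cv.err[d] <- cv.glm(concrete, fit, K = 10)$delta[1]
}
cv.err                                # compare average CV errors across degrees
poly.fit <- lm(Concrete ~ poly(Water, 4), data = concrete)   # chosen degree
summary(poly.fit)                     # R^2 ~ 0.20 per the slide
```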
Linear Regression with Polynomial (Water)
Conclusion
1. The regression is significant (p-value < 2.2e-16).
2. The second- and fourth-degree Water terms (poly(Water, 4)2, poly(Water, 4)4), Superplasticizer, Coarse.Aggregate and Fine.Aggregate are not significant.
3. 64.7% of the variation in Concrete is explained by variation in the predictors (R² = 64.7% > 63.84% baseline).
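A sketch of the combined model: the baseline predictors with the linear Water term replaced by a quartic polynomial.

```r
# All predictors, with Water entering through a degree-4 polynomial basis.
poly.full <- lm(Concrete ~ . - Water + poly(Water, 4), data = concrete)
summary(poly.full)   # R^2 ~ 0.647 per the slide
```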
Subset selection
MSE: 448.31
Adjusted R²: 0.6276
BIC: -930
Cp: 15
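A sketch of best-subset selection using the leaps package; BIC, Cp and adjusted R² for each model size can be read off the summary object.

```r
library(leaps)
best <- regsubsets(Concrete ~ ., data = concrete, nvmax = 8)
best.sum <- summary(best)
which.min(best.sum$bic)                  # model size preferred by BIC
which.min(best.sum$cp)                   # model size preferred by Cp
coef(best, which.max(best.sum$adjr2))    # coefficients of the best adjusted-R^2 model
```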
Principal component regression
MSE: 121
Adjusted R²: 0.56
Partial Least Squares
MSE: 107
Adjusted R²: 0.615
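A sketch of principal component regression and partial least squares with the pls package, both tuned by cross-validation; scaling the predictors is our assumption.

```r
library(pls)
set.seed(1)
pcr.fit <- pcr(Concrete ~ ., data = concrete, scale = TRUE, validation = "CV")
pls.fit <- plsr(Concrete ~ ., data = concrete, scale = TRUE, validation = "CV")
validationplot(pcr.fit, val.type = "MSEP")   # choose the number of components
validationplot(pls.fit, val.type = "MSEP")
```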
Ridge regression
RMSE: 16.98534
SSE: 148578.4
RSE: 100.1642
R²: 0.5405718
Small λ: MSE = 119.9176
Large λ: MSE = 288.1472
Lasso
RMSE: 10.41782
SSE: 55893.45
RSE: 37.68058
R²: 0.623364
Small λ: MSE = 108.64
Large λ: MSE = 267.32
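A sketch of the ridge and lasso fits with glmnet; alpha = 0 gives ridge, alpha = 1 gives the lasso, and cv.glmnet() traces the cross-validated MSE over the λ grid (the small-λ vs. large-λ contrast quoted above).

```r
library(glmnet)
x <- model.matrix(Concrete ~ ., data = concrete)[, -1]   # drop intercept column
y <- concrete$Concrete
set.seed(1)
ridge.cv <- cv.glmnet(x, y, alpha = 0)   # ridge
lasso.cv <- cv.glmnet(x, y, alpha = 1)   # lasso
plot(ridge.cv); plot(lasso.cv)           # CV MSE across the lambda path
coef(lasso.cv, s = "lambda.min")         # lasso coefficients at the best lambda
```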
Regression Trees
1. Uses six variables, among them "Age", "Cement", "Water", "Slag" and "Superplasticizer".
2. Has 13 terminal nodes.
3. Residual mean deviance is 71.99.
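A sketch of the unpruned regression tree using the tree package; summary() reports the variables actually used, the number of terminal nodes and the residual mean deviance quoted above.

```r
library(tree)
tree.fit <- tree(Concrete ~ ., data = concrete)
summary(tree.fit)            # variables used, terminal nodes, residual mean deviance
plot(tree.fit)
text(tree.fit, pretty = 0)   # draw the tree with split labels
```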
Regression Trees (tree plot)
Tree Pruning
Determine the optimal tree size
Conclusion
A tree size of 11 is sufficient.
Tree Pruning (tree size = 11)
1. Uses five variables: "Age", "Superplasticizer", "Cement", "Water", "Slag".
2. Has 11 terminal nodes.
3. Residual mean deviance is 80.17.
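A sketch of cost-complexity pruning: cv.tree() estimates deviance as a function of subtree size, and prune.tree() cuts the tree back to the chosen 11 terminal nodes.

```r
set.seed(1)
cv.out <- cv.tree(tree.fit)                  # cross-validation over subtree sizes
plot(cv.out$size, cv.out$dev, type = "b")    # deviance vs. number of terminal nodes
pruned <- prune.tree(tree.fit, best = 11)    # keep 11 terminal nodes
summary(pruned)
```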
Testing Error: 10-fold Cross Validation
Regression tree 1 (original tree, 13 terminal nodes): MSE 85.055
Regression tree 2 (pruned tree, 11 terminal nodes): MSE 80.673
Bagging
1. Bootstrap 500 samples.
2. Use the 500 samples to build 500 decision trees.
3. Average them to get a single low-variance model.
MSE: 30.78
Adjusted R²: 0.8928
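A sketch of bagging via the randomForest package; bagging is the special case where all 8 predictors are candidates at every split (mtry = 8).

```r
library(randomForest)
set.seed(1)
bag.fit <- randomForest(Concrete ~ ., data = concrete,
                        mtry = 8, ntree = 500, importance = TRUE)
bag.fit   # out-of-bag MSE and % variance explained
```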
Random Forest
1. Bootstrap 500 samples.
2. Use the 500 samples to build 500 decision trees; at each split, only 3 randomly chosen predictors are considered.
3. Average them to get a single low-variance model.
MSE: 30.44
Adjusted R²: 0.894
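The random-forest sketch is identical to the bagging one except that only 3 randomly chosen predictors are considered at each split (mtry = 3).

```r
library(randomForest)
set.seed(1)
rf.fit <- randomForest(Concrete ~ ., data = concrete,
                       mtry = 3, ntree = 500, importance = TRUE)
rf.fit
```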
Boosting
1. Bootstrap 500 samples and build 500 decision trees.
2. Use the 1st decision tree as the base model.
3. Add a shrunken version of the 2nd decision tree to the base model; this becomes the boosting model so far.
4. Repeat the last step 498 more times with the 498 remaining trees to get the final boosting model.
MSE: 31.86
Adjusted R²: 0.8821
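A sketch of boosting with the gbm package: 500 trees are grown sequentially and each is added with a small shrinkage factor. The interaction depth and shrinkage values shown are our assumptions, not taken from the slides.

```r
library(gbm)
set.seed(1)
boost.fit <- gbm(Concrete ~ ., data = concrete, distribution = "gaussian",
                 n.trees = 500,
                 interaction.depth = 4,   # assumed tree depth
                 shrinkage = 0.01)        # assumed learning rate
summary(boost.fit)   # relative influence of each predictor
```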
Test validation results (plots for Bagging, Boosting and Random Forest)
Predictor importance (variable-importance plots)
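A sketch of how the importance plots could be produced, assuming the `bag.fit` and `rf.fit` models above; for gbm, summary(boost.fit) already gives the analogous relative-influence ranking.

```r
library(randomForest)
importance(rf.fit)    # %IncMSE and IncNodePurity per predictor
varImpPlot(bag.fit)   # importance plot for the bagged model
varImpPlot(rf.fit)    # importance plot for the random forest
```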
Evaluation
Model Names                          MSE        Adjusted R²
Baseline Linear Regression           109.5879   0.6343
Linear Regression with Interaction   107.4697   0.6399
Linear Regression with Polynomial    106.9055   0.6415
Subset Selection                     448.31     0.627
Principal Component Regression       121        0.56
Partial Least Squares                107        0.615
Ridge                                112.42     0.54
Lasso                                108.56     0.62
Decision Tree                        85.055     0.703
Pruned Decision Tree                 80.673     0.7534
Bagging                              30.78      0.8928
Random Forest                        30.44      0.894
Boosting                             31.86      0.8821