Artificial Neural Networks for Conceptual Design of Concrete Mix
1. CONCRETE MIX DESIGN USING NEURAL NETWORKS
A PROJECT REPORT SUBMITTED IN PARTIAL FULFILMENT OF THE
REQUIREMENT FOR THE DEGREE OF
BACHELOR OF TECHNOLOGY
IN
CIVIL ENGINEERING
By:
A BALA MURALI
Reg. No.: 139104256
DEPARTMENT OF CIVIL ENGINEERING
MANIPAL UNIVERSITY JAIPUR
JAIPUR-303007, RAJASTHAN, INDIA
MAY/2017
Declaration
This is to certify that I, Mr. A BALA MURALI (Reg. No. 139104256), have been undergoing
my internship for the final year project entitled “CONCRETE MIX DESIGN USING NEURAL
NETWORKS” at Manipal University Jaipur, in partial fulfilment of the requirement for the award
of the degree of Bachelor of Technology in Civil Engineering, Manipal University Jaipur. This
report is an authentic record of my work carried out during the period from 4/1/17 to 10/5/17
under the supervision of Dr. Gaurav Sancheti. The matter presented in this report has not been
submitted by me for the award of any other degree.
A BALA MURALI
Regd. No.: 139104256
Civil Engineering, MUJ
abalamurali1996@gmail.com, 7597297663
……….(Signature)……..
……………(Signature)…………
(External Supervisor)
Dr. Gaurav Sancheti
Associate Professor
Civil Engg.
gaurav.sancheti@jaipur.manipal.edu, +91 9694727780
……………(Signature)…………
ACKNOWLEDGMENTS
I express my deep sense of gratitude to Prof. Anil Datt Vyas, Head of the Department,
Department of Civil Engineering, MUJ, for providing all the facilities to make this report a
success. I am also extremely grateful to Dr. Gaurav Sancheti, Associate Professor, MUJ, for
providing me with the necessary guidance and support at every stage of the project.
A BALA MURALI
Reg. no.: 139104256
Civil Engineering Department
ABSTRACT
The basic ingredients of a concrete mix are cement, water, fine aggregate and coarse aggregate in
appropriate proportions. Calculating these proportions is the tricky part, as they vary with
conditions such as the grade of concrete required, the water-cement ratio, the workability to be
achieved, exposure conditions, sand zone and many other factors. Many methods have been
proposed for estimating concrete mix proportions; in this project, the IS (Indian Standard) method
has been used to compute the data. An attempt has been made, successfully, to see whether neural
networks can be used for the prediction of concrete mix design.
CONTENTS
1. INTRODUCTION 1
1.1 Introduction 1
1.2 Applications of ANN in Civil Engineering 2
2. CONCRETE MIX DESIGN 3
2.1 Introduction 3
2.2 Various Methods of Proportioning 3
2.3 Requirements for concrete mix design 4
2.4 Concrete Mix design as per IS 10262-2009 7
3. ARTIFICIAL NEURAL NETWORK 13
3.1 Introduction 13
3.2 Neural network Applications 14
3.3 Network design steps 14
3.4 Concept behind ANN 14
4. DEVELOPMENT OF THE NETWORK 18
4.1 Introduction 18
4.2 Input Data 18
4.3 Target Data 19
4.4 Data Pre-processing 21
4.5 Data Normalization 21
4.6 Creating Neural Network 21
4.7 Activation/Transfer Function 22
4.8 Hidden Layers 24
4.9 Hidden Layer Neurons 25
4.10 Output Measurement 25
4.11 Training Functions 25
4.12 Validation 28
4.13 Testing the Network 29
5. NETWORK RESULTS AND ANALYSIS 30
5.1 Introduction 30
5.2 Network and its Parameters 30
5.3 Training Results 32
5.4 Testing the network 42
5.5 Comparing Training and Testing data 43
5.6 Final testing 49
5.7 Conclusion 50
6. CONCLUSION 51
7. REFERENCES 52
8. ANNEXURE I 53
9. ANNEXURE II 54
LIST OF TABLES
Table 2.1 Standard Deviation 8
Table 2.2 Maximum Water Content 10
Table 2.3 Water Content adjustments 10
Table 2.4 Cement content 11
Table 2.5 Ratio of volume of coarse aggregate 11
Table 4.1 Transfer Functions 24
Table 4.2 Training Functions 26
Table 5.1 Input Data 30
Table 5.2 List of Networks 31
Table 5.3 Training results of cement at 1000 epochs 34
Table 5.4 Training results of water at 1000 epochs 36
Table 5.5 Training results of Fine agg. at 1000 epochs 39
Table 5.6 Training results of coarse agg. at 1000 epochs 41
Table 5.7 Testing MSE 42
Table 5.8 Comparing Training and Testing MSE for cement 43
Table 5.9 Comparing Training and Testing MSE for Water 44
Table 5.10 Comparing Training and Testing MSE for FA 46
Table 5.11 Comparing Training and Testing MSE for CA 47
Table 5.12 Final testing for Cement 49
Table 5.13 Final testing for Water 49
Table 5.14 Final testing for Fine Agg. 49
Table 5.15 Final testing for Coarse Agg. 49
LIST OF FIGURES
Fig 2.1 Characteristic Strength of Concrete 5
Fig 2.2 Workability of Concrete – Slump Test 5
Fig 2.3 Selection of Water-Cement Ratio for Concrete Mix Design 9
Fig 2.4 Concrete Compressive Strength vs. Water Cement Ratio 9
Fig 3.1 Neural network 13
Fig 3.2 Artificial Neural Network Architecture 15
Fig 3.3 The Structure of a Neuron 15
Fig 3.4 Log sigmoid transfer function 16
Fig 3.5 Linear transfer function 16
Fig 4.1 Structure of an Artificial neuron 21
Fig 4.2 Hard limit transfer function 22
Fig 4.3 Linear transfer function 23
Fig 4.4 Log sigmoid transfer function 23
Fig 5.1 All networks training results for cement 32
Fig 5.2 LM log-log training for cement 32
Fig 5.3 LM log-lin training for cement 33
Fig 5.4 SCG & RP training for cement 33
Fig 5.5 All networks training results for water 34
Fig 5.6 LM log-log training for water 35
Fig 5.7 LM log-lin training for water 35
Fig 5.8 SCG & RP training for water 36
Fig 5.9 All networks training results for Fine aggregate 37
Fig 5.10 LM log-log training for Fine aggregate 37
Fig 5.11 LM log-lin training for Fine aggregate 38
Fig 5.12 SCG & RP training for Fine aggregate 38
Fig 5.13 All networks training results for Coarse aggregate 39
Fig 5.14 LM log-log training for Coarse aggregate 40
Fig 5.15 LM log-lin training for Coarse aggregate 40
Fig 5.16 SCG & RP training for Coarse aggregate 41
Fig 5.17 Average Testing MSE 42
Fig 5.18 Comparing training and testing MSE for Cement 43
Fig 5.19 Comparing training and testing MSE for Cement (log-lin) 44
Fig 5.20 Comparing training and testing MSE for Water 45
Fig 5.21 Comparing training and testing MSE for Water (log-lin) 45
Fig 5.22 Comparing training and testing MSE for Fine aggregates 46
Fig 5.23 Comparing training and testing MSE for Fine Aggregates (log-lin) 47
Fig 5.24 Comparing training and testing MSE for Coarse aggregates 48
Fig 5.25 Comparing training and testing MSE for Coarse Aggregates (log-lin) 48
Chapter-1
INTRODUCTION
1.1 Introduction
Concrete is the most widely used construction material because of its flow-ability in most
complicated form i.e. its ability to take any shape while wet, and its strength development
characteristics when it hardens. Concrete production is a complex process that involves the effect
of several processing parameters on the quality control of concrete pertaining to workability,
strength etc. These parameters are all effective in producing a single strength quantity of
compressive strength.
Artificial intelligence has proven its capability in simulating and predicting the behavior of the
different physical phenomena in most of the engineering fields. Artificial intelligence is receiving
greater attention from the building industry to aid in the decision-making process in areas such as
diagnostics, design, and repair and rehabilitation. In civil engineering, design of concrete mix is
difficult and sensitive. The classical way for the determination of concrete mix design is based on
uncertainty and depends on expert judgment. Concrete is essentially a mixture which comprises
paste and aggregates. In concrete mix design and quality control, the uniaxial compressive
strength of concrete is considered as the most valuable property, which in turn is influenced by a
number of factors. The concrete mix design is based on the principles of workability of fresh
concrete, desired strength and durability of hardened concrete which in turn is governed by water-
cement ratio law. The strength of the concrete is determined by the characteristics of the mortar,
coarse aggregate, and the interface. For the same quality mortar, different types of coarse
aggregate with different shape, texture, mineralogy, and strength may result in different concrete
strengths. There are various types of mixes: nominal mix, standard mix and design mix.
Nominal mixes are mixes of fixed cement-aggregate ratio (by volume) which ensure adequate
strength. However, owing to the variability of the mix ingredients, nominal mixes for a given
workability vary widely in strength and may result in under- or over-rich mixes. For this reason,
a minimum compressive strength has been included in many specifications; such mixes are
termed standard mixes. In designed mixes the performance of the concrete is specified by the
designer, but the mix proportions are determined by the producer of concrete, except that the
minimum cement content
can be laid down. The common method of expressing the proportions of the ingredients of a
concrete mix is in terms of parts or ratios of cement, fine aggregate and coarse aggregate. For
example, a concrete mix of proportions 1:2:4 contains one part of cement, two parts of fine
aggregate and four parts of coarse aggregate.
The proportions are expressed either by volume or by mass, giving two design methods. The
concrete mix design can be carried out using the IS standard code or the US system of units.
Tests for compressive strength are generally carried out at 7, 14 or 28 days from the date of
placing the concrete. Testing at 28 days is standard and therefore essential; testing at other ages
can be carried out if necessary.
1.2 Applications of ANN in Civil Engineering
ANNs have been applied to many civil engineering problems with some degree of success. They
have been applied to geotechnical problems such as prediction of the settlement of shallow
foundations, and many researchers have used ANNs in structural engineering to develop various
neural network models.
Chapter-2
CONCRETE MIX DESIGN
2.1 Introduction
Concrete mix design is of two types:
1. Nominal concrete mix
2. Designed concrete mix
Nominal concrete mixes are those specified by standard codes for common construction
works. These mixes take into consideration the margin for quality control, material quality and
workmanship in concrete construction.
M10, M15 and M20 are the nominal mixes commonly used in construction. For higher grades
of concrete, i.e. M25 and above, a designed mix is advised.
Designed mix concrete suggests proportions of cement, sand, aggregates and water (and
sometimes admixtures) based on the actual material quality, the degree of quality control, and
the moisture content of the materials, for the concrete compressive strength required for the
project. Designed mixes are developed in the laboratory, and the final mix proportions are
suggested after various tests and revisions of the mix design.
The concrete mix can be designed from M10 up to grades such as M50, M80 or M100, for
workability requirements ranging from no slump to 150mm slump. These grades are achieved
by varying the mix proportions and confirming them with laboratory tests.
Sometimes admixtures are also required to enhance properties of concrete such as workability
and setting time. These admixtures also need to be considered during concrete mix design
calculations for their optimum use; an overdose can harm the strength and durability of the
concrete.
Concrete mix design is the method of proportioning of ingredients of concrete to enhance its
properties during plastic stage as well as during hardened stage, as well as to find economical
mix proportions.
2.2 Various methods of proportioning
a) Arbitrary proportion
b) Fineness modulus method
c) Maximum density method
d) Surface area method
e) Indian Road Congress, IRC 44 method
f) High strength concrete mix design
g) Mix design based on flexural strength
h) Road note No. 4 (Grading Curve method)
i) ACI Committee 211 method
j) DOE method
k) Mix design for pumpable concrete
l) Indian standard Recommended method IS 10262-82
The DOE method and the Indian Standard recommended method are the most commonly used,
since concrete is now very commonly placed by pumping. This project is concerned only with
the IS method.
2.3 Requirements for Concrete Mix Design
The requirements of a concrete mix should be known before the mix calculations begin. Mix
design is done in the laboratory, and samples from each designed mix are tested to confirm the
result. Before the mix design process is started, information about the available materials, the
required concrete strength, workability, site conditions etc. must be known.
The following information is required for concrete mix design:
1. Characteristic strength of concrete required: Characteristic strength is the strength of
concrete below which not more than 5% of test results of samples are expected to fall. It can
also be called the grade of concrete required for mix design. For example, for M30 grade
concrete the required compressive strength, and hence the characteristic strength, is 30 N/mm².
Fig 2.1: Characteristic Strength of Concrete
2. Workability requirement of concrete: The workability of concrete is commonly measured by
the slump test. The slump value or workability requirement of concrete is based on the type of
concrete construction.
Fig 2.2: Workability of Concrete – Slump Test
For example, in reinforced concrete construction with a high percentage of steel reinforcement,
it is difficult to compact the concrete with vibrators or other equipment; in this case the
workability of concrete should be such that the concrete flows to each and every part of the
member. For members where the concrete is easy to compact, low-workability concrete can be
used.
It is also known that with increase in workability of concrete, the strength of concrete reduces.
Thus, based on type of structure or structural member, the workability requirement of concrete
should be assumed and considered in the mix design.
For pumped concrete, it is essential to have high workability to transfer concrete to greater heights
with ease. This case also should be considered in the mix design.
3. Quality control at site: The strength and durability of concrete depend on the degree of
quality control during construction operations at site. Nominal mixes of concrete assume the
worst quality control at site, based on past experience.
Thus, for design mix concrete, it is essential to understand the quality control capability of
contractor and workmen at construction site in mixing, transporting, placing, compacting and
curing of concrete. Each step in concrete construction process affects the strength and durability
of concrete.
The availability of workmen also affects the quality control of concrete. More skilled workmen
and closer supervision help to maintain good quality construction.
4. Weather conditions: Weather impacts the setting time of concrete. In a hot climate the
concrete tends to set early due to moisture loss; in this case the concrete needs a higher water
content or special admixtures to delay the initial setting. Recommendations for cooling the
concrete also need to be mentioned in the mix design for very hot weather conditions.
In cold climates, the initial setting time of concrete increases because the rate of moisture loss is
very low, and the water-cement ratio is chosen accordingly. Admixtures should also be
recommended to prevent the concrete from freezing in very cold climates.
5. Exposure conditions of concrete: Exposure conditions play an important role in the mix
design of concrete. Exposure conditions such as chemical attack or a coastal location need to be
considered for the given site. The exposure conditions defined in codes of practice are generally
mild, moderate, severe, very severe and extreme.
The grade of concrete and durability requirements of concrete changes with exposure conditions.
For extreme exposure conditions some standard codes mention minimum strength of concrete as
M35.
6. Batching and mixing methods: There are two types of batching method, i.e. volumetric
batching and batching by weight. These two conditions should be known for concrete mix design
calculations.
Mixing methods include manual mixing, machine mixing, ready mix concrete etc. The quality
control of concrete varies with each type of mixing method.
7. Quality of materials: Each construction material should be tested in the laboratory before it is
considered for mix design calculations. The type of material, its moisture content, suitability for
construction, and its chemical and physical properties affect the mix design of concrete. The
type of cement to be used, the sources of the coarse and fine aggregates, and their size and
shape should all be considered.
8. Special requirements of concrete: Special requirements such as setting times, early strength
and flexural strength should also be specified.
2.4 Concrete Mix design as per IS 10262-2009
Data required for mix design:
a) Grade designation;
b) Type of cement;
c) Maximum nominal size of aggregate;
d) Minimum cement content;
e) Maximum water-cement ratio;
f) Workability;
g) Exposure conditions as per Table 4 and Table 5 of IS 456;
h) Maximum temperature of concrete at the time of placing;
i) Method of transporting and placing;
j) Early age strength requirements, if required;
k) Type of aggregate;
l) Maximum cement content; and
m) Whether an admixture shall or shall not be used, and the type of admixture and the
condition of use.
Procedure for concrete mix design requires following step by step process:
I. Calculation of target strength of concrete
II. Selection of water-cement ratio
III. Selection of water content for concrete
IV. Selection of cement content for concrete
V. Calculation of aggregate ratio
VI. Calculation of aggregate content for concrete
VII. Trial mixes for testing concrete mix design strength
I. Calculation of target strength of concrete
The target strength, denoted ft, is obtained from the characteristic compressive strength of
concrete at 28 days (fck) and the standard deviation (s):
ft = fck + 1.65 s
The standard deviation can be taken from the table below.

Grade of concrete    Standard deviation (N/mm²)
M10                  3.5
M15                  3.5
M20                  4.0
M25                  4.0
M30                  5.0
M35                  5.0
M40                  5.0
M45                  5.0
M50                  5.0

Table 2.1: Standard Deviation
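As a quick check of the relation ft = fck + 1.65 s, the target strength for any grade in Table 2.1 can be computed with a short sketch (Python is used here purely for illustration; the standard deviations are those of Table 2.1):

```python
# Standard deviation s (N/mm^2) for each grade, from Table 2.1
STD_DEV = {"M10": 3.5, "M15": 3.5, "M20": 4.0, "M25": 4.0,
           "M30": 5.0, "M35": 5.0, "M40": 5.0, "M45": 5.0, "M50": 5.0}

def target_strength(grade: str) -> float:
    """Target mean strength ft = fck + 1.65*s (IS 10262)."""
    fck = float(grade.lstrip("M"))   # characteristic strength, N/mm^2
    return fck + 1.65 * STD_DEV[grade]

print(target_strength("M30"))  # 30 + 1.65*5.0 = 38.25 N/mm^2
```

For M30, for example, the mix must therefore be proportioned for a mean strength of 38.25 N/mm², not 30 N/mm².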
II. Selection of Water-Cement Ratio
The water-cement ratio is the ratio of the weight of water to the weight of cement in the concrete
mix. It is an important consideration in concrete mix design for making the concrete workable.
The water-cement ratio is selected from the curve below for the 28-day characteristic
compressive strength of concrete.
Fig 2.3 : Selection of Water-Cement Ratio for Concrete Mix Design
Similarly, the water-cement ratio can be determined from the 7-day concrete strength: the
curves are divided on the basis of strength, and the water-cement ratio is read off the graph
below.
Fig 2.4 : Concrete Compressive Strength vs. Water Cement Ratio
III. Selection of water content
Select the water content needed to achieve the required workability, with the help of the nominal
maximum size of aggregate, from the table below. The table applies when only angular
aggregates are used in the concrete and the slump is 25 to 50mm.

Nominal maximum size of aggregate    Maximum water content (kg/m³)
10mm                                 208
20mm                                 186
40mm                                 165

Table 2.2: Maximum Water Content
If the aggregate shape or the slump value differs from the above, the following adjustments are
required.

Condition                             Adjustment
Sub-angular aggregate                 Reduce the selected value by 10 kg
Gravel with crushed stone             Reduce the selected value by 20 kg
Rounded gravel                        Reduce the selected value by 25 kg
Using plasticizer                     Decrease the selected value by 5-10%
Using superplasticizer                Decrease the selected value by 20-30%
For every 25mm increase in slump      Increase the selected value by 3%

Table 2.3: Water Content Adjustments
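The selection rules of Tables 2.2 and 2.3 can be sketched in code. The example below encodes only the two adjustments used later in this project (slump increase and superplasticizer); taking 20%, the lower bound of the 20-30% superplasticizer range, is an illustrative choice:

```python
# Table 2.2: base water content (kg/m^3), angular aggregate, 25-50mm slump
BASE_WATER = {10: 208, 20: 186, 40: 165}

def water_content(max_agg_mm, slump_mm, superplasticizer=False):
    """Water content per Tables 2.2/2.3: +3% per 25mm of slump above 50mm,
    -20% if a superplasticizer is used (lower bound of the 20-30% range)."""
    w = BASE_WATER[max_agg_mm]
    if slump_mm > 50:
        w *= 1 + 0.03 * ((slump_mm - 50) / 25)
    if superplasticizer:
        w *= 0.80
    return round(w, 2)

print(water_content(20, 100, superplasticizer=True))  # 186 * 1.06 * 0.8 = 157.73
```

For the 20mm aggregate and 100mm slump assumed in this project, the water content thus drops from 186 kg/m³ to about 158 kg/m³ once the superplasticizer reduction is applied.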
IV. Selection of Cement Content for Concrete
The water-cement ratio was determined in Step II and the quantity of water in Step III, so the
quantity of cement can be calculated directly from the two. The value obtained must also satisfy
the minimum cement content given in the table below; the greater of the two values is adopted
as the cement content.
Cement Content for Reinforced Concrete

Exposure       Minimum Cement Content (kg/m³)    Max Free Water-Cement Ratio    Minimum Grade of Concrete
Mild           300                               0.55                           M20
Moderate       300                               0.50                           M25
Severe         320                               0.45                           M30
Very severe    340                               0.45                           M35
Extreme        360                               0.40                           M40

Table 2.4: Cement content
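Steps II-IV combine into a small sketch: the cement content is the water content divided by the water-cement ratio, floored at the Table 2.4 minimum (illustrative only; the input numbers are examples, not values from the report):

```python
# Minimum cement content (kg/m^3) by exposure condition, from Table 2.4
MIN_CEMENT = {"Mild": 300, "Moderate": 300, "Severe": 320,
              "Very severe": 340, "Extreme": 360}

def cement_content(water_kg, w_c_ratio, exposure):
    """Cement = water / (w/c), but never below the Table 2.4 minimum."""
    return max(water_kg / w_c_ratio, MIN_CEMENT[exposure])

print(round(cement_content(157.73, 0.45, "Severe"), 1))  # 350.5 kg/m^3 (> 320 minimum)
```

Here 157.73 / 0.45 ≈ 350.5 kg/m³ governs, since it exceeds the 320 kg/m³ minimum for severe exposure.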
V. Calculation of Aggregate Ratio
For the given nominal maximum size of aggregate, the ratio of the volume of coarse aggregate
to the volume of total aggregate can be read from the table below for the different zones of fine
aggregate.
Ratio of volume of coarse aggregate to volume of total aggregate for different zones of fine
aggregate:

Nominal maximum size of aggregate    Zone 1    Zone 2    Zone 3    Zone 4
10mm                                 0.44      0.46      0.48      0.50
20mm                                 0.60      0.62      0.64      0.66
40mm                                 0.69      0.71      0.73      0.75

Table 2.5: Ratio of volume of coarse aggregate
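A lookup for this table can be sketched as follows. The two corrections applied, +/-0.01 in the ratio for every 0.05 change in water-cement ratio from 0.50, and a 10% reduction for pumpable concrete, follow IS 10262-2009; their inclusion here is an illustrative choice:

```python
# Table 2.5: CA volume / total aggregate volume, tabulated at w/c = 0.50
CA_RATIO = {  # nominal max size (mm) -> {sand zone: ratio}
    10: {1: 0.44, 2: 0.46, 3: 0.48, 4: 0.50},
    20: {1: 0.60, 2: 0.62, 3: 0.64, 4: 0.66},
    40: {1: 0.69, 2: 0.71, 3: 0.73, 4: 0.75},
}

def ca_ratio(max_agg_mm, zone, w_c_ratio, pumpable=False):
    """Coarse-aggregate volume fraction per IS 10262-2009:
    +/-0.01 for every 0.05 decrease/increase in w/c from 0.50;
    reduced by 10% for pumpable concrete."""
    r = CA_RATIO[max_agg_mm][zone] + 0.01 * (0.50 - w_c_ratio) / 0.05
    if pumpable:
        r *= 0.90
    return round(r, 3)

print(ca_ratio(20, 2, 0.45, pumpable=True))  # (0.62 + 0.01) * 0.9 = 0.567
```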
VI. Calculation of Aggregate Content for Concrete
The ratio of coarse aggregate volume to total aggregate volume has already been determined, so
1 minus this ratio gives the fine aggregate fraction. The masses of fine and coarse aggregate
then follow from the formulae below.
The mass of fine aggregate is calculated from

V = [W + C/Gc + (1/P) x (F.A./Gf)] x (1/1000)

Similarly, the mass of coarse aggregate is calculated from

V = [W + C/Gc + (1/(1 - P)) x (C.A./Gca)] x (1/1000)

Where, V = absolute volume of concrete (m³)
W = water content (kg)
C = cement content (kg)
Gc = sp. gravity of cement
P = ratio of fine aggregate to total aggregate by absolute volume (1 minus the coarse
aggregate ratio of Step V)
F.A & C.A = masses of fine and coarse aggregates (kg)
Gf & Gca = sp. gravities of the fine and coarse aggregates
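Solving the two absolute-volume relations of IS 10262-1982 for the aggregate masses gives the sketch below. The numeric inputs (V = 0.98 m³ allowing roughly 2% entrapped air, specific gravities of 3.15 for cement and 2.6 for both aggregates) are assumed values for illustration, not figures taken from this report:

```python
def aggregate_masses(V, W, C, Gc, p, Gf, Gca):
    """Solve V*1000 = W + C/Gc + (1/p)*FA/Gf (and the analogous coarse
    relation) for the fine and coarse aggregate masses, per IS 10262-1982.
    p is the fine-aggregate fraction of the total aggregate volume."""
    agg_volume = V * 1000 - W - C / Gc       # litres left for all-in aggregate
    fa = p * Gf * agg_volume                 # fine aggregate mass, kg
    ca = (1 - p) * Gca * agg_volume          # coarse aggregate mass, kg
    return round(fa, 1), round(ca, 1)

# Assumed inputs: V=0.98 m^3, W=160 kg, C=356 kg, Gc=3.15, p=0.38, Gf=Gca=2.6
print(aggregate_masses(0.98, 160, 356, 3.15, 0.38, 2.6, 2.6))  # (698.5, 1139.7)
```

With these assumed inputs, about 698 kg of fine and 1140 kg of coarse aggregate are obtained per cubic metre, which is in the usual range for a 20mm designed mix.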
VII. Trial Mixes for Testing Concrete Mix Design Strength
Based on the values obtained above, conduct a trial by casting at least 3 cubes of 150mm size as
per the above standards. Test the cubes and verify whether the required strength is attained; if
not, redesign the mix with appropriate adjustments until the required cube strength is achieved.
Chapter – 3
ARTIFICIAL NEURAL NETWORK
3.1 Introduction
A neural network is a mathematical model composed of a large number of processing elements
organized into layers. These elements process many inputs simultaneously, strengthening some
and weakening others, to obtain the desired output. The neural network technique is particularly
useful for modelling a nonlinear system with a number of variables: no mathematical
relationship between the variables is assumed in advance; instead, the network learns from the
examples it is fed. In structural engineering, neural networks have been used successfully in
diverse fields such as structural control, design of expert systems, substructural identification
and sequential analysis of tall buildings.
Neural networks are composed of simple elements operating in parallel. These elements are
inspired by biological nervous systems. As in nature, the connections between elements largely
determine the network function. You can train a neural network to perform a particular function
by adjusting the values of the connections (weights) between elements.
Typically, neural networks are adjusted, or trained, so that a particular input leads to a specific
target output. The next figure illustrates such a situation. Here, the network is adjusted, based on a
comparison of the output and the target, until the network output matches the target. Typically,
many such input/target pairs are needed to train a network.
Fig. 3.1 Neural network
Neural networks have been trained to perform complex functions in various fields, including
pattern recognition, identification, classification, speech, vision, and control systems.
Neural networks can also be trained to solve problems that are difficult for conventional
computers or human beings. The toolbox emphasizes the use of neural network paradigms that
build up to—or are themselves used in— engineering, financial, and other practical applications.
3.2 Neural network Applications
It would be impossible to cover the total range of applications for which neural networks have
provided outstanding solutions. The remaining sections of this topic describe only a few of the
applications in function fitting, pattern recognition, clustering, and time series analysis. The
following list gives an idea of the diversity of fields in which neural networks provide
state-of-the-art solutions:
Aerospace, Automotive, Banking, Defense, Electronics, Entertainment, Financial, Industrial,
Insurance, Manufacturing, Medical, Oil and gas, Robotics, Securities, Speech,
Telecommunications, Transportation
3.3 Neural Network design steps
1. Collect data
2. Create the network
3. Configure the network
4. Initialize the weights and biases
5. Train the network
6. Validate the network
7. Use the network
3.4 Concept behind ANN
Back Propagation Neural Network
If we consider the human brain to be the 'ultimate' neural network, then ideally we would like to
build a device which imitates the brain's functions. However, because of limits in our technology,
we must settle for a much simpler design. The obvious approach is to design a small electronic
device which has a transfer function similar to a biological neuron, and then connect each neuron
to many other neurons, using RLC networks to imitate the dendrites, axons, and synapses. This
type of electronic model is still rather complex to implement, and we may have difficulty
'teaching' the network to do anything useful. Further constraints are needed to make the design
more manageable. First, we change the connectivity between the neurons so that they are in
distinct layers, such that each neuron in one layer is connected to every neuron in the next layer.
Further, we define that signals flow only in one direction across the network, and we simplify the
neuron and synapse design to behave as analog comparators being driven by the other neurons
through simple resistors. We now have a feed-forward neural network model that may actually be
practical to build and use.
Referring to both figures below, the network functions as follows: Each neuron receives a signal
from the neurons in the previous layer, and each of those signals is multiplied by a separate
weight value. The weighted inputs are summed, and passed through a limiting function which
scales the output to a fixed range of values. The output of the limiter is then broadcast to all of the
neurons in the next layer. So, to use the network to solve a problem, we apply the input values to
the inputs of the first layer, allow the signals to propagate through the network, and read the
output values.
Fig- 3.2 : Artificial Neural Network Architecture
An elementary neuron with R inputs is shown below. Each input is weighted with an appropriate
w. The sum of the weighted inputs and the bias forms the input to the transfer function f. Neurons
may use any differentiable transfer function f to generate their output.
Fig- 3.3 : The Structure of a Neuron
In this project, we are only using log-sig and linear transfer functions.
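The single-neuron computation described above, a weighted sum plus bias passed through a transfer function, can be sketched as follows (an illustration only; the project itself uses MATLAB's toolbox):

```python
import math

def logsig(n):
    """Log-sigmoid transfer function: squashes any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

def neuron(inputs, weights, bias, transfer=logsig):
    """a = f(w.x + b): weighted sum of the inputs plus bias, through f."""
    n = sum(w * x for w, x in zip(weights, inputs)) + bias
    return transfer(n)

print(logsig(0.0))                            # 0.5: midpoint of the sigmoid
print(neuron([1.0, 2.0], [0.5, -0.25], 0.0))  # net input 0.0, so logsig gives 0.5
```

Replacing `transfer` with the identity function gives the linear (purelin) neuron used in the output layers of this project.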
Fig 3.4 : Log-Sigmoid Transfer Function
The function logsig generates outputs between 0 and 1 as the neuron’s net input goes from
negative to positive infinity.
If the last layer of a multilayer network has sigmoid neurons, then the outputs of the network are
limited to a small range. If linear output neurons are used, the network outputs can take on any
value.
Fig 3.5 : Linear Transfer Function
Backpropagation Algorithm
There are many variations of the backpropagation algorithm. The simplest implementation of
backpropagation learning updates the network weights and biases in the direction in which the
performance function decreases most rapidly, the negative of the gradient. One iteration of this
algorithm can be written as

Xk+1 = Xk − αk gk

where Xk is the vector of current weights and biases, gk is the current gradient, and αk is the
learning rate.
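The update rule can be demonstrated on a toy performance function, F(x) = x² with gradient g = 2x, where repeated steps drive x towards the minimum (a sketch, not the toolbox implementation):

```python
def gradient_descent(x, lr=0.1, iters=50):
    """Repeated application of x_{k+1} = x_k - alpha_k * g_k on F(x) = x^2."""
    for _ in range(iters):
        g = 2 * x          # gradient of F(x) = x^2
        x = x - lr * g     # one gradient-descent update
    return x

print(gradient_descent(5.0))  # approaches 0, the minimum of F
```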
There are two different ways in which this gradient descent algorithm can be implemented:
incremental mode and batch mode. In the incremental mode, the gradient is computed and the
weights are updated after each input is applied to the network. In the batch mode all of the inputs
are applied to the network before the weights are updated. The next section describes the batch
mode of training.
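The two modes can be contrasted on a one-weight linear neuron fitted to y = 2x (an illustrative sketch with made-up data):

```python
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # samples of y = 2x

def train_incremental(w=0.0, lr=0.05, epochs=100):
    """Incremental mode: weight updated after each input/target pair."""
    for _ in range(epochs):
        for x, t in data:
            w += lr * (t - w * x) * x
    return w

def train_batch(w=0.0, lr=0.05, epochs=100):
    """Batch mode: gradient accumulated over all inputs, one update per epoch."""
    for _ in range(epochs):
        grad = sum((t - w * x) * x for x, t in data)
        w += lr * grad / len(data)
    return w

print(round(train_incremental(), 3), round(train_batch(), 3))  # both near 2.0
```

Both modes recover the underlying weight of 2; they differ only in how often the update is applied.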
Advantages of Backpropagation
1. It does not require any pre-programming.
2. The backpropagation algorithm can give precise results even if some of the data is missing
from the database.
3. It can be applied to simple as well as complex problems.
4. The algorithm has great tolerance towards noisy data.
5. Modifications to the structure of the neural network do not affect the working of the
backpropagation algorithm.
6. As the algorithm is fast, it saves time.
7. Backpropagation has been applied in various fields, and the results obtained are very close to
the actual results most of the time.
8. The adaptation and generalization capabilities of neural networks are greatly increased.
9. With the backpropagation algorithm, neural networks can be applied to real-world problems
very conveniently, and the results can be trusted within a particular error range.
Limitations of Backpropagation
1. A goal must be set for the error function at which the network is said to be well trained. To
achieve this goal the network has to be trained again and again, for different numbers of epochs,
which consumes a lot of time.
2. In the absence of sufficient data, i.e. when the data provided to the network is limited, the
generalization of the network becomes weak and the results obtained may vary largely from the
actual data.
3. There is always a chance of overfitting.
4. The backpropagation algorithm is a black box: the relationship between the inputs and the
outputs remains unknown.
5. The backpropagation algorithm cannot function if the error function is discontinuous or not
differentiable.
Chapter – 4
DEVELOPMENT OF THE NETWORK
4.1 Introduction
In this project, feed-forward neural networks with the backpropagation algorithm have been
used. MATLAB’s Neural Network Toolbox (‘nntool’) has been used to develop and train the
networks. In such a network, the information is first propagated towards the output nodes
through the hidden nodes with some initial random weights. The difference between the
network outputs and the target values is then evaluated; this difference is termed the error. The
errors are propagated backwards and the associated weights of the hidden layers are updated.
With these updated weights the information is sent forward again, and the iteration continues
until the desired level of accuracy, or goal, is achieved. If the network is trained with sufficient
data, one can expect quite convincing results. For this reason, several networks have been
developed in this study. A total of five input nodes are provided to the network; the variation in
these input parameters is shown below.
4.2 Input Data
The following data variables were chosen as inputs:
a. Grade of concrete - M20, M25, M30, M35, M40, M45, M50, M55
b. Water-cement ratio proposed - 0.4, 0.45, 0.5
c. Workability (Slump) - 75, 100, 125, 150, 175, 200 mm slump
d. Exposure conditions - Mild, Moderate, Severe, Very Severe, Extreme
e. Sand Zone - I, II, III, IV
Number of datasets: 8 x 3 x 6 x 5 x 4 = 2880 data sets, which I was able to generate using the C
program in ANNEXURE I. Also refer to ANNEXURE II for the generated data.
Assumptions:
i. Nominal size of aggregate is taken as 20mm
ii. Concrete is pumpable
iii. Super plasticizer is used.
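The enumeration of all input combinations can be sketched as below. This is an illustrative Python equivalent of the idea behind the C program in ANNEXURE I (the list names are mine, not from the annexure):

```python
from itertools import product

# Input parameter values from Section 4.2; list names are illustrative.
grades = ["M20", "M25", "M30", "M35", "M40", "M45", "M50", "M55"]     # 8
wc_ratios = [0.40, 0.45, 0.50]                                        # 3
slumps_mm = [75, 100, 125, 150, 175, 200]                             # 6
exposures = ["Mild", "Moderate", "Severe", "Very Severe", "Extreme"]  # 5
sand_zones = ["I", "II", "III", "IV"]                                 # 4

# The Cartesian product of all parameter values gives every dataset row.
datasets = list(product(grades, wc_ratios, slumps_mm, exposures, sand_zones))
print(len(datasets))  # 2880
```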
4.3 Output layer
1. Weight of cement
2. Weight of Water
3. Weight of Fine aggregate
4. Weight of Coarse aggregate.
4.4 Data Pre-processing
Once the most appropriate raw input data has been selected, it must be pre-processed; otherwise,
the neural network will not produce accurate forecasts. The decisions made in this phase of
development are critical to the performance of a network.
Transformation and normalization are two widely used pre-processing methods. Transformation
involves manipulating raw data inputs to create a single input to a net, while normalization is a
transformation performed on a single data input to distribute the data evenly and scale it into an
acceptable range for the network. Knowledge of the domain is important in choosing pre-
processing methods to highlight underlying features in the data, which can increase the network's
ability to learn the association between inputs and outputs.
Some simple pre-processing methods include computing differences between or taking ratios of
inputs. This reduces the number of inputs to the network and helps it learn more easily. In
financial forecasting, transformations that involve the use of standard technical indicators should
also be considered. Moving averages, for example, which are utilized to help smooth price data,
can be useful as a transform.
When creating a neural net to predict tomorrow's close, a five-day simple moving average of the
close can be used as an input to the net. This benefits the net in two ways. First, it has been given
useful information at a reasonable level of detail; and second, by smoothing the data, the noise
entering the network has been reduced. This is important because noise can obscure the
underlying relationships within input data from the network, as it must concentrate on interpreting
the noise component. The only disadvantage is that worthwhile information might be lost in an
effort to reduce the noise, but this trade off always exists when attempting to smooth noisy data.
While not all technical indicators have a smoothing effect, this does not mean that they cannot be
utilized as data transforms. Possible candidates are other common indicators such as the relative
strength index (RSI), the average directional movement indicator (ADX) and stochastics.
Data normalization is the final pre-processing step. In normalizing data, the goal is to ensure that
the statistical distribution of values for each net input and output is roughly uniform. In addition,
the values should be scaled to match the range of the input neurons. This means that along with
any other transformations performed on network inputs, each input should be normalized as well.
4.5 Data Normalization
Here are three methods of data normalization, the first of which is a simple linear scaling of data.
At the very least, data must be scaled into the range used by the input neurons in the neural
network. This is typically the range of -1 to 1 or zero to 1. Many commercially available generic
neural network development programs such as NeuralWorks, BrainMaker and DynaMind
automatically scale each input and in my case, MATLAB’s Neural Network Toolbox does the
same. This function can also be performed in a spreadsheet or custom-written program. Of course,
a linear scaling requires that the minimum and maximum values associated with the facts for a
single data input be found. Let's call these values Dmin and Dmax, respectively. The input range
required for the network must also be determined. Let's assume that the input range is from Imin
to Imax. The formula for transforming each data value D to an input value I is:
I = Imin + (Imax-Imin)*(D-Dmin)/(Dmax-Dmin)
Dmin and Dmax must be computed on an input-by-input basis. This method of normalization will
scale input data into the appropriate range but will not increase its uniformity.
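A minimal sketch of this scaling formula (the function name is mine):

```python
def linear_scale(d, d_min, d_max, i_min=-1.0, i_max=1.0):
    """Map raw value d from [d_min, d_max] into the input range [i_min, i_max]."""
    return i_min + (i_max - i_min) * (d - d_min) / (d_max - d_min)

# Example: scaling the slump values (75..200 mm) into [-1, 1].
print(linear_scale(75, 75, 200))     # -1.0
print(linear_scale(200, 75, 200))    # 1.0
print(linear_scale(137.5, 75, 200))  # 0.0
```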
The second normalization method utilizes a statistical measure of central tendency and variance to
help remove outliers, and spread out the distribution of the data, which tends to increase
uniformity. This is a relatively simple method of normalization, in which the mean and standard
deviation for the input data associated with each input are determined. Dmin is then set to the
mean minus some number of standard deviations. So, if the mean is 50, the standard deviation is
three, and two standard deviations are chosen, then the Dmin value would be 44 (50 - 2*3).
Dmax is conversely set to the mean plus two standard deviations. All data values less than Dmin
are set to Dmin and all data values greater than Dmax are set to Dmax. A linear scaling is then
performed as described above. By clipping off the ends of the distribution this way, outliers are
removed, causing data to be more uniformly distributed. The third normalization method
minimizes the standard deviation of the heights of the columns in the initial frequency distribution
histogram.
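A sketch of the second (clip-and-scale) method, assuming the mean and standard deviation are estimated from the data itself (function name is mine):

```python
import statistics

def clip_and_scale(values, n_std=2.0, i_min=-1.0, i_max=1.0):
    """Clip outliers at mean +/- n_std standard deviations, then scale linearly."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    d_min, d_max = mean - n_std * std, mean + n_std * std
    clipped = [min(max(v, d_min), d_max) for v in values]
    return [i_min + (i_max - i_min) * (v - d_min) / (d_max - d_min)
            for v in clipped]

# An outlier far above mean + 2*std is pinned to the upper boundary.
data = [48, 49, 50, 51, 52, 90]
print(clip_and_scale(data)[-1])  # 1.0
```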
When the network is run on a new test fact, the output produced must be denormalized. If the
normalization is entirely reversible with little or no loss in accuracy, then there is no problem.
However, if the original normalization involved clipping outlier values, then output values equal
to the clipping boundaries should be suspect concerning their actual value. For example, assume
that during training all output values greater than 50 were clipped. Then, during testing, if the net
produces an output of 50, this indicates only that the net's output is 50 or greater. If that
information is acceptable for the application, then the normalization method would be sufficiently
reversible.
Transformation and normalization can greatly improve a network's performance. Basically, these
preprocessing methods are used to encode the highest-level knowledge that is known about a
given problem.
4.6 Creating a Neural Network
As stated earlier, ANNs are networks consisting of a number of small but powerful computing units
called nodes or neurons. A neuron can be considered a processing element which receives input
signals and generates an output pulse for the next connected neuron. Each connection between
neurons carries a weight (together with a bias at the neuron) which indicates the strength of that
connection.
Fig 4.1 - Structure of an Artificial Neuron
Fig 4.1 shows the mathematical model of an artificial neuron. The processing of information in the
network takes place through these interconnected artificial neurons, which implies that they play
a very important role in the learning process of the network, in addition to the validation and
testing of the networks.
An artificial neuron receives signals from all the inputs fed into the network. Some random
weights are initially associated with each of the inputs. As the signals reach the neuron,
weighted summation takes place, i.e., each input is multiplied by its corresponding weight and the
products are added, together with the bias. Mathematically this can be represented as:
n = w1p1 + w2p2 + ... + wRpR + b
The entire weighted sum then goes through a sigmoidal transfer function. With the help of a
transfer function, a relationship is developed between the inputs and the outputs, which helps the
network learn and generalize better. The transfer function (T) can be expressed as:
a = T(n) = 1 / (1 + e^-n)
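A minimal numerical sketch of such a neuron, with a log-sigmoid as the transfer function (an illustration only, not the toolbox implementation):

```python
import math

def neuron(inputs, weights, bias):
    """Weighted summation n = w1*p1 + ... + wR*pR + b, then log-sigmoid."""
    n = sum(w * p for w, p in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-n))

# 0.5*1.0 + (-0.25)*2.0 + 0 = 0, and sigmoid(0) = 0.5
print(neuron([1.0, 2.0], [0.5, -0.25], 0.0))  # 0.5
```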
The number of neurons in the network is adjusted by trial and error. In this work, single-layer
networks were adopted with five neurons initially, increased in steps of five up to twenty-five
neurons. When switching to double-layer networks, five neurons were taken in each of the two
layers and their number was increased by two until there were nine neurons in each layer. A
maximum of nine neurons per layer was taken in the two-layer case, since two-layer networks with
only nine neurons in each layer already showed much better results than single-layer networks.
4.7 Activation/Transfer Function
The transfer function may be a linear or a nonlinear function of n. A particular transfer function is
chosen to satisfy some specification of the problem that the neuron is attempting to solve.
A variety of transfer functions exist; three of the most commonly used are discussed below.
The hard limit transfer function, shown on the left side of Fig 4.2, sets the output of the neuron
to 0 if the function argument is less than 0, or 1 if its argument is greater than or equal to 0.
Such neurons classify inputs into two distinct categories.
Fig 4.2 : Hard Limit Transfer Function
The graph on the right side of Fig 4.2 illustrates the input/output characteristic of a
single-input neuron that uses a hard limit transfer function. Here we can see the effect of the
weight and the bias.
The output of a linear transfer function is equal to its input:
a = n
Neurons with this transfer function are used in the ADALINE networks.
Fig 4.3: Linear Transfer Function
The output (a) versus input (p) characteristic of a single-input linear neuron with a bias is
shown on the right of Fig 4.3.
The log-sigmoid transfer function is shown in Fig 4.4 below.
Fig 4.4: log-sigmoid Transfer Function
This transfer function takes the input (which may have any value between plus and minus infinity)
and squashes the output into the range 0 to 1, according to the expression:
a = 1 / (1 + e^-n)
The log-sigmoid transfer function is commonly used in multilayer networks that are trained using
the backpropagation algorithm, in part because this function is differentiable.
Most of the transfer functions used in this report are summarized in Table 4.1 below.
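The log-sigmoid and its derivative, which backpropagation needs, can be sketched as:

```python
import math

def logsig(n):
    """Squash any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

def logsig_deriv(n):
    """d/dn logsig(n) = logsig(n) * (1 - logsig(n)); this closed-form
    derivative is why the function suits backpropagation."""
    a = logsig(n)
    return a * (1.0 - a)

print(logsig(0.0))        # 0.5
print(logsig_deriv(0.0))  # 0.25
```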
Table 4.1: Transfer Functions
4.8 Hidden Layers
Hidden layers are one of the most important components of multilayered feedforward neural networks
with the backpropagation algorithm; this is the layer in which the hidden neurons reside. The
number of hidden layers may be one or more, depending on the complexity of the problem under
consideration. Increasing the number of hidden layers can provide better generalization of the
problem statement; in this case, 2 hidden layers have been chosen. Neural networks with more than
one hidden layer are best suited for problems relating to function approximation, implying that
the more hidden layers, the more accurate the results given by the network.
4.9 Hidden Layers Neurons
Hidden layer neurons affect the level of accuracy between the output and the target values. As
discussed before, there is as yet no rule or guideline for selecting the number of neurons in the
hidden layer(s); the best way is trial and error.
In this work, the number of neurons in the 1st hidden layer is varied, while the 2nd hidden layer
is kept at 4 neurons, equal to the number of output nodes.
4.10 Output Measurement
It is necessary to measure how correct our output results are compared to the target values, so
concepts like SSE (sum squared error) and MSE (mean square error) are used. In this project, MSE
has been used to compute the correctness.
The mean square error (MSE) algorithm is an example of supervised training, in which the learning
rule is provided with a set of examples of desired network behavior: pairs (pq, tq), where pq is
an input to the network and tq is the corresponding target output. As each input is applied to the
network, the network output is compared to the target. The error is calculated as the difference
between the target output and the network output, and we want to minimize the average of the sum
of the squares of these errors.
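A minimal sketch of the MSE measure used here:

```python
def mse(targets, outputs):
    """Mean square error between targets t_q and network outputs a_q."""
    errors = [t - a for t, a in zip(targets, outputs)]
    return sum(e * e for e in errors) / len(errors)

print(mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))  # (0 + 0.25 + 1) / 3 = 0.41666...
```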
The LMS algorithm adjusts the weights and biases of the ADALINE so as to minimize this mean
square error.
Fortunately, the mean square error performance index for the ADALINE network is a quadratic
function. Thus, the performance index will either have one global minimum, a weak minimum, or
no minimum, depending on the characteristics of the input vectors. Specifically, the
characteristics of the input vectors determine whether or not a unique solution exists
4.11 Training Functions
It is very difficult to know which training algorithm will be the fastest for a given problem. It
depends on many factors, including the complexity of the problem, the number of data points in
the training set, the number of weights and biases in the network, the error goal, and whether the
network is being used for pattern recognition (discriminant analysis) or function approximation
(regression). This section compares the various training algorithms. Feedforward networks are
trained on six different problems. Three of the problems fall in the pattern recognition category
and the three others fall in the function approximation category. Two of the problems are simple
"toy" problems, while the other four are "real world" problems. Networks with a variety of
different architectures and complexities are used, and the networks are trained to a variety of
different accuracy levels.
The following table lists the algorithms that are tested and the acronyms used to identify them.
Notation Training Algorithm
trainb Batch training with weight and bias learning rules.
trainbfg BFGS quasi-Newton backpropagation.
trainbr Bayesian regularization.
trainc Cyclical order incremental update.
traincgb Powell-Beale conjugate gradient backpropagation.
traincgf Fletcher-Powell conjugate gradient backpropagation.
traincgp Polak-Ribiere conjugate gradient backpropagation.
traingd Gradient descent backpropagation.
traingda Gradient descent with adaptive lr backpropagation.
traingdm Gradient descent with momentum backpropagation.
traingdx Gradient descent with momentum and adaptive lr backprop.
trainlm Levenberg-Marquardt backpropagation.
trainoss One step secant backpropagation.
trainr Random order incremental update.
trainrp Resilient backpropagation
trains Sequential order incremental update.
trainscg Scaled conjugate gradient backpropagation.
Table 4.2: Training functions
The fastest training function is generally trainlm, and it is the default training function for
feedforwardnet. The quasi-Newton method, trainbfg, is also quite fast. Both of these methods tend
to be less efficient for large networks (with thousands of weights), since they require more
memory and more computation time for these cases. Also, trainlm performs better on function
fitting (nonlinear regression) problems than on pattern recognition problems.
When training large networks, and when training pattern recognition networks, trainscg and
trainrp are good choices. Their memory requirements are relatively small, and yet they are much
faster than standard gradient descent algorithms.
In the present problem, the Levenberg-Marquardt backpropagation algorithm (trainlm), the resilient
backpropagation algorithm (trainrp) and scaled conjugate gradient backpropagation (trainscg) are
used for training the datasets provided to the neural network. With these training functions,
training becomes fast and the network's generalization power is increased compared to the other
training functions.
The Levenberg-Marquardt algorithm was designed to approach second-order training speed
without having to compute the Hessian matrix. When the performance function has the form of a
sum of squares (as is typical in training feedforward networks), the Hessian matrix can be
approximated as
H = J'J
and the gradient can be computed as
g = J'e
where J is the Jacobian matrix containing the first derivatives of the network errors with respect
to the weights and biases, and e is a vector of network errors. The Jacobian matrix can be
computed through a standard backpropagation technique that is much less complex than computing the
Hessian matrix.
The Levenberg-Marquardt algorithm uses this approximation to the Hessian matrix in the
following Newton-like update:
x(k+1) = x(k) - [J'J + µI]^-1 J'e
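For the scalar (single-weight) case the Newton-like update reduces to a division, and one step can be sketched as follows. This is only an illustration of the formula, not MATLAB's trainlm implementation:

```python
def lm_step(w, xs, ts, mu):
    """One Levenberg-Marquardt update w <- w - (J'J + mu)^-1 J'e for the
    one-parameter model a = w*x, so J'J and J'e are scalars."""
    es = [t - w * x for x, t in zip(xs, ts)]  # network errors e
    J = [-x for x in xs]                      # d(e)/d(w) for each sample
    JtJ = sum(j * j for j in J)
    Jte = sum(j * e for j, e in zip(J, es))
    return w - Jte / (JtJ + mu)

# With mu = 0 this is Newton's method; a linear model converges in one step.
xs, ts = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # target slope is 2
print(lm_step(0.0, xs, ts, mu=0.0))  # 2.0
```

Increasing mu shrinks the step towards a small gradient-descent step, which matches how µ is adjusted during training.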
When the scalar µ is zero, this is just Newton's method using the approximate Hessian matrix. When
µ is large, this becomes gradient descent with a small step size. Newton's method is faster and
more accurate near an error minimum, so the aim is to shift toward Newton's method as quickly as
possible. Thus, µ is decreased after each successful step (reduction in the performance function)
and increased only when a tentative step would increase the performance function. In this way, the
performance function is always reduced at each iteration of the algorithm.
Resilient Backpropagation (trainrp): Multilayer networks typically use sigmoid transfer functions
in the hidden layers. These functions are often called "squashing" functions because they compress
an infinite input range into a finite output range. Sigmoid functions are characterized by the
fact that their slopes must approach zero as the input gets large. This causes a problem when
steepest descent is used to train a multilayer network with sigmoid functions, because the
gradient can have a very small magnitude and therefore cause small changes in the weights and
biases, even though the weights and biases are far from their optimal values.
The purpose of the resilient backpropagation (Rprop) training algorithm is to eliminate these
harmful effects of the magnitudes of the partial derivatives. Only the sign of the derivative can
determine the direction of the weight update; the magnitude of the derivative has no effect on the
weight update. The size of the weight change is determined by a separate update value. The
update value for each weight and bias is increased by a factor delt_inc whenever the derivative of
the performance function with respect to that weight has the same sign for two successive
iterations. The update value is decreased by a factor delt_dec whenever the derivative with respect
to that weight changes sign from the previous iteration. If the derivative is zero, the update value
remains the same. Whenever the weights are oscillating, the weight change is reduced. If the
weight continues to change in the same direction for several iterations, the magnitude of the
weight change increases.
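The sign-based step-size rule described above can be sketched as follows (illustrative; the delt_inc and delt_dec defaults are typical values, not necessarily MATLAB's):

```python
def rprop_delta(prev_grad, grad, delta, delt_inc=1.2, delt_dec=0.5,
                delta_min=1e-6, delta_max=50.0):
    """Update one weight's step size from the SIGN of successive gradients;
    the gradient magnitude itself is never used."""
    if prev_grad * grad > 0:    # same sign twice in a row: accelerate
        return min(delta * delt_inc, delta_max)
    if prev_grad * grad < 0:    # sign flipped (oscillation): back off
        return max(delta * delt_dec, delta_min)
    return delta                # zero derivative: step size unchanged

print(round(rprop_delta(0.3, 0.01, 0.1), 2))   # 0.12 (increased)
print(round(rprop_delta(0.3, -0.01, 0.1), 2))  # 0.05 (decreased)
```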
Scaled Conjugate Gradient (trainscg): Each of the basic conjugate gradient algorithms requires a
line search at each iteration. This line search is computationally expensive, since it requires
that the network response to all training inputs be computed several times for each search. The
scaled conjugate gradient algorithm (SCG), developed by Moller, was designed to avoid this
time-consuming line search. The algorithm is too complex to explain in a few lines, but the basic
idea is to combine the model-trust-region approach (used in the Levenberg-Marquardt algorithm
described above) with the conjugate gradient approach.
4.12 Validation
The idea of cross validation is to split the training set into two: a set of examples to train with, and
a validation set. The agent trains using the new training set. Prediction on the validation set is
used to determine which model to use.
The error on the training set gets smaller as the model complexity grows. The idea of cross
validation is to choose the representation for which the error on the validation set is a minimum.
In such cases, learning can continue until the error on the validation set starts to increase.
The validation set that is used as part of training is not the same as the test set. The test set is used
to evaluate how well the learning algorithm works as a whole. It is cheating to use the test set as
part of learning. Remember that the aim is to predict examples that the agent has not seen. The
test set acts as a surrogate for these unseen examples, and so it cannot be used for training or
validation.
Typically, we want to train on as many examples as possible, because then we get better models.
However, having a small validation set means that the validation set may fit well, or not fit well,
just by luck. There are various methods that have been used to reuse examples for both training
and validation.
One method, k-fold cross validation, is used to determine the best model complexity, such as the
depth of a decision tree or the number of hidden units in a neural network. The method of k-fold
cross validation partitions the training set into k sets. For each model complexity, the learner
trains k times, each time using one of the sets as the validation set and the remaining sets as the
training set. It then selects the model complexity that has the smallest average error on the
validation set (averaging over the k runs). It can return the model with that complexity, trained on
all of the data.
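The k-fold partitioning scheme can be sketched at the index level (no learner attached; names are mine):

```python
def k_fold_splits(n_examples, k):
    """Partition example indices into k folds; each fold serves once as the
    validation set while the remaining folds form the training set."""
    indices = list(range(n_examples))
    folds = [indices[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val

for train, val in k_fold_splits(10, 5):
    print(len(train), len(val))  # 8 2, printed once per fold
```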
4.13 Testing the Network
After creating and training the network, we need to make sure the network obtained gives optimum
results and to compare the error across all the networks created. Here, a portion of the data
(20%) is set aside for testing; the output from the network is simulated and the variation between
the network output and the target data is calculated using a performance parameter (MSE in this
case). The MSE from training and the MSE from testing are also compared to obtain the optimum
network parameters.
Chapter – 5
Network Results and Analysis
5.1 Introduction
In this project, networks are created to calculate the weights of cement, water, fine aggregate
and coarse aggregate as close as possible to the calculated target values.
Parameter | Values | No.
Grade of concrete | M20, M25, M30, M35, M40, M45, M50, M55 | 8
Water-cement ratio | 0.4, 0.45, 0.5 | 3
Workability (Slump) | 75, 100, 125, 150, 175, 200 mm slump | 6
Exposure conditions | Mild, Moderate, Severe, Very Severe, Extreme | 5
Sand Zone | I, II, III, IV | 4
Total: 8 x 3 x 6 x 5 x 4 = 2880 data sets (ANNEXURE II)
Table 5.1: Input data
5.2 Networks and its parameters
All the networks are created with two hidden layers, and the 2nd hidden layer has 4 neurons in all
cases. The transfer function in the 1st hidden layer is LOGSIG, and the transfer function in the
2nd hidden layer is either LOGSIG or LINEAR.
S.No. | Network Name | Parameters
1 | LM10 | Levenberg-Marquardt, 10 neurons in 1st hidden layer, LOGSIG in 2nd hidden layer
2 | LM15 | Levenberg-Marquardt, 15 neurons in 1st hidden layer, LOGSIG in 2nd hidden layer
3 | LM20 | Levenberg-Marquardt, 20 neurons in 1st hidden layer, LOGSIG in 2nd hidden layer
4 | LM25 | Levenberg-Marquardt, 25 neurons in 1st hidden layer, LOGSIG in 2nd hidden layer
5 | LM30 | Levenberg-Marquardt, 30 neurons in 1st hidden layer, LOGSIG in 2nd hidden layer
6 | RP10 | Resilient backpropagation, 10 neurons in 1st hidden layer, LOGSIG in 2nd hidden layer
7 | RP15 | Resilient backpropagation, 15 neurons in 1st hidden layer, LOGSIG in 2nd hidden layer
8 | SCG10 | Scaled conjugate gradient backpropagation, 10 neurons in 1st hidden layer, LOGSIG in 2nd hidden layer
9 | SCG20 | Scaled conjugate gradient backpropagation, 20 neurons in 1st hidden layer, LOGSIG in 2nd hidden layer
10 | LM15 (loglin) | Levenberg-Marquardt, 15 neurons in 1st hidden layer, LINEAR in 2nd hidden layer
11 | LM20 (loglin) | Levenberg-Marquardt, 20 neurons in 1st hidden layer, LINEAR in 2nd hidden layer
12 | LM25 (loglin) | Levenberg-Marquardt, 25 neurons in 1st hidden layer, LINEAR in 2nd hidden layer
13 | LM30 (loglin) | Levenberg-Marquardt, 30 neurons in 1st hidden layer, LINEAR in 2nd hidden layer
Table 5.2: List of Networks
5.3 Training Results
I. Cement
The graph below shows the variation of MSE with an increasing number of epochs for a number of
networks. We can see that as the number of epochs increases the MSE value decreases, which is
partial proof of learning.
Fig 5.1: All Networks Training Results for Cement
Levenberg-Marquardt can be seen to give satisfactory results, as the graphs below show. The
log-log type networks give relatively better results than the SCG and RP networks, but not as good
as the log-lin networks.
Fig 5.2: LM log-log Training Results for Cement
The minimum MSE among the log-log LM networks occurs at 30 neurons, i.e. 0.018431.
Fig 5.3: LM log-lin Training Results for Cement
The minimum MSE among the log-lin LM networks occurs at 30 neurons, i.e. 1.22E-06.
The MSE from trainrp and trainscg is too high even to be considered.
Fig 5.4: SCG and RP Training Results for Cement
Results at 1000 epochs for the different networks:
Network MSE Value
LM - N10 5.552515742
LM - N15 4.179284163
LM - N20 0.47008216
LM - N25 0.316832399
LM - N30 0.018430618
RP - 10 8.212146214
RP - 15 6.761043908
SCG - 10 9.945478184
SCG - 20 4.454723801
LM - 15 (loglin) 3.40E-05
LM - 20 (loglin) 8.49E-06
LM - 25 (loglin) 4.49E-06
LM - 30 (loglin) 1.22E-06
Table 5.3: Training Results for cement at 1000 epochs
Lowest MSE value is at LM30 (loglin)
II. Water
The graph below shows the variation of MSE with an increasing number of epochs for a number of
networks. We can see that as the number of epochs increases the MSE value decreases, which is
partial proof of learning.
Fig 5.5: All Networks Training Results for Water
Levenberg-Marquardt can be seen to give satisfactory results, as the graphs below show. The
log-log type networks give relatively better results than the SCG and RP networks, but not as good
as the log-lin networks.
Fig 5.6: LM log-log Training Results for Water
The minimum MSE among the log-log LM networks occurs at 30 neurons, i.e. 0.003002.
Fig 5.7: LM log-lin Training Results for Water
The minimum MSE among the log-lin LM networks occurs at 30 neurons, i.e. 6.17E-09.
The MSE from trainrp and trainscg is too high even to be considered.
Fig 5.8: SCG and RP Training Results for Water
Results at 1000 epochs for the different networks:
Network MSE
LM - N10 1.109134131
LM - N15 0.876586265
LM - N20 0.741057293
LM - N25 0.042340691
LM - N30 0.00300169
RP - 10 1.207834925
RP - 15 1.223718087
SCG - 10 1.240245568
SCG - 20 0.30152675
LM - 15 (loglin) 6.53E-07
LM - 20 (loglin) 1.34E-07
LM - 25 (loglin) 6.94E-09
LM - 30 (loglin) 6.17E-09
Table 5.4: Training Results for water at 1000 epochs
Lowest MSE value is at LM30 (loglin)
III. Fine Aggregates
The graph below shows the variation of MSE with an increasing number of epochs for a number of
networks. We can see that as the number of epochs increases the MSE value decreases, which is
partial proof of learning.
Fig 5.9: All Networks Training Results for Fine Aggregate
Levenberg-Marquardt can be seen to give satisfactory results, as the graphs below show. The
log-log type networks give relatively better results than the SCG and RP networks, but not as good
as the log-lin networks.
Fig 5.10: LM log-log Training Results for Fine Aggregate
The minimum MSE among the log-log LM networks occurs at 30 neurons, i.e. 0.231152237.
Fig 5.11: LM log-lin Training Results for Fine Aggregate
The minimum MSE among the log-lin LM networks occurs at 30 neurons, i.e. 4.05E-06.
The MSE from trainrp and trainscg is too high even to be considered.
Fig 5.12: SCG and RP Training Results for Fine Aggregate
Results at 1000 epochs for the different networks:
Network MSE
LM - N10 2.541479549
LM - N15 4.012993016
LM - N20 0.855845994
LM - N25 0.730554904
LM - N30 0.231152237
RP - 10 20.68261928
RP - 15 4.978310282
SCG - 10 11.74366199
SCG - 20 3.598431346
LM - 15 (loglin) 1.16E-04
LM - 20 (loglin) 4.20E-05
LM - 25 (loglin) 1.32E-05
LM - 30 (loglin) 4.05E-06
Table 5.5: Training Results for Fine Agg. at 1000 epochs
Lowest MSE value is at LM30 (loglin)
IV. Coarse Aggregate
The graph below shows the variation of MSE with an increasing number of epochs for a number of
networks. We can see that as the number of epochs increases the MSE value decreases, which is
partial proof of learning.
Fig 5.13: All Networks Training Results for Coarse Aggregate
Levenberg-Marquardt can be seen to give satisfactory results, as the graphs below show. The
log-log type networks give relatively better results than the SCG and RP networks, but not as good
as the log-lin networks.
Fig 5.14: LM log-log Training Results for Coarse Aggregate
The minimum MSE among the log-log LM networks occurs at 30 neurons, i.e. 0.528527.
Fig 5.15: LM log-lin Training Results for Coarse Aggregate
The minimum MSE among the log-lin LM networks occurs at 30 neurons, i.e. 4.56E-06.
The MSE from trainrp and trainscg is too high even to be considered.
Fig 5.16: SCG and RP Training Results for Coarse Aggregate
Results at 1000 epochs for the different networks:
Network MSE
LM - N10 19.0980555
LM - N15 0.82276396
LM - N20 0.70105641
LM - N25 0.53954786
LM - N30 0.52852653
RP - 10 1.98503475
RP - 15 6.9521942
SCG - 10 4.96784281
SCG - 20 1.10445118
LM - 15 (loglin) 1.44E-04
LM - 20 (loglin) 4.38E-05
LM - 25 (loglin) 1.30E-05
LM - 30 (loglin) 4.56E-06
Table 5.6: Training Results for Coarse aggregate at 1000 epochs
Lowest MSE value is at LM30 (loglin)
5.4 Testing The Network
For this stage, 400 data sets are selected at random, covering all parameter elements. The network
is made to simulate outputs for this input data while the target data is withheld, and the
simulated output is then compared with the target values for the corresponding elements.
The MSE is calculated between the output data and the target data.
TESTING MSE
Network | AVERAGE | CEMENT | WATER | FA | CA
LM10 1.0216601 5.950495 1.127625 2.430874 19.91482
LM15 0.355639202 4.722554 0.931689 3.703647 0.884518
LM20 0.102709893 0.602859 0.780609 0.826565 0.748011
LM25 0.057513369 0.376062 0.042308 0.630338 0.607677
LM30 0.026107764 0.019521 0.002919 0.207262 0.522202
RP10 3.383312006 51.66036 1.897651 36.58328 7.298104
RP15 2.119432205 29.94712 1.202748 22.53727 7.352518
SCG10 1.464134303 10.32793 1.270838 8.948948 21.61935
SCG20 0.663681973 7.468851 1.21401 8.020094 2.411086
LM - 15 (loglin) 9.66381E-06 3.03E-05 2.96E-07 0.000114 0.000133
LM - 20 (loglin) 2.88267E-06 5.53E-06 7.22E-08 3.7E-05 4.04E-05
LM - 25 (loglin) 5.77123E-07 1.9E-06 2E-09 6.54E-06 8.18E-06
LM - 30 (loglin) 3.35711E-07 1.11E-06 6.15E-09 3.97E-06 4.58E-06
Table 5.7: Testing MSE
Fig 5.17: Average Testing MSE
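The selection of an overall optimum network can be scripted; here is a sketch using the average testing MSE values copied from Table 5.7:

```python
# Average testing MSE per network, copied from Table 5.7.
avg_testing_mse = {
    "LM10": 1.0216601, "LM15": 0.355639202, "LM20": 0.102709893,
    "LM25": 0.057513369, "LM30": 0.026107764,
    "RP10": 3.383312006, "RP15": 2.119432205,
    "SCG10": 1.464134303, "SCG20": 0.663681973,
    "LM-15 (loglin)": 9.66381e-06, "LM-20 (loglin)": 2.88267e-06,
    "LM-25 (loglin)": 5.77123e-07, "LM-30 (loglin)": 3.35711e-07,
}

# The network with the smallest average testing MSE is the best overall.
best = min(avg_testing_mse, key=avg_testing_mse.get)
print(best)  # LM-30 (loglin)
```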
5.5 Comparing Training and Testing Data
The MSE values obtained from training the networks at 1000 epochs and the MSE values from testing
are compared to obtain the optimum network.
Note: The smallest MSE values are in bold.
I. Cement
Network | Training MSE @ 1000 epochs | Testing MSE
LM10 5.552516 5.950495
LM15 4.179284 4.722554
LM20 0.470082 0.602859
LM25 0.316832 0.376062
LM30 0.018431 0.019521
RP10 8.212146 51.66036
RP15 6.761044 29.94712
SCG10 9.945478 10.32793
SCG20 4.454724 7.468851
LM - 15 (loglin) 3.40E-05 3.03E-05
LM - 20 (loglin) 8.49E-06 5.53E-06
LM - 25 (loglin) 4.49E-06 1.9E-06
LM - 30 (loglin) 1.22E-06 1.11E-06
Table 5.8: Comparing Training and testing MSE for Cement
Fig 5.18: Comparing Training and Testing MSE for Cement
Fig 5.19: Comparing Training and Testing MSE for Cement (Log-lin)
II. Water
Network | Training MSE @ 1000 epochs | Testing MSE
LM10 1.109134 1.127624987
LM15 0.876586 0.931688923
LM20 0.741057 0.78060943
LM25 0.042341 0.042308071
LM30 0.003002 0.002918722
RP10 1.207835 1.897650666
RP15 1.223718 1.202747618
SCG10 1.240246 1.270837636
SCG20 0.301527 1.214010462
LM – 15 (loglin) 6.53E-07 2.9562E-07
LM – 20 (loglin) 1.34E-07 7.21867E-08
LM – 25 (loglin) 6.94E-09 2.0046E-09
LM – 30 (loglin) 6.17E-09 6.1549E-09
Table 5.9: Comparing training and testing MSE for Water
Fig 5.20: Comparing Training and Testing MSE for Water
Fig 5.21: Comparing Training and Testing MSE for Water (Log-lin)
III. Fine Aggregates
Network | Training MSE @ 1000 epochs | Testing MSE
LM10 2.54148 2.430873792
LM15 4.012993 3.703647255
LM20 0.855846 0.826565367
LM25 0.730555 0.630337937
LM30 0.231152 0.207262006
RP10 20.68262 36.58327539
RP15 4.97831 22.53726515
SCG10 11.74366 8.948948188
SCG20 3.598431 8.020093567
LM - 15 (loglin) 0.000116 0.00011447
LM - 20 (loglin) 4.20E-05 3.69923E-05
LM - 25 (loglin) 1.32E-05 6.54023E-06
LM - 30 (loglin) 4.05E-06 3.97361E-06
Table 5.10: Comparing training and testing MSE for Fine Aggregates
Fig 5.22: Comparing Training and Testing MSE for Fine aggregates
Fig 5.23: Comparing Training and Testing MSE for Fine Aggregates (Log-lin)
IV. Coarse Aggregates
Network / Training MSE @ 1000 epochs / Testing MSE
LM10 19.09806 19.91481703
LM15 0.822764 0.884518336
LM20 0.701056 0.748011495
LM25 0.539548 0.607677216
LM30 0.528527 0.522201794
RP10 1.985035 7.298103652
RP15 6.952194 7.35251788
SCG10 4.967843 21.61934891
SCG20 1.104451 2.411085685
LM - 15 (loglin) 0.000144 0.000133229
LM - 20 (loglin) 4.38E-05 4.04269E-05
LM - 25 (loglin) 1.30E-05 8.17807E-06
LM - 30 (loglin) 4.56E-06 4.58236E-06
Table 5.11: Comparing training and testing MSE for Coarse Aggregates
Fig 5.24: Comparing Training and Testing MSE for Coarse aggregates
Fig 5.25: Comparing Training and Testing MSE for Coarse Aggregates (Log-lin)
5.7 Conclusion
The optimum networks are:
1. Cement: LM30 (loglin)
2. Water: LM25 (loglin)
3. Fine Aggregate: LM30 (loglin)
4. Coarse Aggregate: LM30 (loglin)
Chapter-6
CONCLUSION
From my work on neural networks using a double-layered feedforward backpropagation network, with the training functions Levenberg-Marquardt backpropagation (trainlm), resilient backpropagation (trainrp) and scaled conjugate gradient backpropagation (trainscg), and the activation functions LOGSIG and LINEAR, I was able to draw the following conclusions:
1. Artificial neural networks are applicable to calculating the material quantities for concrete mix design.
2. Of all the combinations of activation functions, the logsig-linear (loglin) combination gives the best results.
3. The Levenberg-Marquardt backpropagation algorithm gives far better results than the other training algorithms.
4. Of all the trial networks, the LM30 (loglin) network gives the best results, with an average training MSE of 2.46E-06 and an average testing MSE of 3.35711E-07.
5. A single network may not be suitable for calculating all the output variables: while LM25 (loglin) is optimum for the calculation of water content, with an MSE of 2.0046E-09, all the other output variables have their optimum results at LM30 (loglin).
REFERENCES
Reference / Hand Books
[1] IS 10262:2009, "Concrete Mix Proportioning – Guidelines", Bureau of Indian Standards, First Revision
[2] SP 23:1982, "Handbook on Concrete Mixes", Bureau of Indian Standards, 2001
[3] M. S. Shetty, "Concrete Technology: Theory and Practice", S. Chand & Company Ltd., Revised Edition 2005
Web
[1] Neural networks, https://www.doc.ic.ac.uk
[2] Neural Networks, http://stackoverflow.com
[3] Neural Networks, https://www.tutorialspoint.com
ANNEXURE – I
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int i, j, z, m, l;
    float grade[]    = {20, 25, 30, 35, 40, 45, 50, 55};
    float WC[]       = {0.4f, 0.45f, 0.5f};
    float SZ[]       = {1, 2, 3, 4};
    float Work[]     = {75, 100, 125, 150, 175, 200};
    float Exposure[] = {1, 2, 3, 4, 5};

    FILE *fout = fopen("output.txt", "w");
    if (!fout) {
        fprintf(stderr, "Can't open the file.\n");
        exit(EXIT_FAILURE);
    }

    int sgrade    = sizeof(grade) / sizeof(float);
    int sWC       = sizeof(WC) / sizeof(float);
    int sSZ       = sizeof(SZ) / sizeof(float);
    int sWork     = sizeof(Work) / sizeof(float);
    int sExposure = sizeof(Exposure) / sizeof(float);

    /* Write every combination of the five input parameters
     * (grade, water-cement ratio, workability, exposure condition,
     * sand zone) as one comma-separated line of the input file. */
    for (i = 0; i < sgrade; i++)
        for (j = 0; j < sWC; j++)
            for (z = 0; z < sWork; z++)
                for (m = 0; m < sExposure; m++)
                    for (l = 0; l < sSZ; l++)
                        fprintf(fout, "%.0f,%.3f,%.0f,%.0f,%.0f\n",
                                grade[i], WC[j], Work[z], Exposure[m], SZ[l]);

    fclose(fout);
    return 0;
}