2. Briefing
The first half of this report deals with simulation modeling, i.e. generating
data via computer simulation when no real data is available.
The second half covers data analysis and making predictions
based on learning examples.
3. Some Important Terms
Data Analysis is a process of inspecting, cleaning, transforming, and modeling
data with the goal of discovering useful information, suggesting conclusions, and
supporting decision-making.
Simulation modeling is the process of creating and analyzing a digital prototype
of a physical model to predict its performance in the real world.
4. Monte Carlo Simulations
Monte Carlo methods (or Monte Carlo experiments) are a broad class of
computational algorithms that rely on repeated random sampling to obtain
numerical results.
5. First Model
You are given 6 balls in a bag: three are white and the other three are black. You pick
three balls with your eyes closed. Find the probability that all three are of the same color.
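import random

def run():
    # 1 = white, 2 = black; the bag is rebuilt on every call
    c = [1, 1, 1, 2, 2, 2]
    a = []
    for i in range(3):
        a.append(random.choice(c))
        c.remove(a[i])  # take the drawn ball out of the bag (no replacement)
    # A sum of 3 means three whites, a sum of 6 means three blacks
    if (sum(a) == 3) or (sum(a) == 6):
        return True
    else:
        return False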
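A single trial only says whether one draw was single-colored; the probability estimate comes from repeating the trial many times and recording the fraction of successes. The sketch below shows one way to do that. The helper name `estimate_probability` and its `n_trials` parameter are illustrative assumptions, not taken from the original program, and `run` simply restates the model above so the block is self-contained.

```python
import random

def run():
    """One trial: draw three balls without replacement from 3 white + 3 black."""
    c = [1, 1, 1, 2, 2, 2]          # 1 = white, 2 = black
    a = []
    for i in range(3):
        a.append(random.choice(c))
        c.remove(a[i])              # the drawn ball leaves the bag
    return sum(a) in (3, 6)         # 3 = all white, 6 = all black

def estimate_probability(trial, n_trials):
    """Hypothetical helper: fraction of trials that return True."""
    hits = sum(1 for _ in range(n_trials) if trial())
    return hits / n_trials

if __name__ == "__main__":
    print(estimate_probability(run, 500_000))
```

The estimate fluctuates from run to run. With n trials the standard error is roughly sqrt(p(1-p)/n), which is about 0.0004 for 500,000 trials at p = 0.1.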
6. Observation:
Running this simulation 500k times, we get:
Output: 0.099574
This is very close to the exact value given by probability theory, i.e. 0.1
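For reference, the exact value can be worked out by counting. There are C(6,3) = 20 equally likely ways to choose three balls out of six, and only two of them are single-colored (all white or all black):

```latex
P(\text{same color})
  = \frac{\binom{3}{3} + \binom{3}{3}}{\binom{6}{3}}
  = \frac{1 + 1}{20}
  = 0.1
```

Drawing one ball at a time gives the same result: 2 x (3/6) x (2/5) x (1/4) = 0.1.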
Modification of the model:
Everything is the same, but this time you are given 8 balls in total, 4 of each color.
import random

def run():
    # Declared and initialized every time the function is called (in each iteration)
    c = [1, 1, 1, 1, 2, 2, 2, 2]
    a = []
    for i in range(3):
        a.append(random.choice(c))
        # This removes the first instance of a[i] in the list to simulate no replacing
        c.remove(a[i])
    if (sum(a) == 3) or (sum(a) == 6):
        return True
    else:
        return False
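The modified model can be cross-checked against a closed-form count in the same way as the first one. The sketch below is an illustration, not part of the original program: `same_color_probability` and `simulate` are hypothetical names, and `random.sample` stands in for the choose-then-remove loop because it also draws without replacement.

```python
import random
from math import comb

def same_color_probability(per_color, draws=3):
    """Exact chance that all draws match, for two colors of equal count."""
    return 2 * comb(per_color, draws) / comb(2 * per_color, draws)

def simulate(per_color, draws=3, n_trials=100_000):
    """Monte Carlo estimate of the same quantity."""
    bag = [1] * per_color + [2] * per_color
    hits = 0
    for _ in range(n_trials):
        picked = random.sample(bag, draws)   # sampling without replacement
        if len(set(picked)) == 1:            # one distinct value = one color
            hits += 1
    return hits / n_trials

print(same_color_probability(3))   # exact value for the 6-ball model
print(same_color_probability(4))   # exact value for the 8-ball model
print(simulate(4))                 # simulated value for the 8-ball model
```

For 4 balls per color the exact value is 2 x C(4,3) / C(8,3) = 8/56 = 1/7, roughly 0.143, so the simulated output should land close to that figure.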
9. Observation
Without drugs, the virus propagates without any barrier and grows
exponentially.
In the simulation with drugs, however, the picture is different.
Initially, the viruses grow slowly, picking up resistances along the way. When we change
the drug given to the patient, the virus population drops significantly.
In the meantime, the average population resistant to the given drugs starts to
rise. After a few lifecycles, the average virus population equals the
average resistant population.
This means that only the viruses that developed a resistance survived, so
every remaining virus is resistant in the end.
10. Machine Learning
Machine learning is a subfield of computer science that evolved from the study
of pattern recognition and computational learning theory in artificial intelligence.
In this report, I will be dealing with Regression Analysis using Supervised Machine
Learning.
12. Observation
Theta found by gradient descent: -3.630291, 1.166362
For a city with a population of 35,000, we predict a profit of 4519.767868
For a city with a population of 70,000, we predict a profit of 45342.450129
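Both predictions follow from evaluating the fitted line h(x) = theta0 + theta1 * x. The sketch below assumes that population and profit are both expressed in units of 10,000, which is the scaling under which the reported theta values reproduce the reported predictions; the function name `predict_profit` and the constant `SCALE` are illustrative.

```python
# Theta values as reported above (rounded to six decimals).
THETA0 = -3.630291
THETA1 = 1.166362

# Assumption: the model works with population and profit in units of 10,000.
SCALE = 10_000

def predict_profit(population):
    """Evaluate h(x) = theta0 + theta1 * x and convert back to plain units."""
    x = population / SCALE
    return (THETA0 + THETA1 * x) * SCALE

for population in (35_000, 70_000):
    print(population, round(predict_profit(population), 2))
```

Because theta is rounded to six decimals here, the results match the reported predictions only up to a small rounding difference.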
13. Multivariate Gradient Descent
Estimating Cost of House:
Dataset: [Area (Sq. Feet), Bedrooms] [Price]
Normalizing the Features...
Running gradient descent for Normalized Dataset...
Theta computed from gradient descent:
334302.063993
100087.116006
3673.548451
The prediction for a 3-bedroom house with an area of 1650 sq. feet:
$289314.620338
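The steps in this output (normalize the features, run gradient descent, predict for a new house) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the original program: the six data rows are made up, the function names are hypothetical, the learning rate `alpha` and the iteration count are arbitrary choices, and scaling by the sample standard deviation is assumed since the report does not say which variant was used.

```python
from statistics import mean, stdev

def normalize(rows):
    """Scale each feature column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sigmas = [stdev(col) for col in columns]   # assumption: sample std. deviation
    scaled = [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in rows]
    return scaled, mus, sigmas

def hypothesis(theta, row):
    """theta[0] is the intercept; theta[1:] pair up with the features."""
    return theta[0] + sum(t * v for t, v in zip(theta[1:], row))

def gradient_descent(X, y, alpha=0.1, iterations=2000):
    """Batch gradient descent on the mean squared error cost."""
    m, n = len(y), len(X[0])
    theta = [0.0] * (n + 1)
    for _ in range(iterations):
        errors = [hypothesis(theta, row) - target for row, target in zip(X, y)]
        grad = [sum(errors) / m]
        grad += [sum(e * row[j] for e, row in zip(errors, X)) / m for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]   # simultaneous update
    return theta

# Made-up data for illustration only: [area in sq. feet, bedrooms] -> price.
# Prices are generated as 100 * area + 5000 * bedrooms + 20000.
features = [[1000, 2], [1500, 4], [2000, 3], [2500, 5], [3000, 2], [1200, 3]]
prices = [100 * a + 5000 * b + 20000 for a, b in features]

X, mus, sigmas = normalize(features)
theta = gradient_descent(X, prices)

# A new house must be scaled with the training means and deviations.
query = [(v - m) / s for v, m, s in zip([1650, 3], mus, sigmas)]
predicted_price = hypothesis(theta, query)
print(round(predicted_price, 2))
```

On this noise-free toy data the prediction recovers the generating formula (100 x 1650 + 5000 x 3 + 20000 = 200000). It is not meant to reproduce the $289314.62 figure above, which depends on the real dataset. Normalizing matters here because area and bedroom counts differ by about three orders of magnitude, and gradient descent with a single learning rate converges poorly on unscaled features.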
14. Machine Learning for Indian Railways
With advanced computers and storage techniques available, Indian Railways has
the capability to generate and store data like never before.
The problem arises when this data becomes so large that it cannot be
analyzed by conventional methods.
But the possibilities remain enormous. CRIS is currently working on models to
predict train arrival delays, possible component breakdowns, and more.
15. Future Aspects
Whenever a train arrives late, it inconveniences passengers, delays
schedules, and calls into question the reliability of Indian Railways’ services.
Events like these tend to follow patterns, and train arrival times are no exception.
When we analyze weather, season, date, and time, we see a pattern
in how these factors affect arrival times.
Beyond that, we can identify the ‘hotspots’ of delays in train arrivals. With all this data,
we can predict the chance of any train running late (and by how much) at
any particular time by feeding these factors into the system.
This helps us plan ahead and provide a better service.