Data Analysis and Simulation Modeling
By Varun Sharma
Briefing
• The first half of this report deals with simulation modeling, i.e. generating data via computer simulation when no real data is available.
• The second half covers data analysis and making predictions from training examples.
Some Important Terms
• Data analysis is the process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making.
• Simulation modeling is the process of creating and analyzing a digital prototype of a physical model to predict its performance in the real world.
Monte Carlo Simulations
• Monte Carlo methods (or Monte Carlo experiments) are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results.
First Model
• You are given 6 balls in a bag: three are white and the other three are black. You pick three balls with your eyes closed. Find the probability that all three are the same color.
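import random

def run():
    # 1 and 2 stand for the two colors; the bag is rebuilt on every call
    c = [1, 1, 1, 2, 2, 2]
    a = []
    for i in range(3):
        a.append(random.choice(c))
        c.remove(a[i])  # draw without replacement
    # Three balls of color 1 sum to 3; three balls of color 2 sum to 6
    return sum(a) == 3 or sum(a) == 6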
Observation:
Running this simulation 500k times, we get:
Output: 0.099574
This is very close to the exact value given by probability theory, 0.1.
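The "run it 500k times" step is not shown on the slide. A minimal sketch of such a driver follows; the names `same_color_trial`, `estimate` and `exact` are hypothetical, and `random.sample` is swapped in for the choose-and-remove loop as an equivalent way to draw without replacement.

```python
import random
from math import comb  # available in Python 3.8+

def same_color_trial(per_color=3, picks=3):
    """Draw `picks` balls without replacement; True if they all share a color."""
    bag = [1] * per_color + [2] * per_color
    drawn = random.sample(bag, picks)  # sampling without replacement
    return len(set(drawn)) == 1

def estimate(trials=500_000, per_color=3, picks=3):
    """Monte Carlo estimate: fraction of trials where all balls match."""
    hits = sum(same_color_trial(per_color, picks) for _ in range(trials))
    return hits / trials

def exact(per_color=3, picks=3):
    """Exact probability from counting: 2 * C(n, k) / C(2n, k)."""
    return 2 * comb(per_color, picks) / comb(2 * per_color, picks)

if __name__ == "__main__":
    print("simulated:", estimate())
    print("exact:    ", exact())
```

The estimate changes slightly from run to run; with 500k trials it should typically agree with the exact value to about two or three decimal places.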
Modification of the model:
Everything is the same, but this time you are given 8 balls in total, 4 of each color.
import random

def run():
    # Declared and initialized every time the function is called (in each iteration)
    c = [1, 1, 1, 1, 2, 2, 2, 2]
    a = []
    for i in range(3):
        a.append(random.choice(c))
        # Removes the first instance of a[i] in the list to simulate drawing without replacement
        c.remove(a[i])
    # Three balls of color 1 sum to 3; three balls of color 2 sum to 6
    return sum(a) == 3 or sum(a) == 6
Observation:
Running this simulation the same 500k times, we get:
Output: 0.143306
This is very close to the exact value of 1/7 ≈ 0.1429.
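For reference, the exact values for both bag models can be worked out by counting. In the sketch below, n is my notation for the number of balls of each color (not a symbol from the slides), and three balls are drawn without replacement:

```latex
P(\text{same color}) = \frac{2\binom{n}{3}}{\binom{2n}{3}}, \qquad
n = 3:\ \frac{2 \cdot 1}{20} = 0.1, \qquad
n = 4:\ \frac{2 \cdot 4}{56} = \frac{1}{7} \approx 0.1429
```

Both simulated outputs (0.099574 and 0.143306) are within about 0.0005 of these values.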
HIV Virus Simulation
[Figure: two simulation plots, "No Drugs" and "Drugs with Change"]
Observation
• With no drugs, the virus propagates without any barrier and grows exponentially.
• In the simulation with drugs:
• Initially, the virus population grows slowly, picking up resistances along the way. When we change the drug given to the patient, the virus population drops significantly.
• In the meantime, the average population of viruses resistant to the given drugs starts to rise. After a few life cycles, the average total virus population equals the average resistant population.
• This means that only the viruses that developed resistance survived, so every virus was resistant in the end.
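The slides do not include the simulation code, so the following is only a rough sketch of how such a model could look. Everything in it is an assumption made for illustration: a single drug instead of a changing set of drugs, a capacity cap on the population, and all probabilities and names (`simulate`, `clear_prob`, `birth_prob`, `mut_prob`, `drug_at`).

```python
import random

def simulate(steps=400, drug_at=None, seed=0, start=100, capacity=1000,
             clear_prob=0.05, birth_prob=0.1, mut_prob=0.02):
    """Return (totals, resistants): population counts at each time step.

    Each virus is a bool, True if it resists the drug.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    viruses = [False] * start
    totals, resistants = [], []
    for t in range(steps):
        drug_on = drug_at is not None and t >= drug_at
        # Each virus is cleared with probability clear_prob.
        survivors = [v for v in viruses if rng.random() >= clear_prob]
        density = len(survivors) / capacity
        children = []
        for v in survivors:
            if drug_on and not v:
                continue  # the drug stops non-resistant viruses reproducing
            if rng.random() < birth_prob * (1 - density):
                # Offspring inherits resistance, flipping with prob. mut_prob.
                child = (not v) if rng.random() < mut_prob else v
                children.append(child)
        viruses = survivors + children
        totals.append(len(viruses))
        resistants.append(sum(viruses))
    return totals, resistants

if __name__ == "__main__":
    totals, resistants = simulate(drug_at=150)
    for t in (0, 149, 180, 399):
        print(t, totals[t], resistants[t])
```

In this sketch the early no-drug growth is roughly exponential before the assumed capacity levels it off. Once the drug is switched on, the total population falls while the resistant share climbs until almost every remaining virus is resistant, which is the qualitative pattern described in the observations.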
Machine Learning
• Machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence.
• This report deals with regression analysis using supervised machine learning.
Regression Analysis
[Figure: two plots, "Dataset" and "Model"]
Observation
• Theta found by gradient descent: -3.630291, 1.166362
• For a city with a population of 35,000, we predict a profit of 4519.767868
• For a city with a population of 70,000, we predict a profit of 45342.450129
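The reported predictions can be reproduced from the reported theta with the linear hypothesis h(x) = theta0 + theta1 * x. The sketch below assumes that both population and profit are expressed in units of 10,000 in the dataset; the slides do not state this, but it is the scaling that makes the numbers agree. The function name `predict_profit` and the `scale` parameter are hypothetical.

```python
# Theta as reported on the slide: (intercept, slope).
THETA = (-3.630291, 1.166362)

def predict_profit(population, theta=THETA, scale=10_000):
    """Evaluate h(x) = theta0 + theta1 * x.

    Assumption: x and the predicted profit are both in units of `scale`.
    """
    x = population / scale
    return (theta[0] + theta[1] * x) * scale

if __name__ == "__main__":
    print(predict_profit(35_000))
    print(predict_profit(70_000))
```

The results differ from the slide's figures only in the last digits, because theta is quoted to six decimal places.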
Multivariate Gradient Descent
• Estimating the cost of a house:
• Dataset: [Area (sq. ft), Bedrooms] → [Price]
• Normalizing the features...
• Running gradient descent on the normalized dataset...
• Theta computed from gradient descent:
• 334302.063993
• 100087.116006
• 3673.548451
• The prediction for a 3-bedroom house with an area of 1650 sq. ft:
• $289314.620338
Machine Learning for Indian Railways
• With advanced computers and storage techniques available, Indian Railways can generate and store data like never before.
• The problem arises when this data becomes so large that it cannot be analyzed by conventional methods.
• But the possibilities remain enormous. CRIS is currently working on models to predict train arrival delays, possible component breakdowns, and more.
Future Aspects
• Whenever a train runs late, it inconveniences passengers, delays schedules, and raises questions about the reliability of Indian Railways' services.
• Train arrival times tend to follow patterns. When we analyze weather, season, date, and time, we can see how these factors affect arrival times.
• Beyond that, we can identify the "hotspots" of delays in train arrivals. By feeding all of these factors into the system, we can predict the chance of any train being late at a particular time, and by how much.
• This helps us plan ahead and provide a better service.
Thank You!
