SlideShare a Scribd company logo
International School of Engineering
awards
Certificate in Engineering Excellence
in Big Data Analytics and Optimization
to
Miraj Vashi
on successful completion of all the requirements of the 352-hour program
conducted between June 25, 2016 and December 11, 2016 followed by a project defense.
This program is certified for quality of content, assessment and pedagogy by the Language Technologies Institute (LTI)
of Carnegie Mellon University (CMU). LTI also provided assistance in curriculum development for this program.
Dated this fifteenth day of December, two thousand and sixteen.
Dr. Dakshinamurthy V Kolluru Dr. Sridhar Pappu
President Executive VP - Academics
01CSE03/201612/793 Program details are on the back
Mode: Classroom Teaching
Topics Covered
Certificate Type
Certificate of Participation Assessment-based training program Professional certification
Essential Engineering Skills in Big Data Analytics
Reading from Excel, CSV and other forms; Data exploration (histograms, bar charts, box plots, line graphs and scatter graphs);
Storytelling with data: The science, ggplot, bubble charts with multiple dimensions, gauge charts, tree maps, heat maps and
motion charts
Data pre-processing of structured data: R, Handling missing values, Binning, Standardization, Outliers/Noise, PCA, Type
conversion
Fundamentals of Probability and Statistical Methods
Probabilistic analysis of data and models, Analyzing networks and graphs: Analyzing transitions, Markov chains and
unstructured data
Computing the properties of an attribute: Central tendencies (Mean, Median, Mode, Range, Variance, Standard Deviation);
Expectations of a Variable; Describing an attribute: Probability distributions (Discrete and Continuous) - Bernoulli, Geometric,
Binomial, Poisson and Exponential distributions; Special emphasis on Normal distribution; Central Limit Theorem; t-distribution
Describing the relationship between attributes: Covariance; Correlation; ChiSquare
Inferential statistics: How to learn about the population from a sample and vice-versa, Sampling distributions, Confidence
Intervals, Hypothesis Testing; ANOVA
Statistics and Probability in Decision Modeling
Regression (Linear, Multivariate Regression) in forecasting; Analyzing and interpreting regression results;
Logistic Regression for classification
Trend analysis and Time Series; Cyclical and Seasonal analysis; Box-Jenkins method; Smoothing; Moving averages; Auto-
correlation; ARIMA – Holt-Winters method
Bayesian analysis and Naïve Bayes classifier; Bayesian Belief Networks
Optimization and Decision Analysis
Genetic algorithms: The algorithm and the process, Representing data, Why and how do they work?
Linear Programming: Graphical analysis; Sensitivity and Duality analyses
Integer and Binary programming: Applications, Problem formulation, Solving in R
Goal programming; Data development analysis
Quadratic programming
Engineering Big Data with R and Hadoop Ecosystem
Big Data, Hadoop applications;
Parallel and Distributed computing;
Introduction to algorithms; Concurrent algorithms;
Linux refresher; NoSQL; GFS; HDFS; CDH4-HDFS
Map Reduce: YARN Map Reduce Applications: Text Mining, Page Rank, Graph processing Hadoop ecosystem components: Pig,
Hive, HBase, Sqoop, Mahout, H2O, Hama, Flume, Chukwa, Avro, Whirr, Hue, Oozie, Zookeeper, Kafka
Hadoop Streaming with R and Python
Spark-SQL, Streaming and ML
Text Mining, Social Network Analysis and Natural Language Processing
Introduction to text mining and text pre-processing: Write a web crawler to collect data, R, Find unique words and counts,
Handling number, Punctuations, Stop words, Incorrect spellings, Stemming, Lemmatization and TxD computation
Unstructured vs. semi-structured data; Fundamentals of information retrieval
Properties of words; Vector space models; Creating Term-Document (TxD) matrices; Similarity measures
Low-level processes (Sentence Splitting; Tokenization; Part-of-Speech Tagging; Stemming; Chunking)
Text classification and feature selection: How to use Naïve Bayes classifier for text classification
Evaluation systems on the accuracy of text mining
Sentiment Analysis, Natural Language Analysis
Discussion of text mining tools and applications
Naive Bayes Classifier
Methods and Algorithms in Machine Learning — Unsupervised and Supervised
Rule based knowledge: Logic of rules, Evaluating rules, Rule induction and Association rules
Construction of Decision Trees through simplified examples; Choosing the "best" attribute at each
non-leaf node; Entropy; Information Gain; Generalizing Decision Trees; Information Content and Gain Ratio; Dealing with
numerical variables; Other measures of randomness; Pruning a Decision Tree; Cost as a consideration; Unwrapping Trees as
rules
Specialized decision trees (oblique trees)
Ensemble and Hybrid models, AdaBoost, Random Forests
K-Nearest Neighbor method; Wilson editing and triangulations; K-nearest neighbors in collaborative filtering, digit recognition
Motivation for Neural Networks and its applications; Perceptron and Single Layer Neural Network, and hand calculations;
Learning in a Neural Net: Back propagation and conjugant gradient techniques; Application of Neural Net in Face and Digit
Recognition
Connectivity models (hierarchical clustering); Centroid models (K-Means algorithm); Distribution models (Expectation
maximization); Bayesian Analysis
Linear learning machines and Kernel methods in learning
VC (Vapnik-Chervonenkis) dimension; Shattering power of models
Algorithm of Support Vector Machines (SVM)
Advance Machine Learning
Concept of regularization, Lasso, Ridge and Elasticnets, GLM
Deep learning method, auto-encoders and feature extraction methods
Convolution Neural Nets and Recurrent Neural Nets with applications in text analysis
Matrix factorization methods
Monte Carlo simulations, Evolutionary Search methods - , Genetic Algorithms, Simulated Annealing
Planning, Thinking and architecting data science solutions
The Art and Science of Storytelling with Data Visualizations
Why and Where of Visualizations
Storytelling- A great art and science (examples)
Communicating with data: Issues and guiding principles; Primary ingredients of data visualization; How to pick visual encodings
such as colour, shape, size, etc.; Which chart to use when; How to accommodate more than 2 dimensions
Case highlighting the transition from a simple chart to a powerful visualization, complete with storytelling
Using R-ggplots and Qliksense for visualizations
Communication, Ethical and IP Challenges for Analytics Professionals (Video)
Why is Communication important?
How to communicate effectively: Telling stories
Communications issues from daily life using examples using audio, video, blogs, charts, email, etc.
Seeing the big picture; Paying attention to details; Seeing things from multiple perspectives
Challenges: Mix of stakeholders, Explicability of results, Visualization
Guiding Principles: Clarity, Transparency, Integrity, Humility
Framework for Effective Presentations; Examples of bad and good presentations
Writing effective technical reports
Difference between Legal and Ethical issues
Challenges in current laws, regulations and fair information practices: Data protection, Intellectual property rights,
Confidentiality, Contractual liability, Competition law, Licensing of Open Source software and Open Data
How to handle legal, ethical and IP issues at an organization and an individual level
The “Ethics Check” questions

More Related Content

What's hot

392_SannaReddyBharath (1)
392_SannaReddyBharath (1)392_SannaReddyBharath (1)
392_SannaReddyBharath (1)bharath reddy
 
Data quality and uncertainty visualization
Data quality and uncertainty visualizationData quality and uncertainty visualization
Data quality and uncertainty visualizationbdemchak
 
Provinance in scientific workflows in e science
Provinance in scientific workflows in e scienceProvinance in scientific workflows in e science
Provinance in scientific workflows in e sciencebdemchak
 
Imtiaz khan data_science_analytics
Imtiaz khan data_science_analyticsImtiaz khan data_science_analytics
Imtiaz khan data_science_analyticsimtiaz khan
 
Bootcamp python-1
Bootcamp python-1Bootcamp python-1
Bootcamp python-1Era Wibowo
 
Datamining intro-iep
Datamining intro-iepDatamining intro-iep
Datamining intro-iepaaryarun1999
 
Model evaluation in the land of deep learning
Model evaluation in the land of deep learningModel evaluation in the land of deep learning
Model evaluation in the land of deep learningPramit Choudhary
 
Chi 2008 katsanos et al auto_cardsorter_final
Chi 2008 katsanos et al auto_cardsorter_finalChi 2008 katsanos et al auto_cardsorter_final
Chi 2008 katsanos et al auto_cardsorter_finalNikolaos Tselios
 
Data Quality Dimensions and measures
Data Quality Dimensions and measuresData Quality Dimensions and measures
Data Quality Dimensions and measuresAlex CK
 
Pikas bibliometricsfor21may2015
Pikas bibliometricsfor21may2015Pikas bibliometricsfor21may2015
Pikas bibliometricsfor21may2015Christina Pikas
 
Building Blocks for Distributed Geo-Knowledge Graphs
Building Blocks for Distributed Geo-Knowledge GraphsBuilding Blocks for Distributed Geo-Knowledge Graphs
Building Blocks for Distributed Geo-Knowledge Graphskjanowicz
 
Human in the loop: Bayesian Rules Enabling Explainable AI
Human in the loop: Bayesian Rules Enabling Explainable AIHuman in the loop: Bayesian Rules Enabling Explainable AI
Human in the loop: Bayesian Rules Enabling Explainable AIPramit Choudhary
 
Big data visualization state of the art
Big data visualization state of the artBig data visualization state of the art
Big data visualization state of the artsoria musa
 

What's hot (20)

566_SriramDandamudi_CEE
566_SriramDandamudi_CEE566_SriramDandamudi_CEE
566_SriramDandamudi_CEE
 
662_AravindKumarN_CEE
662_AravindKumarN_CEE662_AravindKumarN_CEE
662_AravindKumarN_CEE
 
587_EswarPrasadReddyMachireddy_CEE
587_EswarPrasadReddyMachireddy_CEE587_EswarPrasadReddyMachireddy_CEE
587_EswarPrasadReddyMachireddy_CEE
 
671_JeevanRavula_CEE
671_JeevanRavula_CEE671_JeevanRavula_CEE
671_JeevanRavula_CEE
 
Data Analytics_BigData Cert
Data Analytics_BigData CertData Analytics_BigData Cert
Data Analytics_BigData Cert
 
362_NeelimaKandepu (1)
362_NeelimaKandepu (1)362_NeelimaKandepu (1)
362_NeelimaKandepu (1)
 
392_SannaReddyBharath (1)
392_SannaReddyBharath (1)392_SannaReddyBharath (1)
392_SannaReddyBharath (1)
 
Data quality and uncertainty visualization
Data quality and uncertainty visualizationData quality and uncertainty visualization
Data quality and uncertainty visualization
 
Provinance in scientific workflows in e science
Provinance in scientific workflows in e scienceProvinance in scientific workflows in e science
Provinance in scientific workflows in e science
 
Imtiaz khan data_science_analytics
Imtiaz khan data_science_analyticsImtiaz khan data_science_analytics
Imtiaz khan data_science_analytics
 
Bootcamp python-1
Bootcamp python-1Bootcamp python-1
Bootcamp python-1
 
Datamining intro-iep
Datamining intro-iepDatamining intro-iep
Datamining intro-iep
 
Model evaluation in the land of deep learning
Model evaluation in the land of deep learningModel evaluation in the land of deep learning
Model evaluation in the land of deep learning
 
Chi 2008 katsanos et al auto_cardsorter_final
Chi 2008 katsanos et al auto_cardsorter_finalChi 2008 katsanos et al auto_cardsorter_final
Chi 2008 katsanos et al auto_cardsorter_final
 
Data Mining
Data MiningData Mining
Data Mining
 
Data Quality Dimensions and measures
Data Quality Dimensions and measuresData Quality Dimensions and measures
Data Quality Dimensions and measures
 
Pikas bibliometricsfor21may2015
Pikas bibliometricsfor21may2015Pikas bibliometricsfor21may2015
Pikas bibliometricsfor21may2015
 
Building Blocks for Distributed Geo-Knowledge Graphs
Building Blocks for Distributed Geo-Knowledge GraphsBuilding Blocks for Distributed Geo-Knowledge Graphs
Building Blocks for Distributed Geo-Knowledge Graphs
 
Human in the loop: Bayesian Rules Enabling Explainable AI
Human in the loop: Bayesian Rules Enabling Explainable AIHuman in the loop: Bayesian Rules Enabling Explainable AI
Human in the loop: Bayesian Rules Enabling Explainable AI
 
Big data visualization state of the art
Big data visualization state of the artBig data visualization state of the art
Big data visualization state of the art
 

Similar to Miraj Vashi_CPEE

848_VamsiKrishnaPenumadu_CEE
848_VamsiKrishnaPenumadu_CEE848_VamsiKrishnaPenumadu_CEE
848_VamsiKrishnaPenumadu_CEEVamsi Krishna
 
Sivrama Sarma - Profile_July_2015
Sivrama Sarma - Profile_July_2015Sivrama Sarma - Profile_July_2015
Sivrama Sarma - Profile_July_2015Siva Rama Sarma
 
Data Science Course in Pune
Data Science Course in Pune Data Science Course in Pune
Data Science Course in Pune nmdfilmProduction
 
Contractor-Borner-SNA-SAC
Contractor-Borner-SNA-SACContractor-Borner-SNA-SAC
Contractor-Borner-SNA-SACwebuploader
 
factorization methods
factorization methodsfactorization methods
factorization methodsShaina Raza
 
Python for Data Analysis: A Comprehensive Guide
Python for Data Analysis: A Comprehensive GuidePython for Data Analysis: A Comprehensive Guide
Python for Data Analysis: A Comprehensive GuideAivada
 
DILEEP DATA SCIERNCES PROJECT POWERPOINT PPT
DILEEP DATA SCIERNCES PROJECT POWERPOINT PPTDILEEP DATA SCIERNCES PROJECT POWERPOINT PPT
DILEEP DATA SCIERNCES PROJECT POWERPOINT PPTPatnalaVeenamadhuri
 
Introduction to Fundamentals of Data Science
Introduction to Fundamentals of Data ScienceIntroduction to Fundamentals of Data Science
Introduction to Fundamentals of Data ScienceKakaraSrikanth1
 
Data Science and Machine learning-Lect01.pdf
Data Science and Machine learning-Lect01.pdfData Science and Machine learning-Lect01.pdf
Data Science and Machine learning-Lect01.pdfRAJVEERKUMAR41
 
Introduction To Data Mining
Introduction To Data MiningIntroduction To Data Mining
Introduction To Data Miningdataminers.ir
 
Introduction To Data Mining
Introduction To Data Mining   Introduction To Data Mining
Introduction To Data Mining Phi Jack
 
Dwdm ppt for the btech student contain basis
Dwdm ppt for the btech student contain basisDwdm ppt for the btech student contain basis
Dwdm ppt for the btech student contain basisnivatripathy93
 
Topic Models Based Personalized Spam Filter
Topic Models Based Personalized Spam FilterTopic Models Based Personalized Spam Filter
Topic Models Based Personalized Spam FilterSudarsun Santhiappan
 
A Blended Approach to Analytics at Data Tactics Corporation
A Blended Approach to Analytics at Data Tactics CorporationA Blended Approach to Analytics at Data Tactics Corporation
A Blended Approach to Analytics at Data Tactics CorporationRich Heimann
 
Big Data Conference
Big Data ConferenceBig Data Conference
Big Data ConferenceDataTactics
 

Similar to Miraj Vashi_CPEE (20)

848_VamsiKrishnaPenumadu_CEE
848_VamsiKrishnaPenumadu_CEE848_VamsiKrishnaPenumadu_CEE
848_VamsiKrishnaPenumadu_CEE
 
Sivrama Sarma - Profile_July_2015
Sivrama Sarma - Profile_July_2015Sivrama Sarma - Profile_July_2015
Sivrama Sarma - Profile_July_2015
 
Data Science Course in Pune
Data Science Course in Pune Data Science Course in Pune
Data Science Course in Pune
 
Contractor-Borner-SNA-SAC
Contractor-Borner-SNA-SACContractor-Borner-SNA-SAC
Contractor-Borner-SNA-SAC
 
factorization methods
factorization methodsfactorization methods
factorization methods
 
Python for Data Analysis: A Comprehensive Guide
Python for Data Analysis: A Comprehensive GuidePython for Data Analysis: A Comprehensive Guide
Python for Data Analysis: A Comprehensive Guide
 
1st sem
1st sem1st sem
1st sem
 
1st sem
1st sem1st sem
1st sem
 
DILEEP DATA SCIERNCES PROJECT POWERPOINT PPT
DILEEP DATA SCIERNCES PROJECT POWERPOINT PPTDILEEP DATA SCIERNCES PROJECT POWERPOINT PPT
DILEEP DATA SCIERNCES PROJECT POWERPOINT PPT
 
Introduction to Fundamentals of Data Science
Introduction to Fundamentals of Data ScienceIntroduction to Fundamentals of Data Science
Introduction to Fundamentals of Data Science
 
DataScience_RoadMap_2023.pdf
DataScience_RoadMap_2023.pdfDataScience_RoadMap_2023.pdf
DataScience_RoadMap_2023.pdf
 
Data Science and Machine learning-Lect01.pdf
Data Science and Machine learning-Lect01.pdfData Science and Machine learning-Lect01.pdf
Data Science and Machine learning-Lect01.pdf
 
Introduction To Data Mining
Introduction To Data MiningIntroduction To Data Mining
Introduction To Data Mining
 
Introduction To Data Mining
Introduction To Data Mining   Introduction To Data Mining
Introduction To Data Mining
 
Dwdm ppt for the btech student contain basis
Dwdm ppt for the btech student contain basisDwdm ppt for the btech student contain basis
Dwdm ppt for the btech student contain basis
 
resume_MH
resume_MHresume_MH
resume_MH
 
Topic Models Based Personalized Spam Filter
Topic Models Based Personalized Spam FilterTopic Models Based Personalized Spam Filter
Topic Models Based Personalized Spam Filter
 
Big Data & DS Analytics for PAARL
Big Data & DS Analytics for PAARLBig Data & DS Analytics for PAARL
Big Data & DS Analytics for PAARL
 
A Blended Approach to Analytics at Data Tactics Corporation
A Blended Approach to Analytics at Data Tactics CorporationA Blended Approach to Analytics at Data Tactics Corporation
A Blended Approach to Analytics at Data Tactics Corporation
 
Big Data Conference
Big Data ConferenceBig Data Conference
Big Data Conference
 

Miraj Vashi_CPEE

  • 1. International School of Engineering awards Certificate in Engineering Excellence in Big Data Analytics and Optimization to Miraj Vashi on successful completion of all the requirements of the 352-hour program conducted between June 25, 2016 and December 11, 2016 followed by a project defense. This program is certified for quality of content, assessment and pedagogy by the Language Technologies Institute (LTI) of Carnegie Mellon University (CMU). LTI also provided assistance in curriculum development for this program. Dated this fifteenth day of December, two thousand and sixteen. Dr. Dakshinamurthy V Kolluru Dr. Sridhar Pappu President Executive VP - Academics 01CSE03/201612/793 Program details are on the back
  • 2. Mode: Classroom Teaching Topics Covered Certificate Type Certificate of Participation Assessment-based training program Professional certification Essential Engineering Skills in Big Data Analytics Reading from Excel, CSV and other forms; Data exploration (histograms, bar charts, box plots, line graphs and scatter graphs); Storytelling with data: The science, ggplot, bubble charts with multiple dimensions, gauge charts, tree maps, heat maps and motion charts Data pre-processing of structured data: R, Handling missing values, Binning, Standardization, Outliers/Noise, PCA, Type conversion Fundamentals of Probability and Statistical Methods Probabilistic analysis of data and models, Analyzing networks and graphs: Analyzing transitions, Markov chains and unstructured data Computing the properties of an attribute: Central tendencies (Mean, Median, Mode, Range, Variance, Standard Deviation); Expectations of a Variable; Describing an attribute: Probability distributions (Discrete and Continuous) - Bernoulli, Geometric, Binomial, Poisson and Exponential distributions; Special emphasis on Normal distribution; Central Limit Theorem; t-distribution Describing the relationship between attributes: Covariance; Correlation; ChiSquare Inferential statistics: How to learn about the population from a sample and vice-versa, Sampling distributions, Confidence Intervals, Hypothesis Testing; ANOVA Statistics and Probability in Decision Modeling Regression (Linear, Multivariate Regression) in forecasting; Analyzing and interpreting regression results; Logistic Regression for classification Trend analysis and Time Series; Cyclical and Seasonal analysis; Box-Jenkins method; Smoothing; Moving averages; Auto- correlation; ARIMA – Holt-Winters method Bayesian analysis and Naïve Bayes classifier; Bayesian Belief Networks Optimization and Decision Analysis Genetic algorithms: The algorithm and the process, Representing data, Why and how do they work? Linear Programming: Graphical analysis; Sensitivity and Duality analyses Integer and Binary programming: Applications, Problem formulation, Solving in R Goal programming; Data development analysis Quadratic programming Engineering Big Data with R and Hadoop Ecosystem Big Data, Hadoop applications; Parallel and Distributed computing; Introduction to algorithms; Concurrent algorithms; Linux refresher; NoSQL; GFS; HDFS; CDH4-HDFS Map Reduce: YARN Map Reduce Applications: Text Mining, Page Rank, Graph processing Hadoop ecosystem components: Pig, Hive, HBase, Sqoop, Mahout, H2O, Hama, Flume, Chukwa, Avro, Whirr, Hue, Oozie, Zookeeper, Kafka Hadoop Streaming with R and Python Spark-SQL, Streaming and ML Text Mining, Social Network Analysis and Natural Language Processing Introduction to text mining and text pre-processing: Write a web crawler to collect data, R, Find unique words and counts, Handling number, Punctuations, Stop words, Incorrect spellings, Stemming, Lemmatization and TxD computation Unstructured vs. semi-structured data; Fundamentals of information retrieval Properties of words; Vector space models; Creating Term-Document (TxD) matrices; Similarity measures Low-level processes (Sentence Splitting; Tokenization; Part-of-Speech Tagging; Stemming; Chunking) Text classification and feature selection: How to use Naïve Bayes classifier for text classification Evaluation systems on the accuracy of text mining Sentiment Analysis, Natural Language Analysis Discussion of text mining tools and applications Naive Bayes Classifier Methods and Algorithms in Machine Learning — Unsupervised and Supervised Rule based knowledge: Logic of rules, Evaluating rules, Rule induction and Association rules Construction of Decision Trees through simplified examples; Choosing the "best" attribute at each non-leaf node; Entropy; Information Gain; Generalizing Decision Trees; Information Content and Gain Ratio; Dealing with numerical variables; Other measures of randomness; Pruning a Decision Tree; Cost as a consideration; Unwrapping Trees as rules Specialized decision trees (oblique trees) Ensemble and Hybrid models, AdaBoost, Random Forests K-Nearest Neighbor method; Wilson editing and triangulations; K-nearest neighbors in collaborative filtering, digit recognition Motivation for Neural Networks and its applications; Perceptron and Single Layer Neural Network, and hand calculations; Learning in a Neural Net: Back propagation and conjugant gradient techniques; Application of Neural Net in Face and Digit Recognition Connectivity models (hierarchical clustering); Centroid models (K-Means algorithm); Distribution models (Expectation maximization); Bayesian Analysis Linear learning machines and Kernel methods in learning VC (Vapnik-Chervonenkis) dimension; Shattering power of models Algorithm of Support Vector Machines (SVM) Advance Machine Learning Concept of regularization, Lasso, Ridge and Elasticnets, GLM Deep learning method, auto-encoders and feature extraction methods Convolution Neural Nets and Recurrent Neural Nets with applications in text analysis Matrix factorization methods Monte Carlo simulations, Evolutionary Search methods - , Genetic Algorithms, Simulated Annealing Planning, Thinking and architecting data science solutions The Art and Science of Storytelling with Data Visualizations Why and Where of Visualizations Storytelling- A great art and science (examples) Communicating with data: Issues and guiding principles; Primary ingredients of data visualization; How to pick visual encodings such as colour, shape, size, etc.; Which chart to use when; How to accommodate more than 2 dimensions Case highlighting the transition from a simple chart to a powerful visualization, complete with storytelling Using R-ggplots and Qliksense for visualizations Communication, Ethical and IP Challenges for Analytics Professionals (Video) Why is Communication important? How to communicate effectively: Telling stories Communications issues from daily life using examples using audio, video, blogs, charts, email, etc. Seeing the big picture; Paying attention to details; Seeing things from multiple perspectives Challenges: Mix of stakeholders, Explicability of results, Visualization Guiding Principles: Clarity, Transparency, Integrity, Humility Framework for Effective Presentations; Examples of bad and good presentations Writing effective technical reports Difference between Legal and Ethical issues Challenges in current laws, regulations and fair information practices: Data protection, Intellectual property rights, Confidentiality, Contractual liability, Competition law, Licensing of Open Source software and Open Data How to handle legal, ethical and IP issues at an organization and an individual level The “Ethics Check” questions