VII Jornadas eMadrid "Education in exponential times". "Uso de IBM Analytics para aprender a tomar mejores decisiones" (Using IBM Analytics to learn to make better decisions). Ramiro Regó Álvarez. 05/07/2017.

Published in: Education

  1. Using IBM Analytics to learn to make better decisions. Title-slide word cloud: Spark, SPSS Modeler, Statistics, scoring, IBM, MLlib, R, variance, decision tree, algorithm, regression, distribution, propensity, accuracy, binomial, variable, stratified sample, Analytic Server, Hadoop, Map/Reduce, Gini, Weibull, PCA, gamma, Monte Carlo, decision management, neural network, type I error, cluster, K-means, SQL, learning, machine learning.
  2. What is Data Science and what is not
  3. CRISP-DM: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment. The Cross Industry Standard Process for Data Mining, commonly known by its acronym CRISP-DM, is a data mining process model that describes the approaches data mining experts commonly use to tackle problems. In other words, it is common practice (and common sense) put in a diagram. But is it as simple as it seems? Can we just walk in and start mining? How can I be sure I have the right tool?
  4. Simple case: one variable. Straightforward? From the shape of the histogram, the distribution seems to be left-skewed, but does that tell the entire story? The data is represented in 5 intervals between 35 and 85. A little over 45% of the observations fall in the interval 65 to 75. What if we change the number of intervals from the current 5 to something higher that could spread the data better among the intervals? [Figure: histogram of X, frequency (axis 0 to 160) per interval, intervals labelled 35, 45, 55, 65, 75, More.] Source: https://clevertap.com/blog/the-fallacy-of-seeing-patterns/
  5. Not that much. In this histogram, each interval is of size 3, approximately. The shape of the distribution now seems to change: the original inference of a left-skewed distribution is replaced by a shape with 2 peaks. [Figure: histogram of X with narrower intervals, frequency axis 0 to 60.] Source: https://clevertap.com/blog/the-fallacy-of-seeing-patterns/
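
The effect described in slides 4 and 5 can be reproduced with a short sketch. The data behind the slides is not included in this transcript, so the sample below is synthetic and made up for illustration (a small bump near 55 and a larger one near 70); the helper names `bump`, `histogram` and `count_peaks` are likewise hypothetical. Only the standard library is used.

```python
import math

def bump(center, half_width, weight):
    """Triangular bump: maps each integer value to how often it occurs."""
    return {
        v: weight * (half_width - abs(v - center))
        for v in range(center - half_width + 1, center + half_width)
    }

def histogram(values, low, high, width):
    """Count integer values in equal-width bins starting at `low`."""
    n_bins = math.ceil((high - low) / width)
    counts = [0] * n_bins
    for v in values:
        idx = (v - low) // width
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts

def count_peaks(counts):
    """Number of bins strictly taller than both neighbours."""
    padded = [0] + counts + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i - 1] < padded[i] > padded[i + 1]
    )

# Synthetic sample (an assumption, not the slide's data):
# a small bump centred on 55 and a larger bump centred on 70.
sample = [
    v
    for b in (bump(55, 6, 2), bump(70, 10, 3))
    for v, n in b.items()
    for _ in range(n)
]

coarse = histogram(sample, 35, 85, 10)  # 5 intervals, as on slide 4
fine = histogram(sample, 35, 85, 3)     # intervals of size 3, as on slide 5

print("width 10:", coarse, "peaks:", count_peaks(coarse))
print("width 3: ", fine, "peaks:", count_peaks(fine))
```

With this sample the wide intervals show a single peak and the narrow ones show two, which is the same qualitative change the slides describe. The smaller bump straddles an interval boundary on purpose, so the wide bins absorb it.
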
  6. These are correlated. Does that mean one causes the other?

     Year   Divorce rate in Maine   Per capita consumption of margarine
     2000   5                       8.2
     2001   4.7                     7
     2002   4.6                     6.5
     2003   4.4                     5.3
     2004   4.3                     5.2
     2005   4.1                     4
     2006   4.2                     4.6
     2007   4.2                     4.5
     2008   4.2                     4.2
     2009   4.1                     3.7
     Correlation: 0.992558

     Year   Per capita consumption of chicken   Total US crude oil imports
     2000   54.2                                3.311
     2001   54                                  3.405
     2002   56.8                                3.336
     2003   57.5                                3.521
     2004   59.3                                3.674
     2005   60.5                                3.67
     2006   60.9                                3.685
     2007   59.9                                3.656
     2008   58.7                                3.571
     2009   56                                  3.307
     Correlation: 0.899899

     Year   People who died by becoming tangled in their bedsheets   Total revenue generated by skiing facilities
     2000   327                                                      1.551
     2001   456                                                      1.635
     2002   509                                                      1.801
     2003   497                                                      1.827
     2004   596                                                      1.956
     2005   573                                                      1.989
     2006   661                                                      2.178
     2007   741                                                      2.257
     2008   809                                                      2.476
     2009   717                                                      2.438
     Correlation: 0.969724

     Source: http://tylervigen.com/spurious-correlations
  7. Correlation does not mean causation. The claim that correlation proves causation is a questionable-cause logical fallacy: two events occurring together are taken to have established a cause-and-effect relationship. This fallacy is also known as cum hoc ergo propter hoc, Latin for "with this, therefore because of this", and as "false cause". A similar fallacy, that an event that followed another was necessarily a consequence of the first, is the post hoc ergo propter hoc fallacy, Latin for "after this, therefore because of this". Source: http://tylervigen.com/spurious-correlations & Wikipedia
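
As a minimal sketch, the first correlation figure on slide 6 can be recomputed from the yearly values listed there. The function name `pearson` and the variable names are chosen here for illustration, and the slide's decimal commas are written as decimal points. The calculation is the standard Pearson coefficient; the slides do not say which tool produced the original figures.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations; the common 1/(n-1) factor cancels out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Values for 2000-2009 as listed on slide 6.
divorce_rate_maine = [5.0, 4.7, 4.6, 4.4, 4.3, 4.1, 4.2, 4.2, 4.2, 4.1]
margarine_per_capita = [8.2, 7.0, 6.5, 5.3, 5.2, 4.0, 4.6, 4.5, 4.2, 3.7]

r = pearson(divorce_rate_maine, margarine_per_capita)
print(f"r = {r:.6f}")
```

The result agrees with the 0.992558 shown on the slide. A coefficient this close to 1 only says the two series moved together over these ten years; it carries no information about whether either one influences the other.
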
  8. Need training (who doesn't)? No problem.
  9. Specifics on Data Science
  10. Big data with Hortonworks
  11. Try Watson Analytics for free
  12. Watson Analytics
  13. Cognitive classes
  14. Train yourself …
  15. … for free …
  16. … with a free and open environment available
  17. I understand IBM supports open source, but we need support and guarantees. Any commercial software?
  18. Data Science & Machine Learning: a product for each technique and user profile. Data Science Experience, SPSS, Decision Optimization, Machine Learning, Watson Analytics. IBM is making data science and machine learning simple and open.
  19. Data Science Experience
  20. IBM SPSS product portfolio: IBM SPSS Modeler Gold, IBM SPSS Modeler Professional, IBM SPSS Analytic Server, IBM SPSS Statistics, IBM SPSS C&DS, IBM SPSS Decision Management, IBM SPSS Modeler Premium.
  21. Doubts, concerns, questions, suggestions? Let me know: ramiro.rego@es.ibm.com. Thanks!
