SlideShare a Scribd company logo
1 of 12
CVC TechParty


A practical Introduction to 
Machine Learning in Python


                         Piero Casale



           
CVC TechParty




www.ailab.si/orange/

                        
CVC TechParty



   Load-In Data and Basic Data Exploration
- Loading Data:
      iris = orange.ExampleTable('iris.tab')


- Exploring Features and Examples
       iris.domain.attributes
       iris.domain.classVar.name


- Basic Dataset Characteristics
       GetDatasetStatistics()


- Dataset Formats in Orange:
       csv, txt, xls


- Dataset as Python Lists:
      indexing, append, extend, native




                                 
CVC TechParty


                   Dataset Visualization
- Multi Dimensional Scaling:
     MultiDimensional Scaling Functions in orngMDS




                              
CVC TechParty



My First Classifier in Orange : Bayes




              
CVC TechParty



        My First Classifier in Orange : Bayes
- Loading Data:
        iris = orange.ExampleTable('iris.tab')


-   Declare the Learning Function:
        bayes = orange.BayesLearner()


- Train the Bayes Classifier on Data:
       BayesClassifier = bayes(iris)


- Classify new data:
       Prediction = bayesClassifier(newExample)


_ Example on Iris Dataset:
       exCodes.showBayes()



                                
CVC TechParty


        My (Second) Classifier in Orange :
                Decision Trees
- As before:
      import orngTree
      treeLearner = orngTree.TreeLearner()
      treeClassifier = treeLearner(iris)
      prediction = treeClassifier(newExample)


_ Measures for splitting : infoGain, gainRatio, gini
       treeLearner = orngTree.TreeLearner(measure='gini')


- Print the Tree:
  - on screen : orngTree.printTree(treeClassifier)
  - save as an image :
      orngTree.printDot(treeClassifier, fileName='tree.dot')
      dot -Tpng tree.dot -otree.png




                                
CVC TechParty



        Testing and Evaluating a Classifier
- Testing Functions in orngTest
      import orngTest
      learners = [bayesLearner, treeLearner]


- Make a 10 folds Cross Validation
      xv = orngTest.crossValidation(learners, data, folds=10)


- Scores Functions in orngStat
      import orngStat
      accuracy = orngStat.CA(xv)
      confusionMatrix = orngStat.cm(xv)


- Example on Iris Dataset using Bayes, DecisionTree and Knn.
      exCodes.crossValidate()




                               
CVC TechParty


                    Ensemble Methods
- Basic Ensemble Methods in orngEnsemble
      Bagging, Boosting and Random Forest
      import orngEnsemble


- Bagging of Decision Trees
      treeLearner = orngTree.TreeLearner()
      baggedTrees = orngEnsemble.BaggedLearner(treeLearner, t=10)


- Boosting of Decision Trees
      treeLearner = orngTree.TreeLearner()
      boostedTrees = orngEnsemble.BoostedLearner(treeLearner, t=10)


- Random Forest
      forest = orngEnsemble.RandomForestLearner(trees = 10)


- Example on Iris Dataset:
      exCodes.crossValidateEnsembles()


                             
CVC TechParty



                      Features Selection
- Functions for Features Selectoin in orngFSS
    import orngFSS
    vehicle = orange.ExampleTable('vehicle.tab')


-   Measuring Import of features with Information Gain
    measures = orngFSS.attMeasure(vehicle)
    TenBests = orngFSS.bestNAtts(measures,n=10)


-   Measuring Import of features with Gain Ratio
    gainRatio = orange.MeasureAttribute_gainRatio()
    measures = orngFSS.attMeasure(vehicle,gainRatio)
    fiveBests = orngFSS.bestNAtts(measures,n=5)


- Example on Vehicle Dataset:
       exCodes.measureAttributes()


                               
CVC TechParty

                             More.....
- Supervised Learning Algorithms:
           orngSVM,orngLR,orngC45
- Unsupervised Learning Algorithm :
           orngClustering
- Reinforcement Learning :
           orngReinforcement
- Outlier Detection :
           orngOutlier
- Discretization Functions :
           orngDisc




                          
CVC TechParty




      Enjoy.....
    More at www.ailab.si/orange




                                  Piero Casale



            

More Related Content

Similar to A practical Introduction to Machine Learning in Python

Stat Design3 18 09
Stat Design3 18 09Stat Design3 18 09
Stat Design3 18 09
stat
 
Using R on Netezza
Using R on NetezzaUsing R on Netezza
Using R on Netezza
Ajay Ohri
 
Statistical Machine Learning for Text Classification with scikit-learn and NLTK
Statistical Machine Learning for Text Classification with scikit-learn and NLTKStatistical Machine Learning for Text Classification with scikit-learn and NLTK
Statistical Machine Learning for Text Classification with scikit-learn and NLTK
Olivier Grisel
 
Apache Spark for Cyber Security in an Enterprise Company
Apache Spark for Cyber Security in an Enterprise CompanyApache Spark for Cyber Security in an Enterprise Company
Apache Spark for Cyber Security in an Enterprise Company
Databricks
 

Similar to A practical Introduction to Machine Learning in Python (20)

Visualization of Supervised Learning with {arules} + {arulesViz}
Visualization of Supervised Learning with {arules} + {arulesViz}Visualization of Supervised Learning with {arules} + {arulesViz}
Visualization of Supervised Learning with {arules} + {arulesViz}
 
Introduction To TensorFlow | Deep Learning with TensorFlow | TensorFlow For B...
Introduction To TensorFlow | Deep Learning with TensorFlow | TensorFlow For B...Introduction To TensorFlow | Deep Learning with TensorFlow | TensorFlow For B...
Introduction To TensorFlow | Deep Learning with TensorFlow | TensorFlow For B...
 
Eclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science ProjectEclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science Project
 
Computational decision making
Computational decision makingComputational decision making
Computational decision making
 
Eugene Khvedchenya. State of the art Image Augmentations with Albumentations.
Eugene Khvedchenya. State of the art Image Augmentations with Albumentations.Eugene Khvedchenya. State of the art Image Augmentations with Albumentations.
Eugene Khvedchenya. State of the art Image Augmentations with Albumentations.
 
Stat Design3 18 09
Stat Design3 18 09Stat Design3 18 09
Stat Design3 18 09
 
Scaling Deep Learning Algorithms on Extreme Scale Architectures
Scaling Deep Learning Algorithms on Extreme Scale ArchitecturesScaling Deep Learning Algorithms on Extreme Scale Architectures
Scaling Deep Learning Algorithms on Extreme Scale Architectures
 
Using R on Netezza
Using R on NetezzaUsing R on Netezza
Using R on Netezza
 
2012 8 29 TAR Webinar Part 2 Sigler
2012 8 29 TAR Webinar Part 2 Sigler2012 8 29 TAR Webinar Part 2 Sigler
2012 8 29 TAR Webinar Part 2 Sigler
 
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
 
Jupyter Notebooks for machine learning on Kubernetes & OpenShift | DevNation ...
Jupyter Notebooks for machine learning on Kubernetes & OpenShift | DevNation ...Jupyter Notebooks for machine learning on Kubernetes & OpenShift | DevNation ...
Jupyter Notebooks for machine learning on Kubernetes & OpenShift | DevNation ...
 
Introduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at GalvanizeIntroduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at Galvanize
 
C3 w2
C3 w2C3 w2
C3 w2
 
Training Large-scale Ad Ranking Models in Spark
Training Large-scale Ad Ranking Models in SparkTraining Large-scale Ad Ranking Models in Spark
Training Large-scale Ad Ranking Models in Spark
 
Feature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsFeature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive models
 
Statistical Machine Learning for Text Classification with scikit-learn and NLTK
Statistical Machine Learning for Text Classification with scikit-learn and NLTKStatistical Machine Learning for Text Classification with scikit-learn and NLTK
Statistical Machine Learning for Text Classification with scikit-learn and NLTK
 
Deep Learning with Apache MXNet (September 2017)
Deep Learning with Apache MXNet (September 2017)Deep Learning with Apache MXNet (September 2017)
Deep Learning with Apache MXNet (September 2017)
 
Machine Learning and Go. Go!
Machine Learning and Go. Go!Machine Learning and Go. Go!
Machine Learning and Go. Go!
 
Apache Spark for Cyber Security in an Enterprise Company
Apache Spark for Cyber Security in an Enterprise CompanyApache Spark for Cyber Security in an Enterprise Company
Apache Spark for Cyber Security in an Enterprise Company
 
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
 

Recently uploaded

Recently uploaded (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 

A practical Introduction to Machine Learning in Python

  • 3. CVC TechParty Load-In Data and Basic Data Exploration - Loading Data: iris = orange.ExampleTable('iris.tab') - Exploring Features and Examples iris.domain.attributes iris.domain.classVar.name - Basic Dataset Characteristics GetDatasetStatistics() - Dataset Formats in Orange: csv, txt, xls - Dataset as Python Lists: indexing, append, extend, native    
  • 4. CVC TechParty Dataset Visualization - Multi Dimensional Scaling: MultiDimensional Scaling Functions in orngMDS    
  • 5. CVC TechParty My First Classifier in Orange : Bayes    
  • 6. CVC TechParty My First Classifier in Orange : Bayes - Loading Data: iris = orange.ExampleTable('iris.tab') - Declare the Learning Function: bayes = orange.BayesLearner() - Train the Bayes Classifier on Data: BayesClassifier = bayes(iris) - Classify new data: Prediction = bayesClassifier(newExample) _ Example on Iris Dataset: exCodes.showBayes()    
  • 7. CVC TechParty My (Second) Classifier in Orange : Decision Trees - As before: import orngTree treeLearner = orngTree.TreeLearner() treeClassifier = treeLearner(iris) prediction = treeClassifier(newExample) _ Measures for splitting : infoGain, gainRatio, gini treeLearner = orngTree.TreeLearner(measure='gini') - Print the Tree: - on screen : orngTree.printTree(treeClassifier) - save as an image : orngTree.printDot(treeClassifier, fileName='tree.dot') dot -Tpng tree.dot -otree.png    
  • 8. CVC TechParty Testing and Evaluating a Classifier - Testing Functions in orngTest import orngTest learners = [bayesLearner, treeLearner] - Make a 10 folds Cross Validation xv = orngTest.crossValidation(learners, data, folds=10) - Scores Functions in orngStat import orngStat accuracy = orngStat.CA(xv) confusionMatrix = orngStat.cm(xv) - Example on Iris Dataset using Bayes, DecisionTree and Knn. exCodes.crossValidate()    
  • 9. CVC TechParty Ensemble Methods - Basic Ensemble Methods in orngEnsemble Bagging, Boosting and Random Forest import orngEnsemble - Bagging of Decision Trees treeLearner = orngTree.TreeLearner() baggedTrees = orngEnsemble.BaggedLearner(treeLearner, t=10) - Boosting of Decision Trees treeLearner = orngTree.TreeLearner() boostedTrees = orngEnsemble.BoostedLearner(treeLearner, t=10) - Random Forest forest = orngEnsemble.RandomForestLearner(trees = 10) - Example on Iris Dataset: exCodes.crossValidateEnsembles()    
  • 10. CVC TechParty Features Selection - Functions for Features Selectoin in orngFSS import orngFSS vehicle = orange.ExampleTable('vehicle.tab') - Measuring Import of features with Information Gain measures = orngFSS.attMeasure(vehicle) TenBests = orngFSS.bestNAtts(measures,n=10) - Measuring Import of features with Gain Ratio gainRatio = orange.MeasureAttribute_gainRatio() measures = orngFSS.attMeasure(vehicle,gainRatio) fiveBests = orngFSS.bestNAtts(measures,n=5) - Example on Vehicle Dataset: exCodes.measureAttributes()    
  • 11. CVC TechParty More..... - Supervised Learning Algorithms: orngSVM,orngLR,orngC45 - Unsupervised Learning Algorithm : orngClustering - Reinforcement Learning : orngReinforcement - Outlier Detection : orngOutlier - Discretization Functions : orngDisc    
  • 12. CVC TechParty Enjoy..... More at www.ailab.si/orange Piero Casale