[Video recording available at https://www.youtube.com/playlist?list=PLewjn-vrZ7d3x0M4Uu_57oaJPRXkiS221]
Artificial Intelligence is increasingly playing an integral role in determining our day-to-day experiences. Moreover, with proliferation of AI based solutions in areas such as hiring, lending, criminal justice, healthcare, and education, the resulting personal and professional implications of AI are far-reaching. The dominant role played by AI models in these domains has led to a growing concern regarding potential bias in these models, and a demand for model transparency and interpretability. In addition, model explainability is a prerequisite for building trust and adoption of AI systems in high stakes domains requiring reliability and safety such as healthcare and automated transportation, and critical industrial applications with significant economic implications such as predictive maintenance, exploration of natural resources, and climate change modeling.
As a consequence, AI researchers and practitioners have focused their attention on explainable AI to help them better trust and understand models at scale. The challenges for the research community include (i) defining model explainability, (ii) formulating explainability tasks for understanding model behavior and developing solutions for these tasks, and finally (iii) designing measures for evaluating the performance of models in explainability tasks.
In this tutorial, we present an overview of model interpretability and explainability in AI, key regulations / laws, and techniques / tools for providing explainability as part of AI/ML systems. Then, we focus on the application of explainability techniques in industry, wherein we present practical challenges / guidelines for effectively using explainability techniques and lessons learned from deploying explainable models for several web-scale machine learning and data mining applications. We present case studies across different companies, spanning application domains such as search & recommendation systems, hiring, sales, and lending. Finally, based on our experiences in industry, we identify open problems and research directions for the data mining / machine learning community.
Security and Privacy of Machine Learning (Priyanka Aash)
Machine learning is a powerful new tool that can be used for security applications (for example, to detect malware) but machine learning itself introduces many new attack surfaces. For example, attackers can control the output of machine learning models by manipulating their inputs or training data. In this session, I give an overview of the emerging field of machine learning security and privacy.
Learning Objectives:
1: Learn about vulnerabilities of machine learning.
2: Explore existing defense techniques (differential privacy).
3: Understand opportunities to join research effort to make new defenses.
(Source: RSA Conference USA 2018)
Human-Centered AI: Scalable, Interactive Tools for Interpretation and Attribu... (polochau)
Artificial intelligence and machine learning models are growing increasingly available, but many models offer predictions that are difficult to understand, evaluate and ultimately act upon. We present how scalable interactive visualization can be used to amplify people’s ability to understand and interact with large-scale data and complex models. We sample from projects where interactive visualization has provided key leaps of insight, from increased model explorability with models trained on millions of instances (ActiVis deployed with Facebook), increased usability for non-experts about state-of-the-art AI (GAN Lab open-sourced with Google Brain; went viral!), and our latest work Summit, the first interactive system that scalably summarizes and visualizes what features a deep learning model has learned and how those features interact to make predictions. We conclude by highlighting the next visual analytics research frontiers in AI.
Introductory presentation to Explainable AI, defending its main motivations and importance. We describe briefly the main techniques available in March 2020 and share many references to allow the reader to continue his/her studies.
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter 8 (Hakky St)
These are the notes from a study meeting held in the lab.
The book is "Hands-On Machine Learning with Scikit-Learn and TensorFlow", and these notes cover Chapter 8.
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T... (Simplilearn)
This presentation on "Supervised and Unsupervised Learning" will help you understand what machine learning is, what the types of machine learning are, what supervised machine learning is and its types, what unsupervised learning is and its types, and how supervised and unsupervised machine learning differ. In supervised learning, the model learns from labeled data, whereas in unsupervised learning, the model trains itself on unlabeled data. Now, let us get started and understand supervised and unsupervised learning and how they are different from each other.
Below are the topics explained in this supervised and unsupervised learning in Machine Learning presentation-
1. What is Machine Learning
- Types of Machine Learning
- Supervised Learning
- Unsupervised Learning
2. Supervised Learning
- Types of Supervised Learning
3. Unsupervised Learning
- Types of Unsupervised Learning
About Simplilearn Machine Learning course:
A form of artificial intelligence, Machine Learning is revolutionizing the world of computing as well as all people’s digital interactions. Machine Learning powers such innovative automated technologies as recommendation engines, facial recognition, fraud protection and even self-driving cars. This Machine Learning course prepares engineers, data scientists and other professionals with the knowledge and hands-on skills required for certification and job competency in Machine Learning.
Why learn Machine Learning?
Machine Learning is taking over the world, and with that, there is a growing need among companies for professionals who know the ins and outs of Machine Learning.
The Machine Learning market size is expected to grow from USD 1.03 Billion in 2016 to USD 8.81 Billion by 2022, at a Compound Annual Growth Rate (CAGR) of 44.1% during the forecast period.
By the end of this Machine Learning course, you will be able to:
1. Master the concepts and modeling of supervised, unsupervised and reinforcement learning.
2. Gain practical mastery over principles, algorithms, and applications of Machine Learning through a hands-on approach which includes working on 28 projects and one capstone project.
3. Acquire a thorough knowledge of the mathematical and heuristic aspects of Machine Learning.
4. Understand the concepts and operation of support vector machines, kernel SVM, naive Bayes, decision tree classifier, random forest classifier, logistic regression, K-nearest neighbors, K-means clustering and more.
5. Be able to model a wide variety of robust Machine Learning algorithms including deep learning, clustering, and recommendation systems
Learn more at: https://www.simplilearn.com/
Explainable AI (XAI) is becoming a must-have non-functional requirement (NFR) for most AI-enabled product or solution deployments. Keen to hear viewpoints and collaboration opportunities.
AI Vs ML Vs DL PowerPoint Presentation Slide Templates Complete Deck (SlideTeam)
AI Vs ML Vs DL PowerPoint Presentation Slide Templates Complete Deck is loaded with easy-to-follow content, and intuitive design. Introduce the types and levels of artificial intelligence using the highly-effective visuals featured in this PPT slide deck. Showcase the AI-subfield of machine learning, as well as deep learning through our comprehensive PowerPoint theme. Represent the differences, and interrelationship between AI, ML, and DL. Elaborate on the scope and use case of machine intelligence in healthcare, HR, banking, supply chain, or any other industry. Take advantage of the infographic-style layout to describe why AI is flourishing in today’s day and age. Elucidate AI trends such as robotic process automation, advanced cybersecurity, AI-powered chatbots, and more. Cover all the essentials of machine learning and deep learning with the help of this PPT slideshow. Outline the application, algorithms, use cases, significance, and selection criteria for machine learning. Highlight the deep learning process, types, limitations, and significance. Describe reinforcement training, neural network classifications, and a lot more. Hit download and begin personalization. Our AI Vs ML Vs DL PowerPoint Presentation Slide Templates Complete Deck are topically designed to provide an attractive backdrop to any subject. Use them to look like a presentation pro. https://bit.ly/3ngJCKf
Use of Artificial Intelligence in Cyber Security - Avantika University (Avantika University)
There are many uses of artificial intelligence in cyber security. Although artificial intelligence has so many advantages over human intelligence, it is dependent on humans. Due to the ever-increasing demand for engineers, there is a bright scope in the field of cyber security. Avantika University is one of the top engineering colleges in India.
To know more details, visit us at : https://www.avantikauniversity.edu.in/engineering-colleges/use-of-artificial-intelligence-in-cyber-security.php
An Introduction to XAI! Towards Trusting Your ML Models! (Mansour Saffar)
Machine learning (ML) is currently disrupting almost every industry and is being used as the core component in many systems. The decisions made by these systems may have a great impact on society and specific individuals and thus the decision-making process has to be clear and explainable so humans can trust it. Explainable AI (XAI) is a rather new field in ML in which researchers try to develop models that are able to explain the decision-making process behind ML models. In this talk, we'll learn about the fundamentals of XAI and discuss why we need to start to integrate XAI with our ML models!
Presented in Edmonton DataScience Meetup on October 2nd, 2019. Learn more: https://youtu.be/gEkPXOsDt_w
Organizations are collecting massive amounts of data from disparate sources. However, they continuously face the challenge of identifying patterns, detecting anomalies, and projecting future trends based on large data sets. Machine learning for anomaly detection provides a promising alternative for the detection and classification of anomalies.
Find out how you can implement machine learning to increase speed and effectiveness in identifying and reporting anomalies.
In this webinar, we will discuss :
How machine learning can help in identifying anomalies
Steps to approach an anomaly detection problem
Various techniques available for anomaly detection
Best algorithms that fit in different situations
Implementing an anomaly detection use case on the StreamAnalytix platform
To view the webinar - https://bit.ly/2IV2ahC
Overview of Artificial Intelligence in Cybersecurity (Olivier Busolini)
If you are interested in understanding a bit more about the potential of Artificial Intelligence in Cybersecurity, you might want to have a look at this overview.
Written from my CISO (and non-AI-expert) point of view, for fellow security professionals to navigate the AI hype and (hopefully!) make better, informed decisions :-)
All feedback welcome!
Machine Learning and Real-World Applications (MachinePulse)
This presentation was created by Ajay, Machine Learning Scientist at MachinePulse, to present at a Meetup on Jan. 30, 2015. These slides provide an overview of widely used machine learning algorithms. The slides conclude with examples of real world applications.
Ajay Ramaseshan is a Machine Learning Scientist at MachinePulse. He holds a Bachelor's degree in Computer Science from NITK, Surathkal and a Master's in Machine Learning and Data Mining from Aalto University School of Science, Finland. He has extensive experience in the machine learning domain and has dealt with various real-world problems.
Dr Murari Mandal from NUS presented on robustness in deep learning as part of the three-day OpenPOWER Industry Summit, where he talked about AI breakthroughs, performance improvements in AI models, adversarial attacks, attacks on semantic segmentation, attacks on object detectors, defending against adversarial attacks, and many other areas.
Spark 2019: Equifax's SVP Data & Analytics, Peter Maynard, discusses the notion (and importance) of explainable AI in the financial services sector. He looks at the work Equifax have done to crack open the black box by creating patented AI technology that helps companies make smarter, explainable decisions using AI.
A basic MATLAB guide used as lecture notes at Osmangazi University. Its target audience is mathematicians, but it can also guide engineers who are new to MATLAB.
The presentation does not dwell much on concepts; as the name suggests, it tries to present the material in its simplest form.
Hoping it will be useful,
Muhammet ÇAĞATAY
http://muhammetcagatay.com/
The study first covers the necessary background on the WEKA program, data mining, and the LABOR data set. Detailed information on data, databases, and data warehouses is also included under the data mining heading. The LABOR data set is examined, its attributes are described in detail, and the OneR, ZeroR, and Naive Bayes algorithms, which are among the most widely used classification algorithms, are applied to the data set and compared.
* What is Engineering?
* Who is an Engineer?
* The reasons to become an Engineer
* What is Software Engineering?
* Software Engineering: History
* The principles of Software Engineering
* Who is a Software Engineer?
* The reasons to become a Software Engineer
* Requirements of being a Software Engineer
* The Areas of Software Engineers
* The working areas of Software Engineers
* Difference between Computer Science and Software Engineering
* Pros and Cons of being a Software Engineer
* A Software Engineer's Responsibilities
* The Most Popular Software Development Methodologies (Waterfall, Rapid Application Development, Agile and DevOps)
* Version control
* Centralized Version Control
* What is Business Analysis?
* Who is a Business Analyst?
* The reasons to become a Business Analyst
* The principles of Business Analysis
* Business Analyst’s role
* S.W.O.T and M.O.S.T Analysis
* Requirements of being a Business Analyst
* Business Analysts’ work
* Business Analysts’ workplaces
* Difference between Data Scientist and a Business Analyst
* Analysis work
* What is Software?
* What is Software Testing?
* Software Testing History
* The principles of Software Testing
* Who is a Software Tester?
* The requirements of being a Software Tester
* The principles of testing
* What Do Software Testers Do?
* The difference between Software Developers and Software Testers
* A Software Tester's Responsibilities
* The relation of Testing and Quality Assurance
* Why is a Software Defect Called a Bug?
* Why does Software have Defects?
* Automated Testing
* Manual Testing
5. What is Artificial Intelligence?
Artificial Intelligence, also abbreviated as AI, refers to systems or machines that mimic human intelligence to perform tasks and that can gradually improve themselves using the information they collect [1].
What is Machine Learning?
Machine learning (ML) is a subset of artificial intelligence (AI) focused on building systems that learn, or improve their performance, based on the data they consume. Artificial intelligence is a broad term that refers to systems or machines that mimic human intelligence [2].
7. Anaconda
Anaconda is a free and open-source distribution of the Python and R programming languages for scientific computing that aims to simplify package management. Package versions are managed by the conda package management system. The Anaconda distribution includes data science packages that run on Windows, Linux, and macOS [3].
8. Spyder
Spyder is an open-source, cross-platform integrated development environment for scientific programming in Python. Spyder integrates with a number of prominent packages in the scientific Python stack, including NumPy, SciPy, Matplotlib, pandas, IPython, SymPy, and Cython, as well as other open-source software [4].
9. DATA SET
• Introducing the Data Set
• Importing the Data Set into Spyder
• Information About the Data Set
10. Introducing the Data Set
The Auto MPG data set is a slightly modified version of the data set provided in the StatLib library [5].
The data set consists of 398 instances and 9 attributes, including the class attribute.
Attribute information:
• mpg: continuous - miles traveled per gallon of gasoline or diesel
• cylinders: multi-valued discrete - number of cylinders
• displacement: continuous - engine displacement
• horsepower: continuous - horsepower
• weight: continuous - weight
• acceleration: continuous - acceleration
• model year: multi-valued discrete - model year
• origin: multi-valued discrete - origin
• car name: string (unique for each instance) - vehicle name
11. Importing the Data Set into Spyder
First, we give the name of the data set file to import (auto-mpg.data). The remaining arguments are, in order: names (the column names), na_values (the marker for missing values; missing entries in this file are written as a question mark), comment (the character that starts a comment), sep (the separator; a space in quotes is used because the fields are separated by spaces), and skipinitialspace (skips the extra spaces that follow a separator, which is needed because the fields are padded with spaces).
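The import step described above might look roughly like the following sketch. The parameter names (names, na_values, comment, sep, skipinitialspace) come from the slide; the column names, the tab comment character, and the three inline sample rows are assumptions made here so the example runs without the actual auto-mpg.data file.

```python
import io

import pandas as pd

# Assumed column names; the slide does not list the exact labels it used.
COLUMN_NAMES = [
    "MPG", "Cylinders", "Displacement", "Horsepower",
    "Weight", "Acceleration", "Model Year", "Origin",
]

# Illustrative rows in the space-padded layout of auto-mpg.data.
# A tab separates the numeric fields from the quoted car name.
SAMPLE = (
    '18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"\n'
    '25.0   4   98.00      ?          2046.      19.0   71  1\t"ford pinto"\n'
    '27.0   4   97.00      88.00      2130.      14.5   70  3\t"datsun pl510"\n'
)

data = pd.read_csv(
    io.StringIO(SAMPLE),    # replace with "auto-mpg.data" for the real file
    names=COLUMN_NAMES,     # column names (the file has no header row)
    na_values="?",          # "?" marks a missing value
    comment="\t",           # assumption: drop everything after the tab (the car name)
    sep=" ",                # fields are separated by spaces
    skipinitialspace=True,  # skip the padding spaces after each separator
)

print(data)
```

Treating the tab as a comment character discards the car name, which would be consistent with the 8 columns reported on the next slide.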
12. Information About the Data Set
The info() command: As shown, there are 398 entries, indexed from 0 to 397. There are 8 columns, and only Horsepower has missing values, 6 of them. The data set consists of floating-point (float) and integer (int) numbers.
The describe command: Count is the number of entries. Mean is the average, obtained by dividing the sum of all data points by the number of data points. Std (standard deviation) is defined as the square root of the variance.
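As a small illustration of these two commands, the sketch below runs info() and describe() on a made-up four-row frame (the values are hypothetical, not the real data set) and recomputes the mean and standard deviation by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame standing in for the Auto MPG data.
data = pd.DataFrame({
    "MPG": [18.0, 25.0, 27.0, 16.0],
    "Weight": [3504.0, 2046.0, 2130.0, 3433.0],
})

data.info()                # entry count, column dtypes, non-null counts
summary = data.describe()  # count, mean, std, min, quartiles, max
print(summary)

# Mean: sum of all data points divided by the number of data points.
manual_mean = data["MPG"].sum() / len(data)

# Std: square root of the variance. pandas uses the sample variance
# (ddof=1) by default for both var() and the std shown by describe().
manual_std = np.sqrt(data["MPG"].var())
```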
13. DATA ANALYSIS
• Finding and Filling Missing Values
• Exploratory Data Analysis
• Outliers
• Feature Engineering
• Preprocessing
• Linear Regression
• Regularization
• XGBoost
• Averaging Models
14. Finding and Filling Missing Values
To find the missing values in each column, use the data.isna().sum() command. For this data set, the soundest approach is to fill the missing values based on the statistical distribution (the mean).
As shown, the Horsepower column has 6 missing values. The other columns have none.
Here, fillna is the command used to fill missing values.
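A minimal sketch of this step, using a hypothetical four-row frame with two missing Horsepower values instead of the real data set:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; NaN marks the missing Horsepower entries.
data = pd.DataFrame({
    "MPG": [18.0, 25.0, 27.0, 21.0],
    "Horsepower": [130.0, np.nan, 88.0, np.nan],
})

# Count the missing values in each column.
missing_per_column = data.isna().sum()
print(missing_per_column)

# Fill the missing values with the column mean (NaN is ignored by mean()).
horsepower_mean = data["Horsepower"].mean()
data["Horsepower"] = data["Horsepower"].fillna(horsepower_mean)
```

Mean imputation keeps the column average unchanged, but it does shrink the variance slightly, which is worth keeping in mind when only a few rows are affected, as here.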
15. Exploratory Data Analysis
In probability theory and statistics, correlation indicates the direction and strength of the linear relationship between two random variables [6].
Because the data set is numeric, the relationships between its columns can be analyzed more closely.
Acceleration and Weight are negatively correlated. From this we can conclude that the lighter the vehicle, the faster it is. Target (MPG) is likewise negatively correlated with Horsepower, Weight, Cylinders, and Displacement. Since we are investigating the relationship between Target (MPG) and the other columns, Target (MPG) is the dependent variable and the other columns are the independent variables.
16. Exploratory Data Analysis
When features in a correlation matrix are highly correlated with one another, this is called multicollinearity.
To shrink the correlation matrix and make it easier to analyze, only correlations with an absolute value above 0.75 were examined, and high correlations were found among the features. We conclude that the features exhibit multicollinearity.
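The correlation matrix and the 0.75 filter can be sketched as below. The rows and the "Noise" column are hypothetical, and filtering on the correlation with the target is one plausible reading of the slide, not necessarily the code that was used.

```python
import pandas as pd

# Hypothetical toy rows; "Noise" is an unrelated column added for contrast.
data = pd.DataFrame({
    "MPG":        [18.0, 15.0, 24.0, 27.0, 31.0, 36.0],
    "Weight":     [3504, 3693, 2372, 2130, 1773, 1800],
    "Horsepower": [130, 165, 95, 88, 90, 65],
    "Noise":      [1, -1, 1, -1, -1, 1],
})

corr_matrix = data.corr()  # pairwise Pearson correlations
print(corr_matrix.round(2))

# Keep only the features whose correlation with the target exceeds the
# threshold in absolute value.
threshold = 0.75
target_corr = corr_matrix["MPG"].drop("MPG")
strong_features = target_corr[target_corr.abs() > threshold].index.tolist()
print(strong_features)
```

In this toy frame Weight and Horsepower pass the filter with negative correlations, while the unrelated column is dropped.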
17. Exploratory Data Analysis
In the overall plot, the panel of Target (MPG) against itself is a histogram. Target (MPG) has a categorical relationship with Cylinders and Origin. Target (MPG) is inversely related to Displacement, Horsepower, and Weight. No correlation is apparent between Target (MPG) and Acceleration or Model Year.
18. Outliers
In statistics, an outlier is a data point that differs significantly from the other observations. An outlier may be due to variability in the measurement or may indicate experimental error; the latter are sometimes excluded from the data set. An outlier can cause serious problems in statistical analyses [7].
The IQR (interquartile range) is obtained by subtracting Q1 (the first quartile) from Q3 (the third quartile): IQR = Q3 - Q1. The upper outlier bound is Q3 + 1.5*IQR and the lower outlier bound is Q1 - 1.5*IQR.
Values that fall outside these bounds are outliers.
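The IQR rule above can be sketched on a hypothetical series with one obvious outlier (the numbers are illustrative):

```python
import pandas as pd

# Hypothetical values; 230 is far from the rest.
values = pd.Series([46, 48, 50, 52, 54, 56, 58, 60, 230])

q1 = values.quantile(0.25)  # first quartile  -> 50.0
q3 = values.quantile(0.75)  # third quartile  -> 58.0
iqr = q3 - q1               # interquartile range -> 8.0

lower_bound = q1 - 1.5 * iqr  # -> 38.0
upper_bound = q3 + 1.5 * iqr  # -> 70.0

# Keep only the values inside the bounds.
filtered = values[(values >= lower_bound) & (values <= upper_bound)]
print(len(values), "->", len(filtered))
```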
19. Outliers
Detecting Outliers
As the plot shows, the Horsepower and Acceleration columns contain outliers. The Target (MPG) column has very few outliers. The other columns have none.
Removing Outliers
As shown, 395 of the 398 entries remain. The outliers in the Horsepower and Acceleration columns were successfully removed from the data set.
20. Feature Engineering
In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable [8].
21. Feature Engineering
Looking first at the distribution of the dependent variable Target (MPG), the tail is on the right side; that is, the distribution is positively skewed.
A log transformation was applied to reduce this skewness.
As the plot shows, the skewness decreased.
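The effect of the log transformation on skewness can be sketched as follows. The values are hypothetical, and np.log1p (the log of 1 + x) is an assumption; the slide only says that a log transformation was used.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed target values (long tail on the right).
target = pd.Series([10.0, 12.0, 13.0, 14.0, 15.0, 16.0, 18.0, 20.0, 30.0, 46.0])

skew_before = target.skew()    # positive for a right-tailed distribution
log_target = np.log1p(target)  # assumption: log(1 + x) transform
skew_after = log_target.skew()

print(f"skewness before: {skew_before:.2f}, after: {skew_after:.2f}")
```

If the model is trained on the transformed target, predictions can be mapped back to the original scale with np.expm1.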
22. Feature Engineering
Skewness of the Independent Variables
The skewness of Horsepower is greater than 1, so it is positively skewed, but the excess is small enough that it does not pose a problem. The skewness values of the remaining independent variables are within the acceptable range.
23. Feature Engineering
One-hot encoding represents categorical variables as binary vectors.
Because Cylinders and Origin are categorical features, they were one-hot encoded.
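One common way to do this in pandas is pd.get_dummies; the sketch below assumes that approach and uses hypothetical rows.

```python
import pandas as pd

# Hypothetical rows with two categorical columns and one numeric column.
data = pd.DataFrame({
    "Cylinders": [8, 4, 6, 4],
    "Origin": [1, 3, 1, 2],
    "Weight": [3504.0, 2372.0, 2833.0, 2130.0],
})

# Each category becomes its own indicator column, e.g. Cylinders_4.
encoded = pd.get_dummies(data, columns=["Cylinders", "Origin"])
print(encoded.columns.tolist())
```

Every row has exactly one active indicator per original categorical column, and the numeric column is left unchanged.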
24. Preprocessing
Preprocessing is the set of operations applied to the data set before training machine learning models.
25. Preprocessing
Defining the Training and Test Data
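A minimal sketch of such a split with scikit-learn's train_test_split. The toy arrays, the 25% test share, and the random seed are assumptions; the slide does not state the values that were used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical feature matrix (20 rows, 2 features) and target vector.
X = np.arange(40).reshape(20, 2)
y = np.arange(20)

# Assumed settings: 25% of the rows held out for testing, fixed seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print(X_train.shape, X_test.shape)
```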
26. Preprocessing
Standardization rescales the data to a standard scale without distorting its structure.
To match the scale of a standard normal distribution, the training and test data were standardized to a mean of 0 and a standard deviation (Std) of 1.
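Standardization can be sketched with scikit-learn's StandardScaler, assuming that is the tool used here. The rows are hypothetical. The scaler is fitted on the training data only and then applied to the test data, so only the training features end up with a mean of exactly 0 and a standard deviation of exactly 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training and test rows (Weight, Horsepower).
X_train = np.array([[3504.0, 130.0], [2372.0, 95.0],
                    [2833.0, 105.0], [2130.0, 88.0]])
X_test = np.array([[3000.0, 110.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std, then scale
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```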
27. Linear Regression
Linear regression is a linear modeling approach that estimates the relationship between a dependent variable and one or more independent variables.
The goal is always to minimize the sum of squared errors (the least squares criterion).
The mean squared error was 0.020.
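A minimal sketch of fitting a linear regression and computing its mean squared error. The data is synthetic and the 80/20 split is an assumption, so the printed error is unrelated to the 0.020 reported above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic data: a known linear relationship plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_coef = np.array([1.5, -2.0, 0.5])
y = X @ true_coef + rng.normal(scale=0.1, size=100)

# Fit on the first 80 rows, evaluate on the remaining 20.
model = LinearRegression().fit(X[:80], y[:80])
mse = mean_squared_error(y[80:], model.predict(X[80:]))

print("coefficients:", model.coef_.round(2))
print("test MSE:", round(mse, 4))
```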
29. Regularization
Ridge Regularization
Used to analyze regression data with multiple variables. It is a linear model that adds an L2 penalty on the coefficients.
The mean squared error was 0.018.
30. Regularization
Lasso Regularization
Performs both variable selection and regularization in order to improve the prediction accuracy of the resulting model.
The mean squared error was 0.016.
Its main difference from Ridge regularization is that it can set the coefficients of unnecessary features exactly to zero.
31. Regularization
ElasticNet Regularization
A combination of Ridge and Lasso regularization that has the strengths of both.
The mean squared error was 0.017.
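The difference between the three regularizers can be sketched on synthetic data in which only the first two of five features matter. The alpha and l1_ratio values are assumptions chosen for illustration, not the settings behind the errors reported above.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

# Synthetic data: features 2, 3 and 4 have no effect on the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Assumed hyperparameters, for illustration only.
models = {
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "elasticnet": ElasticNet(alpha=0.1, l1_ratio=0.5),
}

coefs = {name: model.fit(X, y).coef_ for name, model in models.items()}
for name, coef in coefs.items():
    print(name, coef.round(3))
```

In this setup Lasso drives the coefficients of the irrelevant features exactly to zero, while Ridge only shrinks them toward zero.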
32. XGBoost
An algorithm designed for large and complex data sets. Its most important characteristics are its speed and its high predictive power.
The mean squared error was 0.017.
34. Model Averaging
The final prediction is obtained by averaging the predictions of the two best-performing algorithms.
The mean squared error was 0.015.
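Averaging two models' predictions can be sketched as follows. The helper names and all numbers are hypothetical; in the project the two prediction arrays would come from the two best models.

```python
import numpy as np

def average_predictions(pred_a, pred_b):
    """Element-wise mean of two prediction arrays (hypothetical helper)."""
    return (np.asarray(pred_a) + np.asarray(pred_b)) / 2.0

def mse(y_true, y_pred):
    """Mean squared error between true and predicted values."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# Hypothetical true values and predictions from two models.
y_true = np.array([3.0, 2.5, 3.4, 2.9])
pred_a = np.array([3.2, 2.4, 3.1, 3.0])
pred_b = np.array([2.9, 2.7, 3.6, 2.7])

blend = average_predictions(pred_a, pred_b)
print(mse(y_true, pred_a), mse(y_true, pred_b), mse(y_true, blend))
```

The error of the averaged prediction is never larger than the mean of the two individual errors, and it is much smaller when the two models err in opposite directions, as in this toy example.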
35. Conclusion
A regression score was computed for each model, and the test score was obtained by averaging the best-performing models. A literature review was carried out to resolve the errors encountered during development. Through this research, knowledge was gained about artificial intelligence, data analysis, machine learning models, and the Python programming language. The project can be developed further by using additional machine learning models.
36. REFERENCES
[1] Oracle, "Yapay Zeka nedir? Yapay Zeka hakkında bilgi edinin", accessed 15 June 2022,
https://www.oracle.com/tr/artificial-intelligence/what-is-ai/
[2] Oracle, "Makine Öğrenimi nedir?", accessed 15 June 2022,
https://www.oracle.com/tr/data-science/machine-learning/what-is-machine-learning/
[3] Wikipedia, "Anaconda (Python dağıtımı)", accessed 15 June 2022,
https://tr.wikipedia.org/wiki/Anaconda_(Python_da%C4%9F%C4%B1t%C4%B1m%C4%B1)
[4] Wikipedia, "Spyder (software)", accessed 15 June 2022,
https://en.wikipedia.org/wiki/Spyder_(software)
[5] UCI Machine Learning Repository, "Auto MPG Data Set", accessed 2 June 2022,
https://archive.ics.uci.edu/ml/datasets/Auto+MPG
[6] Wikipedia, "Korelasyon", accessed 2 June 2022,
https://tr.wikipedia.org/wiki/Korelasyon
[7] Wikipedia, "Aykırı değer", accessed 2 June 2022,
https://tr.wikipedia.org/wiki/Ayk%C4%B1r%C4%B1_de%C4%9Fer
[8] Wikipedia, "Çarpıklık", accessed 2 June 2022,
https://tr.wikipedia.org/wiki/%C3%87arp%C4%B1kl%C4%B1k