Anomaly detection (or outlier analysis) is the identification of items, events or observations which do not conform to an expected pattern or to other items in a dataset. It is used in applications such as intrusion detection, fraud detection, fault detection and monitoring processes in various domains including energy, healthcare and finance. In this talk, we will introduce anomaly detection and discuss the various analytical and machine learning techniques used in this field. Through a case study, we will discuss how anomaly detection techniques could be applied to energy data sets. We will also demonstrate, using R and Apache Spark, an application to help reinforce concepts in anomaly detection and best practices in analyzing and reviewing results.
With R, Python, Apache Spark and a plethora of other open source tools, anyone with a computer can run machine learning algorithms in a jiffy! However, without an understanding of which algorithms to choose and when to apply a particular technique, most machine learning efforts turn into trial and error experiments with conclusions like "The algorithms don't work" or "Perhaps we should get more data".
In this lecture, we will focus on the key tenets of machine learning algorithms and how to choose an algorithm for a particular purpose. Rather than just showing how to run experiments in R, Python or Apache Spark, we will provide an intuitive introduction to machine learning with just enough mathematics and basic statistics.
We will address:
• How do you differentiate Clustering, Classification and Prediction algorithms?
• What are the key steps in running a machine learning algorithm?
• How do you choose an algorithm for a specific goal?
• Where do exploratory data analysis and feature engineering fit into the picture?
• Once you run an algorithm, how do you evaluate its performance?
In Part II of the Anomaly Detection series, we discuss the challenges in analyzing temporal datasets and methods for outlier analysis. We focus on single time series and cover point-outlier and subsequence methods.
Innovations in technology have revolutionized financial services to the extent that large financial institutions like Goldman Sachs are claiming to be technology companies! It is no secret that technological innovations like data science and AI are fundamentally changing how financial products are created, tested and delivered. While it is exciting to learn about the technologies themselves, there is very little guidance available on how companies and financial professionals should retool and gear themselves towards the upcoming revolution.
In this master class, we will discuss key innovations in Data Science and AI and connect these novel fields to applications in forecasting and optimization. Through case studies and examples, we will demonstrate why now is the time to invest in learning about the topics that will reshape the financial services industry of the future!
Topic
- Frontier topics in Optimization
Anomaly detection: Core Techniques and Advances in Big Data and Deep Learning - QuantUniversity
Since its debut in 2010, Apache Spark has become one of the most popular big data technologies in the Apache open source ecosystem. In addition to enabling the processing of large data sets through its distributed computing architecture, Spark provides out-of-the-box support for machine learning, streaming and graph processing in a single framework. Spark has been supported by companies like Microsoft, Google, Amazon and IBM, and in financial services, companies like BlackRock (http://bit.ly/1Q1DVJH) and Bloomberg (http://bit.ly/29LXbPv) have started to integrate Apache Spark into their tool chains, and the interest is growing. Unlike other big data technologies which require intensive programming in languages such as Java, Spark enables data scientists to work with a big data technology using higher-level languages like Python and R, making it accessible for experimentation and rapid prototyping.
In this talk, we will introduce Apache Spark and discuss the key features that differentiate it from other technologies. We will provide examples of how Apache Spark can help scale analytics and discuss how its machine learning API could be used to solve large-scale machine learning problems using Spark's distributed computing framework. We will also illustrate enterprise use cases for scaling analytics with Apache Spark.
Credit scoring has been used to categorize customers based on various characteristics to evaluate their creditworthiness. Increasingly, machine learning techniques are being deployed for customer segmentation, classification and scoring.
In this talk, we will discuss various machine learning techniques that can be used for credit risk applications. Through a case study built in R, we will illustrate the nuances of working with practical datasets which include categorical and numerical data, different techniques that can be used to evaluate and explore customer profiles, visualizing high-dimensional datasets, and machine learning techniques for customer segmentation.
This talk should be of interest to practicing quants and data scientists who are interested in applying machine learning techniques for credit risk and scoring applications.
Missing data handling is typically done in an ad hoc way. Without understanding the repercussions of a missing data handling technique, approaches that only let you get to the "next step" in your analytics pipeline lead to poor outputs, conclusions that aren't robust, and biased estimates. Handling missing data in data sets requires a structured approach. In this workshop, we will cover the key tenets of handling missing data in a structured way.
This presentation covers topics such as: What is anomaly detection? What are the different types of data that may be used? What are the popular techniques used to identify anomalies? What are the best practices in anomaly detection? What is the value of anomaly detection?
This presentation also gives a formal treatment of anomaly detection and outlier analysis, the types of anomalies and outliers, and different approaches to tackling anomaly detection problems.
Unsupervised Anomaly Detection with Isolation Forest - Elena Sharova (PyData)
PyData London 2018
This talk will focus on the importance of correctly defining an anomaly when conducting anomaly detection using unsupervised machine learning. It will include a review of Isolation Forest algorithm (Liu et al. 2008), and a demonstration of how this algorithm can be applied to transaction monitoring, specifically to detect money laundering.
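As a minimal illustration of the algorithm mentioned here, the sketch below applies scikit-learn's IsolationForest to synthetic data; the feature matrix and the 1% contamination rate are assumptions for illustration, not from the talk.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X = rng.normal(size=(1000, 5))   # stand-in for transaction features
X[:10] += 6                      # plant a few anomalous rows

# contamination is an assumed prior on the anomaly rate
clf = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
labels = clf.fit_predict(X)      # -1 = anomaly, 1 = inlier
scores = clf.score_samples(X)    # lower score = more anomalous

print("flagged rows:", np.where(labels == -1)[0])
```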
---
www.pydata.org
PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.
PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
Synthetic VIX Data Generation Using ML Techniques - QuantUniversity
Slides from the PRMIA webinar: https://prmia.org/Shared_Content/Events/PRMIA_Event_Display.aspx?EventKey=8504&WebsiteKey=e0a57874-c04b-476a-827d-2bbc348e6b08
Part 1: We will discuss key trends in AI and machine learning in the financial services industry. We will discuss the key use cases, challenges, and best practices of using AI and ML techniques in financial services. We will also discuss key players and drivers for the AI and Machine learning revolution.
Part 2: We will illustrate a case study where AI and machine learning techniques are applied in financial services.
Case study: Synthetic VIX data generation using Machine learning techniques
Synthetic data sets and simulations are used to enrich and augment existing datasets to provide comprehensive samples for training machine learning models. In addition, synthetic data generators can be used for scenario generation, modeling future scenarios after being trained on real and synthetic data. The advent of novel techniques in machine learning has rekindled interest in using deep learning techniques like Generative Adversarial Networks (GANs) and encoder-decoder architectures for financial synthetic data generation.
In this case study, we discuss a recent study examining the efficacy of synthetic data generation when there are significant VIX changes in the market over short time horizons. We used QuSynthesize, a synthetic data generator for time-series datasets, with historical VIX data and synthetic VIX scenarios to generate forward-looking scenarios.
Anomaly detection (or outlier analysis) is the identification of items, events or observations which do not conform to an expected pattern or to other items in a dataset. It is used in applications such as intrusion detection, fraud detection, fault detection and monitoring processes in various domains including energy, healthcare and finance.
In this workshop, we will cover the core techniques in anomaly detection and discuss advances in deep learning in this field.
Through case studies, we will discuss how anomaly detection techniques could be applied to various business problems. We will also demonstrate examples using R, Python, Keras and TensorFlow to help reinforce concepts in anomaly detection and best practices in analyzing and reviewing results.
What you will learn:
Anomaly Detection: An introduction
Graphical and Exploratory analysis techniques
Statistical techniques in Anomaly Detection
Machine learning methods for Outlier analysis
Evaluating performance in Anomaly detection techniques
Detecting anomalies in time series data
Case study 1: Anomalies in Freddie Mac mortgage data
Case study 2: Auto-encoder based Anomaly Detection for Credit risk with Keras and Tensorflow
Data reduction: breaking down large sets of data into more manageable groups or segments that provide better insight.
- Data sampling
- Data cleaning
- Data transformation
- Data segmentation
- Dimension reduction
Ever wondered what factors influence house prices? This project explores the world of house price prediction using data science techniques. We delve into analyzing real estate data to build models that can estimate the value of a home. This can be a valuable tool for both buyers and sellers navigating the housing market. Visit https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/ for more details.
This project presents a machine learning approach to predicting house prices using a dataset containing various features such as the size of the house, number of bedrooms, location, and others. The project aims to build a predictive model that can accurately estimate the selling price of a house based on its features. The presentation covers data preprocessing steps, feature selection techniques, and the application of machine learning algorithms such as linear regression or decision trees. It also discusses model evaluation metrics and the potential impact of the model on the real estate industry. Visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Imputation techniques for missing data in clinical trials - Nitin George
Missing data are unavoidable in clinical and epidemiological research. Missing data lead to bias and loss of information in research analysis. Often we are not aware of missing data techniques because we depend on software. The objective of this seminar is to introduce different missing data mechanisms and imputation techniques for missing data with the help of examples.
Application of Machine Learning in Agriculture - Aman Vasisht
With the growing trend of machine learning, it is needless to say that machine learning can help reap benefits in agriculture. It will be a boon for farmer welfare.
Probability and random processes project based learning template.pdf - Vedant Srivastava
To understand the concept of the Monte Carlo method and its various applications; it relies on repeated random sampling to obtain numerical results.
Developing the computational algorithms to solve problems related to random sampling.
The objective also includes simulation of a specific problem in MATLAB.
Presentation to the third LIS DREaM workshop, held at Edinburgh Napier University on Wednesday 25th April 2012.
More information about the event can be found at http://lisresearch.org/dream-project/dream-event-4-workshop-wednesday-25-april-2012/
Learning machine learning with Yellowbrick - Rebecca Bilbro
Yellowbrick is an open source Python library that provides visual diagnostic tools called “Visualizers” that extend the Scikit-Learn API to allow human steering of the model selection process. For teachers and students of machine learning, Yellowbrick can be used as a framework for teaching and understanding a large variety of algorithms and methods.
Machine learning techniques used in artificial intelligence: supervised, unsupervised and reinforcement learning. It discusses Linear Regression, Logistic Regression, SVM, Random Forest, KNN, K-Means Clustering and the Apriori Algorithm. It also illustrates the applications of AI in various fields.
Uniform Legal Framework for AI: The EU AI Act establishes a uniform legal framework for the development, marketing, and use of artificial intelligence systems within the EU, aimed at promoting trustworthy and human-centric AI while ensuring a high level of health, safety, and fundamental rights protection.
Risk-Based Approach: The regulation adopts a risk-based approach, classifying AI systems based on the level of risk they pose, from minimal to unacceptable risk, with stringent requirements for high-risk AI systems, particularly those impacting health, safety, and fundamental rights.
Prohibitions for Certain AI Practices: Unacceptable risk practices, such as manipulative social scoring and real-time biometric identification in public spaces without justification, are prohibited to protect individual rights and freedoms.
Mandatory Requirements for High-Risk AI Systems: High-risk AI systems must comply with mandatory requirements before they can be marketed, put into service, or used within the EU. These requirements include transparency, data governance, technical documentation, and human oversight to ensure safety and compliance with fundamental rights.
Conformity Assessment and Compliance: Providers of high-risk AI systems must undergo a conformity assessment procedure to demonstrate compliance with the mandatory requirements. This includes maintaining technical documentation and conducting risk management activities.
Transparency Obligations: AI systems must be transparent, providing users with information about the AI system's capabilities, limitations, and the purpose for which it is intended, ensuring informed use of AI technologies.
Market Surveillance: The EU AI Act establishes mechanisms for market surveillance to monitor and enforce compliance, with the European Artificial Intelligence Board (EAIB) playing a central role in coordinating activities across member states.
Protection of Fundamental Rights: The Act emphasizes the protection of fundamental rights, including privacy, non-discrimination, and consumer rights, with specific provisions to safeguard these rights in the context of AI use.
Innovation and SME Support: The regulation aims to foster innovation and support small and medium-sized enterprises (SMEs) through regulatory sandboxes and by reducing administrative burdens for low and minimal risk AI applications.
Global Impact and Alignment: While the EU AI Act directly applies to the EU market, its global impact is significant, influencing international standards and practices in AI development and use. Financial industry professionals worldwide should be aware of these regulations as they may affect global operations and international collaborations.
The financial industry is witnessing an emerging trend of Large Language Models (LLMs) applications to improve operational efficiency. This article, based on a round table discussion hosted by TruEra and QuantUniversity in New York in May 2023, explores the potential use cases of LLMs in financial institutions (FIs), the risks to consider, approaches to manage these risks, and the implications for people, skills, and ways of working. Frontline personnel from Data and Analytics/AI teams, Model Risk, Data Management, and other roles from fifteen financial institutions devoted over two hours to discussing the LLM opportunities within their industry, as well as strategies for mitigating associated risks.
The discussions revealed a preference for discriminative use cases over generative ones, with a focus on information retrieval and operational automation. The necessity for a human-in-the-loop was emphasized, along with a detailed discourse on risks and their mitigation.
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS - QuantUniversity
Join CFA Institute and QuantUniversity for an information session about the upcoming CFA Institute Professional Learning course: Python and Data Science for Investment professionals.
Learn how artificial intelligence (AI) and machine learning are revolutionizing industries — this course will introduce key concepts and illustrate the role of machine learning, data science techniques, and AI through examples and case studies from the investment industry. The presentation uses simple mathematics and basic statistics to provide an intuitive understanding of machine learning, as used by firms, to augment traditional decision making.
https://quforindia.splashthat.com/
Mathematical Finance & Financial Data Science Seminar
AI and machine learning are entering every aspect of our life. Marketing, autonomous driving, personalization, computer vision, finance, wearables, travel are all benefiting from the advances in AI in the last decade. As more and more AI applications are being deployed in enterprises, concerns are growing about potential "AI accidents" and the misuse of AI. With increased complexity, some are questioning whether the models actually work! As the debate about fairness, bias, and privacy grow, there is increased attention to understanding how the models work and whether the models are thoroughly tested and designed to address potential issues.
The area "Responsible AI" is fast emerging and becoming an important aspect of the adoption of machine learning and AI products in the enterprise. Companies are now incorporating formal ethics reviews, model validation exercises, and independent algorithmic auditing to ensure that the adoption of AI is transparent and has gone through formal validation phases.
In this talk, Sri will introduce algorithmic auditing and discuss why it will become a formal process that industries using AI will need. Sri will also discuss the emerging risks in the adoption of AI and how QuSandbox, the platform his company is building, will address the emerging needs of formal algorithmic auditing practices in enterprises.
Machine Learning: Considerations for Fairly and Transparently Expanding Access to Credit - QuantUniversity
With Raghu Kulkarni and Steve Dickerson
Recently, machine learning has been used extensively in credit decision making. As ML proliferates in the industry, considerations for fair and transparent access to credit decision making are becoming important.
In this talk, Dr. Raghu Kulkarni and Dr. Steven Dickerson from Discover Financial Services will share their experiences at Discover. The talk will include:
- An overview of how ML models are used across the financial life cycle
- Practical problems practitioners run into and why explainability and bias detection become important.
References:
1- https://www.h2o.ai/resources/white-paper/machine-learning-considerations-for-fairly-and-transparently-expanding-access-to-credit/
2- https://arxiv.org/abs/2011.03156
Seeing What a GAN Cannot Generate: Paper Review - QuantUniversity
Paper review: Bau, David, et al. "Seeing What a GAN Cannot Generate." 2019 IEEE/CVF International Conference on Computer Vision (ICCV) (2019): 4501-4510.
Machine Learning in Finance: 10 Things You Need to Know in 2021 - QuantUniversity
Machine learning and AI have revolutionized finance! In the last five years, innovations in computing, technology and business models have created multiple products and services in fintech, prompting organizations to prioritize their data and AI strategies. What will 2021 bring and how should you prepare for it? Join Sri Krishnamurthy, CFA, as we kick off QuantUniversity's Winter School 2021. We will introduce you to the upcoming programs and hold a masterclass on 10 innovations in AI and ML you need to know in 2021!
Markowitz portfolio optimization is optimal in theory; however, when applied in practice it often fails catastrophically. Usually, this is addressed by various simplifications to increase robustness. In this talk I will make the case that the reason this theory fails in practice is that uncertainty in the parameter estimation is not taken into account. By using Bayesian statistics we can fix Markowitz and retain all its desirable properties while still having a robustness technique that can be easily extended. This talk is geared at an intermediate level and will give a general introduction to Bayesian modeling using PyMC3, focusing on application and code examples rather than theory.
With alternative data becoming more and more popular in the industry, quants are eager to adopt it into their investment processes. However, with a plethora of options and API standards, trying and evaluating datasets is a major hindrance to their adoption in finance.
Join Yaacov, Sri, James and Brad as they discuss the opportunities, pitfalls and challenges of alternative data and its adoption in finance.
A Unified Framework for Model Explanation
Ian Covert, University of Washington
Explainable AI is becoming increasingly important, but the field is evolving rapidly and requires better organizing principles to remain manageable for researchers and practitioners. In this talk, Ian will discuss a new paper that unifies a large portion of the literature using a simple idea: simulating feature removal. The new class of "removal-based explanations" describes 20+ existing methods (e.g., LIME, SHAP) and reveals underlying links with psychology, game theory and information theory.
Practical examples will be presented and available on the Qu.Academy site
Reference:
Explaining by Removing: A Unified Framework for Model Explanation
Ian Covert, Scott Lundberg, Su-In Lee
https://arxiv.org/abs/2011.14878
Machine Learning Interpretability - Self-Explanatory Models: Interpretability, Diagnostics and Simplification
With Agus Sudjianto, Wells Fargo
The deep neural networks (DNNs) have achieved great success in learning complex patterns with strong predictive power, but they are often thought of as "black box" models without a sufficient level of transparency and interpretability. It is important to demystify the DNNs with rigorous mathematics and practical tools, especially when they are used for mission-critical applications. This talk aims to unwrap the black box of deep ReLU networks through exact local linear representation, which utilizes the activation pattern and disentangles the complex network into an equivalent set of local linear models (LLMs). We develop a convenient LLM-based toolkit for interpretability, diagnostics, and simplification of a pre-trained deep ReLU network. We propose the local linear profile plot and other visualization methods for interpretation and diagnostics, and an effective merging strategy for network simplification. The proposed methods are demonstrated by simulation examples, benchmark datasets, and a real case study in credit risk assessment. The paper that will be presented in this talk can be found here.
Qu speaker series 14: Synthetic Data Generation in Finance - QuantUniversity
In this master class, Stefan shows how to create synthetic time-series data using generative adversarial networks (GAN). GANs train a generator and a discriminator network in a competitive setting so that the generator learns to produce samples that the discriminator cannot distinguish from a given class of training data. The goal is to yield a generative model capable of producing synthetic samples representative of this class. While most popular with image data, GANs have also been used to generate synthetic time-series data in the medical domain. Subsequent experiments with financial data explored whether GANs can produce alternative price trajectories useful for ML training or strategy backtests.
In 2009 author and motivational speaker Simon Sinek delivered the now-classic TED talk “Start with why”. Viewed by over 28 million people, “Start with Why” is the third most popular TED video of all time and it teaches us that great leaders and companies inspire us to take action by focusing on the WHY over the “what” or the “how”. In this talk we’ll ask how applied data and computational scientists can use the power of WHY to frame problems, inspire others, and give them answers to business questions they might never think of asking.
Bio
Jessica Stauth is a Managing Director in Fidelity Labs, an internal startup incubator with a mission to create new fintech businesses that drive growth for the firm. Dr. Stauth previously held roles as Managing Director of Portfolio Management, Research, and Trading at Quantopian, a crowd-sourced systematic hedge fund based in Boston, Director of Quant Product Strategy for Thomson Reuters (now Refinitiv), and as a Senior Quant Researcher at the StarMine Corporation, where she built global stock selection models including the design and implementation of the StarMine Short Interest model. Dr. Stauth holds a PhD in Biophysics from UC Berkeley, where her research focused on computational neuroscience.
Opendatabay - Open Data Marketplace.pptx - Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated, synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... - John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
1. Anomaly Detection
Techniques and Best Practices
2019 Copyright QuantUniversity LLC.
Presented By:
Sri Krishnamurthy, CFA, CAP
www.QuantUniversity.com
sri@quantuniversity.com
2. About us:
• Data Science, Quant Finance and Machine Learning startup
• Technologies using MATLAB, Python and R
• Programs
▫ Analytics Certificate Program
▫ Fintech programs
• Platform
3. Sri Krishnamurthy, Founder and CEO
• Founder of QuantUniversity LLC. and www.analyticscertificate.com
• Advisory and consultancy for financial analytics
• Prior experience at MathWorks, Citigroup and Endeca, and 25+ financial services and energy customers
• Regular columnist for the Wilmott Magazine
• Author of the forthcoming book "Financial Modeling: A case study approach", published by Wiley
• Chartered Financial Analyst and Certified Analytics Professional
• Teaches analytics in the Babson College MBA program and at Northeastern University, Boston
4. What is anomaly detection?
• Anomalies or outliers are data points within a dataset that appear to deviate markedly from expected outputs under certain assumptions.
• Anomaly detection is the process of finding patterns in data that do not conform to prior expected behavior.
• Anomaly detection is increasingly employed on big data captured by sensors, social media platforms, huge networks, etc., in areas including energy systems, medical devices, banking and network intrusion detection.
5. Anomaly vs Outliers
• Outliers are data points that are considered out of the ordinary or abnormal. This includes noise.
• Anomalies are a special kind of outlier that has significant/critical/actionable information which could be of interest to analysts.
[Figure: scatter plot with two clusters, 1 and 2. All points not in clusters 1 & 2 are outliers; point B is an anomaly (both X and Y are large).]
7. Applications of Anomaly Detection
• Fraud detection
▫ Credit card fraud detection – by owner or by operation
▫ Mobile phone fraud/anomaly detection – calling behavior, volume, etc.
▫ Insurance claim fraud detection – medical malpractice, auto insurance
▫ Insider trading detection
• E-commerce
▫ Pricing issues
▫ Network issues
8. Examples of Anomaly Detection
• Intrusion detection
▫ Detect malicious activity in computer systems
▫ This could be host-based or network-based
• Medical anomalies
9. Examples of Anomaly Detection
• Manufacturing and sensors
▫ Fault detection
▫ Heat, fire sensors
• Text data
▫ Novel topics, events
▫ Plagiarism
10. Anomaly Detection Methods
• Most outlier detection methods generate an output that can be categorized into one of the following groups:
▫ Real-valued outlier score: quantifies the tendency of a data point to be an outlier by assigning a score or probability to it.
▫ Binary label: the result of using a threshold to convert outlier scores to binary labels, inlier or outlier.
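To make the two output types concrete, here is a small sketch (our illustration, not from the deck) of converting real-valued scores into binary labels with a threshold:

```python
import numpy as np

def scores_to_labels(scores, threshold):
    """Convert real-valued outlier scores to binary labels.

    Convention assumed here: higher score = more outlying.
    Returns a boolean array: True = outlier, False = inlier.
    """
    return np.asarray(scores) > threshold

scores = np.array([0.1, 0.2, 0.15, 0.9, 0.05])
print(scores_to_labels(scores, threshold=0.5))  # [False False False  True False]
```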
11. Four methodologies for Anomaly Detection
1. Graphical approaches
2. Statistical approaches
3. Machine learning approaches
4. Time series methods
13. Graphical approaches
• Graphical methods utilize extreme value analysis, by which outliers correspond to the statistical tails of probability distributions.
• Statistical tails are most commonly used for one-dimensional distributions, although the same concept can be applied to the multidimensional case.
• It is important to understand that all extreme values are outliers, but the reverse may not be true.
• For instance, in the one-dimensional dataset {1,3,3,3,50,97,97,97,100}, the observation 50 equals the mean and isn't considered an extreme value; but since this observation is the most isolated point, it should be considered an outlier from a generative perspective.
14. Box plot
• A standardized way of displaying the variation of data based on the five-number summary: minimum, first quartile, median, third quartile, and maximum.
• This plot does not make any assumptions about the underlying statistical distribution.
• Any data not included between the minimum and maximum (the whisker ends) are considered outliers.
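A minimal Python sketch of the box-plot rule, assuming the conventional 1.5 × IQR whiskers (the data vector is made up for illustration):

```python
import numpy as np

x = np.array([2.0, 3.0, 3.0, 4.0, 4.0, 5.0, 100.0])

q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # whisker ends

outliers = x[(x < lo) | (x > hi)]
print(outliers)                            # [100.]
```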
16. Scatter plot
• A mathematical diagram which uses Cartesian coordinates to plot ordered pairs, typically showing the correlation between two random variables.
• This plot is useful for detecting outliers.
• An outlier is defined as a data point that doesn't seem to fit with the rest of the data points.
• In scatterplots, outliers of either the intersection or union sets of two variables can be shown.
18. Q-Q plot
• In statistics, a Q–Q plot is a probability plot: a graphical method for comparing two probability distributions by plotting their quantiles against each other.
• If the two distributions being compared are similar, the points in the Q–Q plot will approximately lie on the line y = x.
Source: Wikipedia
19. Adjusted quantile plot
• This plot identifies possible multivariate outliers by calculating the Mahalanobis distance of each point from the center of the data.
• The multi-dimensional Mahalanobis distance between vectors x and y in R^n can be formulated as
  d(x, y) = \sqrt{(x - y)^T S^{-1} (x - y)}
  where x and y are random vectors of the same distribution with covariance matrix S.
• An outlier is defined as a point with a distance larger than some predetermined value.
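The deck's examples use R's "mvoutlier" package; as a rough Python sketch of the same idea, the snippet below computes Mahalanobis distances from the sample mean and covariance and flags points beyond a chi-square quantile (the 97.5% cutoff is an assumed threshold):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[1, 0.6], [0.6, 2]], size=500)
X[:5] += 8                                  # plant multivariate outliers

mu = X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mu
d2 = np.einsum('ij,jk,ik->i', diff, S_inv, diff)  # squared Mahalanobis distance

cutoff = chi2.ppf(0.975, df=X.shape[1])     # assumed 97.5% chi-square quantile
print("outliers:", np.where(d2 > cutoff)[0])
```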
20. Adjusted quantile plot
• Before applying this method, and many other parametric multivariate methods, we first need to check whether the data is multivariate normally distributed, using multivariate normality tests such as Royston, Mardia, Chi-square, univariate plots, etc.
• In R, we use the "mvoutlier" package, which utilizes the graphical approaches discussed above.
21. Adjusted quantile plot (see Graphical_Approach.ipynb)
• Min-max normalization before diving into the analysis
• Multivariate normality test
• The outlier Boolean vector identifies the outliers
• Alpha defines the maximum thresholding proportion
24. Symbol plot
• This plot displays two-dimensional data using robust Mahalanobis distances based on the minimum covariance determinant (MCD) estimator with adjustment.
• The Minimum Covariance Determinant (MCD) estimator looks for the subset of h data points whose covariance matrix has the smallest determinant.
• The four ellipsoids drawn in the plot show the Mahalanobis distances corresponding to the 25%, 50%, 75% and adjusted quantiles of the chi-square distribution.
25. Symbol plot (see Graphical_Approach.ipynb)
• The parameter "quan" defines the share of observations used for the minimum covariance determinant estimation. The default is 0.5.
• Alpha defines the share of observations used for calculating the adjusted quantile.
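A hedged scikit-learn analogue of the MCD-based robust distances (not the deck's R code; support_fraction stands in for "quan"):

```python
import numpy as np
from sklearn.covariance import MinCovDet
from scipy.stats import chi2

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0], [[1, 0.4], [0.4, 1]], size=300)
X[:6] += 7                                   # contaminate with outliers

# support_fraction ~ the "quan" parameter: share of points used for the MCD fit
mcd = MinCovDet(support_fraction=0.5, random_state=0).fit(X)
d2 = mcd.mahalanobis(X)                      # squared robust distances

print("flagged:", np.where(d2 > chi2.ppf(0.975, df=2))[0])
```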
27. Hypothesis testing
• This method draws conclusions about a sample point by testing whether it comes from the same distribution as the training data.
• Statistical tests, such as the t-test and the ANOVA table, can be used on multiple subsets of the data.
• Here, the level of significance, i.e. the probability of incorrectly rejecting the true null hypothesis, needs to be chosen.
• To apply this method in R, the "outliers" package, which utilizes statistical tests, is used.
28. Chi-square test
• The chi-square test performs a simple test for detecting outliers in univariate data based on the chi-square distribution of the squared difference between the data and the sample mean.
• In this test, the sample variance counts as the estimator of the population variance.
• The chi-square test helps us identify the lowest and highest values, since outliers can exist in both tails of the data.
30. Grubbs' test
• This test is defined for the following hypotheses:
  H0: There are no outliers in the data set
  H1: There is exactly one outlier in the data set
• The Grubbs' test statistic is defined as
  G = \frac{\max_i |Y_i - \bar{Y}|}{s}
  where \bar{Y} is the sample mean and s is the sample standard deviation.
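A small implementation sketch of the two-sided Grubbs' test; the critical value uses the standard t-distribution formula, and the sample data are invented:

```python
import numpy as np
from scipy import stats

def grubbs_test(x, alpha=0.05):
    """One round of Grubbs' test: returns (index of suspect point, reject H0?)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    z = np.abs(x - x.mean()) / x.std(ddof=1)
    idx = int(np.argmax(z))
    G = z[idx]
    # two-sided critical value derived from the t distribution
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    G_crit = ((n - 1) / np.sqrt(n)) * np.sqrt(t**2 / (n - 2 + t**2))
    return idx, G > G_crit

data = [5.1, 4.9, 5.0, 5.2, 5.1, 9.7]
print(grubbs_test(data))    # flags the 9.7 observation at index 5
```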
33. Scores
• A score quantifies the tendency of a data point to be an outlier by assigning it a score or probability.
• The most commonly used scores are:
▫ Normal score: (x - \bar{x}) / s
▫ T-student score: z \sqrt{n - 2} / \sqrt{n - 1 - z^2}
▫ Chi-square score: (x - \bar{x})^2 / s^2
▫ IQR score: Q_3 - Q_1
• By using the "scores" function in R, p-values can be returned instead of scores.
35. Scores (see Statistical_Approach.ipynb)
• By setting "prob" to a specific value, a logical vector flags as outliers the data points whose probabilities are greater than this cut-off value.
• By setting "type" to IQR, all values lower than the first quartile and greater than the third quartile are considered, and the difference between them and the nearest quartile, divided by the IQR, is calculated.
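A rough Python approximation of these score types (mirroring, approximately, what R's outliers::scores computes; the formulas follow the reconstructions above):

```python
import numpy as np

def outlier_scores(x, kind="z"):
    """Rough Python analogue of R's outliers::scores (an approximation)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    z = (x - x.mean()) / x.std(ddof=1)
    if kind == "z":                        # normal score
        return z
    if kind == "t":                        # t-student score
        return z * np.sqrt(n - 2) / np.sqrt(n - 1 - z**2)
    if kind == "chisq":                    # chi-square score
        return (x - x.mean())**2 / x.var(ddof=1)
    if kind == "iqr":                      # distance past nearest quartile, in IQR units
        q1, q3 = np.percentile(x, [25, 75])
        iqr = q3 - q1
        return np.where(x < q1, (x - q1) / iqr,
               np.where(x > q3, (x - q3) / iqr, 0.0))
    raise ValueError(kind)

x = [2.1, 2.3, 2.2, 2.4, 2.3, 7.9]
print(outlier_scores(x, "z"))
```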
36. Machine learning approaches
✓ Linear regression
✓ Piecewise/segmented regression
✓ Autoencoder-decoder methods
✓ Clustering-based approaches
✓ PCA
37. Linear regression
• Linear regression investigates the linear relationships between variables and predicts one variable based on one or more other variables. It can be formulated as
  Y = \beta_0 + \sum_{i=1}^{p} \beta_i X_i
  where Y and the X_i are random variables, \beta_i is a regression coefficient and \beta_0 is a constant.
• In this model, the ordinary least squares estimator is usually used to minimize the difference between the observed values of the dependent variable and the values fitted from the independent variables.
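One common way to turn a fitted regression into an outlier detector is to flag points with unusually large residuals. A sketch with made-up data, assuming a 3-standard-deviation rule on the residuals:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
X = rng.uniform(0, 10, size=(200, 1))
y = 2.0 + 3.0 * X[:, 0] + rng.normal(scale=0.5, size=200)
y[:3] += 15                                  # plant outliers

model = LinearRegression().fit(X, y)
resid = y - model.predict(X)

# assumed rule: flag residuals more than 3 standard deviations from zero
mask = np.abs(resid) > 3 * resid.std(ddof=1)
print("outlier rows:", np.where(mask)[0])
```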
38. Piecewise/segmented regression
• A method in regression analysis in which the independent variable is partitioned into intervals, allowing multiple linear models to be fitted to the data over different ranges.
• This model can be applied when there are "breakpoints": clearly different linear relationships in the data with a sudden, sharp change in directionality. Below is a simple segmented regression for data with one breakpoint:
  Y = c_1 + \beta_1 X   for X \le X_1
  Y = c_2 + \beta_2 X   for X > X_1
  where Y is the predicted value, X is the independent variable, c_1 and c_2 are constants, \beta_1 and \beta_2 are regression coefficients, and X_1 is the breakpoint.
40. Piecewise/segmented regression (see Piecewise_Regression.ipynb)
• For this example, we use the "segmented" package in R to illustrate piecewise regression for a two-dimensional data set, which has a breakpoint around z = 0.5.
• "pmax" is used for parallel maximization to create different values for y.
42. Piecewise/segmented regression (see Piecewise_Regression.ipynb)
• Finally, outliers can be detected for each segment by setting rules on the residuals of the model.
• Here, we set a rule for the residuals corresponding to z less than 0.5, by which points with residuals beyond 0.5 are flagged as outliers.
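A rough Python counterpart to the R "segmented" workflow, assuming the breakpoint at z = 0.5 is known and using a per-segment 3-sigma residual rule (both assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
z = rng.uniform(0, 1, 300)
y = np.where(z < 0.5, 1.0 + 2.0 * z, 4.0 - 4.0 * z) + rng.normal(scale=0.05, size=300)
y[:3] += 1.0                                 # plant outliers
breakpoint_ = 0.5                            # known/estimated breakpoint

outliers = []
for seg in (z < breakpoint_, z >= breakpoint_):
    coef = np.polyfit(z[seg], y[seg], 1)     # one linear model per segment
    resid = y[seg] - np.polyval(coef, z[seg])
    # assumed rule: flag residuals beyond 3 standard deviations within the segment
    outliers.append(np.where(seg)[0][np.abs(resid) > 3 * resid.std(ddof=1)])

print("flagged:", np.concatenate(outliers))
```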
44. Autoencoder
• The goal is to have x̂ approximate x.
• Interesting applications include:
▫ Data compression
▫ Visualization
▫ Pre-training neural networks
45. Demo in Keras [1]
1. https://blog.keras.io/building-autoencoders-in-keras.html
2. https://keras.io/models/model/
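Below is a compressed sketch in the spirit of the Keras tutorial linked above: an autoencoder whose reconstruction error serves as the anomaly score. Layer sizes and the 98th-percentile threshold are illustrative assumptions:

```python
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20)).astype("float32")
X[:20] += 4                                       # plant anomalous rows

# small dense autoencoder: 20 -> 8 -> 20 (sizes are illustrative)
model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(8, activation="relu"),     # encoder
    keras.layers.Dense(20, activation="linear"),  # decoder
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, X, epochs=20, batch_size=64, verbose=0)

# reconstruction error as the anomaly score
err = np.mean((X - model.predict(X, verbose=0))**2, axis=1)
threshold = np.quantile(err, 0.98)                # assumed: top 2% are anomalies
print("anomalies:", np.where(err > threshold)[0])
```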
46. Principal Component Analysis
• Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (entities each of which takes on various numerical values) into a set of values of linearly uncorrelated variables called principal components.
• In outlier analysis, we perform principal component analysis and compute p-values to test for outliers.
https://en.wikipedia.org/wiki/Principal_component_analysis
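A minimal sketch of PCA-based outlier scoring via reconstruction error (the component count and 99th-percentile cutoff are assumptions; the R workflow referenced in the deck computes p-values instead):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
X = rng.multivariate_normal(np.zeros(5), np.diag([5, 3, 1, 0.2, 0.1]), size=500)
X[:5] += 10                                   # plant outliers

pca = PCA(n_components=2).fit(X)              # keep only the top components
X_hat = pca.inverse_transform(pca.transform(X))
err = np.sum((X - X_hat)**2, axis=1)          # reconstruction error per point

threshold = np.quantile(err, 0.99)            # assumed cutoff
print("outliers:", np.where(err > threshold)[0])
```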
47. Clustering-based approaches
• These methods are suitable for unsupervised anomaly detection.
• They aim to partition the data into meaningful groups (clusters) based on the similarities and relationships between the groups found in the data.
• Each data point is assigned a degree of membership for each of the clusters.
• Anomalies are those data points that:
▫ Do not fit into any cluster.
▫ Belong to a particular cluster but are far away from the cluster centroid.
▫ Form small or sparse clusters.
48. Clustering-based approaches
• These methods partition the data into K clusters by assigning each data point to its closest cluster centroid, minimizing the within-cluster sum of squares (WSS):
  WSS = \sum_{k=1}^{K} \sum_{i \in S_k} \sum_{j=1}^{p} (x_{ij} - \mu_{kj})^2
  where S_k is the set of observations in the kth cluster and \mu_{kj} is the mean of the jth variable of the cluster center of the kth cluster.
• Then, they select the top n points that are the farthest away from their nearest cluster centers as outliers.
50. Clustering-based approaches (see Clustering_Approach.ipynb)
• The "Kmod" package in R is used to show the application of the K-means model.
• In this example, the number of clusters is determined from an elbow (bend) graph and then passed to the K-mod function.
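For comparison, a scikit-learn sketch of the same idea (the deck's example is in R; here k and the number of reported outliers are fixed by assumption rather than read off an elbow graph):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.5, (150, 2)),
               rng.normal(5, 0.5, (150, 2)),
               [[10, -5], [12, 8]]])          # two planted anomalies

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
# distance of each point to its own cluster center
dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

n = 2                                          # assumed number of outliers to report
print("outliers:", np.argsort(dist)[-n:])
```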
54. Time-series method
• The time-series model is used to identify outliers only in univariate time-series data.
• To apply this model, we use the "AnomalyDetection" package in R.
• This package was published by Twitter for detecting anomalies in time-series data in the presence of seasonality and an underlying trend, using statistical approaches.
• Since this package uses a specific algorithm to detect anomalies, we go over it in detail in the next slide.
55. Anomaly detection, R package
• Twitter's R package: https://github.com/twitter/AnomalyDetection
• Seasonal Hybrid ESD (S-H-ESD), which builds upon the Generalized ESD test, is the underlying algorithm of this package.
• The algorithm employs time-series decomposition and statistical metrics together with the ESD test.
• Since time-series data exhibit a huge variety of patterns, time-series decomposition, a statistical method, is used to decompose the data into its four components:
1. Trend: the long-term progression of the series
2. Cyclical: variations in recognizable cycles
3. Seasonal: seasonal variations or fluctuations
4. Irregular: random, irregular influences
• Find more about the ESD test in the tutorial slides.
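The package itself is R-only; as a simplified stand-in (not S-H-ESD itself), the sketch below decomposes a synthetic hourly series with statsmodels and applies a robust z-score to the remainder:

```python
import numpy as np
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(4)
n = 24 * 14                                        # two weeks of hourly points
ts = (10 + 0.01 * np.arange(n)                     # trend
      + 3 * np.sin(2 * np.pi * np.arange(n) / 24)  # daily seasonality
      + rng.normal(scale=0.3, size=n))
ts[100] += 6                                       # plant an anomaly

resid = seasonal_decompose(ts, model="additive", period=24).resid
med = np.nanmedian(resid)
mad = np.nanmedian(np.abs(resid - med))
rz = 0.6745 * (resid - med) / mad                  # robust z-score on the remainder
print("anomalous indices:", np.where(np.abs(rz) > 3.5)[0])
```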
57. Summary: We have covered Anomaly detection
Introduction
✓ Definition of anomaly detection and its importance in energy systems
✓ Different types of anomaly detection methods: statistical, graphical and machine learning methods
Graphical approach
✓ Graphical methods consist of the box plot, scatter plot, adjusted quantile plot and symbol plot to demonstrate outliers graphically
✓ The main assumption for applying graphical approaches is multivariate normality
✓ The Mahalanobis distance method is mainly used for calculating the distance of a point from the center of a multivariate distribution
Statistical approach
✓ Statistical hypothesis testing includes the Chi-square and Grubbs' tests
✓ Statistical methods may use either scores or p-values as thresholds to detect outliers
Machine learning approach
✓ Both supervised and unsupervised learning methods can be used for outlier detection
✓ Piecewise or segmented regression can be used to identify outliers based on the residuals for each segment
✓ In the K-means clustering method, outliers are defined as points which do not belong to any cluster, are far away from the cluster centroids, or form sparse clusters
✓ In PCA and autoencoder-decoder methods, we treat points that are not recovered close to the original points as anomalies
Time series
✓ Temporal outlier detection to detect anomalies, which is robust, from a statistical standpoint, in the presence of seasonality and an underlying trend
60. Thank you!
Sri Krishnamurthy, CFA, CAP
Founder and CEO
QuantUniversity LLC.
srikrishnamurthy
www.QuantUniversity.com
Contact
Information, data and drawings embodied in this presentation are strictly the property of QuantUniversity LLC. and shall not be distributed or used in any other publication without the prior written consent of QuantUniversity LLC.