The presentation, first given on January 8, 2019, introduces the concept of interpretability in machine learning and why we might care about it. It also introduces lasso regression as an example of an interpretable, sparse model.
Interpretability in ML & Sparse Linear Regression
1. Interpretability in ML & Sparse Linear Models
Unchitta Kanjanasaratool
Mentor: Denali Molitor
Department of Mathematics, unchitta@ucla.edu
2. Interpretability in ML: Why It's Important
Definition (non-mathematical): Interpretability is the degree to which a human can consistently explain or interpret why a model makes certain decisions.
Interpretability has several components: model transparency, holistic model interpretability, modular-level interpretability, and local interpretability for a single prediction or a group of predictions.
Proposed levels of evaluating interpretability (Doshi-Velez & Kim, 2017):
1. Application level
2. Human level
3. Function level
3. Interpretability in ML: Why It's Important
Interpretability helps explain why a model makes certain predictions.
We may not always need interpretability, e.g., in low-risk or extensively studied problems, but knowing the why can help us learn more about why a model might fail.
Transparency & ethics: the EU, for example, mandates that automated decisions must be explainable and should respect fundamental rights.
Interpretability also satisfies human curiosity and learning.
4. Interpretable Models: Linear Regression
Assume a linear relationship between the input and response variables, for example

$Y = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p + \epsilon$.

We can then predict

$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \dots + \hat{\beta}_p X_p$,

where the $\hat{\beta}$'s are the coefficients that solve the residual sum of squares (RSS) minimization problem, namely

$\hat{\beta} = \arg\min_{\beta_0, \dots, \beta_p} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.

The linear coefficients (the $\beta$'s) make this model easy to interpret on a modular level, and help us see the level of influence each variable has on the prediction.
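To make the coefficient interpretation concrete, here is a minimal sketch (not from the slides) that fits an ordinary least squares model on simulated data with scikit-learn and reads off the fitted coefficients; the data and coefficient values are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Simulate n = 200 observations of p = 3 features with known coefficients.
X = rng.normal(size=(200, 3))
true_beta = np.array([2.0, -1.0, 0.5])
y = 1.0 + X @ true_beta + rng.normal(scale=0.1, size=200)

# Fitting minimizes the RSS, the sum over i of (y_i - yhat_i)^2.
model = LinearRegression().fit(X, y)

print("intercept (beta_0 hat):", model.intercept_)
print("coefficients (beta hats):", model.coef_)
# Each coefficient estimates the change in y for a one-unit change in
# the corresponding feature, holding the other features fixed.
```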
5. Sparse Models & Lasso Regression
In practice, however, interpreting linear models can still be hard if there are too many variables. Certain variations of linear regression, such as sparse models, can help with this problem. An example is the "least absolute shrinkage and selection operator," or lasso, which minimizes

RSS subject to $\sum_{j=1}^{p} |\beta_j| \le s$.

This is equivalent to minimizing, with regularization, the quantity

$\mathrm{RSS} + \lambda \sum_{j=1}^{p} |\beta_j|$,

where the second term is a constant $\lambda$ times the L1 norm of the coefficients. ($\lambda$ is normally chosen using cross-validation.)
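As a hedged sketch of how this is done in practice: scikit-learn's Lasso minimizes RSS/(2n) plus alpha times the L1 norm (its alpha plays the role of $\lambda$ up to scaling), and LassoCV selects alpha by cross-validation, as the slide suggests. The simulated data below is illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
beta = np.zeros(10)
beta[:3] = [3.0, -2.0, 1.5]  # only 3 of the 10 features matter
y = X @ beta + rng.normal(scale=0.5, size=200)

# Fixed penalty (scikit-learn calls the regularization weight alpha).
lasso = Lasso(alpha=0.1).fit(X, y)
print("nonzero coefficient indices:", np.flatnonzero(lasso.coef_))

# Select the penalty by 5-fold cross-validation.
lasso_cv = LassoCV(cv=5).fit(X, y)
print("CV-selected alpha:", lasso_cv.alpha_)
```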
6. Sparse Models & Lasso Regression
Intuitively, lasso penalizes large models and shrinks coefficients. Less intuitively (but we will see why), it also yields a model in which a number of the coefficients are exactly 0 or essentially 0. This makes lasso a sparse regression model: it penalizes large models (i.e., models with lots of features) and performs variable selection.
The penalization is controlled via $\lambda$: the larger it is, the sparser the model. If $\lambda$ is small enough, lasso yields the same results as the least squares estimates (standard linear regression).
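To see the relationship between $\lambda$ and sparsity numerically, one can sweep the penalty and count the surviving coefficients; a small sketch under the same simulated-data assumptions as above:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))
beta = np.concatenate([[3.0, -2.0, 1.5], np.zeros(7)])
y = X @ beta + rng.normal(scale=0.5, size=200)

# Larger penalties zero out more coefficients; as the penalty shrinks
# toward zero, the fit approaches the least squares estimates.
for alpha in [0.001, 0.01, 0.1, 1.0, 10.0]:
    model = Lasso(alpha=alpha, max_iter=50_000).fit(X, y)
    print(f"alpha={alpha:<6} nonzero coefficients:",
          np.count_nonzero(model.coef_))
```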
7. Sparsity & Interpretability
Having fewer variables often means better interpretability. The model is explained by only a small number of significant features, which reduces complexity and increases explainability. This is especially important when your data has hundreds or thousands of features; the complexity may be beyond human comprehension, and there may not be enough observations.
Other methods for introducing sparsity to linear models include feature selection procedures such as subset selection and step-wise procedures (e.g., forward and backward selection), as well as sparse PCA.
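As a sketch of one of the alternatives just mentioned, scikit-learn provides a greedy forward-selection wrapper, SequentialFeatureSelector; the setup below is illustrative, not from the slides.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Forward selection: start with no features and repeatedly add the one
# that most improves the cross-validated score of the base model.
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=2, direction="forward"
).fit(X, y)
print("selected feature indices:", np.flatnonzero(selector.get_support()))
```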
8. Sparse Property of Lasso
The variable selection property of lasso regression comes from the L1 regularizer. Consider the following figure for a two-dimensional problem; in higher dimensions the constraint region becomes a polytope (a shape with flat sides and sharp corners).
The red ellipses represent the contours of the RSS for lasso (left) and ridge regression (right). The solid green areas are the corresponding constraint regions, $|\beta_1| + |\beta_2| \le s$ and $\beta_1^2 + \beta_2^2 \le s$. Because the L1 region has corners lying on the axes, the expanding RSS contours often first touch it at a corner, where one or more coefficients are exactly zero.
(Figure courtesy of Introduction to Statistical Learning, G. James et al.)
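The corner argument predicts that lasso produces exact zeros while ridge merely shrinks coefficients toward zero. A quick numerical check on simulated data (assumptions as in the earlier sketches):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 10))
beta = np.concatenate([[3.0, -2.0], np.zeros(8)])
y = X @ beta + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

# The L1 constraint's corners sit on the axes, so lasso solutions land
# exactly on them; the smooth L2 ball gives small but nonzero values.
print("lasso exact zeros:", np.count_nonzero(lasso.coef_ == 0))
print("ridge exact zeros:", np.count_nonzero(ridge.coef_ == 0))
```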
9. Example: UCI Bike Sharing Dataset
Here we are trying to predict the number of bike rentals as a function of 11 variables. As $\lambda$ increases, the response variable depends on a smaller number of predictors.
10. Example: UCI Bike Sharing Dataset
(Figure annotation: likely the most significant features.)
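A sketch of how one might reproduce this experiment: scikit-learn's lasso_path computes the coefficients over a grid of penalties. The file name and column list below follow the UCI Bike Sharing documentation but are assumptions, not something shown in the slides.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import lasso_path
from sklearn.preprocessing import StandardScaler

# Assumes day.csv from the UCI Bike Sharing dataset is available locally.
df = pd.read_csv("day.csv")
features = ["season", "yr", "mnth", "holiday", "weekday", "workingday",
            "weathersit", "temp", "atemp", "hum", "windspeed"]  # 11 predictors
X = StandardScaler().fit_transform(df[features])
y = df["cnt"].to_numpy()

# One coefficient vector per penalty value, from large alpha to small.
alphas, coefs, _ = lasso_path(X, y)
for a, c in zip(alphas[::20], coefs.T[::20]):
    print(f"alpha={a:10.2f}  nonzero predictors: {np.count_nonzero(c)}")
```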
11. Resources
Gareth James et al., An Introduction to Statistical Learning with Applications in R.
Christoph Molnar, Interpretable Machine Learning: A Guide for Making Black Box Models Explainable.