- Regression analysis is used to predict the value of a dependent variable based on one or more independent variables and explain the relationship between them.
- There are different types of regression depending on whether the dependent variable is continuous or binary. Ordinary least squares regression is used for continuous dependent variables while logistic regression is used for binary dependent variables.
- The simple linear regression model describes the relationship between one independent and one dependent variable as a linear equation. This can be extended to multiple linear regression with more than one independent variable.
2. What & Why
What is Regression?
Formulation of a functional relationship between a set of Independent or
Explanatory variables (X’s) with a Dependent or Response variable (Y).
Y = f(X)
Why Regression?
Knowledge of Y is crucial for decision making.
• Will he/she buy or not?
• Shall I offer him/her the loan or not?
• ………
X is available at the time of decision making and is related to Y, thus making
it possible to have a prediction of Y.
3. Types of Regression
• Y is continuous (e.g., sales volume, claim amount, % of sales growth): Ordinary Least Squares (OLS) regression
• Y is binary (0/1) (e.g., buy/no-buy, survive/not-survive, win/loss): logistic regression
4. Intro to Regression Analysis
• Regression analysis is used to:
  • Predict the value of a dependent variable based on the value of at least one independent variable
  • Explain the impact of changes in an independent variable on the dependent variable
• Dependent variable: the variable we wish to explain, usually denoted by Y.
• Independent variable: the variable used to explain the dependent variable, usually denoted by X.
7. Simple Linear Regression Model
• Only one independent variable, x
• The relationship between x and y is described by a linear function
• Changes in y are assumed to be caused by changes in x
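23. Population Linear Regression
[Figure: plot of Y against X showing the population regression line $Y = \beta_0 + \beta_1 X + u$. Labels: intercept = $\beta_0$; slope = $\beta_1$; predicted value of Y for $x_i$; random error $u_i$ for this x value; observed point = an individual person's marks.]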
24. Population Regression Function
$$Y = \beta_0 + \beta_1 X + u$$
where:
Y = dependent variable
$\beta_0$ = population y-intercept
$\beta_1$ = population slope coefficient
X = independent variable
u = random error term, or residual
$\beta_0 + \beta_1 X$ is the linear component and u is the random error component.
But can we actually get this equation?
If yes, what information will we need?
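One way to make the population regression function concrete is to simulate from it. The sketch below is only an illustration: the parameter values (`beta0=2.0`, `beta1=0.5`, `sigma=1.0`), the normally distributed error, and the function name `simulate_population` are all assumptions chosen for the example, not values from the slides. In practice the population parameters are unknown, which is why they have to be estimated from a sample.

```python
import random

def simulate_population(beta0, beta1, xs, sigma, seed=0):
    """Draw Y = beta0 + beta1 * X + u, with u ~ Normal(0, sigma) (an assumption)."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    ys = []
    for x in xs:
        u = rng.gauss(0.0, sigma)          # random error component
        ys.append(beta0 + beta1 * x + u)   # linear component plus error
    return ys

# Hypothetical X values and hypothetical population parameters.
xs = [float(i) for i in range(1, 11)]
ys = simulate_population(beta0=2.0, beta1=0.5, xs=xs, sigma=1.0)
```

Setting `sigma=0.0` removes the error component, so every simulated point falls exactly on the line.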
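25. Sample Regression Function
[Figure: plot of Y against X showing the sample regression line $y = b_0 + b_1 x + e$. Labels as shown on the slide: intercept = $\beta_0$; slope = $\beta_1$; predicted value of Y for $x_i$; observed value of y for $x_i$; random error $e_i$ for this x value.]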
26. Sample Regression Function
$$y_i = b_0 + b_1 x_i + e_i$$
where:
$b_0$ = estimate of the regression intercept
$b_1$ = estimate of the regression slope
$x_i$ = independent variable
$e_i$ = error term
Notice the similarity with the Population Regression Function.
Can we do something about the error term?
27. The Error Term (Residual)
• It represents the influence of all the variables that we have not accounted for in the equation
• It is the difference between the actual y values and the y values predicted by the Sample Regression Line
• Wouldn't it be good if we were able to reduce this error term?
• By the way, what are we trying to achieve by Sample Regression?
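The idea of "reducing the error term" can be sketched by computing residuals for two candidate lines and comparing their sums of squared residuals. The data points and both candidate lines below are made up for illustration, and the helper names `residuals` and `sum_squared_residuals` are hypothetical.

```python
def residuals(b0, b1, xs, ys):
    """Return e_i = y_i - (b0 + b1 * x_i) for each observation."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def sum_squared_residuals(b0, b1, xs, ys):
    """Total squared distance between observed and predicted y values."""
    return sum(e ** 2 for e in residuals(b0, b1, xs, ys))

# Hypothetical sample.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

ssr_rough = sum_squared_residuals(0.0, 1.5, xs, ys)     # a guessed line
ssr_better = sum_squared_residuals(0.09, 1.97, xs, ys)  # a line closer to the data
```

The second line leaves a much smaller sum of squared residuals than the first, which is the sense in which one line "fits better" than another.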
31. OLS Regression Properties
• The sum of the residuals from the least squares regression line is zero: $\sum (y - \hat{y}) = 0$
• The sum of the squared residuals is a minimum: minimize $\sum (y - \hat{y})^2$
• The simple regression line always passes through the mean of the y variable and the mean of the x variable, i.e. the point $(\bar{x}, \bar{y})$
• The least squares coefficients are unbiased estimates of $\beta_0$ and $\beta_1$, provided the regression assumptions hold
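The first three properties can be checked numerically. The sketch below uses the standard closed-form least squares estimates for one regressor (slope = sum of cross-deviations divided by sum of squared x-deviations, intercept = mean of y minus slope times mean of x). The data and the function name `fit_simple_ols` are hypothetical. Unbiasedness is a statement about repeated sampling and cannot be confirmed from a single sample, so it is not checked here.

```python
def fit_simple_ols(xs, ys):
    """Closed-form least squares estimates (b0, b1) for y = b0 + b1 * x + e."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept estimate
    return b0, b1

# Hypothetical sample.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

b0, b1 = fit_simple_ols(xs, ys)
resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

resid_sum = sum(resid)                  # zero up to rounding error
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
line_at_x_bar = b0 + b1 * x_bar         # equals y_bar up to rounding error
```

The intercept formula forces the fitted line through the point of means, and the zero residual sum follows directly from that.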
32. Limitations of Regression Analysis
• Parameter instability: this happens when correlations change over time. It is very common in financial markets, where economic, tax, regulatory, and political factors change frequently.
• Public knowledge of a specific regression relation may cause a large number of people to react in a similar fashion towards the variables, negating its future usefulness.
• If any of the regression assumptions are violated, the predicted values of the dependent variable and the hypothesis tests will not be valid.
33. General Multiple Linear Regression Model
• In simple linear regression, the dependent variable was assumed to depend on only one independent variable.
• In the general multiple linear regression model, the dependent variable derives its value from two or more independent variables.
• The general multiple linear regression model takes the following form:
$$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki} + \varepsilon_i$$
where:
$Y_i$ = ith observation of dependent variable Y
$X_{ki}$ = ith observation of kth independent variable X
$b_0$ = intercept term
$b_k$ = slope coefficient of kth independent variable
$\varepsilon_i$ = error term of ith observation
n = number of observations
k = total number of independent variables
34. Estimated Regression Equation
• Just as we calculated the intercept and the slope coefficient in simple linear regression by minimizing the sum of squared errors, we estimate the intercept and slope coefficients in multiple linear regression.
• The sum of squared errors $\sum_{i=1}^{n} \varepsilon_i^2$ is minimized and the slope coefficients are estimated.
• The resultant estimated equation becomes:
$$\hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_{1i} + \hat{b}_2 X_{2i} + \dots + \hat{b}_k X_{ki}$$
• Now the error in the ith observation can be written as:
$$\varepsilon_i = Y_i - \hat{Y}_i = Y_i - \left(\hat{b}_0 + \hat{b}_1 X_{1i} + \hat{b}_2 X_{2i} + \dots + \hat{b}_k X_{ki}\right)$$
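Minimizing the sum of squared errors in the multiple regression case leads to a system of linear equations (the normal equations) in the coefficients. The sketch below solves that system with plain Gaussian elimination, one of several possible approaches and not necessarily the one the presenter had in mind. The function names and the noise-free data, generated from hypothetical coefficients 1, 2 and 3, are assumptions for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's lists are not modified.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: regressors may be perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for j in range(col, n + 1):
                m[row][j] -= factor * m[col][j]
    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][j] * x[j] for j in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x

def fit_multiple_ols(rows, ys):
    """rows[i] holds (X_1i, ..., X_ki); returns [b0, b1, ..., bk]."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 carries the intercept
    p = len(design[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Hypothetical data with two independent variables and no noise.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
ys = [1.0 + 2.0 * x1 + 3.0 * x2 for x1, x2 in rows]

coefs = fit_multiple_ols(rows, ys)  # approximately [1.0, 2.0, 3.0]
```

Because the example data contain no error term, the estimates recover the generating coefficients up to rounding. With noisy data they would only approximate them.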
35. Assumptions of Multiple Regression Model
• There exists a linear relationship between the dependent and independent variables.
• The expected value of the error term, conditional on the independent variables, is zero.
• The error terms are homoskedastic, i.e. the variance of the error terms is constant for all the observations.
• The expected value of the product of the error terms of any two different observations is zero, which implies that the error terms are uncorrelated with each other.
• The error term is normally distributed.
• The independent variables do not have any exact linear relationship with each other.
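The last assumption can be given a rough first check by computing the correlation between pairs of independent variables. This is only a partial diagnostic, since a linear relationship can involve three or more variables at once without any single pair being perfectly correlated. The variables `x1`, `x2`, `x3` and the helper `pearson_correlation` below are hypothetical.

```python
import math

def pearson_correlation(a, b):
    """Sample Pearson correlation between two equally long sequences."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    var_a = sum((x - a_bar) ** 2 for x in a)
    var_b = sum((y - b_bar) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical independent variables.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.0, 6.0, 8.0, 10.0]  # exactly 2 * x1: violates the assumption
x3 = [2.0, 1.0, 4.0, 3.0, 6.0]   # correlated with x1, but not an exact multiple

r12 = pearson_correlation(x1, x2)
r13 = pearson_correlation(x1, x3)
```

A correlation of exactly 1 or -1 between two regressors means one is an exact linear function of the other, so their separate slope coefficients cannot be estimated. A high but imperfect correlation does not break the assumption, though it tends to make the estimates less precise.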