We present a robust solution to the classification and variable selection problem when the dimension of the data, or number of predictor variables, may greatly exceed the number of observations. When faced with the problem of classifying objects given many measured attributes of the objects, the goal is to build a model that makes the most accurate predictions using only the most meaningful subset of the available measurements. The introduction of L1-regularized model fitting has inspired many approaches that perform model fitting and variable selection simultaneously. If parametric models are employed, the standard approach is some form of regularized maximum likelihood estimation. This is an asymptotically efficient procedure under very general conditions, provided that the model is specified correctly. Correctly specifying a model, however, is not trivial. Even a few outliers among data drawn from an otherwise pure sample can result in a very poor model. In contrast, minimizing the integrated square error, while less efficient, proves to be robust to a fair amount of contamination. We propose to fit logistic models using this alternative criterion to address the possibility of model misspecification. The resulting method may be considered a robust variant of regularized maximum likelihood methods for high-dimensional data.
Robust parametric classification and variable selection with minimum distance estimation
1. Robust parametric classification and variable selection with minimum distance estimation
Eric Chi$^{a,b,1}$ with David W. Scott$^{a,2}$
$^a$ Department of Statistics, Rice University
$^b$ Baylor College of Medicine
June 17, 2010
$^1$ DOE DE-FG02-97ER25308
$^2$ NSF DMS-09-07491
2. Outline
The binary regression problem
The L2E Method
Estimation
Variable Selection
Simulations
Conclusion
4. Logistic Regression
Suppose we wish to predict $y \in \{0, 1\}^n$ using $X \in \mathbb{R}^{n \times p}$.
The number of features p could be very large.
7. MLE is sensitive to outliers
Likelihood-based choice
Outlier or not, MLE puts mass wherever data lies.
Cost: MLE puts mass over regions where there is no data.
8. MLE is sensitive to outliers
[Figure: observed responses (0 or 1) and the fitted Pr(Y = 1) plotted against X over (−6, 6).]
There are no 'ones' between −4 and −2,
yet the MLE fit pushes $P(Y = 1 \mid X \in (-4, -2))$ up.
There are no 'zeros' between 4 and 6,
yet the MLE fit pushes $P(Y = 0 \mid X \in (4, 6))$ up.
10. The L2 distance as an alternative to the deviance loss.
$g$: unknown true density.
$f_\theta$: putative parametric density.
Find $\theta$ that minimizes the integrated square error (ISE):
$$\hat\theta = \arg\min_\theta \int \big(f_\theta(x) - g(x)\big)^2\,dx.$$
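11. The L2E Method
The equivalent empirical criterion:
$$\hat\theta = \arg\min_\theta \left[ \int f_\theta(x)^2\,dx - \frac{2}{n}\sum_{i=1}^{n} f_\theta(X_i) \right],$$
where $X_i \in \mathbb{R}^p$ is the covariate vector of the $i$th observation.
This is the L2 estimator, or L2E [Scott, 2001].
A familiar quantity: it has the same form as the least-squares cross-validation criterion used for smoothing parameter selection in nonparametric density estimation.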
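Why the empirical criterion is equivalent to minimizing the ISE follows from expanding the square. The short derivation below is our own worked step, not taken from the slides:

$$\int \big(f_\theta(x) - g(x)\big)^2\,dx = \int f_\theta(x)^2\,dx - 2\int f_\theta(x)\,g(x)\,dx + \int g(x)^2\,dx,$$
$$\int f_\theta(x)\,g(x)\,dx = \mathbb{E}_g\big[f_\theta(X)\big] \approx \frac{1}{n}\sum_{i=1}^{n} f_\theta(X_i).$$

The last term of the expansion does not depend on $\theta$ and can be dropped, while the cross term is an expectation under $g$, which is replaced by its sample mean over $X_1, \ldots, X_n$ drawn from $g$.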
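12. Density power divergence
The L2E and the MLE are empirical minimizers corresponding to two different points in a spectrum of divergence measures [Basu et al., 1998]:
$$d_\gamma(g, f_\theta) = \int \left[ f_\theta^{1+\gamma}(z) - \left(1 + \frac{1}{\gamma}\right) g(z)\, f_\theta^{\gamma}(z) + \frac{1}{\gamma}\, g^{1+\gamma}(z) \right] dz.$$
$\gamma > 0$ trades off efficiency for robustness.
$\gamma = 1 \implies$ L2 loss.
$\gamma \to 0 \implies$ Kullback–Leibler divergence.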
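To make the two endpoints concrete, the sketch below evaluates $d_\gamma$ numerically for two normal densities, N(0, 1) as $g$ and N(1, 1) as $f$ (our choice, not from the talk). With γ = 1 it reproduces the integrated squared error, and with a small γ it approaches the Kullback–Leibler divergence, which is 0.5 for this pair. The integration range and grid size are arbitrary assumptions.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def _trapezoid(fn, lo=-12.0, hi=12.0, n=24001):
    """Trapezoidal rule for the integral of fn over [lo, hi]."""
    h = (hi - lo) / (n - 1)
    total = 0.5 * (fn(lo) + fn(hi))
    for i in range(1, n - 1):
        total += fn(lo + i * h)
    return total * h


def dpd(g, f, gamma):
    """Density power divergence d_gamma(g, f) of Basu et al. (1998)."""
    def integrand(z):
        gz, fz = g(z), f(z)
        f_gam = fz ** gamma
        # f^(1+gamma) - (1 + 1/gamma) g f^gamma + (1/gamma) g^(1+gamma),
        # regrouped so that 1/gamma multiplies a difference of nearly equal terms
        return fz * f_gam - gz * f_gam - (gz * f_gam - gz ** (1.0 + gamma)) / gamma
    return _trapezoid(integrand)


def l2_distance(g, f):
    """Integrated squared error between the densities g and f."""
    return _trapezoid(lambda z: (f(z) - g(z)) ** 2)


def g(z):
    return normal_pdf(z, 0.0, 1.0)  # stand-in for the "true" density


def f(z):
    return normal_pdf(z, 1.0, 1.0)  # a candidate model density


d_one = dpd(g, f, 1.0)      # gamma = 1: the L2 distance
d_small = dpd(g, f, 1e-3)   # small gamma: close to KL(g || f)
ise = l2_distance(g, f)
```

For these two normals the ISE has the closed form $(1 - e^{-1/4})/\sqrt{\pi} \approx 0.125$, which gives a check on the quadrature.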
13. Robustness of the L2 distance
$$\hat\theta = \arg\min_\theta \int \big(f_\theta(x) - g(x)\big)^2\,dx.$$
The L2 distance is zero-forcing:
$g(x) = 0$ forces $f_\theta(x) = 0$.
It puts a premium on avoiding “false positives”.
L2E balances:
mass where the data are
vs.
no mass where data are absent.
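The zero-forcing behaviour can be seen in a minimal one-dimensional sketch (our own toy example, not from the talk): a normal model is fit to 90 standard normal draws plus 10 points near 10, once by maximum likelihood and once by L2E. It assumes a normal location-scale model, for which $\int f_\theta^2\,dx = 1/(2\sigma\sqrt{\pi})$, and uses a simple compass search as a stand-in optimizer; the sample sizes, contamination, and seed are arbitrary.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def l2e_criterion(params, data):
    """Empirical L2E criterion for N(mu, sigma^2); params = (mu, log_sigma).

    For a normal density, the integral of f^2 is 1 / (2 sigma sqrt(pi)).
    """
    mu, sigma = params[0], math.exp(params[1])
    fit_term = 1.0 / (2.0 * sigma * math.sqrt(math.pi))
    data_term = 2.0 * sum(normal_pdf(x, mu, sigma) for x in data) / len(data)
    return fit_term - data_term


def compass_search(obj, x0, step=1.0, tol=1e-6):
    """Derivative-free local search: try +/- step along each coordinate and
    halve the step whenever no move improves the objective."""
    x, fx = list(x0), obj(x0)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for d in (step, -step):
                y = list(x)
                y[i] += d
                fy = obj(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
                    break
        if not improved:
            step /= 2.0
    return x


rng = random.Random(0)
clean = [rng.gauss(0.0, 1.0) for _ in range(90)]      # the "pure" sample
outliers = [rng.gauss(10.0, 0.5) for _ in range(10)]  # 10% contamination
data = clean + outliers

# MLE for a normal model: sample mean and (biased) standard deviation
mle_mu, mle_sigma = statistics.fmean(data), statistics.pstdev(data)

# L2E, started from robust values (median and scaled MAD)
med = statistics.median(data)
mad = 1.4826 * statistics.median(abs(x - med) for x in data)
mu_hat, log_sigma_hat = compass_search(lambda p: l2e_criterion(p, data), [med, math.log(mad)])
sigma_hat = math.exp(log_sigma_hat)
```

The MLE mean is pulled toward the contaminating cluster (to roughly 1) and its standard deviation inflates to about 3, while the L2E fit stays close to N(0, 1): the outliers add almost nothing to $\sum_i f_\theta(X_i)$ because $f_\theta$ is essentially zero there.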
14. Partial Densities: An extra degree of freedom
Expand the search space [Scott, 2001]:
$$\int \big(w f_\theta(x) - g(x)\big)^2\,dx.$$
Fit a parametric model to only a fraction, $w$, of the data
(hopefully the fraction described well by the parametric model!)
$$(\hat\theta, \hat w) = \arg\min_{\theta, w} \left[ w^2 \int f_\theta(x)^2\,dx - \frac{2w}{n}\sum_{i=1}^{n} f_\theta(X_i) \right].$$
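For a fixed $\theta$, the partial-density criterion is a quadratic in $w$, so $w$ can be profiled out in closed form. The derivation below is our own worked step, with $A(\theta)$ and $B(\theta)$ introduced as shorthand:

$$A(\theta) = \int f_\theta(x)^2\,dx, \qquad B(\theta) = \frac{1}{n}\sum_{i=1}^{n} f_\theta(X_i),$$
$$\frac{\partial}{\partial w}\Big[w^2 A(\theta) - 2w\,B(\theta)\Big] = 0 \implies \hat w(\theta) = \frac{B(\theta)}{A(\theta)}, \qquad \hat w^2 A - 2\hat w B = -\frac{B(\theta)^2}{A(\theta)}.$$

Since $A(\theta) > 0$, the criterion is convex in $w$; if $w$ is restricted to $[0, 1]$, as in the logistic version, the minimizer is $B/A$ clipped to that interval.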
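15. Logistic L2E loss
Let $F(u) = 1/(1 + \exp(-u))$ be the logistic function. Then
$$(\hat\beta, \hat w) = \arg\min_{\beta,\; w \in [0,1]} \frac{w^2}{n}\sum_{i=1}^{n}\left[F(x_i^T\beta)^2 + \big(1 - F(x_i^T\beta)\big)^2\right] - \frac{2w}{n}\sum_{i=1}^{n}\left[y_i F(x_i^T\beta) + (1 - y_i)\big(1 - F(x_i^T\beta)\big)\right].$$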
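A direct transcription of the logistic L2E objective is sketched below, assuming the rows of X are plain Python lists and the labels are 0/1 integers; the helper names are ours. Because the loss is a convex quadratic in w for fixed β, a closed-form update for w, projected onto [0, 1], is included as well (our addition).

```python
import math


def logistic(u):
    """Numerically stable logistic function F(u) = 1 / (1 + exp(-u))."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def _terms(beta, X, y):
    """Averages of the quadratic and cross terms of the logistic L2E loss."""
    n = len(y)
    quad = cross = 0.0
    for xi, yi in zip(X, y):
        p = logistic(sum(b * x for b, x in zip(beta, xi)))
        quad += p * p + (1.0 - p) ** 2           # F^2 + (1 - F)^2
        cross += yi * p + (1 - yi) * (1.0 - p)   # probability assigned to y_i
    return quad / n, cross / n


def l2e_logistic_loss(beta, w, X, y):
    """(w^2/n) sum[F^2 + (1-F)^2] - (2w/n) sum[y F + (1-y)(1-F)]."""
    quad, cross = _terms(beta, X, y)
    return w * w * quad - 2.0 * w * cross


def best_w(beta, X, y):
    """Minimizer of the loss in w for fixed beta: cross / quad, clipped to [0, 1]."""
    quad, cross = _terms(beta, X, y)
    return min(1.0, max(0.0, cross / quad))
```

For example, at β = 0 every fitted probability is 1/2, so with w = 1 the loss equals 1/2 − 1 = −1/2 whatever the data.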
16. Two-dimensional example
[Figure: scatter plot of the data in the (X1, X2) plane.]
n = 300 and p = 2.
Three clusters, each of size 100:
two are labelled 0,
one is labelled 1.
18. [Figure: fitted models for the two-dimensional example in the (X1, X2) plane. Panel (c): L2E B, ŵ = 0.666. Panel (d): L2E C, ŵ = 0.668.]
20. The optimization problem
Challenges
L2E loss is not convex.
Hessian of the L2E loss is indefinite.
Standard Newton-Raphson fails.
Scalability and stability as p increases?
Solution
Majorization-Minimization
21. Majorization-Minimization
Strategy
Minimize a surrogate function (the majorization).
Choose the surrogate such that
↓ surrogate ⟹ ↓ objective.
The surrogate is easier to minimize than the objective.
22. Majorization-Minimization
Definition
Given real-valued functions $f$ and $g$ on $\mathbb{R}^p$, $g$ majorizes $f$ at $x$ if
1. $g(x) = f(x)$ and
2. $g(u) \ge f(u)$ for all $u$.
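The definition can be checked on a toy problem. The sketch below (our own example, not from the talk) uses a nonconvex function whose second derivative is bounded above by L; the quadratic g(u | x) = f(x) + f'(x)(u − x) + (L/2)(u − x)² then majorizes f at x, and minimizing it in closed form gives the update x − f'(x)/L. The function and the bound L = 1.2 are illustrative assumptions.

```python
import math

L = 1.2  # upper bound on f''(x) = 0.2 - sin(x)


def f(x):
    """A nonconvex objective with bounded curvature."""
    return math.sin(x) + 0.1 * x * x


def f_prime(x):
    return math.cos(x) + 0.2 * x


def surrogate(u, x):
    """Quadratic majorizer of f at x: equals f at u = x and lies above f elsewhere."""
    return f(x) + f_prime(x) * (u - x) + 0.5 * L * (u - x) ** 2


def mm(x0, iters=50):
    """Majorization-minimization: repeatedly minimize the surrogate in closed form."""
    xs = [x0]
    for _ in range(iters):
        x = xs[-1]
        xs.append(x - f_prime(x) / L)
    return xs


path = mm(3.0)
values = [f(x) for x in path]
```

Each step lowers the surrogate, and because the surrogate sits above f and touches it at the current point, the objective values never increase, even though f is not convex.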
23–32. The spectrum of logistic models
[Figure, built up over slides 23–32: logistic models arranged along a lack-of-fit axis (more to less), with positions labelled "very bad", "optimal", and "less bad".]
33. Quadratic majorization of the logistic L2E loss
The loss has bounded curvature with respect to $\beta$. Fix $w$.
Majorize the exact second-order Taylor expansion:
$$\beta^{(m+1)} = \beta^{(m)} - \frac{1}{K}\,(X^TX)^{-1} X^T Z^{(m)},$$
where
$$K \ge \frac{1}{4} \max_{z \in [-1,1]} \left[ \frac{3}{2} w z^4 - z^3 - 2 w z^2 + z + \frac{w}{2} \right].$$
$K$ controls the step size. Its lower bound is related to the maximum curvature of the loss.
$Z^{(m)}$ is a working response that depends on $Y$ and $X\beta^{(m)}$.
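The lower bound on K can be evaluated numerically. The sketch below rests on our own reading of the slide: for fixed w > 0 the loss is rescaled by 1/(2w), which leaves the minimizer in β unchanged. Under that scaling the per-observation loss for y = 1 is ℓ(u) = (w/2)[F(u)² + (1 − F(u))²] − F(u), and its second derivative in u = xᵀβ equals the bracketed expression above with z = 2F(u) − 1 (the case y = 0 is the mirror image z → −z and has the same maximum). The code computes the bound on a grid and cross-checks it against finite differences of ℓ; grid sizes are arbitrary.

```python
import math


def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))


def curvature(z, w):
    """(1/4)[(3/2) w z^4 - z^3 - 2 w z^2 + z + w/2], the bracket on the slide."""
    return 0.25 * (1.5 * w * z**4 - z**3 - 2.0 * w * z**2 + z + 0.5 * w)


def k_lower_bound(w, grid=20001):
    """Maximum of the curvature over z in [-1, 1], found by grid search."""
    return max(curvature(-1.0 + 2.0 * i / (grid - 1), w) for i in range(grid))


def scaled_loss_y1(u, w):
    """Per-observation logistic L2E loss for y = 1, divided by 2w."""
    p = logistic(u)
    return 0.5 * w * (p * p + (1.0 - p) ** 2) - p


def max_fd_curvature(w, h=1e-3):
    """Largest central-difference second derivative of the scaled loss over u."""
    us = (-15.0 + 30.0 * i / 6000 for i in range(6001))
    return max(
        (scaled_loss_y1(u + h, w) - 2.0 * scaled_loss_y1(u, w) + scaled_loss_y1(u - h, w)) / h**2
        for u in us
    )


K1 = k_lower_bound(1.0)
```

For w = 1 the bound comes out at roughly 0.154, and the finite-difference maximum agrees with it, which supports the reconstruction of the bound.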
35. Continuous variable selection with the LASSO
Minimize
$$\text{``L2E loss''} + \lambda \sum_{i=1}^{p} |\beta_i|.$$
A penalized majorization of the loss majorizes the penalized loss. So minimize
$$\text{``majorization of L2E loss''} + \lambda \sum_{i=1}^{p} |\beta_i|.$$
36. Coordinate Descent
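Suppose $X$ is standardized. Then
$$\beta_k^{(m+1)} = S\!\left(\beta_k^{(m)} - \frac{1}{K} X_{(k)}^T Z^{(m)},\ \lambda\right),$$
where $X_{(k)}$ is the $k$th column of $X$ and $S$ is the soft-threshold function
$$S(x, \lambda) = \operatorname{sign}(x)\max(|x| - \lambda, 0).$$
Extension to the elastic net is straightforward.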
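The penalized majorization and the soft-threshold update can be combined into a runnable sketch. To keep it short, we swap in a simpler majorizer than the slide's XᵀX-weighted one: since each per-observation curvature is at most K, the Hessian of the (1/(2w)-rescaled) average loss is bounded by (K/n)‖X‖²_F times the identity. That isotropic quadratic majorizer plus the L1 penalty is minimized by one soft-thresholding step per coordinate (proximal gradient, i.e. ISTA, rather than coordinate descent). The gradient uses z = 2F(u) − 1 = tanh(u/2) and s = 2y − 1; the data, w, and λ are made up for illustration, and no intercept is fitted.

```python
import math
import random


def soft_threshold(x, lam):
    """S(x, lam) = sign(x) * max(|x| - lam, 0)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def k_bound(w, grid=2001):
    """Grid maximum of the slide-33 curvature bracket over z in [-1, 1]."""
    zs = (-1.0 + 2.0 * i / (grid - 1) for i in range(grid))
    return max(0.25 * (1.5 * w * z**4 - z**3 - 2.0 * w * z * z + z + 0.5 * w) for z in zs)


def loss_and_grad(beta, X, y, w):
    """Average logistic L2E loss divided by 2w, and its gradient in beta."""
    n, p = len(y), len(beta)
    loss, grad = 0.0, [0.0] * p
    for xi, yi in zip(X, y):
        u = sum(b * x for b, x in zip(beta, xi))
        z, s = math.tanh(0.5 * u), 2 * yi - 1
        loss += 0.25 * w * (1.0 + z * z) - 0.5 * (1.0 + s * z)
        dl_du = 0.25 * (w * z - s) * (1.0 - z * z)
        for k in range(p):
            grad[k] += dl_du * xi[k]
    return loss / n, [g / n for g in grad]


def penalized_mm(X, y, w, lam, iters=200):
    n, p = len(y), len(X[0])
    K = 1.01 * k_bound(w)                               # small margin over the grid maximum
    c = K * sum(x * x for row in X for x in row) / n    # isotropic curvature bound
    beta, history = [0.0] * p, []
    for _ in range(iters):
        loss, grad = loss_and_grad(beta, X, y, w)
        history.append(loss + lam * sum(abs(b) for b in beta))
        # minimize majorizer + L1 penalty: one soft-threshold per coordinate
        beta = [soft_threshold(b - g / c, lam / c) for b, g in zip(beta, grad)]
    return beta, history


rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(60)]
y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(row[0] + row[1]))) else 0 for row in X]
beta_hat, history = penalized_mm(X, y, w=1.0, lam=0.01)
```

Because each iterate minimizes a penalized majorizer, the recorded penalized objective never increases. Replacing the single bound c with per-coordinate bounds and cycling through coordinates would give a coordinate-descent variant closer to the slide; we have not attempted to reproduce the slide's exact scaling of λ.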
37. Heuristic Model Selection
Regularization Path
Calculate penalized regression coefficients for a range of λ values.
Information Criterion
For each λ, compute the deviance loss using the L2E coefficients and add a correction term (AIC or BIC).
Select the model with the lowest AIC/BIC value.
Use the number of nonzero penalized regression coefficients as the degrees of freedom [Zou et al., 2007].
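The selection heuristic can be sketched as follows, assuming 0/1 labels and the usual definitions AIC = deviance + 2·df and BIC = deviance + log(n)·df. The deviance is evaluated at the L2E coefficients and df counts nonzero coefficients, as the slide suggests. The regularization-path numbers below are hypothetical, not results from the talk.

```python
import math


def softplus(t):
    """log(1 + exp(t)), computed without overflow."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def logistic_deviance(beta, X, y):
    """-2 times the Bernoulli log-likelihood of 0/1 labels y under coefficients beta."""
    total = 0.0
    for xi, yi in zip(X, y):
        u = sum(b * x for b, x in zip(beta, xi))
        # -log F(u) = softplus(-u) and -log(1 - F(u)) = softplus(u)
        total += yi * softplus(-u) + (1 - yi) * softplus(u)
    return 2.0 * total


def degrees_of_freedom(beta):
    """Number of nonzero penalized coefficients, used as the df."""
    return sum(1 for b in beta if b != 0.0)


def select_lambda(path, n, criterion="BIC"):
    """path: list of (lam, deviance, df). Return the entry minimizing AIC or BIC."""
    per_df = 2.0 if criterion == "AIC" else math.log(n)
    return min(path, key=lambda entry: entry[1] + per_df * entry[2])


# Hypothetical regularization path (numbers made up for illustration)
path = [(1.0, 250.0, 1), (0.5, 200.0, 4), (0.3, 185.0, 8), (0.1, 190.0, 30)]
best_aic = select_lambda(path, n=200, criterion="AIC")
best_bic = select_lambda(path, n=200, criterion="BIC")
```

On this made-up path AIC picks λ = 0.3 while BIC, with its heavier per-parameter penalty, picks the sparser model at λ = 0.5.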
46. Simulations: Variable Selection
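$n = 200$, $p = 1000$
$X_i \mid \text{Group 1} \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma)$
$X_i \mid \text{Group 2} \overset{\text{i.i.d.}}{\sim} N(-\mu, \sigma)$
$\beta = (1, 1, 1, 1, 0, \ldots, 0)$
$Y_i \mid X_i \overset{\text{ind.}}{\sim} \text{Bern}\big(F(X_i^T\beta)\big)$
1,000 replicates.
Single outlier:
moved along a ray starting at the centroid of one group and moving away along $(1, 1, 1, 1, 0, \ldots, 0)$.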
47. Average number of correct variables selected
[Figure: two panels (AIC, BIC) plotting the average number of correct variables selected (0 to 4) against outlier relative position (0 to 9.5) for Expectation, MLE, L2E with w = 1, and L2E with w = w_opt.]
48. Average number of incorrect variables selected
[Figure: two panels (AIC, BIC) plotting the average number of incorrect variables selected (0 to 140) against outlier relative position (0 to 9.5) for Expectation, MLE, L2E with w = 1, and L2E with w = w_opt.]
51. Summary
MLE logistic regression is sensitive to implosion breakdown.
Estimation and variable selection are affected: contaminants
reduce SNR.
L2E is robust because it is zero-forcing.
Majorization-Minimization + Coordinate Descent facilitate
fast and stable optimization.
52. Future work
Is w worth optimizing over?
What is the correct AIC or BIC formulation?
What are the degrees of freedom in the L2E loss model?
53. References
D. W. Scott. Parametric statistical modeling by minimum integrated square error. Technometrics, 43(3):274–285, 2001.
A. Basu et al. Robust and efficient estimation by minimising a density power divergence. Biometrika, 85(3):549–559, 1998.
H. Zou et al. On the “degrees of freedom” of the lasso. Annals of Statistics, 35(5):2173–2192, 2007.