This document discusses automatic visualization of big data and outlier detection in high-dimensional data. It introduces several techniques for visualizing and analyzing big data, including aggregating data to reduce the number of rows/columns, projecting high-dimensional data to lower dimensions, and using core sets to represent large datasets. The document also discusses different types of outliers and algorithms for outlier detection in multivariate data. It notes challenges with visualizing outliers in high dimensions and outlines a process for outlier detection that includes normalization, dimensionality reduction if needed, and fitting an exponential distribution to distances between points.
Kaggle Higgs Boson Machine Learning ChallengeBernard Ong
What It Took to Score the Top 2% on the Higgs Boson Machine Learning Challenge. A journey into advanced machine learning models ensembles stacking methods.
H2O World 2015
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hakky St
This is the documentation of the study-meeting in lab.
Tha book title is "Hands-On Machine Learning with Scikit-Learn and TensorFlow" and this is the chapter 8.
What is boosting
Boosting algorithm
Building models using GBM
Algorithm main Parameters
Finetuning models
Hyper parameters in GBM
Validating GBM models
Winning Kaggle 101: Introduction to StackingTed Xiao
An Introduction to Stacking by Erin LeDell, from H2O.ai
Presented as part of the "Winning Kaggle 101" event, hosted by Machine Learning at Berkeley and Data Science Society at Berkeley. Special thanks to the Berkeley Institute of Data Science for the venue!
H2O.ai: http://www.h2o.ai/
ML@B: ml.berkeley.edu
DSSB: http://dssberkeley.org
BIDS: http://bids.berkeley.edu/
Abstract: This PDSG workshop introduces basic concepts of splitting a dataset for training a model in machine learning. Concepts covered are training, test and validation data, serial and random splitting, data imbalance and k-fold cross validation.
Level: Fundamental
Requirements: No prior programming or statistics knowledge required.
In this talk, Dmitry shares his approach to feature engineering which he used successfully in various Kaggle competitions. He covers common techniques used to convert your features into numeric representation used by ML algorithms.
Kaggle Higgs Boson Machine Learning ChallengeBernard Ong
What It Took to Score the Top 2% on the Higgs Boson Machine Learning Challenge. A journey into advanced machine learning models ensembles stacking methods.
H2O World 2015
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hakky St
This is the documentation of the study-meeting in lab.
Tha book title is "Hands-On Machine Learning with Scikit-Learn and TensorFlow" and this is the chapter 8.
What is boosting
Boosting algorithm
Building models using GBM
Algorithm main Parameters
Finetuning models
Hyper parameters in GBM
Validating GBM models
Winning Kaggle 101: Introduction to StackingTed Xiao
An Introduction to Stacking by Erin LeDell, from H2O.ai
Presented as part of the "Winning Kaggle 101" event, hosted by Machine Learning at Berkeley and Data Science Society at Berkeley. Special thanks to the Berkeley Institute of Data Science for the venue!
H2O.ai: http://www.h2o.ai/
ML@B: ml.berkeley.edu
DSSB: http://dssberkeley.org
BIDS: http://bids.berkeley.edu/
Abstract: This PDSG workshop introduces basic concepts of splitting a dataset for training a model in machine learning. Concepts covered are training, test and validation data, serial and random splitting, data imbalance and k-fold cross validation.
Level: Fundamental
Requirements: No prior programming or statistics knowledge required.
In this talk, Dmitry shares his approach to feature engineering which he used successfully in various Kaggle competitions. He covers common techniques used to convert your features into numeric representation used by ML algorithms.
Neural network Intuition
Neural network and vocabulary
Neural network algorithm
Math behind neural network algorithm
Building the neural networks
Validating the neural network model
Neural network applications
Image recognition using neural networks
Jan vitek distributedrandomforest_5-2-2013Sri Ambati
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
H2O World - Top 10 Deep Learning Tips & Tricks - Arno CandelSri Ambati
H2O World 2015 - Arno Candel
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Machine learning and linear regression programmingSoumya Mukherjee
Overview of AI and ML
Terminology awareness
Applications in real world
Use cases within Nokia
Types of Learning
Regression
Classification
Clustering
Linear Regression Single Variable with python
This talk was given at H2O World 2018 NYC and can be viewed here: https://youtu.be/xlTSk72QHbs
Speaker's Bio:
Leland Wilkinson is Chief Scientist at H2O and Adjunct Professor of Computer Science at the University of Illinois Chicago. He received an A.B. degree from Harvard in 1966, an S.T.B. degree from Harvard Divinity School in 1969, and a Ph.D. from Yale in 1975. Wilkinson wrote the SYSTAT statistical package and founded SYSTAT Inc. in 1984. After the company grew to 50 employees, he sold SYSTAT to SPSS in 1994 and worked there for ten years on research and development of visualization systems. Wilkinson subsequently worked at Skytree and Tableau before joining H2O. Wilkinson is a Fellow of the American Statistical Association, an elected member of the International Statistical Institute, and a Fellow of the American Association for the Advancement of Science. He has won best speaker award at the National Computer Graphics Association and the Youden prize for best expository paper in the statistics journal Technometrics. He has served on the Committee on Applied and Theoretical Statistics of the National Research Council and is a member of the Boards of the National Institute of Statistical Sciences (NISS) and the Institute for Pure and Applied Mathematics (IPAM). In addition to authoring journal articles, the original SYSTAT computer program and manuals, and patents in visualization and distributed analytic computing, Wilkinson is the author (with Grant Blank and Chris Gruber) of Desktop Data Analysis with SYSTAT. He is also the author of The Grammar of Graphics, the foundation for several commercial and opensource visualization systems (IBMRAVE, Tableau, Rggplot2, and PythonBokeh).
ODSC India 2018: Topological space creation & Clustering at BigData scaleKuldeep Jiwani
Every data has an inherent natural geometry associated with it. We are generally influenced by how the world visually appears to us and apply the same flat Euclidean geometry to data. The data geometry could be curved, may have holes, distances cannot be defined in all cases. But if we still impose Euclidean geometry on it, then we may be distorting the data space and also destroying the information content inside it.
In the space of BigData world we have to regularly handle TBs of data and extract meaningful information from it. We have to apply many Unsupervised Machine Learning techniques to extract such information from the data. Two important steps in this process is building a topological space that captures the natural geometry of the data and then clustering in that topological space to obtain meaningful clusters.
This talk will walk through "Data Geometry" discovery techniques, first analytically and then via applied Machine learning methods. So that the listeners can take back, hands on techniques of discovering the real geometry of the data. The attendees will be presented with various BigData techniques along with showcasing Apache Spark code on how to build data geometry over massive data lakes.
Neural network Intuition
Neural network and vocabulary
Neural network algorithm
Math behind neural network algorithm
Building the neural networks
Validating the neural network model
Neural network applications
Image recognition using neural networks
Jan vitek distributedrandomforest_5-2-2013Sri Ambati
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
H2O World - Top 10 Deep Learning Tips & Tricks - Arno CandelSri Ambati
H2O World 2015 - Arno Candel
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Machine learning and linear regression programmingSoumya Mukherjee
Overview of AI and ML
Terminology awareness
Applications in real world
Use cases within Nokia
Types of Learning
Regression
Classification
Clustering
Linear Regression Single Variable with python
This talk was given at H2O World 2018 NYC and can be viewed here: https://youtu.be/xlTSk72QHbs
Speaker's Bio:
Leland Wilkinson is Chief Scientist at H2O and Adjunct Professor of Computer Science at the University of Illinois Chicago. He received an A.B. degree from Harvard in 1966, an S.T.B. degree from Harvard Divinity School in 1969, and a Ph.D. from Yale in 1975. Wilkinson wrote the SYSTAT statistical package and founded SYSTAT Inc. in 1984. After the company grew to 50 employees, he sold SYSTAT to SPSS in 1994 and worked there for ten years on research and development of visualization systems. Wilkinson subsequently worked at Skytree and Tableau before joining H2O. Wilkinson is a Fellow of the American Statistical Association, an elected member of the International Statistical Institute, and a Fellow of the American Association for the Advancement of Science. He has won best speaker award at the National Computer Graphics Association and the Youden prize for best expository paper in the statistics journal Technometrics. He has served on the Committee on Applied and Theoretical Statistics of the National Research Council and is a member of the Boards of the National Institute of Statistical Sciences (NISS) and the Institute for Pure and Applied Mathematics (IPAM). In addition to authoring journal articles, the original SYSTAT computer program and manuals, and patents in visualization and distributed analytic computing, Wilkinson is the author (with Grant Blank and Chris Gruber) of Desktop Data Analysis with SYSTAT. He is also the author of The Grammar of Graphics, the foundation for several commercial and opensource visualization systems (IBMRAVE, Tableau, Rggplot2, and PythonBokeh).
ODSC India 2018: Topological space creation & Clustering at BigData scaleKuldeep Jiwani
Every data has an inherent natural geometry associated with it. We are generally influenced by how the world visually appears to us and apply the same flat Euclidean geometry to data. The data geometry could be curved, may have holes, distances cannot be defined in all cases. But if we still impose Euclidean geometry on it, then we may be distorting the data space and also destroying the information content inside it.
In the space of BigData world we have to regularly handle TBs of data and extract meaningful information from it. We have to apply many Unsupervised Machine Learning techniques to extract such information from the data. Two important steps in this process is building a topological space that captures the natural geometry of the data and then clustering in that topological space to obtain meaningful clusters.
This talk will walk through "Data Geometry" discovery techniques, first analytically and then via applied Machine learning methods. So that the listeners can take back, hands on techniques of discovering the real geometry of the data. The attendees will be presented with various BigData techniques along with showcasing Apache Spark code on how to build data geometry over massive data lakes.
Prepared as a conference tutorial, MIC-Electrical, Athens, Greece, 5th April 2014, updated and delivered again in Beijing, China, 27 January 2015 to students from Complex Systems Group, CSRC and Dept. of Engineering Physics, Tsinghua University
Machine Learning Foundations for Professional ManagersAlbert Y. C. Chen
20180526@Taiwan AI Academy, Professional Managers Class.
Covering important concepts of classical machine learning, in preparation for deep learning topics to follow. Topics include regression (linear, polynomial, gaussian and sigmoid basis functions), dimension reduction (PCA, LDA, ISOMAP), clustering (K-means, GMM, Mean-Shift, DBSCAN, Spectral Clustering), classification (Naive Bayes, Logistic Regression, SVM, kNN, Decision Tree, Classifier Ensembles, Bagging, Boosting, Adaboost) and Semi-Supervised learning techniques. Emphasis on sampling, probability, curse of dimensionality, decision theory and classifier generalizability.
Transition-based Dependency Parsing with Selectional BranchingJinho Choi
We present a novel approach, called selectional branching, which uses confidence estimates to decide when to employ a beam, providing the accuracy of beam search at speeds close to a greedy transition-based dependency parsing approach. Selectional branching is guaranteed to perform a fewer number of transitions than beam search yet performs as accurately. We also present a new transition-based dependency parsing algorithm that gives a complexity of O(n) for projective parsing and an expected linear time speed for non-projective parsing. With the standard setup, our parser shows an unlabeled attachment score of 92.96% and a parsing speed of 9 milliseconds per sentence, which is faster and more accurate than the current state-of-the-art transition- based parser that uses beam search.
Cluster analysis is a major tool in a number of applications in many fields of Business, Engineering & etc.(The odoridis and Koutroubas, 1999):
Data reduction.
Hypothesis generation.
Hypothesis testing.
Prediction based on groups.
Mathematics (from Greek μάθημα máthēma, “knowledge, study, learning”) is the study of topics such as quantity (numbers), structure, space, and change. There is a range of views among mathematicians and philosophers as to the exact scope and definition of mathematics
Similar to Automatic Visualization - Leland Wilkinson, Chief Scientist, H2O.ai (20)
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Sri Ambati
Sandeep Singh, Head of Applied AI Computer Vision, Beans.ai
H2O Open Source GenAI World SF 2023
In the modern era of machine learning, leveraging both open-source and closed-source solutions has become paramount for achieving cutting-edge results. This talk delves into the intricacies of seamlessly integrating open-source Large Language Model (LLM) solutions like Vicuna, Falcon, and Llama with industry giants such as ChatGPT and Google's Palm. As the demand for fine-tuned and specialized datasets grows, it is imperative to understand the synergy between these tools. Attendees will gain insights into best practices for building and enriching datasets tailored for fine-tuning tasks, ensuring that their LLM projects are both robust and efficient. Through real-world examples and hands-on demonstrations, this talk will equip attendees with the knowledge to harness the power of both open and closed-source tools in a coherent and effective manner.
Patrick Hall, Professor, AI Risk Management, The George Washington University
H2O Open Source GenAI World SF 2023
Language models are incredible engineering breakthroughs but require auditing and risk management before productization. These systems raise concerns about toxicity, transparency and reproducibility, intellectual property licensing and ownership, disinformation and misinformation, supply chains, and more. How can your organization leverage these new tools without taking on undue or unknown risks? While language models and associated risk management are in their infancy, a small number of best practices in governance and risk are starting to emerge. If you have a language model use case in mind, want to understand your risks, and do something about them, this presentation is for you!
Dr. Alexy Khrabrov, Open Source Science Community Director, IBM
H2O Open Source GenAI World SF 2023
In this talk, Dr. Alexy Khrabrov, recently elected Chair of the new Generative AI Commons at Linux Foundation for AI & Data, outlines the OSS AI landscape, challenges, and opportunities. With new models and frameworks being unveiled weekly, one thing remains constant: community building and validation of all aspects of AI is key to reliable and responsible AI we can use for business and society needs. Industrial AI is one key area where such community validation can prove invaluable.
Michelle Tanco, Head of Product, H2O.ai
H2O Open Source GenAI World SF 2023
Learn how the makers at H2O.ai are building internal tools to solve real use cases using H2O Wave and h2oGPT. We will walk through an end-to-end use case and discuss how to incorporate business rules and generated content to rapidly develop custom AI apps using only Python APIs.
Applied Gen AI for the Finance Vertical Sri Ambati
Megan Kurka, Vice President, Customer Data Scientist, H2O.ai
H2O Open Source GenAI World SF 2023
Discover the transformative power of Applied Gen AI. Learn how the H2O team builds customized applications and workflows that integrate capabilities of Gen AI and AutoML specifically designed to address and enhance financial use cases. Explore real world examples, learn best practices, and witness firsthand how our innovative solutions are reshaping the landscape of finance technology.
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Sri Ambati
Pascal Pfeiffer, Principal Data Scientist, H2O.ai
H2O Open Source GenAI World SF 2023
This talk dives into the expansive ecosystem of Large Language Models (LLMs), offering practitioners an insightful guide to various relevant applications, from natural language understanding to creative content generation. While exploring use cases across different industries, it also honestly addresses the current limitations of LLMs and anticipates future advancements.
Introducción al Aprendizaje Automatico con H2O-3 (1)Sri Ambati
En esta reunión virtual, damos una introducción a la plataforma de aprendizaje automático de código abierto número 1, H2O-3 y te mostramos cómo puedes usarla para desarrollar modelos para resolver diferentes casos de uso.
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...Sri Ambati
Numerai is an open, crowd-sourced hedge fund powered by predictions from data scientists around the world. In return, participants are rewarded with weekly payouts in crypto.
In this talk, Joe will give an overview of the Numerai tournament based on his own experience. He will then explain how he automates the time-consuming tasks such as testing different modelling strategies, scoring new datasets, submitting predictions to Numerai as well as monitoring model performance with H2O Driverless AI and R.
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...Sri Ambati
In this session, you will learn about what you should do after you’ve taken an AI transformation baseline. Over the span of this session, we will discuss the next steps in moving toward AI readiness through alignment of talent and tools to drive successful adoption and continuous use within an organization.
To find additional videos on AI courses, earn badges, join the courses at H2O.ai Learning Center: https://training.h2o.ai/products/ai-foundations-course
To find the Youtube video about this presentation: https://youtu.be/K1Cl3x3rd8g
Speaker:
Chemere Davis (H2O.ai - Senior Data Scientist Training Specialist)
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
2. Visualizing Big Data
• Complexity: Many functions are polynomial or exponential
• Curse of Dimensionality: distances tend toward constant as
• Chokepoint: Cannot send big data over the wire
• Real Estate: Cannot plot big data on the client
• Cheesy solutions in 2D
• Pixelate (too complex for higher dimensions)
• Project (usually violates triangle inequality for )
• Image maps (OK for popups and simple links, not for EDA)
• Viable solutions
• Aggregate (big n) to a few thousand rows
• Project (big p) to a few dozen columns
8. • An anomaly is an observation inconsistent with a set of beliefs.
• The anomaly depends on these beliefs
• An outlier is an observation inconsistent with a set of points.
• The points are presumed generated by a probabilistic process in a vector space.
• All outliers are anomalies but not all anomalies are outliers
• Some anomalies are logical or mathematical
• Outliers are probabilistic
• Outlier detection has more than a 200 year history.
• The goal was to reduce bias in models
• The goal today is to learn interesting stuff from examining outliers
• Statisticians no longer delete outliers. They use robust methods.
Outliers
9. Outliers
• Barnett & Lewis (1994), Outliers in Statistical Data.
• Rousseeuw & Leroy (1987). Robust Regression & Outlier Detection.
• Hartigan (1975) Clustering Algorithms.
Beauty is truth, truth beauty,—that is all
Ye know on earth, and all ye need to know.
12. Outliers
1. Map categorical variables to continuous values (SVD).
2. If p large, use random projections to reduce dimensionality.
3. Normalize columns on [0, 1]
4. If n large, aggregate
• If p = 2, you could use gridding or hex binning
• But general solution is based on Hartigan’s Leader algorithm
5. Compute nearest neighbor distances between points.
6. Fit exponential distribution to largest distances.
7. Reject points in upper tail of this distribution.
16. Scagnostics
• We characterize a scatterplot (2D point set) with nine measures.
• We base our measures on three geometric graphs.
• Convex Hull
• Alpha Shape
• Minimum Spanning Tree
18. Scagnostics
X
Shape
13
Shape
2) Convex: ratio of area of alpha shape to the area of convex hull.
3) Skinny: ratio of perimeter to area of the alpha shape.
4) Stringy: ratio of diameter of MST to length of MST. Similar to skinny.
The diameter of a graph is the longest shortest path between a pair of its vertices.
Convex: area of alpha shape divided by area of convex hull
Skinny: ratio of perimeter to area of the alpha shape
Stringy: ratio of 2-degree vertices in MST to number of vertices > 1-degree
19. Scagnostics
X
Density
Skewed: ratio of (Q90 - Q50) / (Q90 - Q10),
where quantiles are on MST edge lengths
15
Density
7) Skewed: ratio of (Q90 - Q50) / (Q90 - Q10), where the quantiles are taken from the
MST edge lengths.
8) Clumpy: 1 minus the ratio of the longest edge in the largest runt (blue) to the
length of runt cutting edge (red).
The Hartigan RUNT statistic for a node of a hierarchical clustering tree is the
smaller of the number of leaves owned by each of its two children. We derive this
for each vertex in the MST using an edge-cutting algorithm.
largest runt
longest edge
in runt
Clumpy: 1 minus the ratio of the longest edge in the largest runt (blue) to the
length of runt-cutting edge (red)
15
Density
7) Skewed: ratio of (Q90 - Q50) / (Q90 - Q10), where the quantiles are taken from the
MST edge lengths.
8) Clumpy: 1 minus the ratio of the longest edge in the largest runt (blue) to the
length of runt cutting edge (red).
The Hartigan RUNT statistic for a node of a hierarchical clustering tree is the
smaller of the number of leaves owned by each of its two children. We derive this
for each vertex in the MST using an edge-cutting algorithm.
largest runt
longest edge
in runt
Outlying: proportion of total MST length due to edges adjacent to outliers
20. Scagnostics
X
Density
Sparse: 90th percentile of distribution of edge lengths in MST
Striated: proportion of all vertices in the MST that are degree-2 and have a
cosine between adjacent edges less than -.75
26. Future Plans
1. Add brushing to graphics
2. Create case-weight vector for DAI (0 = exclude)
3. Suggest additional features to pass to DAI
4. Animate visualizations
5. Add natural language explanations to graphics.