This document describes how raw clickstream data from an e-commerce site was prepared to answer predictive modeling questions about the site's visitors. The raw data is cleaned and enriched by creating new variables and by aggregating or grouping existing ones, including session history, click history, and registration information. The pre-processing grows the number of variables from 220 to 450 while dividing the large dataset into more targeted segments for analysis. Preliminary CART modeling finds that the first click in a session is highly predictive, underscoring the value of the extensive data preparation.
2. Question 1: Given a set of page views, will the visitor view another page on the site or will the visitor leave?
Question 2: Given a set of page views, which product will the visitor view in the remainder of the session?
Question 3: Given a set of purchases over a period of time, characterize visitors who spend more than $12 (order amount) on an average order at the site.
Questions 4 and 5: insight versions of questions 1 and 2
3. Gazelle.com is a leg-wear and leg-care web retailer
Soft-launch: Jan 30, 2000
Hard-launch: Feb 29, 2000
◦ With an Ally McBeal TV ad on the 28th and a strong $10-off promotion
Training set: 2 months
Test sets: one month (split into two test sets)
6. Web Application Server:
◦ Takes care of sessionizing (a unique session ID is assigned to each user's session)
◦ Takes care of registration and logging in (a unique customer ID is assigned to each registered user)
◦ Uses dynamic HTML, so a unique page view is identified via a combination of the page view template (*.jhtml or *.jsp) and query parameters (product ID, vendor ID, assortment ID, etc.)
All data supplied come directly from the web application server logs
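For illustration only, a minimal Python sketch of this idea: a static page-view identifier built from the template path plus selected query parameters (the parameter names are hypothetical, not the site's actual ones):

    from urllib.parse import urlparse, parse_qs

    def page_view_id(url: str) -> str:
        """Identify a page view by its template (*.jhtml / *.jsp) plus
        selected query parameters."""
        parts = urlparse(url)
        params = parse_qs(parts.query)
        keys = ("product_id", "vendor_id", "assortment_id")   # hypothetical names
        suffix = "&".join(f"{k}={params[k][0]}" for k in keys if k in params)
        return f"{parts.path}?{suffix}" if suffix else parts.path

    # e.g. page_view_id("/product.jhtml?product_id=123&vendor_id=7")
    #      -> "/product.jhtml?product_id=123&vendor_id=7"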
7. Acxiom enhancements: age, gender, marital status, vehicle lifestyle, own/rent, etc.
Keynote records (about 250,000) were removed. They hit the home page 3 times a minute, 24 hours a day
Personal information was removed, including names, addresses, logins, credit cards, phones, host name/IP, and verification question/answer. Cookies and e-mail addresses were obfuscated
Test users were removed based on multiple criteria (e.g. credit card number) not available to competitors
Original data and aggregated data (to the session level) were provided
8. CLICKS
◦ Contains click-stream information
◦ Each record is a page view
◦ Basis for questions 1 and 2
◦ Each sequence of clicks forms a session
◦ The session continues for any page view except for the last
ORDER LINES
◦ Contains order information
◦ Each record is an order line
◦ An order is a collection of order lines with the same order ID
◦ Basis for question 3
9. Session: Session ID
Sequence: Sequence number of the click
SessckID: Session cookie ID
Visitnum: Session visit count (from the cookie)
Proctime: Request processing time
Npage: Session length (in clicks)
Sesslen: Session length (in seconds)
Usragent: Session user agent
Sessref: Session referrer
Date and time variables
10. Contlvl*: Page view template
Prodlvl*: Product for product templates
Asslvl*: Assortment for other than product templates
Final: Last page in this session
Refcont*, Refasrt*, Refprod*: Referring page content, assortment, product
Weekday, hour, date: Day, hour, and date variables
Other auxiliary variables
11. Brand: Brand name (leg-wear products)
Maker: Product maker
Audience: Product audience
Basorfas: Basic or fashion
Prodform: Product form
Look: Product look
Length, size: Length, size, depth, etc.
Collect: Collection
Texture: Texture
Over 40 different variables, all highly missing
12. CustID: Customer ID
Nfail: Number of failed logins
Sesslcnt: Session login count
Account creation date/time variables
13. Email: User's e-mail address
Freqwear: What do you wear most frequently?
Howfind: How did you find us?
Legcare: Your favorite leg care brand
Sendmail: Allow sending solicitation e-mails
Nadult: Number of adults
Nkids: Number of kids
State: Residency state
19 variables in total, all significantly missing
14. Owntruck, Own***: Truck owner, RV owner, etc.
Ownbkcrd, Own*Crd: Bank card holder, gas card holder, etc.
Age: Age
Marital: Marital status
Mailresp: Mail responder
Income: Estimated income
Pool: Presence of pool
61 variables in total, all highly missing
15. Detailed understanding of all initial variables (this took nearly 50% of total project time!!!)
Creating new predictors (features):
◦ Slicing a variable into a set of key dimensions
◦ Combining different levels into logical groups to reduce the total number of categories
◦ Combining a set of variables into one informative dimension
◦ Creating new features to account for different layers of aggregation (CLICKS vs. SESSIONS vs. ORDERS vs. USERS)
Developing the master KEEP list:
◦ Separating "illegal" predictors from "legitimate" ones
◦ Removing "useless" predictors (duplicates, nearly unary, extremely missing)
16. Possibly dividing the large CLICKS database into logical segments (Registered Users vs. Unregistered, Short Sessions vs. Long Sessions), with subsequent separate analyses and KEEP lists within each segment
Defining the right CART model set-up (especially for PRIORS and COSTS)
Running different CART models, analyzing the performance, and revisiting all of the steps above to develop/test/reject new features
For questions 1 and 2, choose the models with the highest overall score (adjusted for the evaluation criteria)
For question 3, learn as much as possible from all of the above
17. SESSION REFERRER (SESSREF)
◦ Carries extremely useful information about where the user was immediately before initiating a GAZELLE session
◦ In its raw form practically useless (too many levels)
SESSION USER AGENT (USRAGENT)
◦ Provides detailed information about the user's browser, including operating system and AOL/MSN connection
◦ Helps in identifying "artificial" users (ROBOTS)
◦ Again, practically useless in its raw form
18. Referring Host (REFWEB) is one of the dimensions extracted after slicing the referrer
◦ Still has thousands of distinct levels (how many web servers are out there?!!)
◦ Want to simplify it for more informative use
◦ The same service may appear under a variety of different host names
New logical groups of REFWEB:
◦ Search engines (Yahoo, Excite, Google, etc.)
◦ Fashion sites (Fashion Mall, Shop Now)
◦ Bargain sites (Free Gifts, My Coupons, etc.)
◦ "Specialty sites" (Winnie-Cooper!!!)
◦ NULL (session was initiated via a bookmark or by directly typing the URL)
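A minimal sketch of such a grouping in Python; the host fragments listed below are illustrative, not the project's actual lists:

    REFWEB_GROUPS = {
        "search": ["yahoo", "excite", "google", "altavista"],
        "fashion": ["fashionmall", "shopnow"],
        "bargain": ["freegifts", "mycoupons"],
        "specialty": ["winnie-cooper"],
    }

    def refweb_group(host: str) -> str:
        """Map a raw referring host to a logical REFWEB group."""
        if not host:                      # bookmark or directly typed URL
            return "NULL"
        host = host.lower()
        for group, patterns in REFWEB_GROUPS.items():
            if any(p in host for p in patterns):
                return group
        return "other"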
19. Answer: Winnie-Cooper is a 31-year-old guy who wears pantyhose and has a pantyhose site. 8,700 visitors came from his site(!)
We might, and should, expect different behavior from "Winnie-Cooper" users than from everyone else
20. All PRODLVL*, CONTLVL*, and ASSLVL* variables turned out to be nearly useless for direct modeling and awkward for interpretation
PRODLVL1-PRODLVL3 represent different path levels in the file system that point to individual product information
It is reasonable to combine all three paths into a unique product descriptor PRODP
Similarly, generate unique assortment and content descriptors CONTP and ASSP
Finally, combine all three descriptors into a single page view descriptor (the static equivalent of dynamic HTML): VIEWCAT, an extremely useful interpretation variable
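A sketch of the combination step, assuming the level variables live in a pandas DataFrame with the column names from the slides; the joining rule ("/" between non-missing levels) is an assumption:

    import pandas as pd

    def combine_levels(df: pd.DataFrame, prefix: str, out: str) -> pd.DataFrame:
        """Concatenate the non-missing path levels sharing a prefix
        (e.g. PRODLVL1..PRODLVL3) into one descriptor column."""
        cols = [c for c in df.columns if c.startswith(prefix)]
        df[out] = (
            df[cols].fillna("").astype(str).agg("/".join, axis=1).str.strip("/")
        )
        return df

    # usage on the CLICKS table (hypothetical DataFrame named `clicks`):
    # clicks = combine_levels(clicks, "PRODLVL", "PRODP")
    # clicks = combine_levels(clicks, "CONTLVL", "CONTP")
    # clicks = combine_levels(clicks, "ASSLVL", "ASSP")
    # clicks["VIEWCAT"] = clicks[["CONTP", "ASSP", "PRODP"]].agg("|".join, axis=1)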
22. Adding click history
◦ 1 page back, 2 pages back, 3 pages back, etc.
◦ Dummies indicating whether a given "epoch" page (home page, registration page, Donna Karan, etc.) has already been viewed prior to this click in the current session
◦ Counting the number of views of the selected "epoch" pages up to this click in the session
Adding session history
◦ Identifying previous sessions based on either USERID (registered users) or COOKIE (unregistered users)
◦ Collecting history features from the previous sessions (first visit, ordered ever, ordered previously, viewed Donna Karan products before, etc.)
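One possible implementation of the click-history features with pandas; the CLICKS table is assumed to have SESSION, SEQUENCE, and VIEWCAT columns, and the "epoch" page labels are hypothetical:

    import pandas as pd

    EPOCH_PAGES = ["home", "registration", "donna_karan"]   # hypothetical labels

    def add_click_history(clicks: pd.DataFrame) -> pd.DataFrame:
        """Add back-click and 'epoch page' history features to CLICKS."""
        clicks = clicks.sort_values(["SESSION", "SEQUENCE"]).copy()
        grp = clicks.groupby("SESSION")

        # what was viewed 1, 2, 3 clicks back in the same session
        for k in (1, 2, 3):
            clicks[f"VIEWCAT_BACK{k}"] = grp["VIEWCAT"].shift(k)

        # for each epoch page: seen before this click? how many times?
        for page in EPOCH_PAGES:
            hit = (clicks["VIEWCAT"] == page).astype(int)
            prior = hit.groupby(clicks["SESSION"]).cumsum() - hit
            clicks[f"SEEN_{page}"] = (prior > 0).astype(int)
            clicks[f"CNT_{page}"] = prior
        return clicks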
23. Adding registration history
◦ CUSTID is only defined for the session in which the user logged in explicitly
◦ Using COOKIEID, it is possible to approximately identify anonymous sessions that belong to a registered user
◦ Define REGISTEV=YES for any session that was initiated by a registered user (even prior to the registration event)
◦ This also gives rise to additional related features (registered previously, has yet to register, etc.)
Aggregating order lines
◦ Mostly for question 3: summarizing order-line characteristics to the ORDERS and USER levels (buys socks, buys leg-care, buys black, buys fashion, etc.)
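A sketch of the REGISTEV flag, assuming CLICKS is a pandas DataFrame with the CUSTID and SESSCKID columns described earlier:

    import pandas as pd

    def flag_registered_ever(clicks: pd.DataFrame) -> pd.DataFrame:
        """Mark every session whose cookie also appears in a logged-in session."""
        registered_cookies = set(
            clicks.loc[clicks["CUSTID"].notna(), "SESSCKID"].dropna()
        )
        clicks = clicks.copy()
        clicks["REGISTEV"] = clicks["SESSCKID"].isin(registered_cookies).map(
            {True: "YES", False: "NO"}
        )
        return clicks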
24. The initial CLICKS database had about 900,000 records and 220 variables
After filtering and adding new features, the number of variables grew to 450
Dividing CLICKS into segments seems justifiable
A CART run with DEPTH=2 reveals that SEQUENCE=1 is the root splitter for both questions 1 and 2
There is something special about the first click!
25. (insert tables)
Conclusion: usually the first click also becomes the last (come and leave!)
26. Again, running CART with DEPTH=2 on SEQUENCE>1 shows that the next split separates ever-registered users from non-registered users
Median session length (after removing lengths of 1):
◦ Never registered: 8
◦ Registered at some point: 26
Naturally, a registered user will have a longer session than a non-registered user
Similarly, CART finds additional splits on SEQUENCE=2, SEQUENCE=[3,4,5], and SEQUENCE>5
28. The complete CLICKS data set should be used for training to exploit all available information
However, the evaluation criterion for question 1 refers to the SESSION level: will the SESSION continue?
Prior probabilities should be set manually to SESSION-level values to adjust CART to the evaluation criterion
Since we have 5 different partitions of the CLICKS database, 5 different sets of prior probabilities must be specified
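A hedged sketch of the idea: compute class frequencies with one record per SESSION rather than one per click, and enter those values manually as the CART priors. The target column name and the choice of representative record per session are assumptions:

    import pandas as pd

    def session_level_priors(clicks: pd.DataFrame, target: str = "CONT") -> dict:
        """Class frequencies computed with one record per SESSION, so that
        each session (not each click) carries equal weight."""
        one_per_session = clicks.groupby("SESSION")[target].last()
        return one_per_session.value_counts(normalize=True).to_dict()

    # click-level frequencies (what CART would use by default), for comparison:
    # clicks["CONT"].value_counts(normalize=True)
    # one set of session-level priors would be specified per CLICKS segment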
29. (insert image)
The "majority rule" is very hard to beat!
30. Checking rules for the right child of the root split
(insert image)
The root split separates crawlers, robots, and unusual browsers
31. Node Report: further insight into the root splitter
(insert image)
The root splitter is very powerful
The root splitter is also quite "unique"
32. Checking the second split
(insert image)
The second split distinguishes ever-registered users from anonymous users
33. (insert image)
This node has the largest probability of exit
This segment gives the best predictive power
35. (insert image)
Still quite difficult to predict!
36. Again, the root split separates "killer" pages from "killing" pages!
(insert image)
This variable might be difficult to interpret
CONTP1 could be used instead, which is much easier to interpret
37. (insert image)
The tree is large, yet it is extremely difficult to predict!
38. (insert image)
Still a very hard prediction problem
39. QUESTION: Given a set of page views, which product brand (Hanes, Donna Karan, American Essentials, or None) will the visitor view in the remainder of the session?
Evaluation criterion:
◦ 2 units if the session visited the predicted brand;
◦ 1 unit if the session did not visit any of the three brands and the prediction was None;
◦ 0 units otherwise;
◦ All sessions of length 1 will be excluded
40. For the given (truncated) session, 8 outcomes are possible in the remainder of the session:
◦ None of the brands visited (O)
◦ Only Hanes visited (H)
◦ Only Donna Karan visited (D)
◦ Only American Essentials visited (A)
◦ Only Hanes and Donna Karan (HD)
◦ Only Donna Karan and American Essentials (DA)
◦ Only Hanes and American Essentials (HA)
◦ All three visited (AHD)
"Single event" outcomes will be used directly; "double or triple" outcomes must be converted to a "single" level
Thus we have an 8-level target that should be mapped into 4 distinct levels for final prediction and scoring
41. Number of sessions in the clipped click stream, by outcome:
Outcome: # sessions
O: 72,269
H: 4,417
D: 3,964
A: 2,644
HD: 325
DA: 20
HA: 153
AHD: 8
Total: 90,800
Only a few sessions result in a "double or triple" outcome; these are converted to a single brand level using conversion rules defined by the dominant class
42. Costs must be used to incorporate the evaluation criterion
(insert table)
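One plausible way to turn the gain criterion from the previous slide into a misclassification-cost matrix, written here as Python rather than SPM syntax; the translation cost(true, pred) = best attainable gain for the true class minus the gain actually earned is an assumption:

    BRANDS = ["H", "D", "A"]
    LEVELS = BRANDS + ["O"]          # O = none of the three brands

    def gain(true: str, pred: str) -> int:
        if pred in BRANDS and pred == true:
            return 2                 # the predicted brand was visited
        if pred == "O" and true == "O":
            return 1                 # correctly predicted "none"
        return 0

    # cost(true, pred) = best attainable gain for `true` minus the gain earned
    cost = {t: {p: gain(t, t) - gain(t, p) for p in LEVELS} for t in LEVELS}

    # e.g. cost["H"]["H"] == 0, cost["H"]["O"] == 2, cost["O"]["H"] == 1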
43. The segmentation is done using the same technique that was used for the Question 1 segmentation
(insert image)
44. First we try the GINI splitting rule
(insert image)
The tree is big, but the accuracy is low
45. Now let's try TWOING
(insert image)
All red nodes predict NONE
Smaller tree, better accuracy
46. Now focus on DONNA views
(insert image)
Now all red nodes predict DONNA
47. Variable Importance clarifies which variables have the largest predictive power
(insert image)
48. Using the TWOING splitting rule
(insert image)
Short sessions are the easiest to predict
49. For SEQUENCE=5 and above
(insert image)
Longer sessions are becoming quite challenging
50. In the evaluation, each session with at least 2 clicks is randomly clipped to a shorter length
This means that a session of length T>1 is clipped to length S with probability 1/(T-1) for S=1,…,T-1
For each terminal node in a CART tree, the training cases must be weighted by the appropriate clipping probability when calculating the within-node probabilities
Predict OTHER if its revised probability was more than twice that of the highest-probability brand; otherwise, the highest-probability brand was predicted
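A sketch of both pieces under the stated assumptions: the clipping weight for a session of recorded length T, and the node-level decision rule applied to the revised probabilities:

    def clip_weight(T: int) -> float:
        """A session of length T > 1 is clipped to any particular length
        S in 1..T-1 with probability 1/(T-1); within-node probabilities
        are computed as normalised sums of these weights."""
        return 1.0 / (T - 1)

    def node_prediction(probs: dict) -> str:
        """probs: revised within-node probabilities, e.g.
        {"H": 0.10, "D": 0.05, "A": 0.02, "O": 0.83}."""
        best_brand = max(("H", "D", "A"), key=lambda b: probs[b])
        if probs["O"] > 2 * probs[best_brand]:
            return "O"            # predict OTHER (none of the brands)
        return best_brand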
51. Characterize visitors who spend more than $12 on an average order at the site
Small dataset of 3,465 purchases from 1,831 customers
Insight question: no test set
Submission requirement:
◦ Report of up to 1,000 words and 10 graphs
◦ Business users should be able to understand the report
◦ Observations should be correct and interesting ("average order tax > $2 implies heavy spender" is neither interesting nor actionable)
58. Orders come from different cities:
◦ 80% of orders coming from San Francisco and Chicago are heavy spenders
◦ 40% of orders coming from New York are heavy spenders
◦ Orders coming from elsewhere have only 25% heavy spenders
Color makes the difference: buying black products implies a heavy spender
Color is also related to city: orders from large cities have a higher percentage of black products
59. Leg-care products are more expensive
◦ 75% of leg-care orders were above the $12 threshold
◦ Only 25% of leg-wear orders were above $12
Pantyhose are more expensive than socks
Hanes and Donna Karan imply heavy spenders
American Essentials implies low spenders
60. Referrals from Shopnow or Fashion Mall imply heavy spenders, whereas MyCoupons referrals are low spenders
A work dress style of "business casual" or "business" implies a heavy spender
Sunday and Monday are heavy-spender days
Income makes a difference, but not much:
◦ 40% of very-high-income users are high spenders
◦ 32% of very-low-income users are also high spenders
◦ Only 25% of everyone else are high spenders
61. AOL users tend to spend less
◦ 20% of AOL users are high spenders
◦ 29% of the remaining users are high spenders
◦ This might also be explained by the lack of testing of the GAZELLE site on AOL browsers (incompatibility issues)
Luxury vehicle implies heavy spender (slightly)
65. The parts marked in red might safely be removed
Normally we want to remove all graphic content requests, such as:
◦ GIF and JPG files
◦ Other unnecessary content
This may reduce the size of the raw web log by up to 5 times
The resulting web log then contains only the most important pieces of information
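A minimal filtering sketch, assuming a plain-text access log; the extension list goes slightly beyond GIF/JPG to cover other static decoration, and the file names are placeholders:

    import re

    # requests for images and other static decoration (extension list is illustrative)
    SKIP = re.compile(r"\.(gif|jpe?g|png|css|js|ico)(\?|\s|$)", re.IGNORECASE)

    def strip_graphics(path_in: str, path_out: str) -> None:
        with open(path_in) as raw, open(path_out, "w") as clean:
            for line in raw:
                if not SKIP.search(line):
                    clean.write(line)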
67. The clean web log is still not suitable for any data processing, since each row is basically just a string of characters
Need to convert each legitimate line into a delimited list of data fields
Want to choose a delimiter that never occurs in the raw web log
Will have to drop all corrupt log entries
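A sketch for logs in the NCSA combined format (the Gazelle logs came from an application server, so treat the pattern as illustrative); lines that do not match are treated as corrupt and dropped, and a tab is used as the delimiter because it does not occur in the fields:

    import re

    # NCSA combined log format: host, identity, user, time, request, status,
    # bytes, referrer, user agent
    LINE = re.compile(
        r'(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"'
    )

    def to_delimited(path_in: str, path_out: str, sep: str = "\t") -> tuple:
        kept = dropped = 0
        with open(path_in) as src, open(path_out, "w") as dst:
            for line in src:
                m = LINE.match(line)
                if m:
                    dst.write(sep.join(m.groups()) + "\n")
                    kept += 1
                else:
                    dropped += 1          # corrupt entry
        return kept, dropped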
69. Each line (entry) in a web log corresponds to a single resource request
A user normally issues a set of logically connected requests called a SESSION
Multiple users may share the same time frame, resulting in intermixed log entries
The HTTP protocol is MEMORYLESS, so we need to solve the problem of identifying different sessions
70. Using COOKIES to mark the client's station
◦ Might be disabled by "paranoid" clients
◦ Might be deleted or "exhausted"
Using URL encoding
◦ Requires dynamic HTML (ASP, JSP, Servlets)
Using pure web log heuristics
◦ Mostly matching on the IP address, user agent, and referrer fields
◦ May be done on any server that supports the extended log format
◦ Somewhat imprecise in identifying sessions under certain "unfavorable" conditions
71. Identifying the END OF SESSION event
◦ The widely used 30-minute standard does not always work
Proxy servers
◦ Multiple users may share the same IP address
◦ Cached requests are "forever lost" to the server's log
Dynamic IP addresses
◦ A single user might have different IP addresses within the same session
Spiders and robots
◦ Completely violate any human "logic" and may generate a lot of "false" or "huge" sessions
Smart heuristic programming may reduce the ambiguity down to as low as 5%
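A toy sessionizer combining two of the heuristics above: requests are grouped by (IP address, user agent) and a new session starts after a 30-minute gap. Real implementations also use the referrer field and robot filtering, so this is only a sketch:

    from datetime import timedelta

    TIMEOUT = timedelta(minutes=30)

    def sessionize(entries: list) -> list:
        """entries: dicts with 'ip', 'agent', 'time' (datetime), sorted by time.
        Adds a 'session' number to each entry."""
        last_seen = {}          # (ip, agent) -> (session_id, last_time)
        next_id = 0
        for e in entries:
            key = (e["ip"], e["agent"])
            sid, last = last_seen.get(key, (None, None))
            if sid is None or e["time"] - last > TIMEOUT:
                next_id += 1
                sid = next_id
            last_seen[key] = (sid, e["time"])
            e["session"] = sid
        return entries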
73. The "referrer" field provides extremely valuable information about the user
The "referrer" links back to the previous request
An empty "referrer" indicates that the request was initiated from a bookmark or by directly typing the URL
A non-empty "referrer" either links back to the previous resource requested from the server, or gives the URL of the "outside" resource that the user was accessing immediately before initiating the current session
Not easy to use directly: too many distinct values
74. The referrer, just like any other URL, can be decomposed into the following pieces:
◦ Protocol used (http, https, etc.)
◦ Site (domain name of the server)
◦ Domain (com, edu, uk, etc.)
◦ Resource (including the path relative to the server)
◦ Port (usually missing for the default assignment)
◦ Query string
Should consider grouping "sites" into logical segments (search engines, specialty sites, etc.)
May require further processing of the "resource" and "query string" (keywords, categories, etc.)
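The Python standard library already performs most of this decomposition; a small sketch:

    from urllib.parse import urlparse, parse_qs

    def split_referrer(ref: str):
        """Decompose a referrer URL into the pieces listed above."""
        if not ref:
            return None            # bookmark or directly typed URL
        p = urlparse(ref)
        return {
            "protocol": p.scheme,                              # http, https, ...
            "site": p.hostname,                                # server domain name
            "domain": (p.hostname or "").rsplit(".", 1)[-1],   # com, edu, uk, ...
            "resource": p.path,
            "port": p.port,                                    # None if default
            "query": parse_qs(p.query),
        }

    # e.g. split_referrer("http://www.yahoo.com/search?p=pantyhose")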