Sound, Search, and Semantics: How Form Follows Function - Upasna Gautam
It’s not breaking news that voice search is the emerging technology of greatest interest, but what hasn’t been demystified is how it works. This session will uncover how the algorithm functions at a structural level by dissecting Google’s Automatic Speech Recognition, Google's Quality Metrics for voice search, and deciphering the nuances of the spoken word as they apply to semantic search.
At the MnSearch Snippet #13 event held at Spyder Trap in Minneapolis, MN on April 30, 2014, Joe Wilebski presented his slide deck "Panda, Penguin, Hummingbird."
For e-commerce applications, matching users with the items they want is the name of the game. If they can't find what they want, how can they buy anything?! Typically this functionality is provided through search and browse experiences. Search allows users to type in text and match against the text of the items in the inventory. Browse allows users to select filters and slice-and-dice the inventory down to the subset they are interested in. But with the shift toward mobile devices, no one wants to type anymore - thus browse is becoming dominant in the e-commerce experience.
But there's a problem! What if your inventory is not categorized? Perhaps your inventory is user generated or generated by external providers who don't tag and categorize the inventory. No categories and no tags means no browse experience and missed sales. You could hire an army of taxonomists and curators to tag items - but training and curation will be expensive. You can demand that your providers tag their items and adhere to your taxonomy - but providers will buck this new requirement unless they see obvious and immediate benefit. Worse, providers might use tags to game the system - artificially placing themselves in the wrong category to drive more sales. Worst of all, creating the right taxonomy is hard. You have to structure a taxonomy to realistically represent how your customers think about the inventory.
Eventbrite is investigating a tantalizing alternative: using a combination of customer interactions and machine learning to automatically tag and categorize our inventory. As customers interact with our platform - as they search for events and click on and purchase events that interest them - we implicitly gather information about how our users think about our inventory. Search text effectively acts like a tag, and a click on an event card is a vote that the clicked event is representative of that tag. We can use this stream of information as training data for a machine learning classification model; and as we receive new inventory, we can automatically tag it with the text that customers will likely use when searching for it. This makes it possible to better understand our inventory and our supply and demand, and most importantly it allows us to build the browse experience that customers demand.
In this talk I will explain in depth the problem space and Eventbrite's approach to solving the problem. I will describe how we gathered training data from our search and click logs, and how we built and refined the model. I will present the output of the model and discuss both the positive results of our work and the work left to be done. Those attending this talk will leave with some new ideas to take back to their own business.
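The click-as-vote idea can be sketched in a few lines of plain Python. This is a toy illustration, not Eventbrite's actual pipeline; the log entries, event IDs, and helper names are all invented:

```python
from collections import Counter, defaultdict

# Hypothetical click-log entries: (search text, clicked event id).
click_log = [
    ("salsa dancing", "evt_1"),
    ("salsa class", "evt_1"),
    ("live jazz", "evt_2"),
    ("jazz night", "evt_2"),
    ("salsa dancing", "evt_3"),
]

# Each click is a vote: the query's tokens act as tags for the event.
tag_votes = defaultdict(Counter)
for query, event_id in click_log:
    for token in query.lower().split():
        tag_votes[event_id][token] += 1

def suggest_tags(event_id, top_n=2):
    """Return the most-voted tags for an event."""
    return [tag for tag, _ in tag_votes[event_id].most_common(top_n)]

print(suggest_tags("evt_1"))  # 'salsa' ranks first for evt_1
```

In a real system these per-event tag profiles would become labels for a supervised classifier, so that brand-new inventory can be tagged from its text alone.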
Mobile-first goes beyond simply indexing in a search engine. It has several meanings, spanning user behaviour, web design, and adoption across territories, user segments, and verticals. We need to be aware of these fundamental changes in search behaviour and adapt quickly.
Google Hummingbird - What does it mean for SEO? - Chris Schweppe
What is the Google Hummingbird algorithm's impact on SEO? Understanding the intent behind Hummingbird is key to predicting what it will mean to SEO today and tomorrow. In this POV presentation from Ogilvy subsidiary Global Strategies, Ken Shults outlines the implications of Google's focus shift from keywords to entities.
State of Search 2017 - Semantics and Science - Upasna Gautam
What latent semantic indexing is, how Google uses it, and how understanding this core functionality of the Google algorithm will help you create better content.
Conductor C3 2019 - A Sound Advantage: How Voice Search Works & Works For You - Conductor
Upasna Gautam, Manager, Search, Ziff Davis
Become fluent in voice search form, function, and success. Learn how Google processes sound and conducts speech modeling; the four voice search quality metrics Google applies; and how to enhance your own strategy with tactics for targeting content by searcher need states.
The Actionable Guide to Doing Better Semantic Keyword Research #BrightonSEO (... - Paul Shapiro
For a detailed recap: http://pshapi.ro/SemanticKWR
My BrightonSEO presentation...
1st Half: What is semantic search and why does it matter to SEOs?
2nd Half: Using KNIME to do semantic keyword research using SERP and Twitter data.
The New Content SEO - Sydney SEO Conference 2023 - Amanda King
Amanda King of FLOQ's deck for the Sydney SEO conference run by Prosperity Media in April of 2023 on content, entity SEO and Google's history (or lack thereof) with keywords. We also go through natural language processing, what it is and how quickly Google goes from queries to entities based on their patent application history. And of course, no good conference session would go without actionable suggestions, which you can find at the end of the deck.
For another angle on content and strategy and how to approach them, read more at https://floq.co/seo-strategy/tactics-strategy/
This tutorial gives an overview of how search engines and machine learning techniques can be tightly coupled to address the need for building scalable recommender or other prediction-based systems. Typically, such systems architect retrieval and prediction in two phases. In Phase I, a search engine returns the top-k results based on constraints expressed as a query. In Phase II, the top-k results are re-ranked in another system according to an optimization function that uses a supervised trained model. However, this approach presents several issues, such as the possibility of returning sub-optimal results due to the top-k limit during query, as well as the presence of some inefficiencies in the system due to the decoupling of retrieval and ranking.
To address this issue the authors created ML-Scoring, an open source framework that tightly integrates machine learning models into Elasticsearch, a popular search engine. ML-Scoring replaces the default information retrieval ranking function with a custom supervised model that is trained through Spark, Weka, or R and loaded as a plugin in Elasticsearch. This tutorial will not only review basic methods in information retrieval and machine learning, but will also walk through practical examples, from loading a dataset into Elasticsearch, to training a model in Spark, Weka, or R, to creating the ML-Scoring plugin for Elasticsearch. No prior experience is required in any system listed (Elasticsearch, Spark, Weka, R), though some programming experience is recommended.
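As a rough illustration of the two-phase setup that ML-Scoring collapses into one, here is a self-contained sketch in plain Python. The corpus, features, and weights are all invented; in the tutorial's stack, Phase I would be an Elasticsearch query and Phase II a model trained in Spark, Weka, or R:

```python
# Toy inventory; both phases are simulated in plain Python here.
docs = {
    "d1": "cheap flights to austin",
    "d2": "austin music festival tickets",
    "d3": "hotel deals in dallas",
}

def phase1_retrieve(query, k=2):
    """Phase I: crude term-overlap scoring, keep only the top-k."""
    q = set(query.split())
    scored = [(len(q & set(text.split())), d) for d, text in docs.items()]
    scored.sort(reverse=True)
    return [d for score, d in scored[:k] if score > 0]

# Phase II: a "trained" model -- here a fixed linear model over two
# features (query overlap, document brevity) standing in for a real one.
weights = {"overlap": 1.5, "brevity": 0.3}

def phase2_rerank(query, candidates):
    q = set(query.split())
    def model_score(d):
        text = docs[d].split()
        overlap = len(q & set(text)) / len(q)
        brevity = 1.0 / len(text)
        return weights["overlap"] * overlap + weights["brevity"] * brevity
    return sorted(candidates, key=model_score, reverse=True)

hits = phase2_rerank("austin tickets", phase1_retrieve("austin tickets"))
print(hits)  # ['d2', 'd1']
```

The sub-optimality the abstract mentions is visible in `phase1_retrieve`: any document the crude first phase cuts from the top-k can never be rescued by the better model in phase two.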
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... - S. Diana Hu
Search engines have focused on solving the document retrieval problem, so their scoring functions do not naturally handle non-traditional IR data types, such as numerical or categorical values. Therefore, on domains beyond traditional search, scores representing strengths of associations or matches may vary widely. As such, the original model doesn't suffice, so relevance ranking is performed as a two-phase approach: 1) regular search; 2) an external model to re-rank the filtered items. Metrics such as click-through and conversion rates are associated with the users' response to items served. The predicted selection rates that arise in real time can be critical for optimal matching. For example, in recommender systems, the predicted performance of a recommended item in a given context, also called response prediction, is often used in determining a set of recommendations to serve in relation to a given serving opportunity. Similar techniques are used in the advertising domain. To address this issue the authors have created ML-Scoring, an open source framework that tightly integrates machine learning models into a popular search engine (Solr/Elasticsearch), replacing the default IR-based ranking function. A custom model is trained through either Weka or Spark and loaded as a plugin used at query time to compute custom scores.
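Response prediction of the kind described above is typically a supervised model over serving features. The following sketch trains a tiny logistic-regression click model with plain gradient descent; the features and data are made up for illustration and stand in for a real Weka/Spark-trained model:

```python
import math

# Invented training data: ([position_bias, text_match], clicked?)
train = [
    ([1.0, 0.9], 1), ([0.8, 0.7], 1), ([0.3, 0.2], 0),
    ([0.2, 0.8], 1), ([0.1, 0.1], 0), ([0.9, 0.1], 0),
]

w = [0.0, 0.0]
b = 0.0
lr = 0.5

def predict(x):
    """Predicted click probability (the 'response prediction')."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Plain stochastic gradient descent on the log loss.
for _ in range(500):
    for x, y in train:
        g = predict(x) - y
        for i in range(len(w)):
            w[i] -= lr * g * x[i]
        b -= lr * g

# A well-placed, well-matched item should get a higher predicted CTR
# than a poorly matched one.
print(predict([0.9, 0.9]), predict([0.1, 0.1]))
```

At serving time such a model's score, not the raw IR score, decides which items to show, which is exactly the ranking function an ML-Scoring-style plugin swaps in.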
Mike King examines the state of the SEO industry and talks through how knowing information retrieval helps improve our understanding of Google. This talk debuted at MozCon.
Staff study talk on search engine & internet in 2008 - Sujit Chandak
It was the year 2008, and many of our staff did not know in detail how an internet search works, although everybody knew and used the internet. I gave a very basic session, which was very interesting for the senior faculty...they wanted to know the most...
Vectors in Search - Towards More Semantic Matching - Simon Hughes
With the advent of deep learning and algorithms like word2vec and doc2vec, vector-based representations are increasingly being used in search to represent anything from documents to images and products. However, search engines work with documents made of tokens, not vectors, and are typically not designed for fast vector matching out of the box. In this talk, I will give an overview of how vectors can be derived from documents to produce a semantic representation of a document that can be used to implement semantic / conceptual search without hurting performance. I will then describe a few different techniques for efficiently searching vector-based representations in an inverted index, such as learning sparse representations of vectors, clustering, and learning binary vectors. Finally, I will discuss some of the pitfalls of vector-based search, and how to get the best of both worlds by combining vector-based scoring with traditional relevancy metrics such as BM25.
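To make the "best of both worlds" point concrete, here is a minimal hybrid scorer in plain Python that blends a hand-rolled BM25 with cosine similarity over toy 2-dimensional "embeddings". Everything here (corpus, vectors, the alpha blend) is invented for illustration; a real system would take BM25 from the engine and vectors from word2vec/doc2vec:

```python
import math
from collections import Counter

# Toy corpus: each document has text plus a made-up 2-d embedding.
docs = {
    "d1": ("dog training tips", [0.9, 0.1]),
    "d2": ("puppy obedience class", [0.8, 0.2]),
    "d3": ("stock market news", [0.1, 0.9]),
}
avg_len = sum(len(t.split()) for t, _ in docs.values()) / len(docs)
k1, b_param = 1.2, 0.75  # standard BM25 free parameters

def bm25(query, text):
    tokens = text.split()
    tf = Counter(tokens)
    score = 0.0
    for term in query.split():
        if term not in tf:
            continue
        n = sum(1 for t, _ in docs.values() if term in t.split())
        idf = math.log(1 + (len(docs) - n + 0.5) / (n + 0.5))
        norm = tf[term] + k1 * (1 - b_param + b_param * len(tokens) / avg_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(a * a for a in v)))

def hybrid(query, q_vec, alpha=0.5):
    """Blend lexical BM25 with semantic cosine similarity."""
    out = {d: alpha * bm25(query, text) + (1 - alpha) * cosine(q_vec, vec)
           for d, (text, vec) in docs.items()}
    return sorted(out, key=out.get, reverse=True)

print(hybrid("dog training", [0.85, 0.15]))  # ['d1', 'd2', 'd3']
```

Note how d2 shares no query terms with "dog training" (BM25 of zero) yet still outranks the off-topic d3 thanks to the vector component - the semantic recall that a purely lexical score misses.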
With the advent of deep learning and algorithms like word2vec and doc2vec, vector-based representations are increasingly being used in search to represent anything from documents to images and products. However, search engines work with documents made of tokens, not vectors, and are typically not designed for fast vector matching out of the box. In this talk, I will give an overview of how vectors can be derived from documents to produce a semantic representation of a document that can be used to implement semantic / conceptual search without hurting performance. I will then describe a few different techniques for efficiently searching vector-based representations in an inverted index, including LSH, vector quantization and k-means tree, and compare their performance in terms of speed and relevancy. Finally, I will describe how each technique can be implemented efficiently in a Lucene-based search engine such as Solr or Elasticsearch.
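Of the techniques listed, locality-sensitive hashing is perhaps the easiest to sketch: random hyperplanes turn a dense vector into a short bit-string that an inverted index can store as an ordinary token. This is a toy version (dimensions, bit count, and vectors all invented), not the talk's actual implementation:

```python
import random

random.seed(42)
DIM, BITS = 8, 6

# One random hyperplane per output bit; the sign of the projection
# onto each hyperplane's normal gives that bit.
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]

def lsh_token(vec):
    """Hash a dense vector to a bit-string token suitable for an
    ordinary inverted index."""
    bits = ""
    for p in planes:
        dot = sum(a * b for a, b in zip(p, vec))
        bits += "1" if dot >= 0 else "0"
    return bits

# Nearby vectors tend to land in the same bucket; distant ones
# usually differ in several bits.
v = [0.5] * DIM
v_close = [0.5 + 0.01 * i for i in range(DIM)]
print(lsh_token(v), lsh_token(v_close))
```

At index time each document stores its token(s) in a keyword field; at query time the same hash is applied to the query vector and matching becomes a plain term lookup, with exact cosine scoring reserved for the small candidate set.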
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
DevOps and Testing slides at DASA Connect - Kari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We also held a lovely workshop with the participants, trying to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
More Related Content
Similar to Semantics and Search by Upasna Gautam at PubCon Austin 2018
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... - SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
UiPath Test Automation using UiPath Test Suite series, part 4 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
GridMate - End to end testing is a critical piece to ensure quality and avoid... - ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
Essentials of Automations: The Art of Triggers and Actions in FME - Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Elevating Tactical DDD Patterns Through Object Calisthenics - Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
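As a flavour of how such constraints bite, here is a small Python sketch applying two well-known calisthenics rules, "wrap all primitives" and "first-class collections", to a tactical DDD value object. The domain names are invented for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Money:  # "wrap all primitives": no bare ints for amounts
    amount_cents: int
    currency: str

    def add(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError("currency mismatch")
        return Money(self.amount_cents + other.amount_cents, self.currency)

class OrderLines:  # "first-class collection": not a bare list
    def __init__(self) -> None:
        self._lines: list[Money] = []

    def add(self, price: Money) -> None:
        self._lines.append(price)

    def total(self) -> Money:
        total = Money(0, "EUR")
        for line in self._lines:
            total = total.add(line)
        return total

lines = OrderLines()
lines.add(Money(1050, "EUR"))
lines.add(Money(250, "EUR"))
print(lines.total())  # Money(amount_cents=1300, currency='EUR')
```

The constraint does the design work: once the primitive is wrapped, currency-mismatch rules have exactly one home, which is the kind of "mechanical" guidance toward clear domain models the talk describes.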
Pushing the limits of ePRTC: 100ns holdover for 100 days - Adtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 - Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Removing Uninteresting Bytes in Software Fuzzing - Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux tools -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security-analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
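The idea of dropping bytes whose mutation cannot matter is close in spirit to delta-debugging-style reduction. The sketch below is not DIAR's actual analysis (which works from fuzzing feedback); it only shows the core loop of greedily removing chunks that don't change a stand-in "interesting" predicate:

```python
def interesting(seed: bytes) -> bool:
    # Stand-in for "the target still exercises the behaviour the
    # fuzzer cares about": here, the seed still contains a
    # well-formed <a>...</a> element.
    return b"<a>" in seed and b"</a>" in seed

def shrink(seed: bytes, chunk: int = 4) -> bytes:
    """Greedily remove chunks whose removal keeps the seed interesting,
    halving the chunk size until single bytes are tried."""
    while chunk >= 1:
        i = 0
        while i < len(seed):
            candidate = seed[:i] + seed[i + chunk:]
            if interesting(candidate):
                seed = candidate   # bytes were uninteresting: drop them
            else:
                i += chunk         # bytes mattered: keep, move on
        chunk //= 2
    return seed

bloated = b"PADDING<a>hi</a>MOREJUNKHERE"
lean = shrink(bloated)
print(lean)  # b'<a></a>'
```

Starting a campaign from `lean` rather than `bloated` means every mutation lands on bytes that can actually change the interesting behaviour, which is the speedup DIAR is after.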
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
4. #pubcon
SEO: Then & Now
Back then:
•Keyword-focused:
• Text retrieval systems relied on exact-match keywords
• Documents were weighted by keyword frequency
•Unable to distinguish synonyms and homographs
• Synonym: words that share the same meaning (e.g., "car" and "automobile")
• Homograph: a word with more than one meaning depending on context (e.g., "charge")
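The limitation above is easy to see in code. The snippet below is a toy sketch (the documents and scoring scheme are invented for illustration) of an exact-match, frequency-weighted retriever that is blind to the synonym "automobile":

```python
# Toy exact-match retrieval: documents are scored purely by how often
# the literal query term appears, so a query for "car" never surfaces
# the document that only says "automobile".
docs = {
    "d1": "the car dealership sells a used car",
    "d2": "the automobile showroom sells a vintage automobile",
}

def keyword_score(query: str, doc: str) -> int:
    # Score = raw frequency of the exact query term in the document.
    return doc.split().count(query)

scores = {doc_id: keyword_score("car", text) for doc_id, text in docs.items()}
# d1 scores 2, d2 scores 0: the synonym is invisible to exact matching.
```

A semantic system would instead recognize that both documents answer the same intent.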
5. #pubcon
SEO: Then & Now
Now:
•Driven by intent and context
•Provides relevant answers to complex and vague queries
11. #pubcon
What is Semantic Search
Semantics:
A branch of linguistics that studies the relationship between words and sentences and their actual meanings.
Semantic Search:
The improvement of search accuracy by understanding intent and context, using various on-site elements to crawl, index, and serve relevant results.
12. #pubcon
What is Semantic Search
•Entity Optimization
•Knowledge Graph
•Structured Data
•Information Architecture
•Co-occurrence and Clustering
13. #pubcon
What is Semantic Search:
Entity Optimization
Paul Haahr – Google Ranking Engineer – SMX 2016
14. #pubcon
What is Semantic Search:
Knowledge Graph
•Understands relationships between things
•Stores and understands the intelligence between different entities
•Not just a catalog of objects, but a data model for inter-relationships
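As a rough illustration (not Google's actual data model), a knowledge graph can be sketched as a set of (subject, relationship, object) triples, which makes the inter-relationships themselves queryable rather than a flat catalog of objects. The entities and relation names below are invented:

```python
# Minimal knowledge-graph sketch: facts stored as triples, so we can ask
# "what is connected to this entity?" in either direction.
triples = [
    ("Leonardo da Vinci", "painted", "Mona Lisa"),
    ("Mona Lisa", "housed_in", "Louvre"),
    ("Louvre", "located_in", "Paris"),
]

def related(entity):
    """Return every (relationship, other_entity) pair touching an entity."""
    out = []
    for subj, rel, obj in triples:
        if subj == entity:
            out.append((rel, obj))
        elif obj == entity:
            # Mark traversal against the arrow direction explicitly.
            out.append(("inverse:" + rel, subj))
    return out

print(related("Mona Lisa"))
```

Traversing these edges is what lets a query about one entity surface facts about its neighbors.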
15. #pubcon
What is Semantic Search:
Structured Data
•Google is a data-driven machine that needs to be fed in order to learn
•Feed it structured data: a piece of intelligence the crawler uses to build semantic relevance and authority
•This is how entities are indexed!
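To make this concrete, here is a hypothetical schema.org JSON-LD snippet assembled in Python (the product name, brand, and price are invented for illustration); on a real page it would be embedded in a `<script type="application/ld+json">` tag so the crawler can read the entity and its attributes directly:

```python
import json

# Hypothetical product markup: schema.org JSON-LD tells the crawler what
# entity the page is about and how its attributes relate to one another.
structured_data = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Trail Running Shoe",
    "brand": {"@type": "Brand", "name": "ExampleBrand"},
    "offers": {"@type": "Offer", "price": "89.99", "priceCurrency": "USD"},
}

# Serialized form, ready to drop into the page's <head>.
snippet = json.dumps(structured_data, indent=2)
print(snippet)
```

The nested objects (Brand, Offer) are themselves typed entities, which is exactly the relationship data the knowledge graph consumes.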
16. #pubcon
What is Semantic Search:
Information Architecture
•Allows a crawler to clearly understand content and how it is connected
•Provides a clear and hierarchical path of information
•Lends itself to a good UX
•The RIGHT approach is the most LOGICAL approach
•Must read: Information Architecture for the World Wide Web (3rd edition, by Peter Morville and Louis Rosenfeld): https://www.amazon.com/Information-Architecture-World-Wide-Web/dp/0596527349
17. #pubcon
What is Semantic Search:
Co-Occurrence and Clustering
Word Co-Occurrence Clustering
• Generates topics from words that frequently occur together
Weighted Bigraph Clustering
• Uses URLs from Google search results to induce query similarity and generate topics
The combination of these two methods demonstrated greater usefulness and accuracy than Latent Semantic Analysis.
Read the paper here:
https://pdfs.semanticscholar.org/dcf7/05ba07ee1b73fda0c94e9d01b2474173e470.pdf
18. #pubcon
What is Semantic Search:
Co-Occurrence and Clustering
Word Co-Occurrence
• A set of anchor words serves as initial topics, which are then generalized to other words co-appearing with the same queries.
• Topics are created using hierarchical clustering on query similarity, which measures to what extent two queries agree on their intersections with the list of words in each topic.
Bigraph Clustering
• Uses organic results to create a bigraph with a set of queries and a set of URLs as nodes. Edge weights are computed from impression and click data.
• Bigraph clustering works well even when queries share no common words.
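The intuition behind word co-occurrence clustering can be sketched far more simply than the method in the linked paper. In the toy version below (queries are invented, and the clustering is reduced to connected components over words that appear in the same query), "car" and "automobile" land in one topic despite never sharing a query, because both co-occur with "cheap" and "insurance":

```python
from collections import defaultdict

# Toy word co-occurrence clustering: connect words that appear in the
# same query, then take connected components (via union-find) as topics.
# This is a drastic simplification of the hierarchical method cited above.
queries = [
    "cheap car insurance",
    "cheap automobile insurance",
    "best running shoes",
    "trail running shoes",
]

parent = {}

def find(w):
    parent.setdefault(w, w)
    while parent[w] != w:
        parent[w] = parent[parent[w]]  # path compression
        w = parent[w]
    return w

def union(a, b):
    parent[find(a)] = find(b)

for q in queries:
    words = q.split()
    for a, b in zip(words, words[1:]):
        union(a, b)

topics = defaultdict(set)
for w in parent:
    topics[find(w)].add(w)
# Two topics emerge: {cheap, car, automobile, insurance} and
# {best, running, shoes, trail}.
```

Real systems replace the binary "co-occurred at least once" edge with weighted similarity, but the grouping principle is the same.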
21. #pubcon
• Learning the mathematical underpinnings helps you understand search on a functional level
• LSI uses Singular Value Decomposition, a linear-algebraic factorization that underlies many modern algorithms
• It is not a way to "do SEO"
• LSI KEYWORDS ARE NOT A THING
22. #pubcon
Latent Semantic Indexing
Latent Semantic Indexing (LSI):
•A mathematical algorithm based on Singular Value Decomposition (SVD)
•A text indexing and retrieval method
•Models how terms and concepts are related
23. #pubcon
Latent Semantic Indexing
•LSI works by projecting a large multi-dimensional space down into a smaller number of dimensions
•Semantically similar words get bunched together
•Boundary blurring allows LSI to go beyond exact keyword matching
24. #pubcon
Latent Semantic Indexing
•LSI uses Singular Value Decomposition (SVD) to decompose the term-document matrix
•The decomposition preserves information about relative distances between document vectors
•The space is collapsed into a smaller number of dimensions
•Some information is lost, and words are superimposed on one another
25. #pubcon
Latent Semantic Indexing
•Noise reduction
•Reveals similarities that were latent
•Similar terms become more similar, while dissimilar things remain distinct
This method is widely used to unveil latent themes in text data; such models learn hidden topics from document-level word co-occurrence patterns.
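The projection can be demonstrated on a small, made-up term-document matrix (the vocabulary and counts below are invented for illustration). After truncating the SVD to two latent dimensions, "car" and "automobile" end up close together even though they never co-occur in a document, because both co-occur with "engine":

```python
import numpy as np

# Toy LSI: SVD of a tiny term-document matrix, truncated to k=2 dimensions.
terms = ["car", "automobile", "engine", "banana", "fruit"]
# Rows = terms, columns = four documents (counts are invented).
A = np.array([
    [2, 0, 1, 0],   # car
    [0, 2, 1, 0],   # automobile
    [1, 1, 2, 0],   # engine
    [0, 0, 0, 2],   # banana
    [0, 0, 0, 1],   # fruit
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]   # terms projected into the latent space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "car" vs "automobile": near 1.0 despite zero direct co-occurrence.
sim_car_auto = cosine(term_vecs[0], term_vecs[1])
# "car" vs "banana": near 0.0, since the topics share no documents.
sim_car_banana = cosine(term_vecs[0], term_vecs[3])
```

This is the "boundary blurring" from the earlier slide: the collapse into fewer dimensions is exactly what lets latent similarity surface.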
26. #pubcon
Latent Semantic Indexing
Short texts, such as search queries, tweets, or instant messages, suffer from data sparsity, which causes problems for traditional topic-modeling techniques. Unlike full documents, short text snippets do not provide enough word counts for models to learn how words are related or to disambiguate multiple meanings of a single word.
*This is why the bigraph co-occurrence/clustering model works better*
28. #pubcon
Key Takeaways
•Craft and optimize content for topics and concepts, not just keywords
•Use structured data to feed the crawler the semantic intelligence it needs to understand your site better
•Align the information architecture of your website to the consumer journey
•Navigation, sitemaps, page structure, content organization
•Stop saying/using "LSI keywords"
•The best approach is the most logical approach!