Search + Big Data: It's (still) All About the User - Grant Ingersoll


See conference video: http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011

Apache Hadoop has rapidly become the framework of choice for enterprises that need to store, process, and manage large data sets. It helps companies derive more value from existing data and collect new data, including unstructured data from server logs, social media channels, call center systems, and other sources that present new opportunities for analysis. This keynote will provide insight into how Apache Hadoop is being leveraged today and how it is evolving to become a key component of tomorrow's enterprise data architecture. It will also look at the important intersection between Apache Hadoop and search.


Search + Big Data:
It’s (still) All About the User
Grant Ingersoll, Chief Scientist – Lucid Imagination
grant@lucidimagination.com
October 19, 2011

Promise and Reality

“Data is increasingly digital air: the oxygen we
breathe and the carbon dioxide that we exhale. It
can be a source of both sustenance and pollution.”
Six Provocations for Big Data
by Danah Boyd and Kate Crawford

“The truth is, I spend most of my time trying to
reduce the size of my data so it can be analyzed.”
Hilary Mason, Chief Scientist, Bitly @ Strata

Pragmatism

Evolution
Documents
• Models
• Feature Selection

Content Relationships
• Page Rank, etc.
• Organization
• Social Graph

User Interaction
• Clicks
• Ratings/Reviews
• Learning to Rank

Queries
• Phrases
• NLP
• NLP

Minding the Intersection

[Diagram: the intersection of Search, Analytics, and Discovery]

Benefits
§  End users
•  Better relevance/conversion
•  Serendipity
•  Better/faster insight

§  Business:
•  ROI
•  Awareness across organization
•  Enablement
•  Agility

Needs
§  Fast, efficient, scalable search
§  Large scale, cost effective storage
§  Processing Power:
•  Large-scale, distributed processing of the whole data set
•  Streaming/in-memory processing for real-time needs
•  Ability to learn

§  Willingness to ask questions

Search
§  Good, scalable search is a given
•  Talks: Chitouras, Sturlese, Binns, Miller

§  Custom relevancy via function queries and boosts
§  Explore other relevance models
•  Talks: Muir, Pugh
•  Lucene/Solr trunk has pluggable scoring (BM25, etc.)

§  NRT (near real-time) search for timeliness
•  Talks: Busch
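The slide notes that Lucene/Solr trunk (as of late 2011) has pluggable scoring, with BM25 among the options. The sketch below is a plain-Python illustration of the Okapi BM25 formula over pre-tokenized documents; it is not Lucene's implementation, `bm25_scores` is a hypothetical helper, and `k1=1.2`, `b=0.75` are simply commonly used defaults:

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    Illustrative sketch only; k1 and b are assumed, commonly used defaults.
    """
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: the number of documents each term appears in.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # A non-negative IDF variant, so very common terms never subtract score.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls document-length normalization.
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores


docs = [["search", "big", "data"], ["search", "user"], ["hadoop", "data", "data"]]
print(bm25_scores(["data"], docs))
```

The third document scores highest for `data` because it contains the term twice at the same length as the first; the second scores zero.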

Discovery
§  Facets
•  Talks: Yonik
•  Classification, Taxonomy
§  Clustering
•  Talk: Frank S.
§  Suggestions
•  Auto-suggest, Spelling, More Like This, Related Searches, search trails
§  Visualization
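At its core, faceting counts field values across the documents that match a query, so users can see and drill into the result space. A minimal sketch, assuming results are plain dicts and that a list value means a multi-valued field; the `facet_counts` helper is hypothetical, and Lucene/Solr compute facets inside the index rather than like this:

```python
from collections import Counter


def facet_counts(results, field, limit=10):
    """Count values of `field` across a result set, most frequent first.

    Hypothetical helper: documents are assumed to be dicts, and a list,
    tuple, or set value is treated as a multi-valued field.
    """
    counts = Counter()
    for doc in results:
        value = doc.get(field)
        if value is None:
            continue  # documents without the field contribute nothing
        values = value if isinstance(value, (list, tuple, set)) else [value]
        # Each distinct value counts once per document.
        counts.update(set(values))
    return counts.most_common(limit)


results = [
    {"category": "hadoop"},
    {"category": ["hadoop", "solr"]},
    {"category": "solr"},
    {"category": "solr"},
    {"title": "no category"},
]
print(facet_counts(results, "category"))  # [('solr', 3), ('hadoop', 2)]
```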

Analytics for End Users
§  Offline
•  Popularity/Click
•  Link Analysis
•  Search Trails
•  Recommendations
•  Spellchecking weights
•  Collocations
§  Online
•  Trends/Stats
•  Social/Personal
•  Location
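The slide lists collocations among the offline analytics but does not say how they are found. One common approach, assumed here, is to rank adjacent word pairs by pointwise mutual information (PMI); `top_collocations` and its `min_count` cutoff are illustrative:

```python
import math
from collections import Counter


def top_collocations(tokens, min_count=2, limit=10):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Illustrative sketch: PMI is one assumed scoring choice among several.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # PMI is unreliable for rare pairs
        # PMI = log(P(w1 w2) / (P(w1) * P(w2))), using unigram counts for P(w).
        p_pair = count / (n - 1)
        pmi = math.log(p_pair / ((unigrams[w1] / n) * (unigrams[w2] / n)))
        scored.append(((w1, w2), pmi))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]


tokens = "new york is big new york is loud the city is big the city is old".split()
for pair, score in top_collocations(tokens):
    print(pair, round(score, 2))
```

Pairs such as ("new", "york") outrank ("york", "is") because "is" is frequent on its own, so its co-occurrence with "york" is less surprising.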

[Logo: Storm]

Analytics for Internal Users
§  Offline
•  Top X
•  Zero results
•  MRR, MAP
•  User segmentation
•  Location, conversions
•  Ad hoc Analysis
§  Online
•  Trends
•  Operational alerts (QPS, DPS, etc.)
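MRR and MAP, listed under offline internal analytics, are standard ranking-quality metrics computed from judged results. A short sketch, assuming each query comes with its ranked document IDs and a set of IDs judged relevant (the function names are hypothetical):

```python
def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """MRR: average of 1/rank of the first relevant result per query (0 if none)."""
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for position, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant:
                total += 1.0 / position
                break  # only the first relevant hit counts
    return total / len(ranked_lists)


def mean_average_precision(ranked_lists, relevant_sets):
    """MAP: mean over queries of precision averaged at each relevant hit."""
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        hits, precision_sum = 0, 0.0
        for position, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant:
                hits += 1
                precision_sum += hits / position
        # Divide by all relevant docs, so relevant docs never retrieved count as misses.
        total += (precision_sum / len(relevant)) if relevant else 0.0
    return total / len(ranked_lists)


ranked = [["d1", "d2", "d3"], ["d4", "d5", "d6"]]
relevant = [{"d2"}, {"d4", "d6"}]
print(mean_reciprocal_rank(ranked, relevant))    # 0.75
print(mean_average_precision(ranked, relevant))  # 0.666...
```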

[Logo: Giraph]

What’s Missing?
§  The glue is up to you (us?)
•  Lucene Index -> Pig/Others
•  Mahout -> Pig/Others
•  Mahout -> Lucene/Solr
•  Logs -> Pig/Others

§  Nice to have:
•  More in-index functionality (that performs)
§  Aggregations
§  Arbitrary stats
§  Complex Joins
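As an example of the "Logs -> Pig/Others" glue, reports like Top X queries and zero-result queries reduce to simple aggregations over the query log. The sketch below does this in Python rather than Pig and assumes a hypothetical tab-separated log with one `query<TAB>num_found` record per line; `summarize_query_log` is illustrative:

```python
from collections import Counter


def summarize_query_log(lines, top_x=5):
    """Return the most frequent queries and the most frequent zero-result queries.

    Assumes a hypothetical log format: one "query<TAB>num_found" record per line.
    """
    query_counts = Counter()
    zero_result_counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # skip malformed records
        # Normalize case and whitespace so "Solr" and "solr" count together.
        query, num_found = parts[0].strip().lower(), parts[1].strip()
        if not query or not num_found.isdigit():
            continue
        query_counts[query] += 1
        if int(num_found) == 0:
            zero_result_counts[query] += 1
    return query_counts.most_common(top_x), zero_result_counts.most_common(top_x)


log = [
    "hadoop\t120\n",
    "Solr\t45",
    "hadoop\t118",
    "lucene eurocon\t0",
    "bogus line",
    "solr\t50",
    "hadoop\t119",
    "lucene eurocon\t0",
]
top, zero = summarize_query_log(log, top_x=2)
print(top[0])  # ('hadoop', 3)
print(zero)    # [('lucene eurocon', 2)]
```

At scale the same grouping and counting would run as a batch job over the raw logs, with the results fed back into dashboards or relevance tuning.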

What’s Next?

“I can have all the data I want to have – but I still
have to communicate it to our players. It has to
get into their minds. And they have to utilize it.”
Brad Stevens, Head Basketball Coach,
Butler University (McKinsey Quarterly, Oct. ’11)

Thanks!

§  http://www.lucidimagination.com

§  @gsingers

§  grant@lucidimagination.com

§  stump@lucene-eurocon.com


Search + Big Data: It's (still) All About the User - Grant Ingersoll

1. Search + Big Data: It’s (still) All About the User
Grant Ingersoll, Chief Scientist – Lucid Imagination
grant@lucidimagination.com
October 19, 2011

2. Promise and Reality
“Data is increasingly digital air: the oxygen we breathe and the carbon dioxide that we exhale. It can be a source of both sustenance and pollution.”
Six Provocations for Big Data, by Danah Boyd and Kate Crawford
“The truth is, I spend most of my time trying to reduce the size of my data so it can be analyzed.”
Hilary Mason, Chief Scientist, Bitly @ Strata

3. Pragmatism

4. Evolution
§  Documents
•  Models
•  Feature Selection
§  Content Relationships
•  Page Rank, etc.
•  Organization
•  Social Graph
§  User Interaction
•  Clicks
•  Ratings/Reviews
•  Learning to Rank
§  Queries
•  Phrases
•  NLP
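"Page Rank, etc." is listed above as a content-relationship signal. As a rough illustration of what such a signal involves, here is a minimal sketch of the classic PageRank power iteration. It assumes the link graph fits in memory as a dict of outlinks; the function name and the damping factor of 0.85 (the commonly cited default) are assumptions for illustration, not details from the talk, and this is not how any particular search engine computes its link signals.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over an adjacency dict {node: [outlinks]}.

    Hypothetical helper for illustration; ranks sum to 1.0.
    """
    # Collect every node, including ones that only appear as link targets.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no outlinks) is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        # Each node splits its damped rank equally among its outlinks.
        for node, targets in links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

For example, `pagerank({"a": ["c"], "b": ["c"], "c": []})` ranks `c` above `a` and `b`, because both of them link to it. At Big Data scale the same iteration is usually run as a distributed graph job rather than in a single process.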

5. Minding the Intersection
Search
Analytics
Discovery

6. Benefits
§  End users:
•  Better relevance/conversion
•  Serendipity
•  Better/faster insight
§  Business:
•  ROI
•  Awareness across organization
•  Enablement
•  Agility

7. Needs
§  Fast, efficient, scalable search
§  Large-scale, cost-effective storage
§  Processing power:
•  Large-scale, distributed processing to consume the whole data set
•  Streaming/in-memory processing for real-time needs
•  Ability to learn
§  Willingness to ask questions

8. The Good News

9. Search
§  Good, scalable search: a given
•  Talks: Chitouras, Sturlese, Binns, Miller
§  Custom relevancy via function queries, boosts
§  Explore other relevance models
•  Talks: Muir, Pugh
•  Lucene/Solr trunk has pluggable scoring (BM25, etc.)
§  NRT (near-real-time search) for timeliness
•  Talks: Busch
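The slide mentions that Lucene/Solr trunk gained pluggable scoring, with BM25 as one alternative relevance model. The sketch below shows one common formulation of Okapi BM25 over pre-tokenized documents, to make the moving parts (term frequency, document frequency, length normalization) concrete. It is an assumption-laden illustration, not Lucene's implementation: Lucene's own BM25 scorer differs in details such as how document lengths are encoded. The defaults `k1=1.2` and `b=0.75` are the usual textbook values, and `bm25_scores` is a hypothetical name.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against a tokenized query with Okapi BM25.

    Hypothetical helper; `query` is a list of terms, `docs` a list of term lists.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of docs containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # term absent from this doc contributes nothing
            # Non-negative IDF variant: log(1 + (N - n + 0.5) / (n + 0.5)).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating TF, normalized by doc length relative to the average.
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores
```

With `docs = [["solr", "search", "lucene"], ["hadoop", "storage"], ["search", "search", "scale"]]` and the query `["search"]`, the third document scores highest and the second scores 0.0. Raising `b` penalizes long documents more heavily, and raising `k1` delays the point at which repeated terms stop adding score.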

10. Discovery
§  Facets
•  Talks: Yonik
•  Classification, Taxonomy
§  Clustering
•  Talk: Frank S.
§  Suggestions
•  Auto-suggest, Spelling, More Like This, Related Searches, search trails
§  Visualization
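Facets are the first discovery feature listed. Solr computes facets inside the index (for example via the `facet.field` request parameter). The Python sketch below only illustrates the underlying counting idea over documents that have already been retrieved. The dict-per-document representation and the `facet_counts` name are assumptions for illustration.

```python
from collections import Counter


def facet_counts(matching_docs, field, limit=10):
    """Count values of `field` across matching docs, most frequent first.

    Hypothetical helper; each doc is a dict, and a field may hold a list.
    """
    counts = Counter()
    for doc in matching_docs:
        value = doc.get(field)
        if value is None:
            continue  # docs without the field do not contribute to the facet
        # Multi-valued fields contribute one count per distinct value.
        values = value if isinstance(value, list) else [value]
        counts.update(set(values))
    return counts.most_common(limit)
```

For instance, faceting `[{"category": "books"}, {"category": "music"}, {"category": "books"}]` on `category` yields `[("books", 2), ("music", 1)]`. An in-index implementation avoids materializing each matching document, which is what makes faceting cheap enough to run on every query.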

11. Analytics

12. Analytics for End Users
Offline / Online
•  Popularity/Click
•  Trends/Stats
•  Link Analysis
•  Search Trails
•  Social/Personal
•  Recommendations
•  Spellchecking weights
•  Location
•  Collocations
STORM
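Collocations appear in the list of end-user analytics, for example as input to phrase suggestions. One standard way to find them is pointwise mutual information (PMI) over adjacent word pairs. The talk does not say which method it has in mind, so treat this as a swapped-in technique and a minimal sketch. The `top_collocations` name and the `min_count` cutoff are assumptions.

```python
import math
from collections import Counter


def top_collocations(tokens, min_count=2, limit=5):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper; `tokens` is a flat list of words from a corpus or query log.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # PMI overrates rare pairs, so require a minimum count
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # log2 of how much more often the pair occurs than if the words were independent.
        scored.append(((w1, w2), math.log2(p_pair / (p_w1 * p_w2))))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]
```

On the token stream `"new york pizza new york bagel the pizza the bagel".split()`, only `("new", "york")` occurs often enough to pass the cutoff, and it receives a positive PMI score. At scale this counting is a natural offline batch job.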

13. Analytics for Internal Users
Offline / Online
•  Top X
•  Trends
•  Zero results
•  MRR, MAP
•  Operational alerts (QPS, DPS, etc.)
•  User segmentation
•  Location, conversions
•  Ad hoc analysis
GIRAPH
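MRR (mean reciprocal rank) and MAP (mean average precision) are the relevance metrics named on this slide. The sketch below computes both from ranked result lists and relevance judgments. It assumes one list of document ids per query and a set of relevant ids per query; the function names are hypothetical, and where the judgments come from (click logs, editorial review) is left open.

```python
def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """MRR: average of 1/rank of the first relevant result per query (0 if none)."""
    total = 0.0
    for ranking, relevant in zip(ranked_lists, relevant_sets):
        for position, doc_id in enumerate(ranking, start=1):
            if doc_id in relevant:
                total += 1.0 / position
                break  # only the first relevant hit counts
    return total / len(ranked_lists)


def average_precision(ranking, relevant):
    """AP: precision@k summed at each rank k holding a relevant doc, over all relevant docs."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / position
    # Relevant docs that were never retrieved pull the average down.
    return precision_sum / len(relevant)


def mean_average_precision(ranked_lists, relevant_sets):
    """MAP: mean of per-query average precision."""
    total = sum(average_precision(r, rel) for r, rel in zip(ranked_lists, relevant_sets))
    return total / len(ranked_lists)
```

For two queries with rankings `["d1", "d2", "d3"]` and `["d4", "d5", "d6"]` and relevant sets `{"d2"}` and `{"d4", "d6"}`, MRR is (1/2 + 1) / 2 = 0.75 and MAP is (1/2 + (1 + 2/3) / 2) / 2 = 2/3. MRR only rewards the first good hit, which suits navigational queries. MAP rewards ranking all relevant documents high.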

14. What’s Missing?
§  The glue is up to you (us?)
•  Lucene Index -> Pig/Others
•  Mahout -> Pig/Others
•  Mahout -> Lucene/Solr
•  Logs -> Pig/Others
§  Nice to have:
•  More in-index functionality (that performs)
   §  Aggregations
   §  Arbitrary stats
   §  Complex Joins
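The "Logs -> Pig/Others" glue is left to the reader. Below is a minimal plain-Python stand-in for such a job. It turns raw query-log lines into two of the internal reports listed on the previous slide: top queries and zero-result queries. The tab-separated `timestamp<TAB>query<TAB>num_hits` log format and the `summarize_query_log` name are hypothetical. A real pipeline would run equivalent logic in Pig or another distributed tool.

```python
from collections import Counter


def summarize_query_log(lines, top_n=3):
    """Return (top-N queries, top-N zero-result queries) from raw log lines.

    Assumes a hypothetical tab-separated format: timestamp<TAB>query<TAB>num_hits.
    """
    query_counts = Counter()
    zero_result_counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip malformed lines rather than failing the whole job
        _, query, num_hits = parts
        if not num_hits.strip().isdigit():
            continue  # hit count must be a non-negative integer
        # Normalize so "Hadoop" and "hadoop" are counted together.
        query = query.strip().lower()
        query_counts[query] += 1
        if int(num_hits) == 0:
            zero_result_counts[query] += 1
    return query_counts.most_common(top_n), zero_result_counts.most_common(top_n)
```

Zero-result queries are often the most actionable output. They point at missing content, synonyms, or spelling suggestions that the search side can then address.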

15. What’s Next?
“I can have all the data I want to have – but I still have to communicate it to our players. It has to get into their minds. And they have to utilize it.”
Brad Stevens, Head Basketball Coach, Butler University, in McKinsey Quarterly, Oct. ’11

16. Thanks!
§  http://www.lucidimagination.com
§  @gsingers
§  grant@lucidimagination.com
§  stump@lucene-eurocon.com

17. Lucene Ecosystem
Spark, Storm, Giraph

18. Lucene Ecosystem
Spark, Storm, Giraph
